An Analytical and Experimental Study of Evolving 3D Deformation
Fields Using Vision-Based Approaches
by
Yulu Luke Chen
A Dissertation Presented to the
FACULTY OF THE GRADUATE SCHOOL
UNIVERSITY OF SOUTHERN CALIFORNIA
In Partial Fulfillment of the
Requirements for the Degree
DOCTOR OF PHILOSOPHY
(Astronautical Engineering)
December 2017
Copyright 2017 Yulu Luke Chen
Dedicated to
my parents, wife, and children.
Acknowledgements
I would like to express the deepest gratitude to my thesis adviser, Professor Sami F.
Masri, for his guidance. Under his supervision, I learned how to narrow a topic into a
research problem, how to formulate and solve the problem, and how to publish the results.
Without his advice, support, patience, and continuous encouragement throughout my
doctoral studies, this dissertation would not have materialized.
I would also like to thank Dr. Mohammad Reza Jahanshahi for inviting me to participate in the exciting projects of utilizing cost-effective vision-based approaches to quantify evolving 3D deformation fields. I appreciate the assistance with the projects given by my
labmates: Mohamed Abdelbarr, Preetham Manjunatha, and Dr. Miguel R. Hernandez-
Garcia.
I am deeply grateful to Professor Joseph Kunc and Professor Dan A. Erwin for serving
on my dissertation committee. Special thanks to Dr. Charles W. Liu at California State
University, Los Angeles for reviewing my dissertation in detail and providing valuable
comments.
Table of Contents
Abstract x
1 Introduction 1
1.1 Background . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
1.2 Motivation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
1.3 Literature Review . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6
1.3.1 Contact-Type Displacement Measurement . . . . . . . . . . . . . . 6
1.3.2 Non-contact Displacement Measurement . . . . . . . . . . . . . . . 8
1.4 Contribution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13
1.5 Scope . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16
2 Using the RGB-D Camera: Depth Accuracy, Calibration, and 3D Reconstruction 17
2.1 Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17
2.2 Depth Accuracy and Calibration . . . . . . . . . . . . . . . . . . . . . . . 22
2.2.1 Error Sources . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23
2.2.2 Depth Measurement Model . . . . . . . . . . . . . . . . . . . . . . 26
2.3 Camera Calibration . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32
2.3.1 Intrinsic Calibration . . . . . . . . . . . . . . . . . . . . . . . . . . 33
2.3.2 Extrinsic Calibration . . . . . . . . . . . . . . . . . . . . . . . . . . 34
2.3.3 IR-Depth Offset . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36
2.4 3D Reconstruction and RGB-D Registration . . . . . . . . . . . . . . . . . 36
2.4.1 Pinhole Camera Model . . . . . . . . . . . . . . . . . . . . . . . . . 38
2.4.2 Depth to 3D Conversion . . . . . . . . . . . . . . . . . . . . . . . . 44
2.4.3 Depth to Color Registration . . . . . . . . . . . . . . . . . . . . . . 45
2.4.4 3D Point Cloud . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 49
2.5 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 50
3 Measuring 3D Dynamic Displacement Fields Using an RGB-D Camera 54
3.1 Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 54
3.2 Experimental Design . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 55
3.2.1 Data Acquisition . . . . . . . . . . . . . . . . . . . . . . . . . . . . 55
3.2.2 Data Extraction . . . . . . . . . . . . . . . . . . . . . . . . . . . . 59
3.2.3 Data Processing . . . . . . . . . . . . . . . . . . . . . . . . . . . . 61
3.2.4 Initial Tests . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 63
3.3 Data Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 65
3.3.1 Harmonic Excitation . . . . . . . . . . . . . . . . . . . . . . . . . . 65
3.3.2 Random Excitation . . . . . . . . . . . . . . . . . . . . . . . . . . 71
3.4 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 76
4 Evaluating Performance of an RGB-D Camera for Image Acquisition in Outdoor and Dynamic Environments 79
4.1 Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 79
4.2 Performance Evaluation . . . . . . . . . . . . . . . . . . . . . . . . . . . . 81
4.2.1 Sunlight Infrared Interference . . . . . . . . . . . . . . . . . . . . . 81
4.2.2 Motion Blurring . . . . . . . . . . . . . . . . . . . . . . . . . . . . 83
4.2.3 Rolling Shutter Distortion . . . . . . . . . . . . . . . . . . . . . . . 84
4.3 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 96
5 Integrating Multiple Inexpensive Sensors and Fusing Heterogeneous Data for a Vision-Based Automated Pavement Condition Survey 98
5.1 Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 98
5.2 System Design . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 101
5.2.1 Sensor Components . . . . . . . . . . . . . . . . . . . . . . . . . . 101
5.2.2 The Prototype . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 102
5.3 Multi-Sensor Integration and Fusion . . . . . . . . . . . . . . . . . . . . . 106
5.3.1 Multiple Kinects . . . . . . . . . . . . . . . . . . . . . . . . . . . . 106
5.3.2 Data Synchronization . . . . . . . . . . . . . . . . . . . . . . . . . 106
5.3.3 Multi-Image Stitching . . . . . . . . . . . . . . . . . . . . . . . . . 110
5.4 Field Test Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 113
5.4.1 Field Testing Routes . . . . . . . . . . . . . . . . . . . . . . . . . . 113
5.4.2 Defect Detection and Quantification . . . . . . . . . . . . . . . . . 113
5.4.3 Data Quality . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 115
5.5 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 120
6 Conclusion 123
BIBLIOGRAPHY 128
List of Figures
1.1 The spatial resolution of deformation measurement can affect accuracy of reconstructing the deformation response of a structure: (a) sparse sensor nodes, (b) dense sensor nodes, and (c) full-field measurement. . . . 4
1.2 The comprehensive experimental investigation at a glance. . . . . . . . . 15
2.1 Evaluated depth sensors: (a) Microsoft Kinect v1, (b) Asus Xtion Pro Live, (c) SoftKinetic DS325, (d) Microsoft Kinect v2, and (e) Mesa SwissRanger SR4000 (Heptagon, 2016). . . . 20
2.2 The geometrical model of depth measurement. . . . . . . . . . . . . . . . 27
2.3 The curves show the Kinect depth measurement versus the depth resolution (blue) and the depth error (red) (Khoshelham and Elberink, 2012). . . . 31
2.4 The camera calibration target taken from different camera positions/orientations: (a) calibration images for the RGB camera and (b) calibration images for the IR camera. . . . 32
2.5 Camera calibration procedure: (a) corner prediction, (b) corner extraction. 34
2.6 The 3D plot of the spatial configuration for the stereo calibration. . . . 35
2.7 Detecting the circular disk and estimating its center in the IR and depth
images. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37
2.8 Shift between two detected circles in the IR and depth images. . . . . . 37
2.9 The geometry of the pinhole camera model. . . . . . . . . . . . . . . . . 38
2.10 Transformation from the image plane to the pixel coordinate system. . . 40
2.11 The color camera and the IR camera of a Kinect sensor. . . . . . . . . . 46
2.12 3D coordinate transformation between the IR and the color coordinate
systems. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 46
2.13 The color-depth registration examples: (a) and (b) are the unregistered
depth and color images; (c) and (d) are the registered depth and color
images aligned with the camera calibration approach; (e) and (f) are the
registered color and depth images produced by OpenNI. . . . . . . . . . 48
2.14 The 3D point clouds: (a) the raw point clouds, and (b) the colored point
clouds which were displayed using the CloudCompare software package
(Girardeau-Montaut, 2017). . . . . . . . . . . . . . . . . . . . . . . . . . 49
2.15 The step-by-step procedure used to align the color and depth image, and
to reconstruct the 3D scene from the depth image. . . . . . . . . . . . . 52
2.16 3D reconstruction using the OpenNI software package. . . . . . . . . . . 53
3.1 The methodology of using an inexpensive RGB-D camera to measure 3D
dynamic displacements includes four steps: 1) calibrating camera and
registering color and depth images; 2) detecting target points in the color
image and mapping target points in the depth image; 3) tracking target
points in the captured video; and 4) calculating displacement time histories. 56
3.2 Test apparatus: (a) the experimental setup for the dynamic displacement
measurement, (b) the aluminum plate on the shaker, and (c) the Microsoft
Kinect. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 58
3.3 Extracting points of interest from (a) a color image and mapping to (b)
a depth image. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 60
3.4 Kinect data post-processing: (a) raw displacement data, (b) PSD of displacement data, (c) filtered Kinect and LVDT displacements, and (d) aligned Kinect and LVDT displacements. . . . 62
3.5 Displacement measurements obtained by the Kinect and LVDT for dif-
ferent excited frequencies: (a) 0.5 Hz, (b) 1.0 Hz, and (c) 1.5 Hz. . . . . 64
3.6 Displacement measurements obtained by the Kinect and LVDT for different peak amplitudes: (a) 5 mm, (b) 10 mm, and (c) 20 mm. . . . 64
3.7 Different motion orientations of the target plate (with respect to the Kinect sensor): (a) motion perpendicular to the Kinect sensor, (b) motion parallel to the Kinect sensor, and (c) motion angled to the Kinect camera. . . . 65
3.8 Sample depth-based measurements (1.0 Hz, 20 mm): (a) x-direction, (b)
y-direction, and (c) z-direction. . . . . . . . . . . . . . . . . . . . . . . . 67
3.9 Sample pixel-based measurements (1.0 Hz, 20 mm): (a) x-direction, (b)
y-direction, and (c) z-direction. . . . . . . . . . . . . . . . . . . . . . . . 67
3.10 Sample depth-and-pixel-based measurements (1.0 Hz, 20 mm): (a) x-
direction, (b) y-direction, and (c) z-direction. . . . . . . . . . . . . . . . 67
3.11 Depth-based measurement errors: (a) normalized errors, and (b) peak errors. . . . 69
3.12 Pixel-based measurement errors: (a) normalized errors, and (b) peak errors. . . . 69
3.13 Depth- and pixel-based measurement errors: (a) normalized errors, and (b) peak errors. . . . 69
3.14 Depth-based measurements for three RMS levels: (a) RMS = 1.94 mm,
(b) RMS = 8.44 mm, and (c) RMS = 13.81 mm. . . . . . . . . . . . . . 72
3.15 Pixel-based measurements for three RMS levels: (a) RMS = 1.94 mm,
(b) RMS = 8.44 mm, and (c) RMS = 13.81 mm. . . . . . . . . . . . . . 72
3.16 PDFs of the estimated Kinect displacements based on depth-based mea-
surements: (a) RMS = 1.94 mm, (b) RMS = 8.44 mm, and (c) RMS =
13.81 mm. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 73
3.17 PDFs of the estimated Kinect displacements based on pixel-based mea-
surements: (a) RMS = 1.94 mm, (b) RMS = 8.44 mm, and (c) RMS =
13.81 mm. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 73
3.18 PDFs of the estimated Kinect peak displacements based on depth-based
measurements: (a) RMS = 1.94 mm, (b) RMS = 8.44 mm, and (c) RMS
= 13.81 mm. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 73
3.19 PDFs of the estimated Kinect peak displacements based on pixel-based
measurements: (a) RMS = 1.94 mm, (b) RMS = 8.44 mm, and (c) RMS
= 13.81 mm. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 74
3.20 Mean error and standard deviation error for two PDFs of peaks. . . . . 75
4.1 Sunlight interference: (a) color image shows shadow regions under strong
sunlight, and (b) only shadow areas can be detected by the Kinect sensor. 81
4.2 Various shading systems used in this study: (a) top-cover sun shade, and
(b) full-cover sun shade. . . . . . . . . . . . . . . . . . . . . . . . . . . . 82
4.3 Comparison of two images captured from the same road segment under different lighting conditions: (a) high luminance, (b) low luminance. . . . 84
4.4 Strobe light setup for pavement data acquisition: (a) the full-cover sun
shade sensor platform, (b) representation of a Kinect, and LED strobe
lighting mounted inside the full-cover sun shade, (c) the scheme of the
strobe light controller. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 85
4.5 Road test results for Kinect image acquisition using the strobe lighting:
(a) and (b) were captured at 15 to 25 mph, (c) and (d) were taken at 25
to 35 mph. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 85
4.6 Rolling shutter distortion of a camera and image samples: (a) rolling shutter image acquisition and distortion; (b) the color image of a square brick captured by a Kinect's color camera at 10 mph (16 km/h) became a parallelogram; (c) the depth image of a square brick captured by a Kinect's depth camera at 10 mph (16 km/h) became a parallelogram. . . . 86
4.7 Relation between rolling shutter distortions for depth images and motion
speeds according to Table 4.1. . . . . . . . . . . . . . . . . . . . . . . . . 88
5.1 Overview of the 3D scanning system for pavement inspection. Box com-
ponents are the software modules, and the right-hand components are the
hardware outputs. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 103
5.2 Components of the data acquisition system: (a) RGB-D cameras com-
bined with accelerometers and USB data acquisition module, (b) a 7-inch
LCD touchscreen monitor for displaying and handling graphics, (c) a high
performance computer and its power supply. . . . . . . . . . . . . . . . . 105
5.3 Data synchronization uses a heuristic approach based on the external
LED cue marks. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 108
5.4 Laboratory results of data alignment for Kinect, accelerometer, and LVDT
(ground truth). . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 109
5.5 Displacement measurements obtained from Kinect and accelerometers for the field experiment on California Highway SR-110. . . . 109
5.6 Overlapping areas of two stitched color images under (a) 50% for 48 km/h (30 mph) and (b) 30% for 80 km/h (50 mph). . . . 111
5.7 Overlap percentages of the sequential color frames under various vehicle
speeds. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 111
5.8 Multiple image stitching: (a) images taken on California State Route SR-110 at a speed of 80 km/h (50 mph), (b) results of image stitching, and (c) the location (34°6'37.7"N 118°11'5.4"W) of the road segment shown on Google Street View. . . . 112
5.9 An example of field tests: data collection route with different vehicle speeds on California State Route SR-110. . . . 114
5.10 Application of defect detection approach proposed by Jahanshahi et al.
(2012) to sample pavements acquired via Kinect system: (a) a pothole im-
age; (b) the corresponding depth data; (c) estimated road surface plane;
(d) relative depth obtained from subtracting the road surface plane from
the depth values; (e) histogram of relative depth values and the defective
depth threshold (the red dashed line); and (f) the depth colormap of the
detected defect. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 116
5.11 Defect detection and depth quantification: (a), (c), (e) and (g) are the images of the defects, and (b), (d), (f) and (h) are the corresponding depth maps, respectively. The colormaps indicate the quantified defective depths. . . 117
5.12 (a) The RIEGL VZ-400 LIDAR and the pothole, (b) the length of the
pothole is about 480 mm measured by a tape ruler, and (c) the length
for the 3D model of the pothole captured by the LIDAR and rendered by
the CloudCompare software is about 481 mm. . . . . . . . . . . . . . . . 118
5.13 Effect of speed on defect quantification: (a) depth map acquired by the
stationary Kinect; and (b) depth map captured by the moving Kinect at
25 km/h (15 mph); (c) cloud-to-cloud (C2C) errors (mm) between two
point clouds obtained by the VZ-400 LIDAR and the stationary Kinect;
(d) C2C errors between two point clouds taken by the LIDAR and the
moving Kinect; (e) the distribution of C2C errors in (c), and (f) the
distribution of C2C errors in (d). . . . . . . . . . . . . . . . . . . . . . . 119
Abstract
This dissertation focuses on a comprehensive evaluation of using a class of inexpensive off-the-shelf vision-based sensors, the RGB-D cameras, to quantify the evolving 3D displacement field of a continuous structural system. Measuring an evolving 3D displacement field is an important yet challenging task in many engineering and scientific applications. Although several sensors exist for direct or indirect measurement of displacements at specific points in a uniaxial direction or of multi-component deformations, these sensors either suffer from fundamental limitations rooted in their operating principles or are relatively expensive. Compared to these sensors, RGB-D cameras have several attractive features, including non-contact displacement measurement, rapid full-field 3D data acquisition, affordable price, light weight and compact size, and ease of operation. These characteristics make RGB-D cameras a promising technology for quantifying evolving 3D displacement fields.
This comprehensive investigation, comprising sensor-level and system-level experiments, is composed of four experimental stages to assess the accuracy and performance of a representative RGB-D camera, the Microsoft Kinect sensor. In Stage I, this study focused on sensor calibration for accurate 3D displacement measurements. In Stage II, laboratory experiments were performed to quantify the accuracy of the RGB-D camera in acquiring dynamic motions of a test structure under varying amplitude and spectral characteristics, and with different configurations of the position and orientation of the sensor with respect to the target structure. In Stage III, field tests were conducted to evaluate the performance of the sensor in outdoor and dynamic environments. In Stage IV, a novel, relatively inexpensive, vision-based sensor system was built using cost-effective off-the-shelf devices (RGB-D cameras, accelerometers, and a GPS), which can be mounted on a vehicle to enable automated 2D and 3D image acquisition of road surface conditions.
It is shown that the sensor under investigation, when operated within the performance envelope discussed in this dissertation (i.e., a measuring range of about 1 m), can provide, with acceptable accuracy (i.e., an error of about 5% for displacements larger than 10 mm), a very convenient and simple means of quantifying 3D displacement fields that change dynamically at the relatively low-frequency rates typically encountered in the structural dynamics field. Several issues related to hardware limitations that can produce noisy data were also investigated through the field tests, including sunlight interference, motion blur, and rolling shutter distortion, and corresponding solutions were proposed to improve the data quality.
This dissertation results in the development of a fairly inexpensive proof-of-concept prototype of a multi-sensor pavement condition survey system that costs under $6,000. It is shown that the proposed multi-sensor system, by utilizing data-fusion approaches of the type developed in this study, can provide a cost-effective road surface monitoring technique with sufficient accuracy to satisfy typical maintenance needs with regard to the detection, localization, and quantification of potholes and similar deterioration features, where the measurements are acquired via a vehicle moving at normal speeds on typical city streets. The proposed system can be easily mounted on vehicles to enable frequent data collection and to help monitor the condition of defects over time. Suggestions for future research to enhance the capabilities of the proposed system are included.
Chapter 1
Introduction
1.1 Background
Quantitative measurement of an evolving deformation field for a continuous structural system plays an important role in research across many engineering and science fields. Ideally, a deformation field represents the spatial distribution of displacement data for all points on the surface of a structure. Hence, an evolving deformation field illustrates how these surface points travel in two-dimensional (2D) or three-dimensional (3D) space over time, due to a change in the shape or size of a structure caused by applied forces (mechanical deformation) or temperature variation (thermal deformation). The time evolution of a deformation field may vary from high-speed motion, such as wing deformation in insect flight (Koehler et al., 2012), to very slow progression, such as the long-term static deformation of infrastructure (Burdet, 1998; Kao and Loh, 2013). The spatial scale of a deformation field also spans a wide range, from the large-scale movement of landslides (Malet et al., 2002) to the small-scale vibration of a membrane in a Micro-Electro-Mechanical Systems (MEMS) microphone (Wang et al., 2002). In the fields of aerospace, civil, and mechanical engineering, static and/or dynamic deformation field measurement provides intuitive and useful information to investigate fundamental characteristics of structural systems (Fraser and Riedel, 2000; Barazzetti and Scaioni, 2010), validate and calibrate numerical models (Blandino et al., 2003; Barrows, 2007; Feng and Feng, 2015), monitor structural responses under applied loads (Detchev et al., 2013; Baqersad et al., 2015), control deformed shapes (Maji and Starnes, 2000), assess structural condition after disasters (Dai et al., 2011), etc.
Quantifying an evolving deformation field is quite challenging. For example, ultra-lightweight, flexible, gossamer space structures are composed of ultra-thin membranes and inflatable tubes that can be packed tightly for launch and then expand into large-scale (tens or hundreds of meters) structures in space (Jenkins, 2001). It is important to monitor the full-field deformation behavior of the multiple components of a gossamer space structure during testing, deployment, and in-service conditions. But attaching sensors to thin-film membranes can add mass and stiffness to the structures, increase power consumption, and raise cost (Lichter and Dubowsky, 2005). Another example is measuring the dynamic behavior of large structures such as high-rise buildings, long-span bridges, or wind turbines for structural health monitoring purposes. Although accelerometers are very common sensors for structural health monitoring applications, determining multi-component displacements requires a double integration of the acceleration data. Sufficient knowledge and careful judgment are needed to select suitable digital filters for noise and DC bias removal in order to obtain an accurate estimate (Smyth and Pei, 2000). Moreover, it is impractical to install a large number of accelerometers on a structure to measure deformation fields.
As digital imaging technology advances, image sensors are becoming increasingly inexpensive while providing higher image resolution and faster sampling rates. Different image formats such as 2D, 3D, and multispectral images are also supported by newly developed sensors. To monitor structural dynamic behavior, vision-based techniques are becoming more promising for performing contactless, full-field, multi-component, static and dynamic deformation measurements in response to arbitrary dynamic loads. The idea of using vision-based technology to determine structural response, and its advantage, is illustrated in Figure 1.1, which compares the spatial resolutions of different deformation-measuring configurations and explains their influence on the accuracy of reconstructing the deformation response of a structure. In this figure, the sensor nodes are indicated in red and the position measurements are marked in blue. If only a few sensor nodes are embedded on a structure, as in Figure 1.1a, a lot of significant information about the structural behavior will be lost. When the density of sensor nodes increases (Figure 1.1b), more details of the structural response can be determined. However, installing highly dense sensor nodes on a structure may create many problems regarding weight, cabling, power, and cost. By contrast, vision-based technology can acquire the full-field deformation response remotely with few sensor nodes (Figure 1.1c).
1.2 Motivation
This dissertation is focused on a comprehensive experimental study to assess the performance characteristics of an off-the-shelf vision-based 3D sensor, an RGB-D camera, for quantitative measurement of evolving deformation fields.

Figure 1.1: The spatial resolution of deformation measurement can affect the accuracy of reconstructing the deformation response of a structure: (a) sparse sensor nodes, (b) dense sensor nodes, and (c) full-field measurement.

An RGB-D camera is composed of a regular color camera and a depth sensor that capture point-wise color and depth data at a sampling rate of several frames per second (fps), utilizing various techniques including structured light, time of flight, or stereo vision. It is quite beneficial to obtain color and depth data simultaneously from such a single imaging device. The color images can provide different types of "features" (e.g., edges, corners, color, texture, intensity gradients, etc.) to facilitate object recognition in the images. The depth images, containing pixel-wise range measurements, can be used to easily reconstruct 3D world coordinates from the 2D pixel coordinates.
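For reference, that last step follows the standard pinhole back-projection (developed in detail in Chapter 2); in the generic notation below, f_x and f_y are the focal lengths in pixels and (c_x, c_y) is the principal point obtained from camera calibration:

```latex
% Back-projecting a pixel (u, v) with measured depth Z to camera coordinates (X, Y, Z)
X = \frac{(u - c_x)\,Z}{f_x}, \qquad Y = \frac{(v - c_y)\,Z}{f_y}
```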
Data acquired by one or multiple RGB-D cameras can be processed using computer vision techniques, such as object recognition and tracking, for accurate measurement of evolving multi-component deformation fields in monitored structures. Efficiency can be achieved by automating such processes. This approach compares favorably to similar measurements achieved by much more elaborate and sophisticated sensors, which may be orders of magnitude more expensive or may suffer from fundamental limitations regarding their theory of operation and their ability to track the motion of discrete points on the monitored structure.
With the above in mind, the following sections of this dissertation first review the state-of-the-art techniques for the different classes of contact-type and non-contact displacement sensors, and then discuss the theory of operation and performance envelope of a new breed of vision-based sensors, followed by a comprehensive quantitative assessment of a representative sensor, the Microsoft Kinect (first generation).
1.3 Literature Review
1.3.1 Contact-Type Displacement Measurement
In this thesis, contact-type sensors refer to sensing devices physically attached to a structural system to acquire displacement measurements; they can be classified into direct-measurement and indirect-measurement devices. In this review, the direct-measurement sensors include linear position sensors and the global positioning system (GPS), and the indirect-measurement sensors include accelerometers.
Linear position sensors
Linear position sensors, which include linear variable differential transformers (LVDTs), linear potentiometers, linear encoders, etc., are utilized to determine either absolute or relative displacement. Linear position sensors are robust position-to-electrical transducers providing highly accurate one-dimensional (1D) displacement measurements at a specific location. Their operation requires a fixed platform as a reference point, so that the relative displacement between the reference point and a point on the structure can be measured (Mills et al., 2001). However, this is a major disadvantage of linear position sensors, since in many cases it is not easy to find a rigid platform close to the structure. Additionally, they are constrained to measuring uniaxial displacement along their operation axis.
GPS
GPS devices have been used to measure dynamic displacement directly for large-scale structures since the 1990s (Lovse et al., 1995; Breuer et al., 2002; Im et al., 2011; Yi et al., 2013). Conventional GPS has an accuracy limited to about ±1 cm horizontally and ±2 cm vertically, with sampling rates of up to 20 Hz. The highly precise real-time kinematic (RTK) GPS consists of a base station on a known ground point and multiple receivers, called "rovers," installed on a structure. The base station sends correction information to the rovers via wireless transmission to enhance accuracy (Tamura et al., 2002). RTK GPS has a resolution of about ±5 mm horizontally and ±10 mm vertically, with a 10 Hz sampling rate. One drawback of GPS technology is that GPS signals cannot be received indoors. The other issue is multipath interference, caused by GPS signals reflected off surrounding surfaces such as water, metal, and glass, which adds error to the true GPS signals (Kijewski-Correa and Kochly, 2007).
Accelerometer
Accelerometers are very common sensors, widely used to monitor structural dynamics. An accelerometer uses a seismic spring-mass-damper system to measure the force acting on the mass, caused by gravity or motion. Nowadays, accelerometer technology is very mature. One-axis, two-axis, and tri-axis miniature micro-electro-mechanical systems (MEMS) accelerometers with different performance and price levels can be found in many structural displacement applications (Gindy et al., 2007; Xu et al., 2014; Sekiya et al., 2016; Hester et al., 2017). Several low-power, cost-effective, tiny wireless sensor nodes equipped with tri-axis MEMS accelerometers have been developed and can be configured as a wireless sensor network (WSN) distributed over a structure for structural dynamics monitoring, with dense spatial resolution, in three dimensions (Lynch and Loh, 2006; Park et al., 2013). However, it is very challenging to estimate displacements using accelerometers. To calculate displacements from recorded acceleration data, numerical integration has to be applied twice: first to obtain velocity, and then displacement. During the numerical integration process, drift error and DC bias are amplified. Technical discussions of the serious challenges, potential pitfalls in the use of digital signal processing, and the resulting large errors inherent in the derivation of displacement time histories from directly measured acceleration records are available in the works of Trujillo and Carter (1982); Smyth and Wu (2007); Gindy et al. (2008); Lee et al. (2010).
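A minimal illustrative sketch of the double-integration problem is given below (the signal, noise level, and filter settings are assumptions chosen only to show the typical remedy of high-pass filtering between integration stages):

```python
import numpy as np
from scipy.integrate import cumulative_trapezoid
from scipy.signal import butter, filtfilt

fs = 100.0                                    # sampling rate (Hz), assumed
t = np.arange(0, 30, 1 / fs)
true_disp = 0.02 * np.sin(2 * np.pi * 1.0 * t)           # 20 mm, 1 Hz motion
accel = -(2 * np.pi * 1.0) ** 2 * true_disp               # exact acceleration
accel_meas = accel + 0.005 + 0.02 * np.random.randn(t.size)  # DC bias + noise

def highpass(x, fc=0.2, fs=fs, order=4):
    """Zero-phase high-pass filter to suppress drift and DC bias."""
    b, a = butter(order, fc / (fs / 2), btype="high")
    return filtfilt(b, a, x)

# Naive double integration: the bias makes the drift grow roughly with t^2.
vel_naive = cumulative_trapezoid(accel_meas, t, initial=0.0)
disp_naive = cumulative_trapezoid(vel_naive, t, initial=0.0)

# Filtered double integration: high-pass after each integration stage.
vel = cumulative_trapezoid(highpass(accel_meas), t, initial=0.0)
disp = highpass(cumulative_trapezoid(highpass(vel), t, initial=0.0))

print("naive peak error (m):   ", np.max(np.abs(disp_naive - true_disp)))
print("filtered peak error (m):", np.max(np.abs(disp - true_disp)))
```

The cutoff frequency must be chosen below the lowest structural frequency of interest, which is exactly the kind of judgment call the references cited above discuss.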
Most contact-type sensors are well developed and can acquire precise data at very high sampling rates. However, a contact-type sensor can only obtain the displacement of a single point on a structure. To measure a full-field deformation, contact-type sensors require one or several intelligent sensor networks consisting of large numbers of sensor nodes, which raises the complexity of the related cabling, power consumption, data communication, synchronization, and computation needs. The sensor networks also add mass to the instrumented structure and may change the structure's properties. If the monitored structure is damaged during the monitoring phase, the contact-type sensors could be damaged as well, degrading the performance of the monitoring system. To overcome these problems, deformation measurement technology is trending toward cost-effective contactless sensors that can be quickly deployed at remote sites to measure full-field deformation dynamics in three dimensions.
1.3.2 Non-contact Displacement Measurement
Optical methods are well suited to developing contactless techniques for deformation measurement. Optical methods based on laser technology can provide very precise data. Currently, laser-based optical devices, including Doppler vibrometers and terrestrial laser scanners, are used to survey the deflection of structures.
Vibrometer
The operation of laser Doppler vibrometers (LDV) is based on optical interferometry to measure velocity and one-dimensional displacement at a fixed point on a vibrating surface, from a remote location (Castellini et al., 2006). The continuous scanning laser Doppler vibrometer (CSLDV) sweeps the laser beam across the surface continuously to obtain full-field vibration data. However, this instrument is quite expensive. A representative publication dealing with the application of laser Doppler vibrometers is the work of Nassif et al. (2005).
Terrestrial Laser Scanner
Terrestrial laser scanners (TLS) can produce a 3D model of a scanned object using two different working principles: phase-shift and pulsed time-of-flight technologies (Lemmens, 2011). Time-of-flight scanners measure the travel time of a laser pulse emitted onto an object and reflected back; the distance is calculated as half the product of the travel time and the speed of light. Phase-shift scanners continuously emit a laser beam whose amplitude is modulated sinusoidally onto an object and receive the reflected signal. The phase shift is measured by comparing the phase of the incoming signal with that of the outgoing light, and the travel time of the laser beam is determined by dividing the phase shift (in radians) by the product of 2π and the modulation frequency. Phase-shift laser scanners can acquire dense data at higher speed but have a shorter measurement range; conversely, time-of-flight laser scanners have a longer range but a lower acquisition rate. A terrestrial laser scanner can scan still objects and generate precise 3D points (called point clouds), but due to hardware limitations, conventional terrestrial laser scanners are not applicable to dynamic measurements (Petrie and Toth, 2009). Recently, Kim and Kim (2015) used a specific time-of-flight terrestrial laser scanner model (RIEGL VZ-400) to measure the 2D dynamic displacement of a cantilever beam in line-scan mode (repetitively moving the laser beam along a line); however, only a few commercial models support this function. Overall, using terrestrial laser scanners to measure dynamic deformation fields is impractical because of their hardware limitations and exorbitant cost.
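The two ranging principles described above can be summarized compactly, with c the speed of light, t the round-trip travel time, Δφ the measured phase shift (in radians), and f_mod the modulation frequency:

```latex
% Pulsed time-of-flight ranging (round trip, hence the factor 1/2)
d = \frac{c\,t}{2}
% Phase-shift ranging: recover the travel time from the phase shift, then range as above
t = \frac{\Delta\phi}{2\pi f_{\mathrm{mod}}}
\quad\Longrightarrow\quad
d = \frac{c\,\Delta\phi}{4\pi f_{\mathrm{mod}}}
```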
Digital Camera
An alternative optical method is to use digital cameras equipped with CCD (charge-coupled device) or CMOS (complementary metal-oxide-semiconductor) image sensors to acquire time-sequenced images of structural vibrations. For images taken by digital cameras, digital image processing plays an important role in performing offline or online image analysis such as object detection and tracking. In-plane (2D) motion can be determined using a single fixed camera to track targets or regions of interest (ROI), such as high-contrast patterns or light-emitting diodes (LEDs), on the planar surfaces of a structure. The pixel coordinates of the tracked targets are then converted into 2D metric coordinates with respect to a specified coordinate system according to a calibration scale, which defines the relationship between a predefined distance and its pixel size in the digital image. Many creative approaches based on digital image processing techniques can be found in the works of Stephen et al. (1993); Olaszek (1999); Wahbeh et al. (2003); Lee and Shinozuka (2006); Choi et al. (2011); Chen et al. (2015); Feng and Feng (2016); Rajaram et al. (2017). Digital image correlation (DIC) is a specific image processing technique that uses a cross-correlation approach to determine 2D/3D full-field displacement by analyzing a series of images, taken by a conventional digital camera, containing the deformation of natural or artificial random patterns on a structure (Yoneyama et al., 2007; Pan et al., 2009; Helfrick et al., 2011). For DIC methods, the accuracy of the displacement estimation is mainly affected by the quality of the random pattern (Lecompte et al., 2006; Crammond et al., 2013).
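As a minimal illustrative sketch of the calibration-scale conversion described above (the reference length, pixel counts, and tracked coordinates are assumed values, not data from this study):

```python
import numpy as np

# Calibration scale: a target of known physical length spans a known number of pixels.
known_length_mm = 100.0          # e.g., a 100 mm reference bar visible in the scene (assumed)
known_length_px = 250.0          # its measured length in the image (assumed)
scale_mm_per_px = known_length_mm / known_length_px

# Tracked pixel coordinates of one target over time, (u, v) per frame (assumed input).
track_px = np.array([[320.0, 240.0],
                     [322.5, 240.1],
                     [325.1, 239.8]])

# In-plane displacement time history relative to the first frame, in millimetres.
disp_mm = (track_px - track_px[0]) * scale_mm_per_px
print(disp_mm)
```

This simple scale factor is only valid for in-plane motion of a planar surface roughly parallel to the image plane, which is why the full camera model of Chapter 2 is needed for 3D measurements.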
Beyond the single fixed-camera platform, stereo photogrammetry (or videogrammetry) can establish 3D geometric measurements from images taken by a hand-held computer vision system composed of one or more digital cameras. Two photogrammetric techniques can be used to reconstruct 3D models from 2D images. The first, known as structure from motion, can determine quasi-static 3D displacement (i.e., very slow displacement) by processing a series of images captured from different positions and orientations by a single camera (Lucieer et al., 2014). The second utilizes a stereo vision system composed of two or more synchronized cameras to take time-sequenced images of structural vibrations and then generate time histories of dynamic 3D displacement fields according to epipolar geometry (Chang and Ji, 2007). Using conventional digital image sensors is an economical way to perform non-contact deformation measurement; however, the illumination of the measured structure may create issues; for example, a shadow on the structure can confuse image processing software (Finlayson et al., 2002).
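For context, depth recovery from a matched pixel pair in a rectified two-camera rig reduces to the standard triangulation relation, with focal length f, camera baseline b, and disparity D between the matched pixels:

```latex
Z = \frac{f\,b}{D}
```

The same relation underlies the structured-light depth sensor discussed next, and its sensitivity to disparity errors is why range accuracy degrades with distance (see Section 2.2).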
Active 3D Sensor
Unlike stereo photogrammetry, which employs CCD or CMOS image sensors to acquire scenes illuminated by ambient light, active 3D sensing technology requires an additional light emitter (e.g., infrared, laser, or LED) to project a light pattern onto the measured objects and a camera to receive the reflected light patterns (Pears et al., 2012). There are two major active 3D sensing approaches: time-of-flight and structured light. A time-of-flight 3D sensor determines the range value for each point in an image by estimating the flight time of the emitted light; the distance is computed by multiplying the flight time by the known speed of light (and halving the result for the round trip). A structured-light 3D sensor projects a formatted light pattern onto a 3D object and generates a 3D model of the measured object by observing the distortion of the projected pattern in the image (Geng, 2011). In general, time-of-flight sensors have a higher frame rate but lower spatial resolution, whereas structured-light sensors have higher spatial resolution but a lower frame rate.
Recently, the rapid development of low-cost, off-the-shelf, active 3D sensors has attracted researchers' attention to potential applications of this type of 3D sensing device in various science and engineering fields, even though some of these depth sensors were originally designed as natural user interfaces (NUI) for entertainment purposes. In 2010, Microsoft released the first-generation Kinect, which uses a patented technology (similar to structured light) and can acquire 640×480 depth images at 30 frames per second (fps). In 2013, the second-generation Kinect was launched; it uses time-of-flight technology with a depth resolution of 512×424, also at 30 fps. In addition to their depth-sensing capability, both the first and second versions of the Kinect sensor are equipped with a regular color camera; therefore, they are also known as RGB-D cameras. Although both sensors have a relatively low price (currently under $200), they provide reasonable performance and depth accuracy. Because of their 3D measurement features, the inexpensive Kinect sensors have been used for a wide range of applications, including gesture recognition (Ren et al., 2013), augmented reality (Benko et al., 2012), human pose estimation (Shotton et al., 2013), 3D scanning and printing (Izadi et al., 2011), and robotic navigation (Oliver et al., 2012).
For full-field dynamic deformation measurements, active 3D sensors (also known as depth sensors) have several advantages over other sensors, including: 1) a non-contact tracking technique; 2) rapid 3D measurement; 3) high spatial resolution (versus discrete points); 4) affordable price (currently, hundreds to thousands of US dollars); 5) light weight and compact size; and 6) an easy-to-use system. These features make depth sensors a very promising technology for tracking and quantifying evolving deformation fields.
1.4 Contribution
Several studies have been conducted to analyze depth accuracy by capturing stationary objects using various 3D sensors (Khoshelham and Elberink, 2012; Rafibakhsh et al., 2012; Haggag et al., 2013; Smisek et al., 2013); however, there is a lack of in-depth quantitative assessment of 3D sensors for dynamic 3D displacement measurements. This dissertation presents the procedures and results of a comprehensive experimental investigation of the performance of a selected inexpensive active 3D sensor, the first-generation Microsoft Kinect RGB-D camera, for acquiring and quantifying dynamic 3D deformation fields. This dissertation also describes the development of a proof-of-concept system using multiple Kinect sensors to collect 3D road surface data in traffic flow. The comprehensive investigation, involving sensor-level and system-level experiments, is divided into four stages: 1) calibration; 2) laboratory experiments; 3) field experiments; and 4) proof-of-concept experiments. The structure of the research process is shown in Figure 1.2. In the calibration phase, the characteristics of the error sources, including ambient light, temperature drift, material properties, imaging geometry, systematic errors, and multiple-sensor interference, were studied first. Next, mathematical models of the depth error and the depth measurement resolution were derived. Lastly, the Kinect calibration procedures were performed: depth-disparity calibration was employed to convert raw disparity to the metric depth measurement, and camera calibration was required to align the color and depth images generated by the color and depth cameras and to reconstruct 3D scenes from the depth images. In the laboratory experiment stage, the Kinect sensor was used to quantify the 3D dynamic displacement of a rigid plate mounted on a shaking table. Time histories of harmonic oscillations and random vibrations were acquired by the Kinect sensor and compared with a linear position sensor (e.g., an LVDT) to verify the measurement accuracy. After the indoor tests, this research conducted a field experiment with a vehicle-mounted Kinect sensor to scan 3D objects on the road surface at speeds of up to 35 mph. Through the outdoor tests, three problems related to hardware limitations were discovered: sunlight interference, rolling shutter distortion, and motion blur. The corresponding solutions were also proposed in the field experiment stage. Finally, a proof-of-concept system was built with multiple inexpensive sensors (Kinects, accelerometers, and a GPS) for automated pavement distress data collection. The estimated cost of the hardware components of the prototype is about $5,900. In this experimental phase, a time-synchronization scheme was proposed to fuse heterogeneous data from multiple sensors. Several field tests were performed on local streets and a state highway in the Los Angeles metropolitan area. These field tests demonstrated the capability of the proof-of-concept system to detect and quantify large pavement distresses (e.g., potholes) in the 3D point clouds generated from the collected color and depth images.
Figure 1.2: The comprehensive experimental investigation at a glance.
1.5 Scope
Following the structure of the research process, this dissertation is organized as follows. Chapter 2 introduces the error sources and error models related to the depth accuracy of a Kinect sensor, and addresses the calibration methods used to improve the depth accuracy, to align the color and depth images, and to reconstruct 3D scenes from depth data. Chapter 3 describes the experimental design for 3D dynamic displacement quantification, using a Kinect sensor to measure harmonic and random vibrations of a rigid plate in an indoor environment. The approaches employed to process the acquired depth data and compute displacement time histories are also presented in that chapter, which concludes by summarizing the experimental results and providing guidelines for the next test phase based on the data analysis. Chapter 4 discusses the issues (sunlight interference, motion blur, and rolling shutter distortion) observed when utilizing a Kinect sensor in outdoor and dynamic environments, and proposes the related solutions. Chapter 5 presents the development of a cost-effective, vision-based, multi-sensor system using commercially available off-the-shelf devices (Kinects, accelerometers, and a GPS) for automated data collection of road surface conditions. The techniques used to integrate multiple sensors and synchronize heterogeneous data are explained in that chapter, at the end of which the results of defect detection and quantification are presented. The conclusion is provided in Chapter 6.
Chapter 2
Using the RGB-D Camera: Depth Accuracy,
Calibration, and 3D Reconstruction
2.1 Overview
In the initial phase of this research, several commercially available depth sensors were evaluated, including the first-generation Kinect v1 (Figure 2.1a), Asus Xtion Pro Live (Figure 2.1b), SoftKinetic DS325 (Figure 2.1c), Mesa SwissRanger SR4000 (Figure 2.1e), and the second-generation Kinect v2 (Figure 2.1d). A comparison of the specifications of some of these sensors is listed in Table 2.1. The technology behind the Kinect v1 and the Asus Xtion Pro Live is essentially identical; the differences lie in the components employed. The Kinect v1 sensor has a tri-axis accelerometer, a microphone array, and a tilt motor, and it requires an extra 12-volt power supply, while the Asus Xtion Pro uses 5-volt USB power only. Compared to the Kinect v1 sensor, the SoftKinetic DS325 and the SwissRanger SR4000 have lower depth resolutions but cost more. The Kinect v2 sensor has higher minimum system requirements to work correctly (Microsoft, 2017). The Kinect v1 can run on different platforms (Windows, Linux, Mac, etc.) using various development software packages (Microsoft SDK, OpenNI, OpenKinect, etc.). Moreover, multiple Kinect v1 sensors can be integrated on a single computer with common hardware configurations. Therefore, this research is focused on the use of the Kinect v1 sensor. The Kinect v1 sensor consists of a color CMOS sensor, an infrared CMOS sensor, and an infrared projector; it captures three-channel (RGB) image data and generates a depth map at 640×480 resolution and a frame rate of 30 fps using PrimeSense's light coding technology (Freedman et al., 2013). The Kinect v1 sensor has two versions: "Kinect for Xbox 360" and "Kinect for Windows". The major difference between the two versions is that the Kinect for Windows has a "near mode" that obtains valid depth values as close as 400 mm, while the minimum range of the Kinect for Xbox 360 is 800 mm. The Microsoft Kinect sensor used in this study is the Kinect for Xbox 360.
The Microsoft Kinect v1 sensor has been utilized by researchers to measure static or dynamic displacement in many different fields. In the field of robotics, Stowers et al. (2011) used a Kinect as a visual control system to estimate the altitude of an unmanned quadrotor helicopter; to obtain accurate altitude measurements, depth-disparity calibration (see Section 2.2.2) was implemented before depth acquisition. Fabian and Clayton (2014) installed a Kinect sensor on a wheeled mobile robot to serve as a visual odometer, and established an accurate error propagation model for a calibrated Kinect sensor to improve the performance of the visual odometry. In the field of healthcare, Stone and Skubic (2011) developed a marker-less vision-based system using calibrated Kinects to assess fall risk in the home by measuring a person's gait in detail; the gait parameters, including number of steps, walking speed, and left/right stride length, were estimated from accurate 3D point clouds obtained by multiple Kinects. Dutta (2012) and Rafibakhsh et al. (2012) evaluated the feasibility of using one or multiple Kinects to monitor the risk of injury in a large workplace (e.g., a construction site).
Table 2.1: Comparison of specifications for inexpensive depth sensors

Feature                 Microsoft Kinect v1    SoftKinetic DS325   SwissRanger SR4000   Microsoft Kinect v2
Depth:
  Technology            SL†                    TOF†                TOF                  TOF
  Range                 0.8 - 4.0 m            0.15 - 1.0 m        0.8 - 5.0 m          0.5 - 4.5 m
  FOV (H×V)             62.0° × 48.6°          74° × 58°           43° × 34°            70° × 60°
  Resolution            640 × 480‡             320 × 240           176 × 144            512 × 424
  Frame rate            30 fps                 25 - 60 fps         50 fps               30 fps
Color:
  FOV (H×V)             58.5° × 45.6°          63.2° × 49.3°       N/A                  84° × 54°
  Resolution            640 × 480              720p HD             N/A                  1080p HD
  Frame rate            30 fps                 Not specified       N/A                  30 fps
Dimensions (W×H×D cm)   27.94 × 7.62 × 7.62    10.5 × 3.1 × 2.7    6.5 × 6.5 × 6.8      24.9 × 6.6 × 7.7
PC Interface            USB 2.0                USB 2.0             USB 2.0              USB 3.0
Price                   $100                   $249                $4295                $150

† SL: structured light; TOF: time of flight.
‡ According to Kinect for Windows SDK 1.8 (DepthImageFormat Enumeration).
Figure 2.1: Evaluated depth sensors: (a) Microsoft Kinect v1, (b) Asus Xtion Pro Live, (c) SoftKinetic DS325, (d) Microsoft Kinect v2, and (e) Mesa SwissRanger SR4000 (Heptagon, 2016).
These studies focused on Kinect 3D measurement at longer distances (around 3 m or more). Rafibakhsh et al. (2012) also compared a Kinect sensor with a high-definition terrestrial laser scanner; even though the depth accuracy of the Kinect sensor is lower than that of the laser scanner (the average distance error between the two sensors was 3.49 cm), the Kinect sensor can scan a large scene much faster than the laser scanner. Alnowami et al. (2012) used a calibrated Kinect sensor to track respiratory motion and reported that the Kinect had millimeter-level measurement errors when the measuring range was between 85 and 115 cm. In the field of structural engineering, Qi et al. (2014) used a Kinect sensor to measure the vertical dynamic deflection of concrete beams.
Two calibration techniques may be applied to the Microsoft Kinect v1 sensor to improve the accuracy of 3D measurements before using the sensor: depth-disparity calibration (Section 2.2) and camera calibration (Section 2.3). Depth-disparity calibration establishes the relationship between a raw disparity value and a measured depth value. Different proposed disparity-depth relations can be found in the paper published by Khoshelham and Elberink (2012), ROS.org (http://wiki.ros.org/kinect_calibration/technical), and OpenKinect.org (http://openkinect.org/wiki/Imaging_Information). Camera calibration is a process to determine the intrinsic parameters (e.g., focal length, principal point, and lens distortions) and extrinsic parameters (e.g., rotation and translation) of a camera by solving the relationship between a set of known points in 3D space and their projections onto a set of 2D images. Zhang (2004) indicated that camera calibration techniques can be categorized into 3D-object-based calibration, 2D-plane-based calibration, 1D-line-based calibration, and self-calibration, according to the dimensions of the calibration target used. Currently, 2D-plane-based calibration is the most popular technique because of its easy implementation (Zhang, 2000). Based on the 2D-plane-based calibration technique, Bouguet (2015) provided a camera calibration toolbox for MATLAB to process calibration patterns and estimate the metric information of a camera.
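As an illustration of what a depth-disparity calibration produces, the sketch below uses the inverse-linear model form circulated in the OpenKinect/ROS community; the default coefficients are community-reported example values and should be treated as placeholders to be re-fitted for a specific unit (they are not values from this dissertation):

```python
import numpy as np

def disparity_to_depth_m(raw_disparity, a=-0.0030711016, b=3.3309495161):
    """Convert Kinect v1 raw 11-bit disparity to depth in meters.

    Uses the inverse-linear model z = 1 / (a*d + b). The default coefficients
    are example values reported in the OpenKinect community; a and b should be
    re-estimated per sensor from (disparity, reference depth) pairs.
    """
    d = np.asarray(raw_disparity, dtype=float)
    z = 1.0 / (a * d + b)
    # A raw value of 2047 flags an invalid (missing) measurement on the Kinect v1.
    return np.where((d >= 2047) | (z <= 0), np.nan, z)

def fit_disparity_model(raw_disparity, reference_depth_m):
    """Fit a and b by linear regression, since 1/z is linear in the raw disparity."""
    a, b = np.polyfit(np.asarray(raw_disparity, dtype=float),
                      1.0 / np.asarray(reference_depth_m, dtype=float), 1)
    return a, b
```

In practice the reference depths would come from targets placed at known distances, and the fitted coefficients replace the defaults above.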
2.2 Depth Accuracy and Calibration
A Kinect v1 sensor generates time-sequenced, pixel-wise depth data; hence, the noise involved in the depth measurement can be classified into spatial noise and temporal noise (Mallick et al., 2014). Spatial noise causes incorrect depth values within a single depth image (related to depth accuracy), and temporal noise results in unstable depth measurements across a sequence of depth images (related to depth precision). Besides incorrect and unstable depth, a Kinect sensor may fail to measure the depth at a point and assign a zero value to the corresponding pixel in the depth image, which is usually called a hole in the depth image. The error sources related to depth accuracy, depth precision, and zero depth, including environmental conditions, object materials, operation settings, hardware limitations, etc., are reviewed first in this section based on several corresponding publications. Then, this section introduces a geometric model that explains the depth-disparity calibration, the depth resolution, and the depth error related to the range measurement. With knowledge of these noise characteristics and the error model, the Kinect sensor can be utilized appropriately in this research.
2.2.1 Error Sources
Ambient light
A structured-light Kinect v1 sensor will generate invalid depth data (i.e., zero depth values) when it is operated outdoors under strong sunlight, because solar radiation contains near-infrared light at a wavelength of 830 nm, which is identical to the wavelength of the infrared emitter used by the Kinect sensor (Langmann, 2014). Sarbolandi et al. (2015) conducted an experiment to quantify the level of solar irradiance that can blind a Kinect sensor. That study used a Kinect sensor to take depth images of a diffuse white wall located 1.2 m away from the sensor and illuminated by a controlled lighting system with three 400 W halogen lamps, which simulated solar irradiance. The Kinect sensor failed to produce valid depth images when the irradiance was higher than about 1 W/cm². Lower solar irradiance may cause a Kinect sensor to generate some invalid depth data, but it barely affects the accuracy and precision of the valid depth measurements. Chow et al. (2012) tested Kinect depth acquisition under on-off indoor fluorescent illumination and reported that the accuracy and precision of the Kinect depth measurement are hardly affected by changes in common indoor lighting conditions.
Temperature drift
Fiedler and Müller (2013) discovered that changes in the surrounding temperature can slightly perturb the optical properties (i.e., intrinsic parameters) of the color and IR cameras of a Kinect v1 sensor. A significant correlation between temperature variations and depth precision was also found in that study: a temperature difference of 1 °C can lead to maximum deviations of 1.88 mm in a depth time history when measuring an object located 1.5 m in front of a Kinect sensor. Sarbolandi et al. (2015) reported that the Kinect depth measurement was unstable during the first 10 minutes after the sensor was turned on, but the uncertainty decreased after the sensor's internal temperature was stabilized by the thermal control components, e.g., the fan and the Peltier cooler (OpenKinect, 2011). Chow et al. (2012) suggested that a Kinect v1 sensor requires at least a 60-minute warm-up to produce stable depth data.
Material properties
The structured-light Kinect v1 sensor estimates depth values by reading the projected infrared speckle pattern on an object (Freedman et al., 2013). Reflection of infrared light from smooth and highly reflective surfaces may create overexposed spots in the infrared images and produce invalid depth data (holes) in the depth maps (Khoshelham and Elberink, 2012; Mallick et al., 2014). On the other hand, a weakly reflective material may reflect insufficient infrared light back to the IR camera and generate invalid depth data as well (Andersen et al., 2012; Dal Mutto et al., 2012). For a transparent object, the emitted infrared beams are not returned to the IR camera; hence, the object cannot be seen in a depth image (Mallick et al., 2014). For a translucent object, an experiment conducted by Sarbolandi et al. (2015) showed that a Kinect sensor is able to obtain hole-free depth images from a transparent plastic tube filled with diluted milk as long as the penetration of light is less than 74%.
Imaging geometry
Several studies (Menna et al., 2011; Khoshelham and Elberink, 2012; Nguyen et al., 2012; Smisek et al., 2013) have found that the depth accuracy is significantly affected by the measuring range between the object and the Kinect v1 sensor. All of these studies reported that the depth error rises quadratically as the measuring range increases. The scale of the depth error varies from several millimeters to centimeters over the nearest to the farthest operation range (by default, 800 mm to 4000 mm) of a Kinect sensor. In addition, if an object is located outside of the operation range, i.e., too near or too far, the Kinect sensor cannot recognize the object in the depth image. Chow et al. (2012); Nguyen et al. (2012); Gonzalez-Jorge et al. (2013) examined whether the incident angle between the optical axis of a Kinect sensor and the normal vector of an object plane causes depth error. These studies reported that the depth accuracy is hardly affected by the incident angle, but if the angle is close to 90° (i.e., the optical axis of the Kinect sensor is parallel to the object plane), invalid depth data will be obtained from the plane.
Systematic errors
Inaccurate estimation of the depth values from the disparity measurements can result in systematic errors (Khoshelham and Elberink, 2012). The disparity is represented by an 11-bit integer containing the raw offset between the reference and the observed pattern in the IR image (see Figure 2.2), which needs to be converted into a metric depth measurement. Different depth-disparity calibration models have been proposed by Burrus (2010); Magnenat (2010); Park et al. (2012). Kinect software packages such as the Microsoft SDK and OpenNI also provide pre-calibrated depth data for easy use in practice. An accuracy evaluation of these software packages can be found in the paper published by Smisek et al. (2013). It is worth mentioning that the operation range provided by the Microsoft SDK is 800 mm to 4000 mm, whereas OpenNI can deliver depth from 500 mm to more than 9000 mm (Andersen et al., 2012). Other systematic errors such as inadequate disparity-depth calibration, lens distortion of the IR camera, and depth quantization may also affect the spatial depth accuracy (Andersen et al., 2012; Khoshelham and Elberink, 2012; Smisek et al., 2013; Mallick et al., 2014; Choo et al., 2014; Sarbolandi et al., 2015).
Multiple sensors
When multiple Kinect v1 sensors are pointed at the same object simultaneously, a depth camera can be confused by the overlapping infrared speckle patterns projected by the other Kinects' infrared emitters and produce invalid depth data in the overlapping regions (Kramer et al., 2012; Maimone and Fuchs, 2012). However, infrared interference from multiple Kinect sensors has no significant influence on depth accuracy and depth precision (Martín et al., 2014; Sarbolandi et al., 2015; Andersen et al., 2012).
2.2.2 Depth Measurement Model
Khoshelham and Elberink (2012) proposed a geometrical model, illustrated in Figure 2.2, to explain the relation between the depth Z_k and the raw disparity d_k, and to model the depth resolution and the depth accuracy. In Figure 2.2, the depth value Z_k is the distance between the object plane and the image plane of the IR camera. The object plane is an imaginary plane that is parallel to the image plane and passes through the speckle dot k on the object surface, which is projected by the laser beam pk. The depth value Z_r is a predefined parameter representing the distance between two parallel planes, the image plane and the reference plane. The speckle dots on the reference plane (i.e., the point r projected by the laser beam pk) are captured by the IR camera beforehand and stored in the firmware of the Kinect sensor (i.e., the point m' in the image plane). The raw disparity d_k in the image plane is the shift between the two intersection points m and m' at which the two reflected beams, qk from the object plane and qr from the reference plane, pass through the image plane.
Figure 2.2: The geometrical model of depth measurement.
Depth-disparity relation
According to the proportions of the similar triangles in Figure 2.2, the ratio of the side lengths of the triangles △qkk' and △qmm' is:

D / d_k = Z_k / f    (2.1)

and for the triangles △rpq and △rkk', the ratio is:

D / b = (Z_r − Z_k) / Z_r    (2.2)

where f is the focal length of the IR camera and b is the baseline (distance) between the IR camera and the laser projector. Substituting D from Equation (2.1) into Equation (2.2) gives the relation between the depth value Z_k and the corresponding raw disparity d_k:

Z_k = Z_r / (1 + (Z_r / (f b)) d_k)    (2.3)

In Equation (2.3), the constant parameters f, b, and Z_r can be determined through calibration procedures.

The Kinect v1 sensor normalizes the raw disparity d_k (in pixels) into an 11-bit integer with a range between 0 and 2047, called the normalized disparity d_n, which can be represented as:

d_n = (d_k − d_{k,min}) / (d_{k,max} − d_{k,min})    (2.4)

In Equation (2.3), the raw disparity d_k can therefore be replaced by:

d_k = m d_n + n    (2.5)

where m = d_{k,max} − d_{k,min} and n = d_{k,min}. Therefore, the depth value Z_k can be rewritten as:

Z_k = 1 / ( (m / (f b)) d_n + (Z_r^{-1} + n / (f b)) )    (2.6)

or in its inverse form:

Z_k^{-1} = (m / (f b)) d_n + (Z_r^{-1} + n / (f b))    (2.7)

which shows that the inverse depth Z_k^{-1} and its corresponding normalized disparity d_n have a linear relationship. According to the depth-disparity calibration result reported by Khoshelham and Elberink (2012), the depth-disparity relation in Equation (2.7) can be determined as the following formula:

Z_k^{-1} = (−2.8571 × 10⁻⁵) d_n + 0.0311    (2.8)
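The linear inverse-depth model of Equation (2.8) is straightforward to apply in code. The following minimal MATLAB sketch converts an array of normalized disparity values into metric depth; the two constants are the example calibration values quoted above (they are not universal Kinect constants and would need to be re-estimated for a particular sensor).

    % Convert Kinect v1 normalized disparity to depth using Eq. (2.7)-(2.8).
    % A and B are the example calibration constants quoted in the text
    % (Khoshelham and Elberink, 2012); a different sensor needs its own fit.
    A = -2.8571e-5;               % slope  m/(f*b)
    B =  0.0311;                  % offset Z_r^-1 + n/(f*b)

    d_n = [400 600 800 1000];     % sample normalized disparity values (0-2047)
    Z_k = 1 ./ (A .* d_n + B);    % depth, Eq. (2.6), in the calibration units (cm)

    disp([d_n(:) Z_k(:)])         % list disparity versus estimated depth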
Depth resolution
The measurement resolution specifies the capability of an instrument to distinguish and indicate the smallest change in the measured quantity (AIAG, 2002; Morris and Langari, 2012). The depth resolution, also known as the quantization step size (Smisek et al., 2013), can be defined as the minimum difference between two depth values produced by two successive levels of the disparity:

Δz = Z_k(d_n) − Z_k(d_n − 1)    (2.9)

The minimum depth difference can be approximated by taking the derivative of Z_k with respect to d_n (Mallick et al., 2014):

Δz ≈ ∂Z_k/∂d_n = −(m / (f b)) Z_k²    (2.10)

It should be mentioned that the result in the original paper by Khoshelham and Elberink (2012) is Δz = (m / (f b)) Z_k², without a minus sign. However, to avoid confusion, this thesis consistently uses the negative constant factor (m / (f b)) = −2.8571 × 10⁻⁵ from the calibration result mentioned previously, which yields a positive Δz = (2.8571 × 10⁻⁵) Z_k² from Equation (2.10) and shows the quadratic relation between the depth resolution and the depth measurement (in centimeters).
Depth error
If the normalized disparity contains random noise with a normal distribution, the uncertainty of the disparity propagates to the depth data as represented in the following equation:

σ_Z² = (∂Z_k/∂d_n)² σ_d²    (2.11)

where σ_d and σ_Z are the standard deviation of the normalized disparity and the standard deviation of the corresponding depth estimate, respectively. Then, the standard deviation of the depth can be written in the following form:

σ_Z = (m / (f b)) Z_k² σ_d    (2.12)

Similar to the depth resolution, when the distance between the Kinect v1 sensor and the measured structure increases, the depth error rises quadratically. Khoshelham and Elberink (2012) suggest σ_d = 1/2 according to their experimental results. Therefore, for a measuring range of 100 cm, the depth error is about 1.4 mm.

Figure 2.3: The curves show the Kinect depth measurement versus the depth resolution (blue) and the depth error (red) (Khoshelham and Elberink, 2012).
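As a quick numerical check of these formulas, the following MATLAB sketch evaluates Equations (2.10) and (2.12) over the working range, using the example calibration constant above and σ_d = 1/2; at 1 m it reproduces the roughly 3 mm resolution and 1.4 mm random error quoted in this section. The specific range vector and plotting choices are illustrative only.

    % Quantization step (Eq. 2.10) and random depth error (Eq. 2.12) vs. range,
    % using the example calibration constant and sigma_d = 1/2.
    c       = 2.8571e-5;                 % |m/(f*b)| from the example calibration
    sigma_d = 0.5;                       % disparity noise, per Khoshelham and Elberink (2012)

    Z_cm    = 50:10:400;                 % measuring range in centimeters
    dz_cm   = c .* Z_cm.^2;              % depth resolution, Eq. (2.10)
    sigZ_cm = c .* Z_cm.^2 * sigma_d;    % depth standard deviation, Eq. (2.12)

    % At 100 cm this gives dz ~ 2.9 mm and sigma_Z ~ 1.4 mm.
    plot(Z_cm, 10*dz_cm, 'b', Z_cm, 10*sigZ_cm, 'r')   % convert cm to mm for display
    xlabel('Range (cm)'), ylabel('mm')
    legend('Resolution \Deltaz', 'Error \sigma_Z')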
2.3 Camera Calibration
A Kinect v1 sensor consists of a color camera and a depth sensor. The depth sensor uses an IR camera to capture images of the speckle patterns projected by the IR emitter. The camera calibration procedure is required to estimate the intrinsic parameters of the color and IR cameras, and the extrinsic parameters that represent the transformation between the two camera coordinate systems. The intrinsic and extrinsic parameters can be used for 3D reconstruction and color-depth registration when utilizing a Kinect sensor to quantify evolving deformation fields. In this study, to determine these parameters, several images of a chessboard were captured by the color and IR cameras simultaneously, from different orientations. The IR emitter should be covered to prevent the speckle pattern from being projected onto the chessboard, where it would be captured by the IR camera and appear in the IR images. A series of chessboard pictures taken by the color camera and the IR camera is shown in Figure 2.4a and Figure 2.4b.

Figure 2.4: The camera calibration target taken from different camera positions/orientations: (a) calibration images for the RGB camera and (b) calibration images for the IR camera.
2.3.1 Intrinsic Calibration
The intrinsic calibration is utilized to estimate the optical properties of a camera, such as the focal length, the principal point, and the coefficients of lens distortion (Eitzinger et al., 2015). This research used the Camera Calibration Toolbox for MATLAB (Bouguet, 2015) to perform the intrinsic and extrinsic calibration procedures. For the intrinsic calibration procedure, the Camera Calibration Toolbox read a series of chessboard images taken by a camera in different orientations (Figures 2.4a and 2.4b as examples). Then the program predicted the grid corners of a chessboard image after the four edges of the chessboard pattern were selected by the user to define the boundary of the chessboard image and to give the size of each grid square (see Figures 2.5a and 2.5b). If there is no considerable lens distortion, the predicted grid corners will be close to the real corners shown in the chessboard image. Hence, the coordinates of the predicted grid corners can be extracted as a set of known inputs for the program. However, if there are obvious differences between the predicted grid corners and the real corners in the image, which are caused by lens distortion, an additional procedure is required to estimate distortion coefficients that minimize the differences. Because both the color and IR cameras of a Kinect sensor are equipped with low-distortion lenses, estimation of the distortion coefficients is usually not necessary. After the corner extraction procedure was completed for all the input chessboard images, the Camera Calibration Toolbox used a nonlinear optimization algorithm to compute the intrinsic parameters. An example of the intrinsic parameters for a Kinect v1 sensor is shown in Table 2.2.
Figure 2.5: Camera calibration procedure: (a) corner prediction, and (b) corner extraction.
2.3.2 Extrinsic Calibration
The extrinsic calibration, also known as stereo calibration, is generally performed to obtain the translation and rotation matrices between two (or multiple) cameras of a stereo camera system. Although a Kinect sensor does not adopt the dual-camera technique to produce a 3D scene, the extrinsic calibration was utilized to align a pair of unregistered color and depth images taken by the color camera and the depth sensor, which are configured side by side and have different fields of view. Before performing the extrinsic calibration for the two cameras, the intrinsic calibration procedure for each camera should be accomplished first. The Camera Calibration Toolbox took the two sets of intrinsic parameters to estimate the extrinsic parameters, i.e., the rotation matrix and the translation matrix (see Table 2.2). The estimated spatial relation between the color camera and the IR camera, and the different orientations of the chessboard, are shown in Figure 2.6.
Figure 2.6: The 3D plot of the spatial configuration for the stereo calibration.
Table 2.2: An example of the intrinsic and extrinsic parameters for a Kinect v1 sensor

                                       RGB Camera    IR Camera
  Focal length             f_x         520.44        586.48
                           f_y         520.27        586.06
  Principal point          c_x         325.84        319.80
                           c_y         267.23        245.38
  Distortion coefficients  k_1         0.2185        -0.1334
                           k_2         -0.4600       0.4458
                           k_3         0.0026        0.0029
                           k_4         0.0032        0.0004
                           k_5         0             0

  Rotation matrix R:       [1       0.0046  0.0044;
                            0.0046  1       0.0034;
                            0.0044  0.0033  1     ]
  Translation matrix T:    [252.2644  3.8256  10.0831]

The mathematical symbols of the intrinsic and extrinsic parameters are defined in the Camera Calibration Toolbox for MATLAB (Bouguet, 2015).
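To make later use of these parameters concrete, the following minimal MATLAB sketch assembles the two camera matrices of Equation (2.17) from the example focal lengths and principal points in Table 2.2 (zero skew). These numbers belong to the particular unit calibrated in this study and are not general Kinect constants; a different sensor must be calibrated individually.

    % Camera matrices built from the example intrinsic parameters in Table 2.2.
    K_rgb = [520.44   0      325.84;
               0    520.27   267.23;
               0      0        1   ];

    K_ir  = [586.48   0      319.80;
               0    586.06   245.38;
               0      0        1   ];
    % These matrices are used in the projection of Eq. (2.17) and in the
    % registration and 3D reconstruction steps of Section 2.4.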
2.3.3 IR-Depth Offset
There is an offset between the IR camera and the IR projector, so an IR image and its depth image are slightly misaligned (Smisek et al., 2013). In this research, the shift was estimated by comparing the pixel coordinates of the same point (e.g., the center of a circular disk) shown in the IR image (Figure 2.7a) and in the depth image (Figure 2.7b). The pixel coordinates of the disk center in the IR image and in the depth image were determined with the MATLAB function imfindcircles. The edges and centers of the two detected circles are plotted in Figure 2.8. The pixel coordinates of the disk center were (308.1, 291.9) in the IR image and (304.5, 289.1) in the depth image. Hence, the offset between the IR image and the depth image was about 4 pixels in the horizontal direction and about 3 pixels in the vertical direction. This result is similar to the estimate reported by Smisek et al. (2013).
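The disk-center comparison can be scripted as in the minimal MATLAB sketch below. The file names, the radius range passed to imfindcircles, and the assumption that both frames were saved as grayscale images with a well-contrasted disk are placeholders for illustration; they are not part of the original processing code.

    % Estimate the IR-to-depth pixel offset from one saved IR frame and one
    % saved depth frame containing the same circular disk (placeholder files).
    ir    = imread('ir.png');
    depth = imread('depth.png');

    radiusRange = [20 80];                      % expected disk radius in pixels (assumption)
    cIR = imfindcircles(ir,    radiusRange);    % [x y] disk center in the IR image
    cD  = imfindcircles(depth, radiusRange);    % [x y] disk center in the depth image

    offset = cIR(1,:) - cD(1,:);                % (u_o, v_o); about (4, 3) pixels in this study
    fprintf('IR-depth offset: %.1f px horizontal, %.1f px vertical\n', ...
            offset(1), offset(2));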
2.4 3D Reconstruction and RGB-D Registration
This section presents a 3D reconstruction approach to convert the acquired depth image
into a 3D scene based on the projective geometry of a pinhole camera model. This
approach also includes the color-depth registration procedure by transferring the 3D scene
from the coordinate system of the IR camera to the coordinate system of the color camera
and then re-projecting the 3D scene back to the pixel domain of the color camera. The
intrinsic and extrinsic camera parameters introduced in Section 2.3 are required for the
computation of 3D coordinates and the coordinate transformation.
Figure 2.7: Detecting the circular disk and estimating its center in the IR and depth
images.
Figure 2.8: Shift between two detected circles in the IR and depth images.
2.4.1 Pinhole Camera Model
A camera is an optical instrument that projects a 3D scene onto a 2D image plane. The mathematical representation of the projection from 3D coordinates to 2D coordinates can be illustrated simply by the pinhole camera model (Cyganek and Siebert, 2011). The pinhole camera model imitates the imaging principle of an ideal pinhole camera, which reduces the camera aperture to a point and neglects the characteristics of the lens. Figure 2.9 shows the geometry of the pinhole camera model. Two important components, the focal length f and the principal point O_f, are intrinsic properties of a camera. The focal length is the distance between the camera center (i.e., the pinhole) and the image plane. The principal point is the point on the image plane at which the line passing through the camera center (the principal axis) intersects the image plane perpendicularly.
Figure 2.9: The geometry of the pinhole camera model.
There are four coordinate systems shown in the figure of the pinhole camera model: the world coordinate system (X_w, Y_w, Z_w), the camera coordinate system (X_c, Y_c, Z_c), the image coordinate system (X_f, Y_f), and the pixel coordinate system (u_i, v_i) (Wöhler, 2012). First, considering the projection of a 3D point P(x_c, y_c, z_c) in the camera coordinate system onto a 2D point p(x_f, y_f) in the image coordinate system, the relation between the 3D camera coordinates and the 2D image coordinates can be formulated by the following equations, according to the scale factor of the similar triangles shown in Figure 2.9:

x_f = f x_c / z_c    (2.13a)
y_f = f y_c / z_c    (2.13b)
Modern cameras are built around digital image sensors. Hence, the image coordinate system with metric units (e.g., millimeters) should be converted into a pixel-based coordinate system (Ma et al., 2012). The transformation from the image coordinate system to the pixel coordinate system is illustrated in Figure 2.10. Let (x_s, y_s) be the coordinates of a point p with respect to the coordinate system (X_s, Y_s), mapped from the metric coordinates (x_f, y_f) in the image plane. The metric-to-pixel conversion can be represented by the following equations:

x_s = s_u x_f    (2.14a)
y_s = s_v y_f    (2.14b)

where s_u and s_v are the scale factors related to the column-wise and row-wise pixel density (e.g., pixels per millimeter or pixels per inch).
Figure 2.10: Transformation from the image plane to the pixel coordinate system.
In a conventional pixel coordinate system (u_i, v_i), the origin is usually located in the top-left corner. The pixel coordinates (u, v) of the point p in the pixel coordinate system can be expressed as:

u = x_s + c_x    (2.15a)
v = y_s + c_y    (2.15b)

where (c_x, c_y) is the principal point in the pixel coordinate system. Then, by combining Equation (2.14) and Equation (2.15), the mathematical representation of the projection from the 3D camera coordinate system to the 2D pixel coordinate system can be written as:

u = f_x (x_c / z_c) + c_x    (2.16a)
v = f_y (y_c / z_c) + c_y    (2.16b)

where f_x = s_u f and f_y = s_v f are the focal lengths in pixel units. The matrix form of Equation (2.16) can be expressed in the homogeneous representation as:

s [u; v; 1] = K [x_c; y_c; z_c]    (2.17)

where s is a scale factor and

K = [f_x  0    c_x;
     0    f_y  c_y;
     0    0    1  ]

is called the camera matrix. For a more general form of the camera matrix K, the skew coefficient γ due to non-rectangular pixels can be added to the matrix:

K = [f_x  γ    c_x;
     0    f_y  c_y;
     0    0    1  ]    (2.18)
If significant lens distortion is observed, especially for a wide-angle lens, the distorted coordinates can be modeled by the following equation (Kaehler and Bradski, 2016):

[x_d; y_d] = (1 + k_1 r² + k_2 r⁴ + k_3 r⁶) [x'_c; y'_c]                      (radial distortion)
             + [2 k_4 x'_c y'_c + k_5 (r² + 2 x'_c²);
                k_4 (r² + 2 y'_c²) + 2 k_5 x'_c y'_c]                          (tangential distortion)    (2.19)

where k_1, k_2, k_3 are the radial distortion coefficients, k_4, k_5 are the tangential distortion coefficients, x'_c = x_c / z_c and y'_c = y_c / z_c are the undistorted normalized coordinates, and r² = x'_c² + y'_c². After employing the lens distortion model, Equation (2.16) can be rewritten as:

u = f_x x_d + c_x    (2.20a)
v = f_y y_d + c_y    (2.20b)
Consider (x_w, y_w, z_w) to be the coordinates of a 3D point P in the world coordinate system. The transformation from the world coordinate system to the camera coordinate system is expressed by a 3-by-1 translation vector t and a 3-by-3 rotation matrix R. The homogeneous representation of the transformation between the world coordinate system and the camera coordinate system for the 3D point P can be written in the following matrix form:

[x_c; y_c; z_c] = [r_11  r_12  r_13  t_1;
                   r_21  r_22  r_23  t_2;
                   r_31  r_32  r_33  t_3] [x_w; y_w; z_w; 1]    (2.21)

where the r_ij are the entries of R and (t_1, t_2, t_3) are the entries of t.
Combining Equation (2.17) with the skew coefficient and Equation (2.21), the complete pinhole camera model used to project the point P from the 3D world coordinate system to the 2D pixel coordinate system can be described as:

s [u; v; 1] = K [R | t] [x_w; y_w; z_w; 1]    (2.22)

where the camera matrix K (Equation 2.18) contains the intrinsic parameters and [R | t] contains the extrinsic parameters, or, in compact form:

s p = K [R | t] P    (2.23)
Equation (2.22) contains the two important parameter sets of the camera model: the intrinsic and the extrinsic parameters. The intrinsic parameters describe the geometric properties of the camera, which include the focal lengths f_x and f_y, the principal point (c_x, c_y), and the skew coefficient γ. The lens distortion coefficients (k_1, k_2, k_3, k_4, k_5) can also be considered part of the intrinsic parameters. The extrinsic parameters define the position and orientation of the camera in the world coordinate system. The extrinsic parameters include the rotation matrix R and the translation vector t, which denote the transformation of the origin from the world coordinate system to the camera coordinate system. The intrinsic and extrinsic parameters of a camera can be determined through the camera calibration procedures introduced in Section 2.3.
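The forward projection of Equations (2.21)-(2.23) can be illustrated with the minimal MATLAB sketch below. The intrinsic matrix, rotation, translation, and test point are placeholder values chosen only to make the example self-contained; they do not come from the calibration of this study.

    % Forward pinhole projection: world point -> camera frame -> pixel (zero skew).
    K = [520 0 320; 0 520 267; 0 0 1];    % example intrinsics (pixels)
    R = eye(3);                           % example rotation (camera aligned with world)
    t = [0; 0; 1000];                     % example translation (mm): world origin 1 m ahead

    Pw = [50; -30; 0];                    % a 3D point in world coordinates (mm)
    Pc = R * Pw + t;                      % Eq. (2.21): world -> camera coordinates
    p  = K * Pc;                          % Eq. (2.17): camera -> homogeneous pixel coords
    uv = p(1:2) / p(3);                   % divide by the scale factor s = z_c

    fprintf('pixel coordinates: u = %.1f, v = %.1f\n', uv(1), uv(2));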
2.4.2 Depth to 3D Conversion
A depth map contains the pixel-wise distance data measured along the Z-axis to the IR camera of a Kinect sensor. It is convenient to reconstruct a 3D scene from a 2D depth map because the depth value z_c in Equation (2.16) is known. The formula for the 3D reconstruction can be derived by taking the inverse form of Equation (2.16) (Kaehler and Bradski, 2016):

x_IR = (u_IR − c_{x,IR}) z_IR / f_{x,IR}    (2.24a)
y_IR = (v_IR − c_{y,IR}) z_IR / f_{y,IR}    (2.24b)
z_IR = z_D                                   (2.24c)
where (x_IR, y_IR, z_IR) are the 3D coordinates of a point P with respect to the IR camera of a Kinect sensor. The parameters (c_{x,IR}, c_{y,IR}) are the principal point, and f_{x,IR}, f_{y,IR} are the focal lengths of the IR camera. Because there is an offset (u_o, v_o) between the depth map and the IR image (Section 2.3.3), in order to use the depth map to generate a 3D scene in the IR coordinate system, the depth image needs to be aligned to the IR image with

u_IR = u_D + u_o    (2.25a)
v_IR = v_D + v_o    (2.25b)

where (u_D, v_D) are the pixel coordinates of the point P in the depth image. After performing the IR-depth registration, the variable z_IR is equal to the known depth value z_D measured by the Kinect sensor.
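A minimal MATLAB sketch of Equations (2.24)-(2.25) is given below, assuming the depth map is stored in millimeters in an image file; the file name, the IR intrinsics, and the offset values are placeholders standing in for the quantities estimated in Sections 2.3.1 and 2.3.3.

    % Back-project a depth map into 3D points in the IR camera frame.
    depthMap = double(imread('depth16.png'));             % 480x640 depth image in mm (placeholder)
    fx = 586.5;  fy = 586.1;  cx = 319.8;  cy = 245.4;     % example IR intrinsics
    uo = 4;  vo = 3;                                       % example IR-depth offset (pixels)

    [uD, vD] = meshgrid(1:size(depthMap,2), 1:size(depthMap,1));
    uIR = uD + uo;                                         % Eq. (2.25a)
    vIR = vD + vo;                                         % Eq. (2.25b)

    Z = depthMap;                                          % Eq. (2.24c): z_IR = z_D
    X = (uIR - cx) .* Z ./ fx;                             % Eq. (2.24a)
    Y = (vIR - cy) .* Z ./ fy;                             % Eq. (2.24b)

    valid = Z > 0;                                         % drop the zero-depth "holes"
    pts   = [X(valid) Y(valid) Z(valid)];                  % N-by-3 point list in the IR frame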
2.4.3 Depth to Color Registration
The color camera and the IR camera are located side by side on a Kinect sensor (Figure 2.11); therefore, the field of view of the color camera is different from the field of view of the IR camera. This causes misalignment between a pair of depth and color images taken by the Kinect sensor (Burrus, 2010; Konolige and Mihelich, 2011). An example of a pair of unregistered color and depth images is shown in Figures 2.13a and 2.13b. Thus, if an object is identified in the color image, it is not possible to use the same pixel coordinates from the color image to obtain its depth measurements in the depth image.
Figure 2.12 shows the IR coordinate system (X_IR, Y_IR, Z_IR) and the color coordinate system (X_C, Y_C, Z_C). It should be mentioned that the IR coordinate system and the depth coordinate system become identical once the offset between the IR image and the depth image has been corrected as described in Section 2.3.3.

Figure 2.11: The color camera and the IR camera of a Kinect sensor.
Figure 2.12: 3D coordinate transformation between the IR and the color coordinate systems.
The depth to color image registration based on the coordinate transformation follows a three-step procedure: first, back-projecting the pixel coordinates (u_IR, v_IR) of a 2D point in the IR image plane to its corresponding 3D coordinates (x_IR, y_IR, z_IR) with respect to the IR camera using Equation (2.24); second, transferring the 3D coordinates (x_IR, y_IR, z_IR) from the IR coordinate system to the color coordinates (x_C, y_C, z_C) with the following formula:

[x_C; y_C; z_C] = R [x_IR; y_IR; z_IR] + t    (2.26)

where the rotation matrix R and the translation vector t represent the transformation from the IR coordinate system to the color coordinate system; and third, projecting the 3D coordinates (x_C, y_C, z_C) with respect to the color camera to the pixel coordinates (u_C, v_C) in the color image plane. The third step can be expressed with the following equations:

u_C = f_{x,C} (x_C / z_C) + c_{x,C}    (2.27a)
v_C = f_{y,C} (y_C / z_C) + c_{y,C}    (2.27b)

where f_{x,C}, f_{y,C}, c_{x,C}, c_{y,C} are the intrinsic parameters of the Kinect's color camera. Applying this color-depth registration approach to an unregistered depth image (Figure 2.13a) yields a pair of aligned color and depth images (Figures 2.13c and 2.13d).
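The three-step registration can be sketched in MATLAB as follows, continuing from the 'pts' list of the back-projection sketch above. The rotation, translation, and color intrinsics shown here are illustrative placeholders for the calibrated extrinsic and intrinsic parameters of Section 2.3, not the values used in this study.

    % Depth-to-color registration (Eqs. 2.24, 2.26, 2.27) for an N-by-3 point list.
    R  = eye(3);  t = [-25; 0; 0];                            % example IR-to-color transform (mm)
    fxC = 520.4;  fyC = 520.3;  cxC = 325.8;  cyC = 267.2;    % example color intrinsics

    ptsC = (R * pts.' + t).';                                 % Eq. (2.26): IR frame -> color frame
    uC   = fxC .* ptsC(:,1) ./ ptsC(:,3) + cxC;               % Eq. (2.27a)
    vC   = fyC .* ptsC(:,2) ./ ptsC(:,3) + cyC;               % Eq. (2.27b)

    % Round to pixel indices and keep only points that land inside the color image.
    uC = round(uC);  vC = round(vC);
    in = uC >= 1 & uC <= 640 & vC >= 1 & vC <= 480;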
Figure 2.13: The color-depth registration examples: (a) and (b) are the unregistered depth and color images; (c) and (d) are the registered depth and color images aligned with the camera calibration approach; (e) and (f) are the registered color and depth images produced by OpenNI.
The Kinect's software development packages, such as OpenNI and the Windows SDK, also provide programming functions that use the manufacturer's calibration data stored in the firmware to generate aligned color and depth images, with reasonable accuracy in the range of 1 to 1.5 m (Smisek et al., 2013; Xiang et al., 2015). An example of a pair of registered color and depth images produced by OpenNI is shown in Figures 2.13e and 2.13f.
2.4.4 3D Point Cloud
When the depth value Z_D of a pixel (u_D, v_D) is obtained from the depth image, the pixel coordinates can be converted into 3D coordinates (X_D, Y_D, Z_D) with respect to the IR camera using Equations (2.24a) to (2.24c). If a pair of color and depth images is already registered, the color information (R, G, B) of the pixel can also be added to the point. The collection of all colored 3D points is called the point cloud, which looks like a group of unconnected points floating in 3D space (Figure 2.14).

Figure 2.14: The 3D point clouds: (a) the raw point cloud, and (b) the colored point cloud, displayed using the CloudCompare software package (Girardeau-Montaut, 2017).
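A colored point cloud of this kind can be assembled in a few lines of MATLAB, as in the sketch below, which continues from the registration sketch above ('pts', 'uC', 'vC', 'in'); the color file name is a placeholder, and the display call assumes the Computer Vision System Toolbox is available (CloudCompare, as used in Figure 2.14, is an alternative viewer).

    % Attach (R,G,B) values from the registered color image to each 3D point.
    rgb  = imread('color.png');                        % 480x640x3 color image (placeholder)
    cols = zeros(size(pts,1), 3, 'uint8');
    idx  = sub2ind([480 640], vC(in), uC(in));         % linear indices of the color pixels
    for ch = 1:3
        plane = rgb(:,:,ch);
        cols(in,ch) = plane(idx);
    end

    % Display the colored point cloud (requires the Computer Vision System Toolbox).
    pc = pointCloud(pts(in,:), 'Color', cols(in,:));
    pcshow(pc)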
2.5 Summary
This chapter surveyed the characteristics of the noise sources that can interfere with the depth measurement of a Kinect v1 sensor, based on different published papers. The noise sources studied here include the ambient light, the temperature variation, the optical properties of the measured objects, the imaging geometry, the systematic errors, and the configuration of multiple Kinects. These noise sources may make a Kinect sensor generate inaccurate depth data or zero depth values, or cause the depth measurement to be unstable, as summarized in Table 2.3.
In general, inaccurate depth measurement is related to the measuring range and the systematic errors. The accuracy of the depth values can be improved by performing calibration procedures (e.g., depth-disparity calibration) to correct the systematic errors and/or by using the factory calibration data provided by the software development kits (e.g., Kinect SDK and OpenNI). According to the depth error model (Section 2.2.2), since this research focused on measuring short-range distances of about 1 m, the random error of the depth measurement for a calibrated Kinect is about 3 mm. Moreover, to avoid the issue of zero depth, this research protected the Kinect sensors from exposure to the corresponding noise sources; such approaches are presented in the following chapters. To acquire stable depth data in this research, the Kinect sensors started operating only after a 60-minute warm-up phase.
Table 2.3: Noise sources and their effects on the Kinect's depth data

  Noise source                  Inaccurate Depth   Unstable Depth   Zero Depth
  Strong sunlight                                                       x
  Temperature variation                                  x
  Material optical properties                                           x
  Measuring distance                   x                                x
  Measuring angle                                                       x
  Systematic errors                    x
  Multiple Kinects                                                      x

This chapter also introduced the method used to align a pair of color and depth images and to reconstruct the 3D point cloud from the aligned color and depth images. Figure 2.15 displays the step-by-step flow diagram of this approach, which includes: 1) the initial stage of acquiring the raw disparity from a Kinect sensor; 2) the depth-disparity
calibration to produce accurate depth data; 3) the offset correction to align the depth image with the IR image; 4) the 3D reconstruction approach to generate the 3D scene in the IR coordinate system; 5) the 3D coordinate transformation to transfer the 3D scene from the IR coordinate system to the RGB coordinate system; and 6) the 2D projection to map the 3D scene back to the 2D image. Through this method, a pair of color and depth images will have identical pixel coordinates, which is very useful for identifying an object in the color image and indexing its range measurement in the depth image.
The software packages for Kinect programming, such as OpenNI and the Microsoft Kinect SDK, also provide functions that use predefined calibration parameters in the firmware to generate aligned color and depth images with reasonable accuracy. In this research, OpenNI was used to convert the raw disparity to depth data and to align the color and depth images. A 2D-plane-based camera calibration technique was adopted to estimate the intrinsic parameters of the color camera, and then the intrinsic parameters and depth values were utilized to generate the 3D point clouds (see Figure 2.16). For short-range measurements, this approach is easy to use in practice.
Figure 2.15: The step-by-step procedure used to align the color and depth images and to reconstruct the 3D scene from the depth image.
Figure 2.16: 3D reconstruction using the OpenNI software package.
Chapter 3
Measuring 3D Dynamic Displacement Fields
Using an RGB-D Camera
3.1 Overview
This chapter reports on the laboratory study that was conducted to precisely quantify the performance envelope of a representative inexpensive RGB-D camera, the Microsoft Kinect v1 sensor. The methodology of using the off-the-shelf RGB-D camera to measure 3D dynamic displacement fields is proposed in this chapter. It is summarized as follows:
1. The color and depth videos containing the oscillating motions of a rigid plate mounted on a shaking table were captured by a calibrated Kinect sensor. The calibration of the sensor helps produce aligned color and depth images (see Section 2.4.3); therefore, if a target point is identified in the color image, its pixel coordinates from the color image can be used to index the depth value of the target point in the depth image.
2. The color video was employed to recognize the markers painted on the rigid plate and to track the pixel coordinates of these markers in the video. The depth measurements in the depth video, indexed by the color pixels, were utilized to calculate the 3D coordinates of the tracked markers (see Section 2.4.2). Combining the tracked 3D coordinates and the recorded timestamps generates the time history of the 3D dynamic displacements.
This approach can be divided into four steps: 1) camera calibration; 2) target detection; 3) target tracking; and 4) displacement calculation. Figure 3.1 illustrates the step-by-step flow diagram. In order to quantify the accuracy of estimating 3D dynamic displacements using the proposed approach, the displacements generated from the Kinect depth data were verified against a high-accuracy LVDT displacement sensor. The accuracy of the displacement time history was evaluated under conditions typically encountered in the vibration measurement of flexible structures undergoing complex dynamic response. The results of this experimental study also evaluate some key metrics used in the structural dynamics field. The assessment reported herein includes the investigation of noise effects, amplitude bounds, amplitude accuracy relative to frequency, the moving direction of the target with respect to the sensor, etc.
3.2 Experimental Design
3.2.1 Data Acquisition
To perform the proposed approach for the non-contact measurement of 3D dynamic displacement fields using an inexpensive RGB-D camera, a rigid aluminum plate, serving as the target object measured by a Kinect sensor, was mounted on a linear long-stroke shaker (APS 400 ELECTRO-SEIS®). The target plate was covered with standard printing paper to minimize infrared reflection from the shiny aluminum surface, which could cause invalid depth values (holes) in the depth images. A black-and-white checkerboard pattern was attached to the target plate. The inner corners of the black squares represented the points of interest to be detected in the color images. The displacement of the shaker stroke was recorded by an LVDT (Schaevitz™ Sensors HR 4000), which served as the reference sensor to validate the accuracy of the displacement measurements calculated from the Kinect's depth data. The Kinect sensor (without its plastic outer case), fastened to a rigid plate, was mounted on a stable structure. The experimental setup is shown in Figure 3.2.

Figure 3.1: The methodology of using an inexpensive RGB-D camera to measure 3D dynamic displacements includes four steps: 1) calibrating the camera and registering the color and depth images; 2) detecting target points in the color image and mapping the target points to the depth image; 3) tracking the target points in the captured video; and 4) calculating the displacement time histories.
To minimize the depth error related to the imaging geometry (see Section 2.2.1), the distance between the target plate and the Kinect sensor was set to about 1000 mm, which is close to the default minimum range of 800 mm, since the depth error was reported to rise quadratically as the measuring range increases (Khoshelham and Elberink, 2012). For a measuring range of 1000 mm, the estimated depth resolution (i.e., the quantization step size) is about 3 mm, as calculated using Equation (2.10); hence, the lower bound of the peak amplitude for the harmonic excitation generated by the APS 400 shaker was set to 5 mm. On the other hand, the upper bound of the frequency for the harmonic excitation was set to 2 Hz so that the peak amplitude could reach up to 20 mm, because of the limitations of the experimental instruments (the shaker and amplifier). It should be noted that for large flexible structures composed of multiple lightweight components, the low-frequency modes often dominate most of the system response (Ni et al., 2009; Psimoulis et al., 2008); hence, this study concentrated on dynamic tests in the lower frequency range (0 Hz to 2 Hz).
Figure 3.2: Test apparatus: (a) the experimental setup for the dynamic displacement measurement, (b) the aluminum plate on the shaker, and (c) the Microsoft Kinect.
This study used two independent data acquisition systems: one was used to control the shaker and collect the LVDT displacement data, and the other was used for the Kinect data acquisition. In order to ensure a stable sampling rate (30 Hz) from the Kinect sensor, an RGB-D data acquisition code was developed using the C/C++ programming language with the OpenNI library and operated on a Microsoft Windows 7 computer. This computer was equipped with high-performance components, including an 8-core 4.0 GHz CPU (AMD FX-8350), 16 GB of DDR3-1333 memory, a 1 GB graphics card (NVIDIA GeForce 8400 GS), and a solid-state hard drive. Using the OpenNI library, the Kinect data acquisition program recorded a series of aligned color and depth image frames into a video file with the OpenNI file format (.oni). The frame numbers and timestamps were also saved into a separate text file (.txt). Because the Kinect v1 sensor does not support hardware synchronization, there is a small time shift between a pair of color and depth frames. Programming libraries such as OpenNI and the Microsoft Kinect SDK minimize this time shift to a few milliseconds (up to 16 ms) (Mihelich, 2012). The small time shift can be considered part of the measurement error.
3.2.2 Data Extraction
To quantify 3D dynamic displacements using a Kinect sensor, ONI videos containing the displacement time histories of a rigid plate were recorded. The black-and-white checkerboard pattern on the rigid plate helps identify and track the points of interest in the color images. The robust corner detection algorithm proposed by Geiger et al. (2012) was adopted to extract the corners of the black-and-white checkerboard pattern and to locate their pixel coordinates (u_C, v_C) in each color image. For a pair of aligned color and depth images, the pixel coordinates of the depth map are identical to the pixel coordinates of the color image, that is, (u_D, v_D) = (u_C, v_C). Figure 3.3 shows the corners extracted from a color image and mapped to a depth image.

Figure 3.3: Extracting points of interest from (a) a color image and mapping to (b) a depth image.
Consequently, for a pair of aligned color and depth images, the depth value z_i of a target point P_i in the color image can be obtained directly from the depth image using its pixel coordinates (u_C^i, v_C^i). After the depth value z_i of a target point P_i is measured, the 3D coordinates (x_i, y_i, z_i) of the point P_i with respect to the color camera of the Kinect sensor can be computed using Equation (3.1), which is derived from the pinhole camera model (Bradski and Kaehler, 2008):

x_i = (u_C^i − c_x) z_i / f_x
y_i = (v_C^i − c_y) z_i / f_y        (3.1)
z_i = z_i(u_C^i, v_C^i)

where (c_x, c_y) is the principal point, and f_x and f_y are the focal lengths of the color camera. The intrinsic parameters f_x, f_y, c_x, c_y can be estimated through a camera calibration procedure using the Camera Calibration Toolbox for MATLAB (Bouguet, 2015).
The data extraction procedure was performed using MATLAB to: 1) extract each color and depth image from an ONI video file; 2) read the timestamp of each frame; 3) identify the target points in the color image; 4) locate their pixel coordinates in the color image; 5) obtain their depth values from the depth image; and 6) compute their 3D coordinates with respect to the color camera of the Kinect sensor. The time histories of the positions of the target points were stored in accordance with the format shown in Table 3.1. The displacements in the X, Y, or Z direction of a target point were plotted according to the time history record (see Figures 3.9 to 3.10).
Table 3.1: Formatting displacement time history for target points

  Time    Point 1               Point 2               ...   Point n
  t_1     x_11  y_11  z_11      x_12  y_12  z_12      ...   x_1n  y_1n  z_1n
  t_2     x_21  y_21  z_21      x_22  y_22  z_22      ...   x_2n  y_2n  z_2n
  ...     ...                   ...                         ...
  t_m     x_m1  y_m1  z_m1      x_m2  y_m2  z_m2      ...   x_mn  y_mn  z_mn
3.2.3 Data Processing
Data Filtering
After extracting the raw displacement measurements from the recorded ONI video files, the next step was to filter the raw data to remove the DC bias, reduce noise, and smooth the signal. Two Fast Fourier Transform (FFT) digital filters (Ribeiro et al., 2001; Slifka, 2004) were applied to the raw displacement data: a high-pass FFT filter was used to remove the DC bias, and a low-pass FFT filter was used to remove high-frequency noise. An example of zero-mean noisy raw depth data (a harmonic signal with 20 mm amplitude and 1.25 Hz frequency) and its frequency spectrum estimated with the power spectral density (PSD) are shown in Figures 3.4a and 3.4b. The filtering results are displayed in Figure 3.4d.
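The combined high-pass/low-pass FFT filtering can be sketched in MATLAB as below. The synthetic record and the two cutoff frequencies are illustrative assumptions (they are not the exact cutoffs used in this study); the sketch simply zeros the FFT bins outside the pass band and transforms back.

    % FFT-based removal of DC bias (below f_lo) and high-frequency noise (above f_hi).
    fs   = 30;                                      % Kinect sampling rate (Hz)
    tvec = 0:1/fs:40;
    x    = 20*sin(2*pi*1.25*tvec) + 5 + 1.5*randn(size(tvec));  % synthetic raw record (mm)

    f_lo = 0.1;  f_hi = 5;                          % illustrative cutoff frequencies (Hz)
    N = numel(x);
    f = (0:N-1) * fs / N;
    f = min(f, fs - f);                             % map each FFT bin to its physical frequency

    X = fft(x);
    X(f < f_lo | f > f_hi) = 0;                     % high-pass and low-pass in one step
    x_filt = real(ifft(X));                         % filtered, zero-mean displacement record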
Figure 3.4: Kinect data post-processing: (a) raw displacement data, (b) PSD of the displacement data, (c) filtered Kinect and LVDT displacements, and (d) aligned Kinect and LVDT displacements.
Data Alignment
The Kinect video recording and the LVDT data acquisition were unsynchronized. Before comparing the two displacement measurements for the error analysis, a data alignment procedure was performed. The displacement data produced by the Kinect sensor and the LVDT share similar waveforms; hence, a cross-correlation technique (MATLAB xcorr) was used to estimate the time delay between the two data sets by measuring their similarity. Subsequently, the two signals were aligned by shifting one of them according to the time delay. It should be mentioned that both the Kinect and LVDT data were obtained at the same sampling rate of 30 Hz. To enhance the accuracy of the synchronization procedure using the cross-correlation approach, the two data sets were resampled (MATLAB resample) at a higher rate of 1000 Hz with spline interpolation before the data alignment. An example of data synchronization is illustrated in Figures 3.4c and 3.4d. In Figure 3.4c, the plot shows the shift between the Kinect (red line) and LVDT (blue line) data. After applying the cross-correlation technique to the two unaligned signals, the result of the data synchronization is displayed in Figure 3.4d.
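A minimal MATLAB sketch of this alignment step is shown below, using spline interpolation (via interp1) for the upsampling rather than the exact resample call of the study, and assuming 'x_kinect' and 'x_lvdt' are two filtered displacement records of equal length sampled at 30 Hz.

    % Upsample to 1000 Hz, estimate the lag by cross-correlation, and align.
    fs_in = 30;  fs_up = 1000;
    t_in  = (0:numel(x_kinect)-1) / fs_in;
    t_up  = 0 : 1/fs_up : t_in(end);

    k_up = interp1(t_in, x_kinect, t_up, 'spline');
    l_up = interp1(t_in, x_lvdt,   t_up, 'spline');

    [c, lags] = xcorr(k_up, l_up);        % similarity of the two signals versus lag
    [~, imax] = max(c);
    lag = lags(imax);                     % lag in samples at 1000 Hz (lag/fs_up seconds)

    if lag >= 0                           % shift one record so the two line up
        k_al = k_up(1+lag:end);   l_al = l_up(1:numel(k_al));
    else
        l_al = l_up(1-lag:end);   k_al = k_up(1:numel(l_al));
    end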
3.2.4 Initial Tests
The initial tests were conducted to verify whether the Kinect data acquisition system can measure dynamic displacements at a stable sampling rate (30 Hz) using the designed hardware and software architecture. The results are displayed in two groups of figures based on the oscillation frequencies and amplitudes. Figure 3.5 shows the filtered Kinect displacements for harmonic signals of 20 mm peak amplitude excited at different frequencies (0.5 Hz, 1 Hz, and 1.5 Hz), compared to the LVDT data. Figure 3.6 illustrates the acquired 1 Hz harmonic displacements with different amplitudes (5 mm, 10 mm, and 20 mm). The test results indicate that the Kinect sensor has the fundamental capability to quantify low-frequency vibrations. More results from the comprehensive experiment to verify the accuracy of the Kinect measurements are presented in the following sections.
Figure 3.5: Displacement measurements obtained by the Kinect and LVDT for different excitation frequencies: (a) 0.5 Hz, (b) 1.0 Hz, and (c) 1.5 Hz.
Figure 3.6: Displacement measurements obtained by the Kinect and LVDT for different peak amplitudes: (a) 5 mm, (b) 10 mm, and (c) 20 mm.
3.3 Data Analysis
3.3.1 Harmonic Excitation
Data Collection
The experiment utilizing a Kinect RGB-D camera to quantify harmonic displacements involved different combinations of frequencies (0 Hz, 0.25 Hz, 0.5 Hz, 0.75 Hz, 1 Hz, 1.25 Hz, 1.5 Hz, 1.75 Hz, 2 Hz) and peak amplitudes (5 mm, 10 mm, 15 mm, 20 mm). The harmonic motions were excited by sinusoidal signals, except for the 0 Hz tests. To perform the 0 Hz tests, square waves with a very low frequency of 0.03 Hz and a long period of 300 sec were generated to simulate constant displacements.

Figure 3.7: Different motion orientations of the target plate (with respect to the Kinect sensor): (a) motion perpendicular to the Kinect sensor, (b) motion parallel to the Kinect sensor, and (c) motion angled to the Kinect camera.
The harmonic motions of the rigid plate were captured by the Kinect sensor in three different orientations. Case I: the moving direction of the rigid plate was parallel to the depth axis (Z-axis) of the Kinect (see Figure 3.7a); Case II: the moving direction of the rigid plate was horizontal and perpendicular to the depth axis of the Kinect (see Figure 3.7b); and Case III: there was a 45° angle between the moving direction of the rigid plate and the depth axis of the Kinect (see Figure 3.7c). In Case I, the displacements in the Z-direction were variable, but the displacements in the X-direction were constant (see Figure 3.8). In Case II, the displacements in the Z-direction remained steady, but the displacements in the X-direction were variable (see Figure 3.9). In Case III, both the X- and Z-direction displacements varied in time (see Figure 3.10). Since the rigid plate had no (or very small) vertical motion in all three cases, all of the Y-direction measurements were nearly constant. To label the three testing scenarios, Case I is called the "depth-based measurement" because the displacement is dominated by the depth value, Case II is called the "pixel-based measurement" because the displacement measurement is governed by the pixel coordinates, and Case III is called the "depth-and-pixel-based measurement".
Error Analysis
The harmonic displacements measured by the Kinect sensor were compared to the corresponding LVDT records. For each test, the normalized error and the normalized peak error between the Kinect and LVDT data were computed for the 12 corner points (see Figure 3.3) of the target plate pattern using the following equations:

Normalized error = ‖X_LVDT − X_Kinect‖₂ / ‖X_LVDT‖₂    (3.2a)
Normalized peak error = ‖P_LVDT − P_Kinect‖₂ / ‖P_LVDT‖₂    (3.2b)
Figure 3.8: Sample depth-based measurements (1.0 Hz, 20 mm): (a) x-direction, (b) y-direction, and (c) z-direction.

Figure 3.9: Sample pixel-based measurements (1.0 Hz, 20 mm): (a) x-direction, (b) y-direction, and (c) z-direction.

Figure 3.10: Sample depth-and-pixel-based measurements (1.0 Hz, 20 mm): (a) x-direction, (b) y-direction, and (c) z-direction.
where ‖·‖₂ is the Euclidean norm, X_LVDT is the array corresponding to the time history of the sampled LVDT displacement, X_Kinect is the processed displacement time history for the Kinect sensor, P_LVDT is the array of LVDT displacement peaks, and P_Kinect is the array of Kinect displacement peaks.
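Both metrics of Equation (3.2) reduce to a few lines of MATLAB, as in the sketch below. Here 'x_lvdt' and 'x_kinect' stand for two aligned displacement records, peaks are picked on the LVDT record with findpeaks using an illustrative prominence threshold, and only the upper peaks are shown (the lower peaks can be handled analogously on the negated signal).

    % Normalized error and normalized peak error, Eq. (3.2a)-(3.2b).
    err_norm = norm(x_lvdt - x_kinect) / norm(x_lvdt);            % Eq. (3.2a)

    [pk_l, loc] = findpeaks(x_lvdt, 'MinPeakProminence', 2);      % upper peaks of the LVDT record
    pk_k = x_kinect(loc);                                         % Kinect values at the same instants
    err_peak = norm(pk_l - pk_k) / norm(pk_l);                    % Eq. (3.2b)

    fprintf('normalized error: %.1f %%, peak error: %.1f %%\n', ...
            100*err_norm, 100*err_peak);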
It was evident that the estimated normalized errors and normalized peak errors are similar for the 12 target points. For the sake of brevity, the results for only one representative point (i.e., the center point) are shown. Figures 3.11 to 3.13 show six sets of fitted curves representing the relationship between the accuracy of the displacement measurements and the frequency of the harmonic excitation, estimated using Equation (3.2a) (normalized error) and Equation (3.2b) (normalized peak error), under the three testing scenarios discussed below.
Depth-based measurements: Figures 3.11a and 3.11b illustrate the error analysis for the depth-based measurements. In the cases of the peak displacements of 10 mm, 15 mm, and 20 mm, the normalized errors remain at the same level of about 5% as the frequency increases. However, for the peak displacement of 5 mm, the normalized error percentage rises as the frequency increases. The depth resolution (i.e., the quantization step size) of the Kinect v1 is approximately 3 mm at a distance of 1 m (Khoshelham and Elberink, 2012; Smisek et al., 2013). Consequently, higher quantization errors occur when a smaller peak amplitude (e.g., 5 mm) of the sinusoidal wave is selected. Such a sinusoidal wave, while being measured, is projected onto a smaller number of quantization steps, yielding a lower quantization resolution and, hence, higher quantization errors (Feher, 1997). Furthermore, as the frequency of the vibration increases in the 5-mm peak amplitude case, there is a higher probability of noise contamination of the signal, which yields even higher errors.
Figure 3.11: Depth-based measurement errors: (a) normalized errors, and (b) peak errors.

Figure 3.12: Pixel-based measurement errors: (a) normalized errors, and (b) peak errors.

Figure 3.13: Depth- and pixel-based measurement errors: (a) normalized errors, and (b) peak errors.
Basically, the fitted curves of the normalized error and the normalized peak error for the depth-based measurements follow a similar trend.
Pixel-based measurements: Figures 3.12a and 3.12b display the fitted curves of the normalized errors and peak errors for the pixel-based measurements. In Figure 3.12a, for all the displacement measurements, the normalized errors increase as the frequencies rise; however, the normalized peak errors stay at the same level, below 5%, as shown in Figure 3.12b. This disagreement can be explained by the rolling shutter distortion of CMOS image sensors (Liang et al., 2008) when capturing images of moving objects. An in-depth discussion and quantification of the rolling shutter distortion problem is given in Section 4.2.3. When the frequency of the harmonic motions increased (i.e., the average speed increased), more severe image distortion diminished the accuracy of the horizontal (X-direction) displacement measurements. For the normalized peak error analysis, however, the estimation only considered the upper and lower peaks of the displacements, where the instantaneous velocity of the target plate was zero; hence, no rolling shutter distortion occurred at the peak points.
Depth-and-pixel-based measurements: In this case, there was a 45° angle between the moving direction and the depth axis of the Kinect sensor; hence, the velocity can be divided into an X-component and a Z-component. Figure 3.13a shows that this setup reduces the velocity in the X direction, which can minimize the measurement error caused by rolling shutter distortion.
3.3.2 Random Excitation
Data Collection
In practical applications, the measured displacement is not limited to simple harmonic motion; hence, additional tests were conducted to quantify low-frequency random vibrations using the Kinect sensor. Three distinct input random signals with different levels of maximum amplitude were generated to excite the shaker. The three input signals were filtered by a low-pass filter to remove the high-frequency components above 5 Hz; therefore, they could be used to excite the shaker in the low frequency range of 0 to 5 Hz. The random displacements measured by the LVDT and the Kinect sensor were sampled from Gaussian distributions. The experiments can be classified into three groups according to their root mean square (RMS) levels: 1.94 mm, 8.44 mm, and 13.81 mm, which were estimated based on the LVDT displacement measurements produced by the shaker with the three input signals.
The random motions of the target plate were captured by the Kinect sensor in two different scenarios: the depth-based measurement and the pixel-based measurement. Figure 3.14 illustrates the depth-based measurements under the three RMS levels, and Figure 3.15 shows the pixel-based measurements under the three RMS levels.
Error Analysis
The differences between the sampled Kinect and LVDT measurements were compared using probability density functions (PDFs) for the three RMS amplitude levels (1.94 mm, 8.44 mm, and 13.81 mm). The probability density functions were smoothed based on the normal kernel density estimation approach.
Figure 3.14: Depth-based measurements for three RMS levels: (a) RMS = 1.94 mm, (b) RMS = 8.44 mm, and (c) RMS = 13.81 mm.

Figure 3.15: Pixel-based measurements for three RMS levels: (a) RMS = 1.94 mm, (b) RMS = 8.44 mm, and (c) RMS = 13.81 mm.
Figure 3.16 shows the comparisons of the PDFs estimated with the Kinect and LVDT
data for depth-based measurement. Figure 3.17 illustrates the comparisons of the PDFs
computed with the Kinect and LVDT data for pixel-based measurement. Figure 3.18 and
Figure 3.19 display the comparisons of the PDFs for all sampled peak values (positive and
negative) of the Kinect and LVDT data, under depth-based measurement and pixel-based
measurement, respectively.
Figure 3.16: PDFs of the estimated Kinect displacements based on depth-based measurements: (a) RMS = 1.94 mm, (b) RMS = 8.44 mm, and (c) RMS = 13.81 mm.

Figure 3.17: PDFs of the estimated Kinect displacements based on pixel-based measurements: (a) RMS = 1.94 mm, (b) RMS = 8.44 mm, and (c) RMS = 13.81 mm.

Figure 3.18: PDFs of the estimated Kinect peak displacements based on depth-based measurements: (a) RMS = 1.94 mm, (b) RMS = 8.44 mm, and (c) RMS = 13.81 mm.

Figure 3.19: PDFs of the estimated Kinect peak displacements based on pixel-based measurements: (a) RMS = 1.94 mm, (b) RMS = 8.44 mm, and (c) RMS = 13.81 mm.

Three metrics were utilized to quantify the accuracy of the random displacement measurements of the Kinect sensor with respect to the LVDT data: the normalized error, the
mean error for the two PDFs of sampled peaks, and the standard deviation error for the two
PDFs of sampled peaks (see Figure 3.20), which are defined in the following equations:

Normalized error = \frac{\lVert X_{\mathrm{LVDT}} - X_{\mathrm{Kinect}} \rVert_2}{\lVert X_{\mathrm{LVDT}} \rVert_2}    (3.3)

Mean error (PDFs of peaks) = \frac{\mu_{\mathrm{LVDT}} - \mu_{\mathrm{Kinect}}}{\mu_{\mathrm{LVDT}}}    (3.4)

Standard deviation error (PDFs of peaks) = \frac{\sigma_{\mathrm{LVDT}} - \sigma_{\mathrm{Kinect}}}{\sigma_{\mathrm{LVDT}}}    (3.5)

where X_{\mathrm{LVDT}} is the time history of the LVDT displacements, X_{\mathrm{Kinect}} is the time history
of the Kinect measurements, \mu_{\mathrm{LVDT}} and \sigma_{\mathrm{LVDT}} are the mean and standard deviation of
the PDF of all sampled peaks of the LVDT data, and \mu_{\mathrm{Kinect}} and \sigma_{\mathrm{Kinect}} are the mean
and standard deviation of the PDF of all sampled peaks of the Kinect measurements.
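For reference, Equations (3.3)-(3.5) can be evaluated directly from the synchronized time histories and the extracted peak samples. The short NumPy sketch below is only an illustration of these definitions; the array names are placeholders, and the signals are assumed to be already aligned and of equal length.

    import numpy as np

    def normalized_error(x_lvdt, x_kinect):
        # Eq. (3.3): L2 norm of the residual over the L2 norm of the reference signal.
        return np.linalg.norm(x_lvdt - x_kinect) / np.linalg.norm(x_lvdt)

    def peak_mean_error(peaks_lvdt, peaks_kinect):
        # Eq. (3.4): relative difference of the means of the sampled-peak distributions.
        return (np.mean(peaks_lvdt) - np.mean(peaks_kinect)) / np.mean(peaks_lvdt)

    def peak_std_error(peaks_lvdt, peaks_kinect):
        # Eq. (3.5): relative difference of the standard deviations of the peak distributions.
        return (np.std(peaks_lvdt) - np.std(peaks_kinect)) / np.std(peaks_lvdt)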
The results of the quantitative error analyses are summarized in Table 3.2.
On the whole, the normalized error, mean error, and standard deviation error decrease
as the RMS amplitude level increases (1.94 mm, 8.44 mm, and 13.81 mm). For the ran-
dom vibration tests, the normalized error analysis shows relatively low performance for
Figure 3.20: Mean error and standard deviation error for two PDFs of peaks.
the pixel-based measurements, especially for the small RMS level (error of 124.74% for
RMS = 1.94 mm). The error could be mainly caused by the rolling shutter distortion,
which is mentioned in Section 4.2.3. By comparison, the Kinect sensor has better perfor-
mance for the depth-based measurements, particularly for the large RMS level (error of
8.6% for RMS = 13.81 mm).
For the error analyses using the data of sampled peaks (positive and negative), the mean errors
and standard deviation errors are smaller than the normalized errors because there is no rolling
shutter distortion at peak locations (the instantaneous velocity is zero). For the largest RMS level (13.81
mm), the mean error (2.8% for depth, 2.02% for pixel) and standard deviation error
(8.07% for depth, 5.50% for pixel) indicate that the Kinect sensor has relatively good
Table 3.2: Comparison of normalized error, mean error, and standard deviation error
under depth-based and pixel-based measurements
                          RMS = 1.94 mm        RMS = 8.44 mm        RMS = 13.81 mm
                          Depth (z)  Pixel (x)  Depth (z)  Pixel (x)  Depth (z)  Pixel (x)
Normalized error (%)        18.21     124.74      11.80      47.70       8.60      20.02
"Peaks" Mean error (%)       5.60      65.27       3.66       6.50       2.80       2.02
"Peaks" Std error (%)       14.52      80.46       8.16      10.76       8.07       5.50
performance in detecting the peak displacements under random excitation. Overall, the
error analyses of the random tests agree with the findings discussed for the harmonic
tests.
3.4 Summary
This chapter presents the methodology of using an inexpensive off-the-shelf RGB-D cam-
era (a Kinect v1 sensor as the representative device) to measure dynamic 3D displacement fields.
The measurement approach, including camera calibration, target detection, target
tracking, and displacement calculation, was implemented and evaluated through a labo-
ratory experimental study. The assessment involved testing the fundamental capability
of the Kinect hardware and software to acquire vibration data at a stable sampling rate of
30 Hz, and conducting a quantitative evaluation of performance by measuring the
3D dynamic displacement of a rigid plate under harmonic and random excitation. The
measurement accuracy of the proposed method, which was estimated by comparison with the
precise LVDT data, was examined for combinations of three schemes: 1) the amplitude
level ranged from 5 mm to 20 mm; 2) the frequency range was from 0 Hz to 2 Hz; and 3)
the angle between the optic axis (i.e., the Z axis) of the sensor and the normal vector of
the target plate was 0°, 45°, or 90°.
The results of error analysis for the tests of harmonic and random excitation can be
categorized into three scenarios according to relative motions between the plate and the
Kinect sensor: 1) the depth-based measurement related to the motion direction parallel
to the Z axis of the sensor; 2) the pixel-based measurement related to the motion di-
rection parallel to the X axis (i.e., perpendicular to the Z axis) of the sensor; and 3)
Table 3.3: Summary of data analysis for depth-, pixel-, and depth-pixel-based measurements.
Test                  Result
Depth-based           Measurement accuracy is mainly affected by depth resolution,
                      especially at the very small displacement of 5 mm.
Pixel-based           Normalized error is related to rolling shutter distortion, but peak
                      error is not affected, since the velocity is zero (i.e., no distortion)
                      at the peak points.
Depth-pixel-based     Both normalized error and peak error remain at about 5% for all
                      testing scenarios; this setting reduces the impact of rolling shutter
                      distortion.
the depth-and-pixel-based measurement, in which there is a 45° angle between the motion direction
and the Z axis of the sensor. The results and data analysis of the harmonic experiments are
summarized in Table 3.3. The tests of random excitation also reveal similar phenomena:
the measurement error is mainly affected by depth error and rolling shutter distortion.
Overall, the measurement error is between 5% and 10% when using a Kinect v1 sensor for
a short-range measurement (0.8 to 1 m) to acquire dynamic displacements larger than
10 mm with low-frequency motion (0 to 2 Hz). This research showed that the Kinect sensor
is a convenient, feasible, and cost-effective tool to measure the evolving 3D displacement
fields of structural dynamics problems indoors. The results of the quantitative analysis
presented in this chapter provide guidelines regarding the influences of several
important issues (hardware and software limitations, rolling shutter distortion, etc.) that
would arise in field implementations.
Chapter 4
Evaluating Performance of an RGB-D Camera for
Image Acquisition in Outdoor and Dynamic
Environments
4.1 Overview
Many RGB-D cameras, including the Microsoft Kinect v1 sensor, are closed-source sys-
tems; it is very difficult to modify patent-protected hardware or software to meet the
demands of a particular application. Therefore, understanding the hardware characteristics
and limitations of a Kinect sensor and providing solutions for the related challenges are
important tasks in this research. The three major hardware components of a Kinect
v1 sensor for 3D dynamic displacement acquisition are the IR projector, the IR camera,
and the color camera. The IR projector uses a non-modulated laser diode to emit a near-
infrared laser beam with a wavelength of 830 nm and a power of 60 mW (OpenKinect, 2011).
The laser beam passes through several layers of diffractive optical elements (DOE) and
is split into multiple beams (Shpunt and Pesach, 2013; Zalevsky et al., 2013; Freedman
et al., 2014), which produces a projection of a speckle pattern on a measured object. This
approach reduces the output power from the scale of milliwatts (mW) to microwatts (µW) to
meet laser safety regulations (class 1 laser product). However, the low-power speckle pat-
tern is easily polluted by ambient illumination containing a high-power near-infrared
light source at the identical wavelength of 830 nm.
The RGB camera is equipped with the Aptina MT9M112 megapixel color CMOS
digital image sensor, and the IR camera is built around the Aptina MT9M001 megapixel
monochrome CMOS digital image sensor. Both the MT9M112 and MT9M001 CMOS digital
image sensors use electronic rolling shutters to control exposure time, which can
introduce distortion in the color and depth images when capturing dynamic scenes. The
Kinect v1 sensor also has an automatic exposure feature provided by the CMOS
digital image sensors to compensate for backlight. In a low-light condition, the Kinect sensor
extends the exposure time to receive a sufficient amount of light. Increasing the exposure
time may cause motion blur when acquiring images of moving objects. Fast
relative motion between a Kinect sensor and a measured object also creates motion blur
and rolling shutter distortion together in an image.
Several field experiments were conducted to evaluate the performance of the low-cost
Kinect sensor for image acquisition in outdoor and dynamic environments. The Kinect
sensor was mounted on a vehicle driving at speeds from 0 mph to 35 mph (56 km/h) to
scan objects on the road surface. Three major problems related to the hardware limita-
tions, namely sunlight infrared interference, motion blur, and rolling shutter distortion,
were discovered in the field tests. These problems and their corresponding solutions are
presented in this chapter.
4.2 Performance Evaluation
4.2.1 Sunlight Infrared Interference
Problem Description
An off-the-shelf Kinect range sensor was originally developed for indoor entertainment
purposes. Its depth sensor emits harmless low-power invisible near-infrared laser dots on
objects. Sunlight containing near-infrared rays of identical wavelength can interfere with
a Kinect's dot pattern to generate invalid depth data. Based on outdoor experiments
conducted by this study, it was found that a Kinect sensor is still able to obtain good
depth images under indirect sunlight (i.e., in shadow), which can be seen in Figure 4.1.
To improve the quality of depth images acquired by a Kinect sensor under sunlight, two
types of sun shades were developed.
Figure 4.1: Sunlight interference: (a) color image shows shadow regions under strong
sunlight, and (b) only shadow areas can be detected by the Kinect sensor.
Solution
Top-Cover Sun Shade: A top-cover sun shade, which minimizes the exposure of the measuring
region to sunlight, was designed and tested (see Figure 4.2a). However, depending on
the direction of the sunlight, the shadow area may shift away from the scanning area
as viewed by the Kinect cameras. The sensor systems mounted on the top-cover sun shade
are shown in Figure 5.2a in Section 5.2.
Full-Cover Sun Shade: The full-cover sun shade shown in Figure 4.2b was made to
create an indoor-like environment for the Kinect sensors while operating outdoors.
It is a bottom-open, box-like structure that carries the Kinect sensors inside. When the
box-like sun shade was mounted on a vehicle, there was a space between the
structure and the ground to prevent the structure from bumping into the roadway. The space
was covered by flexible rubber sheets to block sunlight from saturating the scanning area.
Figure 4.2: Various shading systems used in this study: (a) top-cover sun shade, and (b)
full-cover sun shade.
LED lights were installed inside the box to illuminate the scanning area so that the color
cameras could take adequate pictures. However, it was observed during the field tests that
there were motion blur problems in the RGB images when the vehicle speed exceeded 25
km/h (15 mph). Because the Kinect uses automatic exposure control for its RGB camera, it
extends the exposure time automatically under insufficient LED illumination, and the
long exposure time causes motion blur.
4.2.2 Motion Blurring
Problem Description
From the field experiments, it was determined that low luminance triggered the Kinect's
color camera to extend its exposure time and, consequently, introduced motion blur in
the captured frames. Figure 4.3 shows the difference between image frames captured of
the same road segment, under high and low luminance, on California State Route 110 in
the Los Angeles region. These images were captured by Kinect sensors mounted on the
top-cover sun shade. The sharper image in Figure 4.3a was taken at noon under
bright sunshine, while the blurry image in Figure 4.3b was taken in the late afternoon before
sunset.
Solution
Stroboscopic flash photography was used to "freeze" motion in time by generating a
large amount of light in a very short period. This approach was used to solve the motion-
blur problem for Kinect color image acquisition. Three LED lights were mounted
inside the full-cover sun-shade platform, as shown in Figures 4.4a and 4.4b. The energy-
efficient LED is an ambient light source that does not interfere with the depth sensor of
Figure 4.3: Comparison of two images captured from the same road segment under dif-
ferent lighting conditions: (a) high luminance, (b) low luminance.
a Kinect; consequently, it is a desirable illumination option for this application. Moreover,
LEDs are easily controlled by digital signals. A pulse signal sent from an Arduino board
was used to trigger a transistor that switches the electrical current flowing into the LEDs to
enable strobe lighting. The setup of the strobe controller is shown in Figure 4.4c. The
road test results corresponding to motion speeds of less than 35 mph using a Kinect with
strobe light assistance are shown in Figure 4.5. The flash rate of the strobe light was set
to 60 Hz, and the pulse duration was 1 millisecond. Defects, pavement markings, and
objects on the road surface can be clearly observed in the color images when the
stroboscopic technique described above is used.
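To see why a 1 ms pulse effectively freezes the scene, the distance the vehicle travels during one pulse can be compared with the distance traveled during a nominal full-frame exposure; the 1/30 s exposure assumed below is an illustrative worst case, not a measured Kinect value.

    # Back-of-the-envelope check of motion blur with and without strobing.
    MPH_TO_MPS = 0.44704

    speed = 35 * MPH_TO_MPS          # vehicle speed, m/s (~15.6 m/s)
    pulse = 1e-3                     # strobe pulse duration, s
    exposure = 1 / 30                # assumed worst-case frame exposure, s

    blur_with_strobe = speed * pulse        # ~0.016 m of travel during one pulse
    blur_without_strobe = speed * exposure  # ~0.52 m of travel during a long exposure

    print(f"travel during 1 ms pulse:  {blur_with_strobe * 1000:.1f} mm")
    print(f"travel during 33 ms frame: {blur_without_strobe * 1000:.0f} mm")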
4.2.3 Rolling Shutter Distortion
Problem Description
Both the RGB camera and depth camera of a Kinect sensor consist of rolling shutter
CMOS image sensors. When each frame of a video stream is recorded by a rolling shutter
Figure 4.4: Strobe light setup for pavement data acquisition: (a) the full-cover sun shade
sensor platform, (b) representation of a Kinect, and LED strobe lighting mounted inside
the full-cover sun shade, (c) the scheme of the strobe light controller.
Figure 4.5: Road test results for Kinect image acquisition using the strobe lighting: (a)
and (b) were captured at 15 to 25 mph, (c) and (d) were taken at 25 to 35 mph.
CMOS sensor, an image is not captured at a single instant and stored in the entire pixel
array simultaneously; rather, it is progressively scanned and stored row by row into the
pixel array. Rolling shutter CMOS sensors provide a low-noise, low-power, fast, and
inexpensive solution for most commercial cameras; however, this image
acquisition method creates distortions when shooting moving objects. This phenomenon
is illustrated in Figure 4.6a.
Figure 4.6: Rolling shutter distortion of a camera and image samples: (a) rolling shutter
image acquisition and distortion; (b) the color image of a square brick captured by a
Kinect's color camera at 10 mph (16 km/h) became a parallelogram; (c) the depth image
of a square brick captured by a Kinect's depth camera at 10 mph (16 km/h) became a
parallelogram.
To study and solve this problem, an experiment was conducted to scan a square
brick with a Kinect sensor at different vehicular speeds, followed by a rectification of
the distortion using a rolling shutter camera model (Ait-Aider et al., 2006a, 2007). The
moving direction of the Kinect was perpendicular to the scanning direction of the rolling
shutter; consequently, the color and depth images of the square brick deform into
the shape of a parallelogram (see Figures 4.6b and 4.6c). As the speed of motion increases,
the acute angle of the acquired parallelogram decreases. The plot of the relation between
speed and the resulting acute angle is shown in Figure 4.7. The measured acute
angles of the distorted depth images for different speeds are listed in Table 4.1.
Solution
The algorithm used in this study to rectify the rolling shutter distortion is derived from
the classical pinhole camera model described in Section 2.4.1. The pinhole camera model
can also be considered a global shutter camera model, which exposes all pixels simultaneously
to project 3D points onto a 2D image plane.
Global Shutter Camera Model: Consider the global shutter (GS) camera model
shown in Equation (4.1); its matrix form is shown in Equation (4.2).

s\,\mathbf{m}_{GS} = \mathbf{K}\,[\mathbf{R} \mid \mathbf{t}]\,\mathbf{P}_{w}    (4.1)
Figure 4.7: Relation between rolling shutter distortions for depth images and motion
speeds according to Table 4.1.
Table 4.1: Distortion and rectification of depth images for different speeds
(depth-image snapshots of the distorted square brick at 0, 10, 15, 20, 25, and 30 mph)
s \begin{bmatrix} u_{GS} \\ v_{GS} \\ 1 \end{bmatrix} =
\begin{bmatrix} f_x & 0 & c_x \\ 0 & f_y & c_y \\ 0 & 0 & 1 \end{bmatrix}
\begin{bmatrix} r_{11} & r_{12} & r_{13} & t_1 \\ r_{21} & r_{22} & r_{23} & t_2 \\ r_{31} & r_{32} & r_{33} & t_3 \end{bmatrix}
\begin{bmatrix} x_w \\ y_w \\ z_w \\ 1 \end{bmatrix}    (4.2)
Note that \mathbf{m}_{GS} = [u_{GS}, v_{GS}, 1]^T are the homogeneous 2D pixel coordinates for a global
shutter (GS) camera; \mathbf{P}_w = [x_w, y_w, z_w, 1]^T are the homogeneous 3D world coordinates
of an object; \mathbf{R} is the rotation matrix; \mathbf{t} is the translation vector; [\mathbf{R} \mid \mathbf{t}] represents the
matrix of extrinsic parameters; and \mathbf{K} is the matrix of intrinsic parameters. The in-
trinsic parameters include the focal lengths (f_x, f_y) and the principal point (c_x, c_y), which can be
obtained through a camera calibration procedure using the Camera Calibration Toolbox
for MATLAB (Bouguet, 2015).
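As a concrete illustration of Equations (4.1)-(4.2), the short sketch below projects a homogeneous world point through an intrinsic matrix K and pose [R | t]; the numeric intrinsics are nominal Kinect-like values used only for illustration, not calibration results from this study.

    import numpy as np

    def project_global_shutter(P_w, K, R, t):
        # Project homogeneous world points (4xN) to pixel coordinates per Eq. (4.2).
        P_c = R @ P_w[:3, :] + t.reshape(3, 1)   # world frame -> camera frame
        m = K @ P_c                              # camera frame -> s * [u, v, 1]^T
        return m[:2, :] / m[2, :]                # divide by the scale factor s

    # Nominal (illustrative) intrinsics; real values come from calibration.
    K = np.array([[585.0, 0.0, 320.0],
                  [0.0, 585.0, 240.0],
                  [0.0, 0.0, 1.0]])
    R, t = np.eye(3), np.zeros(3)
    P_w = np.array([[0.1, 0.2, 1.0, 1.0]]).T     # one point 1 m in front of the camera
    print(project_global_shutter(P_w, K, R, t))  # approx. (378.5, 357.0)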
Rolling Shutter Camera Model: The simplified algorithm to rectify rolling shutter
distortion adopts the uniform rolling shutter camera model (Ait-Aider et al., 2006b). In
the uniform rolling shutter camera model, the speed of the camera is assumed constant
during the short image acquisition time interval, which is applicable to the case of this
study. Since a rolling shutter CMOS sensor scans an entire scene row by row rapidly, the
time delay between two successive rows can be calculated as \tau = 1/(\mathrm{fps} \cdot v_{max}), where
fps (frames per second) is the frame rate and v_{max} is the image height in pixels. When
the CMOS sensor scans the i-th row, the cumulative time delay from the first row to
the i-th row is \tau_i = \tau v_i, and the projection of the 3D point \mathbf{P}^i_w onto the 2D image plane
of a rolling shutter (RS) camera, \mathbf{m}^i_{RS}, can be expressed as the following equation:
s\,\mathbf{m}^i_{RS} = \mathbf{K}\,[\mathbf{R}_i \mathbf{R} \mid \mathbf{t} + \mathbf{t}_i]\,\mathbf{P}^i_w, \quad i = 1, 2, \ldots, v_{max}    (4.3)
where the rotation matrix \mathbf{R} and the translation vector \mathbf{t} describe the instantaneous camera pose
at the start of the acquisition. During this acquisition interval, the rotation motion with instantaneous an-
gular velocity \Omega around an instantaneous axis of unit vector \mathbf{a} can be obtained from the
Rodrigues formula shown in Equation (4.4), and the translation motion with instanta-
neous linear velocity \mathbf{V} is defined in Equation (4.5).
\mathbf{R}_i = \mathbf{a}\mathbf{a}^T \left(1 - \cos(\Omega \tau v_i)\right) + \mathbf{I}\cos(\Omega \tau v_i) + \hat{\mathbf{a}}\,\sin(\Omega \tau v_i)    (4.4)

\mathbf{t}_i = \tau v_i \mathbf{V}    (4.5)
Rolling Shutter Camera Model for Pure Translation: When a Kinect sensor
is used to scan pavement distress, the instantaneous kinematics of the camera can be
assumed to be pure translation. Hence, the rolling shutter camera model
can be simplified by replacing the rotation term with an identity matrix for the pure
translation case. The rolling shutter camera model for pure translation
is then expressed in the following equation:
s\,\mathbf{m}^i_{RS} = \mathbf{K}\,[\mathbf{I} \mid \mathbf{t} + \mathbf{t}_i]\,\mathbf{P}^i_w, \quad i = 1, 2, \ldots, v_{max}    (4.6)
The matrix form of Equation (4.6) can be written as Equation (4.7) to represent the
row-by-row projection of a real-world 3D scene into a 2D image frame acquired by the
moving rolling shutter camera with linear velocity vector [V_x, V_y, V_z]^T. After composing
each row of pixels, this process generates a distorted 2D image (u_{RS}, v_{RS}).
s \begin{bmatrix} u^i_{RS} \\ v^i_{RS} \\ 1 \end{bmatrix} =
\begin{bmatrix} f_x & 0 & c_x \\ 0 & f_y & c_y \\ 0 & 0 & 1 \end{bmatrix}
\begin{bmatrix} 1 & 0 & 0 & t_x + \tau v_i V_x \\ 0 & 1 & 0 & t_y + \tau v_i V_y \\ 0 & 0 & 1 & t_z + \tau v_i V_z \end{bmatrix}
\begin{bmatrix} x^i_w \\ y^i_w \\ z^i_w \\ 1 \end{bmatrix}, \quad i = 1, 2, \ldots, v_{max}    (4.7)
Rolling Shutter Rectification for Pure Translation: The mathematical expression
of row-by-row projection for the rolling shutter camera model to transform coordinates
from the 3D camera system to the 2D pixel system is shown in the following equation:
s \begin{bmatrix} u^i_{RS} \\ v^i_{RS} \\ 1 \end{bmatrix} =
\begin{bmatrix} f_x & 0 & c_x \\ 0 & f_y & c_y \\ 0 & 0 & 1 \end{bmatrix}
\begin{bmatrix} \tilde{x}^i_c \\ \tilde{y}^i_c \\ \tilde{z}^i_c \end{bmatrix}, \quad i = 1, 2, \ldots, v_{max}    (4.8)
The 3D reconstruction based on Equation (4.8) can be represented as follows:

\tilde{x}_c = \frac{(u_{RS} - c_x)\,\tilde{z}_c}{f_x}, \qquad
\tilde{y}_c = \frac{(v_{RS} - c_y)\,\tilde{z}_c}{f_y}, \qquad
\tilde{z}_c = Z_D    (4.9)
where Z_D is the range measurement obtained by the Kinect sensor. However, the 3D
scene (\tilde{x}_c, \tilde{y}_c, \tilde{z}_c) is also deformed because it is generated from the distorted 2D image
(u_{RS}, v_{RS}). Comparing Equation (4.7) with Equation (4.8) yields the conversion
between the distorted 3D scene (\tilde{x}^i_c, \tilde{y}^i_c, \tilde{z}^i_c), expressed with respect to the camera
coordinate system, and the undistorted 3D scene (x^i_w, y^i_w, z^i_w), expressed with respect to
the world coordinate system. The conversion between the two 3D coordinates is shown in
the following equations:
\tilde{x}^i_c = x^i_w + \tau v_i V_x + t_x
\tilde{y}^i_c = y^i_w + \tau v_i V_y + t_y, \qquad i = 1, 2, \ldots, v_{max}    (4.10)
\tilde{z}^i_c = z^i_w + \tau v_i V_z + t_z
Let (x^i_c, y^i_c, z^i_c) be the coordinates of the undistorted 3D scene with respect to the camera
coordinate system, which would be acquired by a static rolling shutter camera, that is, V_x = 0,
V_y = 0, and V_z = 0; therefore, the conversion between the undistorted 3D scene (x^i_c, y^i_c, z^i_c),
with respect to the camera coordinate system, and the undistorted 3D scene (x^i_w, y^i_w, z^i_w),
with respect to the world coordinate system, can be represented by the following equations:
x^i_c = x^i_w + t_x
y^i_c = y^i_w + t_y, \qquad i = 1, 2, \ldots, v_{max}    (4.11)
z^i_c = z^i_w + t_z
Comparing Equation (4.10) with Equation (4.11) results in the relation between the
undistorted 3D scene (x^i_c, y^i_c, z^i_c) and the distorted 3D scene (\tilde{x}^i_c, \tilde{y}^i_c, \tilde{z}^i_c):
x^i_c = \tilde{x}^i_c - \tau v_i V_x
y^i_c = \tilde{y}^i_c - \tau v_i V_y, \qquad i = 1, 2, \ldots, v_{max}    (4.12)
z^i_c = \tilde{z}^i_c - \tau v_i V_z
The undistorted 3D coordinates (x_c, y_c, z_c) can be used to rectify the 2D pixel coordinates
(u_{RC}, v_{RC}) with the following formula:

u_{RC} = f_x \frac{x_c}{z_c} + c_x, \qquad
v_{RC} = f_y \frac{y_c}{z_c} + c_y, \qquad
z_c = Z_D    (4.13)
Finally, the distorted color and depth images can be reshaped using an interpolation
technique according to the rectified pixel coordinates (u_{RC}, v_{RC}).
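A compact sketch of this rectification pipeline for pure translation is given below; it follows Equations (4.9), (4.12), and (4.13) directly. The function name, the units (millimetres for depth and velocity), and the per-row delay formula are stated assumptions for illustration; the z-component correction is omitted because Equation (4.13) takes z_c = Z_D, and the final image resampling (interpolation) step mentioned above is not shown.

    import numpy as np

    def rectify_rolling_shutter(depth, fps, V, K):
        # depth: HxW range map Z_D (mm); V: (Vx, Vy, Vz) camera velocity (mm/s);
        # K: 3x3 intrinsic matrix. Returns rectified pixel coordinates (u_RC, v_RC).
        h, w = depth.shape
        fx, fy = K[0, 0], K[1, 1]
        cx, cy = K[0, 2], K[1, 2]
        tau = 1.0 / (fps * h)                                 # per-row delay tau = 1/(fps * v_max)
        u_rs, v_rs = np.meshgrid(np.arange(w), np.arange(h))  # v_rs is the row index

        # Eq. (4.9): back-project the distorted image to the (distorted) 3D scene.
        x_t = (u_rs - cx) * depth / fx
        y_t = (v_rs - cy) * depth / fy

        # Eq. (4.12): remove the translation accumulated during the row-by-row readout.
        x_c = x_t - tau * v_rs * V[0]
        y_c = y_t - tau * v_rs * V[1]

        # Eq. (4.13): re-project with the static pinhole model, taking z_c = Z_D.
        u_rc = fx * x_c / depth + cx
        v_rc = fy * y_c / depth + cy
        return u_rc, v_rc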
Results of Rolling Shutter Rectification: A Kinect sensor was mounted inside the
full-cover sun shade carried by a vehicle for the experiment to study the rolling shutter
problem. The vehicle was driven in a straight line across the test object (a square
brick) at different testing speeds; therefore, the motion of the Kinect was assumed to
be a one-direction pure translation, and rotations were not considered.
Table 4.2: Distortion and rectification of depth images for different speeds
(pairs of distorted and rectified depth-image snapshots at 0, 10, 15, 20, 25, and 30 mph)
The vehicle speed measured by a GPS was used to generate the undistorted 3D scene
expressed in Equation (4.12). After applying the rectification algorithm to the distorted
depth images acquired by a Kinect sensor, the results displayed in Table 4.2
were obtained. The results demonstrate that the algorithm under discussion was able to
correct the distortion in this test.
4.3 Summary
This chapter reported three major challenges, namely sunlight interference, motion blur,
and rolling shutter distortion, of using the Kinect v1 sensor to acquire images in outdoor
and dynamic environments. These problems were carefully evaluated in this research
using a vehicle-mounted Kinect sensor to scan objects on the road surface. To overcome
these challenges, which stem from the sensor's hardware limitations, the corresponding solutions
using auxiliary components or a software technique to minimize the influences of
these problems were also described in this chapter and are summarized below:
Sunlight interference: According to the outdoor testing results, a Kinect's depth sensor
can suffer interference from strong sunlight reflected by the road surface, especially at noon on
a sunny day. However, a Kinect sensor was still able to generate depth images on a cloudy
day or in a shadowed area. To block strong sunlight from shining into the scanning area, a
top-cover sun shade and a full-cover sun shade were designed and tested. Using the
full-cover sun shade can minimize sunlight interference and allow the capture of a
good depth image; however, insufficient illumination inside the full-cover sun shade
causes the motion blur problem.
Motion blur: A stroboscopic technique is commonly used to "freeze" fast motions
in time. This approach was adopted to solve the motion-blur problem of Kinect
image acquisition. Energy-efficient LED lights are a suitable illumination source for
this application since they do not interfere with the depth sensor. The LEDs were
flashed by a pulse signal generated from an Arduino board. The quality of the color
images was improved when the stroboscopic technique was used.
Rolling shutter distortion: Both the color and depth cameras of a Kinect sensor
use rolling-shutter CMOS image sensors. Such sensors provide a low-noise, low-
power, fast, and inexpensive solution for most commercial digital
cameras; however, this image acquisition method causes significant distortions when
capturing moving objects. This study investigated the mathematical relation be-
tween the shape distortion of an object in an image and its motion speed, and used a
rectification algorithm to correct the distorted images.
The three issues arising from the Kinect sensor's hardware characteristics are very common
in many Kinect-like RGB-D cameras that utilize a low-power near-infrared light source
and CMOS sensors to obtain color and depth images. Hence, this chapter provides
a comprehensive understanding of using an inexpensive off-the-shelf RGB-D camera for
realistic applications.
Chapter 5
Integrating Multiple Inexpensive Sensors and
Fusing Heterogeneous Data for a Vision-Based
Automated Pavement Condition Survey
5.1 Overview
A large structural system requires periodic inspection to ensure its integrity over time.
Conventional human visual inspection, identifying flaws with the naked eye by looking
over the external surface of a structure, is the most common approach. Even though this approach
is widely adopted in civil engineering and the aerospace industry, human visual inspection is
still a laborious, time-consuming, and risky job; hence, the inspection tasks are usually
scheduled once every few years. However, defects observed at discrete points in time
actually evolve continuously, and significant deterioration or functional failure may happen
during the interval between two inspections (Agdas et al., 2015). Many engineers have
started using autonomous systems to make vision-based inspection tasks more
efficient. One of the biggest growth areas is automated surveys for condition assessment
of road surfaces. Because the United States has the world's largest road network, with a total of
4.1 million miles (CIA, 2017; USDOT, 2017), including 2.7 million miles of paved roads
and 1.4 million miles of unpaved roads, automated technologies employing digital
image processing to identify and quantify pavement distress have been developed and
commercially used for almost three decades (Wang and Gong, 2002; Wang and Smadi,
2011). Pavement image data collection technologies can be classified into 2D intensity-
based and 3D laser-based approaches. The 2D intensity-based method is used by most
current systems but is sensitive to different lighting conditions (e.g., shadow, low contrast,
etc.) and needs artificial lighting to create uniform illumination on the road surface (Tsai
and Li, 2012). Some pavement deterioration, such as potholes and pavement deformation
(rutting and shoving), is difficult to identify and/or quantify in 2D images
(Hou et al., 2007). Therefore, the use of 3D laser-based technologies to acquire 3D
road profiles has gradually been adopted by recently developed systems (Chang et al., 2005;
Li et al., 2009; Ouyang and Xu, 2013; Serigos et al., 2016).
In general, an automated pavement condition survey system needs to integrate mul-
tiple sensors to record and fuse different types of data in order to produce more useful
information. For example, it is very important to acquire pavement distress images accom-
panied by location data so that the same site can easily be revisited for further inspec-
tion or maintenance. Common sensing devices used for pavement data collection include
imaging sensors, range sensors, inertial sensors, and positioning systems (McGhee, 2004).
Imaging sensors, including line-scan and area-scan cameras, can generate high-resolution
2D images of road surfaces to detect pavement distresses using digital image processing
techniques (Huang and Xu, 2006; Koch et al., 2012). Range sensors can acquire the 3D
road profile, which is useful not only for detecting pavement distress (Laurent et al., 2012) but
also for measuring pavement roughness (Chang et al., 2005). However, because conventional range
sensors are bulky and expensive, some researchers have used newly developed inexpensive depth
sensors to detect and quantify large-size pavement distress (e.g., potholes). For example,
Jahanshahi et al. (2012) used a Kinect v1 sensor (costing under $200) to detect and quan-
tify potholes and large cracks. Yuan and Cai (2014) demonstrated a pothole detection
algorithm using the SwissRanger SR4000 (costing under $5,000). The road profile and roughness
can also be measured by vehicle-mounted accelerometers (Harris et al., 2010). Some
studies used low-cost smartphone-based accelerometers to assess road quality (Tai
et al., 2010; Islam et al., 2014).
Commercial systems are very expensive; whether agencies purchase systems from vendors or
outsource data collection and processing to contractors, the annual cost to gov-
ernment agencies can exceed $1 million (Vavrik et al., 2013). Several efforts have been made to
develop cost-effective multi-sensory systems for automatic pavement surveys, which
include the Digital Highway Data Vehicle (DHDV) developed by the University of Arkansas
(Wang and Smadi, 2011), the Multi-Purpose Survey Vehicle (MPSV) used by the Florida
Department of Transportation (Mraz and Nazef, 2008), the Versatile Onboard Traffic
Embedded Roaming Sensors (VOTERS) project developed by Northeastern University
(Wang et al., 2015), and the StreetScan vehicle (Vines-Cavanaugh et al., 2016). The
StreetScan vehicle is the commercialization of the VOTERS project, which aims to lower the
cost of the system to under $400,000. Compared to the above-mentioned survey vehi-
cles, the inexpensive system proposed by this research using low-cost Kinect sensors
can reduce the price of the system to under $10,000. Additionally, the Kinect
sensor is a lightweight and compact device; hence, the proposed system was designed
to be easily mounted on a vehicle to make frequent road image data collection possible.
In addition to the Kinect sensors, this system also contained accelerometers and a GPS. Sev-
eral techniques utilized to incorporate all these sensors into an integrated system are also
reported in this chapter.
5.2 System Design
5.2.1 Sensor Components
RGB-D Camera
A Kinect RGB-D camera is itself an inexpensive multi-sensor system, which includes
an RGB camera to capture color images and a depth sensor to acquire full-field range
measurements. A Kinect sensor also contains a 3-axis accelerometer to read acceleration
in a ±2g range and a 4-channel linear microphone array to record audio and identify the
location and direction of sound waves. In this study, only the RGB camera and the depth sensor
of a Kinect device were used, in order to reduce high-volume data storage and
processing. The color images are mainly utilized to visualize road surface conditions, and
the depth images can help detect road surface distress (Jahanshahi et al., 2012).
Positioning System (GPS)
A GPS sensor was added to the data collection system to record the location of the acquired
images and the speed of the vehicle. A consumer-grade, USB (universal serial bus) sup-
ported, conventional GPS (Garmin 18x) was used in this study. The Garmin 18x is a
high-sensitivity, 12-parallel-channel, WAAS-enabled GPS receiver, which has an acquisition
time of about 1 minute and an update rate of 1 record per second (1 Hz).
Accelerometers and Data Acquisition Module
Accelerometers play an important role in allowing the alignment of the sensor plat-
form, calculating the vertical motion of a moving body (e.g., a car), and computing the three-
dimensional translation and twist of the vehicle chassis, which in turn provide the three-
dimensional kinematics of the cameras. Six signal-conditioned, single-axis accelerometers
(ICSensors Model 3145) with a dynamic range of ±10g were used to acquire accelera-
tion measurements. These accelerometers were connected to the analog inputs of a National
Instruments data acquisition module (NI USB-6008), which was connected to a host
computer via the USB interface, and this module was controlled by a LabVIEW Virtual
Instrument.
5.2.2 The Prototype
A schematic diagram illustrating an overview of the vision-based multi-sensor approach
used in this study for road surface data collection is shown in Figure 5.1. The diagram
briefly introduces the hardware integration and software architecture of the system. De-
tailed information on the prototype is provided in the following sections.
Hardware
The inexpensive vision-based data collection system used in this study for road surface
condition assessment included four Microsoft Kinects, which capture RGB and depth
videos. If the Kinects are located 800 mm (2.6 feet) above the roadway, the arrangement
of four Kinects (in a 2-by-2 array) can cover an area of 1.52 m × 1.21 m (5 ft × 4 ft). They
can scan a standard roadway at a vehicle speed of up to 97 km/h (60 mph) without any
gap between two consecutive frames, a limit determined by the field of view and
the maximum frame rate (30 fps) of a Kinect sensor.
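A quick consistency check of this gap-free speed limit is sketched below, under the assumption (made here only for illustration) that the 1.21 m side of the combined footprint lies along the direction of travel.

    # Gap-free scanning check for the 2-by-2 Kinect array.
    fps = 30.0                      # Kinect frame rate, frames/s
    footprint_along_travel = 1.21   # assumed along-travel coverage of one composite frame, m
    speed_kmh = 97.0                # stated maximum gap-free speed

    travel_per_frame = (speed_kmh / 3.6) / fps
    print(f"vehicle advance per frame: {travel_per_frame:.2f} m")                       # ~0.90 m
    print(f"margin per frame: {footprint_along_travel - travel_per_frame:.2f} m")       # positive, so no gap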
A Kinect sensor requires high USB throughput; at least 50% of the USB bandwidth
should be reserved for the data transmission of a Kinect sensor. Hence, it is impractical to
connect more than one Kinect to a single USB host controller. To control multiple Kinects
with one computer, it was decided to add additional USB cards so that an individual
USB host controller is dedicated to each Kinect (Webb and Ashley, 2012). The four Kinect sensors
were connected to a desktop computer that provided a dedicated PCI-E USB card
for each Kinect, in order to maintain a suitable data acquisition bandwidth.
A high-performance computer was used in this study. It consisted of an AMD FX-8350
eight-core processor and 16 GB of memory. A 250 GB shock-resistant solid-state drive
(SSD) was used to manage and archive the large amount of captured data while the
Figure 5.1: Overview of the 3D scanning system for pavement inspection. Box components
are the software modules, and the right-hand components are the hardware outputs.
vehicular platform was moving. A 7-inch LCD touchscreen monitor was employed to
make the data acquisition modules easy to control. The data acquisition system was
also equipped with a GPS to record the location information and six accelerometers to
measure the road surface profile and roughness. An LED light was placed in front of one of
the Kinect sensors and controlled by the NI USB-6008 DAQ module to indicate when a
data acquisition procedure starts operating.
The automated data collection system required mobile power supplies to perform data
acquisition tasks while the vehicular platform was moving. The computer needed about
200 watts of power to run the system and about 300 watts to start up; consequently, it was
connected to a 12-volt automotive battery through an 800-watt DC-AC inverter. Other
devices such as the Kinects and accelerometers drew power from a 12-volt vehicle outlet
through a 400-watt DC-AC inverter. This power scheme can provide steady electricity
for the operation of the automated data collection system for about two hours. Figure 5.2
shows the various components of the data acquisition system.
Software
For early prototyping, three data acquisition software modules were developed, respectively, to
operate the Kinect sensors, the Garmin 18x GPS sensor, and the NI USB DAQ
module that records acceleration. The Kinect data acquisition software was programmed
in Microsoft Visual C++ and used the OpenNI library to record color and depth videos
simultaneously, with a resolution of 640 × 480 pixels at a frame rate of 30 fps. The
GPS data acquisition software was developed in C/C++ with Garmin's software
development kit. The NI USB DAQ module was controlled by a LabVIEW Virtual
Instrument. To manage the three different programs and make the data acquisition software
Figure 5.2: Components of the data acquisition system: (a) RGB-D cameras combined
with accelerometers and USB data acquisition module, (b) a 7-inch LCD touchscreen
monitor for displaying and handling graphics, (c) a high performance computer and its
power supply.
easy to operate, the three independent data acquisition processes were integrated via the
LabVIEW programming environment.
5.3 Multi-Sensor Integration and Fusion
5.3.1 Multiple Kinects
Two issues should be addressed when using multiple Kinects: 1) the high USB band-
width requirement of a Kinect sensor, as described in Section 5.2.2; and 2) the problem
of infrared interference due to overlapping infrared dot patterns from multiple Kinects
pointing at the same area. The overlapping infrared dots confuse the Kinects and produce
invalid depth values, or "holes", in the depth images. Several approaches have
been proposed to reduce multiple-Kinect interference, including time multiplexing (Berger
et al., 2011), Shake'n'Sense (Maimone and Fuchs, 2012; Butler et al., 2012), hole-filling
algorithms (Maimone and Fuchs, 2011), and avoidance by arrangement (Tong et al., 2012;
Caon et al., 2011; Ahmed and Junejo, 2014). Using time multiplexing decreases frame
rates, and using Shake'n'Sense blurs color images. For this study, the optimal solution
is to arrange the positions of the multiple Kinects. Such an arrangement can minimize the
overlapping regions among multiple infrared dot patterns and reduce the interference among
the multiple Kinect depth sensors.
5.3.2 Data Synchronization
To fuse data properly from multiple sensors, time synchronization is necessary to establish
consistency among the heterogeneous data. However, it is impossible to perform hardware
synchronization of the multiple Kinects, the Garmin 18x GPS, and the NI USB-6008
DAQ module through USB interface triggering. Consequently, the data synchronization
strategy is based on individual timestamps (as shown in Figure 5.1) from each sensor,
recorded by the corresponding data acquisition program.
A heuristic approach based on external cue marks generated using an LED light
was developed to solve the synchronization problem between the Kinect cameras and the
accelerometers. The LED light was wired to the USB DAQ module (NI USB-6008) and
triggered by a digital pulse when the DAQ module began to acquire acceleration data.
During the overall data acquisition process, the Kinect data acquisition procedure was
started first and, after a delay of several milliseconds, the DAQ module with an accurately
synchronized LED digital pulse was triggered (see Figure 5.3). Therefore, the first color
frame that captured the lit LED was detected to indicate the starting point of
the synchronization. Subsequent image frames were corrected based on the timestamp
offsets with respect to the timestamps of the starting frames. In other words, the image
frames were aligned in time with respect to the first image frame and the first sample of
the DAQ module.
Figure 5.3 shows that the LED ON status may not be acquired immediately by the
Kinect sensor. Since the sampling period of the Kinect sensor is 1/30 s, using
the heuristic approach can reduce the time delay (Δt) between measurements acquired by
the Kinect sensors and the accelerometers to at most 1/30 s, which was
sufficient for coarse data alignment. A fine-tuning data alignment procedure using
digital signal processing approaches is required to synchronize the Kinect and accelerometer
measurements after the coarse alignment. The cross-correlation technique mentioned in
Section 3.2.3 was used to perform the fine-tuning data alignment. To enhance the accu-
racy of the alignment procedure using the cross-correlation approach, before aligning the
Figure 5.3: Data synchronization uses a heuristic approach based on the external LED
cue marks.
Kinect and accelerometer measurements, the two signals were resampled at a higher rate
(i.e., 1000 Hz) with spline interpolation, which also smoothed the peaks and yielded better
alignment results. The developed heuristic approach for multi-sensor synchronization was
successfully implemented in a laboratory environment.
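The fine-tuning step can be sketched as follows; this is a SciPy-based illustration of the spline-resampling and cross-correlation idea described above, and the function name, resampling rate default, and return convention are assumptions rather than the study's actual code.

    import numpy as np
    from scipy.interpolate import CubicSpline
    from scipy.signal import correlate

    def fine_align(t_a, x_a, t_b, x_b, fs=1000.0):
        # Resample both signals to a common high rate with spline interpolation,
        # then estimate the lag of the first signal relative to the second from
        # the peak of their cross-correlation.
        t0, t1 = max(t_a[0], t_b[0]), min(t_a[-1], t_b[-1])
        t = np.arange(t0, t1, 1.0 / fs)
        a = CubicSpline(t_a, x_a)(t)
        b = CubicSpline(t_b, x_b)(t)
        a = (a - a.mean()) / a.std()
        b = (b - b.mean()) / b.std()
        xc = correlate(a, b, mode="full")
        lag_samples = int(np.argmax(xc)) - (len(b) - 1)
        return lag_samples / fs      # positive: the first signal lags the second (s)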
Figure 5.4 shows the synchronization result for a random displacement acquired by a
Kinect sensor and an accelerometer; an LVDT measurement is utilized as the ground truth.
The random vibration signal was generated by a shaker in a lab environment. A double
integration procedure was performed to compute the displacements from the acceleration data
acquired by the accelerometers.
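One common recipe for such a double integration is sketched below; the fourth-order high-pass filter and the 0.5 Hz cutoff are illustrative assumptions used to suppress integration drift, and are not necessarily the settings used in this study.

    import numpy as np
    from scipy.signal import butter, detrend, filtfilt

    def accel_to_disp(acc, fs, fc=0.5):
        # Double-integrate acceleration to displacement, high-pass filtering after
        # each integration (cutoff fc, Hz) to suppress the drift that integration introduces.
        b, a = butter(4, fc / (fs / 2.0), btype="highpass")
        vel = filtfilt(b, a, np.cumsum(detrend(acc)) / fs)
        disp = filtfilt(b, a, np.cumsum(vel) / fs)
        return disp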
In addition, the depth data, captured on California State Highway SR-110 with traffic
flow, were compared to the displacement computed from the accelerometers. Figure 5.5
shows the road profile with respect to the sensor platform obtained from the accelerome-
ters and the Kinect readings. The average of the depth values over a designated area was
used as the depth value obtained from the Kinect array (i.e., four Kinects). As seen in
Figure 5.4: Laboratory results of data alignment for Kinect, accelerometer, and LVDT
(ground truth).
Figure 5.5: Displacement measurements obtained from Kinect and accelerometers for the
field experiment on California Highway SR-110.
Figure 5.5, the Kinect depth data are closely correlated with the displacements obtained
from the accelerometers.
5.3.3 Multi-Image Stitching
After collecting the individual depth and color frames at a given time, these frames could
be stitched together to create a larger image of the road surface for further damage
assessment processing. The image frames captured by the Kinect sensors were tagged
with GPS data (latitude, longitude, and speed) using shared timestamps generated by
the operating system. Before applying an image mosaic method to reconstruct the scene of
the road surface, the relationship between the overlapping regions (i.e., at least 10%) of a sequence
of images and the vehicle speed was investigated. Figure 5.6 shows the overlapping region
of two images acquired by a Kinect at speeds of 48 km/h (30 mph) and 80 km/h (50 mph),
respectively, where the data collection rate was 30 fps. Figure 5.7 shows the relationship
between consecutive frame overlaps and various speeds. Based on this figure, a sequence
of image frames still had about 30% overlap when the motion speed reached 80 km/h (50
mph), which makes scene reconstruction possible using feature-based image stitching
algorithms.
This research adopted a combined approach of Harris corner detection and the SIFT de-
scriptor to perform reliable image stitching of road surface images without consuming too
much computational time (Azad et al., 2009). To implement the combined method, the corner
features were first detected by the Harris detector, and then the SIFT descriptor was
applied only to those features extracted by the Harris detector. The feature vectors be-
tween each pair of images were matched based on the nearest-neighbors algorithm (Lowe,
2004), and RANSAC was used to remove outliers. Next, the homography transformation
Figure 5.6: Overlapping areas of two stitched color images: (a) 50% overlap at 48 km/h
(30 mph), and (b) 30% overlap at 80 km/h (50 mph).
Figure 5.7: Overlap percentages of the sequential color frames under various vehicle
speeds.
Figure 5.8: Multiple image stitching: (a) images taken on California State Route
SR-110 at a speed of 80 km/h (50 mph), (b) results of image stitching, and (c) the location
(34°6'37.7"N 118°11'5.4"W) of the road segment shown on Google Street View.
matrices between pairs of images were estimated according to these matched features. The
multiple images (Figure 5.8a) were warped and overlapped using the homography transfor-
mations to obtain a large view of the road segment. Figure 5.8b shows the stitching results
of six images captured by a Kinect sensor at a speed of 80 km/h (50 mph) on California
State Route SR-110. The location (34°6'37.7"N 118°11'5.4"W) of the road segment is
displayed on Google Street View (Figure 5.8c).
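A minimal OpenCV sketch of this Harris-plus-SIFT stitching idea is shown below; the detector parameters, Lowe ratio (0.75), RANSAC reprojection tolerance (5 px), and output canvas size are illustrative choices, not the values used in the study.

    import cv2
    import numpy as np

    def harris_sift_features(gray, max_corners=2000):
        # Detect Harris corners, then describe them with SIFT descriptors.
        pts = cv2.goodFeaturesToTrack(gray, max_corners, qualityLevel=0.01,
                                      minDistance=5, useHarrisDetector=True, k=0.04)
        kps = [cv2.KeyPoint(float(x), float(y), 7.0) for x, y in pts.reshape(-1, 2)]
        return cv2.SIFT_create().compute(gray, kps)      # (keypoints, descriptors)

    def stitch_pair(img_a, img_b):
        # Estimate the homography mapping img_b onto img_a and warp it onto a canvas.
        kps_a, des_a = harris_sift_features(cv2.cvtColor(img_a, cv2.COLOR_BGR2GRAY))
        kps_b, des_b = harris_sift_features(cv2.cvtColor(img_b, cv2.COLOR_BGR2GRAY))

        # Nearest-neighbour matching with Lowe's ratio test.
        matches = []
        for pair in cv2.BFMatcher(cv2.NORM_L2).knnMatch(des_b, des_a, k=2):
            if len(pair) == 2 and pair[0].distance < 0.75 * pair[1].distance:
                matches.append(pair[0])

        src = np.float32([kps_b[m.queryIdx].pt for m in matches]).reshape(-1, 1, 2)
        dst = np.float32([kps_a[m.trainIdx].pt for m in matches]).reshape(-1, 1, 2)
        H, _ = cv2.findHomography(src, dst, cv2.RANSAC, 5.0)    # RANSAC rejects outliers

        h, w = img_a.shape[:2]
        mosaic = cv2.warpPerspective(img_b, H, (2 * w, h))      # canvas size is arbitrary
        mosaic[:h, :w] = img_a                                  # overlay the reference frame
        return mosaic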
5.4 Field Test Results
5.4.1 Field Testing Routes
Several field tests were performed to evaluate the capabilities of the automatic data collec-
tion system. The majority of the tests were conducted at low speeds (5 mph to 35 mph)
in the vicinity of the University of Southern California (USC) campus in downtown
Los Angeles, California. The rest were conducted near the City of Pasadena, California (a suburb
of Los Angeles), at a normal highway speed. Figure 5.9 shows the data collection route
and the corresponding GPS-measured speed, from 0 to 60 mph, for one of the tests, in which
the vehicle was driven along California State Route 110 (SR 110). The complete road
trip was 2.4 miles, as obtained from the Garmin 18x USB GPS system.
5.4.2 Defect Detection and Quantification
In this study, the multi-sensor system was used to capture the 2D and 3D pavement
surface information. The data interpretation modules developed by the collaborative
project (Jahanshahi et al., 2012) within the same research group were utilized to process
the depth data acquired from the field tests conducted by this project (Chen et al., 2016).
Figure 5.10 summarizes the step-by-step procedure of employing the data interpretation
modules to detect a pothole from a depth image obtained by the proposed system and
quantify its length, width, depth, area, and volume. First, the road surface is estimated
by fitting a plane to the points in the depth data (Figure 5.10(b)) that are within 665
mm of the depth sensor (see Figure 5.10(c)). Next, the estimated plane is subtracted
from the depth values to provide the relative depth (Figure 5.10(d)). Relative depth
values represent the estimated depth values with respect to the road surface (not the
sensor). Using Otsu's method (Otsu, 1979), the defect depth threshold is computed from
the histogram of the relative depth values (Figure 5.10(e)). Finally, the colormap of
the defective region is generated using this threshold (Figure 5.10(f)). Several sample
test results of defect detection and depth quantification, where the color images and the
Figure 5.9: An example of the field tests: data collection route with different vehicle speeds
on California State Route SR-110.
corresponding depth data were collected by the proposed system with traffic flow and
processed by the data interpretation modules, are shown in Figure 5.11. In this figure, the
colormaps indicate the quantified defective depths.
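The plane-fitting and thresholding steps of this procedure can be sketched as follows; the function below assumes the depth map and its x-y coordinates are given in millimetres with depth measured as a positive distance from the sensor, and it uses scikit-image's Otsu threshold as a stand-in for the histogram thresholding step, so it should be read as an illustration of Figure 5.10 rather than as the project's data interpretation module.

    import numpy as np
    from skimage.filters import threshold_otsu

    def detect_defect(depth, x, y, max_range=665.0):
        # depth, x, y: HxW arrays (mm). Returns a defect mask and the relative-depth map.
        road = depth <= max_range                        # points assumed to lie on the road
        A = np.column_stack([x[road], y[road], np.ones(road.sum())])
        coef, *_ = np.linalg.lstsq(A, depth[road], rcond=None)   # fit plane z = a*x + b*y + c

        plane = coef[0] * x + coef[1] * y + coef[2]
        relative_depth = depth - plane                   # depth with respect to the road surface

        thresh = threshold_otsu(relative_depth)          # Otsu's method on the histogram
        defect_mask = relative_depth > thresh            # deeper than the road surface
        return defect_mask, relative_depth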
5.4.3 Data Quality
In order to evaluate the performance of the system, a set of field tests was conducted to
quantify a defect by repeatedly scanning a pothole on a road segment. The images (color
and depth) of the pothole were taken while the vehicle was stationary and while it was moving at a
low speed of 15-20 mph (25-32 km/h). In addition, the 3D point clouds generated from the
Kinect data were compared to the 3D point clouds captured by a high-accuracy RIEGL
VZ-400 LIDAR (Figure 5.12a). The point-to-point distance of the pothole is about 480
mm, measured by a tape ruler (Figure 5.12b). The corresponding point-to-point distance
for the 3D model of the pothole acquired by the VZ-400 LIDAR and rendered by the
CloudCompare software (Girardeau-Montaut, 2017) is about 481 mm, which indicates
good measurement accuracy when compared to the direct tape measurement.
Figure 5.13 (a) and (b) display the depth maps taken by the stationary Kinect and
the moving Kinect at 15 mph (25 km/h). Obvious shape distortion of the pothole in
the depth images captured by the moving Kinect was noted. The image distortion
was verified to be due to the rolling shutter of the CMOS sensors used in the depth
camera. Table 5.1 summarizes the quantified parameters and shows the effect of motion on
the quantification process. Figure 5.13 (c) and (d) show the cloud-to-cloud differences (in
millimeters) as color maps (blue is small, green is moderate, and red is large) between the two
point clouds generated by the high-accuracy VZ-400 LIDAR and the inexpensive Kinect
(under stationary and moving conditions). The error analysis was performed using the
Figure 5.10: Application of defect detection approach proposed by Jahanshahi et al.
(2012) to sample pavements acquired via Kinect system: (a) a pothole image; (b) the
corresponding depth data; (c) estimated road surface plane; (d) relative depth obtained
from subtracting the road surface plane from the depth values; (e) histogram of relative
depth values and the defective depth threshold (the red dash line); and (f) the depth
colormap of the detected defect.
Figure 5.11: Defect detection and depth quantication: (a), (c), (e) and (g) are the images
of the defects, and (b), (d), (f) and (h) are the corresponding depth maps, respectively.
The colormaps indicate the quantied defective depths.
open-source CloudCompare software (Girardeau-Montaut, 2017) by aligning the two point
clouds with the Iterative Closest Point (ICP) algorithm. Figure 5.13 (c) reveals that the
size of the pothole obtained by the stationary Kinect is slightly smaller than that of the pothole
acquired by the LIDAR. Figure 5.13 (d) shows that the pothole shrinks noticeably
in the longitudinal direction when a moving Kinect is used. Figure 5.13 (e)
Figure 5.12: (a) The RIEGL VZ-400 LIDAR and the pothole, (b) the length of the pothole
is about 480 mm measured by a tape ruler, and (c) the length for the 3D model of the
pothole captured by the LIDAR and rendered by the CloudCompare software is about
481 mm.
Figure 5.13: Effect of speed on defect quantification: (a) depth map acquired by the
stationary Kinect; (b) depth map captured by the moving Kinect at 25 km/h (15
mph); (c) cloud-to-cloud (C2C) errors (mm) between the two point clouds obtained by the
VZ-400 LIDAR and the stationary Kinect; (d) C2C errors between the two point clouds taken
by the LIDAR and the moving Kinect; (e) the distribution of C2C errors in (c); and (f)
the distribution of C2C errors in (d).
displays the distribution of the cloud-to-cloud differences between the LIDAR and the
stationary Kinect acquisitions. The maximum error is 16.35 mm, the mean error is 3.06
mm, and 90% of the population is less than 5.4 mm. Figure 5.13 (f) shows the distribution of
the cloud-to-cloud differences between the data acquired by the LIDAR and the moving Kinect.
The maximum error is 23.01 mm, the mean error is 4.08 mm, and 90% of the population is
less than 8.7 mm.
Table 5.1: Quantified characteristics of defective regions in Figure 5.13
Kinect      Area     Volume   Length   Width   Max Depth   Mean Depth
Sensing     (mm^2)   (mm^3)   (mm)     (mm)    (mm)        (mm)
Static      1164.9   1619.1   396.7    387.7   24.9        13.9
Mobile       887.4   1000.9   412.7    233.0   20.8        11.3
5.5 Summary
This chapter presents a relatively inexpensive vision-based approach using the off-the-
shelf Microsoft Kinect v1 sensor, which costs under $200, to collect color and depth
images of the roadway surface. This study also resulted in the building of a compact
pavement condition survey system which can be easily mounted on a car and is able to
collect data with traffic flow. The system was designed to compose the images captured
by multiple Kinect sensors to cover a lane width and to operate within the design scanning
speed limit. The data collection system also included accelerometers to record
road profiles and orientations of the system, and a GPS to obtain location and velocity.
A heuristic approach using an LED light as an external cue mark was developed to
synchronize the data from the Kinects, accelerometers, and GPS. Table 5.2 summarizes the cost
of the current hardware configuration.
Table 5.2: Hardware cost estimation
Devices/Materials                            Cost
Microsoft Kinect v1 (4 × $100)               $400
Accelerometers (6 × $400)                    $2,400
Garmin 18x USB GPS                           $100
NI USB DAQ 6008                              $200
Desktop PC                                   $2,000
Power (battery & DC-AC converters)           $400
Mounting structures                          $300
Other                                        $100
Total                                        $5,900
Several field tests were performed on local streets and a state highway in the Los
Angeles metropolitan area. From the field tests, color and depth images of road segments
were acquired and then processed to generate 3D point clouds for further analysis. The
testing results showed that the proposed pavement condition survey system has a lower
3D point cloud resolution compared to the high-accuracy LIDAR. However, it is sufficient to
detect and quantify potholes and large cracks when the vehicle speed is less than 30
mph (48 km/h), the residential speed limit in most states in the United States. This
inexpensive system can be improved by upgrading several hardware components. The
recommendations are as follows: 1) integrating the Kinect sensors with Inertial
Measurement Units (IMUs) to correct rolling shutter distortion (Karpenko et al., 2011)
and facilitate the accurate stitching of multiple images (Yang et al., 2011); 2) using a
GPS-Aided Inertial Navigation System (GPS/INS), which is composed of an IMU, a GPS
receiver, and a Kalman filter algorithm, to provide location data with higher bandwidth,
even when the GPS signal is not available; and 3) adding industrial-grade cameras to
acquire high-quality color images for the detection of thin cracks. These upgrades may
add about $5,000 to the current cost, but compared to most commercial automated systems,
the proposed approach still has a price advantage.
Chapter 6
Conclusion
This research focused on a comprehensive evaluation of using the newly developed, cost-
effective 3D sensor, the RGB-D camera (e.g., the Microsoft Kinect), to quantify the
evolving 3D deformation field of a continuous structural system. Measuring full-field 3D
dynamic displacements is an important task in many engineering and scientific applica-
tions. Such an effort is essential to study the fundamental characteristics of a structure,
assess its structural condition under applied loads, monitor structural integrity, etc. Yet
it is quite challenging to monitor a 3D dynamic displacement field efficiently at a low
cost. Conventional and state-of-the-art displacement measurement technologies can
be classified into contact-based and non-contact-based methods. After reviewing these
methods, this research found that the sensors either suffer from fundamental limitations
in their operating principles and can track the motion of only a few discrete
points on the monitored structure, and/or are an order of magnitude more expensive
than the representative RGB-D camera. Compared to these sensors, the RGB-D camera
has several attractive features, including: 1) measuring displacement with a non-contact
technique; 2) obtaining 3D measurements rapidly; 3) having the capability to acquire
full-field data; and 4) an affordable price, light weight, compact size, and ease of operation.
These characteristics make the RGB-D camera a very promising sensing approach for
quantifying evolving 3D deformation fields.
In the initial phase, this research evaluated several inexpensive off-the-shelf RGB-D cameras (see Table 2.1) and selected the Microsoft Kinect v1 as the target sensor because it offers an optimal combination of image resolution and frame rate, better hardware and software support, and a relatively low price. The noise characteristics of the Kinect sensor related to the operating environment, the material properties of the measured objects, the imaging geometry, and the systematic bias were studied through a review of related published work (Section 2.2.1). Based on the results of this survey, the study adopted several strategies to minimize depth measurement error: 1) allowing the sensor to warm up for at least 60 minutes before data acquisition; 2) limiting the measuring range to between 800 and 1500 mm; and 3) performing camera calibration procedures to minimize systematic errors. The Kinect sensor is well suited for tracking and quantifying the evolving 3D deformation of a structural system: the color camera captures 2D features (color, texture, geometry, etc.) to recognize the measured objects in the color images, while the depth camera provides depth data for reconstructing the 3D scene. Because the two cameras are mounted side by side, a color image and its corresponding depth image are not aligned. Camera calibration procedures were therefore performed to determine the intrinsic and extrinsic parameters of the color and IR cameras, which can then be used to register the color image with its corresponding depth map (Section 2.3).
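The registration step amounts to back-projecting each depth pixel through the IR camera's pinhole model, transforming the resulting 3D point with the extrinsic rotation and translation, and re-projecting it through the color camera's pinhole model. The sketch below illustrates this chain with NumPy; the function name, the parameter notation (K_ir, K_rgb, R, t), and the millimeter depth convention are illustrative assumptions, not the dissertation's actual implementation.

```python
import numpy as np

def register_depth_to_color(depth, K_ir, K_rgb, R, t):
    """Map each depth pixel into the color image frame.

    depth : (H, W) array of depth values in mm (0 = no reading)
    K_ir, K_rgb : 3x3 intrinsic matrices of the IR and color cameras
    R, t : rotation (3x3) and translation (3,) from the IR frame to the color frame
    Returns an (H, W, 2) array of color-image pixel coordinates (NaN where invalid).
    """
    H, W = depth.shape
    u, v = np.meshgrid(np.arange(W), np.arange(H))
    z = depth.astype(np.float64)

    # Back-project depth pixels to 3D points in the IR camera frame.
    x = (u - K_ir[0, 2]) * z / K_ir[0, 0]
    y = (v - K_ir[1, 2]) * z / K_ir[1, 1]
    pts_ir = np.stack([x, y, z], axis=-1)            # (H, W, 3)

    # Transform the points into the color camera frame using the extrinsics.
    pts_rgb = pts_ir @ R.T + t

    # Project with the color camera's pinhole model.
    with np.errstate(divide="ignore", invalid="ignore"):
        u_rgb = K_rgb[0, 0] * pts_rgb[..., 0] / pts_rgb[..., 2] + K_rgb[0, 2]
        v_rgb = K_rgb[1, 1] * pts_rgb[..., 1] / pts_rgb[..., 2] + K_rgb[1, 2]

    coords = np.stack([u_rgb, v_rgb], axis=-1)
    coords[z == 0] = np.nan                          # mark pixels with no depth
    return coords
```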
To assess the performance of the inexpensive Kinect sensor for 3D dynamic displacement measurement and to demonstrate its potential for real-world applications, this study conducted a comprehensive experimental program involving three phases: 1) an indoor sensor-level laboratory experiment; 2) an outdoor sensor-level performance evaluation; and 3) a proof-of-concept system-level demonstration. The indoor laboratory experiment quantitatively evaluated Kinect 3D dynamic displacement measurement by acquiring oscillation data of a rigid plate under various testing scenarios (see Figure 3.7), including: 1) harmonic and random excitation; 2) different frequency ranges (0 Hz to 2 Hz); 3) different amplitude levels (5 mm to 20 mm); and 4) different relative motions between the sensor and the rigid plate.
The experimental results reveal several important findings for dynamic displacement measurement:
• If the direction of the oscillation is parallel to the Z axis of the Kinect sensor, the measurement error is mainly affected by quantization noise and the depth accuracy associated with the measuring range.
• If the direction of the oscillation is perpendicular to the Z axis of the Kinect sensor, the measurement error increases as the vibration frequency rises, due to rolling shutter distortion.
• If the angle between the direction of the oscillation and the Z axis of the Kinect sensor is set to 45°, the measurement error is reduced because the rolling shutter distortion is smaller.
Overall, the error of the dynamic displacement measurement is between 5% and 10% for a short-range measurement (0.8 m to 1 m), provided the amplitude is above 10 mm and the frequency is below 2 Hz (see Figures 3.11 to 3.13). The results of this laboratory experiment provide guidelines for using a Kinect sensor to quantify 3D dynamic displacement with respect to its hardware and software limitations, amplitude and frequency bounds, rolling shutter distortion, and related factors.
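As a rough illustration of how such error figures might be computed, the sketch below extracts the out-of-plane motion of a planar target from a sequence of depth frames and compares it against a reference displacement signal using a percent RMS error. The function names, the ROI-averaging strategy, and the RMS metric are assumptions chosen for illustration, not the exact procedure used in the experiments.

```python
import numpy as np

def z_displacement(depth_frames, roi):
    """Out-of-plane (Z) motion of a planar target: mean depth over a region of
    interest in each frame, relative to the first frame. Invalid depth readings
    are assumed to have been masked as NaN beforehand."""
    r0, r1, c0, c1 = roi
    z = np.array([np.nanmean(f[r0:r1, c0:c1]) for f in depth_frames])
    return z - z[0]

def percent_rms_error(measured, reference):
    """Percent RMS error between a sensor-derived displacement history and a
    reference signal (e.g., from a contact displacement sensor), both resampled
    onto the same time base."""
    measured = np.asarray(measured, dtype=float)
    reference = np.asarray(reference, dtype=float)
    err = np.sqrt(np.mean((measured - reference) ** 2))
    return 100.0 * err / np.sqrt(np.mean(reference ** 2))
```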
Beyond the laboratory experiment, this research evaluated the performance of the Kinect sensor for 3D displacement data acquisition in outdoor and dynamic environments. Several tests were performed in real-world situations with a vehicle-mounted Kinect sensor scanning pavement distress (e.g., cracks and potholes) or a 3D object on the road surface. To prevent infrared interference from strong sunlight, the vehicle-mounted Kinect sensor was protected with a top-cover or full-cover sun shade (see Figure 4.2). In addition to sunlight interference (Section 4.2.1), the field tests revealed two other issues related to hardware limitations: 1) motion blur (Section 4.2.2); and 2) rolling shutter distortion (Section 4.2.3). Corresponding solutions for both problems were investigated. The motion blur is caused by the long exposure time of the Kinect sensor in darker environments; hence, a strobe lighting technique was used to "freeze" the instantaneous motion (see Figure 4.4). The rolling shutter distortion originates from the rolling-shutter CMOS image sensors used by the RGB and IR cameras of the Kinect; a rectification algorithm based on the pinhole camera model was derived to correct the distorted images (see Table 4.1).
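For a camera translating parallel to a roughly planar scene, the pinhole model implies that each later-read row is shifted horizontally in proportion to its readout delay, so the distortion can be approximately undone by shifting every row back. The sketch below illustrates this simplified idea (constant velocity, fronto-parallel plane, wrap-around ignored at the borders); it is not the rectification algorithm derived in Chapter 4, and all parameter names are assumptions.

```python
import numpy as np

def rectify_rolling_shutter(img, v, Z, f_px, t_row):
    """Approximately undo rolling-shutter skew for a camera translating along
    the image x-axis past a roughly planar scene.

    img   : (H, W) or (H, W, C) image array
    v     : camera speed along +x (m/s), assumed constant during readout
    Z     : distance to the (approximately planar) scene (m)
    f_px  : focal length in pixels
    t_row : readout delay per image row (s)
    """
    out = np.empty_like(img)
    for r in range(img.shape[0]):
        # Pinhole model: lateral camera motion of v * (r * t_row) metres moves
        # the scene by f_px * v * r * t_row / Z pixels toward -x by the time
        # row r is read, so shift that row back toward +x.
        shift = int(round(f_px * v * r * t_row / Z))
        out[r] = np.roll(img[r], shift, axis=0)   # wrap-around at borders ignored
    return out
```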
Finally, this research developed a proof-of-concept prototype of a cost-effective multi-sensor automated pavement condition survey system. The system is composed of multiple Kinect sensors and off-the-shelf devices (GPS, accelerometers, USB DAQ, etc.), with a total hardware cost of about $6,000 (see Table 5.2). To solve the synchronization problem, this study used a heuristic approach with an LED cue marker to align the data streams obtained by the multiple off-the-shelf sensors (Section 5.3); a sketch of this idea is given after this paragraph. The automated multi-sensor pavement condition survey system (see Figure 5.2) was mounted on a car to scan the road surface on streets in the vicinity of the University of Southern California in downtown Los Angeles at lower speeds (5 mph to 35 mph) and on highway SR-110 at higher speeds (above 40 mph). A defect detection algorithm was used to identify severe pavement distress (e.g., large cracks and potholes) in the depth images acquired by the data acquisition system (Figure 5.11). Through the field tests, this proof-of-concept pavement condition survey system demonstrated its potential for automated road condition assessment. The system can be easily mounted on vehicles to perform periodic data collection and maintain the data on a central server. Such a frequent data-gathering approach can help monitor the formation and evolution of defects over time and may thus lead to earlier preventive maintenance.
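The LED-cue idea can be pictured as follows: every camera records the same LED, the frame at which the LED turns on is located in each stream, and the corresponding frame indices are converted into per-stream time offsets. The helper names and the brightness-jump detector below are hypothetical and only sketch the concept; the actual heuristic of Section 5.3 may differ.

```python
import numpy as np

def led_flash_index(brightness):
    """Frame index with the largest jump in mean brightness of the LED region;
    this frame is taken as the common time cue for the stream."""
    b = np.asarray(brightness, dtype=float)
    return int(np.argmax(np.diff(b))) + 1

def align_streams(flash_indices, frame_rates):
    """Convert per-stream flash indices into time offsets (seconds) that place
    the LED cue at t = 0 for every sensor stream.

    Example (hypothetical stream names and rates):
        align_streams({"kinect1": 42, "rgb_cam": 95},
                      {"kinect1": 30.0, "rgb_cam": 60.0})
    """
    return {name: -idx / frame_rates[name] for name, idx in flash_indices.items()}
```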
Based on several field tests on local streets and freeways, the recommendations for improving the inexpensive vision-based pavement data collection system in the future can be summarized as follows:
(i). Integrate the Kinect sensors with Inertial Measurement Units (IMUs) to rectify rolling shutter distortion and facilitate the accurate stitching of multiple images.
(ii). Use a GPS/INS sensor, which is composed of an IMU, a GPS receiver, and a Kalman filtering algorithm, to provide location data with higher bandwidth even when the GPS signal is not available (a minimal fusion sketch is given after this list).
(iii). Improve the data acquisition software to fuse and synchronize the different sensor data sets more efficiently.
(iv). Develop a pavement distress detection algorithm to automatically identify, localize, and quantify pavement distress from the various acquired data.
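To illustrate recommendation (ii), the sketch below shows a textbook one-dimensional GPS/IMU fusion: a Kalman filter predicts position and velocity at the IMU rate using the measured acceleration and corrects the estimate whenever a GPS fix arrives. The function name, the state model, and the noise variances are assumptions chosen for illustration; they do not describe a particular GPS/INS product or the system proposed here.

```python
import numpy as np

def gps_ins_1d(accel, accel_dt, gps_pos, gps_idx, q=0.5, r=2.0):
    """Minimal 1-D GPS/IMU fusion.

    accel    : IMU acceleration samples (m/s^2), one per time step
    accel_dt : IMU sampling interval (s)
    gps_pos  : GPS position fixes (m)
    gps_idx  : IMU step indices at which each GPS fix arrives
    q, r     : process and GPS measurement noise variances (tuning assumptions)
    Returns the fused position estimate at every IMU step.
    """
    x = np.zeros(2)                       # state: [position, velocity]
    P = np.eye(2)
    F = np.array([[1.0, accel_dt], [0.0, 1.0]])
    B = np.array([0.5 * accel_dt**2, accel_dt])
    Q = q * np.outer(B, B)
    H = np.array([[1.0, 0.0]])
    gps = dict(zip(gps_idx, gps_pos))
    track = []
    for k, a in enumerate(accel):
        x = F @ x + B * a                 # predict with the IMU measurement
        P = F @ P @ F.T + Q
        if k in gps:                      # GPS update (lower rate than the IMU)
            y = gps[k] - H @ x
            S = H @ P @ H.T + r
            K = P @ H.T / S
            x = x + (K * y).ravel()
            P = (np.eye(2) - K @ H) @ P
        track.append(x[0])
    return np.array(track)
```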
Abstract
This dissertation focuses on a comprehensive evaluation of using a class of inexpensive off-the-shelf vision-based sensors, the RGB-D cameras, to quantify the evolving 3D displacement field for a continuous structural system. Measuring an evolving 3D displacement field is an important yet challenging task in many engineering and scientific applications. Although several sensors exist for direct or indirect displacement measurement at specific points in a uniaxial direction or for multi-component deformations, these sensors either suffer from fundamental limitations based on their operating principles and/or are relatively expensive. Compared to these sensors, the RGB-D cameras have several attractive features, including non-contact displacement measurement, rapid full-field 3D data acquisition, affordable price, lightweight and compact size, and ease of operation. These characteristics make the RGB-D cameras a promising technology for quantifying evolving 3D displacement fields.
This comprehensive investigation, comprising sensor-level and system-level experiments, is composed of four experimental stages to assess the accuracy and performance of the representative RGB-D camera, the Microsoft Kinect sensor. In Stage I, this study focused on sensor calibration for accurate 3D displacement measurements. In Stage II, laboratory experiments were performed to quantify the accuracy of the RGB-D camera in acquiring dynamic motions of a test structure under varying amplitude and spectral characteristics, and with different configurations of the position and orientation of the sensor with respect to the target structure. In Stage III, field tests were conducted to evaluate the performance of the sensor in outdoor and dynamic environments. In Stage IV, a novel, relatively inexpensive, vision-based sensor system was built using cost-effective off-the-shelf devices (RGB-D cameras, accelerometers, and a GPS), which can be mounted on a vehicle to enable automated 2D and 3D image acquisition of road surface conditions.
It is shown that the sensor under investigation, when operated within the performance envelope discussed in this dissertation (i.e., a measuring range of about 1 m), can provide, with acceptable accuracy (i.e., an error of about 5% for displacements larger than 10 mm), a very convenient and simple means of quantifying 3D displacement fields that change dynamically at the relatively low frequencies typically encountered in the structural dynamics field. Several hardware-related issues that can produce noisy data were also investigated through the field tests, including sunlight interference, motion blur, and rolling shutter distortion, and corresponding solutions were proposed to improve the data quality.
This dissertation results in the development of a fairly inexpensive proof-of-concept prototype of a multi-sensor pavement condition survey system that costs under $6,000. It is shown that the proposed multi-sensor system, by utilizing data-fusion approaches of the type developed in this study, can provide a cost-effective road surface monitoring technique with sufficient accuracy to satisfy typical maintenance needs with regard to the detection, localization, and quantification of potholes and similar deterioration features, where the measurements are acquired from a vehicle moving at normal speeds on typical city streets. The proposed system can be easily mounted on vehicles to enable frequent data collection and help monitor the condition of defects over time. Suggestions for future research to enhance the capabilities of the proposed system are also included.