VISION-BASED STUDIES FOR STRUCTURAL HEALTH MONITORING AND CONDITION ASSESSMENT by Mohammad Reza Jahanshahi A Dissertation Presented to the FACULTY OF THE USC GRADUATE SCHOOL UNIVERSITY OF SOUTHERN CALIFORNIA In Partial Fulfillment of the Requirements for the Degree DOCTOR OF PHILOSOPHY (CIVIL ENGINEERING) August 2011 Copyright 2011 Mohammad Reza Jahanshahi Dedication To Him who taught man what he knew not To my parents, family and fabulous wife ii Acknowledgements I would like to express my utmost respect and appreciation to my thesis adviser, Prof. Sami F. Masri, for his guidance, support, patience, understanding and knowledge. His contribution to my life was not limited to my research; he taught me how a person can be a mentor, a supervisor and a gentleman at the same time. I would also like to thank Prof. Gaurav Sukhatme for his support, guidance and insight, and Prof. Carter Wellford, the former chairman of the Civil Engineering Department at USC, for his great advisement and support from the first day that I was admitted to the graduate program. Thanks to Prof. Jiin-Jen Lee for serving on my dissertation committee and Dr. John Caffrey for reviewing my dissertation proposal. I want to thank ENB Group at Judico. Co. for sharing crack detection data with us for this study. I would also like to thank Dr. Curtis Padgett at Jet Propulsion Laboratory and the following professors at USC for their supports: Prof. Alexander Sawchuk , Prof. Antonio Ortega, Prof. Jay Kuo and Prof. Gerard Medioni. I am also thankful to my former professors Dr. Gholamreza Rakhshandehroo and Dr. Parviz Monadjemi. During the course of my study, I was blessed to interact with extraordinary people, and I am grateful for their advice and friendship: Dr. Firooz Aflatouni, Dr. Ehsan Afshari, Dr. Mehdi Ahmadizadeh, Dr. Farnoush Banaei-Kashani, Ali Bolourchi, Dr. Karthikeyan Chockalingam, Armen Der-Kevorkian, Dr. Alireza Doostan, Ali Habibi, Miguel Ricardo Hernandez, Lance Hill, Reza Jafarkhani, Dr. Mehrdad Jahangiri, Dr. Jonathan Kelly, Vahid Keshavarzzadeh, Hadi Mei- dani, Dr. Reza Dehghan Nayeri, Arash Noshadravan, Dr. Farzad Tasbihgoo and Dr. Hae-Bum Yun. iii Finally, I would like to express my sincere gratitude to my parents and family who worked hard to provide me with all the opportunities possible. Last but not least, I thank my amazing wife Zahra, who has been my source of support, encouragement, and inspiration throughout my graduate career. She has been my personal editor and a great help to me. I share this achievement with her. iv Table of Contents Dedication ii Acknowledgements iii List of Tables viii List of Figures ix Abstract xviii Chapter 1 Introduction 1 1.1 Motivation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2 1.2 Scope . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5 Chapter 2 Multi-Image Stitching and Scene Reconstruction for Evaluating Defect Evolution in Structures 7 2.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7 2.2 Multi-Image Stitching and Scene Reconstruction . . . . . . . . . . . . . . . . . 10 2.2.1 Keypoint Detection . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11 2.2.2 Initial Keypoint Matching . . . . . . . . . . . . . . . . . . . . . . . . . 13 2.2.3 Image Selection and Outlier Exclusion . . . . . . . . . . . . . . . . . . . 15 2.2.4 Bundle Adjustment . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19 2.2.5 Composition . . . . . . . . . . . . . . . . . . . 
. . . . . . . . . . . . . 25 2.2.6 Blending and Exposure Compensation . . . . . . . . . . . . . . . . . . . 26 2.3 Experimental Results and Discussion . . . . . . . . . . . . . . . . . . . . . . . . 30 2.3.1 Comparison of Image Stitching Algorithms . . . . . . . . . . . . . . . . 45 2.3.2 Minimum Measurable Defect . . . . . . . . . . . . . . . . . . . . . . . 46 2.4 Summary and Future Work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47 Chapter 3 Components of Image-Based Defect Detection of Structures 51 3.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 51 3.2 Preprocessing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 52 3.2.1 Neighborhood Averaging (Mean Filter) . . . . . . . . . . . . . . . . . . 52 3.2.2 Median Filter . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 54 3.2.3 Averaging of Multiple Images . . . . . . . . . . . . . . . . . . . . . . . 55 3.3 Pattern Recognition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 56 v 3.3.1 Segmentation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 57 3.3.2 Feature Extraction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 57 3.3.3 Classification . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 60 3.3.4 Neuro-fuzzy Classification . . . . . . . . . . . . . . . . . . . . . . . . . 63 3.4 Wavelet Filter Bank . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 67 3.5 Summary and Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 71 Chapter 4 Adaptive Vision-Based Crack Detection and Quantification by Incor- porating Depth Perception Using 3D Scene Reconstruction 73 4.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 73 4.1.1 Contribution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 76 4.1.2 Scope . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 77 4.2 Image Acquisition System . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 78 4.3 3D Scene Reconstruction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 81 4.4 Crack Detection . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 84 4.4.1 Review of Crack Segmentation Techniques . . . . . . . . . . . . . . . . 84 4.4.2 Morphological Operation . . . . . . . . . . . . . . . . . . . . . . . . . . 101 4.4.3 Feature Extraction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 104 4.4.4 Classification . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 107 4.4.5 Multi-Scale Crack Map . . . . . . . . . . . . . . . . . . . . . . . . . . . 109 4.5 Crack Thickness Quantification . . . . . . . . . . . . . . . . . . . . . . . . . . . 110 4.6 Experimental Results and Discussion . . . . . . . . . . . . . . . . . . . . . . . . 115 4.7 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 125 Chapter 5 Effect of Depth Perception on Texture Analysis: An Application to Cor- rosion Detection 127 5.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 127 5.2 Literature Review . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 128 5.3 Problem Statement and Conducted Research . . . . . . . . . . . . . . . . . . . . 136 5.3.1 Fixed Sub-Image Window . . . . . . . . . . . . . . . . . . . . . . . . . 140 5.3.2 Variable Sub-Image Window . . . . . . . . . . . . . . . . . . . . . . . . 140 5.3.3 Adaptive Sub-Image Window . . . 
. . . . . . . . . . . . . . . . . . . . 141 5.3.4 Resampling Sub-Image Window . . . . . . . . . . . . . . . . . . . . . . 141 5.4 Experimental Results and Discussion . . . . . . . . . . . . . . . . . . . . . . . . 142 5.5 Corrosion Quantification . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 149 5.6 Summary and Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 150 Chapter 6 A Study of an Autonomous Image-Based Interpretation System for Con- dition Assessment of Sewer Pipelines 151 6.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 151 6.1.1 Motivation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 152 6.1.2 Scope . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 153 6.2 Sewer Pipeline Inspection Systems . . . . . . . . . . . . . . . . . . . . . . . . . 153 6.2.1 CCTV . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 153 6.2.2 KARO . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 155 vi 6.2.3 MAKRO . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 156 6.2.4 PIRAT . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 156 6.2.5 SSET . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 157 6.2.6 PANORAMO . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 158 6.3 Automatic Detection and Classification of Defects in Sewer Pipeline Systems . . 159 6.4 Summary of state-of-the-art . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 162 6.5 Proposed Study . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 163 Chapter 7 Conclusions and Future Work 170 Bibliography 174 vii List of Tables 4.1 The performance of different classifiers on synthetic data . . . . . . . . . . . . . 108 4.2 The overall performance of the proposed system using real data . . . . . . . . . . 115 5.1 The categorization of the training samples based on the depth index . . . . . . . 147 viii List of Figures 2.1 Schematic hardware configuration of the image-based inspection system. . . . . 9 2.2 Schematic overview of the proposed image stitching procedure. First, automatic keypoints are detected in all the images. Keypoint are then matched between the current view and each database image. Database images that have greater num- ber of matching keypoints with the current view and can appropriately recon- struct the scene are selected. Next, keypoints between all the selected images (including the current view) are matched and the outliers are eliminated. Then, the bundle adjustment problem is solved. The selected images are composed and blended. Finally, the reconstructed view is cropped and compared with the current view. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12 2.3 Harris detector - keypoints (indicated by the ‘+’ symbol). . . . . . . . . . . . . 13 2.4 SIFT detector - keypoints (indicated by the ‘+’ symbol). . . . . . . . . . . . . . 14 2.5 Harris detector - 75 putative matched keypoints in two different images of a single truss system (matched keypoints are connected by matching lines). . . . 14 2.6 SIFT detector - 75 putative matched keypoints in two different images of a sin- gle truss system (matched keypoints are connected by matching lines). . . . . . 15 2.7 Harris detector - RANSAC is used to exclude the outliers. . . . . . . . . . . . . 17 2.8 SIFT detector - RANSAC is used to exclude the outliers. . . . . . . . . . . . . 
18 2.9 Matching SIFT keypoints in two overlapping images (matched keypoints are connected by matching lines). Red (dark) matching lines show the outliers iden- tified by RANSAC. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19 2.10 A schematic history of bundle adjustment (Triggs et al., 2000). . . . . . . . . . 20 ix 2.11 (a) Image stitching using a linear image blending, (b) image stitching using the Laplacian pyramid blending. The visual artifacts (i.e., the ‘ghosting effect’ at the gusset plate and the column, and blurriness of the diagonal member) in image (a), due to radial distortion and mis-registration, are eliminated in image (b). . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27 2.12 An image database consisting of: (a) 24 images of a truss system, (b) 24 im- ages of a full-scale structural model, (c) 24 images of a typical hospital ceiling support structure, and (d) 32 images of an MR damper with its attached structures. 31 2.13 (a) Current-view image of a truss system, (b), (c), and (d) are three image matches autonomously selected from the database. Note that a yellow tape is attached to the truss gusset plate in the current-view image. This tape is absent in the images from the database. . . . . . . . . . . . . . . . . . . . . . . . . . 33 2.14 The reconstructed scene and the contribution of the selected images in Fig. 2.13. 34 2.16 (a) Current-view image, (b) the reconstructed scene after blending, exposure compensation, and cropping. The yellow tape (shown by red circle) on the truss gusset plate in image (a) is absent in image (b). . . . . . . . . . . . . . . . . . 34 2.15 Six level of the Laplacian pyramids and their corresponding Gaussian weight pyramids for each of the selected images in Fig. 2.13 (see Eqns. (2.22) through (2.27)) . The laplacian pyramids for the images are presented on the left side of each row while their corresponding Gaussian weight pyramids are presented on the right side. Linear combination of the Laplacian pyramids at each level using the corresponding weights yields to the seamless reconstruction of the scene as described in Section 2.2.6. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35 2.17 (a) The scene reconstruction and the contribution of five selected images from the database (Fig. 2.12), (b) current-view image of a full-scale structural model, (c) scene reconstruction using a linear blending, (d) scene reconstruction using the Laplacian pyramid blending, and (e) scene reconstruction using the Lapla- cian pyramid blending and exposure compensation. There is an aluminum ele- ment in image (e) which is absent in image (b). . . . . . . . . . . . . . . . . . 37 2.18 (a) The reconstruction and the contribution of three selected images from the database (Fig. 2.12), (b) current-view image of a typical hospital ceiling facili- ties, and (c) scene reconstruction using the Laplacian pyramid blending and the exposure compensation. Note that a ceiling tile is missing in image (a). . . . . . 38 2.19 (a) The reconstruction and the contribution of twelve selected images from the database (Fig. 2.12), (b) current-view image of an MR damper (zoomed out), and (c) scene reconstruction using the Laplacian pyramid blending and exposure compensation. Note that two missing nuts in image (b) are shown by red circles in image (c). The lighting condition is different in images (a) and (b). . . . . . 
39 x 2.20 (a) Current-view image of an MR damper (zoomed in), (b) the selected image from the database (Fig. (2.12)), (c) reconstructed scene, and (d) difference of images (a) and (c). Note that there are two missing nuts is image (a) in compar- ison with image (c). . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 40 2.21 Three image databases of a structural system captured at different time periods: (a), (b), and (c) images of a structural system captured at time periodst 1 ,t 2 , and t 3 , respectively (t 1 <t 2 <t 3 ). . . . . . . . . . . . . . . . . . . . . . . . . . . 41 2.22 Change evolution in a structural system: (a), (b), and (c) scene reconstructions of a structural system (beam-column connection) at time periodst 1 ,t 2 , andt 3 , respectively (t 1 <t 2 <t 3 ). (d) Current-view image of the same structural system. 42 2.23 Two image databases of a truss system captured at different time periods: (a) and (b) images of a truss system captured at time periodst 1 andt 2 , respectively (t 1 <t 2 ). . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 43 2.24 Change evolution in a truss system: (a) current-view image of a truss system, (b) and (c) scene reconstructions of the same truss system at time periodst 2 and t 1 , respectively (t 1 <t 2 ). The changed region is shown with a red circle. . . . . 44 2.25 The scene reconstruction and the contribution of four selected images from the database captured ar rime t 2 (Figure 2.23(b)). The current-view image corre- sponding to this reconstruction is shown in Figure 2.24(a). . . . . . . . . . . . 45 2.26 Comparison of RMS errors using the current study and AutoStitch. Ten real image sets with different image resolutions and different number of stitching images are used to evaluate the registration errors. . . . . . . . . . . . . . . . . 46 3.1 An example of a 5 5 input image convolved by a 3 3 kernel. . . . . . . . . 54 3.2 2D Gaussian distribution with = 1. . . . . . . . . . . . . . . . . . . . . . . . 55 3.3 A typical Gaussian kernel as an approximation of Gaussian distribution with = 1. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 55 3.4 A pattern recognition system scheme. . . . . . . . . . . . . . . . . . . . . . . 56 3.5 (a) Corroded column and its background, (b) segmentation of the corroded-like area, (c) classification of pixels using thek-means classifier into three classes based on the RGB color vector of each pixel. . . . . . . . . . . . . . . . . . . . 58 3.6 Clustering of pixels of figure 3.5(c) in the Red-Green-Blue feature space. . . . 61 3.7 (a) First-order Sugeno fuzzy inference system with two inputs and two rules, (b) the equivalent ANFIS architecture (Jang et al., 1997). . . . . . . . . . . . . . . 65 xi 3.8 (a) Two-input Sugeno-type ANFIS architecture with nine rules, (b) nine fuzzy regions corresponding to the partitioned input space (Jang, 1993). . . . . . . . 67 3.9 2D discrete wavelet transform decomposition scheme. . . . . . . . . . . . . . . 69 3.10 2D discrete wavelet transform decomposition scheme. The approximation com- ponent of the (i) th decomposition stage can be decomposed to (i+1) th approx- imation, horizontal, vertical, and diagonal details. . . . . . . . . . . . . . . . . 69 3.11 Two-stage DWT decomposition of a truss image. . . . . . . . . . . . . . . . . 70 3.12 2D discrete wavelet transform reconstruction scheme. . . . . . . . . . . . . . . 
70 4.1 The geometric relation between image acquisition parameters of a simple pin- hole camera model. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 78 4.2 Resolution, focal length and sensor size relation for 25 m working distance and 0.2 mm crack size. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 80 4.3 Resolution, focal length and sensor size relation for 35 m working distance and 0.2 mm crack size. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 81 4.4 Schematic overview and components of the SfM problem. . . . . . . . . . . . 81 4.5 3D scene reconstruction: (a) sixteen images of a scene taken from different locations, (b) the 3D reconstructed point cloud and camera poses. Each red cone represents a camera that corresponds to one of the images in (a). . . . . . 83 4.6 The overview scheme of the proposed crack detection approach. . . . . . . . . 84 4.7 Multiresolution decomposition of the image used by Siegel and Gunatilake (1998). 91 4.8 (a) Vertical crack caused by a tensile rupture of a steel strip, (b) horizontal crack caused by a torsional rupture of a steel rebar. . . . . . . . . . . . . . . . . . . . 93 4.9 (a) 1 5 flat structuring element domain used in Figs. 4.10, 4.11, 4.12, 4.13, 4.14, 4.16, and 4.17, (b) 9 1 flat structuring element domain used in Figs. 4.10, 4.11, 4.12, 4.13, 4.14, 4.16, and 4.17. . . . . . . . . . . . . . . . . . . . . 94 4.10 (a) Dilation performed by a 1 5 structuring element, (b) dilation performed by a 9 1 structuring element. . . . . . . . . . . . . . . . . . . . . . . . . . . . 95 4.11 (a) Erosion performed by a 1 5 structuring element, (b) erosion performed by a 9 1 structuring element. . . . . . . . . . . . . . . . . . . . . . . . . . . . 95 4.12 (a) Morphological gradient performed by a 1 5 structuring element, (b) mor- phological gradient performed by a 9 1 structuring element. . . . . . . . . . 96 xii 4.13 (a) Opening performed by a 1 5 structuring element, (b) opening performed by a 9 1 structuring element. . . . . . . . . . . . . . . . . . . . . . . . . . . 97 4.14 (a) Closing performed by a 1 5 structuring element, (b) closing performed by a 9 1 structuring element. . . . . . . . . . . . . . . . . . . . . . . . . . . . 97 4.15 One-dimensional scheme of the opening and closing operations. The curve rep- resents the gray-scale level, the circles (structuring elements) beneath the curve pushing it up illustrate the opening operation, and the circles above the curve pushing it down represent the closing operation. . . . . . . . . . . . . . . . . . 98 4.16 (a) Bottom-hat operation performed by a 1 5 structuring element, (b) bottom- hat operation performed by a 9 1 structuring element. . . . . . . . . . . . . . 99 4.17 (a) Vertical dark crack segmented using morphological techniques, (b) horizon- tal dark crack segmented using morphological techniques. . . . . . . . . . . . . 100 4.18 Crack segmentation. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 
100 4.19 Examples of crack segmentation in real structures: (a) original structural mem- ber 1, (b) crack segmentation of image (a) using a 70 70 matrix with (both main and minor) diagonal members of 1 and non-diagonal members of 0 as the structuring element, (c) original structural member 2, (d) crack segmentation of image (c) using a 10 10 matrix with 1’s on the minor diagonal and 0’s every- where else as the structuring element, (e) original structural member 3, (f) crack segmentation of image (e) using a unit matrix of 77 as the structuring element, (g) original structural member 4, (h) crack segmentation of image (g) using a 15 structuring element, (i) magnification of image (g) , (j) crack segmentation of image (i) using a 1 5 structuring element. . . . . . . . . . . . . . . . . . . 102 4.20 Extracted edges using Sobel edge detection technique. . . . . . . . . . . . . . . 103 4.21 (a), (c), (e), (g) and (i) are the original crack images obtained from breaking a concrete beam in laboratory. (b), (d), (f), (h) and (j) show the results of the proposed crack detection methodology. Red, green and blue boxes locate the detected cracks of 2, 4 and 6 pixels, respectively. . . . . . . . . . . . . . . . . 105 4.22 Crack of 0.4 mm detected from a 3 m distance. Red, green and blue boxes locate the detected cracks of 2, 4 and 6 pixels, respectively. . . . . . . . . . . . . . . . 106 4.23 Relationship between structuring element size, camera focal length, working distance, crack size, camera sensor size, and camera sensor resolution for a simple pinhole camera model. . . . . . . . . . . . . . . . . . . . . . . . . . . 107 4.24 Effect of decision making threshold on different performance indices for the proposed NN. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 109 xiii 4.25 Kernels representing orientations. From the top left to the bottom right column, kernels of 0 , 5 , 10 , 15 , 20 , 25 , 30 , 35 , 40 , and 45 are shown, respec- tively. Other orientation kernels can be built based on the shown kernels. These kernels are correlated with the thinned crack map to identify the orientation of a crack at each centerline pixel. . . . . . . . . . . . . . . . . . . . . . . . . . . 111 4.26 An example of the proposed thickness quantification method. The white squares are crack pixels of a larger crack image. The blue squares represent the center- line obtained by thinning the crack object. The green squares, which correspond to 135 direction, indicate the thickness orientation at the red square. The num- ber of the thickness pixels in the horizontal and vertical directions are both 6 pixels, and the crack thickness at the red square is estimated as 8.5 pixels. . . . 112 4.27 Perspective error: (a) the camera orientation is perpendicular to the plane of the object (perspective error does not exist), (b) the camera orientation is not perpendicular to the plane of the object (perspective error exists), and (c) 2D representation of the perspective error about the camera’s ‘y’ axis. . . . . . . . 113 4.28 Detected cracks from a far distance: (a) a concrete model situated 20 meters away from the image acquisition system, (b) the detected cracks are shown in red. Each black box illustrates the boundaries of a correctly detected crack. False negative alarms are surrounded by dashed lines. A Canon EOS 7D along with an EF 600mm f/4L IS USM super telephoto lens were used to capture this image. 
The minimum thickness of the detected cracks in this figure was 0.1 mm and the working distance was 20,000 mm. For more details, see Fig. 4.29. . . . 116 4.29 Cracks with 0.1 mm thickness are detected from a 20-meter distance. . . . . . 117 4.30 Mean estimated thicknesses versus actual thicknesses: a total of 94,390 estima- tions (more than 10,000 estimations for each thickness). The mean estimated thickness for each actual thickness is shown by a ‘’ symbol. The lengths of confidence intervals shown in this figure are four times the standard deviation of estimated thicknesses for each point. The green line shows the ideal relationship between the actual and estimated thicknesses. . . . . . . . . . . . . . . . . . . 118 4.31 Histograms of estimated thicknesses. The horizontal axis of each histogram is normalized by subtracting the mean and dividing by the standard deviation. A normal histogram with the same mean and standard deviation as the estimated values is superposed on each histogram for comparison purposes. . . . . . . . . 119 4.32 Mean relative errors of thickness estimations. The mean relative errors are shown by ‘’ symbols. Lengths of confidence intervals shown in this figure are twice the standard deviation of the thickness estimation errors at each point. 121 xiv 4.33 Concrete crack: (a), (b), (c), (d), and (e) are crack images on a concrete sur- face taken from different angles. (f) is the sparse 3D scene reconstruction and recovery of the camera poses. . . . . . . . . . . . . . . . . . . . . . . . . . . 121 4.34 Crack detection: from left to right, the images in each column correspond to the structural element sizes 9, 11, 15, and 22 pixels, respectively. (a), (b), (c), and (d) are the extracted patterns in Fig. 4.33 (c) using Eqn. (4.33). (e), (f), (g), and (h) are the binarized images using Otsu threshold. (i), (j), (k), and (l) are the multi-scale crack maps. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 122 4.35 Detected crack: the detected cracks are shown in red. Each blue box illustrates the boundaries of a detected crack. . . . . . . . . . . . . . . . . . . . . . . . . 123 4.36 Thickness measurements at 15 different regions. A ‘pixel-counting’ method is used to obtain the reference measurements at each point. . . . . . . . . . . . . 124 4.37 Crack thickness quantification of 15 different regions indicated in Fig. 4.36. . . 124 5.1 Three-level wavelet decomposition notation used by Gunatilake et al. (1997). . 129 5.2 (a) Two-level wavelet decomposition of an image, (b) three-level wavelet de- composition of an image (notation used by Siegel and Gunatilake (1998)). . . . 131 5.3 HSI color space. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 135 5.4 The result of binarizing the saturation component of original images (a), (c), (e), (g), and (i), in order to extract the corroded area of each image, is presented in images (b), (d), (f), (h), and (j) respectively. . . . . . . . . . . . . . . . . . . . 137 5.5 Adaptive decomposition filters for Daubechies wavelet of order 4 for different sub-image window sizes: (a) low-frequency filters, (b)high-frequency filters. . . 141 5.6 Effect of different color spaces, number of features and different ‘fixed’ sub- image windows on classification performance: (a) mean of AUCs, (b) standard deviation of AUCs. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 
143 5.7 Effect of different color spaces, number of features and different ‘variable’ sub- image windows on classification performance: (a) mean of AUCs, (b) standard deviation of AUCs. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 144 5.8 Effect of different color spaces, number of features and different ‘adaptive’ sub- image windows on classification performance: (a) mean of AUCs, (b) standard deviation of AUCs. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 145 5.9 Effect of different color spaces, number of features and different ‘resampling’ sub-image windows on classification performance: (a) mean of AUCs, (b) stan- dard deviation of AUCs. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 146 xv 5.10 The performance of different proposed scenarios obtained from ten times train- ing and testing of the NNs. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 148 5.11 Processed images using ‘fixed’ and ‘variable’ sub-image window: (a), (d) and (g) are the original images, (b),(e) and (e) are the processed images using the ‘fixed’ sub-image approach, and (c), (f) and (i) are the processed images using the ‘variable’ sub-image approach. . . . . . . . . . . . . . . . . . . . . . . . . 149 6.1 (a) CCTV underground inspection process, (b) CCTV camera, (c) inside CCTV trailer, (c) CCTV video monitor. . . . . . . . . . . . . . . . . . . . . . . . . . 155 6.2 (a) Front view of the MAKRO robot (Rome et al., 1999), (b) top view of the MAKRO robot. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 157 6.3 SSET sample report prepared by Blackhawk-PAS, Inc. . . . . . . . . . . . . . 158 6.4 (a) PANORAMO robot, (b) inspection of a single point from two different po- sitions by the PANORAMO system (M¨ uller and Fischer, 2007). . . . . . . . . . 159 6.5 (a) A typical snapshot of the CCTV video, (b) opening and thresholding of image (a), (c) gray scale segmented odometer area, (d) the segmented digit 5, (e) intensity values of image (d). . . . . . . . . . . . . . . . . . . . . . . . . . 164 6.6 (a) A joint on a Vitrified Clay Pipe (VCP) captured on a CCTV video frame, (b) extraction of the joint by subtracting the eroded image from the original image, (c) extraction of the joint by applying the morphological top-hat operation on the original image, (d) extraction of the joint by applying the morphological op- eration described in Eqn. (4.31) on the original image. The structuring element used is a 15 15 square. Note that the colors of images (b), (c), and (d) are thresholded and inverted. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 166 6.7 (a) The joint of a concrete pipe captured on a CCTV video frame, (b) extraction of the joint by applying the morphological operation described in Eqn. (4.31) on the original image. The results of using four structuring elements are fused. Vertical, horizontal, and two diagonal elements of length 15 and width 1 are used as the structuring elements. Note that the color of image (b) is thresholded and inverted. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 167 6.8 (a) A lateral on a concrete pipe captured on a CCTV video frame, (b) extraction of the lateral by applying the morphological bottom-hat operation on the origi- nal image, (c) postprocessing by applying a 15 15 median filter. The results of using two structuring elements are fused. Vertical and horizontal elements of length 35 and width 1 are used as the structuring elements. 
Note that the colors of images (b) and (c) are thresholded and inverted.

6.9 (a) A crack on a sewer pipe captured on a CCTV video frame, (b) the result of applying the morphological bottom-hat operation on the original image using a 3×35 structuring element, (c) postprocessing (using a 10×10 median filter) and thresholding of image (b), (d) the result of applying the morphological thinning operation. Note that the colors of images (c) and (d) are inverted.

Abstract

Automated health monitoring and maintenance of civil infrastructure systems is an active yet challenging area of research. Current structure inspection standards require an inspector to visually assess structure conditions. A less time-consuming and less expensive alternative to current monitoring methods is to use a robotic system that can inspect structures more frequently and perform autonomous damage detection. Nondestructive evaluation (NDE) techniques are innovative approaches for structural health monitoring. Among several possible techniques, the use of optical instrumentation (e.g., digital cameras), image processing, and computer vision is a promising nondestructive testing approach for structural health monitoring that complements sensor-based approaches. The feasibility of using image processing techniques to detect deterioration in structures has been acknowledged by leading researchers in the field. This study represents the efforts undertaken by the author to formulate, implement, and evaluate several vision-based approaches that are promising for robust condition assessment of structures.

It is well recognized that civil infrastructure monitoring approaches that rely on visual assessment will continue to be an important methodology for condition assessment of such systems. Part of this study presents and evaluates the underlying technical elements for the development of an integrated inspection software tool based on the use of inexpensive digital cameras. For this purpose, digital cameras are appropriately mounted on a structure (e.g., a bridge) and can zoom or rotate in three directions (similar to traffic cameras). They are remotely controlled by an inspector, which allows the visual assessment of the structure's condition through the images captured by the cameras. By not having to travel to the structure's site, other issues related to safety considerations and traffic detouring are consequently bypassed. The proposed system gives an inspector the ability to compare the current (visual) situation of a structure with its former condition. If an inspector notices a defect in the current view, he/she can request a reconstruction of the same view using images that were previously captured and automatically stored in a database.

Furthermore, by generating databases that consist of periodically captured images of a structure, the proposed system allows an inspector to evaluate the evolution of changes by simultaneously comparing the structure's condition at different time periods. The essential components of the proposed virtual image reconstruction system are keypoint detection, keypoint matching, image selection, outlier exclusion, bundle adjustment, composition, and cropping. Several illustrative examples are presented to demonstrate the capabilities, as well as the limitations, of the proposed vision-based inspection procedure.

Visual inspection of structures is a highly qualitative method.
If a region is inaccessible, binoculars must be used to detect and characterize defects. Although several NDE methods have been proposed for inspection purposes, they are nonadaptive and cannot quantify crack thickness reliably. A contactless remote-sensing crack detection and quantification methodology based on 3D scene reconstruction (computer vision), image processing, and pattern recognition concepts is introduced. The proposed approach utilizes depth perception to detect cracks and quantify their thickness, thereby giving a robotic inspection system the ability to analyze images captured from any distance and with any focal length or resolution. This unique adaptive feature is especially useful for incorporating mobile systems, such as unmanned aerial vehicles (UAVs), into structural inspection methods, since it would allow inaccessible regions to be properly inspected for cracks. Guidelines are presented for optimizing the acquisition and processing of images, thereby enhancing the quality and reliability of the damage detection approach and allowing the capture of even the slightest cracks (e.g., detection of 0.1 mm cracks from a distance of 20 m), which are routinely encountered in realistic field applications where the camera-object distance and image contrast are not controllable.

Corrosion is another crucial defect in structural systems that can lead to catastrophe if it is neglected. A novel adaptive approach based on multi-resolution wavelet analysis, color analysis, and depth perception is proposed that drastically improves the performance of the defect detection algorithm. The main contribution of this part is the integration of depth perception with the pattern classification algorithms, which has not been done in previous studies. Several analytical evaluations are presented to illustrate the capabilities of the proposed system. Furthermore, the area of the corroded regions is quantified using the retrieved depth information.

Insufficient inspection and maintenance of sewer pipes are the primary causes of today's poor pipeline conditions. CCTV surveys, the most commonly used pipeline inspection technique in the United States, are both costly and laborious. Furthermore, they are subject to an inspector's level of experience, attentiveness, and fatigue. A preliminary study on autonomous condition assessment of sewer pipelines based on CCTV surveys is presented. The proposed system analyzes CCTV video frames and provides inspectors with a report of probable defects and their precise locations. With this system, an inspector is not required to watch an entire CCTV video; rather, he or she only has to assess the locations suggested by the system. Several segmentation techniques are tested and evaluated to extract joints, laterals, and cracks in sewer pipelines. Morphological operators are most effective in this regard. Several examples are presented to illustrate the capabilities of these promising algorithms.
Chapter 1
Introduction

Change detection by means of digital image processing has been used in several fields, including homeland security and safety (Shinozuka, 2003), product quality control (Garcia-Alegre et al., 2000), system identification (Chung et al., 2004, Shinozuka et al., 2001), aircraft skin inspection (Siegel and Gunatilake, 1998), video surveillance (Collins et al., 2000), aerial sensing (Huertas and Nevatia, 2000, Watanabe et al., 1998), remote sensing (Bruzzone and Serpico, 1997, Deer and Eklund, 2002, Goldin and Rudahl, 1986, Peng et al., 2004), medical applications (Bosc et al., 2003, Dumskyj et al., 1996, Lemieux et al., 1998, Rey et al., 2002, Thirion and Calmon, 1999), underwater inspection (Edgington et al., 2003, Lebart et al., 2000), transportation systems (Achler and Trivedi, 2004), and nondestructive structural health monitoring (Abdel-Qader et al., 2003, Dudziak et al., 1999, Poudel et al., 2005, Sinha et al., 2003). These applications of digital image processing share common steps that have been reviewed by Singh (1989), Coppin and Bauer (1996), Lu et al. (2004), and most recently by Radke et al. (2005). Because change detection techniques are problem-oriented, most image processing approaches are limited to detecting only one type of defect at a time.

1.1 Motivation

Civil infrastructure system assets represent a significant fraction of global assets and, in the United States, are estimated to be worth $20 trillion. These systems are subject to deterioration due to excessive usage, overloading, and aging materials, as well as insufficient maintenance and inspection deficiencies. Bridges constitute one of the major civil infrastructure systems in the U.S. According to the National Bridge Inventory (NBI), more than 10,400 bridges are categorized as structurally deficient (Chong et al., 2003). According to California's Infrastructure Report Card, published by ASCE in 2006, an annual investment of $37 billion is required for California's infrastructure systems. There is an urgent need to develop effective approaches for the inspection and evaluation of these bridges. In addition, periodic inspection and maintenance of bridges will prolong their service life (McCrea et al., 2002).

The National Bridge Inspection Program was established by the Federal Highway Act of 1968 to inspect all highway bridges on the federal-aid system. This mandatory inspection program was later expanded by Congress to include all public bridges (Graybeal et al., 2002). According to the American Association of State Highway and Transportation Officials (AASHTO) Manual for Condition Evaluation of Bridges, there are five categories of inspections: initial inspections, routine inspections, in-depth inspections, damage inspections, and special inspections (Moore et al., 2001). Initial inspection is the first inspection of a new bridge; it establishes the baseline for potential problem areas and estimates the structure inventory. Routine inspection is a periodic inspection of the bridge to identify changes with respect to the previous inspection. All the requirements of the National Bridge Inspection Standards (NBIS) must be satisfied in routine inspections; these requirements dictate the inspector's qualifications and the frequency of inspections. In-depth inspection is a close-up inspection of a few structural members whose defects cannot be detected by a routine inspection. In-depth inspections are less frequent than routine inspections.
Sophisticated NDE methods might be used in in-depth inspections to help the inspector detect deficiencies. Damage inspection is carried out in response to damage caused by human actions or environmental conditions. Special inspection is the monitoring of a known defect (Moore et al., 2001).

Traditional bridge inspection is time-consuming and expensive because it requires an expert to visually inspect the structure's site for changes. Yet, visual inspection remains the most commonly used technique to detect damage, since many of the bridges in the United States are old and not instrumented with sensor systems. In the case of special structures, such as long-span bridges, access to critical locations for visual inspection can be difficult (Pines and Aktan, 2002). A robotic system that could inspect structures more frequently and is accompanied by other nondestructive techniques could be a great advance in infrastructure maintenance. The use of digital cameras, image processing, and pattern recognition techniques is an appropriate approach to reach this goal.

A systematic approach that provides inspectors with the ability to inspect a structure remotely by controlling cameras at a bridge site can overcome the above shortcomings and avoid the costs of traffic detouring during the inspection. Cameras can be conveniently mounted on a structure (similar to the Department of Transportation traffic cameras), and in the case of bridges, the cameras can be mounted on bridge columns. Even though the cameras may be constrained in regard to translation, they can easily rotate in two or three directions. This gives inspectors the ability to assess a relatively large area covered by the camera. Determining the optimal number of cameras and their positions are interesting problems that will not be discussed here. The main purpose of the current study is to enable inspectors to accurately and conveniently compare the structure's current condition with its former condition. In the present study, a database of images captured by a camera is constructed automatically. If the inspector notices a defect in the current view, he or she can request the reconstruction of that view from the previously captured images. In this way, the inspector can look at the current view and the reconstructed view simultaneously. Since the reconstructed view is based on images that are in the database and virtually has the same camera pose as the current view, the inspector can easily compare the current condition of the structure with its previous condition and evaluate the evolution of defects.

In this study, a novel image-based crack detection and quantification approach is introduced that includes auto-adaptive features that have not been used in previous crack detection systems. The field implementation, which is the main focus of this study, makes the current study different from other proposed crack detection techniques in which the camera-object distance and the image contrast can be controlled. Unlike previous studies (e.g., finding cracks in paintings), the specimen under inspection cannot be investigated in a laboratory environment, and the 3D depth perception of the scene is used to adaptively detect and quantify cracks. Here, cracks are dark, spatially narrow, and elongated objects that have contrast with the background.

Corrosion is also an important defect in civil infrastructure systems. In order to segment corrosion-like regions from the rest of the image, both texture and color analysis should be used.
Multi-resolution wavelet analysis is a powerful tool to characterize the appropriate features for texture classification. In this study, the traditional way of texture analysis is compared with three proposed approaches that are based on the depth information. In traditional wavelet analysis for corrosion detection, an image is divided into sub-images of fixed size (e.g., 8×8). In this study, it is shown that if a variable sub-image window size that represents a fixed area on the object under inspection is used as the datum, the classification performance and reliability of the system will dramatically increase. Through several experimental results, it is shown that incorporating depth perception can significantly improve the performance of a corrosion detection system. Several experiments are carried out, and the results are presented to evaluate the performance of the different proposed scenarios.

The average service lifetime of a sewer pipeline is 70 years (Shehab-Eldeen, 2001), and many existing underground pipes were installed 50 years ago (Hashemi and Najafi, 2007). Insufficient inspection and maintenance of sewer pipes are the main reasons behind the poor condition of sewer pipes in North America. The Closed Circuit Television (CCTV) survey is the most commonly used method for condition assessment of sewer pipelines in the United States (Gokhale et al., 1997). Inspection processes based on CCTV are done manually by human technicians who have to inspect thousands of miles of videotaped pipes to detect defects and prepare an assessment report. This inspection technique is subject to an inspector's experience, concentration, and fatigue. Furthermore, this approach is costly: the inspection of videotapes consumes 30% of the total cost of this procedure (Shehab-Eldeen, 2001). An autonomous system that can interpret the video captured by the CCTV camera can significantly decrease the subjectivity and inconsistency in interpretation, as well as the time and costs of inspections. Furthermore, pipes can be inspected more regularly by using such a system. Effective maintenance programs, which require regular inspections, will improve the performance and lifetime of sewer pipes, and consequently save large amounts of money.

1.2 Scope

Chapter 2 introduces a multi-image stitching and scene reconstruction methodology that is used as an inspection tool for evaluating defect evolution in structures. The details of the proposed system are presented, and several illustrative examples are shown to demonstrate the capabilities of the proposed inspection tool. Chapter 3 is dedicated to describing the image processing and pattern recognition components that can be used for image-based defect detection of structures. A novel vision-based crack detection and quantification approach that incorporates depth perception is introduced and discussed in Chapter 4; several experimental results are presented there to show the capabilities, as well as the limitations, of the proposed system. Chapter 5 introduces a texture analysis approach for corrosion detection in structures that incorporates depth perception; several texture and color analysis scenarios are introduced and evaluated in this chapter. In Chapter 6, a study of an autonomous image-based interpretation system for condition assessment of sewer pipelines is presented. Conclusions and future work are summarized in Chapter 7.
Chapter 2
Multi-Image Stitching and Scene Reconstruction for Evaluating Defect Evolution in Structures

2.1 Introduction

Even though many NDE techniques, such as liquid penetrant tests, ultrasonic testing, radiographic testing, magnetic flux leakage (magnetic flow), eddy current, acoustic emission, electrochemical potential and resistance measuring, and infrared thermography, have been developed for the inspection of bridge structures (McCrea et al., 2002), visual inspection is the predominant method used for the inspection of bridges. In many cases, other NDE techniques are compared with visual inspection results (Moore et al., 2001). Visual inspection is a labor-intensive task that must be carried out at least biannually in many cases (Chang et al., 2003).

Recently, more effort has been dedicated to the improvement of NDE techniques other than visual inspection. Mizuno et al. (2001) developed an interactive support system that enables inspectors to reach a decision at the bridge site using a wearable computer and a mobile communication device; such a support system provides enhanced technical knowledge to the inspector. Choset and Henning (1999) developed a remotely controlled serpentine robot that can be used for visual inspection of bridges using a mounted sensor suite. The serpentine robot is flexible enough to access all the points of a bridge structure. These techniques have yet to be adopted in current inspection processes. González-Aguilera and Gómez-Lahoz (2009) developed a pipeline for dimensional analysis of bridge structures from an image. Kim et al. (2009) conducted a preliminary study to show the feasibility of autonomous monitoring of bridge construction progress based on image analysis. Jahanshahi et al. (2009) surveyed and evaluated several image-based approaches for automatic defect detection of bridge structures. None of the mentioned studies provides a pipeline for an inspector to visually assess the defect evolution in a structure.

The visual inspection of structures is a subjective process that depends on the inspector's experience and focus. Furthermore, inspectors who feel comfortable with heights and lifts spend more time finishing their inspection and are more likely to locate defects (Graybeal et al., 2002). Difficulty in accessing some parts of a bridge hinders the transmission of knowledge and experience from one inspector to others. Consequently, improving the skills and experience of inspectors will take much time and effort using current visual inspection practices (Mizuno et al., 2001).

A systematic approach that provides inspectors with the ability to inspect a structure remotely by controlling cameras at a bridge site can overcome the above shortcomings and avoid the costs of traffic detouring during the inspection. Cameras can be conveniently mounted on a structure (similar to the Department of Transportation traffic cameras), and in the case of bridges, the cameras can be mounted on bridge columns. Even though the cameras may be constrained in regard to translation, they can easily rotate in two or three directions. This gives inspectors the ability to assess a relatively large area covered by the camera. Determining the optimal number of cameras and their positions are interesting problems that will not be discussed here.
The main purpose of the current study is to enable inspectors to accurately and conveniently compare the structure's current condition with its former condition. In the present study, a database of images captured by a camera is constructed automatically. If the inspector notices a defect in the current view, he or she can request the reconstruction of that view from the previously captured images. In this way, the inspector can look at the current view and the reconstructed view simultaneously. Since the reconstructed view is based on images that are in the database and virtually has the same camera pose as the current view, the inspector can easily compare the current condition of the structure with its previous condition and evaluate the evolution of defects. Figure 2.1 shows a simplified schematic hardware configuration of the proposed inspection system.

Figure 2.1: Schematic hardware configuration of the image-based inspection system.

A description of the inspection problem to be solved is presented in Section 2.2, where the components of the automatic multi-image stitching (i.e., image registration) are described. Section 2.2.1 introduces and evaluates several keypoint detection techniques for image registration purposes. Section 2.2.2 describes keypoint matching between multiple images. Image selection and outlier exclusion are discussed in Section 2.2.3. Bundle adjustment is introduced in Section 2.2.4. Composition and blending of multiple images into one reference view are briefly reviewed in Sections 2.2.5 and 2.2.6, respectively. Experimental results are discussed and evaluated in Section 2.3. Section 2.4 includes a summary and suggested future work.

2.2 Multi-Image Stitching and Scene Reconstruction

Image stitching algorithms are among the most widely used algorithms in computer vision (Szeliski, 2006). In the current study, the proposed procedure has similarities to panoramic reconstruction of a view (i.e., in both cases the camera is constrained in regard to translation). Hence, this study benefited from the extensive available research literature on panorama image stitching (Brown and Lowe, 2003, Hartley and Zisserman, 2000, Szeliski, 2006). Panorama image stitching has several commercial applications (Brown and Lowe, 2003); however, to the best of the author's knowledge, the assessment of defect evolution in structures has not been among these applications. PhotoStitch is image stitching software that is bundled with Canon digital cameras; this software requires an initialization, such as a horizontal or vertical sweep (Brown, 2005). AutoStitch is panoramic image stitching software that automatically recognizes panoramas in an unordered set of images; AutoStitch transfers and stitches images in a spherical coordinate system (Brown, 2005). The commercial photo stitching products Autopano Pro, Serif PanoramaPlus, and Calico are all based on AutoStitch and have more advanced stitching capabilities than AutoStitch itself. Microsoft Image Editor is also a free advanced panoramic image stitcher developed by Microsoft Research. Panorama Tools is a set of programs and libraries for re-projecting and blending multiple images to construct panoramas; it is the core engine for many panorama Graphical User Interface (GUI) front-ends, such as PTGui and hugin.

There are generally two types of image alignment and stitching algorithms: direct and feature-based.
Even though direct stitching benefits from using all of the image data and can yield highly accurate registrations, feature-based stitching is faster, more robust, and has the ability to automatically detect overlapping relationships among a set of unordered images (Brown, 2005, Szeliski, 2006). The latter characteristic of feature-based algorithms is ideal for the purpose of the current study. Therefore, the proposed system in this study is essentially a feature-based image stitching algorithm.

In order to reconstruct a view from a large collection of captured images in the database and project it based on the current camera pose, the images should be selected automatically from the database. For this purpose, automatic "keypoints" should be detected. In the next step, images that have a greater number of matching keypoints with the current view should be identified; a procedure to select the images from the database is introduced below. The next step is to eliminate the outlier matching keypoints. Then, the camera poses for each of the selected views and the current view are computed; this is the bundle adjustment problem. The stitching of the selected images takes place after this step. Finally, postprocessing is carried out to produce a smooth, comparable image. In this section, the above components are introduced and discussed. Figure 2.2 shows a schematic overview of the proposed image stitching procedure described above.

Figure 2.2: Schematic overview of the proposed image stitching procedure. First, keypoints are automatically detected in all the images. Keypoints are then matched between the current view and each database image. Database images that have a greater number of matching keypoints with the current view and can appropriately reconstruct the scene are selected. Next, keypoints between all the selected images (including the current view) are matched and the outliers are eliminated. Then, the bundle adjustment problem is solved. The selected images are composed and blended. Finally, the reconstructed view is cropped and compared with the current view.

2.2.1 Keypoint Detection

In keypoint detection, control points such as distinctive objects, edges, topographies, points, line intersections, and corners are detected. There are many well-developed algorithms that can detect such control points. The Moravec corner detection algorithm is a simple and fast point feature algorithm (Moravec, 1977, 1979); however, it is sensitive to rotation, such that the points extracted from one image are different from those extracted from the same image after it has been rotated (anisotropic response) (Parks and Gravel, 2005). The Moravec operator is also highly sensitive to edges, which means that anything that looks like an edge (i.e., noise) may cause the intensity variation to become significant. The intensity variation is the main criterion that detects a corner in this method (Parks and Gravel, 2005). The Harris (Plessey) operator (Harris and Stephens, 1988) is another algorithm, which has a higher detection rate than the Moravec operator.
The former operator is more robust than the latter one in terms of repeatability (Parks and Gravel, 2005). On the other hand, the Harris detector is computationally more costly than the Moravec detector. Since the Harris technique is based on gradient variations, it is also sensitive to noise. Recent modifications have made this method capable of responding isotropically (Parks and Gravel, 2005). Figure 2.3 shows 563 detected keypoints in a real 3D truss model using the Harris operator. 12 Figure 2.3: Harris detector - keypoints (indicated by the ‘+’ symbol). Scale-Invariant Feature Transform Scale-Invariant Feature Transform (SIFT) (Lowe, 2004) is a popular choice for keypoint detection. SIFT keypoints are invariant to changes in scale and rotation, and partially invariant to changes in 3D viewpoint and illumination. The SIFT operator is also highly discriminative and robust to significant amounts of image noise. Keypoints are identified by finding local extrema in a scale- space representation of the image. For each extremum point, SIFT then computes a gradient orientation histogram over a region around the point producing a 128-element descriptor vector. Although SIFT offers better repeatability than the Moravec or Harris operators, the algorithm is computationally more expensive. Figure 2.4 shows 600 detected keypoints of the same truss system sown in Fig. 2.3 using the SIFT detector. 2.2.2 Initial Keypoint Matching At this stage, the detected keypoints from the current-view image are matched with the detected keypoints of the database images. The matching keypoints are used as the criterion for similar- ity comparison between the current-view image and the database images. An initial estimate is necessary to identify the correspondences. A common technique used for making the estimation is calculating a quantitative descriptor for each keypoint. Next, a distance matrixA is computed, 13 Figure 2.4: SIFT detector - keypoints (indicated by the ‘+’ symbol). where each A ij element indicates the closeness of the i th keypoint descriptor in the target im- age and thej th feature descriptor in the reference image (Ringer and Morris, 2001). The initial matching keypoints are selected as the smallest elements ofA. There should not be any row or column selected more than once. Ringer and Morris (2001) and Scott and Longuet-Higgis (1991) have provided further details about different descriptor distance matrix calculation methods. Figures 2.5 and 2.6 show examples of 75 putative matched keypoints in two different images of a single truss system using the Harris and SIFT detectors, respectively. Figure 2.5: Harris detector - 75 putative matched keypoints in two different images of a single truss system (matched keypoints are connected by matching lines). 14 Figure 2.6: SIFT detector - 75 putative matched keypoints in two different images of a single truss system (matched keypoints are connected by matching lines). Each SIFT keypoint has a 128-element descriptor vector assigned to it. The Euclidean dis- tances between each keypoint’s descriptor vector in the reference (current view) image and any of the keypoint descriptor vectors in the input image (any of the database images) are computed. An effective matching strategy was introduced by Lowe based on comparing the distance of the clos- est neighbor to that of the second-closest neighbor (Lowe, 2004). In the current study, all matches in which the distance ratio is greater than 0.6 were rejected. 
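As an illustration of this matching step, the following MATLAB sketch builds the descriptor distance matrix and applies the 0.6 distance-ratio test. It is a simplified example rather than the implementation used in this study, and the variable names (D1 and D2 for the descriptor sets of the current-view image and a database image) are illustrative.

% Hedged sketch of the initial keypoint matching described above.
% D1 (m x 128) and D2 (n x 128) hold one SIFT descriptor per row for the
% current-view image and a database image, respectively (illustrative names).
function matches = initial_matching_sketch(D1, D2, ratio)
    if nargin < 3, ratio = 0.6; end        % distance-ratio threshold used in the text
    m = size(D1, 1);  n = size(D2, 1);
    A = zeros(m, n);                       % descriptor distance matrix
    for i = 1:m
        for j = 1:n
            A(i, j) = norm(D1(i, :) - D2(j, :));   % Euclidean descriptor distance
        end
    end
    matches = zeros(0, 2);                 % rows: [index in D1, index in D2]
    for i = 1:m
        [d, idx] = sort(A(i, :));          % closest and second-closest neighbors
        if numel(d) >= 2 && d(1) < ratio * d(2)
            matches(end+1, :) = [i, idx(1)];        %#ok<AGROW>
        end
    end
end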
By doing this, about 95% of the false matches are discarded. Further details related to this approach may be found in (Lowe, 2004). High false match elimination rate guarantees the functionality of the initial keypoint matching as a resemblance criterion between images. Figure 2.9 shows the matching of 42 SIFT keypoints in two overlapping images using the above technique. There are few outliers among these matching keypoints. 2.2.3 Image Selection and Outlier Exclusion At this stage of the data processing, the images in the database that have overlaps with the current- view image are selected. One approach is to select a fixed number of images that have the greatest 15 number of matching keypoints (i.e., overlap or resemblance) with the current-view image; how- ever, in some cases a scene can be reconstructed by fewer images which reduces the registration error. In order to select the optimum number of images, the following procedure is proposed. All the images that have a number of initial matching keypoints greater than a threshold were selected. Images which have more than 40 matches with the current-view image were selected. In order to improve the correspondence estimation, outliers (defined as incorrect matching key- points) are identified and excluded. Random Sample Consensus (RANSAC) is used to compute homography between two images as well as to find outliers. RANSAC Algorithm When there are two images of a single scene taken from different view points, any point in one image lies along a line in the other image (Faugeras, 1993). This condition could be imposed as the following constraint: u T Fv = 0; (2.1) whereu andv are the points in the two images, and are expressed in the form of [x;y; 1] T . The Fundamental Matrix F uses Eqn. (2.1) to relate any two corresponding points of two images, which represent the same point of a scene. More details and estimation techniques of this matrix are provided by Torr and Murray (1997), Zhang (1998), and Hartley and Zisserman (2000). In the RANSAC algorithm, a number of keypoints are randomly chosen to compute the matrix F . These keypoints are initially estimated to be appropriate matches using the descriptor distance matrix. For each pair of corresponding keypoints, the error is defined as the result of calculating the left-hand side of Eqn. (2.1) for the estimated F . Those pairs with errors greater than a threshold are detected as outliers. This procedure is repeated several times until the least amount of total error is calculated, and the minimum number of outliers are detected. 16 The RANSAC algorithm can be used to compute homography between two images as well as find outliers. The concept is similar to the above; however, instead of the fundamental matrixF , the homography matrixH is computed. In this case,u andv are related as: u =Hv: (2.2) Since four correspondences determine a homography, four pairs of matching keypoints are ran- domly selected by RANSAC to compute the initial estimate of the homography. Symmetric transfer error is used as the cost function to optimize the homography: d 2 transfer =d(u;Hv) 2 +d(v;H 1 u) 2 ; (2.3) whered(u;v) is the Euclidean distance between the homogeneous pointsu andv. Figures 2.7 and 2.8 show 75 matched keypoints detected by the Harris and SIFT detectors using the RANSAC algorithm. These two figures contain fewer mismatched keypoints compared to Figs. 2.5 and 2.6 in which the RANSAC algorithm is not used. Figure 2.7: Harris detector - RANSAC is used to exclude the outliers. 
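The homography-based RANSAC procedure described above can be sketched in MATLAB as follows. This is a simplified illustration rather than the implementation used in this study: the candidate homography is fitted from four correspondences with an unnormalized direct linear transform, models are scored with the symmetric transfer error of Eqn. (2.3), and the number of trials and the inlier threshold are illustrative choices.

% Hedged sketch of RANSAC homography estimation (Eqns (2.2)-(2.3)).
% u and v are 3 x N matrices of matched homogeneous points [x; y; 1];
% nTrials (e.g., 400) and thresh (an inlier threshold in pixels) are
% illustrative settings, not the values used in this study.
function [H_best, inliers] = ransac_homography_sketch(u, v, nTrials, thresh)
    N = size(u, 2);
    bestCount = 0;  H_best = eye(3);  inliers = false(1, N);
    for t = 1:nTrials
        s = randperm(N);  s = s(1:4);          % minimal sample: 4 correspondences
        H = fit_homography(u(:, s), v(:, s));  % u = H v, Eqn (2.2)
        % Symmetric transfer error of Eqn (2.3) for every correspondence
        uv = H * v;   uv = uv ./ uv(3, :);
        vu = H \ u;   vu = vu ./ vu(3, :);
        err = sum((u(1:2, :) - uv(1:2, :)).^2, 1) + ...
              sum((v(1:2, :) - vu(1:2, :)).^2, 1);
        in = err < thresh^2;                   % inliers for this candidate model
        if nnz(in) > bestCount
            bestCount = nnz(in);  H_best = H;  inliers = in;
        end
    end
end

function H = fit_homography(u, v)
    % Direct linear transform from four correspondences (no normalization)
    A = zeros(8, 9);
    for i = 1:4
        x = v(1, i);  y = v(2, i);  xp = u(1, i);  yp = u(2, i);
        A(2*i-1, :) = [-x -y -1  0  0  0  x*xp  y*xp  xp];
        A(2*i,   :) = [ 0  0  0 -x -y -1  x*yp  y*yp  yp];
    end
    [~, ~, V] = svd(A);
    H = reshape(V(:, end), 3, 3).';            % smallest singular vector reshaped as H
end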
Figure 2.8: SIFT detector - RANSAC is used to exclude the outliers.

If w is the probability that a keypoint match is correct between two images (w = number of inliers / total number of data points) and r points (independently selected) are needed to estimate the matrix F, the probability that all of the r points are inliers is w^r. Consequently, the probability that at least one of the r points is not an inlier is 1 - w^r, which is also the probability of estimating a bad model from the r selected points. (1 - w^r)^n indicates the probability of selecting n point sets, where each point set consists of r points and at least one outlier exists in each point set. Eventually, the probability of finding the correct F after n trials is:

p(F is correct) = 1 - (1 - w^r)^n.    (2.4)

For the purpose of this study, r = 4. Assuming w = 0.5 and n = 400, the probability of not finding the correct F after 400 trials is less than 7 x 10^{-12}. RANSAC can be improved by minimizing the median error instead of the total error, which is known as Least Median of Squares (LMedS) (Ringer and Morris, 2001, Torr and Murray, 1997); however, the latter technique is not efficient in the presence of Gaussian noise (Rousseeuw, 1987).

Now, the image that has the greatest number of matching keypoints with the current-view image is transformed onto the current-view image (using the homography estimated by RANSAC) to find its projection boundaries on the current-view image. Then, the current-view image is updated by setting the pixel values in the projection region to zero (i.e., that projection region is eliminated from the current-view image). The above procedure is repeated using the remaining images and the updated current-view image until the updated current-view image turns into a black scene (which means the selected images cover the whole current-view image). If, after one iteration, none of the remaining selected images has any matching keypoints with the updated current-view image, the latter is updated by stretching the remaining regions by 10% in the horizontal and vertical directions. This iteration continues until the updated image turns into a black scene. Figure 2.9 shows the matching SIFT keypoints between two overlapping images. The RANSAC algorithm identified 8 outlier matches.

Figure 2.9: Matching SIFT keypoints in two overlapping images (matched keypoints are connected by matching lines). Red (dark) matching lines show the outliers identified by RANSAC.

2.2.4 Bundle Adjustment

Bundle Adjustment (BA) is usually the last step of feature-based multi-view structure and motion reconstruction procedures (Lourakis and Argyros, 2004). BA aims to optimize the 3D structure and viewing parameters (e.g., camera pose and intrinsic calibration) simultaneously, from a set of geometrically matched keypoints in multiple views. In fact, BA is a large sparse geometric estimation problem in which the parameters consist of camera poses and calibrations, as well as 3D keypoint coordinates (Triggs et al., 2000). Figure 2.10 shows a schematic history of bundle adjustment.
matching Grün & Baltsavias 1985-92 Modern sparse matrix techniques Image-based enforced least-squares’ & uncertainty criterion matrices Baarda 1973 S transforms & Recursive partitioning matching Gauge freedom ~5x less error using ‘Self-calibration’ partitioning Inner & outer reliability Baarda 1964-75 ‘Data snooping’ constraints Recursive ‘Inner’ covariance & uncertain frames Helmert ~1880’s multiphoto’ & ‘Globally Gyer & Brown 1965-7 Free network adjustment Meissl 1962-5 ~1000 image aerial block ‘Geometrically constrained Brown 1964-72 Gauss, Legendre ~1800 Gaussian elimination Gaussian distribution models empirical camera modelling Least squares, BLUE Förstner, Grün 1980 1960 2000 Modern robust statistics & model selection photogrammetric reliability accuracy = precision + reliability over-parametrization & model choice 1990 Brown 1958-9 Calibrated Bundle Adjustment ~10’s of images 1800 1970 1980 Figure 2.10: A schematic history of bundle adjustment (Triggs et al., 2000). Assuming thatn 3D points are viewed inm images and x ij is the measured (detected) pro- jection of thei th point on thej th image. v ij is equal to one when thei th point is visible in the j th image, and it is equal to zero otherwise. Assuming that the 3D coordinates of thei th point are represented by vector b X i and that thej th camera parameters are represented byP j , BA minimizes the total reprojection error with respect to the 3D coordinates of all points and camera parameters as follows (Lourakis and Argyros, 2004): min b P j ; b X i n X i=1 m X j=1 v ij d( b P j b X i ; x ij ) 2 ; (2.5) 20 whered(x; y) is the Euclidean distance between the homogeneous points x and y. b P j b X i is the predicted projection of thei th point on thej th image.P is a 34 homogeneous camera projection matrix and it is formulated as: P =KR[Ij ~ C]; (2.6) and K = 0 B B B B B @ x s x 0 0 y y 0 0 0 1 1 C C C C C A ; (2.7) whereK is the camera calibration matrix, x and y represent the focal length of the camera in terms of pixels in thex andy directions respectively, ~ x = (x 0 ;y 0 ) is the principal point which is the projection of the camera’s center on the image plane,s is the skew between the axes, which is usually zero except for rare cases (e.g., image of an image), R is the 3 3 rotation matrix representing the orientation of the camera coordinate frame,I is the 3 3 identity matrix, and ~ C is a vector that represents the coordinates of the camera’s center. The general projective camera has 11 degrees of freedom: five forK, three forR, and three for ~ C. Parameters included inK are the internal camera parameters, while parameters inR and ~ C are the external parameters. Each point has three degrees of freedom. Assuming thatn points are seen inm views (this assumption does not affect the generality of this approach), Eqn. (2.5) requires minimization over 3n + 11m parameters. If the image measurement noise is zero-mean Gaussian, BA is the Maximum Likelihood (ML) estimator (Hartley and Zisserman, 2000). Unit quaternions conveniently represent the orientations and rotations of objects in 3D spaces. They are simpler to compose compared to Euler angles, and automatically avoid singularities (e.g., gimbal lock) (Hart et al., 1994). A rotation is encoded by just four real numbers when quaternions are used. Quaternions are less costly and numerically more stable than rotation ma- trices. Good references for quaternion rotations are presented by Hart et al. (1994) and Vicci 21 (2001). 
Using unit quaternions, the predicted projection b P j b X i can be written as (Lourakis and Argyros, 2004): b P j b X i = Q(a j ; b i ) =K j (T j N i T 1 j + ~ C j ); (2.8) where Q is the predicted projection function, K j is the calibration matrix for the j th camera, N i = (0; b X T i ) is the vector quaternion representation of the 3D point vector b X i , and T j is a unit quaternion that represents a 3D rotation about unit vector u j = (u 1 ;u 2 ;u 3 ) T by j angle: T j = cos( j =2) + u j sin( j =2): (2.9) The term T j N i T 1 j represents the rotation of the 3D point b X i by an angle of j about unit vector u j (Vicci, 2001). a j represents thej th camera parameters, and it is defined as a j = (T j ; ~ C T j ) T . b i represents thei th point parameters, and it is the same as b X i . Levenberg-Marquardt Algorithm The Levenberg-Marquardt (LM) algorithm is an iterative minimization method that has been used to minimize the reprojection error of the bundle adjustment problem. When the current solution is far away from the final minimum, LM acts like the gradient descent method. In this case, even though the convergence is slow, it is guaranteed; however, when the current solution is close to the final minimum, it acts like the Gauss-Newton iteration method. Thus, LM is more robust than the Gauss-Newton method. Let p2R M be the parameter vector mapped to an estimated measurement vector ^ x2R N by functionf as ^ x = f(p). The objective is to iteratively find the vector p u that minimizes the squared distancekk= T using an initial estimate p 0 , where = x ^ x and x is the measured 22 vector. For a smallk k, f(p + ) can be approximated by Eqn. (2.10) as the result of the Taylor series expansion: f(p + )f(p) + J; (2.10) where J is the linear mapping denoted by the Jacobian matrix @f(p) @p . At each step of the iteration, it is required to find the that minimizesk xf(p + )k. Using Eqn. (2.10) one can write: k xf(p + )kk xf(p) Jk=k Jk: (2.11) The solution for J T (J) = 0 will minimize the right hand side of Eqn. (2.11). This will yield to the normal equations as follows: J T J = J T : (2.12) Equation (2.12) is used for the Gauss-Newton iteration method. In the case of the LM method, the normal equations are replace by the augmented normal equations: (J T J +I) = J T ; (2.13) where is the damping term and its value varies from one iteration to the next, and I is the identity matrix. If the updated p+, where is computed from Eqn. (2.13), yields to a decrease in the error , the update is saved, the damping term is decreased, and the process is repeated. Otherwise, the damping term is increased, Eqn. (2.13) is solved again, and the process is repeated until the error decreases for a computed . If the covariance matrix x of vector x is included in the LM algorithm, the minimum is found by solving the weighted normal equations (Lourakis and Argyros, 2004): J T 1 x J = J T 1 x : (2.14) 23 Let X = (X 1 ; X 2 ;:::; X n ) T be the measured projection matrix, where X i = (x T i1 ; x T i2 ;:::; x T im ) T , and parameter vector p = (a T ; b T ) T , where a = (a T 1 ; a T 2 ;:::; a T m ) T is the camera parameter vec- tor and b = (b T 1 ; b T 2 ;:::; b T n ) T is the 3D points parameter vector. 
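One damped iteration of the Levenberg-Marquardt scheme described above, corresponding to the augmented normal equations of Eqn. (2.13), can be sketched as follows. This is a generic, dense, single-step illustration with an identity damping matrix, not the sparse implementation of Lourakis and Argyros (2004) that is used in this study; all names are illustrative.

% Hedged sketch of one Levenberg-Marquardt update (Eqn (2.13)).
% f is a function handle returning the predicted measurement vector, J the
% Jacobian of f at p, x the measured vector, p the current parameter vector,
% and lambda the damping term.
function [p, lambda] = lm_step_sketch(f, J, x, p, lambda)
    r = x - f(p);                              % current residual (epsilon in the text)
    A = J.' * J + lambda * eye(numel(p));      % augmented normal matrix
    delta = A \ (J.' * r);                     % solve for the parameter update
    if norm(x - f(p + delta)) < norm(r)
        p = p + delta;   lambda = lambda / 10; % accept the step, decrease damping
    else
        lambda = lambda * 10;                  % reject the step, increase damping
    end
end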
Since x ij depends only on the i th 3D point and thej th camera: @b x ij @a k = 0; 8j6=k; (2.15) and @b x ij @b k = 0; 8i6=k: (2.16) For the case ofn = 4 points andm = 3 cameras, the jacobian J is given as @ b X @p = 0 B B B B B B B B B B B B B B B B B B B B B B B B B B B B B B B B B B B B B B @ @b x 11 @a 1 0 0 @b x 11 @b 1 0 0 0 0 @b x 12 @a 2 0 @b x 12 @b 1 0 0 0 0 0 @b x 13 @a 3 @b x 13 @b 1 0 0 0 @b x 21 @a 1 0 0 0 @b x 21 @b 2 0 0 0 @b x 22 @a 2 0 0 @b x 22 @b 2 0 0 0 0 @b x 23 @a 3 0 @b x 23 @b 2 0 0 @b x 31 @a 1 0 0 0 0 @b x 31 @b 3 0 0 @b x 32 @a 2 0 0 0 @b x 32 @b 3 0 0 0 @b x 33 @a 3 0 0 @b x 33 @b 3 0 @b x 41 @a 1 0 0 0 0 0 @b x 41 @b 4 0 @b x 42 @a 2 0 0 0 0 @b x 42 @b 4 0 0 @b x 43 @a 3 0 0 0 @b x 43 @b 4 1 C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C A : (2.17) When the LM algorithm is used to minimize the reprojection error (Eqn. (2.5)), matrices of (3n + 11m) (3n + 11m) dimensions have to be factored or inverted. The solution of the normal equations has the computational complexity ofO(N 3 ) in the number of parameters. 24 Consequently, as the number of parameters increases, the minimization will be costly and at some point impossible. Equation (2.17) shows the sparse structure of the jacobian matrix used in the LM algorithm. One way to overcome the high cost of the LM algorithm is to use sparse methods (Hartley and Zisserman, 2000). Lourakis and Argyros (Lourakis and Argyros, 2004) provided details of how to efficiently solve the BA problem based on the sparse structure of the Jacobian matrix used in the LM algorithm. Their modified implementation of this algorithm is used to solve the bundle adjustment problem in this study. 2.2.5 Composition In the problem under discussion, the camera has 4 degrees of freedom (three for the camera orientations and one for the camera focal length). The location ~ x = (x 0 ;y 0 ) is known (it is the center of the image),s is equal to zero,f x andf y are equal, and ~ C is a zero vector and constant for all the views (the camera is constrained by translation). Hence, the projection of thei th point on thej th andj 0th view can be simplified as: x ij =K j R j [X i Y i Z i ] T ; (2.18) and x ij 0 =K j 0R j 0 [X i Y i Z i ] T ; (2.19) whereX i ,Y i , andZ i are the 3D coordinates of thei th point,K j is the 3 3 camera calibration matrix of the j th view, and R j is the 3 3 rotation matrix of the j th view representing the orientation of the camera coordinate frame. Equations (2.18) and (2.19) confirm that the relation between x ij and x ij 0 is a homography: x ij 0 =K j 0R j 0R T j K 1 j x ij ; (2.20) 25 where x ij and x ij 0 are homogeneous image positions (x ij = w ij [u ij v ij 1] T , whereu ij andv ij are the pixel positions corresponding to thei th 3D point in thej th view). This means that, if the unknown camera parameters are estimated, one can transform the images from one plane to the other one through the above homography. Transforming the images using the computed homographies by solving the bundle adjustment problem minimizes the registration error between all the selected images as well as the current- view image. This is why the computed homographies from RANSAC are not used for image registration. The selected images are all transformed onto the plane of the current-view image and stitched using the homographies between each selected image and the current-view image. The composition surface is flat. 
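Once the bundle adjustment has estimated the calibration and rotation of each view, applying the homography of Eqn. (2.20) is straightforward; a minimal MATLAB sketch, with illustrative variable names, is:

% Hedged sketch of the rotation-only homography in Eqn (2.20).
% K1, R1 and K2, R2 are the calibration and rotation matrices of views j
% and j' estimated by bundle adjustment; x1 is a homogeneous pixel position
% in view j.
H  = K2 * R2 * R1.' / K1;      % H = K_j' R_j' R_j^T K_j^{-1}
x2 = H * x1;                   % corresponding position in view j'
x2 = x2 / x2(3);               % normalize the homogeneous coordinate

Because each selected image is mapped by such a homography onto the plane of the current view, the composition surface remains a single flat plane.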
Consequently, straight lines remain straight, which is important for inspection and edge removal purposes. Finally, the reconstructed scene is cropped and can then be compared to the current-view image. 2.2.6 Blending and Exposure Compensation After stitching the images together, some image edges are still visible. This effect is usually due to exposure differences, vignetting (reduction of the image intensity at the periphery of the im- age), radial distortion, or mis-registration errors Brown (2005). Due to mis-registration or radial distortion, linear blending of overlapped images may blur the overlapping regions. In the problem under discussion, we are interested in preserving the high-frequency components (e.g., cracks). It is challenging to smooth low-frequency exposure variances of an image without blurring its high- frequency components. A solution to this problem is to use a technique that blends low-frequency components over a larger spatial region, and high-frequency components over a smaller region. For this purpose, the Laplacian pyramid blending Burt and Adelson (1983) technique is used. Fig- ure 2.11(a) shows a linear blending of images, whereas Fig. 2.11(b) shows the Laplacian pyramid 26 blending of the same images. The visual artifacts due to radial distortion and mis-registrations are eliminated in the latter one. (a) (b) Figure 2.11: (a) Image stitching using a linear image blending, (b) image stitching using the Laplacian pyramid blending. The visual artifacts (i.e., the ‘ghosting effect’ at the gusset plate and the column, and blurriness of the diagonal member) in image (a), due to radial distortion and mis-registration, are eliminated in image (b). In order to stitch the selected images, an initial weight function is assigned to each image. This initial weight function for imagei is formed byW i (x;y) =w i (x)w i (y) for each pixel located at (x;y) wherew(x) is a linear function that varies from 1 in the middle of the image to zero at the edges, in thex direction. Furthermore, this weight function is updated as: W i (x;y) = 8 > < > : 1 if i = arg max j w j (x)w j (y), 0 otherwise. (2.21) Equation (2.21) assigns a value of 1 to the weight function if imagei has the maximum weight at (x;y) among all other images; otherwise, it assigns 0 to the weight function Brown (2005). These weight maps are used to form Gaussian weight pyramids as the blending weights for each band: W l i =W l1 i g; (2.22) 27 where l is the level of the pyramid, g is a 5 5 Gaussian kernel, and W 0 i is the weight map defined in Eqn. (2.21). Note that after each convolution, the results are sub-sampled by a factor of 2. This can be shown in more detail as follows: W l i (x;y) = 2 X m=2 2 X n=2 g(m;n):W l1 i (2x +m; 2y +n): (2.23) To form a Gaussian pyramid, each image is blurred by the Gaussian kernel followed by a sub- sampling factor of 2. IfI l i is the levell of the Gaussian pyramid for imagei, it can be formulated as: I l i =I l1 i g; (2.24) whereI 0 i is the original image. A more detailed formula is shown in Eqn. (2.25): I l i (x;y) = 2 X m=2 2 X n=2 g(m;n):I l1 i (2x +m; 2y +n): (2.25) Successive subtraction of the Gaussian pyramid levels leads to the construction of the Lapla- cian pyramid: L l i =I l1 i I l i =I l1 i I l1 i g: (2.26) Note that I l i is half the size of I l1 i in the horizontal and vertical directions due to sub- sampling. 
Hence,I l i has to be expanded prior to subtraction fromI l1 i : L l i (x;y) =I l1 i (x;y) 4 2 X m=2 2 X n=2 g(m;n):I l1 i ( xm 2 ; yn 2 ): (2.27) The steps used to construct the Gaussian and Laplacian pyramids can be reversed to preserve the original image. From a blending stand point, the Laplacian pyramids of overlapping images 28 are linearly combined at each level using the corresponding weights to construct the scene. This is a reverse problem; hence, so the reconstructed image at levelN (top pyramid level) is: I N (x;y) = P n i=1 L N i (x;y):W N i (x;y) P n i=1 W N i (x;y) ; (2.28) wheren is the number of stitching images. For the other pyramid levels, the following equation is used: I l1 (x;y) = P n i=1 L l1 i (x;y):W l1 i (x;y) P n i=1 W l1 i (x;y) +I l (x;y): (2.29) Note thatI l i has to be expanded first before it is added to the other term in Eqn. (2.29). I 0 is the final blending result. In this study, six pyramid levels are used (see Fig. 2.15). In order to reduce the exposure differences between the stitching images and the current-view image, the following error function is minimized: e = n X i=1 X y X x W cvi (x;y)[ i I i (x;y)I cv (x;y)] 2 ; (2.30) where i is the exposure compensation coefficient for imagei,I i (x;y) is the brightness value of the pixel located at (x;y) in imagei,I cv (x;y) is the brightness of the current-view image, and W cvi (x;y) identifies the overlapping region between the current-view image and imagei based on the weight functions from Eqn. (2.21). The weighting functionW cvi (x;y) is 1 in the overlapping region and 0 otherwise. By minimizing the error in Eqn. (2.30), the exposure compensation coefficients are computed as: i = P y P x W cvi (x;y)I cv (x;y) P y P x W cvi (x;y)I i (x;y) : (2.31) 29 Each stitching image is then multiplied by its corresponding exposure coefficient prior to blending. Note that, in order to gain some robustness against mis-registration errors, we have used the average brightness values in the overlapping regions in Eqn. (2.31). 2.3 Experimental Results and Discussion Figure 2.12 shows an image database consisting of four image sets: 24 images of a truss system, 24 images of a full-scale structural model, 24 images of a typical hospital support structure, and 32 images of a magnetorheological (MR) damper with its attached structures. The average camera- object distance for image databases in Figs. 2.12(a), (b), (c), and (d) are 2, 7, 2.5, and 1.5 meters, respectively. This quantity can be increased by using a more powerful lens. Each of these images has at least 50% overlap with its neighboring images. All of the images are saved in the database without any specific order (the images in Fig. 2.12 are presented in an order to give the reader the sense about the overlapping regions); however, indexing the images can enhance the search speed for image selection. The resolution is 640 480 pixels for each image. All the images are captured by a Canon PowerShot A610 digital camera. Note that four different image sets are saved in a single database to show the robustness of the image selection algorithm in the presence of the outlier images. The SIFT keypoints are detected and saved in a file for each of the database images. In this way, there is no need to recompute the keypoints for the database images while reconstructing each scene. 
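For reference, the exposure compensation of Eqn. (2.31) and a simplified full-resolution variant of the multi-band blending of Section 2.2.6 can be sketched as follows. This two-image illustration keeps the pyramid levels at full resolution (no subsampling), which is a simplification of Eqns. (2.22)-(2.29), and its variable names are illustrative rather than those of the implementation used here.

% Hedged sketch of exposure compensation (Eqn (2.31)) followed by a
% simplified two-image multi-band blend in the spirit of Eqns (2.22)-(2.29).
% I1 is the current-view image, I2 an overlapping registered image, and W2
% the binary weight map of I2 from Eqn (2.21); all grayscale, double format.
function blended = multiband_blend_sketch(I1, I2, W2, nLevels)
    g1 = [1 4 6 4 1] / 16;   g = g1.' * g1;    % separable 5x5 Gaussian-like kernel
    % Exposure compensation: scale I2 so its brightness matches I1 where
    % I2 contributes (a stand-in for the overlap map W_cvi of Eqn (2.31))
    alpha = sum(W2(:) .* I1(:)) / max(sum(W2(:) .* I2(:)), eps);
    I2 = alpha * I2;
    W1 = 1 - W2;                                % complementary weight map for I1
    blended = zeros(size(I1));
    for l = 1:nLevels
        % Band-pass ("Laplacian") components of both images at this level
        B1 = conv2(I1, g, 'same');   L1 = I1 - B1;
        B2 = conv2(I2, g, 'same');   L2 = I2 - B2;
        % Blend the band using progressively smoothed weights
        blended = blended + (W1 .* L1 + W2 .* L2) ./ max(W1 + W2, eps);
        % Pass the low-pass residuals and smoothed weights to the next level
        I1 = B1;   I2 = B2;
        W1 = conv2(W1, g, 'same');   W2 = conv2(W2, g, 'same');
    end
    % Add the remaining low-frequency content, blended with the final weights
    blended = blended + (W1 .* I1 + W2 .* I2) ./ max(W1 + W2, eps);
end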
30 (a) (b) (c) (d) Figure 2.12: An image database consisting of: (a) 24 images of a truss system, (b) 24 images of a full-scale structural model, (c) 24 images of a typical hospital ceiling support structure, and (d) 32 images of an MR damper with its attached structures. 31 Figure 2.13(a) shows a current-view image of the truss system shown in Figure 2.12(a). A yellow tape is attached to the truss gusset plate in this image (this tape is not present in any of the database images). Figures 2.13(b), (c), and (d) are the (autonomously) selected images from the database to reconstruct the scene. Figure 2.14 shows the reconstructed scene and the contribution of the selected images in Fig. 2.13. On a AMD Athlon II X4 (2.6 GHz) processor, it takes 37 seconds for the proposed system to detect SIFT keypoints in the current-view image, find the matching keypoints between the current- view image and all the images in the database (104 images), select matching images, solve the bundle adjustment problem, blend the selected images and crop the reconstructed scene. Bundle adjustment take less than a second of the whole computation time (because the sparse bundle adjustment algorithm is efficiently implemented in C++). Note that no parallel processing is used in this process. As the number of images in the database increases, the search time will also in- crease. For the current example, if just the images in Fig. 2.12(a) were saved in the database, the computation time would decrease by about 10 seconds. Furthermore, higher resolution images lead to a greater number of keypoints, which leads to higher computational cost. Except for the bundle adjustment algorithm, which is implemented in C++, the rest of the algorithms are imple- mented in MATLAB. For faster performance (i.e., online processing), all the algorithms should be efficiently implemented in C++ (or an equivalent computer language). In this study, our goal is to provide the proof of concept for the proposed inspection system. In all our experiments, the bundle adjustment took the least computation time. Most of the computation time was consumed for searching images in the database. Fortunately, this process can be parallelized. The Laplacian pyramids and their corresponding Gaussian weight pyramids for each of the selected images in Fig. 2.13 are shown in Fig. 2.15. The higher-frequency and lower-frequency 32 (a) (b) (c) (d) Figure 2.13: (a) Current-view image of a truss system, (b), (c), and (d) are three image matches autonomously selected from the database. Note that a yellow tape is attached to the truss gusset plate in the current-view image. This tape is absent in the images from the database. components of an image are preserved in the lower and higher Laplacian pyramid levels, respec- tively. As described in Section 2.2.6, these Laplacians are linearly combined at each level using the corresponding weights to reconstruct the scene. The current-view image and the reconstructed scene (after blending and exposure compensa- tion) are shown in Figs. 2.16(a) and (b), respectively. One can recognize the yellow tape (shown by a red circle) in the current-view image while it is absent in the reconstructed scene (i.e., this synthetic change was not present in a prior inspection of the structure). 33 Figure 2.14: The reconstructed scene and the contribution of the selected images in Fig. 2.13. (a) (b) Figure 2.16: (a) Current-view image, (b) the reconstructed scene after blending, exposure com- pensation, and cropping. 
The yellow tape (shown by red circle) on the truss gusset plate in image (a) is absent in image (b). Figure 2.17(a) shows the scene reconstruction and the contribution of five selected images from the database (Fig. 2.12). Figure 2.17(b) is the current-view image captured relatively far away from the structure. Figures 2.17(c) and (d) show the reconstructed scene using a linear blending versus the Laplacian pyramid blending. Exposure differences of stitched images have 34 (a) Level 1 (b) Level 2 (c) Level 3 (d) Level 4 (e) Level 5 (f) Level 6 Figure 2.15: Six level of the Laplacian pyramids and their corresponding Gaussian weight pyra- mids for each of the selected images in Fig. 2.13 (see Eqns. (2.22) through (2.27)) . The laplacian pyramids for the images are presented on the left side of each row while their corresponding Gaussian weight pyramids are presented on the right side. Linear combination of the Laplacian pyramids at each level using the corresponding weights yields to the seamless reconstruction of the scene as described in Section 2.2.6. 35 led to a poor reconstruction result in Fig. 2.17(c) with respect to Fig. 2.17(d). Figure 2.17(e) shows the reconstructed scene using the Laplacian pyramid blending and the proposed exposure compensation technique. One can compare the lower left side of Figs. 2.17(b), (d), and (e) to evaluate the effect of the proposed exposure compensation approach. It is obvious that Figure 2.17(e) has more resemblance (i.e., less exposure difference) with Fig. 2.17(b) (the current-view image). Furthermore, there is an aluminum element in the reconstructed scene (Fig. 2.17(e)) which is absent in the current-view image (Fig. 2.17(b)). Figure 2.18 presents an example where an inspector encounters a suspicious condition in the current-view image. By using the proposed system, he or she can visually compare the current condition of the structure with its previous status to identify probable changes. Figure 2.18(a) shows the scene reconstruction and the contribution of four selected images from the database (Fig. 2.12) that contains 104 images. Figure 2.18(b) is the current-view image of a typical hospital support structure. Figure 2.18(c) is the final reconstructed scene. One can recognize that a ceiling tile is missing in the current-view image by comparing Figs. 2.18(b) and (c). Figure 2.19(a) shows the scene reconstruction and the contribution of twelve selected images from the database. Figure 2.19(b) is the current-view image of an MR damper. Note that this image is zoomed-out relative to the images in the database (see Fig. 2.12) and taken in a different lighting condition to show the invariance of the proposed system with respect to image exposure (the image is captured by using flash). Figure 2.19(c) shows the reconstructed scene using the Laplacian pyramid blending and the proposed exposure compensation technique. One can recog- nize that two nuts (shown by red circles in Fig. 2.19(c)) are missing in the current-view image by comparing Figs. 2.19(b) and (c). This is a good example of the practical capacities of the proposed methodology. 36 (a) (b) (c) (d) (e) Figure 2.17: (a) The scene reconstruction and the contribution of five selected images from the database (Fig. 2.12), (b) current-view image of a full-scale structural model, (c) scene recon- struction using a linear blending, (d) scene reconstruction using the Laplacian pyramid blending, and (e) scene reconstruction using the Laplacian pyramid blending and exposure compensation. 
There is an aluminum element in image (e) which is absent in image (b).

Figure 2.18: (a) The reconstruction and the contribution of three selected images from the database (Fig. 2.12), (b) current-view image of a typical hospital ceiling support structure, and (c) scene reconstruction using the Laplacian pyramid blending and the exposure compensation. Note that a ceiling tile is missing in image (b).

Figure 2.19: (a) The reconstruction and the contribution of twelve selected images from the database (Fig. 2.12), (b) current-view image of an MR damper (zoomed out), and (c) scene reconstruction using the Laplacian pyramid blending and exposure compensation. Note that the two nuts missing in image (b) are shown by red circles in image (c). The lighting condition is different in images (a) and (b).

If an inspector zooms in to have a closer look at the center of Fig. 2.19(b) (see Fig. 2.20(a)), the proposed system finds the matching image in the database (Fig. 2.20(b)) and reconstructs the scene (Fig. 2.20(c)). Figure 2.20(d) shows the simple subtraction of Figs. 2.20(a) and (c).

Figure 2.20: (a) Current-view image of an MR damper (zoomed in), (b) the selected image from the database (Fig. 2.12), (c) reconstructed scene, and (d) difference of images (a) and (c). Note that there are two missing nuts in image (a) in comparison with image (c).

If overlapping images are captured periodically and saved in separate databases, then the evolution of changes can be efficiently tracked through time. Figure 2.21 shows three sets of image databases captured from a structural system at different time periods t_1, t_2, and t_3, where t_1 < t_2 < t_3. Figure 2.22 shows the change evolution of this structural system. Figures 2.22(a), (b), and (c) are reconstructed from the images in Figs. 2.21(a), (b), and (c), respectively. Figure 2.22(a) shows a bolted beam-column connection at inspection time t_1. Figure 2.22(b) shows that the nut has disappeared at inspection time t_2. Figure 2.22(c) shows that the bolt has disappeared and the beam has been displaced at inspection time t_3. Figure 2.22(d) shows the current view of the connection, where the beam has been displaced even more.

Figure 2.21: Three image databases of a structural system captured at different time periods: (a), (b), and (c) images of a structural system captured at time periods t_1, t_2, and t_3, respectively (t_1 < t_2 < t_3).

Figure 2.22: Change evolution in a structural system: (a), (b), and (c) scene reconstructions of a structural system (beam-column connection) at time periods t_1, t_2, and t_3, respectively (t_1 < t_2 < t_3); (d) current-view image of the same structural system.

Figure 2.23 shows two sets of image databases captured from a truss system (i.e., a structural system similar to bridge structures) at different time periods t_1 and t_2, where t_1 < t_2. The resolution is 640 x 480 pixels for each image. Figure 2.24(a) shows a current-view image of the truss system shown in Fig. 2.23. The resolution of this image is 800 x 600 pixels. A yellow tape is attached to the truss in this image. Figures 2.24(b) and (c) are the reconstructed and cropped scenes using the images captured at time periods t_2 and t_1, respectively. The regions of interest are shown by red circles in these figures. One can see that the yellow tape did not exist at time period t_1. At time t_2, a vertical tape is attached to the truss.
The current-view image shows two vertical and horizontal yellow tapes attached to the structure. (a) (b) Figure 2.23: Two image databases of a truss system captured at different time periods: (a) and (b) images of a truss system captured at time periodst 1 andt 2 , respectively (t 1 <t 2 ). 43 (a) (b) (c) Figure 2.24: Change evolution in a truss system: (a) current-view image of a truss system, (b) and (c) scene reconstructions of the same truss system at time periodst 2 andt 1 , respectively (t 1 <t 2 ). The changed region is shown with a red circle. Note that none of the images in Figs. 2.23(a) and (b) are identical with the reconstructed scenes in Figs. 2.24(b) and (c). To reconstruct the scenes shown in Figs. 2.24(b) and (c), four and six images are selected automatically from the databases in Figs. 2.23(a) and (b), respec- tively. Figure 2.25 shows the contribution of four images used to reconstruct Fig. 2.24(b). On a AMD Athlon II X4 (2.6 GHz) processor, it takes 110 seconds for the proposed system to detect SIFT keypoints in the current-view image, find the matching keypoints between the current-view image and all the images in the database (32 images), select matching images, solve the bundle adjustment problem, blend the selected images and crop the reconstructed scene in Fig. 2.24(b). 44 Figure 2.25: The scene reconstruction and the contribution of four selected images from the database captured at time t 2 (Figure 2.23(b)). The current-view image corresponding to this reconstruction is shown in Fig. 2.24(a). 2.3.1 Comparison of Image Stitching Algorithms In order to check the registration performance of the proposed system, the registration Root Mean Square (RMS) error is computed. Ten real image sets with different image resolutions and dif- ferent number of stitching images (varying from two to thirteen) are used. The RMS errors in the current study are compared with the RMS errors from a well-known automatic stitcher: Au- toStitch (Brown and Lowe, 2007). Figure 2.26 confirms that the registration errors of the current study and that of AutoStitch are close. It is worth mentioning that AutoStitch optimizes the cam- era parameters (motion) by minimizing the homography errors between stitching images, while in this study the proposed system minimizes the reprojection error by optimizing the camera pa- rameters and the 3D coordinates of matching keypoints (motion and structure) simultaneously. The physics of the problem is better preserved using the latter approach. In many panoramic im- age stitching algorithms, including AutoStitch, the composition surface is cylindrical or spherical 45 whereas in the current study this surface is flat. By using the flat composition surface, straight lines remain straight which is important for inspection purposes. 1 2 3 4 5 6 7 8 9 10 0 1 2 3 4 5 6 RMS error (pixel) Set number AutoStitch Current study Figure 2.26: Comparison of RMS errors using the current study and AutoStitch. Ten real image sets with different image resolutions and different number of stitching images are used to evaluate the registration errors. 2.3.2 Minimum Measurable Defect Based on the inspection guidelines, appropriate cameras should be used for defect detection pur- poses. Here, a general introduction about the different parameters that affect the detection capa- bilities of a camera and their relation is presented. In order to accurately measure a feature in an image, at least two pixels which represent the feature have to be detected. 
Below, a formula that gives the smallest measurable feature in an image is given:

SF = (WD / FL) x (SS / SR) x 2,    (2.32)

where SF is the smallest measurable feature, FL is the camera focal length, WD is the working distance (the distance between an object and the camera), SS is the camera sensor size, and SR is the camera sensor resolution. Note that this formula considers neither lens distortion nor the type of defect detection algorithm. This formula helps to select the appropriate image acquisition system for a given working distance and smallest measurable defect. For instance, for a working distance of 3000 mm and a Canon PowerShot A610 digital camera (where the maximum focal length is 29.2 mm, the sensor size is 7.2 mm and 5.3 mm in the horizontal and vertical directions, and the maximum sensor resolution is 2592 x 1944 pixels), the minimum measurable feature is:

SF = (3000 mm / 29.2 mm) x (7.2 mm / 2592 pixels) x 2 = 0.57 mm.    (2.33)

2.4 Summary and Future Work

Among the possible techniques for inspecting civil infrastructure, the use of optical instrumentation that relies on image processing is a less time-consuming and inexpensive alternative to current monitoring methods. Visual inspection is the predominant method for bridge inspections. The visual inspection of structures is a subjective measure that relies heavily on the inspector's experience and focus (attention to detail). Furthermore, inspectors who are comfortable with heights and with working from a lift spend more time on their inspection and are more likely to locate defects. Difficulties in accessing some parts of a bridge also adversely affect the transmission of knowledge and experience from one inspector to other inspectors. The integration of visual inspection results and optical instrumentation measurements gives the inspector the chance to inspect the structure remotely by controlling cameras at the bridge site. This approach resolves the above difficulties and avoids the costs of traffic detouring during the inspection. Cameras can be appropriately mounted on the structure. Although the cameras are constrained by translation (i.e., attached to a fixed location), they can rotate in two directions. The inspector thus has the appropriate tools to inspect different parts of the structure from different views.

The main purpose of the current study is to give the inspector the ability to compare the current situation of the structure with the results of previous inspections. In order to reach this goal, a database of images captured by a camera is constructed automatically. When the inspector notices a defect in the current view, he or she can request the reconstruction of the same view from the images captured previously. In this way, the inspector can evaluate the growth of a defect of interest. In order to stitch the images that have been previously captured from different views and reconstruct an image from them that has the same view as the newly captured image, the database must be autonomously searched to select images relevant to the new image. For this purpose, automatic keypoints should be detected. In the next step, the images that have the greatest number of matching keypoints with the current view are identified. The following step is to eliminate the outlier matching keypoints and find the optimum number of matches that reconstruct the current view. Then, the camera parameters for each of the selected views, as well as the 3D coordinates of the matching keypoints, are computed. This is the bundle adjustment problem.
The stitching and blending of the selected images will take place after this stage. Eventually, the reconstructed scene is cropped so as to be compared to the current view. If overlapping images are captured periodically and saved in separate databases, then the evolution of changes can be tracked through time by multiple reconstruction of a scene from images captured at different time intervals. Furthermore, this study confirms the feasibility of a system that can autonomously reconstruct scenes from previously captured images and provide a reference scene for an inspector while he or she visually inspects the structure. In this study, several experimental examples are provided. Four different sets of images (total of 104 images) are saved in a single database to show the robustness of the proposed system in the presence of outlier images. Furthermore, three sets of images captured from a structural system at different time periods are presented to show how the proposed system facilitates the tracking of 48 change evolution in a scene. The reconstruction examples in this study include zoom-in, zoom- out, different lighting conditions, missing parts, displacement, and change evolution that show the capabilities of the proposed system for different scenarios. Since many structures, such as bridges and tall buildings, continuously oscillate, the captured images are subject to motion blur (i.e., the apparent streaking caused by rapidly moving objects). Image stabilizing approaches can be used to prevent blurring caused by the minute shaking of a camera lens. Restoration techniques can be used for motion blur caused by movements of the object of interest or intense movements of the camera. Although there are several proposed algo- rithms for restoring motion blur, selecting an appropriate algorithm for the proposed application in this study needs further research. Furthermore, motion blur depends on parameters such as shutter speed, focal length, and motion frequency. By considering the effects of these parameters, it is possible to reduce motion blur by selecting the appropriate image acquisition system. Under different weather conditions, the contrast and color of images are altered. On the other hand, the proposed study requires robust detection of keypoints. Fortunately, SIFT keypoints are highly robust to image noise and partially invariant to illumination changes. Consequently, except for some severe weather conditions where the light is intensively scattered by the atmosphere, the proposed study can appropriately reconstruct the virtual scene (see Fig. 2.19). In the case of very extreme weather conditions, contrast restoration techniques can be used to remove weather effects from images (Narasimhan and Nayar, 2003). Furthermore, in order to keep illumination variations as low as possible, the use of Light-Emitting Diode (LED) is recommended. The correction of radial distortion is not considered in this study. Radial distortion can be modeled using low order polynomials. Furthermore, selection of blending weights based on the sharpness of the captured images is more of interest for inspection purposes whereas in this study, the closeness of a given pixel to the center of the selected images is used to assign blending weights. Eventually, implementing all of the discussed algorithms in a computer language such 49 as C or C++ will dramatically decrease the computation time and will hasten the online usage of the proposed system. 
In this study, in order to initially match the keypoints an exact k-nearest neighbors approach is used. The usage of a k-d tree to find the approximate nearest neighbors will significantly improve the speed of the proposed system (Beis and Lowe, 1997). 50 Chapter 3 Components of Image-Based Defect Detection of Structures 3.1 Introduction Before dealing with image-based defect detection of structures, it is necessary to review some concepts that are useful in this regard. This chapter presents a detailed introduction to image preprocessing techniques (including mean filter, median filter, and averaging of multiple images), pattern recognition (including segmentation, feature extraction, and classification), and wavelet filter bank (and its usage in image processing), which will provide the reader with a set of concepts that are useful for structural defect detection. Furthermore, an evaluation of the aforementioned techniques is also presented. If the reader is an expert in the field of image processing and pattern recognition, he or she can skip this chapter. A quick review of preprocessing techniques is presented in Section 3.2, since preprocessing of the captured images may be a prerequisite for the application of other algorithms. In Section 3.3, a brief review of pattern recognition concepts is presented, and supervised and unsupervised classification algorithms are introduced. Classification plays an important role in differentiating defects from non-defective changes. Some important and useful classification techniques are 51 discussed in this section. Section 3.4 briefly reviews wavelet decomposition and reconstruction of images. 3.2 Preprocessing Preprocessing consists of a series of steps that prepare the image for further processing. These enhancement techniques including image smoothing, image sharpening, contrast modification, and histogram modification can be found in almost any Digital Image Processing book (Gonzalez and Wintz, 1987, Gonzalez and Woods, 1992, Gonzalez et al., 2004, Pratt, 2001). The purpose of image smoothing is to reduce noise in an image. Here, some practical and useful image smoothing techniques are introduced: 3.2.1 Neighborhood Averaging (Mean Filter) The average gray-level value of a neighborhood is replaced as the new value in the smoothed image. Although this technique is very simple, it will blur any sharp edges. To overcome this shortcoming, it is necessary to average the brightness values of only those pixels in the neighbor- hood that have similar brightness as the pixel which is being processed. The most important factor in this technique is the neighborhood and the assigned weights for averaging the values within the neighborhood. In fact, it is possible to write neighborhood averaging as a 2-D convolution by sliding a kernel over the gray scale image. This convolution could be written mathematically as: Q(i;j) = m 0 X k=m n 0 X l=n I(ik 1;jl 1)K(k +m + 1;l +n + 1); (3.1) whereI is the gray scale (M +m+m 0 )(N +n+n 0 ) image derived from the initialMN image by mirroring the border elements to create a larger matrix. In that way,Q will beMN as 52 well. Each value in the image matrix is the brightness value of the relevant pixel.K is the kernel withm +m 0 + 1 rows andn +n 0 + 1 columns. Figure 3.1 shows an example of a 5 5 matrix (image), convolved by a 3 by 3 kernel. This figure is useful for understanding the concept of 2-D discrete convolution in image processing. 
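In MATLAB, the neighborhood averaging of Eqn. (3.1) reduces to a single 2-D convolution; a minimal sketch, assuming a grayscale image I stored as a double matrix, is:

% Minimal sketch of neighborhood averaging (Eqn (3.1)) as a 2-D convolution.
% Note that conv2 with the 'same' option pads the border with zeros rather
% than mirroring it as described for Fig. 3.1.
K = ones(3) / 9;            % 3x3 averaging kernel with uniform weights
Q = conv2(I, K, 'same');    % smoothed image with the same size as the input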
In order to produce an output of the same size, the elements next to the border of the input are duplicated, as shown by the dashed lines. A 3 by 3 sliding window moves over the input, and to compute each output element, the values of the input elements located in the sliding window (which is centered at the same coordinate as the considered output element) are multiplied by the kernel elements and summed. The sliding window moves over the rows and columns of the input to construct the output. Convolution is also applied in the high-pass and low-pass filters of wavelet analysis, as discussed in Section 3.4.

A Gaussian smoothing kernel can be used as a suitable neighborhood averaging kernel. This kernel can be estimated from a 2D isotropic Gaussian distribution, as shown in Eqn. (3.2):

G(x,y) = \frac{1}{2\pi\sigma^2} \exp\left(-\frac{x^2 + y^2}{2\sigma^2}\right).    (3.2)

Since a digital image is a set of discrete pixels, a discrete approximation of the Gaussian distribution is needed. One can assume that the value of the Gaussian distribution for points farther than three standard deviations from the mean is zero. Figure 3.2 shows a continuous 2D Gaussian distribution with \sigma = 1. A typical 5 x 5 Gaussian kernel resulting from a discrete approximation of the 2D Gaussian distribution with a standard deviation equal to one is shown in Fig. 3.3. This convolution can be split into two 1D Gaussian kernels in the x and y directions: the image is first convolved along one direction, and the result is then convolved along the other direction. Horn (1986) has introduced a way to approximate continuous Gaussian functions with discrete filters.

Figure 3.1: An example of a 5 x 5 input image convolved by a 3 x 3 kernel.

3.2.2 Median Filter

It is possible to overcome the image blurring of the Neighborhood Averaging method by choosing a threshold; however, the threshold is usually based on extensive trial and error. For this reason, the gray level of each pixel can instead be replaced by the median value of the gray levels of the neighboring pixels. This simple nonlinear filter is referred to as a Median Filter. This technique is suitable when crack detection is of interest.

Figure 3.2: 2D Gaussian distribution with \sigma = 1.

Figure 3.3: A typical Gaussian kernel as an approximation of the Gaussian distribution with \sigma = 1.

3.2.3 Averaging of Multiple Images

Suppose that a noisy image J_k(i,j) is formed as follows:

J_k(i,j) = I(i,j) + \eta_k(i,j),    (3.3)

where I(i,j) is the original image and \eta_k(i,j) is noise. Assuming that the noise is uncorrelated between pixels and has a zero mean value, the average of multiple images can be written as

\bar{J}(i,j) = \frac{1}{N} \sum_{k=1}^{N} J_k(i,j).    (3.4)

It then follows that

E\{\bar{J}(i,j)\} = I(i,j),    (3.5)

and

\sigma_{\bar{J}(i,j)} = \frac{1}{\sqrt{N}}\, \sigma_{\eta(i,j)}.    (3.6)

Equations (3.5) and (3.6) show that, as the number of averaged images increases, the averaged image converges to the original image, and the standard deviation of the pixel values decreases. This is a suitable way to remove noise; however, the noisy images should be properly registered prior to averaging (Gonzalez and Wintz, 1987, Gonzalez and Woods, 1992).

3.3 Pattern Recognition

The aim of pattern recognition is to classify objects or patterns that have similar attributes into the same class. A scheme of a complete pattern recognition system is shown in Fig. 3.4. In this scheme, the first step is data collection. Data Sensing can be done by a digital camera in this case.
Data Sensing Segmentation Feature Extraction Classification Figure 3.4: A pattern recognition system scheme. 56 3.3.1 Segmentation Segmentation is a set of steps that isolates the patterns that can be potentially classified as the defined defect; however, sometimes it mistakenly picks out patterns that do not belong to the class of potential defects. The aim of segmentation is to reduce the extraneous data about patterns whose classes are not desired to be known. A good segmentation algorithm can help the classifier correctly classify patterns, and it can also affect the type of classifier used. Figure 3.5(a) shows a corroded column and its background, and Fig. 3.5(b) shows the corroded area segmented from the rest of the image. Some segmented portions of the image in Fig. 3.5(b) do not belong to the corroded area. Figure 3.5(c) shows the classification of pixels into three classes using thek-means classifier based on the RGB color vector of each pixel. Pixels with the same gray levels (black, white, or gray) belong to the same class. 3.3.2 Feature Extraction After segmenting the patterns of interest, it is time to assign them a set of finite values representing quantitative attributes or properties called features. These features should represent the important characteristics that help identify similar patterns. The process of selecting these suitable attributes is called Feature Extraction. According to Hogg (1993), there are five main factors for visual image inspection used by experienced human operators: intensity (a spectral feature), texture (a local spatial feature), size, shape, and organization. In automatic classification of patterns or objects in an image, the spectral and textural attributes are used as features (Sinha et al., 2003). It is possible to assign a feature vector to each pattern in which the elements of the vector are quantitative values representing the extracted features of the pattern. This means that anM- dimensional feature space can be defined where each axis in this space represents a feature and each pattern is one point. It is better if the coordinates are orthogonal in the feature space (it is 57 (a) (b) (c) Figure 3.5: (a) Corroded column and its background, (b) segmentation of the corroded-like area, (c) classification of pixels using thek-means classifier into three classes based on the RGB color vector of each pixel. preferred that the features are independent and also orthogonal). In order to lower the feature vec- tor dimension, it is possible to map the principal features of a pattern from a higher dimensional space to a lower dimensional space by means of a mapping transformation such as discrete co- sine transformation, Fourier transformation, or principal components analysis (Karhunen-Loeve transform) (Sinha et al., 2003). Principal Components Analysis (PCA) PCA is a linear transformation that keeps the subspace with the largest variance and it needs more computation with respect to the other specified mapping transformations. A brief description of the PCA algorithm, which is sometimes called the covariance method, is presented here. The goal 58 of this algorithm is to lower the dimensionality of a given data setX fromM toL whereL<M. Let’s call this new data setY as shown in Eqn. (3.7): Y = KLTfXg; (3.7) whereX is anMN matrix consisting ofN data column vectorsX i (each withM dimensions), Y is anLN matrix and KLT stands for Karhunen-Loeve transform. 
MatrixC is anMM matrix calculated as the covariance matrix ofB, whereB is the normalized form ofX, as shown in Eqn. (3.8): B =Xuh: (3.8) In this equationu is the column vector ofM1, where its elements represent the mean along each row in matrixX, andh is a row vector of 1’s with the length ofN. The next step is to compute the eigenvalues and eigenvectors of matrixC.D is defined as a diagonal eigenvalue matrix, in which each of its diagonal elements i is an eigenvalue of matrix C. V is the eigenvector matrix in which each of its columns is an eigenvector ofC. Since eigenvalues and eigenvectors are paired, they can be sorted based on the magnitude of the eigenvalues from the largest value to the smallest one. D = 0 B B B B B B B B B @ 1 0 ::: 0 0 2 ::: 0 . . . . . . . . . . . . 0 0 ::: M 1 C C C C C C C C C A MM ; (3.9) 59 V = 0 B B B B B B B B B B B B @ 0 B B B B B B B B B B B B @ e 11 e 12 e 13 . . . e 1M 1 C C C C C C C C C C C C A 0 B B B B B B B B B B B B @ e 21 e 22 e 23 . . . e 2M 1 C C C C C C C C C C C C A ::: 0 B B B B B B B B B B B B @ e M1 e M2 e M3 . . . e MM 1 C C C C C C C C C C C C A 1 C C C C C C C C C C C C A MM : (3.10) Cumulative energy content corresponding to each eigenvalue i is defined as the summation of all eigenvalues from 1 toi. The firstL columns of the eigenvector matrixV are selected as matrixH. When the cumulative energy content for theL th eigenvalue is more than a specified threshold, the rest of the eigenvalues can be truncated andH is built as defined above. MatrixQ could be defined as the result of dividing each element of matrixB by the corresponding member of matrixS, where matrixS is defined in Eqn. (3.12): Q ij = B ij =S ij ; (3.11) S = sh; (3.12) wheres is a column vector whosei th value is the square root of thei th diagonal element ofC, andh is a row vector of 1’s of the lengthM. Finally, theLN matrix of projected vectors is computed as: Y =H y Q; (3.13) wherey is the conjugate transpose operator. 3.3.3 Classification The last step in a pattern recognition system is Decision Making or Classification. The feature vectors previously extracted for each pattern are inputted into the appropriate classifier, which outputs the classified patterns. Figure 3.6 shows the clustering of 7776 pixels of Fig. 3.5 (c) 60 plotted in a 3-dimensional feature space, the RGB color space. There are two types of classifiers as described below. 0 50 100 150 200 250 300 0 50 100 150 200 250 0 50 100 150 200 250 Green Red Blue Class 1 Class 2 Class 3 Figure 3.6: Clustering of pixels of figure 3.5(c) in the Red-Green-Blue feature space. Supervised Classification In this type of classification, a set of feature vectors belonging to the known classes is used to train the classifier. The goal of using training set is to find a relation between the extracted features of the same-class patterns and predict the class of a valid feature vector when its class is unknown. Choosing an appropriate training set is essential to obtaining reasonable and accurate results from supervised classification. k-Nearest Neighbor classifier is an example of supervised classifiers. In this technique, an un- classified pattern is classified based on the majority of itsk nearest neighbors in the feature space. The neighbors are the patterns in training set. The distance between patterns in the feature space is usually Euclidian distance. This classifier is very sensitive to noise, however incrementingk decreases its sensitivity to noise. The appropriate value ofk depends on the type of data. 
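For illustration, the k-nearest-neighbor rule just described can be sketched with scikit-learn (an assumed dependency); the feature vectors and labels below are random placeholders standing in for a real training set, and the choice k = 5 is only an example.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

# Hypothetical training set: feature vectors (e.g., RGB values) with known classes.
X_train = np.random.rand(300, 3)
y_train = np.random.randint(0, 3, size=300)

# Five nearest neighbours with Euclidean distance; larger k reduces noise sensitivity.
classifier = KNeighborsClassifier(n_neighbors=5, metric='euclidean')
classifier.fit(X_train, y_train)

X_unknown = np.random.rand(10, 3)           # unclassified patterns
predicted = classifier.predict(X_unknown)   # majority vote among the 5 neighbours
```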
If the 61 population of the training set grows enough, the nearest neighbor in the training set represents the class of the unknown pattern. A Support Vector Machine or an Artificial Neural Network can be used as a supervised classification tool in many classification problems. Further details are given by Duda et al. (2001). Unsupervised Classification There is no training set in an unsupervised classification system. Instead, there is a set of non- classified patterns. The goal is to classify or cluster different patterns of a given data set. This technique is very useful in cases where obtaining an appropriate training set is time consuming or costly. In some cases, a large amount of data can be clustered by an unsupervised classifier, and the class of each cluster can be determined using a supervised classification (Duda and Hart, 1973, Duda et al., 2001). A very common unsupervised classifier isk-means classifier. The goal of this classifier is to cluster patterns into k classes (k is known). In order to achieve this goal, k feature vectors of given patterns are selected randomly as the initial mean of eachk class. Each remaining pattern is classified as the class with nearest mean vector to it. The distance is usually Euclidean distance. After clustering the data, the mean vector of each class is computed. The patterns are clustered again based on the nearest mean vector. These steps are repeated until the mean vectors do not change or specific number of iterations is reached. The mean value is calculated from Eqn. (3.14): i = 1 N i X j2C i X j ; (3.14) whereC i is thei th cluster,N i is the number of members belonging to thei th cluster, andX j is the feature vector of thej th pattern. The negative aspect of this classifier is the predefined value of k. For automatic clustering of patterns of an image, it is important to find the optimum value of k for that image. Porter 62 and Canagarajah (1996) proposed a way to automatically detect the true cluster number when the number of the clusters is unknown. Within-cluster distance is defined as the sum of all distances between feature vectors and their corresponding cluster mean vectors. Within-cluster distance can be used as a criterion to select the true clustering number. Within-clustering distanceD k can be defined as indicated by Eqn. (3.15), whered( i ;X j ) represents the distance between the mean vector of thei th cluster and feature vectorX j : D k = 1 k X m=1 N m k X i=1 X j2C i d( i ;X j ): (3.15) The maximum value for within-cluster distance occurs at k = 1; as k increases, D k rapidly decreases until it reaches the true cluster number, after whichD k decreases very slightly or con- verges to a constant value. When the first difference of the within-cluster distances is small enough, the true cluster number is found; however, this method requires a threshold. On the other hand, the rapid decrease ofD k before the true cluster number and its gradual decrease after the true cluster number means that the gradient of the “within-cluster distance” versus “cluster number” graph has a significant change at the true cluster number. Based on this, the true cluster number is the one that has the maximum value for the second difference of with-in cluster distance (Porter and Canagarajah, 1996). 3.3.4 Neuro-fuzzy Classification Neuro-fuzzy systems are promising approaches used by Chae (2001) and Sinha and Fieguth (2006a) to detect defects, including cracks in sewer pipe systems. 
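Before continuing with neuro-fuzzy classification, the k-means procedure and the cluster-number criterion described above can be made concrete with the short sketch below. It uses scikit-learn's KMeans (an assumed dependency), whose inertia (the sum of squared distances of samples to their nearest cluster mean) is used here as a stand-in for the within-cluster distance D_k of Eqn. (3.15); the pixel data are placeholders.

```python
import numpy as np
from sklearn.cluster import KMeans

def best_cluster_number(X, k_max=8):
    """Pick the cluster number whose within-cluster distance curve has the largest
    second difference (Porter and Canagarajah, 1996). X is (n_samples, n_features)."""
    D = [KMeans(n_clusters=k, n_init=10, random_state=0).fit(X).inertia_
         for k in range(1, k_max + 1)]
    # Second difference D[k-1] - 2*D[k] + D[k+1], evaluated for k = 2 .. k_max-1.
    second_diff = [D[k - 1] - 2 * D[k] + D[k + 1] for k in range(1, k_max - 1)]
    return int(np.argmax(second_diff)) + 2   # map list index back to a cluster number

pixels = np.random.rand(7776, 3)             # e.g., RGB vectors as in Fig. 3.5(c)
k = best_cluster_number(pixels)
labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(pixels)
```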
The performance of the neuro-fuzzy systems proposed in these two researches is better than the regular neural networks and other classical classification algorithms (Chae, 2001, Sinha and Fieguth, 2006a). Kumar and Taheri (2007) used neuro-fuzzy expert systems in their automated interpretation system for 63 pipeline condition assessment. Neuro-fuzzy systems simultaneously benefit from the data im- precision tolerance (vague definitions) of fuzzy logic systems and the tolerance of the neural networks for noisy data. The easily comprehendible linguistic terms and if-then rules of the fuzzy systems and the learning capabilities of the neural networks are fused into a neuro-fuzzy system (Lee, 2005). Different fusions of neural networks and fuzzy systems, which lead to neuro-fuzzy expert systems, are provided by Lee (2005). ANFIS: Adaptive Neuro-Fuzzy Inference System Adaptive Neuro-Fuzzy Inference System (ANFIS) was first introduced by Jang (1993). ANFIS is a class of adaptive networks with equal fuzzy inference system functionality (Jang et al., 1997). In adaptive neuro-fuzzy systems, the parameters of the fuzzy inference are obtained by an algorithm that can learn these parameter values from the input-output data (training data). Here, the ANFIS structure representing the first order Sugeno fuzzy model will be introduced. Figure 3.7 (a) and (b) show a first-order Sugeno fuzzy inference system with two inputs and its equivalent ANFIS architecture, respectively. The set of rules equivalent to these two systems are: Ifx isA 1 andy isB 1 , thenf 1 =p 1 x +q 1 y +r 1 , Ifx isA 2 andy isB 2 , thenf 2 =p 2 x +q 2 y +r 2 . ANFIS consists of five layers as shown in Fig. 3.7 (b). The description of these layers is presented below based on the book published by Jang et al. (1997). Note that the output of thei th node in layerl is presented asO l;i . Layer1: Fuzzification Layer Every node in this layer is adaptive. The input to this layer are the input variables (x andy). The node functionO 1;i is the membership function of the fuzzy sets (A 1 ,A 2 ,B 1 , orB 2 ). For the example shown in Fig. 3.7 (b),O 1;i = A i (x) fori = 1; 2, andO 1;i = B i2 (y) 64 y y 2 1 2 1 2 1 2 1 layer 5 layer 4 layer 3 layer 2 layer 1 y x y x B B A A f 2 w 2 1 w 1 f w 2 1 w w 2 1 w x f X X Y Y AB AB x 1 w w 2 1 f f 2 = p x +q y +r = p x +q y +r = w 2 w 1 2 w 2 f f 1 w 1 + 2 w 2 f f 1 w 1 f= + + (a) (b) 2 1 2 1 2 1 (a) y 2 1 2 1 layer 5 layer 4 layer 3 layer 2 layer 1 y x y x B B A A f 2 w 2 1 w 1 f w 2 1 w w 2 1 w x f (b) (b) Figure 3.7: (a) First-order Sugeno fuzzy inference system with two inputs and two rules, (b) the equivalent ANFIS architecture (Jang et al., 1997). fori = 3; 4, where is the membership function of the specified fuzzy set. A generalized bell shape membership function is used in Fig. 3.7 (a): A i (x) = 1 1 +j xc i a i j 2b i ; (3.16) where a i , b i , and c i are premise parameters that are to be determined during the hybrid learning process of ANFIS. Other types of membership functions, including triangular, trapezoidal, Gaussian, sigmoidal, or left-right memberships, can be used as the node func- tion in this layer. Layer2: Rule Layer The nodes in this layer are all fixed. The output of each node is the firing strength of the 65 rule. In the above example, the output of each node in this layer is the product of the input signals: O 2;i =w i = A i (x) B i (y); i = 1; 2: (3.17) Alternatively, any other T-norm that specifies the AND function can be used as the node function in this layer. 
Layer3: Normalization Layer The nodes are all fixed in this layer. Thei th rule’s firing strength is normalized with respect to the sum of all rules’ firing strength at thei th node: O 3;i =w i = w i P i w i ; (3.18) wherew i is thei th normalized firing strength. Layer4: Defuzzification Layer The nodes are adaptive in this layer. The node functions have the following form: O 4;i =w i f i =w i (p i x +q i y +r i ); (3.19) wherep i ,q i , andr i are the adaptive consequent parameters determined during hybrid learn- ing of ANFIS. Layer5: Summation Layer The single fixed node in this layer computes the final output as the summation of all the input signals: O 5;1 = X i w i f i = P i w i f i P i w i : (3.20) 66 3 2 2 1 1 3 3 2 1 3 2 1 9 6 3 8 5 2 7 4 1 9 8 7 6 5 4 3 2 1 1 1 Y X Y X B B B A A A premise parameters consequent parameters x y f A A A B B B x y (a) (b) Figure 3.8: (a) Two-input Sugeno-type ANFIS architecture with nine rules, (b) nine fuzzy regions corresponding to the partitioned input space (Jang, 1993). Figure 3.8 (a) and (b) show a two-input Sugeno-type ANFIS architecture with nine rules and the nine fuzzy regions corresponding to the partitioned input space, repectively. ANFIS uses a two-pass learning cycle called the hybrid learning algorithm. In the forward pass, the premise parameters are set as fixed and the algorithm uses the least-squares estimator to compute the consequent parameters on layer 4. In the backward pass, the consequent parameters are fixed and the premise parameters are updated by a back-propagation gradient descent algo- rithm. If the size of the training set is not big enough to convey enough information about the target system, fixed human-determined membership functions can be imposed on the system to overcome the lack of knowledge about the target system (Jang et al., 1997). 3.4 Wavelet Filter Bank The 2D Discrete Wavelet Transform (DWT) of images (Mallat, 1989) is a useful technique in many image processing problems, and there are many papers published on this subject. Wavelet transform provides a remarkable understanding of the spatial and frequency characteristics of an image (Gonzalez et al., 2004). Since the low frequencies dominate most images, the ability 67 of wavelet transform to repetitively decompose in low frequencies, makes it popular for many image analysis tasks (Porter and Canagarajah, 1996). In this section, the decomposition and reconstruction of images using wavelet transforms are introduced. Figure 3.9 represents a schematic decomposition procedure of an image by two-dimensional DWT. The input to this system is the initial image, i th approximation; h ' andh are low-pass and high-pass decomposition filters, respectively. The words “Columns” and “Rows” underneath these filters indicate whether the columns or rows of the input should be convolved with the decomposition filter. Since one-step decomposition of the input with a low-pass and a high- pass filter yields almost a doubled amount of data, a down-sampling (indicated by 2#) keeps the amount of the data almost the same size as the input. The words “Columns” and “Rows” beneath the down-sampling boxes shows that the down-sampling should take place either over columns or rows (which could be done simply by keeping the even-indexed columns or rows). The result is a (i + 1) th approximation, which includes the low frequency characteristics of the input, and it is the most similar output to the input image. 
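One decomposition step of the filter bank in Fig. 3.9 can be reproduced with the PyWavelets package (an assumed dependency, not part of the original implementation). Note that the eight-tap Daubechies filters used later in this section (Eqn. (3.21)) correspond to what PyWavelets names 'db4'; the image below is a random placeholder. The call returns the next approximation together with the three detail images discussed next.

```python
import numpy as np
import pywt

image = np.random.rand(128, 128)             # stand-in for a grayscale image

# One decomposition step: low-pass/high-pass filtering along rows and columns,
# followed by down-sampling, yielding the (i+1)th approximation and the
# horizontal, vertical, and diagonal detail coefficients.
cA, (cH, cV, cD) = pywt.dwt2(image, 'db4')

# Repeating the step on the approximation gives a multi-level decomposition,
# e.g., a two-stage decomposition as in Fig. 3.11.
coeffs = pywt.wavedec2(image, 'db4', level=2)

# The inverse transform (Fig. 3.12) reconstructs the image from the coefficients.
reconstructed = pywt.waverec2(coeffs, 'db4')
print(np.allclose(image, reconstructed[:image.shape[0], :image.shape[1]]))
```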
There will be three horizontal, vertical, and diagonal details which include the details of the input in the specified directions. These outputs are the wavelet transform coefficients. Since the (i + 1) th approximation has the most characteristics of the input image, it can be fed to the decomposition system as an input, and decomposition can take place repeatedly. Figure 3.10 shows a 2D wavelet tree for a three-stage decomposition of an image. As the order of decomposition increases, more details will be decomposed from the image. A two-stage decomposition of a truss model is shown in Fig. 3.11. One can see the horizon- tal, vertical, and diagonal details in the two-level decomposition of a truss model in this figure, while the 2 nd approximation contains most of the information about the original image. These approximation coefficients and detail coefficients can be used as features for textural analysis of an image. This will be discussed in Section 5.2. 68 Figure 3.9: 2D discrete wavelet transform decomposition scheme. Figure 3.10: 2D discrete wavelet transform decomposition scheme. The approximation compo- nent of the (i) th decomposition stage can be decomposed to (i + 1) th approximation, horizontal, vertical, and diagonal details. The two-dimensional Inverse Discrete Wavelet Transform (IDWT) can be used to reconstruct the initial image from the approximation coefficients and detail coefficients as shown in Fig. 3.12. The notation 2" indicates up-sampling over rows or columns, which can be done by inserting zeros at odd-indexed rows or columns. Low-pass and high-pass reconstruction filters are denoted ash 0 ' andh 0 . The decomposition and reconstruction filters are derived from the scaling function' and the mother wavelet of a specific wavelet family. The decomposition filters in Fig. 3.11 are based on Daubechies (1992) wavelet family of order 8. The coefficients for this wavelet family in the 69 Figure 3.11: Two-stage DWT decomposition of a truss image. Figure 3.12: 2D discrete wavelet transform reconstruction scheme. 70 form of column vectors (as shown in Eqn. (3.21)) and their transposes can be used for column and row convolutions, respectively. h ' = 2 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 4 0:23038 0:71485 0:63088 0:02798 0:18703 0:03084 0:03288 0:01060 3 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 5 ; h = 2 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 4 0:01060 0:03288 0:03084 0:18703 0:02798 0:63088 0:71485 0:23038 3 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 5 : (3.21) Repetitive wavelet decomposition of an image followed by elimination of its details using a threshold and reconstruction of the edited data, leads to image smoothing, while elimination of the approximations will lead to edge detection. The latter characteristic can be used for crack extraction. Wavelet transform is also used as an image compression tool (DeV ore et al., 1992, Villasenor et al., 1995), and by setting an appropriate threshold its performance is confirmed as a noise removal technique (Chang et al., 2000). 3.5 Summary and Conclusions The concepts that are useful for image-based defect detection of structures are introduced in this chapter. These concepts include image processing and pattern recognition methods. Prepro- cessing consists of a series of steps that prepare the image for further processing. Few image preprocessing techniques including mean filter, median filter, and averaging of multiple images are reviewed in this chapter. 
71 The aim of pattern recognition is to classify the objects or patterns that have similar attributes into the same class. Different aspects of pattern recognition including segmentation, feature ex- traction, principal component analysis, supervised classification, unsupervised classification, and Adaptive Neuro-Fuzzy Inference System (ANFIS) are described in this chapter. Neuro-fuzzy sys- tems simultaneously benefit from the tolerance of fuzzy logic systems for data imprecision and the tolerance of neural networks for noisy data. The easily comprehendible linguistic terms and if-then rules of the fuzzy systems and the learning capabilities of the neural networks are fused into a neuro-fuzzy system. These capabilities make the neuro-fuzzy expert systems appropriate for pattern recognition and defect classification. A 2D discrete wavelet filter bank is introduced and several applications of wavelet transform coefficients in edge detection, crack detection, image enhancement, and compression, as well as corrosion detection are discussed. Wavelet transform provides a remarkable understanding of the spatial and frequency characteristics of an image. Since low frequencies dominate most images, the ability of wavelet transform to repetitively decompose in low frequencies makes it popular for many image analysis tasks. The subjects covered in this chapter provide an adequate background for the automatic image- based defect detection methodology discussed in the following chapters. 72 Chapter 4 Adaptive Vision-Based Crack Detection and Quantification by Incorporating Depth Perception Using 3D Scene Reconstruction 4.1 Introduction In the past two decades, efforts have been made to implement image-based technology in crack detection methods. An automatic crack detection procedure in welds based on magnetic particle testing (Coffey, 1988) was introduced by Ho et al. (1990). This method can only be used on ferromagnetic materials. First, the testing surface is sprayed with white paint to reduce the initial noise of subsequently captured images. Next, a magnetic field is applied to the weld. Then magnetic ink made of small magnetic particles suspended in oil is sprayed over the testing surface. The change of flux density at the crack causes the magnetic particles to trace out the shape of the crack on the weld surface. Lastly, an image of the prepared surface is captured and cracks are detected by means of the Sobel edge detection operator (Duda and Hart, 1973, Gonzalez et al., 2004) and by implementing a boundary tracing algorithm. The results were satisfactory as 73 reported by Ho et al. (1990), but clearly this technique has drawbacks since a preprocessing step is required. Tsao et al. (1994) composed image analysis and expert system modulus to detect spalling and transverse cracks in pavements. The overall accuracy of the system for detecting spalling and transverse cracks was reported to be 85% and 90%, respectively (Chae, 2001). Kaseko et al. (1994) and Wang et al. (1998) used the image processing and neural network techniques to detect defects in pavements. Siegel and Gunatilake (1998) developed a remote visual inspection system of aircraft surfaces. To detect cracks, their proposed algorithm detects rivets, since cracks propagate on rivet edges. Multi-scale edge detection is used to detect the edges of small defects at small scales and the edges of large defects at large scales. By tracing edges from high scale to low scale, it is possible to define the propagation depth of edges. 
Using other features based on wavelet transformation (Abtoine et al., 2004, Prasad and Iyenger, 1997) and a trained back-propagation neural network (Duda et al., 2001) cracks can be classified from other defects such as scratches. Corroded regions can also be detected by defining features based on 2D discrete wavelet transformation of the captured images and using a neural network classifier (Siegel and Gunatilake, 1998). Nieniewski et al. (1999) developed a visual system that could detect cracks in ferrites. A morphological detector based on top-hat transform (Salembier, 1990) detects irregular changes of brightness, which could lead to crack detection. k-Nearest Neighbors (Duda et al., 2001) is used as a classifier to classify cracks from grooves. The outcome of this study is very promising, and this technique is quite robust despite the presence of noise, unlike other edge detection operators used for crack extraction. Moselhi and Shehab-Eldeen (2000) used the image analysis techniques and the neural network to automatically detect and classify the defects in sewer pipes. The accuracy rate of the proposed algorithm is 98.2% as reported by the authors. 74 Chae (2001) proposed a system consisting of image processing techniques along with the neural networks and fuzzy logic systems for automatic defect (including cracks) detection of sewer pipelines. Benning et al. (2003) used photogrammetry to measure the deformations of reinforced con- crete structures. A grid of circular targets is established on the testing surface. Up to three cameras capture images of the surface simultaneously. The relative distances between the centers of adjacent targets make it possible to monitor the evolution of cracks. Abdel-Qader et al. (2003) analyzed the efficacy of different edge detection techniques in iden- tifying cracks in concrete pavements of bridges. They concluded that the Fast Harr Transform (FHT), which is a wavelet transform with mother wavelet of Harr, has the most accurate crack detection capability in contrast with Fast Fourier transform, Sobel, and Canny edge detection operators (Alageel and Abdel-Qader, 2002, Bachmann et al., 2000). A study on using computer vision techniques for automatic structural assessment of under- ground pipes has been done by Sinha et al. (2003). The algorithm proposed by Sinha et al. (2003) consists of image processing, segmentation, feature extraction, pattern recognition, and a proposed neuro-fuzzy network for classification. Giakoumis et al. (2006) detected cracks in digitized paintings by thresholding the output of the morphological top-hat transform. Sinha and Fieguth (2006b) detected defects in underground pipe images by thresholding the morphological opening of the pipe images using different structuring elements. Abdel-Qader et al. (2006) proposed algorithms based on Principal Component Analysis (PCA) to extract cracks in concrete bridge decks. Yu et al. (2007) introduced an image-based semi-autonomous approach to detect cracks in concrete tunnels. Yamaguchi and Hashimoto (2006) proposed a crack detection approach based on a percolation model and edge information. Chen et al. (2006) introduced a semi-automatic measuring system for concrete cracks using multi-temporal images. 75 Recently, Fujita and Hamamoto (2009) proposed a crack detection method in noisy concrete surfaces using probabilistic relaxation and a locally adaptive thresholding. Jahanshahi et al. 
(2009) surveyed and evaluated several crack detection techniques in conjunction with realistic infrastructure components. 4.1.1 Contribution In all of the above studies, many important parameters (e.g., camera-object distance) are not con- sidered or assumed to be constant. In practical circumstances, the image acquisition system often cannot maintain a constant focal length, resolution, or distance to the object under inspection. In the case of nuclear power plants, for instance, the image acquisition system needs to be located a significant distance from the reactor site. To detect cracks of a specific thickness, many of the pa- rameters in these algorithms need to be adaptive to the 3D structure of a scene and the attributes of the image acquisition system; however, no such study has been reported in the open litera- ture. The proposed approach in this study gives a robotic inspection system the ability to detect and quantify cracks in images captured from any distance to the object, with any focal length or resolution. In human vision, depth perception allows a person to estimate the size of an object based on the distance to the object. In this study, a contact-less crack detection and quantification approach based on depth perception is introduced to segment crack-like patterns. First, several pictures of a scene are captured from different views. By solving the Structure from Motion (SfM) problem (Snavely, 2008), the sparse structure of a scene as well as the camera’s position, orientation, and internal parameters for each view are determined. By scaling the reconstructed sparse 3D model of a scene, the depth perception is obtained. Subsequently, a morphological crack segmentation operator is introduced. The structuring element parameter for this operator is automatically ad- justed based on the camera focal length, object-camera distance, camera resolution, camera sensor 76 size, and the desired crack thickness. Appropriate features are extracted and selected for each seg- mented pattern using the Linear Discriminant Analysis (LDA) (Fisher, 1936) approach. A trained Neural Network (NN), a Support Vector Machine (SVM), and a nearest-neighbor classifier are used to classify real cracks. The performance of these classifiers in the problem of interest are discussed. Finally, a multi-scale approach is introduced to obtain a crack map. The proposed methodology is also effective for other pattern analysis purposes (e.g., texture analysis) which are not discussed in this paper. Moreover, a contact-less crack thickness quantification procedure is introduced, which has been lacking in previous NDT approaches. The proposed approach is also based on the depth per- ception of the scene. First, a novel method is introduced to extract the tangent and the thickness orientation at each crack centerline. Then, for each centerline pixel, the pixels in the crack map that are aligned with the corresponding thickness orientation are counted in the horizontal and ver- tical directions. Finally, a method to compensate for the perspective error is introduced. Validation tests are performed to evaluate the capabilities, as well as the limitations, of this methodology. Guidelines are presented for optimizing the acquisition and processing of collected pho- tographs so as to enhance the quality and reliability of the crack detection and quantification approach, allowing the capture of even the slightest damage or deterioration (e.g., cracks), which are routinely encountered in realistic field applications. 
4.1.2 Scope

Section 4.2 discusses the interaction of the different image acquisition parameters. In Section 4.3, the 3D scene reconstruction is introduced. Section 4.4 is dedicated to crack detection. The crack quantification methodology is described in Section 4.5. Experimental results and discussion are presented in Section 4.6. Section 4.7 includes the summary.

Figure 4.1: The geometric relation between image acquisition parameters of a simple pinhole camera model.

4.2 Image Acquisition System

Using a simple pinhole camera model, the relation between the different image acquisition parameters is shown in Eqn. (4.1):

SF = \frac{WD}{FL} \cdot \frac{SS}{SR} \cdot n, (4.1)

where SF (mm) is the size of a pattern (e.g., crack thickness) represented by n pixels in an image, WD (mm) is the working distance (camera-object distance), FL (mm) is the camera focal length, SS (mm) is the camera sensor size, and SR (pixels) is the camera sensor resolution. The geometric relation between these parameters is shown in Figure 4.1. The camera sensor size can be obtained from the manufacturer, and the camera sensor resolution is known from the image size. Measurements of the working distance and the camera focal length are then needed to quantify an n-pixel feature; these two parameters can be estimated as described below. In this study, the above equation is used to optimize the acquisition and processing parameters so that cracks can be detected and quantified reliably. Based on our experiments, we recommend selecting the image acquisition system parameters such that the thickness of the thinnest crack of interest is represented by six pixels in an image.

Guidelines for Image Acquisition System Specifications

For an accurate measurement, a minimum of two pixels should be used to represent the smallest feature that needs to be detected (i.e., n = 2 in Eqn. (4.1)). Suppose that the size of the minimum detectable crack is 0.2 mm and the working distance is 25 m. Based on Eqn. (4.1), Figs. 4.2 and 4.3 show the relation between the focal length, the sensor width, and the sensor width resolution for 25 m and 35 m working distances, respectively. Note that the camera orientation is assumed to be perpendicular to the object's plane. Since the working distance is large, a telephoto lens is needed; alternatively, a combination of a camera, lens, and teleconverter can be used. By selecting a camera, one obtains the sensor size and resolution and can consequently find the appropriate focal length (which is used to select the telephoto lens) from Figs. 4.2 or 4.3. Conversely, given the lens focal length, one can use the same figures to select an appropriate camera for detecting a 0.2 mm crack. Finally, based on the selected image acquisition system, the field of view (the area under inspection that the camera can acquire) should be computed to check that it is reasonable. Note that the greater the sensor size, the greater the field of view for a constant focal length and working distance. It is worth mentioning that, in our experience, a crack should be represented by at least six pixels in order to quantify its thickness with high accuracy; in that case, Figs. 4.2 and 4.3 can be regenerated using Eqn. (4.1) with n = 6. Now, we show how to select the image acquisition system.
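The selection procedure can also be scripted directly from Eqn. (4.1), as in the minimal sketch below; the worked example that follows steps through the same numbers by hand. The function names and the rounding are illustrative, not part of the original implementation.

```python
def focal_length_mm(wd_mm, crack_mm, n_pixels, sensor_mm, resolution_px):
    """Solve Eqn. (4.1) for the focal length: SF = (WD/FL) * (SS/SR) * n."""
    return wd_mm * sensor_mm * n_pixels / (crack_mm * resolution_px)

def field_of_view_mm(wd_mm, fl_mm, sensor_mm):
    """Field of view spanned by the sensor at the given working distance."""
    return sensor_mm * wd_mm / fl_mm

# Numbers from the example that follows (25 m working distance, 0.2 mm crack, n = 2).
fl = focal_length_mm(wd_mm=25_000, crack_mm=0.2, n_pixels=2,
                     sensor_mm=22.3, resolution_px=5184)
print(round(fl))                              # about 1.07 m: a super-telephoto lens
print(field_of_view_mm(25_000, 1200, 22.3))   # horizontal field of view, ~465 mm
print(field_of_view_mm(25_000, 1200, 14.9))   # vertical field of view, ~310 mm
```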
Assume that the selected camera is a Canon EOS 7D and that the following parameters are given:

Working distance = 25 m
Minimum crack size = 0.2 mm
Image sensor size: 22.3 × 14.9 mm
Maximum resolution: 5184 × 3456 pixels

Figure 4.2: Resolution, focal length and sensor size relation for 25 m working distance and 0.2 mm crack size.

The focal length needed to extract a 0.2 mm crack from a distance of 25 m, from Eqn. (4.1), will be:

Focal Length = 2 × (25,000 mm × 22.3 mm) / (0.2 mm × 5184 pixels) ≈ 1074 mm

Alternatively, Figure 4.2 can be used to obtain the same result. Thus, a super-telephoto lens with a focal length of roughly 1074 mm is needed. One choice could be the Canon EF 1200 mm f/5.6L USM, which has a focal length of 1200 mm. This lens is a prime lens; note that a prime lens is either a photographic lens whose focal length is fixed or the primary lens in a combination lens system. Consequently, the field of view will be:

Field of View (height) = 14.9 mm × (25,000 mm / 1,200 mm) ≈ 310 mm
Field of View (width) = 22.3 mm × (25,000 mm / 1,200 mm) ≈ 465 mm

Figure 4.3: Resolution, focal length and sensor size relation for 35 m working distance and 0.2 mm crack size.

Figure 4.4: Schematic overview and components of the SfM problem.

4.3 3D Scene Reconstruction

To create depth perception, the 3D structure of a scene has to be recovered. First, several overlapping images of the object are captured from different views. The SfM approach aims to optimize a 3D sparse point cloud and the viewing parameters simultaneously from a set of geometrically matched keypoints taken from multiple views. Figure 4.4 shows the schematic overview of the SfM problem. The SfM system developed by Snavely et al. (2006) is used in this study. In this system, SIFT keypoints (Lowe, 2004) are detected in each image and then matched between all pairs of images. The RANSAC algorithm (Fischler and Bolles, 1981) is used to exclude outliers. These matches are used to recover the focal length, camera center and orientation, and radial lens distortion parameters (two parameters corresponding to a 4th-order radial distortion model (Bouguet, 2008) are estimated) for each view, as well as the 3D structure of the scene. This large optimization process is called bundle adjustment. Figure 4.5 shows an example of the SfM problem and the 3D scene reconstruction from sixteen images.

Since measuring the camera-object distance is not always an easy or practical task, the reconstructed 3D point cloud and camera locations from the SfM problem are used to estimate the working distance; however, the SfM problem estimates only the relative 3D point coordinates and camera locations. By knowing how much the camera center has moved between just two of the views, the reconstructed 3D points and camera locations can be scaled. To obtain the absolute camera-object distance, a plane is fitted to the 3D points seen in the view of interest; this can be done using the RANSAC algorithm to exclude the outlier points.
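A compact sketch of this scaling step and of the plane-based working-distance estimate (completed in the paragraph that follows) is given below. It is a minimal NumPy illustration under simplifying assumptions: a least-squares plane fit stands in for the RANSAC fit used in this study, the toy point cloud is synthetic, and the function names are hypothetical.

```python
import numpy as np

def scale_reconstruction(points, cam_centers, i, j, known_baseline):
    """Scale the relative SfM output so that the distance between camera centers
    i and j equals the measured baseline (e.g., in millimetres)."""
    s = known_baseline / np.linalg.norm(cam_centers[i] - cam_centers[j])
    return points * s, cam_centers * s

def working_distance(points, cam_center, cam_direction):
    """Fit a plane to the (inlier) 3D points and intersect the camera's viewing
    ray with it; the distance to the intersection is the working distance."""
    centroid = points.mean(axis=0)
    _, _, vt = np.linalg.svd(points - centroid)   # plane normal = last singular vector
    normal = vt[-1]
    d = cam_direction / np.linalg.norm(cam_direction)
    t = np.dot(centroid - cam_center, normal) / np.dot(d, normal)
    return abs(t)                                  # distance along the viewing ray

# Toy usage with placeholder geometry (real inputs come from the SfM output).
pts = np.random.rand(500, 3) * [100.0, 100.0, 0.5] + [0.0, 0.0, 300.0]
wd = working_distance(pts, cam_center=np.zeros(3), cam_direction=np.array([0.0, 0.0, 1.0]))
print(round(wd, 1))   # close to 300 for this synthetic, nearly planar cloud
```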
By retrieving the equation of the fitted plane, one can find the intersection between the camera orientation line passing through the camera center and the fitted plane. The distance between this intersection point and the camera center is computed as the working distance. Furthermore, the estimated focal lengths from the SfM problem are in pixels. To scale these quantities, the estimated focal length for each view is scaled by the ratio of the sensor size to the sensor resolution. This means that Eqn. (4.1) can be simplified to: SF = WD FL n; (4.2) whereFL is in pixels. Note that if scene reconstruction is impossible (e.g., not enough views are available), the ap- proximate focal length can be extracted from the image Exchangeable Image File Format (EXIF) file. In this case, one can use Eqn. (4.1) to estimate the interaction of the image acquisition parameters provided that the working distance is given. 82 (a) (b) Figure 4.5: 3D scene reconstruction: (a) sixteen images of a scene taken from different locations, (b) the 3D reconstructed point cloud and camera poses. Each red cone represents a camera that corresponds to one of the images in (a). 83 Figure 4.6: The overview scheme of the proposed crack detection approach. 4.4 Crack Detection A crack detection procedure is proposed herein based on the scene reconstruction approach dis- cussed above. The main elements of the proposed crack detection procedure are segmentation, feature extraction, and decision making. Note that before processing any image, it has to be undistorted using the distortion coefficients obtained from the SfM problem. Figure 4.6 shows the overview scheme of the proposed system. The main elements of the proposed crack detection procedure are data sensing, 3D scene reconstruction, segmentation, feature extraction, and deci- sion making. Note that before processing any image, preprocessing approaches can be used to enhance the image Jahanshahi et al. (2009). 4.4.1 Review of Crack Segmentation Techniques Segmentation is a set of steps that isolate the patterns that can be potentially classified as a defined defect. The aim of segmentation is to reduce extraneous data about patterns whose classes are not desired to be known. Edge-Based Techniques Edge detection techniques can be used to extract crack-like edges in the region of interest where usual edges such as element boundaries do not exist. Two review papers on edge detection tech- niques are provided by Davis (1975) and Ziou and Tabbone (1998). A comprehensive description of several edge detection techniques is reviewed by Pratt (2001). Any edge detection technique should consist of smoothing, differentiation, and labeling. Smoothing is a preprocessing step that 84 reduces noise and may cause the loss of some edges. An edge can be defined as discontinuity or sudden change in the image intensity. This is identical to the derivative of an image having local maximum values at the edges. For this purpose, the gradient of an image is an appropriate tool for identifying edges. The gradient vector of a given imagef(x;y) is defined as in Eqn. (4.3): rf = 2 6 4 G x G y 3 7 5 = 2 6 6 6 6 6 4 @f @x @f @y 3 7 7 7 7 7 5 : (4.3) The magnitude of the gradient, located at (i;j), can be calculated as indicated by Eqn. (4.4): jrf(i;j)j = (G 2 x (i;j) +G 2 y (i;j)) 1 2 : (4.4) For computational simplicity, one can approximate the magnitude of the gradient using Eqns. 
(4.5) or (4.6): jrf(i;j)j =G 2 x (i;j) +G 2 y (i;j); (4.5) jrf(i;j)j =jG x (i;j)j +jG y (i;j)j: (4.6) The gradient magnitude is zero in areas of constant intensity, whereas in the presence of edges the magnitude is the local maximum. Gradient edge detection can be used to compute the direction of changes as defined in Eqn. (4.7): (i;j) = tan 1 ( G y (i;j) G x (i;j) ): (4.7) Common convolution masks (kernels) for digital estimation ofG x andG y are Sobel (Duda and Hart, 1973, Gonzalez et al., 2004), Roberts (1965), Prewitt (1970), Frei and Chen (1977), and Canny (1986) edge detection operators. Convolving the initial image with one of the first-order 85 derivative edge detection masks both vertically and horizontally generates the approximate gradi- ent magnitude of the pixels. The pixels with values greater than a specified threshold are deter- mined to be edges (labeling). Lower threshold values will lead to detection of more edges while higher values will cause some edges to be undetected. Different techniques have been proposed to select the appropriate threshold (Abdou, 1973, Abdou and Pratt, 1979, Henstock and Chelberg, 1996). Gonzalez and Woods (1992) proposed an automatic way to compute the global threshold by selecting an initial random thresholdT on the histogram of the image. Then 1 and 2 are computed as the average intensity values of the pixels with intensity values that are greater or less thanT , respectively. A newT is then computed as the average of 1 and 2 . This iteration continues until a constantT is achieved. Another category of the first-order derivative edge detection techniques is based on computing the gradient in more than two orthogonal directions by convolving the initial image with several gradient impulse response arrays and then selecting the maximum value of the convolved images with different templates as shown in Eqn. (4.8): G(i;j) = maxfjG 1 (i;j)j;jG 2 (i;j)j;:::;jG N (i;j)jg; (4.8) whereG k (i;j) is the result of convolving the initial imagef(x;y) with thek th gradient response array. Since the intensity of pixels in an image changes rapidly at the edges (the first derivative has a local maximum), the second derivative will have a zero crossing. The second-order derivative off(x;y) can be computed by the Laplacian operator as defined in Eqn. (4.9): r 2 f(x;y) = @ 2 f(x;y) @x 2 + @ 2 f(x;y) @y 2 : (4.9) 86 In order to compute the second derivative of an image, a window mask is convolved with the image (Eqn. (4.10)): L(x;y) =f(x;y)H(x;y): (4.10) A simple four-neighbor Laplacian mask is shown in Eqn. (4.11): H = 2 6 6 6 6 6 4 0 1 0 1 4 1 0 1 0 3 7 7 7 7 7 5 : (4.11) The Laplacian is rarely used for edge detection alone because it is very sensitive to noise, and it cannot detect the direction of edges. Laplacian convolution operators will lead to double-edge detection, which is inappropriate for direct edge detection; however, they can be a complement for other edge detection techniques. Applying a Gaussian smoothing filter to an image and then using the Laplacian of the new image for edge detection yields to Laplacian of the Gaussian operator. Because of its linearity, this detector can be directly applied as the convolution of the initial image with the Laplacian of the Gaussian function (Eqn. (4.12)): r 2 h(x;y) =[ (x 2 +y 2 ) 2 4 ] exp( x 2 +y 2 2 2 ): (4.12) Increasing the window size of the edge detection operator decreases its sensitivity towards noise. 
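Before comparing the individual operators, the gradient-based detection and the iterative global threshold of Gonzalez and Woods (1992) described above can be sketched as follows. This is an illustrative SciPy/NumPy sketch with placeholder data; the tolerance and iteration limit are arbitrary choices.

```python
import numpy as np
from scipy import ndimage

def iterative_threshold(values, tol=0.5, max_iter=100):
    """Global threshold of Gonzalez and Woods (1992): iterate T as the mean of the
    average values above and below the current T."""
    T = values.mean()
    for _ in range(max_iter):
        low, high = values[values <= T], values[values > T]
        if low.size == 0 or high.size == 0:
            break
        T_new = 0.5 * (low.mean() + high.mean())
        if abs(T_new - T) < tol:
            break
        T = T_new
    return T

image = np.random.rand(128, 128) * 255          # stand-in for a grayscale image

# First-order derivative edges: Sobel gradients, magnitude (Eqn. (4.4)),
# and direction (Eqn. (4.7)).
gx = ndimage.sobel(image, axis=1)
gy = ndimage.sobel(image, axis=0)
magnitude = np.hypot(gx, gy)
direction = np.arctan2(gy, gx)

edges = magnitude > iterative_threshold(magnitude)   # labeling step
```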
Since the Roberts window is the smallest in size, it is very noise-sensitive, and many spots are detected as edges by this operator. The Prewitt operator is weak in detecting diagonal edges (Pal and Pal, 1993). The Sobel operator does not have noise sensitivity since it gives more weight to the pixels closer to the pixel of interest, which is located in the middle of the convolution window. Among the operators mentioned above, the Canny operator has the best performance. This technique considers the edges as the local maxima of the derivative of a Gaussian filter. In 87 other words, the smoothing step is imbedded within the operator. Subsequently, the weak edges and the strong edges are extracted by setting two different thresholds. Finally, the strong edges and the weak edges that are connected to strong edges are detected as the real edges. Consequently, less weak edges are falsely detected. Based on experiments on bridge pavements, Abdel-Qader et al. (2003) have concluded that the Canny edge detection technique is more successful in detecting cracks than Sobel and Fast Fourier Transform (FFT) techniques. This result is also confirmed by the authors of the current paper (that the FFT approach has the worst performance). The FFT approach includes the frequency properties of the image in the frequency domain. The mathematical FFT formulation is shown in Eqn. (4.13): F (u;v) = 1 MN M1 X x=0 N1 X y=0 f(x;y) exp(2j( xu M + yv N )); (4.13) where f(x;y) is the MN image, x and y are the spatial coordinates, and u and v are the transformation coordinates in the frequency domain. Since the FFT is highly sensitive to noise, it is not recommended to be used for the problem in question. 88 Abdel-Qader et al. (2003) have demonstrated that the Fast Haar Transform performs even better than the Canny detector for detecting cracks in concrete bridge pavements. Haar is the simplest wavelet whose mother wavelet and scaling function' are shown below: (t) = 8 > > > > > > > > > > > > < > > > > > > > > > > > > : 1 0t< 1 2 1 1 2 t< 1 0 elsewhere ; (4.14) '(t) = 8 > > > > > < > > > > > : 1 0t< 1 0 elsewhere : (4.15) The decomposition and reconstruction filters for this wavelet family are: h ' = 1 p 2 2 6 4 1 1 3 7 5; h = 1 p 2 2 6 4 1 1 3 7 5: (4.16) Abdel-Qader et al. (2003) used the Haar wavelet to get the one-level decomposition of an image as described in Section 3.4. Then, the three details are combined to generate the magnitude image. The threshold is defined as the average intensity value of all pixels in the captured images. The overall accuracy of this technique is 86% as reported by authors (Abdel-Qader et al., 2003). Because the effect of light and the contrast of each image are not considered independently when choosing the threshold, this thresholding is inappropriate. By selecting an independent threshold for each image, a better detection rate is expected. In this approach, no other classification is used to detect cracks from non-crack edges. 89 Mallat and Zhong (1992) have demonstrated that the local maxima of an image wavelet trans- form can be used to extract and analyze multi-scale edges. Siegel and Gunatilake (1998) have used the wavelet filter bank for detecting cracks in aircraft surfaces, where they used a cubic spline and its first derivative as scaling and wavelet functions. This wavelet transform is equiva- lent to applying a smoothing filter on the image followed by taking the derivative of the smoothed image, which is identical to a classical edge detection procedure. 
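A minimal sketch of the single-level Haar scheme of Abdel-Qader et al. (2003) discussed above is given below (its multi-level variant is described next). It assumes the PyWavelets package; the input image is a placeholder, and the threshold follows the description above (the average intensity of the captured image), so the resulting edge map has half the input resolution.

```python
import numpy as np
import pywt

def haar_edge_map(image):
    """One-level Haar decomposition; the three detail images are combined into a
    magnitude image and thresholded with the average intensity of the input."""
    _, (cH, cV, cD) = pywt.dwt2(image, 'haar')
    magnitude = np.sqrt(cH**2 + cV**2 + cD**2)
    return magnitude > image.mean()

image = np.random.rand(256, 256) * 255      # stand-in for a concrete-surface image
edge_map = haar_edge_map(image)             # half-resolution binary edge map
```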
Siegel and Gunatilake (1998) applied a three-level decomposition on the region of interest; however, the decomposition algorithm is slightly different from what is described in Section 3.4. This decomposition is defined as applying the high-pass decomposition filter,g i (mother wavelet function), once to the rows and once to the columns separately, which leads to W yi and W xi , respectively, wherei is the decomposition level. The low-pass decomposition filter,h i (scaling function), is applied to the rows and the columns. The wavelet and scaling decomposition filters for the specified wavelet at the three levels are shown below: g 1 = [2;2]; (4.17) g 2 = [2; 0;2]; (4.18) g 3 = [2; 0; 0; 0;2]; (4.19) h 1 = [0:125; 0:375; 0:375; 0:125]; (4.20) h 2 = [0:125; 0; 0:375; 0; 0:375; 0; 0:125]; (4.21) h 3 = [0:125; 0; 0; 0; 0:375; 0; 0; 0; 0:375; 0; 0; 0; 0:125]: (4.22) The schematic procedure described above is presented in Fig. 4.7. In order to extract crack-like 90 Figure 4.7: Multiresolution decomposition of the image used by Siegel and Gunatilake (1998). edges, the magnitude imageM i is computed for each level as follows: M i = q W 2 xi +W 2 yi : (4.23) By choosing a dynamic threshold based on the histogram of each magnitude image, pixel values above the threshold are detected as edge points. Since the direction of a crack varies smoothly, edge points are linked based on 8-neighbors if the difference of the corresponding angles is less than a specific angle (Siegel and Gunatilake, 1998). For this purpose, the angle image of each levelA i is defined as: A i = arctan ( W yi W xi ): (4.24) A one hidden-layer neural network consisting of four neural units, six input units, and one output unit is used to classify cracks from non-crack edges. On aircraft surfaces, the cracks are smaller than non-crack edges (e.g., scratches). On the other hand, an edge that only appears in the first- decomposition level is smaller than an edge that appears in the first two levels. Similarly, the latter is smaller than an edge that appears in all levels of the decomposition. This important characteristic, which was introduced as “propagation depth” by Siegel and Gunatilake (1998), is considered to be one of the selected features. The “propagation depth” represents the number of decomposition levels in which a specific edge has appeared, and also conveys the size information 91 of the edge. A “propagation depth” is assigned to each edge that appears in the first decomposition level. For this reason, a “coarse-to-fine edge linking process” is used, which provides information about an edge from a coarse resolution to a fine resolution. The features assigned to each edge in the first decomposition level, as defined by Siegel and Gunatilake (1998), are: (i). Computed propagation depth number (ii). Number of pixels constituting the edge (iii). Average wavelet magnitude of the edge pixels (iv). Direction of pixels constituting the edge in level one (v). The signs of P W x1 and P W y1 for all pixels belonging to the edge (vi). Average wavelet magnitude of linked edges in levels two and three during “coarse-to-fine edge linking process” The accuracy of this technique in crack detection is 71.5% (Siegel and Gunatilake, 1998). The smaller accuracy of this study with respect to the one done by Abdel-Qader et al. (2003) is due to the problem’s level of complexity. This technique is highly dependent on the direction of light direction during the image acquisition. 
The steps necessary to obtain better performance of the classification process are: selecting additional features, capturing more images of a surface with different camera orientations (in order to gather and enrich the data with different light directions), and also increasing the population of the training set (Siegel and Gunatilake, 1998). None of the techniques discussed above deal with the problem of major non-defect edges such as structural member edges or background crack-like objects. Another set of techniques that can overcome this shortcoming is discussed in the following section. Morphological Techniques Morphological image processing extracts useful information about the objects of an image based on mathematical morphology. The foundation of morphological image processing is based on 92 previous studies of Minkowski (1903) and Matheron (1975) on set algebra and topology, respec- tively (Pratt, 2001). Morphological techniques can be applied to binary or gray-scale images. Although morphological operations are also discussed in the context of color image processing (Al-Otum, 2003, Comer and Delp, 1999, Yu et al., 2004), the gray-scale operations that are useful for segmenting cracks from the rest of an image are introduced here. Figure 4.8 (a) shows a verti- cal crack on a steel strip caused by a tensile rupture and Fig. 4.8 (b) shows a horizontal crack on a rebar caused by a torsional rupture. The results of performing different morphological operations on these two images are presented later in this paper to give the reader a better understanding of the applications of the described operations. Cm In In Cm (a) (b) Figure 4.8: (a) Vertical crack caused by a tensile rupture of a steel strip, (b) horizontal crack caused by a torsional rupture of a steel rebar. Morphological image processing generally can be used in image filtering, image sharpen- ing or smoothing, noise removal, image segmentation, edge detection, feature detection, defect detection, preprocessing and postprocessing tasks. A brief discussion of some definitions used in morphological approaches follows: Dilation The gray-scale dilation of imageI and the structuring elementS is defined as: (IS)(x;y) = max[I(xx 0 ;yy 0 ) +S(x 0 ;y 0 )j(x 0 ;y 0 )2D S ]; (4.25) 93 whereD S , a binary matrix, is the domain ofS (the structuring element) and defines which neighboring pixels are included in the maximum function. In the case of nonflat structur- ing elements, D S indicates the pixels included in the maximum function as well as their weights (D S is not binary in this case). During the dilation process, which is similar to the convolution process,I(x;y) is assumed to be1 for (x;y)62 D S . In the case of flat structuring elements,S(x 0 ;y 0 ) = 0 for (x 0 ;y 0 )2D S . Flat structuring elements are usually used for gray-scale morphological operations. Visually, dilation expands the bright portions of the image (Salembier, 1990). The results of applying this operation to the images in Fig. 4.8 are shown in Fig. 4.10. Figures 4.9 (a) and (b) show the domain of two flat structuring elements used in Figs. 4.10, 4.11, 4.12, 4.13, 4.14, 4.16, and 4.17. 1’s in Figs. 4.9 (a) and (b) show the pixels to be included in the morphological operation. (a) (b) Figure 4.9: (a) 1 5 flat structuring element domain used in Figs. 4.10, 4.11, 4.12, 4.13, 4.14, 4.16, and 4.17, (b) 91 flat structuring element domain used in Figs. 4.10, 4.11, 4.12, 4.13, 4.14, 4.16, and 4.17. 
94 Cm In Cm In (a) (b) Figure 4.10: (a) Dilation performed by a 1 5 structuring element, (b) dilation performed by a 9 1 structuring element. Erosion Similar to above, the gray-scale erosion is defined as: (I S)(x;y) = min[I(x +x 0 ;y +y 0 )S(x 0 ;y 0 )j(x 0 ;y 0 )2D S ]; (4.26) whereI(x;y) is assumed to be +1 for (x;y)62D S . In the case of flat structuring elements, S(x 0 ;y 0 ) = 0 for (x 0 ;y 0 )2 D S . In fact, erosion shrinks the bright portions of a given image (Salembier, 1990). The results of applying this operation to the images in Fig. 4.8 are shown in Fig. 4.11. Cm In Cm In (a) (b) Figure 4.11: (a) Erosion performed by a 1 5 structuring element, (b) erosion performed by a 9 1 structuring element. 95 Morphological Gradient The morphological gradient is defined as the dilated image minus the eroded version of the image, and it can be used to detect edges since it represents the local variations of an image (Gonzalez et al., 2004). Figure 4.12 shows the results of applying this operation to the images in Fig. 4.8. Cm In Cm In (a) (b) Figure 4.12: (a) Morphological gradient performed by a 1 5 structuring element, (b) morpho- logical gradient performed by a 9 1 structuring element. Opening The gray-scale opening of an imageI by structuring elementS can be written as: IS = (I S)S: (4.27) Figure 4.13 shows the results of applying the opening operation to the images in Fig. 4.8. Sinha and Fieguth (2006b) detected the defects in underground pipe images by thresholding the morphological opening of the pipe images using different structuring elements. Closing Similarly, the closing in gray-scale is defined as: IS = (IS) S: (4.28) 96 Cm In Cm In (a) (b) Figure 4.13: (a) Opening performed by a 1 5 structuring element, (b) opening performed by a 9 1 structuring element. Closing is applied to the images in Fig. 4.8, and the results are shown in Fig. 4.14. Cm In Cm In (a) (b) Figure 4.14: (a) Closing performed by a 1 5 structuring element, (b) closing performed by a 9 1 structuring element. While opening is usually used to eliminate sharp bright details, closing is used to remove dark details provided that the structuring element is larger than the details. These prop- erties make the combination of opening and closing very suitable for noise removal and image blurring (Gonzalez et al., 2004). Figure 4.15 shows the one-dimensional opening and closing operations. While the curve represents the gray-scale level, the circles (struc- turing elements) beneath the curve pushing it up illustrate the opening operation, and the circles above the curve pushing it down represent the closing operation. 97 Figure 4.15: One-dimensional scheme of the opening and closing operations. The curve rep- resents the gray-scale level, the circles (structuring elements) beneath the curve pushing it up illustrate the opening operation, and the circles above the curve pushing it down represent the closing operation. In Fig. 4.15, higher values of the curve show brighter pixel. One can conclude from Fig. 4.15 that the subtraction of the opened image from its original version will result in the detection of bright defects. This operation is called top-hat transform (Meyer, 1986, Serra, 1982) and its formula is: T =I (IS): (4.29) The dual form of the above equation is called bottom-hat which is appropriate for detecting dark defects (see Fig. 4.15). Its formulation is: T = (IS)I: (4.30) An example of bottom-hat operation applied to the images in Fig. 4.8 is shown in Fig. 4.16. 
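The gray-scale operations defined above can be sketched with SciPy's morphology routines, using flat line structuring elements like those in Fig. 4.9. The image below is a random placeholder standing in for the crack images of Fig. 4.8; this is an illustrative sketch, not the implementation used in this study.

```python
import numpy as np
from scipy import ndimage

image = np.random.rand(200, 200) * 255               # stand-in for a grayscale image

row_se = np.ones((1, 5), dtype=bool)                  # 1 x 5 flat structuring element
col_se = np.ones((9, 1), dtype=bool)                  # 9 x 1 flat structuring element

dilated  = ndimage.grey_dilation(image, footprint=row_se)   # Eqn. (4.25)
eroded   = ndimage.grey_erosion(image, footprint=row_se)    # Eqn. (4.26)
gradient = dilated - eroded                                  # morphological gradient
opened   = ndimage.grey_opening(image, footprint=row_se)    # Eqn. (4.27)
closed   = ndimage.grey_closing(image, footprint=col_se)    # Eqn. (4.28)

top_hat    = image - opened                                  # bright defects, Eqn. (4.29)
bottom_hat = closed - image                                  # dark defects, Eqn. (4.30)
```

A row element responds to dark features that cross it vertically and a column element to features that cross it horizontally, which is why the two orientations are paired in the figures above.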
These two transformations also can be used for contrast enhancement. Giakoumis et al. (2006) detected the cracks in digitized paintings by thresholding the output of the top-hat transform. The shape and size of the structuring element depend on the defect of interest. The structuring element 98 (a) (b) Figure 4.16: (a) Bottom-hat operation performed by a 1 5 structuring element, (b) bottom-hat operation performed by a 9 1 structuring element. has an important role in defect detection using the above morphological image processing tech- niques. For example, to detect small defects such as edges, small structuring elements, preferably flat ones, should be used (Salembier, 1990). Salembier (1990) has proposed and compared different algorithms to improve morpholog- ical defect detection based on top-hat and bottom-hat transformations. He has concluded that the algorithms shown in Eqns. (4.31) and (4.32) can be used to detect bright and dark defects respectively: T =Imin[(IS)S;I]; (4.31) T =max[(IS)S;I]I: (4.32) Applying Eqn. (4.32) to the images in Fig. 4.8 leads to the segmentation of dark cracks as shown in Fig. 4.17. After postprocessing, the final segmented cracks are shown in Fig. 4.18. Nieniewski et al. (1999) used the above equations to extract cracks in ferrites. They used two flat structuring elements: a 1 5 row element for the vertical cracks and a 5 1 column element for the horizontal ones. The authors of the current study applied a square structuring element to truss model images to detect cracks; the results after noise removal were excellent. Even in the presence of occlusions 99 (a) (b) Figure 4.17: (a) Vertical dark crack segmented using morphological techniques, (b) horizontal dark crack segmented using morphological techniques. Cm In Cm In (a) (b) Figure 4.18: Crack segmentation. and different backgrounds, the algorithm is highly successful provided that the camera captures high resolution images and focuses its lens on the specific structural member of interest. Now-a- days almost all commercial digital cameras have active or passive auto-focusing capabilities that can assist in the acquisition of suitable images (Schlag et al., 1983). Figure 4.19 represents examples of applying the above technique on actual images. Figures 4.19 (a), (c), (e), (g), and (i) are the original real steel structural members. Figure 4.19 (i) is a magnification of Fig. 4.19 (g) to give a better view of the structure’s crack. The true cracks that were segmented in Figs. 4.19 (b), (d), (f), (h), and (j) are white and the other segmented objects are non-white ( after postprocessing and noise removal). The domain of flat structuring elements 100 that were used to segment the cracks are as follows: a 70 70 matrix with (both main and minor) diagonal members of 1 and non-diagonal members of 0 (Fig. 4.19 (b)), a 10 10 matrix with 1’s in the minor diagonal and 0’s everywhere else (Fig. 4.19 (d)), a unit matrix of 7 7 (Fig. 4.19 (f)), and a 1 5 structuring element (Figs. 4.19 (h) and (j)). The challenge is to find the appropriate size and format of the structuring element. When the structuring element has a line format, it can segment cracks that are perpendicular to it (see the structuring elements used in Figs. 4.18 and 4.19). A good example of such study is presented by Sinha and Fieguth (2006b), in which they tried to find the optimal size of structuring elements to segment and classify cracks, holes, laterals, and joints in underground pipe images. 
The morphological operations discussed above segment the cracks more efficiently than the edge detection operators reviewed in Section 4.4.1. Edge-based techniques extract all the edges in an image, which makes the classification task harder. Basically, edge-based techniques will generate more noise than morphological techniques (compare Figs. 4.17 and 4.20). After extracting a reliable, independent, and discriminating set of features from the segmented objects, it is the classifier's task to label each of the segmented objects as crack or non-crack (Section 3.3). Sinha (2000) has used area, number of objects, major axis length, minor axis length, and the mean and variance of pixels projected in four directions (0°, 45°, 90°, and 135°) as features for crack detection in underground pipeline systems.

Figure 4.19: Examples of crack segmentation in real structures: (a) original structural member 1, (b) crack segmentation of image (a) using a 70×70 matrix with (both main and minor) diagonal members of 1 and non-diagonal members of 0 as the structuring element, (c) original structural member 2, (d) crack segmentation of image (c) using a 10×10 matrix with 1's on the minor diagonal and 0's everywhere else as the structuring element, (e) original structural member 3, (f) crack segmentation of image (e) using a unit matrix of 7×7 as the structuring element, (g) original structural member 4, (h) crack segmentation of image (g) using a 1×5 structuring element, (i) magnification of image (g), (j) crack segmentation of image (i) using a 1×5 structuring element.

Figure 4.20: Extracted edges using the Sobel edge detection technique.

4.4.2 Morphological Operation

The morphological operation proposed by Salembier (1990) is slightly modified here to enhance its capability for crack extraction in different orientations. The proposed operation is shown in Eqn. (4.33):

T = max[ (I • S_{0°, 45°, 90°, 135°}) ∘ S_{0°, 45°, 90°, 135°}, I ] − I,    (4.33)

where I is the grayscale image, S is the structuring element that defines which neighboring pixels are included in the operation, '∘' is the morphological opening, and '•' is the morphological closing. The output image T is then binarized using Otsu's thresholding method (Otsu, 1979) to segment potential crack-like dark regions from the rest of the image. This nonlinear filter extracts the whole crack, as opposed to edge detection approaches, where just the edges are segmented. This characteristic makes the approach appropriate for the crack thickness quantification discussed in Section 4.5.

Furthermore, small extracted patterns are eliminated as noise. For this purpose, if the length of a segmented pattern is less than a minimum length specified by the user, that pattern is eliminated. In order to convert the minimum length of interest from unit length to a minimum length in pixels, Eqn. (4.34) is used:

l_p = (FL / WD) · l,    (4.34)

where l is the length defined by the user in unit length, FL and WD (obtained from SfM and scaling, as described in Section 4.3) are in pixels and unit length, respectively, and l_p is the length in pixels.

Figure 4.21 presents the results of applying the above methodology to concrete beam images. The red, green, and blue boxes represent the detected cracks of 2, 4, and 6 pixel thickness, respectively.

Figure 4.21: (a), (c), (e), (g), and (i) are the original crack images obtained from breaking a concrete beam in the laboratory. (b), (d), (f), (h), and (j) show the results of the proposed crack detection methodology. Red, green, and blue boxes locate the detected cracks of 2, 4, and 6 pixels, respectively.

Figure 4.22 presents the crack detection results using the proposed system. The crack thickness in this image is 0.4 mm, and the camera is located 3 meters away from the concrete model. The red, green, and blue boxes represent the detected cracks of 2, 4, and 6 pixel thickness, respectively. There are few false-positive detections.

Figure 4.22: Crack of 0.4 mm detected from a 3 m distance. Red, green, and blue boxes locate the detected cracks of 2, 4, and 6 pixels, respectively.
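A compact MATLAB sketch of the segmentation and noise-removal steps above is given below (Image Processing Toolbox assumed). It reads the orientation set in Eqn. (4.33) as applying the operator with a linear element at each of the four orientations and taking the pixel-wise maximum; the file name, structuring-element size len, focal length FL, working distance WD, and minimum length l_min are placeholders for the values obtained as described in the text.

```matlab
I   = im2double(rgb2gray(imread('surface.jpg')));   % hypothetical input image
len = 15;                                           % structuring element size in pixels
T   = zeros(size(I));
for ang = [0 45 90 135]                             % orientation set of Eqn. (4.33)
    se = strel('line', len, ang);
    T  = max(T, max(imopen(imclose(I, se), se), I) - I);
end
bw = imbinarize(T, graythresh(T));                  % Otsu's threshold (Otsu, 1979)

% Noise removal: discard patterns shorter than the minimum length of interest.
FL = 3500;  WD = 1000;  l_min = 10;                 % focal length (px), working distance and length (unit length)
lp   = (FL / WD) * l_min;                           % minimum length in pixels, Eqn. (4.34)
skel = bwmorph(bw, 'thin', Inf);                    % pattern centerlines
cc   = bwconncomp(bw);
for k = 1:cc.NumObjects
    if nnz(skel(cc.PixelIdxList{k})) < lp           % thinned length below the limit -> remove
        bw(cc.PixelIdxList{k}) = false;
    end
end
```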
Structuring Element

By choosing the size and shape of the structuring element (i.e., the neighborhood), a filter that is sensitive to a specific shape can be constructed. When the structuring element has a line format, it can segment cracks that are perpendicular to it. If the length of the structuring element (in pixels) exceeds the thickness of a dark object in an image, then this object can be segmented by the operation in Eqn. (4.33). Consequently, linear structuring elements are defined in the 0°, 45°, 90°, and 135° orientations.

The challenge is to find the appropriate size for the structuring element. Using the scaled working distance obtained from Section 4.3, the formula derived in Eqn. (4.1) is used to compute the appropriate structuring element. With this equation, the size of the appropriate structuring element is computed based on the crack size of interest (n is the structuring element size). Figure 4.23 shows the relationship between these parameters and can be used to determine appropriate image acquisition system parameters.

Figure 4.23: Relationship between structuring element size, camera focal length, working distance, crack size, camera sensor size, and camera sensor resolution for a simple pinhole camera model (structuring element size / crack size, in pixel/mm, plotted against working distance / focal length, in mm/mm, for several values of sensor resolution / sensor size, in pixel/mm).

4.4.3 Feature Extraction

After segmenting the patterns of interest, it is time to assign them a set of finite values representing quantitative attributes or properties called features. These features should represent the important characteristics that help identify similar patterns. To determine discriminative features useful for classification purposes, this study initially defined and analyzed twenty-nine features. Eleven of these features were selected as potentially appropriate features for further analysis. Finally, using the LDA (Fisher, 1936) approach, the following five features were found to be discriminately appropriate (i.e., preserving 99.4% of the cumulative feature ranking criteria) for classification: (1) eccentricity (a scalar that specifies the eccentricity of the ellipse that has the same second moments as the segmented object), (2) area of the segmented object divided by the area of the above ellipse, (3) solidity (a scalar specifying the proportion of pixels in the convex hull that also belong to the segmented object), (4) absolute value of the correlation coefficient (here, correlation is defined as the relationship between the horizontal and vertical pixel coordinates), and (5) compactness (the ratio between the square root of the extracted area and its perimeter). The convex hull for a segmented object is defined as the smallest convex polygon that can contain the object.
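These five features can be computed from the binary crack map with standard region properties. A minimal MATLAB sketch (Image Processing Toolbox assumed; bw denotes the binarized, noise-filtered image from the previous step):

```matlab
cc    = bwconncomp(bw);
stats = regionprops(cc, 'Eccentricity', 'Area', 'Solidity', 'Perimeter', ...
                    'MajorAxisLength', 'MinorAxisLength', 'PixelList');
features = zeros(cc.NumObjects, 5);
for k = 1:cc.NumObjects
    s = stats(k);
    ellipseArea = pi * (s.MajorAxisLength/2) * (s.MinorAxisLength/2);  % second-moment ellipse
    r = corrcoef(s.PixelList(:,1), s.PixelList(:,2));                  % x-y pixel coordinate correlation
    features(k,:) = [s.Eccentricity, ...             % (1) eccentricity
                     s.Area / ellipseArea, ...       % (2) area / ellipse area
                     s.Solidity, ...                 % (3) solidity
                     abs(r(1,2)), ...                % (4) |correlation coefficient|
                     sqrt(s.Area) / s.Perimeter];    % (5) compactness
end
```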
The above features are computed for each segmented pattern.

4.4.4 Classification

In this study, a feature set consisting of 1,910 non-crack feature vectors and 3,961 synthetic crack feature vectors was generated to train and evaluate the classifiers. About 60% of this set was used for training, while the remaining feature vectors were used for validation and testing. Note that, due to the lack of access to a large number of real cracks, randomized synthetic cracks were generated to augment the training database. For this reason, real cracks were manually segmented, and an algorithm was developed to randomly generate cracks from them. The non-crack feature vectors were extracted from actual scenes. The performance of several SVM and NN classifiers was evaluated. Eventually, an SVM with a 3rd-order polynomial kernel and a 3-layer feedforward NN with 10 neurons in the hidden layer and 2 output neurons were used for classification. A nearest-neighbor classifier was also used for comparison with the above classifiers.

Table 4.1: The performance of different classifiers on synthetic data

Classifier               Accuracy (%)   Precision (%)   Sensitivity (%)   Specificity (%)
Neural Network           95.57          95.91           97.60             91.36
Support Vector Machine   95.06          94.98           97.85             89.27
Nearest-Neighbor         88.76          91.15           92.30             81.41

Table 4.1 summarizes the performances of these three classifiers. In this table, 'accuracy' is the proportion of true classifications in the test set, 'precision' is the proportion of true positive classifications against all positive classifications, 'sensitivity' is the proportion of actual positives that were correctly classified, and 'specificity' is the proportion of negatives that were correctly classified. Since the latter two quantities are insensitive to changes in the class distribution, they were used to evaluate the classifier performances in this study. This table shows that the proposed SVM and NN approaches have very close performances, and both perform better than the nearest-neighbor classifier.

Note that the SVM method is a discrete classifier, whereas the proposed NN approach needs a threshold to act as a discrete classifier. In this study, if the value of the crack output neuron was found to be greater than the value of the non-crack neuron, the pattern was classified as a crack; otherwise, as a non-crack. This is identical to setting the threshold equal to 0.5.

Figure 4.24 shows the effect of changing the decision-making threshold on different performance indices for the specific NN used in this study. In this figure, 'positive predictive value' is the proportion of the correctly classified positives (i.e., cracks), and 'negative predictive value' is the proportion of the correctly classified negatives (i.e., non-cracks). For applications where it is expensive to miss a crack (e.g., inspection purposes), it is recommended to select a more conservative threshold (i.e., a threshold less than 0.5). As the threshold moves toward one, the specificity and positive predictive rates increase, while the sensitivity and negative predictive rates decrease. This means there will be more false negatives and fewer false positives. For less sensitive applications, one may select a threshold greater than 0.5.

Figure 4.24: Effect of the decision-making threshold on different performance indices (correct rate, positive predictive value, negative predictive value, sensitivity, and specificity) for the proposed NN.

Moreover, Fig. 4.24 helps one decide on an appropriate threshold for a specific application by considering the performance indices. It is worth noting that if the training set size is infinite, the outputs of the above backpropagation NN can converge to the true a posteriori probabilities (Duda et al., 2001).
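As an illustration of the classification stage, the sketch below trains the two classifiers compared in Table 4.1 and applies a decision threshold to the NN outputs. It assumes MATLAB's Deep Learning and Statistics and Machine Learning Toolboxes; X (one feature vector per row) and y (1 = crack, 0 = non-crack) are placeholders for the training set described above.

```matlab
% Feedforward NN with 10 hidden neurons and 2 output neurons.
net    = patternnet(10);
net    = train(net, X', full(ind2vec(y' + 1)));   % targets as one-hot columns (row 2 = crack)
scores = net(X');                                 % network outputs for each pattern
isCrack = scores(2,:) > scores(1,:);              % equivalent to a threshold of 0.5
isCrackConservative = scores(2,:) > 0.35;         % example of a more conservative threshold (< 0.5)

% SVM with a 3rd-order polynomial kernel.
svmModel   = fitcsvm(X, y, 'KernelFunction', 'polynomial', 'PolynomialOrder', 3);
isCrackSVM = predict(svmModel, X);
```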
4.4.5 Multi-Scale Crack Map

In order to obtain a crack map, the crack detection procedure described above was repeated using different structuring elements (i.e., different scales). Note that the extracted multi-scale binary crack map is the union of the cracks detected using the different structuring elements. The proposed crack map can be formulated as:

J_m(u, v) = 1  if ∃ k ∈ [S_min, m] such that C_k(u, v) = 1;  J_m(u, v) = 0  otherwise,    (4.35)

where J_m is the crack map at scale (i.e., structuring element) m, S_min is the minimum structuring element size, C_k is the binary crack image obtained by using k as the structuring element, and u and v are the pixel coordinates of the crack map image. In this study, structuring elements from ⌈n_min⌉ + 2 to ⌈n_max⌉ + 10 were used for generating the crack map, where ⌈·⌉ is the ceiling function, and n_min and n_max are the structuring element sizes corresponding to the minimum and maximum crack sizes of interest, respectively. The crack map was used for crack localization as well as quantification.
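A few lines of MATLAB suffice to form the union in Eqn. (4.35); detect_cracks is a hypothetical wrapper around the segmentation and classification steps of Section 4.4, and n_min and n_max are the structuring-element sizes corresponding to the crack sizes of interest:

```matlab
s_min = ceil(n_min) + 2;                 % smallest scale considered
s_max = ceil(n_max) + 10;                % largest scale considered
J = false(size(I));                      % multi-scale crack map
for k = s_min:s_max
    J = J | detect_cracks(I, k);         % C_k(u,v) in Eqn. (4.35): union over scales
end
```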
4.5 Crack Thickness Quantification

In many applications (e.g., bridge inspection), crack thickness is measured visually by an inspector. In the case of inaccessible remote regions (e.g., nuclear power plant structures), an inspector uses binoculars to detect and quantify crack thickness. This approach is subjective and highly qualitative. Note that no reliable, contact-less, vision-based crack thickness quantification method is available in the open literature. Here, a robust quantitative approach for measuring crack thickness is introduced.

First, the crack map was thinned using morphological thinning. The remaining pixels were considered to be the centerlines of the cracks. In order to measure a crack thickness, the orientation perpendicular to the crack pattern at each centerline pixel had to be identified. To reach this goal, the thinned crack map was correlated with 35 kernels, where these kernels represent equally-incremented orientations from 0° to 175°. Figure 4.25 shows the kernels from 0° to 45°. Other kernels can be constructed based on these kernels.

Figure 4.25: Kernels representing orientations. From the top left to the bottom right column, kernels of 0°, 5°, 10°, 15°, 20°, 25°, 30°, 35°, 40°, and 45° are shown, respectively. Other orientation kernels can be built based on the shown kernels. These kernels are correlated with the thinned crack map to identify the orientation of a crack at each centerline pixel.

For each centerline pixel, the kernel corresponding to the maximum correlation value represents the tangential orientation of the centerline. The thickness orientation was then defined as the orientation perpendicular to the detected tangential direction. Next, for each centerline pixel, the pixels in the original crack map that are aligned with the corresponding thickness orientation were counted in the horizontal and vertical directions.
Using these two values, the hypotenuse was computed and considered to be the crack thickness. Finally, the crack thickness can be quantified in unit length using Eqn. (4.1) or (4.2).

Figure 4.26 shows an example of the thickness quantification method described above. The white squares are crack pixels of a larger crack image. The blue squares represent the centerline obtained by thinning the crack object. The kernel corresponding to 45°, centered at the red square, has the highest correlation with the thinned pixels. Consequently, the green squares, which correspond to the 135° direction, indicate the thickness orientation at the red square. It is seen that the numbers of thickness pixels in the horizontal and vertical directions are both 6 pixels, and the crack thickness at the red square is therefore estimated as 8.5 pixels. This thickness has to be scaled to obtain the thickness in unit length.

Figure 4.26: An example of the proposed thickness quantification method. The white squares are crack pixels of a larger crack image. The blue squares represent the centerline obtained by thinning the crack object. The green squares, which correspond to the 135° direction, indicate the thickness orientation at the red square. The numbers of thickness pixels in the horizontal and vertical directions are both 6 pixels, and the crack thickness at the red square is estimated as 8.5 pixels.
This approach is valid if the camera orientation is perpendicular to the plane of the object under inspection. If this plane is not perpendicular to the camera orientation (i.e., the projection surface and the object plane are not parallel), a perspective error will occur (see Fig. 4.27).

Figure 4.27: Perspective error: (a) the camera orientation is perpendicular to the plane of the object (no perspective error), (b) the camera orientation is not perpendicular to the plane of the object (perspective error exists), and (c) 2D representation of the perspective error about the camera's 'y' axis.

In order to overcome the perspective error, the camera orientation vector and the normal vector of the object plane are needed. The camera orientation vector was already retrieved using SfM (Section 4.3), and the plane normal can be computed by fitting a plane to the reconstructed 3D points seen in the corresponding view, excluding outliers using the RANSAC algorithm. For each centerline pixel, the numbers of pixels that are aligned with the corresponding thickness orientation are counted in the horizontal and vertical directions. Next, the perspective error compensation for each component is computed as:

d′_x = d_x / cos(α_x),    (4.36)

where d′_x is the perspective-free component of the crack thickness (for each centerline pixel), d_x is the measured crack thickness, α_x is the angle between the camera orientation vector and the fitted plane's normal vector in the x direction, and x represents either the horizontal or the vertical direction. For each centerline pixel, the resultant of the two perspective-free components is the crack thickness. Finally, the crack thickness at each centerline pixel is quantified in millimeters using Eqn. (4.1) or (4.2).

A user can interactively select a portion of a crack, and the proposed system will average the crack thicknesses for that region. This will improve the robustness of the system in the presence of noise.
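The measurement loop can be sketched in MATLAB as follows; this is an illustration of the procedure above, not the actual implementation used in this study. The orientation kernels and the directional pixel counting are abstracted into the hypothetical helpers orientation_kernel and count_aligned, and alpha_x, alpha_y, WD, and FL are placeholders for the angles and scale recovered from SfM.

```matlab
skel   = bwmorph(J, 'thin', Inf);                 % crack centerlines (morphological thinning)
angles = 0:5:175;                                 % equally-incremented orientations (cf. Fig. 4.25)
resp   = zeros([size(J), numel(angles)]);
for a = 1:numel(angles)
    K = orientation_kernel(angles(a));            % hypothetical helper returning the kernels of Fig. 4.25
    resp(:,:,a) = imfilter(double(skel), K);      % correlate the thinned map with each kernel
end
[~, best] = max(resp, [], 3);                     % tangential orientation index at every pixel

[r, c]    = find(skel);
thickness = zeros(size(r));
for p = 1:numel(r)
    tOrient = angles(best(r(p), c(p))) + 90;             % thickness orientation = tangent + 90 degrees
    dx = count_aligned(J, r(p), c(p), tOrient, 'h');     % hypothetical helpers counting crack pixels
    dy = count_aligned(J, r(p), c(p), tOrient, 'v');     % along the thickness orientation
    dx = dx / cosd(alpha_x);   dy = dy / cosd(alpha_y);  % perspective compensation, Eqn. (4.36)
    thickness(p) = hypot(dx, dy) * (WD / FL);            % hypotenuse, scaled to unit length
end
```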
Algorithm

Input: n images of a scene and the camera distance between two of the views

For each view:
1) Establish the working distance and camera parameters by solving the SfM problem and scaling the reconstructed scene

Crack Detection
2) Establish the appropriate structuring element based on the working distance and the focal length of the view, as well as the crack thickness of interest
3) Segment the potential crack patterns by applying the morphological operation described in Eqn. (4.33) to the image
4) Compute and assign appropriate features to each segmented pattern
5) Classify cracks from non-crack patterns using a trained classifier (NN or SVM)
6) Repeat steps 2 through 5 for different crack thicknesses of interest and generate the multi-scale crack map as the union of all extracted crack pixels

Output: the multi-scale crack map

Crack Quantification
7) Extract the centerline of each extracted crack using the morphological thinning operation
8) Find the tangential orientation for each centerline pixel by correlating different orientational kernels with the binary crack map
9) Estimate the thickness orientation as the orientation perpendicular to the tangent at each centerline pixel
10) For each centerline pixel, compute the crack pixels that are aligned with the thickness orientation
11) Compensate for the perspective error by aligning the view plane with the object plane
12) Compute the thickness in unit length by multiplying the measured thickness in pixels by the ratio between the working distance and the focal length
13) Average the measured thicknesses in a small neighborhood to improve the robustness of the quantification system

Output: the crack thickness values

4.6 Experimental Results and Discussion

In order to evaluate the overall performance of the proposed crack detection algorithm, a test set consisting of 220 real concrete crack images and 200 non-crack images was used. Table 4.2 summarizes the performance of the detection system for real patterns.

Table 4.2: The overall performance of the proposed system using real data

Classifier               Accuracy (%)   Precision (%)   Sensitivity (%)   Specificity (%)
Neural Network           79.5           78.4            84.1              74.5
Support Vector Machine   78.3           76.8            84.1              72.0

The performance of the system based on the NN is slightly better than that of the one based on the SVM, so the former system is used for the rest of the experiments in this study. The minimum length of the detected cracks was set to 10 mm.

Figure 4.28 (a) is an image captured 20 meters away from the concrete model, and Fig. 4.28 (b) shows the cracks detected using the proposed crack detection approach. No scene reconstruction took place; the working distance was manually measured as 20 meters. A Canon EOS 7D along with an EF 600mm f/4L IS USM super telephoto lens (i.e., 600 mm focal length) was used to capture the image. The image resolution was 5184×3456 pixels. The minimum thickness of the detected cracks in this figure was 0.1 mm. The image acquisition system was selected based on the guidelines introduced in Section 4.2. The minimum length of the detected cracks was set to 15 mm. This figure includes some background objects, as well as some annexed structural systems, which make it hard to identify cracks; however, the performance of the proposed system is remarkable: almost all cracks were detected, and few false-negative alarms took place. Note that there are several edges and rusted regions in this figure which correctly have not been detected as cracks. This example shows the capability of the proposed crack detection approach for detecting tiny cracks from a far distance in the presence of several edges, rusted regions, and background objects.

Figure 4.28: Detected cracks from a far distance: (a) a concrete model situated 20 meters away from the image acquisition system, (b) the detected cracks are shown in red. Each black box illustrates the boundaries of a correctly detected crack. False-negative alarms are surrounded by dashed lines. A Canon EOS 7D along with an EF 600mm f/4L IS USM super telephoto lens was used to capture this image. The minimum thickness of the detected cracks in this figure was 0.1 mm, and the working distance was 20,000 mm. For more details, see Fig. 4.29.
Figure 4.29 is a gray-scale version of Fig. 4.28 (a) in which the detected cracks are shown in red (dark) color. Note that a few objects are false-positive detections in this figure.

Figure 4.29: Cracks with 0.1 mm thickness are detected from a 20-meter distance.

In order to evaluate the performance of the proposed thickness quantification approach, an experiment was performed as follows. Randomly shaped synthetic cracks with thicknesses of 0.4, 0.6, 0.8, 1.0, 1.2, 1.4, 1.6, and 2.0 mm were generated using AutoCAD and printed using a 600 dpi HP LaserJet printer. Eighteen images with different camera poses were captured of the printed lines to form six image sets. Each set consisted of three views, where the distance between two of the camera centers was known. The images were captured by a Canon PowerShot SX20 IS with a resolution of 2592×1944 pixels. For each image set, the SfM problem was solved and the camera-object distance was retrieved (as explained in Section 4.3). The working distances in this experiment varied between 725 mm and 1,760 mm.

First, the cracks were extracted by the multi-scale crack detection approach described in Section 4.4.5. More than 10,000 measurements for each of the above thicknesses were carried out, for a total of 94,390 thickness measurements. To increase the robustness of the proposed thickness quantification system, thicknesses within a 5×5 neighborhood of each centerline pixel were averaged.

Figure 4.30 shows the mean estimated thicknesses plotted versus the actual thicknesses. The lengths of the confidence intervals are four times the standard deviation of the estimated thicknesses for each point. The green line shows the ideal relationship between the actual and estimated thicknesses.

Figure 4.30: Mean estimated thicknesses versus actual thicknesses: a total of 94,390 estimations (more than 10,000 estimations for each thickness). The mean estimated thickness for each actual thickness is shown by a marker. The lengths of the confidence intervals shown in this figure are four times the standard deviation of the estimated thicknesses for each point. The green line shows the ideal relationship between the actual and estimated thicknesses.

Figure 4.31 shows the histograms of the estimated thicknesses presented in Fig. 4.30. The horizontal axis of each histogram is normalized by subtracting the mean and then dividing by the standard deviation. A normal histogram with the same mean and standard deviation is superposed on each histogram for comparison purposes. This figure indicates that the majority of estimations are clustered around the mean estimated value. Note that less than 0.15% of the estimated values were located outside of the shown range. These values can be regarded as outliers.
[Figure 4.31 panels, one per actual thickness: 0.4 mm (µ = 0.52 mm, σ = 0.09 mm), 0.6 mm (µ = 0.71 mm, σ = 0.08 mm), 0.8 mm (µ = 0.90 mm, σ = 0.09 mm), 1.0 mm (µ = 1.08 mm, σ = 0.09 mm), 1.2 mm (µ = 1.22 mm, σ = 0.10 mm), 1.4 mm (µ = 1.40 mm, σ = 0.11 mm), 1.6 mm (µ = 1.55 mm, σ = 0.13 mm), and 2.0 mm (µ = 1.95 mm, σ = 0.20 mm).]

Figure 4.31: Histograms of estimated thicknesses. The horizontal axis of each histogram is normalized by subtracting the mean and dividing by the standard deviation. A normal histogram with the same mean and standard deviation as the estimated values is superposed on each histogram for comparison purposes.

Figure 4.32 shows the mean relative errors of the estimated thicknesses shown in Fig. 4.30. The lengths of the confidence intervals shown in this figure are twice the standard deviation of the relative estimation errors for each point. As can be seen, except for the 0.4 mm thickness, the mean errors fall below 20%; as the estimated thickness increases, the mean errors fall below 10%. This is due to an increase in the number of pixels representing a specific thickness: the higher the number of pixels representing a thickness, the higher the achieved accuracy. In some of the estimations in this experiment, a 0.4 mm thickness is represented by only two pixels. Consequently, the mean error for the 0.4 mm thickness is relatively high. In order to improve the thickness estimation accuracy, the working distance can be decreased, the focal length can be increased (zoom), the image resolution can be increased, or a combination of these actions can be applied. Based on this study, it is recommended that these parameters be selected in such a way that a target thickness would be represented by a greater number of pixels; we recommend a minimum of six pixels representing a crack thickness (see Eqn. (4.1)). The mean and standard deviation of the relative error for the above 94,390 thickness estimations are 12.24% and 12.64%, respectively.

There are many sources of error when quantifying a crack thickness using the above procedure, including bundle adjustment errors, scaling errors, crack orientation errors, and pixel representation errors (i.e., the number of pixels representing a thickness); however, the results of this study indicate that the errors are quite reasonable, and they are amenable to improvement. Due to some rare irregularities in an extracted pattern, a small portion of the thinned image might not represent the exact centerline, which causes errors too.

Figure 4.32: Mean relative errors of thickness estimations versus actual thickness. The mean relative errors are shown by markers. The lengths of the confidence intervals shown in this figure are twice the standard deviation of the thickness estimation errors at each point.
Averaging the neighboring thickness values helps get rid of these outliers. Furthermore, in order to estimate a crack length, the number of pixels representing the crack in the thinned crack map can be computed and scaled through a procedure similar to the one outlined above.

In order to illustrate the capabilities, as well as the limitations, of the proposed system, a real crack detection and quantification experiment was performed as follows. Five images were taken of a crack on a concrete surface. The image acquisition system was identical to the one used in the first experiment. These images are shown in Figs. 4.33 (a), (b), (c), (d), and (e), and Fig. 4.33 (f) shows the reconstructed scene and recovered camera poses. The camera distance between the two side views (i.e., Figs. 4.33 (a) and (e)) was 1600 mm.

Figure 4.33: Concrete crack: (a), (b), (c), (d), and (e) are crack images on a concrete surface taken from different angles. (f) is the sparse 3D scene reconstruction and recovery of the camera poses.

Here, Fig. 4.33 (c) is used as an example to detect and quantify cracks. The retrieved working distance and focal length for this view were 966 mm and 15,759 pixels, respectively. The working distance varied from 800 mm to 1400 mm. The minimum and maximum crack thicknesses of interest were 0.4 mm and 1.0 mm, respectively; the corresponding minimum and maximum structuring element sizes were 7 and 17 pixels. The minimum length of the detected cracks was set to 15 mm.

Figure 4.34: Crack detection: from left to right, the images in each column correspond to the structuring element sizes of 9, 11, 15, and 22 pixels, respectively. (a), (b), (c), and (d) are the extracted patterns in Fig. 4.33 (c) using Eqn. (4.33). (e), (f), (g), and (h) are the binarized images using the Otsu threshold. (i), (j), (k), and (l) are the multi-scale crack maps.

Figures 4.34 (a), (b), (c), and (d) are the extracted patterns using Eqn. (4.33); the corresponding structuring element sizes were 9, 11, 15, and 22 pixels, respectively. Figures 4.34 (e), (f), (g), and (h) show the corresponding binarized images using Otsu's thresholding method (Otsu, 1979). These images have less noise in comparison with their corresponding original ones (Figs. 4.34 (a), (b), (c), and (d)). Figures 4.34 (i), (j), (k), and (l) show the evolution of the multi-scale crack map at different scales (i.e., structuring element sizes of 9, 11, 15, and 22 pixels). To obtain the crack maps, patterns less than 15 mm in length were removed autonomously. To do this, segmented patterns corresponding to fewer than 244 pixels in the thinned binarized images were removed from the original binarized image. Then, the classification approach explained in Section 4.4 was used to get rid of the rest of the non-crack patterns. A NN was used as the classifier. The detected cracks are shown in Fig. 4.35 in red. Each blue box illustrates the boundary of a detected crack. As can be seen, a tiny portion of the crack, where it splits, is not detected.
In fact, this pattern was initially extracted, but it was later removed because its length was less than 15 mm. Moreover, there are several grooves in this image, but only one was detected as a crack (near the top crack). This example shows the robustness of the proposed crack detection system in the presence of noise and crack-like patterns.

Figure 4.35: Detected crack: the detected cracks are shown in red. Each blue box illustrates the boundaries of a detected crack.

In order to further evaluate the performance of the proposed crack quantification approach, the thicknesses at 15 regions of interest, shown in Fig. 4.36, were computed. To have a reference measurement, a 'pixel-counting' method was used. In this approach, an object of known length was attached to the region under inspection. Then, an image was captured with the view plane parallel to the scene plane. The scale was computed as the ratio between the known length and the number of pixels representing it. Finally, a thickness was determined by multiplying the number of pixels representing the thickness (which was counted manually) by the computed scale.

Figure 4.36: Thickness measurements at 15 different regions (the attached reference scale has a diameter of 30 mm). A 'pixel-counting' method is used to obtain the reference measurements at each point.

Figure 4.37 shows the crack thicknesses computed using the 'pixel-counting' approach and the proposed approach for each of the 15 regions shown in Fig. 4.36. The results of the current study are close to the reference measurements (i.e., the 'pixel-counting' approach); the maximum difference between the two approaches is less than 0.15 mm. Thus, the proposed approach was able to quantify real cracks with reasonable accuracy. Furthermore, in most cases, the proposed approach quantifies the thickness as slightly greater than its actual value, which is desirable (i.e., conservative) for crack monitoring applications. Note that the proposed approach is autonomous and contact-less, as opposed to the 'pixel-counting' approach.

Figure 4.37: Crack thickness quantification at the 15 different regions indicated in Fig. 4.36, for the 'pixel-counting' method and the current study.

On an AMD Athlon II X4 (2.6 GHz) processor, it took 206 seconds to open the five overlapping images, extract the keypoints, match them, and solve the SfM problem; more than 85% of this time was dedicated to extracting the keypoints. These components were all implemented in C and C++. Failure modes of the SfM problem include insufficient overlap or texture, ambiguous or repeating textures, bad initialization, and cascading errors (i.e., error propagation due to the algorithm's mistake in the placement of a camera) (Snavely, 2008). On the same machine, it took 134 seconds to process one of the images for detecting cracks, and 72 seconds to extract all the thickness information of the cracks. The processing time highly depends on the maximum and minimum crack sizes of interest, and on the resolution and texture of the image under inspection. No parallel processing was used in any of these computations. The crack detection and quantification algorithms used in this study were all implemented in MATLAB; implementing these algorithms in C or C++ would greatly enhance the computational efficiency.
4.7 Summary

Current visual inspection of civil structures, which is the predominant inspection method, is highly qualitative. An inspector has to visually assess the condition of a structure. If a region is inaccessible, an inspector uses binoculars to detect and characterize defects. There is an urgent need for developing autonomous quantitative approaches in this field. In this study, a novel crack detection and quantification procedure is introduced. First, images of a scene are captured from different views. By solving the SfM problem, the sparse structure of the scene, as well as the camera position, orientation, and internal parameters for each view, are determined. By scaling the reconstructed sparse 3D model of the scene, the depth perception of the scene is obtained. A morphological crack segmentation operator is introduced to extract crack-like patterns. The structuring element parameter for this operator is automatically adjusted based on the camera focal length, object-camera distance, camera resolution, camera sensor size, and the desired crack thickness. Appropriate features are extracted and selected for each segmented pattern using the LDA approach. The performances of a NN, a SVM, and a nearest-neighbor classifier are evaluated to classify cracks from non-crack patterns. A multi-scale crack map is obtained to represent the detected cracks as well as to quantify the crack thicknesses.

In order to quantify a crack thickness, the thickness orientation at each crack centerline pixel is determined. Then, for each centerline pixel, the pixels in the original crack map that are aligned with the corresponding thickness orientation are counted in the horizontal and vertical directions. The hypotenuse length, computed based on these two values, is considered as the crack thickness. In order to obtain more realistic results, an algorithm is proposed to compensate for the perspective errors. Validation tests were performed to evaluate the capabilities, as well as the limitations, of the methodology discussed in this paper. An example of real concrete cracks was also presented to illustrate the performance of the proposed system in the presence of noise. This system is appropriate for incorporation with autonomous or semi-autonomous robotic systems.

Chapter 5
Effect of Depth Perception on Texture Analysis: An Application to Corrosion Detection

5.1 Introduction

There are few papers published on corrosion detection based on image processing techniques alone; however, the capability of image-understanding algorithms is of great interest since they are contactless, nondestructive methods. Because the corrosion process can deteriorate the surface of metals, the corroded surface has a different texture than the rest of the image. Texture can be regarded as the measurement of smoothness, coarseness, and regularity (Gonzalez and Woods, 1992). Most texture segmentation techniques are based on the pattern recognition concepts described in Section 3.3. Pal and Pal (1993) and Reed and Dubuf (1993) provide a comprehensive review of different segmentation techniques. Although the computational cost is high for large windows (Pratt, 2001), discrete wavelet transform coefficients are powerful tools to characterize the appropriate features for texture classification, since they localize the spatial and frequency characteristics very well (Gunatilake et al., 1997).

For subsurface corrosion, a defective area can be recognized based on changes in its surface shape rather than its texture.
Stereo cameras are appropriate tools to detect changes in surface shape, which is useful for detecting subsurface corrosion. Hanji et al. (2003) used this approach to measure the 3D shape of the corroded surface of steel plates. They used a stereo adapter on a regular camera to obtain a stereo view of the corroded surface. The corresponding regions in the stereo images are detected as highly correlated areas in both images. The 3D model of the surface is then reconstructed for measurement purposes. The results of the technique are in good agreement with the measurements of a laser displacement meter (Hanji et al., 2003).

Color is another important attribute for digital image-based corrosion detection. Color image segmentation surveys are provided by Skarbek and Koschen (1994) and Cheng et al. (2001).

In this chapter, the objective is to evaluate the effect of depth perception on detecting corrosion. The results show that the use of the depth information will improve the performance of the corrosion classification approaches.

5.2 Literature Review

Gunatilake et al. (1997) used Daubechies (1992) wavelets of order 6 to detect corrosion areas on aircraft skins. The outcome of their algorithm is a binary image indicating the corroded and non-corroded regions. A three-level wavelet filter bank is used to decompose the image, as described in Section 3.4. The low-pass and high-pass filters for this wavelet are shown in Eqn. (5.1):

h_φ = [0.33267, 0.80689, 0.45988, −0.13501, −0.08544, 0.03523]ᵀ,
h_ψ = [0.03523, 0.08544, −0.13501, −0.45988, 0.80689, −0.33267]ᵀ.    (5.1)

The image is divided into 8×8 pixel blocks. Block-based feature elements lead to a high signal-to-noise ratio and decrease the false detection of corrosion on a surface. Finally, ten features are assigned to each of these non-overlapping blocks. Each feature is the energy of the block computed from the wavelet coefficients of one of the ten decomposed frames. The energy of each decomposed frame is defined as the sum of the squares of all pixel values belonging to that frame divided by the sum of the squares of all pixel values belonging to all decomposed frames of the block. For a better understanding of this definition, a schematic three-level decomposition of an image is shown in Fig. 5.1, where "L" and "H" stand for the low-pass and high-pass filters, respectively.

Figure 5.1: Three-level wavelet decomposition notation used by Gunatilake et al. (1997).

Each feature can be written mathematically as:

f_j(i) = Σ_{(m,n)∈B(i)} (W_j(m,n))²  /  Σ_{j=1}^{10} Σ_{(m,n)∈B(i)} (W_j(m,n))²,    (5.2)

where f_j(i) is the j-th feature of the i-th block, W_j(m,n) is the wavelet decomposition coefficient of the j-th sub-band (as in Fig. 5.1) at (m,n), and B(i) is the i-th block.

Gunatilake et al. (1997) used a nearest-neighbor classifier (as described in Section 3.3.3) to classify corroded regions from corrosion-free areas. The algorithm has a 95% accuracy in detecting corroded regions, as reported by the authors. The authors of the current study have used the same features to automatically segment different textures of an image with an unsupervised classifier. The results are acceptable; however, color, a very important attribute, is not included in the feature vector described above.
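A sketch of these block-energy features in MATLAB (Wavelet and Image Processing Toolboxes assumed) is given below. For brevity, each 8×8 block is decomposed directly with the 6-tap Daubechies filters ('db3' in MATLAB's naming, i.e., the filters of Eqn. (5.1)), rather than decomposing the full image and pooling coefficients per block as Gunatilake et al. describe; the input file name and block size are illustrative.

```matlab
I  = im2double(rgb2gray(imread('skin.jpg')));    % hypothetical aircraft-skin image
b  = 8;                                          % block size in pixels
nR = floor(size(I,1)/b);   nC = floor(size(I,2)/b);
features = zeros(nR*nC, 10);                     % ten energy features per block, Eqn. (5.2)
n = 0;
for r = 1:nR
    for c = 1:nC
        blk = I((r-1)*b+1 : r*b, (c-1)*b+1 : c*b);
        subbands = {};   A = blk;
        for lev = 1:3                            % three-level filter bank
            [A, H, V, D] = dwt2(A, 'db3');       % 6-tap Daubechies filters of Eqn. (5.1)
            subbands(end+1:end+3) = {H, V, D};   % detail sub-bands at this level
        end
        subbands{end+1} = A;                     % final approximation: ten sub-bands in total
        e = cellfun(@(W) sum(W(:).^2), subbands);
        n = n + 1;
        features(n, :) = e / sum(e);             % relative sub-band energies, f_j(i)
    end
end
```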
Since corrosion does not always exhibit a repeated texture, and since the lighting in which an image is captured is not constant, the classical form of texture segmentation described above is incapable of performing reliable defect detection. This problem, along with the aim of integrating several images captured from a single scene to create a corrosion map of a surface, led to the development of a more sophisticated algorithm that also incorporates the color characteristics of an image (Siegel and Gunatilake, 1998).

In their new algorithm, Siegel and Gunatilake (1998) converted the RGB image into the more uncorrelated color space YIQ, where Y represents luminance, and I and Q represent chrominance information. Equation (5.3) shows how the RGB and YIQ color components are related (Buchsbaum, 1968; Pratt, 2001):

Y = 0.29890 R + 0.58660 G + 0.11448 B,
I = 0.59598 R − 0.27418 G − 0.32180 B,    (5.3)
Q = 0.21147 R − 0.52260 G + 0.31110 B.

The Battle-Lemarié (BL) (Strang and Nguyen, 1996) wavelet transform filter is used to obtain a three-level decomposition of the Y component without down-sampling at each stage. Consequently, ten equal-sized images are obtained, as shown in Fig. 5.2. The two-level decompositions of the I and Q components are computed using the normal wavelet transform with down-sampling at each stage.

Figure 5.2: (a) Two-level wavelet decomposition of an image, (b) three-level wavelet decomposition of an image (notation used by Siegel and Gunatilake (1998)).

The Y image is divided into 32×32 non-overlapping pixel blocks; for each of these blocks, ten features from the Y decomposition image and four features from the I and Q decomposition images are extracted. Each of the first nine features derived from the Y image is calculated as:

f_{SB_k}(x, y) = [ Σ_{i=−b/2}^{b/2} Σ_{j=−b/2}^{b/2} w · SB_k² ] / [ Σ_{i=−b/2}^{b/2} Σ_{j=−b/2}^{b/2} w · (LH_k² + HH_k² + HL_k²) ],    (5.4)

where:

w = w(x + i, y + j),    (5.5)
SB_k = SB_k(x + i, y + j),    (5.6)
LH_k = LH_k(x + i, y + j),    (5.7)
HH_k = HH_k(x + i, y + j),    (5.8)
HL_k = HL_k(x + i, y + j).    (5.9)

Equation (5.4) is the ratio of the energy of a detail image resulting from the decomposition to the total energy of all details at that level of decomposition. In this equation, HH, HL, and LH are the decomposed images as shown in Fig. 5.2. The coordinate (x, y) is the center of a block in the Y decomposed images, b is the size of the block, SB_k is either HH, HL, or LH at level k of the decomposition, and w is a Gaussian weighting mask as described in Section 3.2. In fact, the computation of these nine features is similar to the traditional texture segmentation techniques using the wavelet transform, where a Gaussian mask is used to define the weighting factor to calculate the energy of each block. The 10th feature that is extracted from image Y is presented in Eqn. (5.10):

f_{LL_3}(x, y) = [ Σ_{k=1}^{3} Σ_{i=−b/2}^{b/2} Σ_{j=−b/2}^{b/2} w · (LH_k² + HH_k² + HL_k²) ] / [ Σ_{i=−b/2}^{b/2} Σ_{j=−b/2}^{b/2} w · LL_3² ],    (5.10)

where b is the size of the block, SB_k is either HH, HL, or LH at level k of the decomposition, and w is a Gaussian weighting mask; w, SB_k, LH_k, HH_k, and HL_k are defined by Eqns. (5.5), (5.6), (5.7), (5.8), and (5.9), respectively. Equation (5.10) represents the ratio of the whole energy of all the details to the approximation energy of the Y image after a three-level wavelet decomposition.
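The color conversion of Eqn. (5.3) and the Gaussian weighting mask used in Eqns. (5.4)–(5.10) are both single calls in MATLAB (Image Processing Toolbox assumed; the file name, block size, and Gaussian width are illustrative):

```matlab
rgb = im2double(imread('panel.jpg'));    % hypothetical image
yiq = rgb2ntsc(rgb);                     % RGB -> YIQ, Eqn. (5.3)
Y  = yiq(:,:,1);   Iq = yiq(:,:,2);   Q = yiq(:,:,3);
w  = fspecial('gaussian', 32, 8);        % Gaussian weighting mask for a 32x32 block
```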
Four more features are extracted from the two-level wavelet decompositions of the I and Q image components, as shown in Eqns. (5.11) and (5.12):

f_{I_k}(x′_k, y′_k) = [ Σ_{i=−b/2}^{b/2} Σ_{j=−b/2}^{b/2} w · (LH_k² + HH_k² + HL_k²) ] / [ Σ_{i=−b/2}^{b/2} Σ_{j=−b/2}^{b/2} w · LL_2² ],    (5.11)

f_{Q_k}(x′_k, y′_k) = [ Σ_{i=−b/2}^{b/2} Σ_{j=−b/2}^{b/2} w · (LH_k² + HH_k² + HL_k²) ] / [ Σ_{i=−b/2}^{b/2} Σ_{j=−b/2}^{b/2} w · LL_2² ],    (5.12)

where:

w = w(x′_k + i, y′_k + j),    (5.13)
SB_k = SB_k(x′_k + i, y′_k + j),    (5.14)
LH_k = LH_k(x′_k + i, y′_k + j),    (5.15)
HH_k = HH_k(x′_k + i, y′_k + j),    (5.16)
HL_k = HL_k(x′_k + i, y′_k + j),    (5.17)

and where b is the size of the block, SB_k is either HH, HL, or LH at level k of the decomposition, w is a Gaussian weighting mask, k is the level of decomposition (here k = 1, 2), and (x′_k, y′_k) in I and Q is the coordinate corresponding to the block center (x, y) in Y at the k-th level of decomposition, which can be computed as:

(x′_k, y′_k) = (x / 2^k, y / 2^k).    (5.18)

After computing (x′_k, y′_k), a 32×32 window centered at (x′_k, y′_k) is selected to calculate the feature values, as shown in Eqns. (5.11) and (5.12). The 32×32 windows in I and in Q will overlap as the level of decomposition increases. For the higher levels of decomposition, this process will result in a better estimation of the low-frequency signals in the chrominance characteristics of the image. Equations (5.11) and (5.12) are ratios of the total energy of the details at the k-th level of decomposition to the energy of the second-level approximation for the I and Q components.

After extracting the described features, Siegel and Gunatilake (1998) used a feed-forward neural network consisting of 14 input, 40 hidden, and 2 output neurons. The possible outputs of the algorithm are: corrosion with high confidence, corrosion with low confidence, or corrosion-free region. The decision-making function is based on the two outputs of the neural network and a threshold T that is experimentally selected as 0.65. The confidence is defined as the absolute value of the difference between the two outputs:

output(1) > output(2) and confidence ≥ T  ⟹  corrosion (high confidence),
output(1) > output(2) and confidence < T  ⟹  corrosion (low confidence),    (5.19)
output(1) < output(2)  ⟹  corrosion-free.

Because of the above decision-making procedure, it is possible to process multiple images and perform information fusion over the captured data. In addition, a single corrosion map can be generated in which each region has the largest confidence value extracted from the different images. The probability of correct detection for this algorithm is 94% (Siegel and Gunatilake, 1998).

Another way to evaluate the color characteristics of a corroded region is to convert the RGB color image into the HSI color space. It is possible to express the color characteristics independent of brightness in the HSI color space. For this reason, the HSI color space is a suitable choice for identifying corroded areas quantitatively (Choi and Kim, 2005). The following equations show how the RGB components of an image can be converted to the HSI components:

I = (R + G + B) / 3,    (5.20)
S = 1 − [3 / (R + G + B)] · min(R, G, B),    (5.21)
H = cos⁻¹( ½[(R − G) + (R − B)] / √((R − G)² + (R − B)(G − B)) ).    (5.22)

Figure 5.3 shows the HSI color space, in which the saturation varies from 0 to 1, the hue varies from 0° to 360°, and the intensity varies from 0 (black) to 1 (white).

Figure 5.3: HSI color space.
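A direct MATLAB transcription of Eqns. (5.20)–(5.22) is shown below; the small epsilon that guards against division by zero on gray pixels and the hue quadrant correction for B > G are standard implementation details that are not spelled out in the equations above.

```matlab
rgb = im2double(imread('corroded.jpg'));        % hypothetical image, RGB in [0,1]
R = rgb(:,:,1);   G = rgb(:,:,2);   B = rgb(:,:,3);
ep = 1e-10;                                      % avoids 0/0 on gray or black pixels
Iint = (R + G + B) / 3;                                           % intensity, Eqn. (5.20)
S    = 1 - (3 ./ (R + G + B + ep)) .* min(min(R, G), B);          % saturation, Eqn. (5.21)
H    = acos(0.5*((R - G) + (R - B)) ./ ...
            (sqrt((R - G).^2 + (R - B).*(G - B)) + ep));          % hue, Eqn. (5.22)
H(B > G) = 2*pi - H(B > G);                      % quadrant correction so that H spans 0 to 360 degrees
```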
Binarizing the saturation component of an image will segment the pixels that have higher saturation values than the rest of the image, which in many cases can result in segmenting the corroded area. This indicates the significance of the saturation component. Figure 5.4 shows the result of binarizing the saturation component of actual corrosion images using the thresholding technique described in Section 4.4.1, where the white areas in Figs. 5.4 (b), (d), (f), (h), and (j) are the potential corroded areas.

Figure 5.4: The result of binarizing the saturation component of original images (a), (c), (e), (g), and (i), in order to extract the corroded area of each image, is presented in images (b), (d), (f), (h), and (j), respectively.

Choi and Kim (2005) classified several types of corrosion defects. They proposed to divide each H and S component into 10×10 pixel blocks, and then treat the histogram of each block like a distribution of random variables. After applying the PCA and varimax approaches, it was concluded that the mean H value, the mean S value, the median S value, the skew of the S distribution, and the skew of the I distribution are appropriate features to be assigned to each block for classification (Choi and Kim, 2005). The co-occurrence matrix is used for texture feature extraction based on the azimuth difference of points on a surface. This approach may not be useful for the problem in question, since it requires microscopes to capture the images, and the magnification factor of the images tested by Choi and Kim (2005) is between 50 and 500, which is far beyond the magnification factor of regular digital cameras.

The YCbCr color space, where Y is the luma component and Cb and Cr are the blue-difference and red-difference chroma components, respectively, is another color space used for color analysis in the current study. Equation (5.23) shows how the RGB and YCbCr color components are related (Poynton, 1996):

Y  =  65.481 R + 128.553 G + 24.966 B + 16,
Cb = −37.797 R − 74.203 G + 112 B + 128,    (5.23)
Cr =  112 R − 93.786 G − 18.214 B + 128.

5.3 Problem Statement and Conducted Research

In the above studies, as well as other texture analysis studies, the depth information (i.e., the interaction of camera focal length, camera-object distance, image resolution, etc.) is not used for feature extraction and pattern classification. In this study, the objective is to investigate the effect of depth perception on texture analysis, in particular the detection of corroded regions in steel structures. The traditional approach for texture analysis is to divide the original image into non-overlapping sub-images, compute features for each sub-image, and classify the sub-images as corrosion or non-corrosion. In order to incorporate the color attributes, each sub-image is first decomposed into its color components, and features are extracted from each decomposed sub-image. The size of each sub-image is set to be fixed (e.g., 8×8 pixels, 16×16 pixels, etc.), regardless of the image scale. In this case, an 8×8 sub-image, for instance, may represent a 1 mm² or a 10,000 mm² area on the actual object.

In this study, a reverse procedure is proposed and evaluated. Instead of having a fixed sub-image size, the physical area on the object is fixed, say a region of 2.5×2.5 mm². Then, the corresponding window size for the sub-images is obtained using the depth information.
The depth information can be obtained using the 3D scene reconstruction discussed in Section 4.3 or using 3D scanner systems such as the Kinect, which was developed by Microsoft. The Kinect consists of an RGB camera and a depth sensor that provides the depth information for each image pixel. In the proposed approach, therefore, the sub-image window size is not fixed; it is variable. The results show that the proposed approach outperforms the traditional approach. Several scenarios based on different color spaces, different numbers of extracted features, and different sub-image sizes are evaluated.

In this study, the YCbCr, YIQ, and HSI color spaces were used to decompose sub-images and extract features. For each decomposed sub-image, a three-level wavelet filter bank was used to extract ten features. Each feature is the percentage of energy corresponding to the approximation, horizontal, vertical, or diagonal details at each decomposition level. The energy of each decomposed frame is defined as the sum of the squares of all pixel values belonging to that frame divided by the sum of the squares of all pixel values belonging to all decomposed frames of the sub-image block. Daubechies (1992) wavelets of order 4 were used to establish the wavelet filter banks. The following combinations of the color spaces and features were evaluated to find the discriminative color space and the optimum number of features:

YCbCr 30
In this combination, each sub-image was decomposed into its Y, Cb, and Cr components. Ten features were extracted for each decomposed sub-image. The total number of extracted features for each sub-image was 30.

YCbCr 20
In this combination, each sub-image was decomposed into its Y, Cb, and Cr components. Ten features were extracted for each of the Cb and Cr decomposed sub-images (i.e., the Y component was excluded from feature selection). The total number of extracted features for each sub-image was 20.

YIQ 30
In this combination, each sub-image was decomposed into its Y, I, and Q components. Ten features were extracted for each decomposed sub-image. The total number of extracted features for each sub-image was 30.

YIQ 20
In this combination, each sub-image was decomposed into its Y, I, and Q components. Ten features were extracted for each of the I and Q decomposed sub-images (i.e., the Y component was excluded from feature selection). The total number of extracted features for each sub-image was 20.

HSI 30
In this combination, each sub-image was decomposed into its H, S, and I components. Ten features were extracted for each decomposed sub-image. The total number of extracted features for each sub-image was 30.

HSI 20
In this combination, each sub-image was decomposed into its H, S, and I components. Ten features were extracted for each of the H and S decomposed sub-images (i.e., the I component was excluded from feature selection). The total number of extracted features for each sub-image was 20.

For each of the above combinations, the following scenarios for the sub-image window were used to extract features.

5.3.1 Fixed Sub-Image Window

In this scenario, each image was subdivided into sub-images using a fixed window size (e.g., 8 × 8 pixels). Then, the texture and color analysis approach described above was used to extract features and classify each sub-image. This is the conventional method of texture analysis. The following window sizes (in pixels) were evaluated for this approach: 8 × 8, 16 × 16, 24 × 24, 32 × 32, 40 × 40, 48 × 48, 56 × 56, and 64 × 64.
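Regardless of which windowing scenario is used, the per-channel feature computation is the same. The following sketch (assuming NumPy and the PyWavelets package; the names are hypothetical and this is not the code used in this study) illustrates the color conversion of Eq. (5.23) and the ten wavelet-energy features per channel that, concatenated over the three channels, form the 30-feature vector:

    import numpy as np
    import pywt

    # RGB-to-YCbCr matrix and offset of Eq. (5.23); R, G, B are assumed to lie in [0, 1]
    M = np.array([[ 65.481, 128.553,  24.966],
                  [-37.797, -74.203, 112.0  ],
                  [112.0,   -93.786, -18.214]])
    OFFSET = np.array([16.0, 128.0, 128.0])

    def rgb_to_ycbcr(rgb):
        # rgb: H x W x 3 array; returns the Y, Cb, Cr planes stacked along the last axis
        return rgb @ M.T + OFFSET

    def wavelet_energy_features(channel, wavelet="db4", level=3):
        # Ten features per channel: the relative energy of the approximation band and of
        # the horizontal, vertical, and diagonal detail bands at each of the three levels.
        coeffs = pywt.wavedec2(channel, wavelet, level=level)
        bands = [coeffs[0]] + [band for detail in coeffs[1:] for band in detail]
        energy = np.array([np.sum(np.square(b)) for b in bands])
        return energy / energy.sum()

    def features_30(rgb_subimage):
        # Concatenating the ten features of the three channels gives the 30-feature vector.
        ycbcr = rgb_to_ycbcr(rgb_subimage)
        return np.concatenate([wavelet_energy_features(ycbcr[:, :, c]) for c in range(3)])

Dropping the first ten entries of this vector (the luma channel) would give the corresponding 20-feature variant.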
5.3.2 Variable Sub-Image Window

In this scenario, each image was subdivided into sub-images where each sub-image represents the same amount of area on the object under inspection. Hence, depending on the depth parameters, the size of the sub-image window is 'variable'. The following physical areas (in mm) were evaluated for this approach: 2.5 × 2.5, 5 × 5, 7.5 × 7.5, 10 × 10, 12.5 × 12.5, 15 × 15, 17.5 × 17.5, and 20 × 20.

5.3.3 Adaptive Sub-Image Window

In this approach, the size of the sub-image was variable as explained above; however, the wavelet high-frequency and low-frequency filters were also resized according to the window size. This scenario is called 'adaptive' since the shape and the length of the wavelet filters change based on the variable sub-image window. Figure 5.5 shows examples of the high-frequency and the low-frequency filters for different sub-image window sizes that were used in the 'adaptive' scenario.

Figure 5.5: Adaptive decomposition filters for the Daubechies wavelet of order 4 for different sub-image window sizes: (a) low-frequency filters, (b) high-frequency filters.

5.3.4 Resampling Sub-Image Window

In this scenario, the sub-image window size was 'variable'; however, the different sub-images were resampled to represent a 32 × 32 block. Then, these resampled 32 × 32 sub-images were used for texture analysis. The idea is to normalize all the variable sub-image windows.

5.4 Experimental Results and Discussion

In order to train and test the scenarios explained above, several images were captured from corroded and non-corroded regions. Then, the corroded and non-corroded pixels were manually classified and labeled. In order to obtain the depth information, a scale was attached to the object under inspection. This process can be replaced by obtaining the depth information from 3D scene reconstruction (see Section 4.3) or by using the Microsoft Kinect.

Several NNs were trained and evaluated for each scenario. The NNs were all three-layer feedforward networks, and the back-propagation method was used to train each network. The input layers consisted of 20 or 30 neurons, depending on the number of selected features. The hidden layer of each NN was set to have 50 neurons, and the output layers consisted of two neurons. To compare the performance of the different approaches, the Area Under the Curve (AUC) of the Receiver Operating Characteristic (ROC) curve was computed for each scenario. The ROC curve is a graphical plot of the true positive rate versus the false positive rate for a binary classifier system as its discrimination threshold is varied. A classifier with a higher AUC performs better than a classifier with a lower AUC.

A total of 2,723 corrosion sub-images and 3,352 non-corrosion sub-images were used to train the NNs. From these 6,075 sub-images, 70% of the samples were randomly selected to train the NNs. For validation and initial testing, the remaining 30% of the samples were randomly split in half. A large testing set, which was not exposed to the NNs during training, was used to evaluate the performance of the different approaches.
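A minimal sketch of how one of these classifiers could be trained and scored is given below, with scikit-learn's MLPClassifier standing in for the back-propagation networks described above (its two-class output replaces the explicit two-neuron output layer); the data are placeholders and the names are hypothetical:

    import numpy as np
    from sklearn.model_selection import train_test_split
    from sklearn.neural_network import MLPClassifier
    from sklearn.metrics import roc_auc_score

    # X: n_samples x 30 feature matrix; y: 1 = corrosion, 0 = non-corrosion (placeholder data)
    rng = np.random.default_rng(0)
    X, y = rng.random((200, 30)), rng.integers(0, 2, 200)

    X_train, X_test, y_train, y_test = train_test_split(X, y, train_size=0.7, random_state=0)

    # A single hidden layer of 50 neurons, as in the networks described above
    net = MLPClassifier(hidden_layer_sizes=(50,), max_iter=2000, random_state=0)
    net.fit(X_train, y_train)

    # AUC of the ROC curve on the held-out samples
    scores = net.predict_proba(X_test)[:, 1]
    print("AUC:", roc_auc_score(y_test, scores))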
In order to normalize the classification performances and allow a fair comparison, each classified sub-image was divided into unit blocks of 8 × 8 pixels, and the AUC was computed over these classified blocks. The test set consisted of 845,470 unit blocks. In order to minimize the effect of the training set's distribution on the NNs, the training and testing were repeated five times.

Figure 5.6 shows the effect of different color spaces, numbers of features, and different 'fixed' sub-image windows on classification performance. Figure 5.6 (a) shows the mean of the AUCs obtained from the five trained classifiers for different combinations of the color spaces and features. From this figure, it is evident that the HSI color space, which is the closest to the human vision system, has the worst performance. This could be due to the fact that this color space is highly nonlinear. Figure 5.6 (b) shows the standard deviation of the AUCs. The high variation of the standard deviation for the HSI color space confirms that this color space is not a reliable choice for the described texture analysis approach. In the majority of cases (excluding the HSI color space), the 30-feature approach is superior to the 20-feature one. Overall, YIQ 30 with an 8 × 8 sub-image window has the highest performance.

Figure 5.6: Effect of different color spaces, number of features and different 'fixed' sub-image windows on classification performance: (a) mean of AUCs, (b) standard deviation of AUCs.

Figure 5.7 shows the effect of different color spaces, numbers of features, and different 'variable' sub-image windows on classification performance. Figure 5.7 (a) shows the mean of the AUCs obtained from the five trained classifiers for different combinations of the color spaces and features, and Figure 5.7 (b) shows the standard deviation of the AUCs. From these figures, it is concluded that the HSI color space has the worst performance among the evaluated color spaces. In the majority of cases (excluding the HSI color space), the 30-feature approach is superior to the 20-feature one. YIQ 30 with a 'variable' window representing a 2.5 × 2.5 mm² region has the highest performance.

Figure 5.7: Effect of different color spaces, number of features and different 'variable' sub-image windows on classification performance: (a) mean of AUCs, (b) standard deviation of AUCs.
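A small sketch of the unit-block normalization behind these AUC values is given below: each classified sub-image is expanded into the 8 × 8 unit blocks it covers before the AUC is computed, so that large and small windows contribute comparably (scikit-learn assumed; the names are hypothetical and this is not the code used in this study):

    import numpy as np
    from sklearn.metrics import roc_auc_score

    def unit_block_auc(scores, labels, window_sides, unit=8):
        # scores, labels: per-sub-image classifier outputs and ground-truth labels
        # window_sides: side length, in pixels, of each classified sub-image window
        counts = [max(1, (int(side) // unit) ** 2) for side in window_sides]
        block_scores = np.repeat(np.asarray(scores), counts)
        block_labels = np.repeat(np.asarray(labels), counts)
        return roc_auc_score(block_labels, block_scores)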
Figure 5.8 shows the effect of different color spaces, numbers of features, and the 'adaptive' sub-imaging approach on classification performance. Figure 5.8 (a) shows the mean of the AUCs obtained from the five trained classifiers for different combinations of the color spaces and features, and Figure 5.8 (b) shows the standard deviation of the AUCs. From these figures, it is clear that the HSI color space has the worst performance among all the color spaces. By comparing Figs. 5.7 and 5.8, it is concluded that, in all cases except YCbCr 30 with a sub-image window of 8 × 8, the performance of the 'adaptive' approach is worse than that of the 'variable' method. In the 'adaptive' approach, when the size of the window increases, the corresponding wavelet filters act as averaging filters, which essentially discard a portion of the data; this is why the 'variable' approach performs better. Furthermore, the 2.5 × 2.5 window for the 'adaptive' approach has performance close to that of the 2.5 × 2.5 window for the 'variable' approach, because the 2.5 × 2.5 window corresponds to a smaller averaging window, which discards less of the data. YCbCr 30 and YIQ 30 with an 'adaptive' window representing a 2.5 × 2.5 mm² region have the highest performances.

Figure 5.8: Effect of different color spaces, number of features and different 'adaptive' sub-image windows on classification performance: (a) mean of AUCs, (b) standard deviation of AUCs.

Figure 5.9 shows the effect of different color spaces, numbers of features, and the 'resampling' sub-imaging approach on classification performance. Figure 5.9 (a) shows the mean of the AUCs obtained from the five trained classifiers for different combinations of the color spaces and features, and Figure 5.9 (b) shows the standard deviation of the AUCs. From these figures, it is clear that the HSI color space has the worst performance among the evaluated color spaces. By comparing Figs. 5.7 and 5.9, it is concluded that, in all cases except YIQ 30 and YIQ 20 with 'variable' sub-image windows of 5 × 5, the performance of the 'resampling' approach is worse than that of the 'variable' method. In the 'resampling' approach, when the size of the window is greater than 32 × 32, the sub-image is down-sampled (i.e., data is discarded); this is why the 'variable' approach performs better. By comparing Figs. 5.8 and 5.9, it is found that the 'resampling' approach is superior to the 'adaptive' approach for all YIQ color space scenarios. For the majority of the YCbCr scenarios, the conclusion is reversed.
Figure 5.9: Effect of different color spaces, number of features and different 'resampling' sub-image windows on classification performance: (a) mean of AUCs, (b) standard deviation of AUCs.

From the above analyses, the following scenarios for the YIQ color space with 30 features, which had the highest performances, were selected for further analysis:

- Fixed sub-image window of 8 × 8 pixels
- Adaptive sub-image window of 2.5 × 2.5 mm
- Variable sub-image window of 2.5 × 2.5 mm

Features were extracted from the same training set as before, and for each of the above scenarios, NNs were trained and tested ten times. We call this approach the 'single NN' approach because one single NN was trained on the whole training set, and that NN was used to evaluate the performance of the system.

In order to benefit from the depth information associated with each sub-image, a depth index was defined as the physical size that a pixel represents on the object under inspection. Eight ranges were defined, and the training set was categorized based on these depth indices. These ranges are shown in Table 5.1.

Table 5.1: The categorization of the training samples based on the depth index

Category    Depth index interval (mm)    No. of samples
1           0.046 – 0.081                 213
2           0.081 – 0.115                 540
3           0.115 – 0.150                 625
4           0.150 – 0.184                1065
5           0.184 – 0.219                 958
6           0.219 – 0.253                 613
7           0.253 – 0.288                1436
8           0.288 – 0.323                 624

For each depth index range, an NN was trained using the corresponding training images in that range. In this way, instead of using one single NN for the whole training set, eight different NNs were trained. We call this approach the 'multiple NNs' approach. For testing, the proposed system selects the trained NN corresponding to the depth index of the testing images. The performance of the whole system was evaluated based on the classifications obtained from these NNs. The 'multiple NNs' approach was performed for the different sub-image windows listed above. The training of each NN was repeated ten times using random training samples. All the NNs consisted of a 30-neuron input layer, a 50-neuron hidden layer, and a 2-neuron output layer.

Figure 5.10 shows the performance of the different proposed approaches. Each bar represents the average of the AUCs for ten evaluations of the system. Each error bar in this figure extends twice the standard deviation of the AUCs toward the lower and upper bounds (i.e., the length of each error bar is four times the standard deviation of the AUCs). From this figure, it is obvious that the use of the 'multiple NNs' approach decreases the standard deviation of the AUCs dramatically. This means that the 'multiple NNs' approach increases the reliability of the system. The performance of the adaptive window is slightly better than that of the fixed window; however, the variable window increases the performance and reduces the standard deviation (i.e., increases the reliability) more significantly.
It is interesting that the 'single NN' approach with the variable window outperforms the 'multiple NNs' approach; however, the reliability of the 'multiple NNs' approach is higher. Nonetheless, the reliability of the single NN approach with the variable window is much higher than that of the single NN approaches with fixed and adaptive windows. The slightly lower performance of the multiple NNs approach with the variable window, with respect to the 'single NN' approach with the variable window, could be due to the smaller training sets that were used to train each NN in the 'multiple NNs' approach (i.e., the training samples used for each NN were in fact a fraction of the original training set). The performance of the traditional 'single NN' approach with the fixed window was improved by 8.8% using the proposed 'single NN' approach with the variable window. This shows that the proposed variable approach is superior to the traditional texture analysis approaches since it incorporates depth perception into the texture analysis.

Figure 5.10: The performance of different proposed scenarios obtained from ten times training and testing of the NNs.

Figure 5.11 shows three corrosion images (Figs. 5.11 (a), (d) and (g)) and the processed images using the 'single NN' approach with the 'fixed' sub-image window (Figs. 5.11 (b), (e) and (h)) and the 'variable' sub-image window (Figs. 5.11 (c), (f) and (i)). In the processed images, if a sub-image is classified as corrosion, its color values are preserved; otherwise, they are changed to zero (i.e., black blocks). This figure indicates the superiority of the proposed 'variable' sub-image window, where fewer false positives are detected. The depth indices for Figs. 5.11 (c), (f) and (i) are 0.207, 0.069 and 0.60, respectively.

Figure 5.11: Processed images using 'fixed' and 'variable' sub-image windows: (a), (d) and (g) are the original images, (b), (e) and (h) are the processed images using the 'fixed' sub-image approach, and (c), (f) and (i) are the processed images using the 'variable' sub-image approach.

5.5 Corrosion Quantification

After classifying the corrosion sub-images, the physical area that each corrosion sub-image represents can be quantified upon the availability of depth information from 3D scene reconstruction or from the Microsoft Kinect. Equation (4.1) can be used to find the physical size of the corroded regions. This will help decision makers to have a better understanding of the severity of a defective region.

5.6 Summary and Conclusions

In order to segment corrosion-like regions from the rest of the image, both texture and color analysis should be used. Multi-resolution wavelet analysis is a powerful tool to characterize the appropriate features for texture classification. In this study, the traditional way of texture analysis is compared with three proposed approaches that are based on the depth information. In traditional wavelet analysis for corrosion detection, an image is divided into sub-images of fixed sizes (e.g., 8 × 8). In this study, it is shown that if a variable sub-image window size that represents a fixed area on the object under inspection is used as the datum, the classification performance and reliability of the system will dramatically increase.
Through several experimental results, it is shown that the incorporation of depth perception can improve the performance of a corrosion detection system significantly. Several experiments are carried out, and the results are presented to evaluate the performance of the different proposed scenarios. Different color spaces, including YCbCr, YIQ, and HSI, are evaluated. The HSI color space is found to be less consistent, and it is not appropriate for the proposed multi-resolution wavelet analysis. The effect of the number of features is also investigated. The results show that the performance of the system improves when the features from all color channels are included in the feature selection process. It is also found that having multiple trained NNs, one for each depth interval, as opposed to using one single NN for all the depth intervals, will significantly improve the reliability of the system for detecting corrosion. Finally, the corroded regions can be quantified to represent the severity of the defective regions. This system is well suited for incorporation with mobile robotic systems such as UAVs.

Chapter 6
A Study of an Autonomous Image-Based Interpretation System for Condition Assessment of Sewer Pipelines

6.1 Introduction

Since sewer pipelines are mainly buried underground, their conditions cannot be easily assessed. Approximately 800,000 miles of underground sewer pipelines exist in the United States (Chae and Abraham, 2001). In addition, according to a survey carried out on 64,650 miles of sewer pipelines in the United States, 92% of pipes have a diameter of less than 36 inches, which makes them inaccessible for direct human inspections (Hashemi and Najafi, 2007).

Non-Destructive Testing (NDT) methods are appropriate for in-service defect assessment of pipelines. Acoustic emission monitoring, Eddy current testing, and ultrasonic inspection (sonar) are some of the non-visual NDT methods that can be used for pipeline defect assessment (Sinha et al., 2003). In this study, however, the emphasis is on visual NDT methods.

6.1.1 Motivation

The total U.S. investment in civil infrastructure systems has been estimated to be $20 trillion by the National Science Foundation (NSF) (Fieguth and Sinha, 1999), and sewerage systems are one of the six major infrastructure systems in North America (Reyna et al., 1994). According to the Infrastructure Report Card of the American Society of Civil Engineers (ASCE), $1.6 trillion is needed to uplift the nation's infrastructure to "good" condition, and this budget needs to be spent over a five-year period. The current national wastewater infrastructure grade is "D-" (ASCE, 2005). According to a survey by the U.S. Environmental Protection Agency (EPA) and the states in August 2003, an additional $181 billion must be invested in sewer treatment projects. Sewer pipelines are a major component of sewer treatment systems. The EPA reported that between three and ten billion gallons of untreated sewage are released each year because of sanitary sewer overflows caused by blocked or broken sewer pipes (EPA, 2004). California's 100,000 miles of sewers discharge four billion gallons of wastewater daily. In order to uphold California's wastewater infrastructure at a "B" grade, an annual capital investment of $2.3 billion is needed (ASCE, 2006). Identification and rehabilitation of the wastewater infrastructure's critical regions are the key tasks of wastewater infrastructure management.
Fur- thermore, municipalities are under pressure to apply efficient strategies for managing wastewater infrastructures (Chae, 2001). The average service lifetime of a sewer pipeline is 70 years (Shehab-Eldeen, 2001), and many existing underground pipes were installed 50 years ago (Hashemi and Najafi, 2007). Insufficient inspection and maintenance of sewer pipes are the main reasons behind the poor condition of sewer pipes in Northern America. Closed Circuit Television (CCTV) survey is the most commonly used method for condition assessment of sewer pipelines in the United States (Gokhale et al., 1997). In this technique, a remote control robot carries a television camera into the pipeline. As it moves, the camera 152 captures the inside scene of the pipe. Inspection processes based on CCTV are done manually by human technicians who have to inspect thousands of miles of videotaped pipes to detect defects to prepare an assessment report. This inspection technique is subjective to an inspector’s experience, concentration, and fatigue. Furthermore, this approach is costly. The inspection of videotapes consumes 30% of the total cost for this procedure (Shehab-Eldeen, 2001). An autonomous system that can interpret the video captured by the CCTV can significantly decrease the subjectivity and inconsistency in interpretation, as well as the time and costs of in- spections. Furthermore, pipes can be inspected more regularly by using such a system. Effective maintenance programs, which require regular inspections, will improve the performance and life- time of sewer pipes, and consequently save large amounts of money. 6.1.2 Scope In Section 6.2, several sewer pipeline inspection systems are introduced. Proposed automatic condition assessment systems for sewer pipeline assessment, based on the SSET scans and the CCTV surveys, are reviewed in Section 6.3. The summary of state-of-the-art is presented in Section 6.4. Finally, the proposed future study for this chapter is described in Section 6.5. 6.2 Sewer Pipeline Inspection Systems 6.2.1 CCTV The condition of most underground sewer systems was unknown until the development of the CCTV in the 1960s (Reyna et al., 1994). There are two types of CCTV methods used for pipeline defect assessment. In the first less common approach, a stationary television camera is placed at the manhole and the condition of the pipe is assessed by zooming the camera. In the second approach, the camera is mounted on a mobile remote control robot that can move along the pipe. 153 The inside view of the pipe is captured by the camera and recorded on videotapes or in a digi- tized format. An operator can control the robot and the camera to look at suspicious defects more closely; however, the main defect assessment is performed off-line by an expert inspector, based on documented criteria. This method works well for gross defects; however, the inspector’s (or operator’s) distractions or inexperience makes this method error-prone. Inconsistency is another disadvantage of this type of assessment. Different inspectors may assess the same videotape of a pipeline differently, due to their different experiences or personal judgements. In fact, an inspector may assess the same scene differently at different times. This method is also very time-consuming since the inspector has to watch miles of videotaped pipelines to prepare the inspection report. A long and tedious inspection process can adversely affect the inspector’s level of concentration. 
Nonetheless, this method is the most accepted technique for sewer pipeline assessment by mu- nicipalities in North America, because it costs less than other approaches; however, it does not provide a high quality view of the inner side of sewer pipes. Figure 6.1 (a), (b), (c), and (d) show a CCTV underground inspection process, a CCTV camera, the inside of a CCTV trailer, and a CCTV video monitor, respectively. Henry and Luxmoore (1996) developed a profiling attachment for CCTV cameras. A ring of light is projected into a pipeline and viewed by the camera. The shape of the light ring is analyzed mathematically to detect any distortion in the pipeline. Duran et al. (2003) proposed incorporating a laser-based transducer into the CCTV system to overcome some of the shortcomings of the CCTV system. The proposed system identifies the defects and categorizes them using a neural network system. The main contribution of the proposed system is the creation of an autonomous inspection system based on the extraction and fusion of image intensity and positional data from a camer/laser-based profiling sensor (Duran et al., 2007) . 154 CCTV Camera CCTV Trailer Manhole Sewer Pipeline Multifunction Cable (a) (b) (c) (d) Figure 6.1: (a) CCTV underground inspection process, (b) CCTV camera, (c) inside CCTV trailer, (c) CCTV video monitor. 6.2.2 KARO KARO is a semiautonomous multi-sensor robotic inspection system developed by industrial re- search institutes in Germany. The KARO robot has four wheels that are controlled independently. To avoid undesired direction changes due to the slippery surfaces of pipes, a smart anti-slip mech- anism is embedded in the KARO’s robotic system (Kuntze and Haffner, 1998). A high resolution TV camera as well as ultrasonic sensors, a microwave sensor, and a 3D optical sensor are mounted on the system (Gokhale et al., 1997). Wall thickness and defects covered by mud layers are de- tected using the ultrasonic sensor. Damages that are behind pipe walls are detected using the microwave sensor. The 3D optical sensor is utilized to detect surface defects (Shehab-Eldeen, 2001). The KARO system is capable of detecting deformations, obstacles, cracks, wall thickness, and leakage holes (Chae, 2001, Shehab-Eldeen, 2001). A fuzzy-logic sensor fusion system is 155 used as the damage diagnosis technique within the KARO system (Kuntze and Haffner, 1998). The KARO system relies on the operator to activate the appropriate sensors when detecting a type of defect, a process that is subjective to the operator’s experience. 6.2.3 MAKRO MAKRO is an autonomous and untethered multi-segment robot platform. The MAKRO robot consists of six segments; most of the external sensors are mounted on the two identical end seg- ments of the robot. The system also has an embedded computer, control software, batteries, video equipment, 21 motors, external sensors, and many internal sensors (which serve to control the posture of the robot). Given a starting and ending point and a topography map of the sewer, the MAKRO inspects the pipe’s inner sourface automatically. Figure 6.2 (a) and (b) show the front view and top view of a MAKRO robot, respectively. The ultrasound range sensor at the end (as shown in Fig. 6.2 (a)) is used to detect obstacles. Two infrared sensors are mounted on the robot to detect laterals. More details about the MAKRO project are presented by Rome et al. (1999). The MAKRO project was active between 1997 and 2000. 
Since 2001, the MAKRO project has been continued as an internal project at the Fraunhofer Institute for Autonomous Intelligent Sys- tems (AIS) and Forschungszentrum Informationstechnik (FZI). The system is currently not for commercial use. 6.2.4 PIRAT Upon investigation by the Commonwealth Scientific and Industrial Research Organization (CSIRO) (Australia’s national science agency) regarding the feasibility of automatic assessment of CCTV videotapes, Pipe Inspection Real-Time Assessment Technique (PIRAT) was developed by CSIRO and Melbourne Water (Kirkham et al., 2000). The PIRAT system utilizes a sonar sensor to de- termine the pipe’s radius and a laser sensor to detect surface defects (Chae, 2001). It can also 156 LED light mount Right camera Right IR sensor Left camera Left IR sensor Left laser Right laser Ultrasound sensor (a) (b) Figure 6.2: (a) Front view of the MAKRO robot (Rome et al., 1999), (b) top view of the MAKRO robot. construct a cylindrical-polar geometric model of the pipeline’s inner surface. Data preprocess- ing, segmentation, neural network classification, and a knowledge-based interpretation system are used to analyze this model and generate a report (Kirkham et al., 2000). This system was tested over 4.5 km of sewer pipeline in Melbourne. The PIRAT system can only be used in pipes with diameters larger than 600mm, which limits the usage of this technology. The development of PIRAT ceased in 1996 due a change of priorities at Melbourne Water (Shehab-Eldeen, 2001). 6.2.5 SSET The Sewer Scanner and Evaluation Technology (SSET) was developed in 1994 by three Japanese companies: by TAO Grout, CORE, and TGS. The most recent version of the SSET system con- sists of a fisheye lens camera, a gyroscope, and a CCTV camera. The SSET system scans the pipe circumferentially and outputs an unwrapped, digital image. Circumferential scanning eliminates the need for on-site investigation of defects. The gyroscope provides information about the pipe’s shape and any deformations. The CCTV video record of the SSET system gives a better under- standing of the pipe’s inner surface. The speed rate of the SSET system is slow but constant. There is no need for more on-site investigation at any defect encountered. Figure 6.3 shows a 157 SSET sample report. The top row shows an unwrapped circumferential image. The second row shows the location of cracks and joints. The third row is the inclination information provided by the gyroscope, and the CCTV video records are shown at the bottom. High resolution images of inner surfaces make the SSET system an appropriate candidate for autonomous sewer pipeline inspection; however, based on an extensive technical evaluation report (Civil Engineering Research Foundation (CERF), 2001) where the system was evaluated in 13 municipalities in Northern America, this technology is more expensive than the CCTV system. Consequently, CCTV has remained as the dominant technology in the inspection of pipeline systems in the U.S. 
Figure 6.3: SSET sample report prepared by Blackhawk-PAS, Inc.

6.2.6 PANORAMO

PANORAMO is a new technology for sewer pipe inspection invented by the IBAK company in Germany. The PANORAMO system consists of two high-resolution digital cameras that are installed at the front and back ends of a robot. The cameras use 185° wide-angle fisheye lenses. As the robot moves at a speed of 14 inches per second, each point in the pipe is photographed from different view angles. In this way, the whole pipe wall is scanned (as shown in Fig. 6.4). An unwrapped image (similar to the SSET output) of the inner surface of the pipe is provided for off-site inspection, which is useful for computer-aided measurement of the size and the location of defects. The viewer software of the PANORAMO system provides a virtual environment for the off-site inspector, who has a 360° panoramic view. The inspector can stop at any spot, turn full circles, and zoom the virtual camera. Autonomous defect detection software is also provided to help the inspector assess the pipes. More information about this technology is provided by Müller and Fischer (2007).

Figure 6.4: (a) PANORAMO robot, (b) inspection of a single point from two different positions by the PANORAMO system (Müller and Fischer, 2007).

6.3 Automatic Detection and Classification of Defects in Sewer Pipeline Systems

Xu et al. (1998) proposed a fully automated pipe joint inspection system using CCTV videotapes. The procedure involves edge detection, thresholding, thinning, cleaning, and profile analysis using the Fourier transform. Although they were unable to achieve a fully automated inspection system, they developed and demonstrated the key principles required.

Sinha et al. (1999) developed an automated underground pipe inspection system to interpret the SSET scans. This system uses image processing, segmentation, feature extraction, pattern recognition, and a neuro-fuzzy network for classification. Fuzzy membership functions are used to fuzzify the input features before feeding them into the neural network. This system is designed to classify each pixel of an image into one of the following classes: background, crack, hole, joint, and lateral. Fisher's linear discriminant is used to enhance the image's contrast as a preprocessing step. Morphological opening (with various circular and rectangular structuring elements) and thresholding are used to segment objects of interest from the image background. The optimum size of the structuring elements is determined to distinguish and segment various defects. The results of the Yakimovsky edge detector (Yakimovsky, 1976) and a ratio edge detector (Touzi et al., 1988) are fused to extract cracks. Both of these crack detectors are directional. The extracted results are post-processed by cleaning and linking operations to segment cracks.
Twelve features, including the area, the number of objects, the major axis length, the minor axis length, and the mean and variance of the pixels projected in the 0°, 45°, 90°, and 135° directions, are assigned to each segmented crack or hole in the image. These features are used to classify the segmented objects as transverse cracks, longitudinal cracks, diagonal cracks, multiple cracks, mushroom cracks, minor holes, or major holes. The area, number of objects, elongation (ratio of the major and minor axis lengths), extent (ratio of the net area to the bounding rectangular area), and mean and variance of the pixels projected in the 0° and 90° directions are used as features to classify each segmented joint into three classes: perfect joints, eroded joints, or misaligned joints. The area, number of objects, roundness (calculated as 4*area/(π*length²)), form factor (calculated as 4π*area/perimeter²), and aspect ratio are used as features to classify segmented laterals into perfect laterals, eroded laterals, or collapsed laterals (Fieguth and Sinha, 1999, Sinha, 2000, Sinha and Fieguth, 2006a,b,c, Sinha et al., 2003, Sinha and Karray, 2002, Sinha et al., 1999).

Moselhi and Shehab-Eldeen (1999) proposed a neural network-based system that classifies cracks, infiltration, deposits, cross-sectional reductions, and misalignments in CCTV image frames. Three independent neural networks are proposed for each of the defect classes mentioned above. In fact, using multiple neural networks for each defect class is similar to using multiple inspection experts: if two of the neural networks do not agree (i.e., do not produce similar classification outputs), the output of the third network is used to confirm one of the other two's classifications. In order to extract cracks, background subtraction, edge detection, dilation, thresholding, and analysis are carried out sequentially. For the case of infiltration extraction, dilation, background subtraction, thresholding, segmentation, and analysis are performed sequentially on the image. For the rest of the defects, a similar procedure is used except that the image is first inverted. Several feedforward neural networks are developed with different numbers of neurons in the input and hidden layers. The parameters of the neural networks are optimized by trial and error (Moselhi and Shehab-Eldeen, 1999, 2000, Shehab-Eldeen, 2001). In this system, each frame is treated independently from the other frames, and the sequential nature of the video streams, which could help improve the overall performance of the system, is not taken into consideration.

Abraham et al. (2000) and Chae and Abraham (2001) developed a neuro-fuzzy system for the autonomous sewer pipeline condition assessment of SSET scans. The images are preprocessed by converting them to gray scale, applying edge detection techniques, and thresholding. Three separate neural networks are used for joints, laterals, and cracks. For the segmented joint images and lateral images, which are binary images, the whole image's pixel values are fed into the neural networks. Consequently, the neural networks used to extract the attributes of the joints and laterals (such as the number of joints, width of the joints, existence of cracks around the joints, number of laterals, shape factors of laterals, and size of the laterals) have 39,750 (159 × 250) input neurons. The hidden layers of these feedforward networks consist of five neurons. The output of the neural network depends on the defined attributes of interest (as mentioned above).
Since the background noise and cracks have similar intensity values, the preprocessing technique mentioned above is not appropriate for crack segmentation. Instead, feature extraction using edge detection techniques is used to overcome the complexity of the problem. Fifty-nine features are extracted to classify the object of interest as possible noise, a joint, or a crack. Finally, a fuzzy logic system is used to fuse all the outputs of these neural networks in order to identify, classify, and rate the pipe's defects (Abraham et al., 2000, Chae, 2001, Chae and Abraham, 2001).

None of the above systems are used by municipalities to automatically detect defects in sewer pipe systems. The SSET system is not used mainly because of its high costs and its low rate of data acquisition.

6.4 Summary of State-of-the-Art

Approximately 800,000 miles of underground sewer pipelines exist in the United States. Since sewer pipelines are mainly buried underground, their conditions cannot be easily assessed. Insufficient inspection and inadequate maintenance are generally the major causes of the poor condition of sewer pipes in North America. Several sewer pipeline inspection systems, including CCTV, KARO, MAKRO, PIRAT, SSET, and PANORAMO, are introduced in this chapter. Attempts to develop a system that autonomously assesses sewer pipeline condition by Xu et al. (1998), Sinha et al. (1999), Moselhi and Shehab-Eldeen (1999), and Abraham et al. (2000) are reviewed. None of the developed systems mentioned in this study are currently used by municipalities in the United States. The majority of these developed systems are based on high-resolution SSET scans and are not suitable for CCTV surveys.

Although tedious and costly, the CCTV survey is the most widely used method for condition assessment of sewer pipelines in the United States. Inspection processes based on CCTV are done manually by a human technician who must inspect thousands of miles of videotaped pipes to detect defects and prepare an assessment report. This inspection technique is subjective to an inspector's level of experience, concentration, and fatigue.

An autonomous condition assessment of sewer pipes based on CCTV surveys can overcome the above shortcomings. The system should consist of image processing, segmentation, feature extraction, classification, and interpretation modules, and it should utilize the sequential nature of the video frames of the CCTV surveys. The use of neuro-fuzzy systems for classification purposes is a promising approach, according to the performance of the systems proposed in the former studies.

6.5 Proposed Study

We propose the development of an autonomous system that interprets CCTV surveys. Our goal is to provide inspectors with a tool that assesses the condition of sewer pipelines and gives information about the video frames captured by the CCTV camera. In this way, the inspector is not required to watch the entire CCTV video; rather, he can directly check the video intervals that the proposed system suggests. In order to reach this goal, we propose to utilize the sequential nature of the video frames and to read the odometer value at each frame. This has never been attempted in previous studies, but it provides valuable information to inspectors. By reading the odometer, the relative location of the robot with respect to the starting manhole is known. The CCTV robot, which is controlled by an operator, stops at each lateral or defect (e.g., crack, collapsed lateral, etc.)
to take a closer look by rotating the camera head. Then, the odometer value is read from the CCTV video screen, and the speed of the robot is computed. Wherever the camera stops, there is a high probability that a lateral or a defect exists. Note that the frame rate of CCTV cameras is usually 30 frames per second, so the pauses of the robot can be readily detected from consecutive odometer readings; this is one of the advantages of reading the odometer value on the video screen. Furthermore, this value can be used to locate the joints, laterals, and other defects for the output report.

Figure 6.5 (a) shows a typical CCTV snapshot. The odometer value is printed on the bottom right-hand side of this figure. Based on testing several algorithms, we concluded that using the opening morphological operator (described in Section 4.4.1) and thresholding will help find the coordinates of the odometer (Fig. 6.5 (b)). By searching for the first black pixel in the bottom right-hand side of Fig. 6.5 (b), the odometer value is found. Since the black area containing the odometer consists of a fixed number of pixels, the odometer box can be segmented from the rest of the scene, as shown in Fig. 6.5 (c). By knowing the number of pixels for each digit, it is possible to segment the individual digits. Each digit in Figs. 6.5 (a) and (c) is shown by a 7 × 5 pixel block. Figures 6.5 (d) and (e) show the segmented digit 5 and its intensity values, respectively. We propose to use these 35 pixel values for each digit block as the feature vector of that block and to classify each block as a number between 0 and 9. In this way, the distance of each frame with respect to the starting point is computed. We propose using the nearest-neighbor classifier. Our initial evaluation shows that, if the quality of the video is reasonable, the correct rate of the odometer reading with the above technique is more than 90%. We found the performance of the nearest-neighbor classifier to be better than that of the neural network in this case; however, we intend to improve the performance of the distance-reading algorithm by comparing each read value with the values read from the neighboring frames. This will help avoid unreasonable jumps in the distance readings.

Figure 6.5: (a) A typical snapshot of the CCTV video, (b) opening and thresholding of image (a), (c) gray scale segmented odometer area, (d) the segmented digit 5, (e) intensity values of image (d).

Joints are important components of sewer pipelines because the majority of cracks and roots exist at the joints. Furthermore, misalignments and infiltrations take place at the joints. For these reasons, the detection of joints is an important task. In CCTV videos, joints usually have brighter intensities than the background, which is useful for joint extraction. We have evaluated several algorithms to extract joints. The algorithms that performed the best were the subtraction of the eroded image from the original image, the morphological top-hat operation (Eqn. (4.29)), and the morphological operation shown in Eqn. (4.31). The latter is the most robust and needs the least postprocessing and cleaning (as in Fig. 6.6 (d)). Figure 6.6 shows a joint extracted using the morphological algorithms mentioned above. The structuring element used in this figure is a 15 × 15 square. No preprocessing or postprocessing is applied.
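As a minimal sketch of the first two joint-extraction operators described above (subtracting the eroded image from the original, and the white top-hat), the following could be used; the operation of Eqn. (4.31) is defined earlier in the dissertation and is not reproduced here, and the 15 × 15 structuring element and the threshold value are illustrative assumptions (SciPy assumed; the names are hypothetical):

    import numpy as np
    from scipy import ndimage

    def extract_bright_bands(gray, se_size=15, threshold=30):
        # gray: grayscale CCTV frame as a 2-D array; joints appear brighter than the wall
        gray = gray.astype(float)
        eroded_diff = gray - ndimage.grey_erosion(gray, size=(se_size, se_size))
        tophat = ndimage.white_tophat(gray, size=(se_size, se_size))
        # Fuse the two responses and threshold to obtain a binary joint-candidate map
        response = np.maximum(eroded_diff, tophat)
        return response > threshold

The resulting binary map would then feed the feature extraction and classification steps discussed next (arc shape, width-to-length ratio, and centroid coordinates).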
Note that the colors of Figs. 6.6 (b), (c), and (d) are thresholded and inverted. Once the joint has been extracted, feature extraction and classification are performed to distinguish joints from non-joint objects. The extracted object's arc shape, small width-to-length ratio, and centroid coordinates are appropriate features for a joint detection procedure. The structuring element's direct effect on joint extraction should be considered by optimizing its shape and size. Fusing these three morphological techniques is an appropriate strategy for improving the performance of the joint extractor module. In addition, using an appropriate preprocessing technique will increase the contrast between the joints and the background. Figure 6.7 (a) shows a concrete sewer pipe, and Fig. 6.7 (b) shows an extracted joint using the morphological operation described in Eqn. (4.31).

Figure 6.6: (a) A joint on a Vitrified Clay Pipe (VCP) captured on a CCTV video frame, (b) extraction of the joint by subtracting the eroded image from the original image, (c) extraction of the joint by applying the morphological top-hat operation on the original image, (d) extraction of the joint by applying the morphological operation described in Eqn. (4.31) on the original image. The structuring element used is a 15 × 15 square. Note that the colors of images (b), (c), and (d) are thresholded and inverted.

Figure 6.7: (a) The joint of a concrete pipe captured on a CCTV video frame, (b) extraction of the joint by applying the morphological operation described in Eqn. (4.31) on the original image. The results of using four structuring elements are fused. Vertical, horizontal, and two diagonal elements of length 15 and width 1 are used as the structuring elements. Note that the color of image (b) is thresholded and inverted.

When the CCTV camera reaches a lateral, the operator has to stop the vehicle and rotate the camera's head to closely view the inside of the lateral. The extraction of laterals helps the system recognize that the vehicle's pause is due to the closer inspection of a lateral. It is also useful for reconstructing a pipeline map and locating the laterals. Usually, laterals have darker intensities than the pipe's walls; this is due to the shadows created by the camera's light. Based on our evaluation of several combinations of morphological operators, it is concluded that the bottom-hat operation (Eqn. (4.30)) provides the best performance. Figure 6.8 shows the result of applying this operator on a CCTV video frame that contains a lateral. Similar to the case of joints, the arc shape, small width-to-length ratio, and centroid coordinates of the extracted object are appropriate features for a lateral detection procedure. Once again, the structuring element's size and shape must be optimized due to their direct impact on the lateral extraction process.

Figure 6.8: (a) A lateral on a concrete pipe captured on a CCTV video frame, (b) extraction of the lateral by applying the morphological bottom-hat operation on the original image, (c) postprocessing by applying a 15 × 15 median filter. The results of using two structuring elements are fused. Vertical and horizontal elements of length 35 and width 1 are used as the structuring elements. Note that the colors of images (b) and (c) are thresholded and inverted.

Cracks are crucial defects of sewer pipelines. They are generally formed due to overloading, improper installation, or a change in the soil around the pipe (e.g., settlement of the soil). Cracks can cause infiltration and root penetration into the pipe, which may cause blockage of the pipe. After evaluating different algorithms, we concluded that the morphological bottom-hat operation segments cracks better than other techniques.
Figure 6.9 (a) shows a typical crack, Fig. 6.9 (b) shows the result of applying the morphological bottom-hat operation on the original image using a 3 × 35 structuring element, Fig. 6.9 (c) shows the postprocessed (using a 10 × 10 median filter) and thresholded image, and Fig. 6.9 (d) shows the result of applying the morphological thinning operation.

Figure 6.9: (a) A crack on a sewer pipe captured on a CCTV video frame, (b) the result of applying the morphological bottom-hat operation on the original image using a 3 × 35 structuring element, (c) postprocessing (using a 10 × 10 median filter) and thresholding of image (b), (d) the result of applying the morphological thinning operation. Note that the colors of images (c) and (d) are inverted.

Based on the above results, we propose using morphological operations to extract the defects. Due to limited access to CCTV video files, we could not evaluate and test the extraction algorithms on other types of defects (e.g., roots and infiltration). Selecting an appropriate preprocessing technique will improve the performance of the proposed system. The use of neuro-fuzzy systems is ideal for classification purposes. Fuzzy logic plays an important role in this study in order to fuzzily describe the pipe's condition. Since some features (e.g., area) have relatively large variations, the use of membership functions will help improve the classifier's performance.

Chapter 7
Conclusions and Future Work

Among the possible techniques for inspecting civil infrastructure systems, the use of optical instrumentation that relies on image processing and computer vision is a less time-consuming and less expensive alternative to current monitoring methods. This study provides an evaluation of some of the promising vision-based approaches for automatic condition assessment of structures. New vision-based inspection methodologies are developed, and several examples are presented to illustrate the utility, as well as the limitations, of the leading approaches.

Visual inspection is the predominant method for bridge inspections. The visual inspection of structures is a subjective measure that relies heavily on the inspector's experience and focus (attention to detail). Furthermore, inspectors who are not afraid of heights and are comfortable working at height on a lift spend more time on their inspections and are more likely to locate defects. Difficulties in accessing some parts of a bridge adversely affect the transmission of knowledge and experience from one inspector to other inspectors. The integration of visual inspection results and optical instrumentation measurements gives the inspector the chance to inspect the structure remotely by controlling cameras at the bridge site. This approach resolves the above difficulties and avoids the costs of traffic detouring during the inspection. Cameras can be appropriately mounted on the structure. Although the cameras are constrained in translation (i.e., attached to a fixed location), they can rotate in two directions.
The inspector thus has the appropriate tools to inspect different parts of the structure from different views. As part of this study, a vision-based inspection tool is developed that gives the inspector the ability to compare the current situation of the structure with the results of previous inspections. In order to reach this goal, a database of images captured by a camera is constructed automatically. When the inspector notices a defect in the current view, he can request the reconstruction of the same view from the images captured previously. In this way, the inspector can track the evolution of the defect through time. The correction of radial distortion is not considered in this study; radial distortion can be modeled using low-order polynomials. Furthermore, selection of the blending weights based on the sharpness of the captured images is of more interest for inspection purposes, whereas in this study, the closeness of a given pixel to the center of the selected images is used to assign the blending weights. Eventually, implementing all of the discussed algorithms in a computer language such as C or C++ will dramatically decrease the computation time and will hasten the online usage of the proposed system. In this study, an exact k-nearest neighbors approach is used to initially match the keypoints; the use of a k-d tree to find approximate nearest neighbors would significantly improve the speed of the proposed system (Beis and Lowe, 1997).

Another part of this study introduces a novel crack detection and quantification procedure. First, images of a scene are captured from different views. By solving the SfM problem, the sparse structure of the scene, as well as the camera position, orientation, and internal parameters for each view, are determined. By scaling the reconstructed sparse 3D model of the scene, the depth perception of the scene is obtained. A morphological crack segmentation operator is introduced to extract crack-like patterns. The structuring element parameter for this operator is automatically adjusted based on the camera focal length, object-camera distance, camera resolution, camera sensor size, and the desired crack thickness. Appropriate features are extracted and selected for each segmented pattern using the LDA approach. The performances of a NN, an SVM, and a nearest-neighbor classifier are evaluated to classify crack from non-crack patterns. A multi-scale crack map is obtained to represent the detected cracks as well as to quantify the crack thicknesses.

In order to quantify a crack thickness, the thickness orientation at each crack centerline pixel is determined. Then, for each centerline pixel, the pixels in the original crack map that are aligned with the corresponding thickness orientation are counted in the horizontal and vertical directions. The hypotenuse length, computed based on these two values, is considered as the crack thickness. In order to obtain more realistic results, an algorithm is proposed to compensate for the perspective errors. Validation tests were performed to evaluate the capabilities, as well as the limitations, of the discussed methodology. An example of real concrete cracks was also presented to illustrate the performance of the proposed system in the presence of noise. This system is appropriate for incorporation with autonomous or semi-autonomous robotic systems. The integration of the above crack detection and quantification approaches with the Microsoft Kinect is one of the future tasks.
Furthermore, as part of the future work, it is desirable to carry out extensive research to optimize the crack detection and quantification parameters using real-life data.

Another part of this study incorporates depth perception with texture and color analysis to improve the performance of current vision-based corrosion detection approaches. Multi-resolution wavelet analysis is used to characterize the appropriate features for texture classification. The traditional approach to texture analysis is compared with three proposed approaches that are based on the depth information. In traditional wavelet analysis for corrosion detection, an image is divided into sub-images of fixed size (e.g., 8 × 8 pixels). In this study, though, it is shown that if a variable sub-image window that represents a fixed area on the object under inspection is used as the datum, the classification performance for corrosion detection increases dramatically. Through several experimental results, it is shown that incorporating depth perception can significantly improve the performance of a corrosion detection system. Several experiments are carried out, and the results are presented to evaluate the performance of the different proposed scenarios. Moreover, the corroded regions can be quantified to represent the severity of the defective regions. This system is also well suited for incorporation with mobile robotic systems such as UAVs.

Although tedious and costly, the CCTV survey is the most widely used method for condition assessment of sewer pipelines in the United States. Inspection processes based on CCTV are carried out manually by a human technician who must inspect thousands of miles of videotaped pipes to detect defects and prepare an assessment report. This inspection technique is therefore subject to the inspector's level of experience, concentration, and fatigue.

A preliminary study of an autonomous CCTV interpretation system for condition assessment of sewer pipelines is conducted. The preliminary results show that morphological operations are appropriate approaches for extracting joints, laterals, and cracks in CCTV surveys. Selecting an appropriate preprocessing technique will improve the performance of the proposed system. The use of neuro-fuzzy systems is ideal for classification purposes. Fuzzy logic plays an important role in this study, as it allows the pipe's condition to be described in fuzzy terms. Since some features (e.g., area) exhibit relatively large variations, the use of membership functions will help improve the classifier's performance. More research needs to be conducted in this area to develop a robust CCTV interpretation system.

The above studies, which are based on image processing and computer vision, are promising nondestructive testing methods for structural health monitoring that complement sensor-based approaches. This dissertation demonstrates the indisputable capabilities of vision-based approaches for condition assessment of civil infrastructure systems.

Bibliography

Abdel-Qader, I., Abudayyeh, O., and Kelly, M. E. 2003. Analysis of edge-detection techniques for crack identification in bridges. Journal of Computing in Civil Engineering, 17(4):255–263.

Abdel-Qader, I., Pashaie-Rad, S., Abudayyeh, O., and Yehia, S. 2006. PCA-based algorithm for unsupervised bridge crack detection. Advances in Engineering Software, 37(12):771–778.

Abdou, I. 1973. Quantitative methods of edge detection.
Technical Report USCIPI Report 830, Image Processing Institute, University of Southern California, Los Angeles, CA. Abdou, I. E. and Pratt, W. K. 1979. Quantitative design and evaluation of enhance- ment/thresholding edge detectors. Proceedings of IEEE, 67(5):753–763. Abraham, D. M., Chae, M. J., and Gokhale, S. 2000. Utilizing neural networks for condition assessment of sanitary sewer infrastructure. Proceedings of the 17th IAARC/CIB/IEEE/IFR In- ternational Symposium on Automation and Robotics in Construction, pages 423–427. Abtoine, J.-P., Murenzi, R., Vandergheynst, P., and Ali, S. T. 2004. Two-Dimensional Wavelets and Their Relatives. Cambridge University Press, United Kingdom. ISBN 0-521-62406-1. Achler, O. and Trivedi, M. M. 2004. Camera based vehicle detection, tracking, and wheel base- line estimation approach. IEEE Conference on Intelligent Transportation Systems, Proceedings, ITSC, pages 743–748. Al-Otum, H. M. 2003. Morphological operators for color image processing based on Maha- lanobis distance measure. Optical Engineering, 42(9):2595–2606. Alageel, K. and Abdel-Qader, I. 2002. Harr transform use in image processing. Technical report, Department of Electrical and Computer Engineering, Western Michigan University, Kalamazoo, Michigan. ASCE 2005. Report Card for America’s Infrastructure. Technical report, American So- ciety of Civil Engineers. URL:<http://www.asce.org/files/pdf/reportcard/2005 Report Card- Full Report.pdf>. ASCE 2006. Report Card for California’s Infrastruc- ture. Technical report, American Society of Civil Engineers. URL:<http://www.ascecareportcard.org/Citizen Guides/2006 citizens guide.pdf>. Bachmann, G., Narici, L., and Beckenstein, E. 2000. Fourier and Wavelet Analysis. Springer, New York, NY . 174 Beis, J. S. and Lowe, D. G. 1997. Shape indexing using approximate nearest-neighbour search in high-dimensional spaces. Conference on Computer Vision and Pattern Recognition, pages 1000–1006. Puerto Rico. Benning, W., G¨ ortz, S., Lange, J., Schwermann, R., and R.Chudoba 2003. Development of an algorithm for automatic analysis of deformation of reinforced concrete structures using pho- togrammetry. VDI Berichte, (1757):411–418. Bosc, M., Heitz, F., Armspach, J.-P., Namer, I., Gounot, D., and Rumbachc, L. 2003. Automatic change detection in multimodal serial MRI: application to multiple sclerosis lesion evolution. NeuroImage, 20:643–656. Bouguet, J. Y . 2008. Camera calibration toolbox for matlab. http://www.vision.caltech.edu/bouguetj/calib doc/index.html. Brown, M. and Lowe, D. 2007. Automatic panoramic image stitching using invariant features. International Journal of Computer Vision, 74(1):59–73. Brown, M. and Lowe, D. G. 2003. Recognising panoramas. In Proceedings of the 9th Interna- tional Conference on Computer Vision (ICCV2003), pages 1218–1225. Nice, France. Brown, M. A. 2005. Mult-image Matching using Invatiant Features. PhD thesis, The University of British Columbia, Vancouver, British Columbia, Canada. Bruzzone, L. and Serpico, S. B. 1997. An iterative technique for the detection of land-cover transitions in multitemporal remote-sensing images. IEEE Transactions on Geoscience and Remote Sensing, 35(4):858–867. Buchsbaum, W. H. 1968. Color TV Servicing. Prentice Hall Press, Englewood Cliffs, 2nd edition. ISBN 0-13-152389-9. Burt, P. J. and Adelson, E. H. 1983. A multiresolution spline with application to image mosaics. ACM Transactions on Graphics, 2(4):217–236. Canny, J. 1986. A computational approach to edge detection. 
IEEE Transactions on Pattern Analysis and Machine Intelligence, PAMI-8(6):679–698. Chae, M. J. 2001. Automated interpretation and assessment of sewer pipeline. PhD thesis, Purdue University. Chae, M. J. and Abraham, D. M. 2001. Neuro-fuzzy approaches for sanitary sewer pipeline condition assessment. Journal of Computing in Civil Engineering, 15(1):4–14. Chang, P. C., Flatau, A., and Liu, S. C. 2003. Review paper: Health monitoring of civil infras- tructure. Structural Health Monitoring, 2(3):257–267. Chang, S. G., Yu, B., and Vetterli, M. 2000. Spatially adaptive wavelet thresholding with context modeling for image denoising. IEEE Transactions on Image Processing, 9(9):1522–1531. 175 Chen, L.-C., Shao, Y .-C., Jan, H.-H., Huang, C.-W., and Tien, Y .-M. 2006. Measuring system for cracks in concrete using multitemporal images. Journal of Surveying Engineering, 132(2):77– 82. Cheng, H. D., Jiang, X. H., Sun, Y ., and Wang, J. 2001. Color image segmentation: Advances and prospects. Pattern Recognition, 34(12):2259–2281. Choi, K. and Kim, S. 2005. Morphological analysis and classification of types of surface corro- sion damage by digital image processing. Corrosion Science, 47(1):1–15. Chong, K. P., Carino, N. J., and Washer, G. 2003. Health monitoring of civil infrastructures. Smart Materials and Structures, 12(3):483–493. Choset, H. and Henning, W. 1999. A follow-the-leader approach to serpentine robot motion planning. Journal of Aerospace Engineering, 12(2):65–73. Chung, H.-C., Liangand, J., Kushiyama, S., and Shinozuka, M. 2004. Digital image processing for non-linear system identification. International Journal of Non-Linear Mechanics, 39:691– 707. Civil Engineering Research Foundation (CERF) 2001. Evaluation of SSET: The Sewer Scanner and Evaluation Technology. ASCE Publications, Washington, DC 20037-1810. CERF Report No. 40551, ISBN 0-7844-0551-4. Coffey, J. M. 1988. Non-destructive testing–the technology of measuring defects. CEGB Re- search, (21):36–47. Collins, R. T., Lipton, A. J., and Kanade, T. 2000. Introduction to the special section on video surveillance. IEEE Trans. Pattern Anal. Mach. Intell., 22(8):745–746. Comer, M. L. and Delp, E. J. 1999. Morphological operations for color image processing. Journal of Electronic Imaging, 8(3):279–289. Coppin, P. R. and Bauer, M. E. 1996. Digital change detection in forest ecosystems with remote sensing imagery. Remote Sensing Reviews, 13(3-4):207–234. Daubechies, I. 1992. Ten Lectures on Wavelets. SIAM, Philadelphia. ISBN 0-89871-274-2. Davis, L. S. 1975. A survey of edge detection techniques. Computer Graphics and Image Processing, 4:248–270. Deer, P. and Eklund, P. 2002. Values for the fuzzy c-means classifier in change detection for remot sensing. Porceedings of the 9th International Conference on Information Processing and Management of Uncertainity, pages 187–194. DeV ore, R. A., Jawerth, B., and Lucier, B. J. 1992. Image compression through wavelet trans- form coding. IEEE Transactions on Information Theory, 38(2):719–746. Duda, R. O. and Hart, P. E. 1973. Pattern Classification and Scene Analysis. John Wiley & Sons, New York. ISBN 0-471-22361-1. 176 Duda, R. O., Hart, P. E., and Stork, D. G. 2001. Pattern classification. Wiley, New York, second edition. Dudziak, M. J., Chervonenkis, A. Y ., and Chinarov, V . 1999. Nondestructive evaluation for crack , corrosion, and stress detection for metal assemblies and structures. 
Proceedings of SPIE - Nondestructive Evaluation of Aging Aircraft, Airports, and Aerospace Hardware III, 3586:20– 31. Dumskyj, M. J., Aldington, S. J., Dore, C. J., and Kohner, E. M. 1996. The accurate assessment of changes in retinal vessel diameter using multiple frame electrocardiograph synchronised fun- dus photography. Curr Eye Res, 15(6):625–632. Duran, O., Althoefer, K., and Seneviratne, L. D. 2003. Pipe inspection using a laser-based trans- ducer and automated analysis techniques. IEEE/ASME Transactions on Mechatronics, 8(3):401– 409. Duran, O., Althoefer, K., and Seneviratne, L. D. 2007. Automated pipe defect detection and cat- egorization using camera/laser-based profiler and artificial neural network. IEEE Transactions on Automation Science and Engineering, 4(1):118–126. Edgington, D. R., Salamy, K. A., Risi, M., Sherlock, R. E., Walther, D., and Koch, C. 2003. Automated event detection in underwater video. Oceans Conference Record (IEEE), 5:2749– 2753. EPA 2004. Report to congress on impacts and control of Combined Sewer Overflows (CSOs) and Sanitary Sewer Overflows (SSOs). Technical report, U.S. Environmental Protection Agency. URL:<http://cfpub.epa.gov/npdes/cso/cpolicy report2004.cfm>. Faugeras, O. 1993. Three-dimensional computer vision: a geometric view point. The MIT Press, Cambridge, MA. ISBN-10:0-262-06158-9, ISBN-13:978-0-262-06158-1. Fieguth, P. W. and Sinha, S. K. 1999. Automated analysis and detection of cracks in underground scanned pipes. IEEE International Conference on Image Processing, 4(19):395–399. Fischler, M. A. and Bolles, R. C. 1981. Random sample consensus: a paradigm for model fitting with applications to image analysis and automated cartography. Commun. ACM, 24(6):381–395. Fisher, R. A. 1936. The use of multiple measurements in taxonomic problems. Annals of Eugenics, 7:179–188. Frei, W. and Chen, C. 1977. Fast boundary detection: A generalization and a new algorithm. IEEE Transactions on Computers, C-26(10):988–998. Fujita, Y . and Hamamoto, Y . 2009. A robust method for automatically detecting cracks on noisy concrete surfaces. Next-Generation Applied Intelligence. Twenty-second International Conference on Industrial, Engineering and Other Applications of Applied Intelligent Systems IEA/AIE 2009, pages 76–85. Tainan, Taiwan. Garcia-Alegre, M. C., Ribeiro, A., Guinea, D., and Cristobal, G. 2000. Eggshell defects de- tection based on color processing. Machine Vision Applications in Industrial Inspection VIII, 3966:280–287. 177 Giakoumis, I., Nikolaidis, N., and Pitas, I. 2006. Digital image processing techniques for the detection and removal of cracks in digitized paintings. IEEE Transactions on Image Processing, 15(1):178–188. Gokhale, S., Abraham, D. M., and Iseley, T. 1997. Intelligent sewer condition evaluation tech- nologiesan analysis of three promising options. North American No-Dig Conference, pages 254–265. Goldin, S. E. and Rudahl, K. T. 1986. Tutorial image processing system: A tool for remote sensing training. International Journal of Remote Sensing, 7(10):1341–1348. Gonzalez, R. C. and Wintz, P. 1987. Digital Image Processing. Addison-Wesley, Boston, MA, second edition. ISBN 0-201-11026-1. Gonzalez, R. C. and Woods, R. E. 1992. Digital Image Processing. Addison-Wesley, Boston, MA. ISBN 0-201-50803-6. Gonzalez, R. C., Woods, R. E., and Eddins, S. L. 2004. Digital Image Processing Using MAT- LAB. Prentice Hall, Upper Saddle River, NJ. ISBN 0-130-08519-7. Gonz´ alez-Aguilera, D. and G´ omez-Lahoz, J. 2009. 
Dimensional analysis of bridges from a single image. Journal of Computing in Civil Engineering, 23(6):319–329. Graybeal, B. A., Phares, B. M., Rolander, D. D., Moore, M., and Washer, G. 2002. Visual inspection of highway bridges. Journal of Nondestructive Evaluation, 21(3):67–83. Gunatilake, P., Siegel, M. W., Jordan, A. G., and Podnar, G. W. 1997. Image understanding algo- rithms for remote visual inspection of aircraft surfaces. Proceedings of SPIE - The International Society for Optical Engineering, 3029:2–13. Hanji, T., Tateishi, K., and Kitagawa, K. 2003. 3-D shape measurement of corroded surface by using digital stereography. Structural health monitoring and intelligent infrastructure: pro- ceedings of the First International Conference on Structural Health Monitoring and Intelligent Infrastructure, 1:699–704. Harris, C. and Stephens, M. 1988. A combined corner and edge detector. Alvey Vision Confer- ence, pages 147–152. Plessey Research Roke Manor, United Kingdom. Hart, J. C., Francis, G. K., and Kauffman, L. H. 1994. Visualizing quaternion rotation. ACM Transactions on Graphics, 13(3):256 – 276. Hartley, R. and Zisserman, A. 2000. Multiple View Geometry in Computer Vision. Cambridge University Press. ISBN 0-521-62304-9. Hashemi, B. and Najafi, M. 2007. U.S. sanitary and storm sewer systems survey. Trenchless Tecnology, pages 34–38. URL: <http://www.trenchlessonline.com/inc/data/archives/2007-10- 01.pdf>. Henry, R. and Luxmoore, A. R. 1996. A pipe-profiling adapter for cctv inspection cameras: development of a pipe-profiling instrument. Measurement Science and Technology, 7(4):495– 504. 178 Henstock, P. V . and Chelberg, D. M. 1996. Automatic gradient threshold determination for edge detection. IEEE Transactions on Image Processing, 5(5):784–787. Ho, S. K., White, R. M., and Lucas, J. 1990. A vision system for automated crack detection in welds. Measurement Science and Technology, 1(3):287–294. Hogg, D. C. 1993. Shape in machine vision. Image and Vision Computing, 11(6):309–316. Horn, B. K. P. 1986. Robot Vision. The MIT Press, Cambridge, MA. ISBN 0-07-030349-5. Huertas, A. and Nevatia, R. 2000. Detecting changes in aerial views of man-made structures. Image and Vision Computing, 18:583–596. Jahanshahi, M. R., Kelly, J. S., Masri, S. F., and Sukhatme, G. S. 2009. A survey and evalua- tion of promising approaches for automatic image-based defect detection of bridge structures. Structure and Infrastructure Engineering, 5(6):455–486. DOI: 10.1080/15732470801945930. Jang, J.-S. R. 1993. Anfis: adaptive-network-based fuzzy inference system. IEEE Transactions on Systems, Man, and Cybernetics, 23(3):665–685. Jang, J.-S. R., Sun, C.-T., and Mizutani, E. 1997. Neuro-Fuzzy and Soft Computing: A Com- putational Approach to Learning and Machine Intelligence. Prentice Hall, Upper Saddle River, NJ. ISBN 0-13-261066-3. Kaseko, M. S., Lo, Z.-P., and Ritchie, S. G. 1994. Comparison of traditional and neural classifiers for pavement-crack detection. Journal of Transportation Engineering, 120(4):552–569. Kim, C., Kim, H., and Ju, Y . 2009. Bridge construction progress monitoring using image anal- ysis. In Proceedings of the 26th International Symposium on Automation and Robotics in Con- struction (ISARC 2009), pages 101–104. Austin, Texas, U.S. Kirkham, R., Kearney, P. D., Rogers, K. J., and Mashford, J. 2000. Pirata system for quantitative sewer pipe assessment. The International Journal of Robotics Research, 19(11):1033–1053. Kumar, S. and Taheri, F. 2007. 
Neuro-fuzzy approaches for pipeline condition assessment. Nondestructive Testing and Evaluation, 22(1):35–60. Kuntze, H.-B. and Haffner, H. 1998. Experiences with the development of a robot for smart multisensoric pipe inspection. Proceedings of IEEE International Conference on Robotics and Automation, 2:1773–1778. Lebart, K., Trucco, E., and Lane, D. 2000. Real-time automatic sea-floor change detection from video. Oceans Conference Record (IEEE), 2:1337–1343. Lee, K. H. 2005. First course on fuzzy theory and applications. Springer-Verlag, Berlin, Ger- many. ISBN 3-540-22988-4. Lemieux, L., Wieshmann, U. C., Moran, N. F., Fish, D. R., and Shorvon, S. D. 1998. The detection and significance of subtle changes in mixed-signal brain lesions by serial mri scan matching and spatial normalization. Medical Image Analysis, 2(3):227–242. 179 Lourakis, M. and Argyros, A. 2004. The design and implementation of a generic sparse bun- dle adjustment software package based on the levenberg-marquardt algorithm. Technical Re- port 340, Institute of Computer Science - FORTH, Heraklion, Crete, Greece. Available from http://www.ics.forth.gr/˜lourakis/sba. Lowe, D. G. 2004. Distinctive image features from scale-invariant keypoints. International Journal of Computer Vision, 60(2):91–110. Lu, D., Mausel, P., Brondizio, E., and Moran, E. 2004. Change detection techniques. Interna- tional Journal of Remote Sensing, 25(12):2365–2407. Mallat, S. and Zhong, S. 1992. Characterization of signals from multiscale edges. IEEE Trans- actions on Pattern Analysis and Machine Intelligence, 14(7):710–732. Mallat, S. G. 1989. Multifrequency channel decompositions of images and wavelet models. IEEE Transactions on Acoustics, Speech and Signal Processing, 37(12):2091–2110. Matheron, G. 1975. Random sets and integral geometry. Wiley, New York. McCrea, A., Chamberlain, D., and Navon, R. 2002. Automated inspection and restoration of steel bridges - a critical review of methods and enabling technologies. Automation in Construc- tion, 11(4):351–373. Meyer, F. 1986. Automatic screening of cytological specimens. Computer Vision, Graphics, and Image Processing, 35(3):356–369. Minkowski, H. 1903. V olumen und oberfl¨ ache. Mathematische Annalen, 57(4):447–495. Mizuno, Y ., Abe, M., Fujino, Y ., and Abe, M. 2001. Development of interactive support system for visual inspection of bridges. Proceedings of SPIE - The International Society for Optical Engineering, 4337:155–166. Moore, M., Phares, B., Graybeal, B., Rolander, D., and Washer, G. 2001. Re- liability of visual inspection for highway bridges, volume I: Final report. Techni- cal report, US Department of Transportation, Federal Highway Administration. URL: <http://www.tfhrc.gov/hnr20/nde/01020.htm>. Moravec, H. P. 1977. Towards automatic visual obstacle avoidance. Proc. 5th International Joint Conference on Artificial Intelligence, page 584. Moravec, H. P. 1979. Visual mapping by a robot rover. International Joint Conference on Artificial Intelligence, pages 598–600. Moselhi, O. and Shehab-Eldeen, T. 1999. Automated detection of surface defects in water and sewer pipes. Automation In Construction, 8(5):581–588. Moselhi, O. and Shehab-Eldeen, T. 2000. Classification of defects in sewer pipes using neural networks. Journal of Infrastructure Systems, 6(3):97–104. M¨ uller, K. and Fischer, B. 2007. Objective condition assessment of sewer systems. LESAM 2007 - 2nd Leading-Edge Conference on Strategic Asset Management. 180 Narasimhan, S. G. and Nayar, S. K. 2003. 
Contrast restoration of weather degraded images. IEEE Transactions on Pattern Analysis and Machine Intelligence (PAMI), 25(1):713–724. Nieniewski, M., Chmielewski, L., Jozwik, and Sklodowski 1999. Morphological detection and feature-based classification of cracked regions in ferrites. Machine GRAPHICS and VISION, 8(4):699–712. Otsu, N. 1979. A threshold selection method from gray-level histogrmas. IEEE Transactions on Systems, Man, and Cybernetics, 9(1):62–66. Pal, N. R. and Pal, S. K. 1993. A review of image segmentation techniques. Pattern Recognition, 26(9):1277–1294. Parks, D. and Gravel, J.-P. viewed 1 December 2005. Corner de- tectors. Center for Intelligent Machines, McGill University. URL: <http://www.cim.mcgill.ca/dparks/CornerDetector/index.htm>. Peng, L., Zhao, Z., Cui, L., and Wang, L. 2004. Remote sensing study based on irsa remote sensing image processing system. 2004 IEEE International Geoscience and Remote Sensing Symposium Proceedings: Science for Society: Exploring and Managing a Changing Planet. IGARSS 2004, 7:4829–4832. Pines, D. and Aktan, A. E. 2002. Status of structural health monitoring of long-span bridges in the united states. Progress in Structural Engineering and Materials, 4:372–380. Porter, R. and Canagarajah, N. 1996. A robust automatic clustering scheme for image segmen- tation with wavelets. IEEE Transactions on Image Processing, 5(4):662–665. Poudel, U. P., Fu, G., and Ye, J. 2005. Structural damage detection using digital video imaging technique and wavelet transformation. Journal of Sound and Vibration, 286(4-5):869–895. Poynton, C. A. 1996. A Technical Introduction to Digital Video. John Wiley & Sons. ISBN 047112253X. Prasad, L. and Iyenger, S. 1997. Wavelet Analysis with Applications to Image Procesing. CRC Press LLC, Boca Raton, FL. ISBN 0-849-33169-2. Pratt, W. K. 2001. Digital Image Processing. Wiley, New York, NY , third edition. ISBN 0-471- 37407-5. Prewitt, J. M. S. 1970. Object enhancement and extraction. New York. Academic Press. Radke, R. J., Andra, S., , Al-Kofahi, O., and Roysam, B. 2005. Image change detection algo- rithms: a systematic survey. IEEE Transactions on Image Processing, 14(3):294–307. Reed, T. R. and Dubuf, J. M. H. 1993. A review of recent texture segmentation and feature extraction techniques. CVGIP: Image Understanding, 57(3):359–372. Rey, D., Subsol, G., Delingette, H., and Ayache, N. 2002. Automatic detection and segmentation of evolving processes in 3d medical images: Application to multiple sclerosis. Medical Image Analysis, 6(2):163–179. 181 Reyna, S. M., Vanegas, J. A., and Khan, A. H. 1994. Construction technologies for sewer rehabilitation. Journal of Construction Engineering and Management, 120(3):467–487. Ringer, M. and Morris, R. D. 2001. Robust automatic feature detection and matching between multiple images. Technical report, Reaserach Institute for Advanced Computer Science. RIACS Technical Report 01.27. Roberts, L. G. 1965. Machine perception of three-dimensional solids. pages 159–197, Cam- bridge, MA. MIT Press. Rome, E., Hertzberg, J., Kirchner, F., Licht, U., and Christaller, T. 1999. Towards autonomous sewer robots: The makro project. Urban Water, 1(1):57–70. Rousseeuw, P. J. 1987. Robust Registration and Outlier Detection. John Wiley & Sons, New York. Salembier, P. 1990. Comparison of some morphological segmentation algorithms based on con- trast enhancement. application to automatic defect detection. Proceedings of the EUSIPCO-90 - Fifth European Signal Processing Conference, pages 833–836. Schlag, J. 
F., Sanderson, C., Neuman, C. P., and Wimberly, F. C. 1983. Implementation of automatic focusing algorithms for a computer vision system with camera. Technical report, Carnegie-Mellon University, Pittsburgh, PA 15213. Scott, G. and Longuet-Higgis, H. 1991. An algorithm for associating the features of two patterns. Proc. Royal Society London, B244:21–26. Serra, J. 1982. Image Analysis and Mathematical Morphology. Academic Press, London. Shehab-Eldeen, T. 2001. An automated system for detection, classification and rehabilitation of defects in sewer pipes. PhD thesis, Concordia University, Montreal, Quebec, Canada. Shinozuka, M. 2003. Homeland security and safety. First International Conference on Structural Health Monitoring and Intelligent Infrastructure, pages 1139–1145. Shinozuka, M., Chung, H.-C., Ichitsubo, M., and Liang, J. 2001. System identification by video image processing. Proceedings of SPIE - The International Society for Optical Engineering, 4330:97–107. Siegel, M. and Gunatilake, P. 1998. Remote enhanced visual inspection of aircraft by a mobile robot. IEEE Workshop on Emerging Technologies, Intelligent Measurement and Virtual Systems for Instrumentation and Measurement - ETIMVIS ’98. St. Paul, MN, USA. Singh, A. 1989. Digital change detection techniques using remotely-sensed data. International Journal of Remote Sensing, 10(6):989–1003. Sinha, S. K. 2000. Automated undergraound pipe inspection using a unified image processing and artificial intelligence methodology. PhD thesis, University of Waterloo, Waterloo, Ontario, Canada. 182 Sinha, S. K. and Fieguth, P. W. 2006a. Classification of underground pipe scanned images using feature extraction and neuro-fuzzy algorithm. Automation in Construction, 15(1):58–72. Sinha, S. K. and Fieguth, P. W. 2006b. Morphological segmentation and classification of under- ground pipe images. Machine Vision and Applications, 17(1):21–31. Sinha, S. K. and Fieguth, P. W. 2006c. Neuro-fuzzy network for the classification of buried pipe defects. Automation in Construction, 15(1):73–83. Sinha, S. K., Fieguth, P. W., and Polak, M. A. 2003. Computer vision techniques for automatic structural assessment of underground pipes. Computer-Aided Civil and Infrastructure Engineer- ing, 18(2):95–112. Sinha, S. K. and Karray, F. 2002. Classification of underground pipe scanned images using fea- ture extraction and neuro-fuzzy algorithm. IEEE Transactions on Neural Networks, 13(2):393– 401. Sinha, S. K., Karray, F., and Fieguth, P. W. 1999. Underground pipe cracks classification us- ing image analysis and neuro-fuzzy algorithm. Proceedings of the 1999 IEEE International Symposium on Intelligent Control Intelligent Systems and Semiotics, pages 399–404. Skarbek, W. and Koschen, A. 1994. Colour image segmentation: A survey. Technical report, Institute for Technical Informatics, Technical University of Berlin. Snavely, K. N. 2008. Scene Reconstruction and Visualization from Internet Photo Collections. PhD thesis, University of Washington, Seattle, Washington, USA. Snavely, N., Seitz, S. M., and Szeliski, R. 2006. Photo tourism: Exploring photo collections in 3d. In SIGGRAPH Conference Proceedings, pages 835–846, New York, NY , USA. ACM Press. Strang, G. and Nguyen, T. 1996. Wavelets and Filter Banks. Wellesley-Cambridge Press, Welles- ley, MA. Szeliski, R. 2006. Image alignment and stitching: A tutorial. Foundations and Trends R in Computer Graphics and Vision, 2(1):1–104. Thirion, J.-P. and Calmon, G. 1999. 
Deformation analysis to detect and quantify active le- sions inthree-dimensional medical image sequences. IEEE Transactions on Medical Imaging, 18(5):429–441. Torr, P. H. S. and Murray, D. W. 1997. The development and comparison of robust methods for estimating the fundamental matrix. International Journal of Computer Vision, 24(3):271–300. Touzi, R., Lopes, A., and Bousquet, P. 1988. A statistical and geometrical edge detector for sar images. IEEE Transactions on Geoscience and Remote Sensing, 26(6):764–773. Triggs, B., McLauchlan, P. F., Hartley, R. I., and Fitzgibbon, A. W. 2000. Bundle adjustment - a modern synthesis. Vision Algorithms: Theory and Practice. International Workshop on Vision Algorithms., pages 298–372. 183 Tsao, S., Kehtarnavaz, N., Chan, P., and Lytton, R. 1994. Image-based expert-system approach to distress detection on CRC pavement. Journal of Transportation Engineering, 120(1):62–64. Vicci, L. 2001. Quaternions and rotations in 3-space: The algebra and its geometric interpreta- tion. Technical Report TR01-014, Microelectonic Systems Laboratory, Department of Computer Science, University of North Carolina, Chapel Hill, NC. Villasenor, J. D., Belzer, B., and Liao, J. 1995. Wavelet filter evaluation for image compression. IEEE Transcations on Image Processing, 4(8):1053–1060. Wang, K. C., Nallamothu, S., and Elliott, R. P. 1998. Classification of pavement surface distress with an embedded neural net chip. Artificial Neural Networks for Civil Engineers: Advanced Features and Applications, pages 131–161. ASCE. Watanabe, S., Miyajima, K., and Mukawa, N. 1998. Detecting changes of buildings from aerial images using shadow and shading model. Proceedings. Fourteenth International Conference on Pattern Recognition, 2(2):1408–1412. Xu, K., Luxmoore, A. R., and Davies, T. 1998. Sewer pipe deformation assessment by image analysis of video surveys. Pattern Recognition, 31(2):169–180. Yakimovsky, Y . 1976. Boundary and object detection in real world images. Journal of the Association for Computing Machinery, 23(4):599–618. Yamaguchi, T. and Hashimoto, S. 2006. Automated crack detection for concrete surface image using percolation model and edge information. 32nd Annual Conference on IEEE Industrial Electronics, pages 3355–3360. Yu, M., Wang, R., Jiang, G., Liu, X., and Cho, T.-Y . 2004. New morphological opera- tors for color image processing. IEEE Region 10 Annual International Conference, Proceed- ings/TENCON, A:443–446. Yu, S.-N., Jang, J.-H., and Han, C.-S. 2007. Auto inspection system using a mobile robot for detecting concrete cracks in a tunnel. Automation in Construction, 16(3):255–261. Zhang, Z. 1998. Determining the epipolar geometry and its uncertainty: A review. International Journal of Computer Vision, 27(2):161–195. Ziou, D. and Tabbone, S. 1998. Edge detection techniques-an overview. Pattern Recognition and Image Analysis, 8(4):537–559. 184
Abstract
Automated health monitoring and maintenance of civil infrastructure systems is an active yet challenging area of research. Current structure inspection standards require an inspector to visually assess structure conditions. A less time-consuming and inexpensive alternative to current monitoring methods is to use a robotic system that can inspect structures more frequently and perform autonomous damage detection. Nondestructive evaluation (NDE) techniques are innovative approaches for structural health monitoring. Among several possible techniques, the use of optical instrumentation (e.g., digital cameras) together with image processing and computer vision is a promising nondestructive testing approach for structural health monitoring that complements sensor-based approaches. The feasibility of using image processing techniques to detect deterioration in structures has been acknowledged by leading researchers in the field. This study represents the efforts undertaken by the author to formulate, implement, and evaluate several vision-based approaches that are promising for robust condition assessment of structures.

It is well recognized that civil infrastructure monitoring approaches that rely on visual assessment will continue to be an important methodology for condition assessment of such systems. Part of this study presents and evaluates the underlying technical elements for the development of an integrated inspection software tool that is based on the use of inexpensive digital cameras. For this purpose, digital cameras are appropriately mounted on a structure (e.g., a bridge) and can zoom or rotate in three directions (similar to traffic cameras). They are remotely controlled by an inspector, which allows the visual assessment of the structure's condition by looking at images captured by the cameras. By not having to travel to the structure's site, other issues related to safety considerations and traffic detouring are consequently bypassed. The proposed system gives an inspector the ability to compare the current (visual) situation of a structure with its former condition. If an inspector notices a defect in the current view, he/she can request a reconstruction of the same view using images that were previously captured and automatically stored in a database. Furthermore, by generating databases that consist of periodically captured images of a structure, the proposed system allows an inspector to evaluate the evolution of changes by simultaneously comparing the structure's condition at different time periods. The essential components of the proposed virtual image reconstruction system are: keypoint detection, keypoint matching, image selection, outlier exclusion, bundle adjustment, composition, and cropping. Several illustrative examples are presented to demonstrate the capabilities, as well as the limitations, of the proposed vision-based inspection procedure.

Visual inspection of structures is a highly qualitative method. If a region is inaccessible, binoculars must be used to detect and characterize defects. Although several NDE methods have been proposed for inspection purposes, they are nonadaptive and cannot quantify crack thickness reliably. A contact-less remote-sensing crack detection and quantification methodology based on 3D scene reconstruction (computer vision), image processing, and pattern recognition concepts is introduced.
The proposed approach utilizes depth perception to detect cracks and quantify their thickness, thereby giving a robotic inspection system the ability to analyze images captured from any distance and using any focal length or resolution. This unique adaptive feature is especially useful for incorporating mobile systems, such as unmanned aerial vehicles (UAVs), into structural inspection methods, since it would allow inaccessible regions to be properly inspected for cracks. Guidelines are presented for optimizing the acquisition and processing of images, thereby enhancing the quality and reliability of the damage detection approach and allowing the capture of even the slightest cracks (e.g., detection of 0.1 mm cracks from a distance of 20 m), which are routinely encountered in realistic field applications where the camera-object distance and image contrast are not controllable.

Corrosion is another crucial defect in structural systems that can lead to catastrophe if it is neglected. A novel adaptive approach based on multi-resolution wavelet analysis, color analysis, and depth perception is proposed that drastically improves the performance of the defect detection algorithm. The main contribution of this part is the integration of depth perception with the pattern classification algorithms, which has not been done in previous studies. Several analytical evaluations are presented to illustrate the capabilities of the proposed system. Furthermore, the areas of the corroded regions are quantified using the retrieved depth information.

Insufficient inspection and maintenance of sewer pipes are the primary causes of today's poor pipeline conditions. CCTV surveys, the most commonly used pipeline inspection technique in the United States, are both costly and laborious. Furthermore, they are subject to an inspector's level of experience, attentiveness, and fatigue. A preliminary study on autonomous condition assessment of sewer pipelines based on CCTV surveys is presented. The proposed system will analyze CCTV video frames and provide inspectors with a report about probable defects and their precise locations. With this system, an inspector is not required to watch an entire CCTV video.