DATA WORTH ANALYSIS IN GEOSTATISTICS AND SPATIAL PREDICTION

by

Hamed Haddad Zadegan

A Dissertation Presented to the
FACULTY OF THE USC GRADUATE SCHOOL
UNIVERSITY OF SOUTHERN CALIFORNIA
In Partial Fulfillment of the Requirements for the Degree
DOCTOR OF PHILOSOPHY (CIVIL ENGINEERING)

August 2013

Copyright 2013 Hamed Haddad Zadegan

Dedication

To my parents, who have always provided me with their love and support, and to my wife, for her love, patience, and encouragement.

Acknowledgments

First of all, I would like to express my sincere appreciation to my advisor and committee chair, Professor Roger Ghanem, for his invaluable support, patience, and guidance throughout my Ph.D. at the University of Southern California. I would also like to thank my committee members, Dr. Najmedin Meshkati, Dr. Sami Masri, Dr. Felipe de Barros, and Dr. Muhammad Sahimi, for their encouragement and insightful comments. I am also extremely thankful to Dr. Paris Hajali and Jeremy Squire at Murex Environmental, Inc. for providing me with valuable data and consultation.

Many thanks to my fellow students at USC for providing such a wonderful environment in which to work: Vahid Keshavarzzadeh, Iman Yadegaran, Hadi Meidani, Ramakrishna Tipireddy, Charanraj Thimmisetty, Daniel Lakeland, Reza Jafarkhani, Ali Bolourchi, Mahmoud Kamalzare, Mehran Rahmani, Hamid Hajian, Farrokh Jazizadeh, Hossein Ataei, Pedram Oskouie, Arsalan Heydarian, Hamid Reza Jahangiri, and Parham Ghods.

Last, but far from least, I want to express my deepest gratitude to my parents and my wife, who generously provided me with their love, patience, and encouragement. I am forever indebted to them.

Contents

Dedication
Acknowledgments
1 Abstract
2 Motivation
3 Literature Review
   3.1 Random field
       3.1.1 Statistical properties of random fields
   3.2 Kriging
       3.2.1 Simple kriging
       3.2.2 Ordinary kriging
4 Case study
   4.1 Field specifications
   4.2 Sampling design
5 Bayesian Maximum Entropy
   5.1 General spatial prediction
   5.2 General BME framework
   5.3 Entropy
   5.4 Prior step: Maximization of entropy
   5.5 Posterior step: Bayesian conditioning
       5.5.1 Posterior pdf by interval soft data
       5.5.2 Posterior pdf by probabilistic soft data
   5.6 BME estimation
       5.6.1 Comparison of BME and kriging
6 Mapping the probability
   6.1 Indicator kriging
   6.2 Uncertainty model
7 Value of information
   7.1 Sampling design
   7.2 Value of information
   7.3 Procedure
       7.3.1 Optimal number of samples
       7.3.2 Risk-cost-benefit decision analysis
   7.4 Discussion
8 Sample design
   8.1 Kullback-Leibler divergence
   8.2 Multivariate normal distribution
   8.3 KL distance from global set
   8.4 KL distance between successive increments
   8.5 Using kriging estimation as data, compared to global set
   8.6 Using kriging estimation as data, compared to next step
       8.6.1 Example of a design
   8.7 Assigning real data to sample design
9 Summary and Conclusion
References
Chapter 1

Abstract

According to the US Environmental Protection Agency (EPA), a brownfield site is defined as land previously used for industrial or commercial purposes. Although a brownfield site may turn out to be clean, it cannot be presumed clean because of its previous use [1] [37]. The Brownfields Program provides funding for the assessment of brownfield sites and supports their redevelopment through policies and laws intended to enhance environmental quality [1]. The types and extent of contaminants vary significantly among brownfield sites; contamination can occur in surface soil, subsurface soil, and aquifers. Based on the 1999 EPA Brownfields Case Studies Summary Report, contaminants included petroleum hydrocarbons, lead, construction debris, polychlorinated biphenyls, treated wood, industrial chemicals, and diesel fuel (Brownfields Title VI Case Studies Summary Report, June 1999, EPA Document Number EPA 500-R-99-003) [37].

A main challenge in decision making on soil remediation is managing risk through the acquisition of knowledge, using more informative models and data. To frame this problem, the value of information must be considered and computed. The essential steps are:

1. Defining risk: Since any wrong action in soil remediation carries risk, characterizing that risk is very important. Risk can be financial or health related.

2. Characterizing the uncertainties relevant to risk: Because site characterization is limited and the type and extent of soil contamination are highly uncertain, it is crucial to investigate and quantify the uncertainty.

3. Designing actions that will reduce the uncertainty in risk: Our tools for reducing the uncertainty in risk are limited to sample collection. Sample design therefore plays a significant role in the uncertainty reduction process.

Ever-increasing contamination of surface and subsurface soil has become a major problem for environmental protection organizations. Soil remediation is a complicated problem with several challenges, which can be grouped into operational challenges that in turn lead to technical challenges.

Operational challenges: Different challenges are associated with the various operational aspects of soil remediation:

- The subsurface is not illuminated.
- Soil exploration is very expensive.
- The cost of a wrong decision (failure) is high.
- The initial and boundary conditions of the problem are unknown.

In the soil remediation problem, the subsurface is not illuminated, and the only available source of data is sampling at a limited number of locations. Figure 1.1 shows a schematic configuration of contamination data. Soil exploration is also very expensive: collecting new samples requires substantial money, time, and analysis. Moreover, the cost of failure in a remediation problem is high; in fact, calculating the failure cost is very complicated, and it may be a combination of health risk and financial penalty.
Another challenge in soil remediation is that the initial and boundary conditions of the problem are unknown, which makes the problem more complicated.

Figure 1.1: Schematic 3D configuration of contamination data

Technical challenges: The operational challenges above lead to several technical issues in analyses related to soil remediation:

- The credibility of the probabilistic analysis matters.
- Model-based updating is highly uncertain.
- Data worth analysis is critical.

Since our approach to this problem is probabilistic, the credibility of the probabilistic analysis should be evaluated carefully; we therefore aim to find a way to assess the credibility of the analysis. Also, since model-based updating is highly uncertain, we should evaluate the accuracy of the implemented models. Finally, in order to be cost effective, data worth analysis is critical, and we should find a way to identify the locations that carry the most information.

In this work we address some of these challenges. Generally, we aim to improve the accuracy of estimation and then propose a method to design the most efficient soil sampling program.

One objective of this research is to propose a new approach to geostatistical problems based on information theory. Our goal is to estimate a probability distribution function (pdf) of concentration at any unsampled location based on data at neighboring locations. A traditional way to solve this problem is kriging. Kriging is a least-squares algorithm and is the best linear unbiased predictor. Although kriging has a strong theoretical foundation and a flexible framework, it has shortcomings that motivate us to seek a more powerful and reliable method. The major limitation of kriging is that it cannot incorporate uncertain data into the posterior distribution.

Over the last decades, Bayesian Maximum Entropy (BME) was developed as a rigorous alternative prediction method that can overcome the shortcomings and limitations of other mapping methods such as kriging. In the BME framework, all types of uncertain data, physical laws, and expert judgments can be utilized, which improves the prediction of contamination and makes the results more useful than those of kriging. In BME we find a general knowledge distribution by maximizing the entropy and then use Bayesian conditioning to update the prior.

To improve the quality of the contamination prediction, we need more data, i.e., more samples. When we add new samples to our model, we obtain a new probability distribution function for the quantity of interest. We then need to compare the old (prior) and new (posterior) predictions, quantify the difference and improvement from one prediction to the next, and measure the information content of the predictions. Also, to find the best design for the locations of new samples, we should find the samples that provide maximum information compared with the alternatives. The samples with maximum information constitute the optimal design for the locations of new data. The optimal design minimizes the cost of sampling because it provides the most information with the minimum number of samples.

This thesis is structured as follows. In Chapter 2 we discuss the general concept of spatial prediction and the general framework for data analysis in this research. In the first step, the objective is to produce the best prediction of the quantity of interest based on the available data. In the prediction process we start with some available knowledge; these data can be of any type, such as hard or soft data.
Then we feed the data to the model, where the model is selected based on our demands and objectives.

Chapter 3 is a short overview of geostatistics. In that chapter we define random fields and introduce simple kriging and ordinary kriging as two traditional ways of estimating values at unsampled locations. Chapter 4 presents a case study of an old oil refinery, in which we implement ordinary kriging to analyze the contamination data for the purpose of soil remediation under the regulations of environmental agencies. The available data include concentrations of several chemicals at different locations and depths. We first designed a plan to take more samples to improve the quality of our estimates. We then estimated maps for the different chemicals and calculated the locations and volumes of excavation required for soil remediation.

Chapter 5 is about Bayesian maximum entropy. The implementation of kriging in Chapters 3 and 4 showed that, although kriging is a strong and flexible method, it has several limitations and shortcomings. The main weakness of kriging is that it can only use hard data for estimation. Since most of our data are uncertain, we need another estimation framework that can incorporate any type of data; the Bayesian maximum entropy method is capable of incorporating uncertain data. In that chapter we discuss entropy and its maximization, generating the prior from general data, and finding the posterior distribution for different types of data.

In Chapter 6, mapping of the contamination probability is discussed. In that chapter we model uncertainty in order to build a framework for decision making in problems related to soil remediation. Using indicator kriging, instead of a Gaussian distribution at each point, a conditional cumulative distribution function is constructed that gives the probability that contamination exceeds a threshold.

Chapter 7 is about the value of information. One of the major challenges in soil remediation is uncertainty about the type and level of contamination. Collecting more soil samples can decrease the uncertainty, but it also increases the cost; there should therefore be a balance between cost and uncertainty. In that chapter, the concept of value of information in soil remediation is discussed and some approaches available in the literature are explained. These approaches try to determine the optimum number of new samples required for decision making in soil contamination problems. Although these approaches are powerful, they do not offer any solution for finding the best locations for new samples.

In Chapter 8 we try to identify the best locations for new samples. To address this problem, the Kullback-Leibler divergence, a measure from probability theory, is used. The Kullback-Leibler divergence represents the difference between two probability distribution functions. Under the assumption of a multivariate normal distribution for the log-transformed concentration data, the Kullback-Leibler divergence can be used as a tool to measure the information content of new samples. Computing the information content of new samples helps us optimize the sample locations. In that chapter, different approaches are used to evaluate and validate the Kullback-Leibler divergence method.
The results of these analyses show that the Kullback-Leibler divergence method can help us estimate the contamination map from a smaller set of measurements, which reduces the cost of the operation. In that chapter, we first use kriging output to identify the best sampling locations and then validate the results against real data. Our analysis shows that the Kullback-Leibler divergence results are in acceptable agreement with reality.

Chapter 9 is a summary and general conclusion of this thesis; the main results of the research are collected there.

Chapter 2

Motivation

One of our objectives in this research is to propose a new approach to geostatistical problems. Generally, as shown in Figure 2.1, we can estimate a probability distribution function (pdf) at any unsampled location based on contamination data available at neighboring locations. In this figure, the circles are the locations of available contamination measurements.

Figure 2.1: Probability distribution function at unsampled location, taken from [2]

At a fundamental level, all estimation methods take a similar approach: the estimate at an unsampled location is a weighted average of its neighbors, and how the weights are computed is what distinguishes the methods (Figure 2.2). Some of these methods are much simpler, while others are more sophisticated.

Figure 2.2: Estimation methods, taken from [2]

Figure 2.3 shows the general framework for data analysis in this research. Based on the available data, we select an appropriate statistical model for the analysis. The model analyzes the data and makes a prediction; prediction quality is defined based on our requirements. Once we have the prediction, we need to make a decision. The decision is made based on the predictions and other resources such as experience or expert judgment.

Figure 2.3: Model flow chart

This flow chart (Figure 2.3) shows two main steps: the first is prediction and the second is decision making. In the first step, the objective is to produce the best prediction of the quantity of interest; in this research, the quantity of interest is the contamination at all unobserved locations of the random field. The second step aims to use the prediction to make the best decision that satisfies our requirements and limitations.

Figure 2.4 shows the prediction process in more detail. In this process we start with some available knowledge; these data can be of any type, such as hard or soft data. We then feed the data to the model, where the model is selected based on the demand and objective. Once the appropriate model is selected, we find the likelihood, which expresses the ability of the data to inform the model. The likelihood is updated with new data using a Bayesian approach, and this process finally constructs a pdf of the quantity of interest. Once the pdf of the quantity of interest is constructed, we can proceed to the next step, which is making a decision based on the prediction.

Figure 2.5 shows the decision-making process in more detail. Once we have a pdf describing the quantity of interest, we need to test the information content of the prediction. To improve the quality of the prediction and reduce its uncertainty, we need more data or samples. When we add new samples to the model, we construct a new pdf for the quantity of interest (QoI).
We then need to compare the new pdf with the old one to quantify the difference and evaluate the improvement from one prediction to the next.

Figure 2.4: Prediction process

The number and locations of new samples have a significant impact on the information content of the samples. As we increase the number of samples, our predictions become more reliable and less uncertain. But when the number of samples exceeds a certain limit, the information content converges and new samples no longer provide additional information. Finding this limit on the number of samples helps identify the most efficient sampling design.

Figure 2.5: Decision making process

Chapter 3

Literature Review

Spatiotemporal models play a crucial role in the environmental, hydrological, and engineering sciences [32]. One application of these models is to determine the trend of pollution in time and space [16]. These models also characterize parameters of the earth that vary in time and space [20], and they are capable of finding temporal patterns in soil parameters [39]. Moreover, they can help us design monitoring systems for processes that vary in time and space [41]. Generally, these models help us construct a probability distribution based on observations for the purpose of prediction. Due to the limited number of observations and inputs, these models are stochastic rather than deterministic [32]. The stochasticity of these models can be interpreted through physical models [11]. Geostatistics deals with spatiotemporal data and provides tools for variables correlated in space and time; it provides a way to make an optimal estimate and to assess the quality of that estimate [25].

3.1 Random field

A random variable is a set of possible values together with a probability over this set. A random field is defined as a collection of random variables. One realization of a random field is one particular possibility out of many [32].

3.1.1 Statistical properties of random fields

The statistical properties of a random field are completely defined by the multivariate pdf (probability density function) or cdf (cumulative distribution function) [32]:

\[ F_U(u_1, u_2, \ldots, u_n) = P\big(U(x_1) \le u_1,\; U(x_2) \le u_2,\; \ldots,\; U(x_n) \le u_n\big) \tag{3.1} \]

where U(x) = (u_1, u_2, \ldots, u_n) is a random field at the points (x_1, x_2, \ldots, x_n). For the probability density we then have

\[ P_U(u_1, u_2, \ldots, u_n) = \frac{\partial^n F_U(u_1, u_2, \ldots, u_n)}{\partial u_1\, \partial u_2 \cdots \partial u_n} \tag{3.2} \]

The first moment is the mean or expected value of the random field, which describes its trend:

\[ E[U(x)] = \int_{-\infty}^{\infty} u\, p(u)\, du \tag{3.3} \]

The second moment is the autocorrelation function:

\[ E[U(x_i)\, U(x_j)] = \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} u_i\, u_j\, p(u_i, u_j)\, du_i\, du_j \tag{3.4} \]

The correlation of a random field can be measured by the covariance [32]:

\[ C_U(x_i, x_j) = \mathrm{Cov}[U(x_i), U(x_j)] = E[U(x_i)\, U(x_j)] - E[U(x_i)]\, E[U(x_j)] \tag{3.5} \]

Since it is a function of location, it is called the covariance function. While the similarity between U(x_i) and U(x_j) is expressed by the covariance, their dissimilarity may be measured by their average squared difference:

\[ E\big[U(x_i) - U(x_j)\big]^2 = 2\gamma(x_i, x_j) \tag{3.6} \]

where \(\gamma(x_i, x_j)\) is called the semivariogram.
The sample variogram can be computed as [38]

\[ \gamma(h) = \frac{1}{2N(h)} \sum_{i=1}^{N(h)} \big[u(x_i) - u(x_i + h)\big]^2 \tag{3.7} \]

where h = |x_i - x_j| is the separation lag and N(h) is the number of data pairs separated by h.

3.2 Kriging

Kriging is a geostatistical method for estimating a parameter at an unobserved location of a Gaussian random field based on available neighboring data. Kriging minimizes the estimation error variance and has zero-mean estimation error. Kriging estimates are weighted linear combinations of the available data; kriging is the best linear unbiased estimator [34] [35] [32].

Let U(x) = (u_1, u_2, \ldots, u_n) be the values of the random field at the given points (x_1, x_2, \ldots, x_n), and let u(x_0) be the unknown value at an unsampled location x_0. The weighted linear estimate at x_0 is

\[ u^*(x_0) = \sum_{i=1}^{n} w_i(x_0)\, u(x_i) \tag{3.8} \]

where the weights w_i(x_0) depend on the data and on the location x_0. The estimation error is the difference between the estimated value u^*(x_0) and the unknown true value u(x_0):

\[ R(x_0) = u^*(x_0) - u(x_0) = \sum_{i=1}^{n} w_i(x_0)\, u(x_i) - u(x_0) \tag{3.9} \]

3.2.1 Simple kriging

When the mean \(\mu\) of U(x) is known, we can work with the residual [35] [34] [32]

\[ Z(x) = U(x) - \mu \tag{3.10} \]

for which E(Z) = 0. The estimate at the point x_0 is

\[ z^*(x_0) = \sum_{i=1}^{n} w_i(x_0)\, z(x_i) \tag{3.11} \]

and the expected estimation error is

\[ E[R(x_0)] = E[z^*(x_0) - z(x_0)] = \sum_{i=1}^{n} w_i(x_0)\, E[z(x_i)] - E[z(x_0)] = 0 \tag{3.12} \]

The variance of the estimation error is

\[ \mathrm{var}[R(x_0)] = \mathrm{var}[Z^*(x_0) - Z(x_0)] \tag{3.13} \]

which becomes

\[ \mathrm{var}[R(x_0)] = \sigma_U^2 + \sum_{i=1}^{n} \sum_{j=1}^{n} w_i\, w_j\, C_U(x_i - x_j) - 2 \sum_{i=1}^{n} w_i\, C_U(x_i - x_0) \tag{3.14} \]

The error is minimized by setting the n first partial derivatives to zero:

\[ \frac{\partial\, \mathrm{var}[R(x_0)]}{\partial w_i} = 2 \sum_{j=1}^{n} w_j\, C_U(x_i - x_j) - 2\, C_U(x_i - x_0) = 0 \tag{3.15} \]

which leads to a linear system of n equations in n unknowns:

\[ \sum_{j=1}^{n} w_j\, C_U(x_i - x_j) = C_U(x_i - x_0), \qquad i = 1, 2, \ldots, n \tag{3.16} \]

In matrix notation, the simple kriging system is

\[ \begin{bmatrix} C_{1,1} & C_{1,2} & \cdots & C_{1,n} \\ C_{2,1} & C_{2,2} & \cdots & C_{2,n} \\ \vdots & \vdots & \ddots & \vdots \\ C_{n,1} & C_{n,2} & \cdots & C_{n,n} \end{bmatrix} \begin{bmatrix} w_1 \\ w_2 \\ \vdots \\ w_n \end{bmatrix} = \begin{bmatrix} C_{1,0} \\ C_{2,0} \\ \vdots \\ C_{n,0} \end{bmatrix} \tag{3.17} \]

where C_{i,j} is the covariance between points i and j. Briefly,

\[ C\,w = D \;\Longrightarrow\; w = C^{-1} D \tag{3.18} \]

The matrix C is independent of the estimation location, so for a different location x_0 we only have to recompute D [34] [35] [32].

3.2.2 Ordinary kriging

When the mean E[U(x)] = E(U) is unknown, the estimation error is [34] [35] [32]

\[ R(x_0) = \sum_{i=1}^{n} w_i(x_0)\, U(x_i) - U(x_0) \tag{3.19} \]

so that

\[ E[R(x_0)] = \sum_{i=1}^{n} w_i(x_0)\, E(U) - E(U) = \Big[\sum_{i=1}^{n} w_i(x_0) - 1\Big] E(U) \tag{3.20} \]

To have zero expected error we must impose

\[ \sum_{i=1}^{n} w_i(x_0) = 1 \tag{3.21} \]

As in simple kriging, we minimize the estimation error variance

\[ \mathrm{var}[R(x_0)] = \sigma_U^2 + \sum_{i=1}^{n} \sum_{j=1}^{n} w_i\, w_j\, C_U(x_i - x_j) - 2 \sum_{i=1}^{n} w_i\, C_U(x_i - x_0) \tag{3.22} \]

subject to \(\sum_{i=1}^{n} w_i(x_0) = 1\). Introducing a Lagrange multiplier \(\mu\), the ordinary kriging system becomes

\[ \sum_{j=1}^{n} w_j\, C_U(x_i - x_j) + \mu = C_U(x_i - x_0), \qquad i = 1, 2, \ldots, n \tag{3.23} \]

subject to \(\sum_{i=1}^{n} w_i(x_0) = 1\). In matrix notation, the ordinary kriging system is

\[ \begin{bmatrix} C_{1,1} & C_{1,2} & \cdots & C_{1,n} & 1 \\ C_{2,1} & C_{2,2} & \cdots & C_{2,n} & 1 \\ \vdots & \vdots & \ddots & \vdots & \vdots \\ C_{n,1} & C_{n,2} & \cdots & C_{n,n} & 1 \\ 1 & 1 & \cdots & 1 & 0 \end{bmatrix} \begin{bmatrix} w_1 \\ w_2 \\ \vdots \\ w_n \\ \mu \end{bmatrix} = \begin{bmatrix} C_{1,0} \\ C_{2,0} \\ \vdots \\ C_{n,0} \\ 1 \end{bmatrix} \tag{3.24} \]

which can again be written as

\[ C\,w = D \;\Longrightarrow\; w = C^{-1} D \tag{3.25} \]

As in simple kriging, the matrix C is independent of the estimation location, so for a different location x_0 we only have to recompute D [34] [35] [32].
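As a concrete illustration of the ordinary kriging system (3.23)-(3.25), the short Python sketch below assembles the augmented covariance matrix, solves for the weights and the Lagrange multiplier, and returns the estimate and error variance at one unsampled location. The exponential covariance model, its parameters, and the synthetic measurements are illustrative assumptions, not values from the case study of Chapter 4.

```python
import numpy as np

def exp_cov(h, sill=1.0, length=200.0):
    """Exponential covariance model C(h) = sill * exp(-h / length)."""
    return sill * np.exp(-h / length)

def ordinary_kriging(xy, z, x0, sill=1.0, length=200.0):
    """Ordinary kriging estimate and error variance at a single location x0.

    xy : (n, 2) array of data coordinates
    z  : (n,)   array of measured values
    x0 : (2,)   coordinates of the unsampled location
    """
    n = len(z)
    # Left-hand side: data-to-data covariances plus the unbiasedness row/column.
    d = np.linalg.norm(xy[:, None, :] - xy[None, :, :], axis=2)
    C = np.ones((n + 1, n + 1))
    C[:n, :n] = exp_cov(d, sill, length)
    C[n, n] = 0.0
    # Right-hand side: data-to-target covariances plus the unit constraint.
    d0 = np.linalg.norm(xy - x0, axis=1)
    D = np.append(exp_cov(d0, sill, length), 1.0)
    sol = np.linalg.solve(C, D)
    w, mu = sol[:n], sol[n]
    estimate = w @ z
    error_variance = sill - w @ D[:n] - mu
    return estimate, error_variance

# Example: five synthetic concentration measurements and one target point.
xy = np.array([[0.0, 0.0], [100.0, 50.0], [250.0, 300.0], [400.0, 100.0], [150.0, 400.0]])
z = np.array([1.2, 0.8, 2.5, 0.3, 1.9])
print(ordinary_kriging(xy, z, np.array([200.0, 200.0])))
```

As noted above, the left-hand matrix depends only on the data configuration, so for a map of estimates only the right-hand side changes from one target location to the next.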
Chapter 4

Case study

4.1 Field specifications

We now present a real case study using kriging. In this case study, SADA [2, 48] (Spatial Analysis and Decision Assistance, software developed at The Institute for Environmental Modeling at the University of Tennessee) is used to predict soil contamination at an old oil refinery. The objective of the study is to estimate a contamination map for different chemicals and then decide on a soil remediation plan. The area of the field was approximately 1200 ft by 2000 ft (Figure 4.1). Measurements were available for different chemicals at different locations and depths. Although data were available to considerable depth, the top layer of the field is the most important, so we focused on the top 11 ft of soil and divided it into 5 layers:

- 0 to 0.50 ft
- 0.50 to 1.50 ft
- 1.50 to 3.0 ft
- 3.0 to 6.0 ft
- 6.0 to 11.0 ft

Figure 4.1: Aerial photo of the contaminated field, an old refinery in southern California

Our data contained many measurements for different types of chemicals. Since some of the chemicals behave similarly and some were not of concern, based on expert comments and previous experiments we decided to analyze the following major contaminants:

- Benzene
- Dibenzo[a,h]anthracene
- TPH-g
- TPH-D
- TPH-MO

One approach that considers all the compounds at the same time is to normalize each compound by its own limit and then pool all the data and analyze them together. The ordinary kriging results are shown in Figures 4.2, 4.3, 4.4, 4.5, and 4.6; the same analysis was performed for all compounds in all layers.

Figure 4.2: Prediction map based on all normalized data, layer 1
Figure 4.3: Prediction map based on all normalized data, layer 2
Figure 4.4: Prediction map based on all normalized data, layer 3
Figure 4.5: Prediction map based on all normalized data, layer 4
Figure 4.6: Prediction map based on all normalized data, layer 5

4.2 Sampling design

The next step in this project is to find the best locations for new samples, in order to reduce the uncertainty of the predictions and to evaluate the previous predictions. The following sample design approaches were considered:

- Judgmental: the user locates samples manually.
- Adaptive fill: new sample locations are placed in large gaps in the data.
- Highest variance: new sample locations are placed where the estimation variance is highest.

We applied each of these methods to a small subset of the samples and decided to use the highest-variance method for the entire field. To avoid analyzing the whole domain, based on expert judgment we identified the major spill points and focused our analysis on three areas (shown in Figure 4.1). In total we identified 120 new sampling locations. After sampling, we divided the new data into three random batches, gradually added them to the initial data, and repeated the analysis. After repeating the analysis for all chemicals, we found that, except for some minor cases, the new samples were in acceptable agreement with our initial predictions. The new samples also decreased the uncertainty of the predictions because they provided important information about places for which we previously had no information.
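The highest-variance criterion adopted above can be prototyped independently of SADA. Because the ordinary kriging variance is homoscedastic (it depends only on the data configuration and the covariance model, not on the measured values), candidate locations can be ranked before any new measurement is taken. The sketch below greedily places each new sample at the current variance maximum and then updates the configuration; the covariance model, its parameters, and all coordinates are made-up values used only for illustration.

```python
import numpy as np

def cov(a, b, sill=1.0, length=200.0):
    """Exponential covariance between two sets of points."""
    d = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=2)
    return sill * np.exp(-d / length)

def ok_variance(data_xy, grid_xy, sill=1.0, length=200.0):
    """Ordinary kriging error variance at every grid node (it depends only on
    the data locations, not on the measured values)."""
    n = len(data_xy)
    C = np.ones((n + 1, n + 1))
    C[:n, :n] = cov(data_xy, data_xy, sill, length)
    C[n, n] = 0.0
    D = np.vstack([cov(data_xy, grid_xy, sill, length), np.ones(len(grid_xy))])
    W = np.linalg.solve(C, D)                       # columns hold [w_1..w_n, mu]
    return sill - np.sum(W * D, axis=0)

def highest_variance_design(data_xy, grid_xy, n_new):
    """Greedily add the grid node with the largest kriging variance, update
    the configuration, and repeat until n_new locations have been selected."""
    data_xy = data_xy.copy()
    chosen = []
    for _ in range(n_new):
        k = np.argmax(ok_variance(data_xy, grid_xy))
        chosen.append(grid_xy[k])
        data_xy = np.vstack([data_xy, grid_xy[k]])
    return np.array(chosen)

# Example: a few existing borings plus a coarse candidate grid over a
# 1200 ft x 2000 ft site (dimensions chosen to mimic the field size above).
existing = np.array([[210.0, 330.0], [620.0, 915.0], [980.0, 1480.0]])
gx, gy = np.meshgrid(np.linspace(0, 1200, 13), np.linspace(0, 2000, 21))
grid = np.column_stack([gx.ravel(), gy.ravel()])
print(highest_variance_design(existing, grid, n_new=5))
```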
Figure 4.7: Benzene prediction map based on initial data
Figure 4.8: Benzene prediction map based on initial data plus 33 percent of new samples
Figure 4.9: Benzene prediction map based on initial data plus 66 percent of new samples
Figure 4.10: Benzene prediction map based on initial data plus all new samples

Another benefit of these analyses was determining the amount of soil that needs to be excavated. For this purpose, we plotted the cleanup goal versus the volume of contaminated soil for the major pollutants. These curves indicate how the amount of contaminated soil increases or decreases if the limit is changed. The overburden soil is also included in these curves. The overburden is any part of the site not included in the area of concern but lying vertically above it; overburden is clean but still needs to be excavated. For example, Figure 4.11 shows the cleanup goal versus volume for Dibenzo[a,h]anthracene in area A of Figure 4.1. These curves show that over an interval with a large slope, a small change in the limit can result in a significant change in the volume of contaminated soil.

Figure 4.11: Cleanup goal vs. volume for Dibenzo[a,h]anthracene in area A

Chapter 5

Bayesian Maximum Entropy

5.1 General spatial prediction

In Chapters 3 and 4 we explained the fundamentals of kriging and presented a case study using ordinary kriging. Although kriging appears to be a flexible method, it has some shortcomings. Kriging is not capable of taking uncertain data into account. Kriging also has a linear nature, which rules out a non-linear predictor. Moreover, the kriging posterior is defined only by its mean and variance. Recently, classical geostatistics was generalized and a method called Bayesian maximum entropy (BME) was developed [9, 10, 8]. In BME we can incorporate various types of data with different quality levels. The BME posterior is a non-Gaussian pdf, which allows further processing. It can also be proved that, under some assumptions, kriging is a special case of BME.

In the BME framework we need to define two general categories of data:

- Hard data are measurements that can be considered error-free. An example of hard data is a measurement of soil contamination reported by a laboratory. In other words, we can assume that the probability of a hard datum is equal to one.
- Soft data include any kind of uncertain data. Data given in interval form, probabilistic data, and expert judgments are examples of soft data.

5.2 General BME framework

According to Christakos [8], the main steps of BME are as follows:

- Prior step, in which we find a general distribution based on the available general knowledge.
- Meta-prior step, in which mathematical relations are found based on specific information.
- Posterior step, in which we update the prior with the data.

In the Bayesian maximum entropy method, we first maximize the entropy to obtain the prior distribution. Figure 5.1 [15] shows the general BME framework.

Figure 5.1: General BME framework, taken from [15]

5.3 Entropy

Shannon [44] introduced entropy to express the uncertainty in a system. The mathematical formulation of entropy for a discrete random variable is [44]

\[ H = -\sum_{i=1}^{n} p_i \log p_i \tag{5.1} \]

where p_i is the probability of each outcome of the random variable.
Since information and probability are inversely proportional, we can relate information to probability:

\[ \mathrm{Info}(A) = \frac{1}{P(A)} \tag{5.2} \]

It is also common to use logarithms for information [8]:

\[ \mathrm{Info}(Y) = -\log\big(P(Y)\big) \tag{5.3} \]

We can then write the entropy as the expected value of information, which is known as the entropy function [23]:

\[ H = E[\mathrm{Info}(Y)] \tag{5.4} \]

According to this definition, maximizing the entropy is equivalent to maximizing the prior information under the constraints given by the general knowledge, denoted K_G [8]. We have to choose a prior distribution that maximizes the use of the information given by K_G without incorporating any spurious information; this distribution is the one with maximum entropy. As more information is added, the entropy decreases. Even though the relation between entropy and information is mathematically clear, it is not entirely intuitive, and other interpretations are helpful. Golan et al. [27] consider maximizing the entropy as maximizing the missing information. Assume that the general knowledge K_G has already been fully taken into account. The maximum entropy distribution is then the one with the least possible additional information; in other words, among all distributions in agreement with K_G, the maximum entropy distribution is the least informative one. Jaynes [22] states: "entropy maximization is a kind of insurance policy that protects us against predicting spurious details [...] for which there is no evidence in the data." In fact, our goal in entropy maximization is to find the distribution that is most likely to occur given the available data: distributions with higher entropy are more likely because they can be produced in more ways [15].

5.4 Prior step: Maximization of entropy

Let z_map = [z_hard, z_soft, z_k] include all the data points plus z_k at the estimation point, and let f_G(z_map), with z_map = [z_1, z_2, \ldots, z_m, z_k]^T, be the multivariate probability density function of these random variables before any hard or soft data are considered. To obtain the prior probability density function we maximize the entropy function [9, 8]

\[ E[\mathrm{Info}(z_{map})] = -\int f_G(z_{map}) \log\big(f_G(z_{map})\big)\, dz_{map} \tag{5.5} \]

subject to the constraints imposed by the prior available knowledge,

\[ E[g_\alpha] = \int g_\alpha(z_{map})\, f_G(z_{map})\, dz_{map} \tag{5.6} \]

where the g_\alpha(z_map) are known functions of the moments, such as the mean and covariance, that can be incorporated. Table 5.1 lists these constraints [8].

Table 5.1: Constraints for the maximization of entropy

  Constraint type                           alpha                          g_alpha
  Normalization                             0                              1
  Mean (one per location)                   1, ..., n+1                    Z_0, ..., Z_n
  Variance (one per location)               n+2, ..., 2(n+1)               [Z_0 - m_0]^2, ..., [Z_n - m_n]^2
  Covariance (one per pair of locations)    2(n+1)+1, ..., (n+4)(n+1)/2    [Z_0 - m_0][Z_1 - m_1], ..., [Z_0 - m_0][Z_n - m_n]

Solving the resulting system of equations yields the maximum entropy solution for the prior probability density function [8]:

\[ f_G(z_{map}) = \frac{1}{A} \exp\Big( \sum_{\alpha=1}^{N_c} \lambda_\alpha\, g_\alpha(z_{map}) \Big) \tag{5.7} \]

where the \(\lambda_\alpha\) are Lagrange multipliers and A enforces the normalization constraint:

\[ A = \int \exp\Big( \sum_{\alpha=1}^{N_c} \lambda_\alpha\, g_\alpha(z_{map}) \Big)\, dz_{map} \tag{5.8} \]

In general this prior is non-Gaussian, but when only the mean and covariance are used as general knowledge, the prior has the Gaussian form [43]

\[ f_G(z_{map}) = \frac{1}{(2\pi)^{\frac{n+1}{2}} |C|^{\frac{1}{2}}} \exp\Big[ -\frac{1}{2} (z_{map} - m_{map})^T C^{-1} (z_{map} - m_{map}) \Big] \tag{5.9} \]

where C is the covariance matrix produced by the Lagrange multipliers and n is the number of data points.
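A small numerical check of equation (5.9): when the general knowledge consists only of a mean vector and a covariance matrix, the maximum-entropy prior is the multivariate Gaussian with exactly those moments, and its entropy has a closed form. The mean and covariance below are hypothetical values, not moments estimated from the case-study data.

```python
import numpy as np
from scipy.stats import multivariate_normal

# Hypothetical general knowledge: mean and covariance of log-concentration at
# two data points and one estimation point, i.e. z_map = [z_1, z_2, z_k].
m_map = np.array([1.0, 0.8, 0.9])
C = np.array([[1.00, 0.60, 0.40],
              [0.60, 1.00, 0.50],
              [0.40, 0.50, 1.00]])

# With only mean and covariance constraints, the maximum-entropy prior (5.9)
# is the multivariate Gaussian with exactly these moments.
prior = multivariate_normal(mean=m_map, cov=C)

z_map = np.array([1.2, 0.7, 1.0])
print("prior density f_G(z_map):", prior.pdf(z_map))

# Differential entropy of the Gaussian prior: 0.5 * log((2*pi*e)^d * |C|).
d = len(m_map)
H = 0.5 * np.log((2 * np.pi * np.e) ** d * np.linalg.det(C))
print("entropy of the prior:", H, "nats; scipy gives", prior.entropy())
```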
5.5 Posterior step: Bayesian conditioning

Now that we have the prior pdf f_G(z_map), we can find the posterior at the prediction point given the hard and soft data:

\[ f_K(z_k) = f_G(z_k \mid z_{hard}, z_{soft}) \tag{5.10} \]

5.5.1 Posterior pdf by interval soft data

Let x, y, and z be random variables with joint pdf \(f_{xyz}(\xi, \psi, \zeta)\). It can be shown that the conditional pdf of x, given the knowledge that y = \(\psi\) and \(\zeta_1 < z < \zeta_2\), is

\[ f(\xi \mid \psi,\; \zeta_1 < z < \zeta_2) = \frac{\displaystyle\int_{\zeta_1}^{\zeta_2} f_{xyz}(\xi, \psi, \zeta)\, d\zeta}{\displaystyle\int_{\zeta_1}^{\zeta_2} f_{yz}(\psi, \zeta)\, d\zeta} \tag{5.11} \]

Using \(\xi = z_k\), \(\psi = z_{hard}\), \(\zeta = z_{soft}\), and z_map = [z_hard, z_soft, z_k], this equation gives the BME posterior when the soft data are of interval type [43].

5.5.2 Posterior pdf by probabilistic soft data

We now compute the posterior when the soft data are probabilistic, i.e., given as a pdf. If we assume that the cumulative distribution of the soft data is F_s(z_soft), it can be shown that [15]

\[ f_K(z_k) = f_G\big(z_k \mid z_{hard}, F_s(z_{soft})\big) = \frac{\int f_G(z_k, z_{hard}, z_{soft})\, f_s(z_{soft})\, dz_{soft}}{\int f_G(z_{hard}, z_{soft})\, f_s(z_{soft})\, dz_{soft}} \tag{5.12} \]
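Equation (5.11) can be evaluated numerically for a toy configuration with one hard datum, one interval soft datum, and one estimation point. The sketch below assumes the Gaussian prior of equation (5.9) with made-up moments, integrates it over the soft-data interval on a grid of z_k values, and normalizes the result; the normalization plays the role of the denominator in (5.11).

```python
import numpy as np
from scipy.stats import multivariate_normal
from scipy.integrate import trapezoid

# Hypothetical Gaussian prior f_G over z_map = [z_hard, z_soft, z_k]
# (all values are illustrative, not taken from the case study).
m = np.array([0.5, 1.0, 0.8])
C = np.array([[1.0, 0.5, 0.4],
              [0.5, 1.0, 0.6],
              [0.4, 0.6, 1.0]])
prior = multivariate_normal(mean=m, cov=C)

z_hard = 0.9            # one error-free measurement
a, b = 0.7, 1.6         # interval soft datum: a < z_soft < b

# Posterior (5.11): f_K(z_k) is proportional to the prior integrated over the
# soft-data interval; normalize numerically on a grid of z_k values.
zk = np.linspace(-3.0, 4.0, 400)
zs = np.linspace(a, b, 200)
ZK, ZS = np.meshgrid(zk, zs, indexing="ij")
pts = np.stack([np.full_like(ZK, z_hard), ZS, ZK], axis=-1)
num = trapezoid(prior.pdf(pts), zs, axis=1)     # integrate out z_soft
f_K = num / trapezoid(num, zk)                  # normalize over z_k

print("posterior mean of z_k:", trapezoid(zk * f_K, zk))
```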
5.6 BME estimation

As shown in Figure 5.2 [15], the first step of BME is finding a prior pdf by maximizing the entropy. In this step we use the available general knowledge (K_G), such as physical laws, expert judgments, and global moments. Maximization of entropy yields a joint distribution over the available data locations and the prediction location. In the posterior step we use the Bayesian rule to include the specific (site-specific) knowledge in the neighborhood of the estimation location. Finally, from the posterior pdf, any desired indicator can be produced.

Figure 5.2: Schematic representation of BME, taken from [15]

Figure 5.3 shows an example, produced with the SEKSGUI (Spatiotemporal Epistemic Knowledge Synthesis Graphical User Interface) framework [50, 29], in which we have both hard data and soft data; in this example the soft data are probabilistic. Based on these data we estimate the variable at unsampled locations using BME.

Figure 5.3: Example of hard data and soft data

Figure 5.4 is the map of the BME mean estimate, obtained from both the hard and the soft data.

Figure 5.4: Map of BME mean estimation

As mentioned earlier, the output of BME at any location is a full pdf; Figure 5.5 shows one of these pdfs. Once we have the pdf, we can perform different operations on it and extract any required confidence interval.

Figure 5.5: BME posterior of concentration showing different confidence intervals

Here we apply the BME approach to a real case study [50, 29]. First we use kriging to estimate the contamination map of a field; in kriging we can only use hard data. Then we assume that the hard data contain error, transform them into soft data, repeat the analysis with the soft data, and compare the results. The following figures show that incorporating soft data can make a significant difference in the results.

Figures 5.6 and 5.7 show the mean of the estimation pdf for hard data and for soft data. Using soft data instead of hard data changes the estimates, which shows the importance of soft data in the estimation process and its potential to improve estimation quality.

Figure 5.6: Mean of the estimation posterior pdf, hard data
Figure 5.7: Mean of the estimation posterior pdf, soft data

Figures 5.8 and 5.9 show the estimation error variance for hard data and for soft data. Incorporating soft data reduces the error variance of the estimation, which increases the quality of the estimation.

Figure 5.8: Estimation error variance, hard data
Figure 5.9: Estimation error variance, soft data

Figures 5.10 and 5.11 show the lower limit of the 68% confidence interval of the BME estimates for hard data and for soft data. After incorporating soft data, the estimation map shows significant improvements.

Figure 5.10: Lower limit of the 68% confidence interval of the BME estimates, hard data
Figure 5.11: Lower limit of the 68% confidence interval of the BME estimates, soft data

Figures 5.12 and 5.13 show the upper limit of the 68% confidence interval of the BME estimates for hard data and for soft data; again, incorporating soft data leads to noticeable improvements.

Figure 5.12: Upper limit of the 68% confidence interval of the BME estimates, hard data
Figure 5.13: Upper limit of the 68% confidence interval of the BME estimates, soft data

5.6.1 Comparison of BME and kriging

Although both methods are powerful and flexible, BME has some advantages over kriging [15]:

- Kriging is a linear estimator, while BME is non-linear.
- Kriging can only incorporate hard data, while BME can handle a large variety of soft data.
- The BME posterior is in general non-Gaussian.

Under some conditions, it can be proved that kriging is a special case of BME.

Chapter 6

Mapping the probability

6.1 Indicator kriging

In the process of decision making for soil remediation, and to design a successful remediation strategy, uncertainty should be included in the analysis. One approach to locating new samples is to pick the points with the highest estimation variance as calculated by ordinary kriging [6]. But since the kriging variance is homoscedastic, i.e., independent of the contamination value, it only reflects the spatial configuration of the measured data [19]. A non-parametric approach to characterizing contamination is indicator kriging [24]. In this method, a conditional cumulative distribution function (ccdf) is constructed at every unsampled location, so that we can analyze the uncertainty related to exceedance of limits rather than the uncertainty of the concentration itself [49].

In remediation, the ultimate goal is to find and remove the contaminated areas where the concentration is higher than the threshold. Estimating the concentration at unsampled locations is therefore not enough by itself: we need to analyze the probability of exceeding the threshold, which is more important than the accuracy of the estimates. Using kriging, we can estimate the distribution of concentration at any unsampled location; for the purpose of remediation, it would be more beneficial to have, at every location, the probability that the concentration exceeds the threshold.
By using the Kriging method, we could estimate the distribution of concentra- tion at any unsampled location. For the purpose of remediation it would be more 48 beneciary if at any location we have the probability of the event that concentra- tion is larger than threshold. In this way we can perform risk assessment analysis on the result for the purpose of remediation. 6.2 Uncertainty model Suppose that we need to model uncertainty of soil property z at an unsampled location x 0 . while we have available data n points:fz(x i );i = 1;:::;ng. One way to show the uncertainty of random variableZ(x 0 ) is through distribution function [19]: F (x 0 ;zj(n)) =ProbfZ(x 0 )zj(n)g (6.1) where F (x 0 ;zj(n)) is conditional cdf conditioned by n data. Assuming a Gaussian distribution, we can simply construct the cumulative distribution function (cdf) by using the mean and the standard deviation of given Gaussian distribution: ProbfZ(x 0 )zj(n)g = z (6.2) where z = 1 2 1 +erf z p 2 (6.3) In order to estimate a non-parametric conditional cumulative distribution func- tion (ccdf) which does not need any assumption for the shape, we could estimate the ccdf for a series of dierent K cutos, z k , [49]: 49 F (x 0 ;z k j(n)) =ProbfZ(x 0 )z k j(n)g k = 1; 2;:::;K (6.4) To construct a non-parametric conditional cumulative distribution function (ccdf) we need to interpolate expectation of an indicator random variable I(x 0 ;z k )[12] : F (x 0 ;z k j(n)) =EfI(x 0 ;z k )j(n)g (6.5) where this indicator represents a transformation of all available observations, x i , into a series of cuto values,z k , showing that if limit has been exceeded or not [24]: I(x i ;z k ) = 8 > > > < > > > : 1 if z(x i )z k 0 otherwise (6.6) At any unsampled location x 0 , K values between 0 and 1 should be calculated as conditional cumulative distribution function (ccdf) values for z k . These values which should be non-decreasing, will construct the ccdf at each point that indicates the probability of concentration exceeding dierent limits. For each cuto value z k , after transformation of observations, we use Kriging and estimate the indicator at an unsampled location. That indicator shows the probability of exceeding threshold. Once we have performed this calculation for dierent cuto values, we can build the ccdf. The only information that is needed to perform kriging for each cuto isK indi- cator variograms which could be found by tting a model to available observations [26]: 50 (h;z k ) = 1 2N(h) N(h) X i=1 [I(x i ;z k )I(x i +h;z k )] 2 (6.7) As in can be observed, instead of assuming a Gaussian distribution at each point and trying to estimate a value by using the mean and the variance of normal distribution, we build the conditional cumulative distribution function at each point for dierent cuto values. As an example, gure 6.1 is result of ordinary kriging for Arsenic contamination and shows an estimation map which indicates concentration of Arsenic at each point. But Figure 6.2 which is result of indicator kriging for same Arsenic data, demonstrates the probability of event that contamination exceeds the limit at each point. As it can be observed Figure 6.2 is much more helpful for the purpose of uncertainty assessment. 51 Figure 6.1: Estimation map of contamination 52 Figure 6.2: Probability map of contamination 53 Chapter 7 Value of information 7.1 Sampling design Concerns associated with soil contamination is growing and has become one of the main issues for environmental protection agencies. 
Chapter 7
Value of information

7.1 Sampling design

Concern about soil contamination is growing, and it has become one of the main issues for environmental protection agencies. One of the most difficult issues environmental agencies face is decision making about soil remediation. After a site investigation, it is hard to differentiate between "nothing found" because of the absence of contamination and "nothing found" because of the poor quality of the investigation [5]. The main problem is the uncertainty of the contamination due to the combination of complex conditions in contaminated soils. Another issue is the high cost of remediation, which plays a crucial role in the analysis and the decision [42]. There exist various options and alternatives for deciding about soil remediation, and each alternative has different consequences. Due to the uncertainty associated with soil contamination, the analysis can lead to a wrong decision, which can pose a risk to human beings. Therefore there should be a balance between the probabilities and consequences of correct and false decisions [28]. While uncertainty can be reduced by taking more samples, collecting more data increases the cost of remediation [4]. Therefore new sampling designs should be efficient and cost effective. Traditionally, different strategies are used to design the sampling size, such as: (1) minimizing the cost for a given accuracy, (2) minimizing uncertainty for a specific budget, (3) following the demands of regulations, and also some combination of these approaches [3].

To converge to the best and optimum sampling design, sampling for soil contamination is usually performed in multiple stages rather than in a single stage [47]. The location and number of samples at each sampling stage are very important and should be selected very carefully. One approach to locating new samples is to find the points with the highest estimation variance calculated by ordinary kriging [7]. But since the kriging variance is homoscedastic and independent of the value of the measured contamination, it only represents the spatial configuration of the contamination data [19]. Another approach to locating new samples is to analyze the probability of exceeding a critical threshold through the conditional probability of contamination, which was discussed under indicator kriging [13]. In these methods we cannot fully include the uncertainty associated with estimation and remediation in the results, because they cannot combine different sources of data with different accuracies [14]. Therefore, we need an approach in which all the uncertainties associated with the parameters can be considered. Value of information could be one such approach.

7.2 Value of information

If we had access to perfect information about the contaminant distribution, we could remove all the uncertainty in the process of estimation and decision making about soil remediation. But in the real world it is almost impossible to have access to perfect information. We can define value of information as a Bayesian analysis for making decisions in the presence of uncertainty [21]. Estimating the value of information can lead to a better understanding and analysis of the problem [4]. The value of information can be considered as the difference in the quantity of interest when additional data is used and when it is not used [18]. Therefore, to design a cost-effective remediation plan, the value of new additional information is computed with respect to the cost of making a wrong decision.
Additional samples have value if they provide new information that can change the decision making process [36]. One of the best ways to find the value of sampling data is through a Bayesian decision analysis, which has been used extensively for the study of soil contamination [33].

Figure 7.1: Bayesian data worth analysis, taken from [21]

We are interested in analyzing and evaluating the value of different data collection alternatives (sampling designs). To perform this analysis we have different approaches [18]:

- Prior analysis: if we compare the design alternatives based on the available data and before any data collection, it is a prior analysis. In this type of analysis everything is based on our prior knowledge about the contamination.
- Posterior analysis: if we compare the design alternatives after collecting new samples, it is a posterior analysis. In this type of analysis the prior objective function is compared with the objective function updated by the new samples.
- Pre-posterior analysis: if we perform the analysis after finding the number and locations of the samples of a proposed sampling design, but prior to the actual measurements, it is a pre-posterior analysis. In this type of analysis we try to estimate the value of the new samples before actually making any measurements.

Figure 7.1 shows a general overview of how these three types of analysis are related. First we estimate the remediation cost at the prior stage using the available information. Then, at the pre-posterior stage, we find the optimal number of new samples and, without taking the samples, we estimate their value. If that design is cost effective, we go on to the posterior stage, perform the sampling program and update the information.
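Before turning to the detailed procedures, a minimal numerical sketch may help fix the idea of comparing the prior and pre-posterior stages for a single remediation unit with two states (contaminated or not) and two actions (remediate or do nothing). All numbers below (prior probability, costs, detection reliabilities) are illustrative assumptions, not values from this study.

```python
# States: C (contaminated) / not C.  Actions: remediate / do nothing.
p_C = 0.3                      # prior probability of contamination (assumed)
cost_fail = 100.0              # cost of leaving contamination in place (assumed)
cost_remediate = 20.0          # cost of remediation (assumed)
p_detect_given_C = 0.9         # sampling reliability (assumed likelihoods)
p_detect_given_notC = 0.1

def expected_cost(p_c):
    # Optimal action under a given probability of contamination.
    return min(cost_remediate, cost_fail * p_c)

# Prior analysis: decide with current knowledge only.
phi_prior = expected_cost(p_C)

# Pre-posterior analysis: average the optimal posterior cost over possible outcomes.
p_D = p_C * p_detect_given_C + (1 - p_C) * p_detect_given_notC
p_C_given_D = p_C * p_detect_given_C / p_D
p_C_given_notD = p_C * (1 - p_detect_given_C) / (1 - p_D)
phi_preposterior = (p_D * expected_cost(p_C_given_D)
                    + (1 - p_D) * expected_cost(p_C_given_notD))

print("expected value of sample information:", phi_prior - phi_preposterior)
```

The difference between the prior and pre-posterior expected costs is the expected value of the sample information; the procedures reviewed below develop this quantity in detail.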
7.3 Procedure

In this section we briefly review some of the procedures that have been developed and used by other authors for the evaluation of the value of information.

7.3.1 Optimal number of samples

The following discussion about the optimal number of samples is a summary of the method proposed by T. Norberg and L. Rosén [36].

If our decision concerns the detection of contamination and the remediation of soil, first we need to define an action level, or limit, for the soil contamination. Failure happens when the soil contamination level is discovered to be higher than the action level. If we focus our study on one cell of soil (a small area of soil), we can assume that P(C) is the prior probability of contamination in the cell. Let k_F be the cost of failure and g_B the benefit of remediation. Then the prior risk cost of remediation is [36]:

$$ k_F\, P_R(F) - g_B\, P_R(\bar{F}) \qquad (7.1) $$

To be cost efficient we should have [36]:

$$ P(F) - P_R(F) \ge \frac{k_R}{k_F + g_B} \qquad (7.2) $$

If we denote k_R / (k_F + g_B) by \lambda, then [36]:

$$ \Phi_0 = (k_F + g_B)\, \min\big( P(F),\; P_R(F) + \lambda \big) - g_B \qquad (7.3) $$

The above calculations are prior calculations. Using Bayes' theorem, the posterior probability of contamination is [36]:

$$ P(C \mid D) = \frac{P(C)\, P(D \mid C)}{P(C)\, P(D \mid C) + P(\bar{C})\, P(D \mid \bar{C})} \qquad (7.4) $$

Also we have [36]:

$$ P(F \mid D) = P(C \mid D)\, P(F \mid D, C) + P(\bar{C} \mid D)\, P(F \mid \bar{C}, D) \qquad (7.5) $$

Then the posterior minimum expected cost is [36]:

$$ \Phi_1 = \begin{cases} (k_F + g_B)\, \min\big( P(F \mid D),\; P_R(F \mid D) + \lambda \big) - g_B & \text{if } D \\ (k_F + g_B)\, \min\big( P(F \mid \bar{D}),\; P_R(F \mid \bar{D}) + \lambda \big) - g_B & \text{if } \bar{D} \end{cases} \qquad (7.6) $$

In order to compare this alternative with the previous one, the mean of \Phi_1 is calculated [36]:

$$ E[\Phi_1] = (k_F + g_B)\, \min\big( P(F \mid D),\; P_R(F \mid D) + \lambda \big)\, P(D) + (k_F + g_B)\, \min\big( P(F \mid \bar{D}),\; P_R(F \mid \bar{D}) + \lambda \big)\, P(\bar{D}) - g_B \qquad (7.7) $$

Then the expected value of sample information is [36]:

$$ W = \Phi_0 - E[\Phi_1] \qquad (7.8) $$

and the expected value of perfect information would be [36]:

$$ E[\Phi_{1p}] = (k_F + g_B)\, \min\big( P(F \mid C),\; P_R(F \mid C) + \lambda \big)\, P(C) + (k_F + g_B)\, \min\big( P(F \mid \bar{C}),\; P_R(F \mid \bar{C}) + \lambda \big)\, P(\bar{C}) - g_B \qquad (7.9) $$

Suppose that we have different strategies for remediation. If k_S(n) is the cost of sampling strategy S with n samples, the net worth of the strategy is [36]:

$$ W_S^{\mathrm{net}}(n) = W_S(n) - k_S(n) \qquad (7.10) $$

Then the worth of the data is [36]:

$$ W = \begin{cases} 0 & \text{if } P(C) \le \lambda_1 \\[4pt] \dfrac{P(C) - \lambda_1}{\lambda - \lambda_1}\, W_{\max} & \text{if } \lambda_1 \le P(C) \le \lambda \\[4pt] \dfrac{\lambda_2 - P(C)}{\lambda_2 - \lambda}\, W_{\max} & \text{if } \lambda \le P(C) \le \lambda_2 \\[4pt] 0 & \text{if } \lambda_2 \le P(C) \end{cases} \qquad (7.11) $$

where

$$ \lambda_1 = \frac{\lambda\, P(D \mid \bar{C})}{\lambda\, P(D \mid \bar{C}) + (1 - \lambda)\, P(D \mid C)} \qquad (7.12) $$

$$ \lambda_2 = \frac{\lambda\, P(\bar{D} \mid \bar{C})}{\lambda\, P(\bar{D} \mid \bar{C}) + (1 - \lambda)\, P(\bar{D} \mid C)} \qquad (7.13) $$

$$ W_{\max} = \big( 1 - P(D \mid \bar{C}) - P(\bar{D} \mid C) \big)\, W_{\max}^{P} \qquad (7.14) $$

Assume that we have n > 0 random samples, log-normally distributed in the remediation unit, where y_i = \log x_i, \mu and \sigma are the unknown mean and standard deviation of the soil, and \bar{y} = \frac{1}{n} \sum y_i is the mean of the measured concentrations [36].

Let AL be the threshold such that if \mu > AL the soil is contaminated. Also, c is a threshold such that if \bar{y} > c contamination is detected. Then [36],

$$ P(D \mid \bar{C}) = P(\bar{y} \ge c \mid \mu \le AL) = 1 - \Phi\!\left( \frac{c - \mu}{\sigma / \sqrt{n}} \right) \le 1 - \Phi\!\left( \frac{c - AL}{\sigma / \sqrt{n}} \right) \qquad (7.15) $$

where \Phi is the standard normal distribution function, and [36],

$$ P(\bar{D} \mid C) = P(\bar{y} < c \mid \mu \ge AL) = \Phi\!\left( \frac{c - \mu}{\sigma / \sqrt{n}} \right) \le \Phi\!\left( \frac{c - AL}{\sigma / \sqrt{n}} \right) \qquad (7.16) $$

Then [36]:

$$ P(D \mid \bar{C}) = 1 - \Phi\big( (c_0 + k_1)\sqrt{n} \big) \qquad (7.17) $$

and

$$ P(\bar{D} \mid C) = \Phi\big( (c_0 - k_2)\sqrt{n} \big) \qquad (7.18) $$

where c_0 = (c - AL)/\sigma, and k_1 and k_2 correspond to assuming that the true mean lies k_1\sigma below or k_2\sigma above the action level, respectively.

The above discussion about the optimal number of samples summarized the method proposed by T. Norberg and L. Rosén [36]. In this method Norberg and Rosén try to find the optimum number of samples for one soil cell. This number is calculated based on the failure probabilities P(\bar{D} \mid C) and P(D \mid \bar{C}). As can be observed, the locations of the samples are not determined in this method.
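A short sketch of equations (7.17) and (7.18) shows how the two error probabilities shrink as the number of samples grows. The values of c_0, k_1 and k_2 and the 5% targets on both probabilities are illustrative assumptions; the criterion in [36] balances costs rather than fixing error targets.

```python
import numpy as np
from scipy.stats import norm

def error_probabilities(n, c0, k1, k2):
    # Eqs. (7.17)-(7.18): false-detection and missed-detection probabilities
    # as a function of the number of samples n, with c0 = (c - AL) / sigma.
    p_false_detection = 1.0 - norm.cdf((c0 + k1) * np.sqrt(n))   # P(D | not C)
    p_missed_detection = norm.cdf((c0 - k2) * np.sqrt(n))        # P(not D | C)
    return p_false_detection, p_missed_detection

# Illustrative values: detection limit slightly above the action level, and the
# clean / contaminated means assumed to sit k1, k2 standard deviations away.
c0, k1, k2 = 0.1, 0.5, 0.5
for n in range(1, 31):
    p_fd, p_md = error_probabilities(n, c0, k1, k2)
    if p_fd < 0.05 and p_md < 0.05:
        print("smallest n with both error probabilities below 5%:", n)
        break
```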
7.3.2 Risk-cost-benefit decision analysis

The following discussion about the value of information is a summary of the method proposed by Pär-Erik Back [3].

We can define the objective function of each alternative as [18]:

$$ \Phi_i = B_i - C_i - P_f\, C_f \qquad (7.19) $$

where B is the benefit of remediation, C is the cost of remediation, C_f is the cost of failure and P_f is the probability of failure.

Figure 7.2: Decision tree for contaminated land, taken from [3]

To assess the worth of any proposed sampling program, we need both a prior and a posterior analysis. As already mentioned, the data worth is defined as the difference between the objective functions at the pre-posterior and prior stages. Now we need to explain the details of the data worth analysis exactly. We can summarize the process of data worth analysis in five steps [3]:

Sampling

The objective of sampling is to find out the mean concentration in a remediation unit as a measure for judging the contamination. The true mean of the concentration is a fixed but unknown value, and it is assumed that the distribution of concentrations is log-normal [3].

Figure 7.3: Procedure of value of information, taken from [3]

Probability estimation

For the data worth analysis we need to calculate four types of probabilities [18]:

- Probabilities of the state, P[state], which give the probabilities of the alternative states, contaminated or not contaminated: if \mu > AL (event C^+) we consider the soil contaminated, and if \mu < AL (event C^-) we consider the soil not contaminated. Here, \mu is the true but unknown concentration mean. [3]
- Probabilities of the sample, P[sample], which refer to the probabilities of the sampling results and reflect detection or non-detection of contamination: if \bar{y} > AL (event D^+) we consider the sampling result as detected, and if \bar{y} < AL (event D^-) the sampling result is not detected. Here, \bar{y} is the average of the measurements and AL is a threshold defined by the environmental agencies. [3]
- Conditional probabilities of the sample given the state, P[sample|state], which form the model for estimating the probability of a sampling result given the state. [3]
- Conditional probabilities of the state given the sample, P[state|sample], which include our main goal of updating the prior probability P[C] to the posterior probability of failure P[C|D^-] by Bayes' rule [3, 18]:

$$ P[\text{state} \mid \text{sample}] = \frac{P[\text{sample} \mid \text{state}]\; P[\text{state}]}{\sum \big\{ P[\text{sample} \mid \text{state}]\; P[\text{state}] \big\}} \qquad (7.20) $$

Value of information

Let \mu be the true but unknown mean of the log-transformed concentration of the population. We try to estimate this parameter by taking some samples at different locations, and let \bar{y} be the log-transformed mean of the samples at these locations [3]:

$$ \bar{y} \sim N\!\left( \mu,\ \frac{\ln(CV^2 + 1)}{n} \right) \qquad (7.21) $$

This distribution is used as the model for Bayesian updating in order to find the likelihood of the sample given the state, e.g. P(D|C).

Prior

At the prior phase of the data worth analysis, we have different ways to classify a remediation unit. Assume that the unknown \mu is given. Then, using the distribution of \bar{y}, we find a PDF for the mean of the samples around \mu. Since the action level (AL) is given, the probability of the event that the sample mean is greater than AL can be calculated [3]. The following probability is one of the four probabilities P[sample|state]:

$$ P(D^- \mid C^+) = \frac{P(D^- \cap C^+)}{P(C^+)} = \frac{\displaystyle\int_{AL}^{\infty} P(D^- \mid \mu)\, f_{\mathrm{prior}}(\mu)\, d\mu}{P(C^+)} \qquad (7.22) $$

where P(D^+ \mid \mu) = P(\bar{y} > AL \mid \mu) is computed from the normal distribution of \bar{y}, and f_{\mathrm{prior}}(\mu) is the prior probability distribution (PDF) of contamination evaluated at \mu [3]. Finally, the probability of detection is [3]:

$$ P(D^+) = P(C^-)\, P(D^+ \mid C^-) + P(C^+)\, P(D^+ \mid C^+) \qquad (7.23) $$

Pre-posterior

Using Bayes' theorem we can estimate the four probabilities of the pre-posterior stage, P[state|sample]. For example, the failure probability is estimated as [3]:

$$ P(C^+ \mid D^-) = \frac{P(D^- \mid C^+)\, P(C^+)}{P(D^-)} = \frac{P(D^- \mid C^+)\, P(C^+)}{P(C^-)\, P(D^- \mid C^-) + P(C^+)\, P(D^- \mid C^+)} \qquad (7.24) $$

Cost analysis

The failure cost is the cost related to making a wrong decision, which may pose a risk to humans. The failure cost is very uncertain, and the result of the analysis is very sensitive to this parameter. The failure cost is a function of how much the true mean of the concentration exceeds the action level [3]. The failure cost C_f can be modeled by an asymmetric loss function f_{\mathrm{loss}}(\mu), which is a function of the true mean [17]:

$$ C_f = \frac{\text{Expected loss}}{P(C^+)} = \frac{\displaystyle\int_{AL}^{\infty} f_{\mathrm{prior}}(\mu)\, f_{\mathrm{loss}}(\mu)\, d\mu}{P(C^+)} \qquad (7.25) $$

where f_{\mathrm{prior}}(\mu) is the prior PDF of contamination evaluated at \mu.

To find the value of new information we need to find the difference between the objective functions at the prior and pre-posterior phases. The pre-posterior phase is analyzed when we have defined the design of the new measurements but have not yet performed the sampling.
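The integrals in equations (7.22) and (7.24) are straightforward to evaluate numerically. The sketch below assumes a normal prior on the log-mean \mu and uses the sampling model of equation (7.21); the action level, prior parameters, coefficient of variation and sample size are all illustrative assumptions.

```python
import numpy as np
from scipy.stats import norm

AL = 1.0                            # action level on the log scale (assumed)
prior_mean, prior_sd = 0.8, 0.5     # assumed prior f_prior(mu)
CV, n = 1.5, 6                      # coefficient of variation and sample size
samp_sd = np.sqrt(np.log(CV**2 + 1.0) / n)   # std. dev. of y-bar given mu, eq. (7.21)

mu = np.linspace(prior_mean - 6 * prior_sd, prior_mean + 6 * prior_sd, 4001)
dmu = mu[1] - mu[0]
f_prior = norm.pdf(mu, prior_mean, prior_sd)
p_Dminus_given_mu = norm.cdf(AL, loc=mu, scale=samp_sd)      # P(y-bar < AL | mu)

p_C_plus = np.sum(f_prior * (mu > AL)) * dmu                 # P(C+)
# Eq. (7.22): probability of non-detection given that the unit is contaminated.
p_Dm_Cp = np.sum(p_Dminus_given_mu * f_prior * (mu > AL)) * dmu / p_C_plus
# Same construction for the clean state.
p_Dm_Cm = np.sum(p_Dminus_given_mu * f_prior * (mu <= AL)) * dmu / (1.0 - p_C_plus)

# Eq. (7.24): posterior probability of failure (contaminated but not detected).
p_Cp_Dm = (p_Dm_Cp * p_C_plus
           / (p_Dm_Cm * (1.0 - p_C_plus) + p_Dm_Cp * p_C_plus))
print("P(D-|C+) =", round(p_Dm_Cp, 3), "  P(C+|D-) =", round(p_Cp_Dm, 3))
```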
If we have different remediation alternatives, then we have different objective functions \Phi_i. At the prior stage [3]:

$$ \Phi_{\mathrm{prior}} = \max_i [\Phi_i] \qquad (7.26) $$

and at the pre-posterior phase [3]:

$$ \Phi_{\mathrm{preposterior}} = P(D^+)\, \max_i [\Phi_i \mid D^+] + P(D^-)\, \max_i [\Phi_i \mid D^-] \qquad (7.27) $$

Then the estimated value of information is [3]:

$$ \mathrm{EVI} = \Phi_{\mathrm{preposterior}} - \Phi_{\mathrm{prior}} \qquad (7.28) $$

This expected value of information can be calculated as a function of n, which gives us the optimum number of samples.

The above discussion about the value of information summarized the method proposed by Pär-Erik Back [3].

7.4 Discussion

As mentioned in the previous sections, the goal of implementing value of information is to find the best sampling design for soil remediation. That is, we are trying to find the optimum number and the best locations for new samples that provide the maximum information at the minimum cost.

We briefly explained two approaches to value of information analysis in soil remediation. These methods can estimate the optimum number of samples in a remediation unit of soil. But they do not propose any specific method for finding the best sampling locations. Although these methods can find an efficient number of samples, they do not consider the importance of the location and configuration of the samples. In fact, a successful remediation plan should take into account all the available sources of knowledge about the contamination. The configuration and spatial correlation of the data is an important factor that has not been included in these methods. The covariance matrix of the data is the tool we implement to account for the correlation of the data, and using the correlation structure can play a significant role in finding the best locations for samples.

Another problem in these methods is related to the prior probabilities. The procedure for calculating the value of information in the previous sections shows that the results are strongly dependent on the prior knowledge and the prior probabilities that we assume. Since these prior probabilities cannot be calculated exactly and have to be estimated from different available sources, they can increase the uncertainty associated with the results.

In the next chapter we propose a method to find the best sample locations. This method is a probabilistic approach which computes the distance between different distribution functions.

Chapter 8
Sample design

8.1 Kullback-Leibler divergence

The Kullback-Leibler (KL) divergence, or relative entropy, is a measure in probability theory of the proximity of two probability distribution functions. The Kullback-Leibler divergence can also be considered as the difference between the amounts of information that two distributions carry [30]. The Kullback-Leibler divergence of two discrete probability distributions P and Q is non-negative, not symmetric, and is defined as [31]:

$$ D_{KL}(P \,\|\, Q) = \sum_i P_i \ln \frac{P_i}{Q_i} \qquad (8.1) $$

In the continuous case, the Kullback-Leibler divergence of P and Q is defined as:

$$ D_{KL}(P \,\|\, Q) = \int_{-\infty}^{\infty} P(x) \ln \frac{P(x)}{Q(x)}\, dx \qquad (8.2) $$

Intuitively, if P is the distribution of the data and Q is our model, the Kullback-Leibler divergence represents an average log-likelihood ratio [45]. In summary, the Kullback-Leibler divergence can be considered as a distance which distinguishes between statistical distributions.
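A small numerical illustration of equation (8.1), with two arbitrary three-state distributions, also shows that the divergence is not symmetric.

```python
import numpy as np

def kl_divergence(p, q):
    # Eq. (8.1): discrete Kullback-Leibler divergence D_KL(P || Q).
    p, q = np.asarray(p, float), np.asarray(q, float)
    mask = p > 0                     # terms with p_i = 0 contribute nothing
    return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))

p = np.array([0.7, 0.2, 0.1])
q = np.array([0.5, 0.3, 0.2])
print(kl_divergence(p, q), kl_divergence(q, p))   # non-negative and not symmetric
```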
8.2 Multi-variate Normal Distribution

In environmental studies, data on the concentration of contamination are usually modeled with a lognormal distribution [46]. Therefore, since we are analyzing soil contamination, we can use the statistical properties of the normal distribution (of the log-concentrations) to characterize the contamination. The kriging estimate at each point is a Gaussian random variable. Therefore, given the mean and the spatial correlation between different points, we can assume that our estimate is a k-dimensional multivariate normal distribution:

$$ x \sim N(\mu, \Sigma) \qquad (8.3) $$

where \mu is a k-dimensional mean vector of the concentration and \Sigma is a k \times k covariance matrix which describes the correlation structure.

Assuming that at each step of the estimation process we have such a probability distribution of the contamination, the Kullback-Leibler divergence between two multivariate normal distributions can be written as [40]:

$$ D_{KL}(N_0 \,\|\, N_1) = \frac{1}{2} \left[ \mathrm{tr}\!\left( \Sigma_1^{-1} \Sigma_0 \right) + (\mu_1 - \mu_0)^T \Sigma_1^{-1} (\mu_1 - \mu_0) - \ln \frac{\det \Sigma_0}{\det \Sigma_1} - k \right] \qquad (8.4) $$

where \Sigma_0 is the covariance matrix of the old estimate, \Sigma_1 is the covariance matrix of the new estimate, \mu_0 is the mean vector of the old estimate, and \mu_1 is the mean vector of the new estimate.

This distance can be used as a measure of how much the new data at each step improve or change the distribution. A larger distance means more new information. This equation is a powerful tool which can help us find the value of additional samples: the larger the distance caused by a new sample, the more informative that sample is.
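Equation (8.4) translates directly into code. The sketch below uses log-determinants for numerical stability on larger grids; the toy means and covariances are arbitrary illustrations of an "old" and a "new" estimate.

```python
import numpy as np

def kl_mvn(mu0, cov0, mu1, cov1):
    # Eq. (8.4): D_KL(N0 || N1) between two k-dimensional Gaussians.
    k = len(mu0)
    cov1_inv = np.linalg.inv(cov1)
    diff = mu1 - mu0
    _, logdet0 = np.linalg.slogdet(cov0)   # log-determinants avoid under/overflow
    _, logdet1 = np.linalg.slogdet(cov1)
    return 0.5 * (np.trace(cov1_inv @ cov0) + diff @ cov1_inv @ diff
                  - k - (logdet0 - logdet1))

# Toy example: an "old" and a "new" estimate on a 3-point grid.
mu0 = np.array([0.0, 0.5, 1.0]); cov0 = np.eye(3) * 0.8
mu1 = np.array([0.1, 0.4, 1.2]); cov1 = np.eye(3) * 0.5
print(kl_mvn(mu0, cov0, mu1, cov1))
```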
In the following sections we implement this approach to analyze the value of information.

8.3 KL distance from global set

The purpose of this section is to show the application and concept of the Kullback-Leibler divergence (KL distance). Since we have a large amount of real data from the contaminated soil mentioned in the previous chapters, we use these data to validate our approach.

In the first step, we start our analysis with 10% of the real data. In order to select this subset, we randomly generate 200 subsets with 10% of the data and then calculate the KL distance between each of these subsets and the global set. In this way we can find the subsets which have the maximum and minimum KL distance from the global set. The maximum distance from the global set represents the worst estimation, and the minimum distance from the global set represents the best possible estimation with a subset containing 10% of the data.

Figure 8.1: KL distance from global set

We repeat this analysis for random subsets containing 20% and continue up to 90%. At each step we perform 200 iterations to find the maximum and minimum distance from the global set. Figure 8.1 shows the results of this analysis. It can be clearly observed that adding more data increases the information content, and as more sample points are added the KL distance from the global set decreases.

Figure 8.2 is the estimation map based on 100% of the data, which is considered the global set in this problem. The color bar shows the log-transformed concentration. The following graphs are estimation maps for different subsets with maximum and minimum KL distance.

Figure 8.2: Estimation map of contamination based on 100% of data
Figure 8.3: Estimation map of contamination using 10% of data, minimum KL distance at the top and maximum distance at the bottom
Figure 8.4: Estimation map of contamination using 20% of data, minimum KL distance at the top and maximum distance at the bottom
Figure 8.5: Estimation map of contamination using 30% of data, minimum KL distance at the top and maximum distance at the bottom
Figure 8.6: Estimation map of contamination using 40% of data, minimum KL distance at the top and maximum distance at the bottom
Figure 8.7: Estimation map of contamination using 50% of data, minimum KL distance at the top and maximum distance at the bottom
Figure 8.8: Estimation map of contamination using 60% of data, minimum KL distance at the top and maximum distance at the bottom
Figure 8.9: Estimation map of contamination using 70% of data, minimum KL distance at the top and maximum distance at the bottom
Figure 8.10: Estimation map of contamination using 80% of data, minimum KL distance at the top and maximum distance at the bottom
Figure 8.11: Estimation map of contamination using 90% of data, minimum KL distance at the top and maximum distance at the bottom

As can be observed, by adding more data the estimation map approaches the estimate from the global set. After adding roughly 50% of the data, the estimation map is very similar to the map based on 100% of the data. This means that if the sample locations had been selected correctly, we would have obtained the same estimate with only half of the current data.

8.4 KL distance between successive increments

In the previous section we compared each set of data with the global set, which contains 100% of the data. Since in real case studies we do not have access to the global set, in this section we use another approach: we compute the KL distance step by step to see how the estimate improves.

We start our analysis with 10% of the data and estimate the contamination map; to do this, we randomly choose a subset with 10% of the data. Then we try to add another 10% to obtain a subset with 20% of the data. For this purpose, we randomly generate 200 subsets with 10% of the data and add them to the current 10% subset, so that we have 200 subsets with 20% of the data. We then estimate the contamination map for all 200 subsets and calculate the KL distance between these estimates and the previous map, where we had used only 10% of the data. Among these 200 calculated KL distances we pick the maximum and the minimum. We continue this procedure for further increments of 10% until we reach 90% of the data. At each increment we calculate the KL distance from the previous step. In this way, at each step we can find the best samples for the next step. After adding these samples, we continue this procedure for all remaining steps until we reach the global set. Figure 8.12 shows the results of this analysis.

Figure 8.12: KL distance between successive increments

This graph shows that the values of the subsets are not equal. As can be observed from Figure 8.12, some subsets create a large distance between estimates. At one point we observe a maximum, and after that, although adding more data increases the information content, the KL distance from the next step becomes very small. It means that those samples have less information content compared to the previous ones. This graph demonstrates that there is an optimal number of samples for estimation: after that optimum point, increasing the number of samples produces no significant change in the estimate.

We can use this conclusion for finding the optimum number of samples and also for finding the best locations for samples.
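The sketch below illustrates one increment of this procedure on synthetic data. Since the kriging setup of the case study is not reproduced here, a simple Gaussian-process (simple kriging) conditional mean and covariance with an assumed exponential covariance model stands in for the kriged estimate, and the coordinates, values, subset sizes and number of random trials are all illustrative assumptions.

```python
import numpy as np

def kl_mvn(mu0, cov0, mu1, cov1):
    # KL divergence of eq. (8.4), repeated here so the sketch stands alone.
    k = len(mu0); inv1 = np.linalg.inv(cov1); d = mu1 - mu0
    _, ld0 = np.linalg.slogdet(cov0); _, ld1 = np.linalg.slogdet(cov1)
    return 0.5 * (np.trace(inv1 @ cov0) + d @ inv1 @ d - k - (ld0 - ld1))

def gauss_cov(a, b, sill=1.0, rang=400.0):
    # Assumed isotropic exponential covariance between point sets a and b.
    d = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1)
    return sill * np.exp(-3.0 * d / rang)

def conditional_field(xy_data, z_data, xy_grid, nugget=1e-6):
    # Simple-kriging (Gaussian) posterior mean and covariance on the grid,
    # used here as the multivariate-normal "estimate" entering eq. (8.4).
    C_dd = gauss_cov(xy_data, xy_data) + nugget * np.eye(len(xy_data))
    C_gd = gauss_cov(xy_grid, xy_data)
    mu = C_gd @ np.linalg.solve(C_dd, z_data)
    cov = (gauss_cov(xy_grid, xy_grid) - C_gd @ np.linalg.solve(C_dd, C_gd.T)
           + nugget * np.eye(len(xy_grid)))
    return mu, cov

def best_increment(xy_all, z_all, current_idx, n_add, n_trials, xy_grid, rng):
    # Section 8.4 procedure: among random candidate increments, keep the one
    # whose updated estimate is farthest (in KL distance) from the current one.
    mu0, cov0 = conditional_field(xy_all[current_idx], z_all[current_idx], xy_grid)
    pool = np.setdiff1d(np.arange(len(z_all)), current_idx)
    best, best_kl = None, -np.inf
    for _ in range(n_trials):
        cand = rng.choice(pool, size=n_add, replace=False)
        idx = np.concatenate([current_idx, cand])
        mu1, cov1 = conditional_field(xy_all[idx], z_all[idx], xy_grid)
        kl = kl_mvn(mu0, cov0, mu1, cov1)
        if kl > best_kl:
            best, best_kl = cand, kl
    return best, best_kl

# Synthetic measurements and a coarse prediction grid (coordinates in feet).
rng = np.random.default_rng(1)
xy_all = rng.uniform(0, 2000, size=(60, 2))
z_all = rng.normal(0, 1, size=60)                    # synthetic log-concentrations
xy_grid = np.array([[x, y] for x in np.linspace(0, 2000, 6)
                           for y in np.linspace(0, 1200, 4)])
current = rng.choice(60, size=6, replace=False)      # the initial 10% subset
cand, kl = best_increment(xy_all, z_all, current, 6, 50, xy_grid, rng)
print("best increment:", cand, "KL from previous step:", kl)
```

The figures that follow show the corresponding estimation maps and error-variance maps obtained at each increment.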
Figure 8.13: Estimation map based on 10% of kriging data, minimum KL distance on the left and maximum KL distance on the right
Figure 8.14: Error variance, 10% of kriging data
Figure 8.15: Estimation map based on 20% of kriging data, minimum KL distance on the left and maximum KL distance on the right
Figure 8.16: Error variance, 20% of kriging data
Figure 8.17: Estimation map based on 30% of kriging data, minimum KL distance on the left and maximum KL distance on the right
Figure 8.18: Error variance, 30% of kriging data
Figure 8.19: Estimation map based on 40% of kriging data, minimum KL distance on the left and maximum KL distance on the right
Figure 8.20: Error variance, 40% of kriging data
Figure 8.21: Estimation map based on 50% of kriging data, minimum KL distance on the left and maximum KL distance on the right
Figure 8.22: Error variance, 50% of kriging data
Figure 8.23: Estimation map based on 60% of kriging data, minimum KL distance on the left and maximum KL distance on the right
Figure 8.24: Error variance, 60% of kriging data
Figure 8.25: Estimation map based on 70% of kriging data, minimum KL distance on the left and maximum KL distance on the right
Figure 8.26: Error variance, 70% of kriging data
Figure 8.27: Estimation map based on 80% of kriging data, minimum KL distance on the left and maximum KL distance on the right
Figure 8.28: Error variance, 80% of kriging data
Figure 8.29: Estimation map based on 90% of kriging data, minimum KL distance on the left and maximum KL distance on the right
Figure 8.30: Error variance, 90% of kriging data

8.5 Using kriging estimates as data, compared to the global set

The next step is to use the kriging estimates at different points as data. The reason for this approach is that in real cases we do not have access to the global set and must make decisions based on a limited number of available data. In this section, after finding the sample locations, we use the global set for comparison and validation.

First of all we randomly remove half of the measurements and continue with the remaining 50% of the data. We perform kriging with this 50% of the data and find the estimation map. Then we use the estimation map as a source of data. Similar to the previous sections, we start our analysis with 10% of the estimated data from kriging and again try to estimate the contamination map. To do this, we randomly choose 200 subsets with 10% of the estimated data and calculate the KL distance between each of these subsets and the global set. Then we try to add another 10% to obtain a subset with 20% of the estimated data. For this purpose we randomly generate 200 subsets with 10% of the estimated data and add them to the previous 10%. Once we have 200 subsets with 20% of the data, we estimate the contamination map for all 200 subsets. Then we calculate the KL distance between these estimates and the global set, and among these 200 calculated KL distances we pick the maximum and the minimum. We continue this procedure for further increments of 10% until we reach 90% of the estimated data. At each increment we calculate the KL distance from the global set. In this way, at each step we can find the best samples. Figure 8.31 shows the result of this analysis.
Figure 8.31: KL distance between kriging data subsets and the global set

As can be observed in Figure 8.31, when we add more kriging data to the analysis the information content increases. Figure 8.31 also shows that the first added subsets can create a large distance between the estimate and the global set. After some point, although adding more data increases the information content, the KL distance from the global set becomes very small; it means that those new samples have less information content compared to the previous ones. In other words, by using this approach we can choose the more informative samples earlier.

Figure 8.32 shows the estimation map based on a randomly selected 50% of the data. The next figures show the maximum and minimum KL distance increments from the global set.

Figure 8.32: Estimation map of contamination, 50% random data

It can be observed that after a couple of steps our estimate has captured the main features of the actual contamination map. These graphs help us reach an important conclusion: the results demonstrate that the KL distance approach can be considered a strong tool for finding the best locations for samples. So far we have shown that, given the global set, we can select the best samples among all the samples, i.e., those which provide the most information. Moreover, we analyzed the situation in which we have no access to the global set. In this case, using the kriging estimate helped us find the best locations for new samples. The results show that if we remove half of the data and use the estimation map based on the remaining data, we can still find the best locations for samples, and we can capture the main features of the global-set estimate with almost half of the data.

The following graphs show the results of this analysis. The left columns are the minimum KL distance and the right columns are the maximum KL distance. Each row indicates a 10% increment increase.
Figure 8.33: Estimation map of contamination based on 10% of kriging data, minimum KL distance at the top and maximum KL distance at the bottom
Figure 8.34: Estimation map of contamination based on 20% of kriging data, minimum KL distance at the top and maximum KL distance at the bottom
Figure 8.35: Estimation map of contamination based on 30% of kriging data, minimum KL distance at the top and maximum KL distance at the bottom
Figure 8.36: Estimation map of contamination based on 40% of kriging data, minimum KL distance at the top and maximum KL distance at the bottom
Figure 8.37: Estimation map of contamination based on 50% of kriging data, minimum KL distance at the top and maximum KL distance at the bottom
Figure 8.38: Estimation map of contamination based on 60% of kriging data, minimum KL distance at the top and maximum KL distance at the bottom
Figure 8.39: Estimation map of contamination based on 70% of kriging data, minimum KL distance at the top and maximum KL distance at the bottom
Figure 8.40: Estimation map of contamination based on 80% of kriging data, minimum KL distance at the top and maximum KL distance at the bottom
Figure 8.41: Estimation map of contamination based on 90% of kriging data, minimum KL distance at the top and maximum KL distance at the bottom

8.6 Using kriging estimates as data, compared to the next step

In this section we take one more step forward in using the KL distance concept for locating new samples. First, we again remove half of the data and work with 50% of the available data. The difference is that in this section we no longer use the global set for comparison; instead we use the next step's estimate for calculating the KL distance.

First of all we randomly remove half of the measurements and continue with the remaining 50% of the data. We perform kriging with this 50% of the data and find the estimation map. Then we use the estimation map as a source of data. We start our analysis with 10% of the estimated data from kriging and again try to estimate the contamination map. To do this, we randomly choose a subset with 10% of the estimated data and then calculate the KL distance between this subset and the next step. We randomly generate 200 subsets with 10% of the data and add them to the previous 10%, which in total becomes 20%. Then we find the subsets which give the maximum and minimum KL distance between this 20% and the previous step. In this way we can find the best subset with 20% of the estimated data. Then we try to add another 10% to obtain a subset with 30% of the estimated data. Again, for this purpose we randomly generate 200 subsets with 10% of the estimated data and add them to the previous 20%. Once we have 200 subsets with 30% of the data, we estimate the contamination map for all 200 subsets. Then we calculate the KL distance between these estimates and the previous step. Among these 200 calculated KL distances we pick the maximum and the minimum.

We continue this procedure for further increments of 10% until we reach 90% of the estimated data. At each increment we calculate the KL distance from the previous step. In this way, at each step we can find the best samples. Figure 8.42 shows the result of this analysis.

Figure 8.42 shows that at the first step we can select the most informative subset. As can be observed, the first selected subsets have the largest KL distance from the previous step. As we select more subsets the KL distance decreases. It can be clearly seen that the last subsets make only a small contribution to improving the estimate.

Figure 8.42: Estimation map, 50% random data

This approach indicates that to pick the best locations for samples, we should select the subsets which have the larger KL distance from the previous step.
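Continuing the sketch from Section 8.4 (it reuses the conditional_field and best_increment helpers and the synthetic variables defined there), the loop below mimics this staged procedure: the kriged mean on the grid, obtained from a retained 50% of the measurements, serves as the only data pool, and each stage keeps the candidate increment with the largest KL distance from the previous stage. Subset sizes and the number of trials are again illustrative assumptions.

```python
# Stage 0: krige from the 50% of measurements kept (indices "kept"), and use the
# kriged mean at the grid nodes as the only "data" available for design.
kept = rng.choice(60, size=30, replace=False)
mu_map, _ = conditional_field(xy_all[kept], z_all[kept], xy_grid)
xy_pool, z_pool = xy_grid, mu_map            # candidate locations carry kriged values

# Greedy staged design: each stage adds the increment with the largest
# KL distance from the previous stage's estimate (Section 8.6).
current = rng.choice(len(z_pool), size=2, replace=False)
for stage in range(4):
    cand, kl = best_increment(xy_pool, z_pool, current, 2, 50, xy_grid, rng)
    current = np.concatenate([current, cand])
    print(f"stage {stage + 1}: added {cand}, KL from previous step = {kl:.2f}")
```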
The following graphs show the estimation map and error variance map for the selected subsets in this approach.

Figure 8.43: Estimation map of contamination based on 10% of kriging data, minimum KL distance on the left and maximum KL distance on the right
Figure 8.44: Error variance, 10% of kriging data
Figure 8.45: Estimation map of contamination based on 20% of kriging data, minimum KL distance on the left and maximum KL distance on the right
Figure 8.46: Error variance, 20% of kriging data
Figure 8.47: Estimation map of contamination based on 30% of kriging data, minimum KL distance on the left and maximum KL distance on the right
Figure 8.48: Error variance, 30% of kriging data
Figure 8.49: Estimation map of contamination based on 40% of kriging data, minimum KL distance on the left and maximum KL distance on the right
Figure 8.50: Error variance, 40% of kriging data
Figure 8.51: Estimation map of contamination based on 50% of kriging data, minimum KL distance on the left and maximum KL distance on the right
Figure 8.52: Error variance, 50% of kriging data
Figure 8.53: Estimation map of contamination based on 60% of kriging data, minimum KL distance on the left and maximum KL distance on the right
Figure 8.54: Error variance, 60% of kriging data
Figure 8.55: Estimation map of contamination based on 70% of kriging data, minimum KL distance on the left and maximum KL distance on the right
Figure 8.56: Error variance, 70% of kriging data
Figure 8.57: Estimation map of contamination based on 80% of kriging data, minimum KL distance on the left and maximum KL distance on the right
Figure 8.58: Error variance, 80% of kriging data
Figure 8.59: Estimation map of contamination based on 90% of kriging data, minimum KL distance on the left and maximum KL distance on the right
Figure 8.60: Error variance, 90% of kriging data

These figures clearly indicate that after adding 50% of the data we have captured most of the features of the contamination map. This fact shows the importance of the location of the samples in soil remediation problems. In this example, by using the KL divergence method we could obtain the same estimate as that of the global set using only half of the available data. In general, we can apply this approach to any soil remediation case: we can use the available data to locate the next required samples.

8.6.1 Example of a design

By using the above-mentioned procedure, we can find the best sample locations.
The following graphs demonstrate an example of a design found by this method:

Figure 8.61: Sample design for 10% of data
Figure 8.62: Sample design for 20% of data
Figure 8.63: Sample design for 30% of data
Figure 8.64: Sample design for 40% of data
Figure 8.65: Sample design for 50% of data
Figure 8.66: Sample design for 60% of data
Figure 8.67: Sample design for 70% of data
Figure 8.68: Sample design for 80% of data
Figure 8.69: Sample design for 90% of data
Figure 8.70: Sample design for 100% of data

8.7 Assigning real data to the sample design

Now that we have found a design for the sample locations, we try to validate this design using real data. Since we have access to a real case study, we apply this approach to the available data to see whether this design can help us in the analysis of the value of information. In this section, we first assign the real data to the nearest grid points of the sample design, and then, based on the priority of the data locations, we add 10% subsets at each step.
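A minimal sketch of this assignment step is given below. The coordinates are synthetic, and the choice to average the measurements that share a design node is an assumption, since the text does not specify how such collisions are handled.

```python
import numpy as np

def assign_to_design(xy_design, xy_real):
    # For each real measurement, find the index of the nearest design grid node.
    d = np.linalg.norm(xy_real[:, None, :] - xy_design[None, :, :], axis=-1)
    return np.argmin(d, axis=1)

# Illustrative call with synthetic coordinates (feet).
rng = np.random.default_rng(3)
xy_design = rng.uniform(0, 2000, size=(20, 2))     # designed sample locations
xy_real = rng.uniform(0, 2000, size=(200, 2))      # real measurement locations
z_real = rng.normal(0.0, 1.0, size=200)            # measured log-concentrations
node = assign_to_design(xy_design, xy_real)
# Value attached to each design node: mean of the measurements assigned to it.
z_node = np.array([z_real[node == j].mean() if np.any(node == j) else np.nan
                   for j in range(len(xy_design))])
print(z_node[:5])
```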
The following graphs show the results of the kriging estimation when we add 10% of the data at each step.

Figure 8.71: Estimation map of contamination based on 10% of real data
Figure 8.72: Estimation map of contamination based on 20% of real data
Figure 8.73: Estimation map of contamination based on 30% of real data
Figure 8.74: Estimation map of contamination based on 40% of real data
Figure 8.75: Estimation map of contamination based on 50% of real data
Figure 8.76: Estimation map of contamination based on 60% of real data
Figure 8.77: Estimation map of contamination based on 70% of real data
Figure 8.78: Estimation map of contamination based on 80% of real data
Figure 8.79: Estimation map of contamination based on 90% of real data
Figure 8.80: Estimation map of contamination based on 100% of real data

Figure 8.81: KL distance from global set

Figure 8.81 shows the KL distance of each subset from the global set. As we increase the amount of data, the information content of the estimate increases and the distance from the global set decreases.

Figure 8.82: Successive KL distance

Figure 8.82 shows the successive distances between subsets. At each step we compute the distance from the previous step. In this analysis we can find, at each step, the best sample locations, i.e., those with the largest KL distance from the previous step. This method can be applied to any similar problem to find the most efficient sample design.

Chapter 9
Summary and Conclusion

The motivation of this research was to propose a new approach to geostatistics for soil remediation problems. We tried to estimate a probability distribution function (pdf) at any unsampled location based on data given at neighboring locations and, finally, to identify the best locations for the next samples. In this research, as a case study, we worked on pollution data from an oil refinery. Our initial analysis was performed with ordinary kriging. Considering the shortcomings and limitations of kriging, we looked for a more flexible method to meet our demands.
Therefore we chose Bayesian maximum entropy (BME) as an alternative method, which has several advantages over kriging. The main advantage of BME is its capability to incorporate different types of data in the estimation. BME can utilize soft data such as interval data and probabilistic data. By incorporating soft data, we can include several valuable sources of information in the estimation, such as expert knowledge and previous experience. The output of BME is generally a non-Gaussian posterior pdf. Once we find the posterior, we can derive various indicators such as the mean, the mode, confidence intervals, etc.

After the estimation analysis, we need to make a decision about remediation. In the process of decision making associated with soil remediation, uncertainty should be considered in the analysis. In the process of remediation the ultimate goal is to find and remove the contaminated area where the concentration is higher than the threshold. Therefore estimating the concentration of contamination at unsampled locations is not enough by itself, and we need to analyze the probability of exceeding the threshold, which is much more important than the accuracy of the estimates. Instead of assuming a Gaussian distribution at each point and trying to estimate a value using the mean and variance of a normal distribution, we build the conditional cumulative distribution function at each point for different cutoff values. This approach, which is called indicator kriging, helps us find the probability of exceeding the threshold.

To make better decisions and improve the estimation, we need to define an efficient procedure for sampling design. We need to identify the best locations for future samples, i.e., those which provide the most information. Our final goal in this research was to develop a comparison method for predicted distributions (pdfs) in order to evaluate the value of information. So far we have discussed several estimation methods that produce a pdf as output. The purpose of comparing these distributions is to quantify the information carried by the data.

Since in soil remediation problems we are always dealing with uncertainty and it is almost impossible to have access to perfect information, we can define value of information as a Bayesian analysis for making decisions in the presence of uncertainty [21]. To design a cost-effective remediation plan, the value of new additional information is computed with respect to the cost of making a wrong decision. Additional samples have value if they provide new information that can change the decision making process [36].

The Kullback-Leibler divergence, or relative entropy, is a measure in probability theory of the proximity of two probability distribution functions. The Kullback-Leibler divergence can also be considered as the difference between the amounts of information that two distributions carry [30]. In environmental studies, data on the concentration of contamination can be modeled with a lognormal distribution [46]. As a result, we can use the statistical properties of the normal distribution to characterize the soil contamination. Also, given the mean and the spatial correlation of the contamination between different points, we can assume that the kriging estimation map is an n-dimensional multivariate normal distribution. Assuming that at each step of the estimation we have such a probability distribution function, the Kullback-Leibler divergence between two multivariate normal distributions can be computed. This distance demonstrates how new data change the distribution.
A larger distance means more new information. This equation is a powerful tool which can help us to find the value of additional samples: the larger the distance caused by a new sample, the more informative that sample is. As we increase the number of samples, our predictions become more reliable and less uncertain, but once the number of samples exceeds a certain limit, new samples no longer provide additional information. We try to find this limit by using the Kullback-Leibler divergence method.

First, we use real data to validate our method. The results show that as we add more data, the Kullback-Leibler distance from the global set decreases and finally approaches zero. In this way, we can identify the samples that carry more information than others. The results show that we could estimate the same contamination map with only half of the available data. In other words, if we select samples carefully, half of the data would be sufficient for the estimation.

In real case studies, however, we do not have access to a global set for comparison. To overcome this challenge, we use the kriging output as our global set. To do this, we first pick a limited number of available samples and perform kriging. The output of kriging is a contamination map from which we can sample. Taking this kriging map as the global set, we can apply the previous Kullback-Leibler divergence method to identify the best locations. After finding the best locations, if we assign the actual data to those locations, we observe that after incorporating only 50% of the data our estimation map converges to the ultimate map. A minimal sketch of one possible implementation of this two-stage procedure is given below.
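The following sketch is illustrative only, one possible reading of the procedure described above rather than the implementation used in this work. The multivariate normal assumption of the previous paragraphs is retained; kl_gaussian is the closed-form Gaussian KL divergence, while krige_map, krige_mean_cov, measured_ids, candidate_ids, and n_new_samples are hypothetical names.

import numpy as np

def kl_gaussian(mu0, S0, mu1, S1):
    # Closed-form KL divergence D(N0 || N1) between two multivariate normals
    # (same helper as in the sketch following Figure 8.82).
    n = len(mu0)
    S1_inv = np.linalg.inv(S1)
    d = mu1 - mu0
    _, logdet0 = np.linalg.slogdet(S0)
    _, logdet1 = np.linalg.slogdet(S1)
    return 0.5 * (np.trace(S1_inv @ S0) + d @ S1_inv @ d - n + logdet1 - logdet0)

# Stage 1: a surrogate "global" map kriged from the limited measured samples.
surrogate_map = krige_map(measured_ids)               # hypothetical helper

# Stage 2: greedy selection of the next sampling locations. At each step the
# candidate whose (surrogate) value changes the estimate the most, i.e. gives
# the largest KL distance from the previous step, is added.
# krige_mean_cov(ids, surrogate_map) is a hypothetical helper returning the
# kriging mean vector and covariance matrix on the grid, reading the data
# values for the given ids off the surrogate map where no measurement exists.
selected = list(measured_ids)
for _ in range(n_new_samples):
    mu_prev, S_prev = krige_mean_cov(selected, surrogate_map)
    gains = {c: kl_gaussian(*krige_mean_cov(selected + [c], surrogate_map),
                            mu_prev, S_prev)
             for c in candidate_ids if c not in selected}
    selected.append(max(gains, key=gains.get))        # most informative next site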
In summary, the Kullback-Leibler divergence method is a powerful way to find an efficient sampling design based on a limited number of data. This method identifies the locations with the greatest information content.

References

[1] United States Environmental Protection Agency. Retrieved from http://www.epa.gov/brownfields/.
[2] Spatial Analysis and Decision Assistance (SADA). Geostatistical-based decision assistance software, University of Tennessee. http://www.tiem.utk.edu/sada/.
[3] Pär-Erik Back. A model for estimating the value of sampling programs and the optimal number of samples for contaminated soil. Environmental Geology, 52(3):573–585, 2007.
[4] Pär-Erik Back, Tommy Norberg, and Lars Rosén. On the Cost-Effectiveness of Sampling Contaminated Soil: Comparison of Two Models. Citeseer, 2007.
[5] R. Bosman. Sampling strategies and the role of geostatistics in the investigation of soil contamination. In Contaminated Soil '93, volume 2, pages 587–597. Springer Netherlands, 1993.
[6] TM Burgess and R Webster. Optimal interpolation and isarithmic mapping of soil properties. Journal of Soil Science, 31(2):333–341, 1980.
[7] TM Burgess, R Webster, and AB McBratney. Optimal interpolation and isarithmic mapping of soil properties. IV. Sampling strategy. Journal of Soil Science, 32(4):643–659, 1981.
[8] G Christakos. Modern Spatiotemporal Geostatistics. Oxford University Press, New York, 2000.
[9] George Christakos. A Bayesian/maximum-entropy view to the spatial estimation problem. Mathematical Geology, 22(7):763–777, 1990.
[10] George Christakos. Random Field Models in Earth Sciences. Courier Dover Publications, 1992.
[11] George Christakos and Vijayanivas R Raghu. Dynamic stochastic estimation of physical variables. Mathematical Geology, 28(3):341–365, 1996.
[12] Clayton V. Deutsch and André G. Journel. GSLIB: Geostatistical Software Library and User's Guide. Oxford University Press, 1992.
[13] D. D'Or. Towards a real-time multi-phase sampling strategy optimization. In Geostatistics for Environmental Applications, pages 355–366. Springer, 2005.
[14] D. D'Or and P. Bogaert. Spatial prediction of categorical variables with the Bayesian maximum entropy approach: the Ooypolder case study. European Journal of Soil Science, 55(4):763–775, 2004.
[15] Dimitri D'Or. Spatial prediction of soil properties, the Bayesian maximum entropy approach. Université Catholique de Louvain, 2003.
[16] Barrett P Eynon and Paul Switzer. The variability of rainfall acidity. Canadian Journal of Statistics, 11(1):11–23, 1983.
[17] GT Flatman and EJ Englund. Asymmetric loss function for Superfund remediation decisions. Technical report, Environmental Protection Agency, Las Vegas, NV (United States), Environmental Monitoring Systems Lab., 1992.
[18] R Allan Freeze, Bruce James, Joel Massmann, Tony Sperling, and Leslie Smith. Hydrogeological decision analysis: 4. The concept of data worth and its use in the development of site investigation strategies. Ground Water, 30(4):574–588, 1992.
[19] Pierre Goovaerts. Geostatistics for Natural Resources Evaluation. Oxford University Press, 1997.
[20] John Haslett and Adrian E Raftery. Space-time modelling with long-memory dependence: Assessing Ireland's wind power resource. Applied Statistics, pages 1–50, 1989.
[21] Bruce R James and Steven M Gorelick. When enough is enough: The worth of monitoring data in aquifer remediation design. Water Resources Research, 30(12):3499–3513, 1994.
[22] Edwin T Jaynes. On the rationale of maximum-entropy methods. Proceedings of the IEEE, 70(9):939–952, 1982.
[23] Edwin T Jaynes. E. T. Jaynes: Papers on Probability, Statistics, and Statistical Physics, volume 50. Springer, 1989.
[24] AG Journel. Nonparametric estimation of spatial distributions. Journal of the International Association for Mathematical Geology, 15(3):445–468, 1983.
[25] André G Journel and Charles J Huijbregts. Mining Geostatistics. Academic Press, London, 1978.
[26] Kai-Wei Juang and Dar-Yuan Lee. Simple indicator kriging for estimating the probability of incorrectly delineating hazardous areas in a contaminated site. Environmental Science & Technology, 32(17):2487–2493, 1998.
[27] George Judge and Douglas Miller. Maximum Entropy Econometrics: Robust Estimation with Limited Data. John Wiley, 1996.
[28] Paul R Kleindorfer, Howard G Kunreuther, and Paul JH Schoemaker. Decision Sciences: An Integrative Perspective. Cambridge University Press, 1993.
[29] A Kolovos, HL Yu, and G Christakos. SEKS-GUI v. 0.6 user manual. Department of Geography, San Diego State University, San Diego, CA, 2006.
[30] Solomon Kullback. Information Theory and Statistics. Courier Dover Publications, 1968.
[31] Solomon Kullback and Richard A Leibler. On information and sufficiency. The Annals of Mathematical Statistics, 22(1):79–86, 1951.
[32] Phaedon C Kyriakidis and André G Journel. Geostatistical space-time models: a review. Mathematical Geology, 31(6):651–684, 1999.
[33] Joel Massmann and R Allan Freeze. Groundwater contamination from waste management sites: The interaction between risk-based engineering design and regulatory policy: 1. Methodology. Water Resources Research, 23(2):351–367, 1987.
[34] Georges Matheron. The Theory of Regionalized Variables and Its Applications, volume 5. École Nationale Supérieure des Mines, 1971.
[35] Donald E Myers. Matrix formulation of co-kriging. Journal of the International Association for Mathematical Geology, 14(3):249–257, 1982.
[36] Tommy Norberg and Lars Rosén. Calculating the optimal number of contaminant samples by means of data worth analysis. Environmetrics, 17(7):705–719, 2006.
[37] United States Department of Labor. Retrieved from http://www.osha.gov/sltc/brownfields/.
[38] Margaret A Oliver and R Webster. Kriging: a method of interpolation for geographical information systems. International Journal of Geographical Information Systems, 4(3):313–332, 1990.
[39] A Papritz and H Flühler. Temporal change of spatially autocorrelated soil properties: optimal estimation by cokriging. Geoderma, 62(1):29–43, 1994.
[40] William D Penny. Kullback-Liebler divergences of normal, gamma, Dirichlet and Wishart densities. Wellcome Department of Cognitive Neurology, 2001.
[41] Ignacio Rodríguez-Iturbe and José M Mejía. The design of rainfall networks in time and space. Water Resources Research, 10(4):713–728, 1974.
[42] Roland W Scholz and Ute Schnabel. Decision making under uncertainty in case of soil remediation. Journal of Environmental Management, 80(2):132–147, 2006.
[43] Marc Laurent Serre. Environmental spatiotemporal mapping and ground water flow modelling using the BME and ST methods. PhD thesis, 1999.
[44] Claude Elwood Shannon and Warren Weaver. A mathematical theory of communication, 1948.
[45] Jonathon Shlens. Notes on Kullback-Leibler divergence and likelihood theory. Systems Neurobiology Laboratory, Salk Institute for Biological Studies, California, 2007.
[46] Ashok K Singh, Anita Singh, and Max Engelhardt. Technology Support Center Issue: Lognormal Distribution in Environmental Applications. US Environmental Protection Agency, National Exposure Research Laboratory, 1997.
[47] Greg A Stenback et al. Geostatistical sample location selection in expedited hazardous waste site characterization. ASCE, 2000.
[48] Robert N Stewart, S Tom Purucker, and GE Powers. SADA: a freeware decision support tool integrating GIS, sample design, spatial modeling, and risk assessment. In Proceedings of the International Symposium on Environmental Software Systems, Prague, Czech Republic, 2007.
[49] Marc Van Meirvenne and P Goovaerts. Evaluating the probability of exceeding a site-specific soil cadmium contamination threshold. Geoderma, 102(1):75–100, 2001.
[50] Hwa-Lung Yu, Alexander Kolovos, George Christakos, Jiu-Chiuan Chen, Steve Warmerdam, and Boris Dev. Interactive spatiotemporal modelling of health systems: the SEKS-GUI framework. Stochastic Environmental Research and Risk Assessment, 21(5):555–572, 2007.
Abstract
A main challenge in decision making on soil remediation is risk management through the acquisition of knowledge using more informative models and data. To address this problem, the value of information should be defined and computed. To improve the quality of the prediction of contamination, we need more data or samples. When we add new samples to our model, we obtain a new probability distribution function for the quantity of interest. We then need to compare the old (prior) and new (posterior) predictions, quantify the difference and improvement from one prediction to the next, and measure the information content of the predictions. Also, to find the best design for the location of new samples, we should find the samples that provide the maximum information compared to other samples. Samples with maximum information constitute the optimal design for the location of new data. The optimal design minimizes the cost of sampling because it provides the most information with the minimum number of samples.

In this research we try to identify the best locations for new samples. To address this problem, the Kullback-Leibler divergence, a measure from probability theory, is used. The Kullback-Leibler divergence represents the difference between two probability distribution functions. Under the assumption of a multivariate normal distribution for the log-transformed concentration data, the Kullback-Leibler divergence can be considered as a tool to measure the information content of new samples. Computing the information content of new samples helps us to optimize their locations.