Efficient Stochastic Simulations of Hydrogeological Systems: From Model Complexity to Data Assimilation

By: Mahsa Moslehi

A Dissertation Presented to the
FACULTY OF THE USC GRADUATE SCHOOL
UNIVERSITY OF SOUTHERN CALIFORNIA
In Partial Fulfillment of the Requirements for the Degree
DOCTOR OF PHILOSOPHY
(Civil Engineering)

Advisor: Prof. Felipe P.J. de Barros

December 2018

Abstract

Hydrogeological models, which represent flow and transport processes in subsurface domains, are subject to uncertainty due to the presence of spatially heterogeneous, multi-scale hydrogeological properties, such as hydraulic conductivity, and the insufficient characterization of such domains. As a result, dependent variables such as the velocity field and the contaminant concentration, and consequently hydrogeological predictions relevant to human health risks, are prone to uncertainty. Stochastic numerical methods are therefore utilized to capture the uncertainty of hydrogeological systems and model predictions. However, such methods are computationally expensive, and efficient approaches are required to relieve the computational burden. In this study, the various components that influence the computational complexity of hydrogeological models are discussed. Then, by considering the interaction between these components, novel frameworks are developed to reduce the computational cost associated with stochastic numerical hydrogeological systems. In particular, a method is first proposed to optimally allocate computational resources in numerical hydrogeological models under uncertainty. Next, the numerical complexity of hydrogeological models is reduced substantially by employing a wavelet-transformation upscaling technique.
Finally, a unified framework, which combines the ensemble Kalman filter data assimilation technique with the wavelet-transformation upscaling method, is developed to simultaneously reduce the computational costs associated with stochastic hydrogeological systems and the uncertainty of hydrogeological model predictions. Results indicate that the proposed frameworks lead to enormous computational cost reductions, by several orders of magnitude, while maintaining the accuracy of model predictions. The outcome of this research provides reliable solutions that can aid decision makers in efficiently modeling complex hydrogeological systems and in achieving accurate predictions relevant to aquifer remediation and human health risks.

Dedicated to my mother, my father, and my sister.

Acknowledgments

I am grateful to many significant individuals for their help and contributions throughout my PhD study. First and foremost, I would like to thank my advisor, Professor Felipe de Barros, for his inspiration, motivation, and support throughout my graduate study. His guidance helped me learn profound concepts and tackle challenging problems. His breadth of knowledge and excitement for new discoveries encouraged me to work on multidisciplinary research problems. I enjoyed all the meetings we had together, since he always treated me very kindly and shared his intelligent insights about research challenges. He has been very caring and approachable in clarifying doubts, and willing to assist me in overcoming whatever hurdles I came across during my doctoral research. I simply could not have had a better and more concerned advisor. I would like to thank Professor Muhammad Sahimi, who helped me develop new research ideas and conduct exceptional research in different fields of science. I am also grateful to Professor Kelly Sanders, Fatemeh Ebrahimi and Ram Rajagopal for their constructive inputs in my research.
Moreover, I would like to acknowledge Chris Henri for providing computational software that increased the pace of my progress in my PhD research. My sincere thanks go to my friends and PhD colleagues for their support, which kept me going in the past few years. Last but not least, I want to express my deepest love and gratitude to my dear parents, Monireh and Yousef, my sister, Maryam, and her husband, Hesam. I am where I am because of their unconditional love and support.

Contents

Abstract
Dedication
Acknowledgments
List of Tables
List of Figures
1 Introduction
  1.1 Motivation
  1.2 Research Overview and Scope
2 Uncertainty Quantification of Environmental Performance Metrics in Heterogeneous Aquifers with Long Range Correlations
  2.1 Introduction
  2.2 Theoretical Background
    2.2.1 Physical Formulation
    2.2.2 Random Space Function Model for the Hydraulic Conductivity
    2.2.3 Environmental Performance Metrics
  2.3 Numerical Implementation
  2.4 Results and Discussion
    2.4.1 Stochastic Analysis of EPMs
    2.4.2 Impact of the Long Range Correlation K Structure on EPMs' Uncertainty
  2.5 Summary
3 Optimal Allocation of Computational Resources in Hydrogeological Models under Uncertainty
  3.1 Introduction
  3.2 Formulation
  3.3 Methodology
    3.3.1 Overall Error Estimation of the Model Response
    3.3.2 Description of the Optimization Process
  3.4 Evaluating the Performance of the Method
  3.5 Simplistic Illustration
  3.6 Heterogeneous Groundwater Flow and Transport Problem
    3.6.1 Physical Formulation, Model Setup and Implementation
    3.6.2 Computational Results and Method Performance
    3.6.3 Plume-Scale and Ergodicity
  3.7 Summary
4 Upscaling of Solute Transport in Disordered Porous Media by Wavelet Transformations
  4.1 Introduction
  4.2 Problem Statement
  4.3 Methodology
    4.3.1 Wavelet Transformations
    4.3.2 Upscaling
    4.3.3 The Wavelets
  4.4 The Spatial Distribution of the Permeability and Numerical Simulation
  4.5 Results and Discussion
  4.6 Efficiency of the Calculation
  4.7 Summary
5 Performance Evaluation of EnKF-based Hydrogeological Site Characterization using Color Coherence Vectors
  5.1 Introduction
  5.2 Background
    5.2.1 Ensemble Kalman Filter (EnKF)
    5.2.2 Color Coherence Vector (CCV)
  5.3 Methodology
  5.4 Illustration
    5.4.1 Problem Setup and Data Assimilation
    5.4.2 Performance Assessment
  5.5 Summary
6 Efficient Characterizations of Stochastic Hydrogeological Systems using Ensemble Kalman Filter Coupled with Wavelet Transformations Upscaling
  6.1 Introduction
  6.2 Methodology
  6.3 Illustrative Example
  6.4 Summary
7 Summary
Reference List

List of Tables

2.1 Geometrical and geostatistical parameters of the geological formation
2.2 Flow and transport parameters
2.3 Entropy and variance of the EPMs computed from the MC simulations
3.1 Parameters for the ODE example
3.2 Comparison between the actual and estimated error for the ODE example
3.3 Parameters of the flow and transport in heterogeneous field example
3.4 Comparison between the actual and the estimated error in the heterogeneous flow and transport problem
4.1 Summary of the parameters used in the simulations. R is the measure of the reduction in the number of grid blocks, defined by R = (n_f − n_c)/n_f × 100%, and x_CP is the location of the control plane where the concentration breakthrough curves are evaluated
5.1 Summary of the discretized color values with their corresponding number of coherent and incoherent pixels
5.2 Summary of the parameters used in this study
5.3 Comparison of various criteria used to evaluate the performance of parameter estimations
6.1 Summary of the model parameters used in this study
6.2 Performance comparison of our proposed EnKF-WT framework with the regular EnKF methods in terms of computational costs and the accuracy of parameter estimations

List of Figures

1.1 Impacts of hydraulic conductivity's (K) spatial structures on the concentration (C) signal versus time (t) at an environmentally sensitive location. Planar views of heterogeneous and homogeneous K-fields are displayed in top-left and top-right, respectively. Results for heterogeneous and homogeneous K-fields are shown in bottom-left and bottom-right, respectively
2.1 Empirical C_p PDF and the lognormal PDF fitted to the C_p data obtained from MC simulations for H=0.3
2.2 Empirical C_p PDF and the lognormal PDF fitted to the C_p data obtained from MC simulations for H=0.3
2.3 Empirical T_p PDF and the lognormal PDF fitted to the T_p data obtained from MC simulations for H=0.3
2.4 Empirical T_p PDF and the lognormal PDF fitted to the T_p data obtained from MC simulations for H=0.3
2.5 Comparison between the CDFs of the T_e, T_p and T_l data achieved from the MC simulations for H=0.3
2.6 Comparison between the CDFs of the C_e, C_p and C_l data achieved from the MC simulations for H=0.3
2.7 Information entropy of C_p as a function of H, obtained from the MC simulations using Eq. (2.8) for the empirical entropy and Eq. (2.9) for the lognormal entropy
2.8 Mean value of the C_p data obtained from MC simulations together with the 95% CI of the mean value
2.9 Information entropy of T_p as a function of H, obtained from the MC simulations using Eq. (2.8) for the empirical entropy and Eq. (2.9) for the lognormal entropy
2.10 Mean value of the T_p data obtained from MC simulations together with the 95% CI of the mean value
2.11 Probability of C_p being larger than a critical value defined by C_c = αC_0, for α = 0.001, 0.005, 0.01, 0.05, 0.1
3.1 Schematic process of a typical stochastic numerical hydrogeological simulation: (a) different grid resolutions for the physical domain (here only spatial discretization); (b) for a specific physical resolution Δ_i, the MC framework is employed by sampling N_mc realizations from the pdf of the random field parameter Θ; (c) the hydrogeological model is solved numerically for each realization and the results are integrated to achieve the statistical characterization of the output corresponding to a specific Δ and N_mc value
3.2 (a) The total error (Eq. (3.14)) versus the number of MC realizations (statistical resolutions) for different spatial discretizations Δ; (b) evolution of the total error (Eq. (3.14)) versus spatial discretization for given numbers of MC realizations; (c) MC error (first term of the RHS of Eq. (3.14)) versus the number of MC realizations for different spatial resolutions; (d) discretization error (second term of the RHS of Eq. (3.14)) versus the spatial discretization for different numbers of MC realizations
3.3 Error and computational time for different combinations of N_mc and Δ. The optimal points were computed by solving Eq. (3.4) subject to the following computational budget (time): T_tot = [5, 10, 50, 100, 250] s. It can be observed that the empirical optimal points found by brute-force simulations and the optimal points achieved by solving Eq. (3.4) are almost identical
3.4 (a) The total error (Eq. (3.14)) versus the number of MC realizations (statistical resolutions) for different spatial discretizations Δ; (b) evolution of the total error (Eq. (3.14)) versus spatial discretization for given numbers of MC realizations; (c) MC error (first term of the RHS of Eq. (3.14)) versus the number of MC realizations for different spatial resolutions; (d) discretization error (second term of the RHS of Eq. (3.14)) versus the spatial discretization for different numbers of MC realizations
3.5 Error and computational time for different combinations of N_mc and Δ. The optimal points were computed by solving Eq. (3.4) subject to the following computational budget (time): T_tot = [10^5, 2×10^5, 3×10^5] s. It can be observed that the empirical optimal points found by brute-force simulations and the optimal points achieved by solving Eq. (3.4) are almost identical
3.6 Optimal points of the two scenarios with different source zone dimensions, depicted subject to the 3 different time constraints T_tot = [10^5, 2×10^5, 3×10^5] s. Blue square symbols correspond to L_2 = 9λ m, while red circle symbols are for L_2 = 2.5λ m. Solid lines correspond to the computational cost
4.1 Comparison of (a) the conductivity field in the high-resolution model of size L = 512 m with the Hurst exponent H = 0.2 and the order-of-magnitude variations of the K values, S = 3, and (b) the corresponding upscaled field
4.2 Solute plume snapshots for the high-resolution and coarse-scale conductivity fields. The concentration C(x,t) was computed in both high-resolution and coarse-scale conductivity fields for the parameters L = 512 m, H = 0.2 and S = 3, and for (a) the high-resolution field at PVI = 0.2; (b) the coarsened field at PVI = 0.2; (c) the high-resolution field at PVI = 0.8, and (d) the coarsened field at PVI = 0.8
4.3 The results for the first scenario (see Table 4.1): concentration breakthrough curves for the high-resolution fields with S = 3 and dimension L = 512 m and the corresponding coarsened fields. Results shown for (a) H = 0.2; (b) H = 0.4; (c) H = 0.6, and (d) H = 0.8
4.4 Goodness-of-fit analysis between the concentration profiles obtained from fine-scale and upscaled fields for the case with L = 512 m, H = 0.8 and S = 3. The straight line is represented by y = 0.99x + 1.7 with R^2 = 0.9997
4.5 The percent reduction, R, of the number of grid blocks in the upscaled fields relative to the grid representing the corresponding high-resolution fields, and its dependence on the Hurst coefficient H in the first scenario (see Table 4.1)
4.6 Results obtained for the second scenario (see Table 4.1): comparison between the concentration breakthrough curves for coarsened and fine-scale conductivity fields for H = 0.7 and grid size L = 512 m. Results reported for (a) S = 3; (b) S = 5, and (c) S = 7
4.7 Results for the third scenario (see Table 4.1): the breakthrough curves for the high-resolution conductivity fields with H = 0.7 and S = 3 and the corresponding coarsened fields with (a) L = 256 m; (b) L = 512 m and (c) L = 1024 m
4.8 Comparison of the second central spatial moment S_xx(t) of the concentration distribution in the high-resolution and upscaled conductivity fields.
The results are for (a) H = 0.2 and S = 3, and (b) H = 0.7 and S = 7 (see Table 4.1). The size of the numerical grid is L = 512 m and the porosity is equal to 0.54
5.1 (a) Reference K-field together with two K-fields, (b) and (c), with similar RMSE (approximately equal to 0.55) with respect to the reference K-field
5.2 A simplistic example of the CCV calculation steps: (a) color intensities of a grayscale image after blurring, (b) discretized colorspace with three distinct colors, (c) similarly-colored connected components and (d) pixels labeled as either coherent or incoherent
5.3 Measurement locations for K and h in (a) Scenario I and (b) Scenario II
5.4 (a) Reference Y-field, (b) estimated Y-field with Scenario I and (c) estimated Y-field with Scenario II
5.5 The concentration BTCs obtained from the reference K-field and the estimated K-fields in Scenarios I and II
6.1 Schematic representation of the computational domain and the measurement locations for both K and h
6.2 Estimated ensemble mean of the conductivity fields obtained from EnKF-WT (c) and from EnKF (b), together with the reference conductivity field (a)

Chapter 1

Introduction

1.1 Motivation

Hydrogeological models that represent flow and transport in subsurface formations are usually large-scale, with excessive computational complexity and uncertain characteristics. To better manage contaminated aquifers, computational models are required to simulate subsurface domains and predict multiple quantities of interest. Such quantities of interest, or model predictions, are regarded as environmental performance metrics (EPMs).
Examples of EPMs include estimates of adverse human health effects (i.e., increased lifetime cancer risk), the maximum concentration of a contaminant at an environmentally sensitive target, and solute travel times from source to receptor [1]. Each EPM provides important information for groundwater management, since decision makers need to meet multiple criteria to keep subsurface resources sustainable and protect human health.

There are multiple sources of uncertainty that affect hydrogeological model predictions. A major source of uncertainty stems from the incomplete characterization of heterogeneous geological formations. Hydrogeological properties, such as the hydraulic conductivity K, are spatially heterogeneous across multiple length scales. This variability has a significant influence on flow and transport behavior, as can be observed in Fig. 1.1 (e.g. [2, 3, 4]). In order to achieve reliable and accurate predictions relevant to contaminated-groundwater risk analysis and remediation strategies, the subsurface environment needs to be characterized in detail. However, due to the high costs of data acquisition, measurements of the hydrogeological properties are scarce, and therefore a detailed site characterization is not practical [5]. As a result of this insufficient information, model predictions regarding flow and solute transport are subject to uncertainty.

Therefore, efficient modeling techniques should be utilized to characterize model parameters (e.g. K) using the available sparse measurements. To tackle this challenge, inverse modeling and data assimilation techniques are customarily employed to infer hydrogeological properties from observed data [6, 7, 8]. Thus, stochastic methods are required to quantify the uncertainty of EPMs in terms of their moments and probability density functions (PDFs) or the corresponding cumulative distribution functions (CDFs).
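The Monte Carlo workflow behind such a characterization can be sketched with a toy model. The model function, sample size and input distribution below are illustrative assumptions only, not the dissertation's actual flow-and-transport setup:

```python
import numpy as np

rng = np.random.default_rng(0)

def toy_epm(mean_log_k):
    """Hypothetical model response (an EPM) for one realization.

    A stand-in for a full flow-and-transport solve: here the 'peak
    concentration' simply decreases with the sampled log-conductivity.
    """
    return np.exp(-0.5 * mean_log_k + 0.1 * rng.standard_normal())

# Monte Carlo loop: sample the uncertain input, evaluate the model for
# each realization, and summarize the EPM by moments and an empirical CDF.
n_mc = 5000
log_k = rng.normal(loc=0.0, scale=1.0, size=n_mc)   # uncertain parameter
epm = np.array([toy_epm(s) for s in log_k])

epm_mean, epm_var = epm.mean(), epm.var()
thresholds = np.linspace(epm.min(), epm.max(), 50)
ecdf = np.array([(epm <= t).mean() for t in thresholds])  # empirical CDF
```

The empirical CDF (or a PDF fitted to the sample) is what supports the risk statements discussed next; the accuracy of these estimates depends directly on the number of realizations, which is the computational bottleneck this dissertation targets.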
In particular, within the context of probabilistic risk analysis, computing the PDF of a target quantity is preferable, since it enables decision makers to quantify the likelihood of extreme events.

Furthermore, due to the presence of spatio-temporal variability of the input parameters, nonlinearities and complex boundary conditions, numerical methods are required to predict a model response (i.e. an EPM). Such conditions are normally encountered in hydrogeological applications such as flow and solute transport [9, 5], contaminated site management [10, 11], human health risk assessment [12, 13] and the response of ecosystems [14]. Thus, in many situations, fully analytical treatments of flow and transport cannot be utilized, and numerical methods are employed instead. Numerical simulation of flow and transport in large-scale porous media requires a computational grid with high enough resolution to represent the variability of the hydrogeological properties [15, 16]. Simulation with such high-resolution computational grids entails solving several million discretized equations over thousands of time steps, leading to a very high computational burden.

Figure 1.1: Impacts of hydraulic conductivity's (K) spatial structures on the concentration (C) signal versus time (t) at an environmentally sensitive location. Planar views of heterogeneous and homogeneous K-fields are displayed in top-left and top-right, respectively. Results for heterogeneous and homogeneous K-fields are shown in bottom-left and bottom-right, respectively.

As discussed in the preceding paragraphs, the combined effect of data scarcity and spatial variability of the hydraulic properties of the subsurface leads to several challenges. First, the heterogeneous hydrogeological system needs to be appropriately discretized in order to accurately capture spatial and temporal fluctuations of the model response.
In addition to the spatio-temporal discretization of the numerical schemes, a statistical resolution should be considered in order to quantify the uncertainty in model predictions. For this purpose, the Monte Carlo (MC) framework is commonly utilized. The MC framework is straightforward to implement and can provide an estimation of the model response, but its accuracy highly depends on the considered statistical resolution [17]. In the MC framework, the hydrogeological model needs to be simulated repeatedly to provide statistically meaningful results.

In summary, numerical simulation of flow and transport processes at the field scale might become prohibitively computationally expensive, particularly when such highly-resolved models are subject to uncertainty and must be cast within Monte Carlo sampling [5, 18]. As a result, a key aspect of hydrogeological simulation is how to distribute the limited computational resources in an efficient manner in order to reduce the simulation cost [19, 20, 21]. Furthermore, novel and efficient frameworks need to be developed to simultaneously reduce the physical and stochastic complexities of hydrogeological systems and decrease the associated computational costs, while preserving the accuracy of model predictions.

1.2 Research Overview and Scope

In this section, an overview of the research tasks described in this dissertation is provided. The objective of these research tasks is to develop methodologies that address the goals stated in Section 1.1. Our first step consists of providing a motivational example to illustrate the complexity and tediousness associated with stochastic hydrogeological simulations. We investigate how the structure of the heterogeneous hydraulic conductivity (K) impacts the prediction of environmental performance metrics (EPMs).
In this numerical analysis, the EPMs are quantities that are of relevance to risk assessment and remediation, such as the peak flux-averaged concentration and early and late arrival times, among others. By using stochastic simulations, we quantify the uncertainty associated with the EPMs for a given heterogeneous spatial structure of the K-field that displays long-range correlation, and identify the probability distribution function (PDF) model that best captures the statistics of the EPMs. Then, through the use of information theory, we reveal how the structural parameter characterizing persistent/anti-persistent correlation structures of the K-field influences the EPMs and the corresponding uncertainties.

Next, we show the interaction of the various elements involved in stochastic hydrogeological modeling and propose a framework to optimally allocate computational costs in such models. Uncertainty quantification for predicting subsurface flow and transport often entails utilizing a numerical Monte Carlo framework, which repeatedly simulates the model according to a random field parameter representing the hydrogeological characteristics of the aquifer. The physical resolution (e.g. spatial grid resolution) for the simulation is customarily chosen based on recommendations in the literature, independent of the number of Monte Carlo realizations. This practice may lead to either an excessive computational burden or inaccurate solutions.

We develop an optimization-based methodology that considers the trade-off between the following conflicting objectives: the time associated with computational costs, the statistical convergence of the model prediction and the physical errors corresponding to the numerical grid resolution. Computational resources are allocated by considering the overall error, based on a joint statistical-numerical analysis, and optimizing the error model subject to a given computational constraint.
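The flavor of this constrained trade-off can be illustrated with a small sketch. The error and cost models below (MC error decaying as 1/√N, discretization error growing with Δ, cost growing as the grid is refined) are generic assumptions for illustration, not the dissertation's derived expressions:

```python
import numpy as np

# Illustrative models (hypothetical coefficients):
#   total error  e(dx, n) = a / sqrt(n) + b * dx**p   (MC + discretization error)
#   cost         t(dx, n) = c * n / dx**3             (finer grid -> costlier run)
a, b, p, c = 1.0, 2.0, 1.5, 1e-4

def total_error(dx, n):
    return a / np.sqrt(n) + b * dx**p

def cost(dx, n):
    return c * n / dx**3

budget = 50.0  # hypothetical computational budget, in seconds

# Brute-force search over candidate (grid spacing, realization count)
# pairs for the combination that minimizes total error within budget.
dx_grid = np.linspace(0.05, 0.5, 40)
n_grid = np.arange(100, 20001, 100)
best = min(
    ((dx, n) for dx in dx_grid for n in n_grid if cost(dx, n) <= budget),
    key=lambda pair: total_error(*pair),
)
```

The point of the sketch is that neither refining the grid nor adding realizations alone is optimal: under a fixed budget, the two resolutions must be chosen jointly.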
The derived expression for the overall error explicitly takes into account the joint dependence between the discretization error of the physical space and the statistical error associated with the Monte Carlo realizations. The performance of the framework is tested against computationally extensive simulations of flow and transport in spatially heterogeneous aquifers. Results show that modelers can achieve optimum physical and statistical resolutions while keeping a minimum error for a given computational time. The physical and statistical resolutions obtained through our analysis yield lower computational costs when compared to the results obtained with prevalent recommendations in the literature. We then highlight the significance of the geometrical characteristics of the contaminant source zone for the optimum physical and statistical resolutions.

Thereafter, we reduce the substantial computational burden associated with hydrogeological modeling by decreasing the physical complexity of the numerical model while preserving the accuracy of the model predictions. A common approach to alleviate the problem is to utilize an upscaling method that generates models requiring less intensive computations. The method must also preserve the important properties of the spatial distribution of the K-field. We use an upscaling method based on wavelet transformations (WTs) that coarsens the computational grid according to the spatial distribution of K. The technique is applied to a porous formation with broadly distributed and correlated K values, and the governing equation for solute transport in the formation is solved numerically. The WT upscaling preserves the resolution of the initial highly-resolved computational grid in the high-K zones, as well as in the zones with sharp contrasts between neighboring K values, whereas the low-K zones are averaged out.
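The essence of such adaptive coarsening can be sketched with a simplified quadtree scheme; this is a stand-in for the WT method, not the dissertation's algorithm. Within-block variance plays the role of the wavelet detail coefficients: blocks with little detail are averaged out, while strongly varying blocks keep their fine resolution.

```python
import numpy as np

def coarsen_quadtree(field, tol, min_size=2):
    """Adaptively coarsen a square 2-D (log-)conductivity field.

    Blocks whose within-block variance is below `tol` are replaced by
    their average (coarsened), while strongly varying blocks are split
    recursively and kept at fine resolution. Returns the coarsened
    field on the original grid plus the resulting block count.
    """
    out = field.copy()
    blocks = 0

    def recurse(r, c, size):
        nonlocal blocks
        block = field[r:r + size, c:c + size]
        if size <= min_size or block.var() < tol:
            out[r:r + size, c:c + size] = block.mean()   # average out
            blocks += 1
        else:
            h = size // 2
            for dr in (0, h):
                for dc in (0, h):
                    recurse(r + dr, c + dc, h)

    recurse(0, 0, field.shape[0])
    return out, blocks

rng = np.random.default_rng(1)
log_k = rng.normal(size=(64, 64))                # synthetic example field
upscaled, n_blocks = coarsen_quadtree(log_k, tol=0.5)
```

Because each block is replaced by its own mean, the global mean of the field is preserved exactly, while the number of blocks can be far smaller than the original number of grid cells.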
To demonstrate the accuracy of the proposed method, we simulate fluid flow and nonreactive solute transport in both the high-resolution and the upscaled grids, and compare the concentration profiles and the breakthrough times. The results indicate that the WT upscaling of a K-field generates non-uniform upscaled grids with a number of grid blocks that, on average, is about two percent of the number of blocks in the original high-resolution computational grids, while the concentration profiles, the breakthrough times and the second moment of the concentration distribution, computed for both models, are virtually identical. A systematic parametric study is also carried out to investigate the sensitivity of the method to the broadness of the K-field, the nature of the correlations in the field (positive versus negative), and the size of the computational grid. As the broadness of the K-field and the size of the computational domain increase, better agreement between the results for the high-resolution and upscaled models is obtained.

Finally, we use a data assimilation framework to reduce the uncertainty associated with the EPMs. Among the available Monte Carlo based data assimilation methods, the ensemble Kalman filter (EnKF) is an efficient technique that integrates all the available observed data to minimize the uncertainty of aquifer characterizations and, consequently, of model predictions. However, the EnKF is computationally demanding, since it repeatedly updates the model parameters and variables until reaching a desired accuracy. Therefore, we propose a framework that couples the EnKF technique with the aforementioned WT upscaling to simultaneously reduce the uncertainty of model predictions, the complexity of the numerical simulations and the computational cost of data assimilation in stochastic hydrogeological modeling.
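The analysis step at the heart of the EnKF can be written down compactly. The sketch below shows one textbook stochastic-EnKF update with a linear observation operator; the dimensions and example values are illustrative, not taken from the dissertation's experiments:

```python
import numpy as np

def enkf_update(ensemble, obs, obs_op, obs_err_std, rng):
    """One analysis step of a stochastic ensemble Kalman filter.

    ensemble : (n_state, n_ens) forecast ensemble (e.g. log-K values)
    obs      : (n_obs,) observed data (e.g. heads at measurement wells)
    obs_op   : (n_obs, n_state) linear observation operator H
    """
    n_obs, n_ens = obs.shape[0], ensemble.shape[1]
    hx = obs_op @ ensemble                                   # predicted observations
    ax = ensemble - ensemble.mean(axis=1, keepdims=True)     # state anomalies
    ah = hx - hx.mean(axis=1, keepdims=True)                 # data anomalies

    p_xh = ax @ ah.T / (n_ens - 1)       # state-data cross-covariance
    p_hh = ah @ ah.T / (n_ens - 1)       # data covariance
    r = obs_err_std**2 * np.eye(n_obs)   # observation-error covariance

    # Kalman gain: K = P_xh (P_hh + R)^-1  (P_hh + R is symmetric)
    gain = np.linalg.solve(p_hh + r, p_xh.T).T

    # Perturb the observations and update every ensemble member.
    perturbed = obs[:, None] + obs_err_std * rng.standard_normal((n_obs, n_ens))
    return ensemble + gain @ (perturbed - hx)

# Tiny usage example: assimilating one observation of the first state
# component pulls the updated ensemble mean toward the observed value.
rng = np.random.default_rng(1)
prior = rng.normal(0.0, 1.0, size=(3, 200))
h_op = np.array([[1.0, 0.0, 0.0]])
posterior = enkf_update(prior, np.array([5.0]), h_op, obs_err_std=0.5, rng=rng)
```

Each assimilation cycle requires a forward model run per ensemble member, which is exactly why coupling the EnKF with an upscaled forward model pays off.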
Furthermore, to quantify the accuracy of hydrogeological parameter estimations resulting from such data assimilation frameworks, we propose a vision-based approach that considers the spatial structure embedded in the estimated fields. Our new approach consists of adapting a new metric, based on Color Coherence Vectors (CCV), to evaluate the accuracy of the estimated fields achieved by data assimilation methods.

The rest of the thesis is organized as follows. In Chapter 2, the computational complexity of a stochastic hydrogeological system is shown through a numerical analysis in the context of risk assessment. Then, an algorithm to optimally allocate computational resources in such systems is introduced in Chapter 3. Next, computational costs are minimized by upscaling hydrogeological systems using WTs in Chapter 4. Thereafter, Chapter 5 discusses the commonly-used data assimilation method and introduces our new CCV-based metric, which evaluates the performance of hydrogeological site characterizations. Finally, Chapter 6 proposes an efficient framework to couple WT upscaling and the EnKF to decrease the computational cost, uncertainty and numerical complexity of hydrogeological models. Concluding remarks are summarized in Chapter 7.

Chapter 2

Uncertainty Quantification of Environmental Performance Metrics in Heterogeneous Aquifers with Long Range Correlations

In this chapter, the cumbersomeness of probabilistic modeling of hydrogeological systems in the context of risk assessment is displayed. We quantify the uncertainty associated with environmental performance metrics (EPMs), relevant to risk analysis, in the presence of long-range correlations in the hydraulic conductivity (K) structure of geological formations. It will be seen how many simulations must be carried out to capture the statistics of the EPMs and perform a typical risk analysis. Furthermore, we show the importance of aquifer connectivity for the uncertainty of the EPMs.
1 The material presented in this chapter is published in the Journal of Contaminant Hydrology (Moslehi and de Barros, 2016).

2.1 Introduction

In order to manage a contaminated aquifer, multiple EPMs, such as the maximum concentration of a contaminant at an environmentally sensitive target, solute travel times from source to receptor and estimates of adverse human health risks, are required [1]. For example, as summarized in [22, 23, 24], decision makers are interested in addressing questions such as: (i) Should a drinking water well be shut down before contamination arrives?; (ii) What is the dilution potential of the geological formation?; (iii) What is the risk stemming from a contaminated site?; and (iv) How likely is it that humans will be exposed to contamination and, if so, for how long? Answering these questions requires quantifying multiple variables of interest and selecting the proper EPM. If protecting public health is the priority of the analysis, quantifying the peak concentration of the contaminant at an environmentally sensitive target can be one of the appropriate EPMs, whereas in other water management-related decisions, the travel time of the contaminant might be the more suitable candidate [25, 22, 23]. Moreover, the choice of the EPM affects decisions related to the allocation of resources toward site characterization sampling campaigns [26]. Hence, it is essential to understand how the uncertainty arising from subsurface variability propagates to different EPMs. The structural pattern of the variability of hydrogeological properties (e.g. K) impacts the uncertainty associated with EPMs and controls the likelihood of extreme events, such as the probability that the concentration of a pollutant exceeds a regulatory threshold value [27].
Many researchers have investigated the effect of the spatial heterogeneity of the K-field on the mean and the variance of the resident concentration of the solute plume [28, 29, 30], the temporal evolution of the concentration's coefficient of variation [31, 32, 33, 34], the statistics of solute travel times [25], the signature of contaminant breakthrough curves [35, 36, 37] and the PDF/CDF of the contaminant concentration [38, 39, 34]. To date, most works have focused on quantifying the uncertainty of transport variables (and therefore associated EPMs) for geological sites characterized by short range spatial correlation models for K, such as the exponential and Gaussian variograms. However, for some sites, hydrogeological data provide evidence that certain fields display long range correlations in K [18, 40, 36, 41, 42, 43]. Sahimi and Yortsos provided a review on the use of fractal geometry to characterize such subsurface formations [44]. Dagan developed an analytical expression for the macrodispersion tensor in porous media characterized by heterogeneity of evolving scale [45]. Bellin and Fiori used a Bayesian approach to investigate the impact of conditioning transport predictions on hydraulic transmissivity data in evolving-scale formations [46]. By using first-order perturbation theory, Bellin et al. [47] and Fiori et al. [48] provided analytical solutions for moments of the longitudinal effective dispersion coefficient in formations with long correlations in conductivity. In the work of Di Federico and Neuman, the macrodispersion tensor in geological fields characterized by heterogeneity of evolving scale was quantified [41]. Di Federico and Zhang investigated solute transport analytically in conductivity formations with long range correlations [49]. Riva and Guadagnini developed the moments of the hydraulic head in formations with evolving-scale heterogeneity [50].
Chen and Hsu [51] and Hsu and Chen [52] studied flow and transport statistics in two-dimensional fractal formations and further extended their work to three-dimensional multi-scale porous media. Riva et al. proposed a geostatistical model that captures the main aspects of the non-Gaussian statistical scaling behavior of the random variable and analyzed the impact of the model on flow and transport processes [53]. In the current study, the interplay between the uncertainty of EPMs and the structure of the geological formations, specifically in porous media with long range correlations in K, is analyzed. We investigate how the correlation of K affects the uncertainty of EPMs within the context of risk assessment. The main objectives of this work are to: (1) quantify the uncertainty of EPMs for a given long range spatial structure of K and identify the PDF model that best captures the uncertainty of the EPMs; and (2) show how the structural (geostatistical) parameter of the subsurface, which controls the long range correlations of K, affects the overall uncertainty of the selected EPMs. In particular, the influence of the structural parameter, characterizing the persistent and anti-persistent correlation behavior of the K fields, on the decision-making variables relevant to risk analysis is of interest in this study. The chapter is organized as follows. First, the theoretical background regarding flow and transport processes, geological formations and the EPMs is described in Section 2.2. Then, the numerical implementation is elaborated in Section 2.3. Next, the results obtained from the stochastic analysis of the EPMs and the impact of heterogeneous structure on the EPMs' uncertainty are discussed in Section 2.4. Finally, the concluding remarks of the current study are summarized in Section 2.5.
2.2 Theoretical Background

2.2.1 Physical Formulation

A fully saturated, steady-state, incompressible flow in the absence of sinks and sources is considered through a two-dimensional (2D) heterogeneous geological formation. The hydraulic conductivity tensor K(x) is spatially heterogeneous in the Cartesian coordinate system, x = (x_1, x_2), and considered to be isotropic, i.e. K_ij = K δ_ij, where δ_ij is the Kronecker delta. As a consequence of the spatial variability in K(x), the Darcy-scale velocity v(x) is heterogeneous and determined by Darcy's law,

\mathbf{v}(\mathbf{x}) = -\frac{K(\mathbf{x})}{n_e} \nabla \phi(\mathbf{x}),   (2.1)

where n_e is the effective porosity of the porous medium, which is assumed to be uniform, and φ(x) is the hydraulic head. Since flow is considered to be steady state, the hydraulic head is evaluated by solving the following governing equation,

\nabla \cdot \left[ K(\mathbf{x}) \nabla \phi(\mathbf{x}) \right] = 0.   (2.2)

In this work, no-flow boundary conditions on the transverse boundaries and a constant hydraulic head gradient (G) along the longitudinal direction are imposed such that flow is uniform-in-mean along x_1. As an outcome, the mean flow is characterized by a constant mean velocity E[v(x)] = (U_1, 0), where E is the expected value operator. An inert solute, with n_p particles, is released instantaneously at time t_o along a line source of transverse dimension ℓ_2 located at a longitudinal location x_1 = x_s and transported through the fluctuating velocity field. The spatio-temporal evolution of the concentration field is assumed to be governed by the advection-dispersion equation,

\frac{\partial C(\mathbf{x},t)}{\partial t} + \mathbf{v}(\mathbf{x}) \cdot \nabla C(\mathbf{x},t) = \nabla \cdot \left[ \mathbf{D}_d \nabla C(\mathbf{x},t) \right],   (2.3)

where C(x,t) is the solute resident concentration at location x and time t, and D_d is the local-scale dispersion tensor. The local-scale dispersion tensor depends on the longitudinal and transverse dispersivities, namely a_L and a_T, and is computed in the present study according to Eq. (2) in Salamon et al. [54]. Details pertaining to the numerical implementation of Eqs.
(2.2) and (2.3) are provided in Section 2.3.

2.2.2 Random Space Function Model for the Hydraulic Conductivity

Due to the lack of detailed site characterization, the K-field is uncertain and Eqs. (2.2) and (2.3) need to be solved within a stochastic framework. As a consequence, quantities such as v(x) and C(x,t) are regarded as random functions. In order to solve Eqs. (2.2) and (2.3), spatially heterogeneous hydraulic conductivity fields must be randomly generated. In this chapter, our goal is to investigate how K-fields displaying long range correlations affect transport observables and their corresponding uncertainties. More precisely, we aim to quantify the impact of the persistent/anti-persistent correlation structure of K on the statistical characterization of the EPMs. Therefore, we opt to generate the K-field with a fractional Brownian motion (fBm) algorithm (e.g. [55, 56, 57]). The fBm field corresponds to a stochastic process that induces long range correlations and broad distributions in the hydraulic conductivity values. Selecting the fBm field is not a limitation, and other correlation models that generate long range correlations can be adopted. Some field data acquired in oil reservoirs and aquifers display evidence of fBm-type correlations in the horizontal direction parallel to the geological bedding [18, 40, 36, 42, 43]. The fBm correlation function used to generate the random log-conductivity field, i.e. ln[K(x)], can be written as follows,

C(r) - C(0) \sim r^{2H},   (2.4)

where r is the lag-distance and H is the Hurst coefficient. For 0 < H < 0.5, the field's hydraulic property is negatively correlated (anti-persistent), while for 0.5 < H < 1, we have positive correlation in the K-field (persistent). Negative correlation indicates that a grid block with a high (or low) conductivity is less likely to be a neighbor to another grid block with high (or low) conductivity, which results in a field with less connected paths and more heterogeneity.
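The fBm generation step can be illustrated with a minimal 1D spectral-synthesis sketch (not the specific generator used in the thesis): Fourier amplitudes decaying as k^(-(H+1/2)) yield a trace whose power spectrum falls off as k^(-(2H+1)), i.e. a long-range-correlated log-conductivity signal. Production codes typically use more rigorous generators (e.g. successive random additions or circulant embedding), and the 2D analogue follows the same idea; all names below are illustrative.

```python
import numpy as np

def fbm_field_1d(n, H, seed=0):
    """Synthesize a long-range-correlated log-conductivity trace by
    spectral synthesis: Fourier amplitudes scaled as k^{-(H + 1/2)},
    corresponding to a power spectrum ~ k^{-(2H + 1)}."""
    rng = np.random.default_rng(seed)
    k = np.fft.rfftfreq(n)
    amp = np.zeros_like(k)
    amp[1:] = k[1:] ** (-(H + 0.5))          # power-law amplitude decay
    phase = rng.uniform(0.0, 2.0 * np.pi, len(k))
    spectrum = amp * np.exp(1j * phase)       # random phases
    f = np.fft.irfft(spectrum, n)
    return (f - f.mean()) / f.std()           # normalized ln K fluctuations

lnK = fbm_field_1d(256, H=0.3)                # anti-persistent case, as in the chapter
K = 10.0 ** lnK                               # spread K over orders of magnitude
```

Small H (anti-persistent) produces rough, poorly connected fields; H > 0.5 produces smoother, channelized ones.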
The opposite occurs for a positive correlation, which produces fields with more connected channels that can lead to preferential flow in the subsurface domain. The fBm correlation function, Eq. (2.4), indicates that C(r) increases as the lag r grows, since no cutoff scale for the correlations was introduced in the fBm and the correlation length is as large as the domain's size.

2.2.3 Environmental Performance Metrics

The solution of Eqs. (2.2) and (2.3) allows us to estimate EPMs that are relevant for risk analysis and aquifer remediation. In most studies related to groundwater contamination, the EPMs of interest are the resident concentration and travel times ([1] and references therein). However, in most applications, the concentration measured in the field is the flux-averaged concentration (e.g. [58]), and therefore it is mostly used in risk analysis (e.g. [59, 27]). Hence, the first EPM of interest analyzed in this chapter is the peak flux-averaged concentration C_p at a given control plane located perpendicular to the mean flow direction at x_1 = x_cp,

C_p(x_{cp}) = \max_{t > t_o} \left[ \frac{\dot{m}(t)}{Q_w} \right],   (2.5)

where ṁ(t) is the mass discharge rate through the control plane and Q_w is the volumetric discharge (per unit depth) at the position of the control plane,

Q_w = \int_{A} n_e \left[ \mathbf{v}(\mathbf{x}) \cdot \mathbf{n} \right] dx_2,   (2.6)

with A denoting the control plane and n the normal vector of the control plane. Additionally, it is important to quantify specific features of the concentration breakthrough curve, such as the arrival time of the peak concentration as well as early and late arrivals. Therefore, the second EPM of interest is the time of occurrence of the peak flux-averaged concentration C_p(x_cp), namely T_p. Moreover, we quantify the early and late arrivals of the plume, T_e ≡ t_5 (when 5% of the mass has reached the control plane) and T_l ≡ t_95 (when 95% of the mass has reached the control plane).
In general, early arrivals are important for risk analysis, whereas late arrivals are of relevance to remediation. In the case of chemical mixtures, both early and late arrivals can be of critical importance for health risk, as shown in Henri et al. [27]. Finally, the solute plume's concentrations associated with T_e and T_l at the control plane, namely C_e and C_l respectively, are also quantified. In summary, the EPMs of interest are collected in the following vector:

\boldsymbol{\theta} = [C_p, C_e, C_l, T_p, T_e, T_l].   (2.7)

The quantities in θ are random variables and need to be characterized by their PDFs. The uncertainty of all components of the vector θ in formations characterized by long range correlated heterogeneity will be quantified. Furthermore, we will compute how the uncertainty in θ is affected by the presence of persistent and anti-persistent correlation of the hydraulic conductivity, manifested through the Hurst coefficient (see Eq. 2.4).

2.3 Numerical Implementation

In order to assess the uncertainty associated with EPMs, numerical simulations have been carried out within a Monte Carlo (MC) framework. To cast the flow and transport equations within an MC simulation, multiple realizations (N_mc) of the heterogeneous fields, with S orders of magnitude difference between the conductivity values and a Hurst coefficient H, are randomly generated using the description provided in Section 2.2. For each H value, N_mc realizations of the 2D K-field of dimensions L_1 = L_2 ≡ L with uniform grid blocks (i.e. Δ_1 = Δ_2 ≡ Δ) are generated, where subscripts 1 and 2 correspond to the x_1 and x_2 directions in the Cartesian coordinate system, respectively. As shown in Moslehi et al., the choice of N_mc and Δ affects the computational costs and the accuracy of the model predictions (e.g. EPMs) [19]. Grid block dimensions were selected based on the numerical analysis of Moslehi et al. [60]. Furthermore, N_mc was large enough to achieve statistical convergence.
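As a sketch of the post-processing step, the EPM vector of Eq. (2.7) can be read off a breakthrough curve: the peak concentration C_p and its time T_p, and the early/late arrivals T_e = t_5 and T_l = t_95 (times at which 5% and 95% of the recovered mass has crossed the control plane) with the concentrations C_e and C_l observed at those times. The function name and the synthetic BTC below are illustrative, not the thesis's code.

```python
import numpy as np

def extract_epms(t, c):
    """Extract the EPM vector of Eq. (2.7) from a flux-averaged
    breakthrough curve C(t) sampled at times t."""
    mass = np.cumsum(c) / np.sum(c)           # normalized cumulative mass
    ip = int(np.argmax(c))                    # index of the peak
    ie = int(np.searchsorted(mass, 0.05))     # 5% of mass recovered
    il = int(np.searchsorted(mass, 0.95))     # 95% of mass recovered
    return {"C_p": c[ip], "T_p": t[ip],
            "C_e": c[ie], "T_e": t[ie],
            "C_l": c[il], "T_l": t[il]}

# synthetic right-skewed BTC for illustration (lognormal-shaped in time)
t = np.linspace(1.0, 500.0, 2000)
c = (1.0 / t) * np.exp(-(np.log(t) - np.log(80.0))**2 / 0.5)
epms = extract_epms(t, c)
```

For any right-skewed BTC the ordering T_e < T_p < T_l holds, matching the roles the chapter assigns to early, peak and late arrivals.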
The generated ensemble will allow us to investigate the impact of the structural parameter H on the statistics of the EPMs. Geometrical and geostatistical parameters used in this study are summarized in Table 2.1.

Table 2.1: Geometrical and geostatistical parameters of the geological formation

  Parameter | Value              | Units
  L         | 256                | m
  Δ         | 1                  | m
  N_mc      | 1000               | -
  S         | 2                  | -
  H         | 0.1, 0.2, ..., 0.9 | -

Eqs. (2.2) and (2.3) are solved for each equiprobable realization of the random hydraulic conductivity field. The flow equation is solved using MODFLOW, which is a finite-difference groundwater model [61]. Solute transport is modeled through a random walk particle tracking code (RW3D) documented in [62, 54, 63, 64, 65]. Flow and transport parameters associated with the numerical examples are summarized in Table 2.2. The output of the transport process is the flux-averaged concentration breakthrough curve (BTC) at the control plane, which is located at position x_cp. From the BTC, we extract the concentration of the plume and the corresponding arrival times.

Table 2.2: Flow and transport parameters

  Parameter | Value | Units
  n_e       | 0.3   | -
  G         | 0.07  | -
  ℓ_2       | 50    | m
  a_L       | 0.1   | m^2/d
  a_T       | 0.02  | m^2/d
  x_cp      | 200   | m
  x_s       | 10    | m
  C_0       | 1     | kg/m^3
  n_p       | 30000 | -

The EPMs of interest and their statistics are obtained by post-processing the outputs of the solute transport simulations.

2.4 Results and Discussion

Our computational analysis is divided into two parts. First, we aim to identify the PDF model that best represents the parameters in θ (see Eq. 2.7). The second part illustrates how the persistent and anti-persistent structure of K, epitomized by H (see Eq. 2.4), influences the uncertainty of the EPMs present in θ.

2.4.1 Stochastic Analysis of EPMs

For the purpose of illustration, flow and transport processes are simulated through K-fields characterized by H = 0.3. The choice of H = 0.3 is based on the previous study conducted by Rasaei and Sahimi [66].
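The MC workflow just described (generate a K-field realization, solve flow, track particles, post-process the BTC) can be sketched schematically. The flow solver and particle tracker below are toy stand-ins for MODFLOW and RW3D, and the Gaussian field generator is a placeholder, not the fBm model of Section 2.2.2.

```python
import numpy as np

def run_monte_carlo(n_mc, n_grid, solve_flow, track_particles, seed=0):
    """Skeleton of the MC loop: for each realization, generate a
    conductivity field, solve flow, run transport, and record the
    peak of the resulting breakthrough curve."""
    rng = np.random.default_rng(seed)
    peaks = np.empty(n_mc)
    for i in range(n_mc):
        lnK = rng.normal(size=n_grid)         # placeholder field generator
        btc = track_particles(solve_flow(np.exp(lnK)))
        peaks[i] = btc.max()                  # one EPM per realization
    return peaks

# Toy stand-ins for the flow solver and the particle tracker:
solve_flow = lambda K: K / K.mean()           # crude "velocity" proxy
track_particles = lambda v: np.convolve(v, np.ones(5) / 5, mode="same")
peaks = run_monte_carlo(200, 64, solve_flow, track_particles)
```

The ensemble of per-realization EPMs (here `peaks`) is exactly what the next section characterizes with parametric PDFs.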
In the subsequent section, the impact of persistent/anti-persistent correlation of conductivities on the EPMs' uncertainty is discussed in detail. The empirical PDF of C_p is shown in Fig. 2.1, as well as the parametric PDF model that best fits the empirical results. Several well-established parametric PDFs were fitted to the available C_p data obtained from the N_mc realizations, and the best parametric PDF is depicted in Fig. 2.1. The most appropriate PDF is selected based on the Bayesian information criterion, which is a maximum likelihood estimate driven method that attempts to combat overfitting [67]. We tested other methods, e.g. the negative log likelihood and the Akaike information criterion, and similar results were obtained. As can be observed from Fig. 2.1, for the numerical setup investigated in this chapter, the best parametric PDF fitted to the C_p data is the lognormal distribution with log-scale parameters μ = −6.01 and σ = 1.48. This result corroborates the numerical simulations of Henri et al. [27], where the PDF of the increased lifetime cancer risk, which is proportional to the flux-averaged concentration, was inferred to be lognormal. Other probabilistic distribution models, such as the Beta PDF, are documented in the literature but are valid for the resident concentration [48, 68]. In order to further compare the empirical probability distribution of C_p with the fitted lognormal probability distribution, the Quantile-Quantile (Q-Q) plot of the raw data versus the lognormal distribution with the obtained log-scale parameters (i.e. μ = −6.01 and σ = 1.48) is depicted in Fig. 2.2. Q-Q plots graphically compare two probability distributions by plotting their quantiles against each other [69]. As observed in Fig. 2.2, the probability distribution of C_p is in good agreement with the lognormal probability distribution, which confirms the appropriateness of the lognormal fit for the scenario investigated.
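The model-selection step can be sketched with SciPy: fit several candidate distributions by maximum likelihood and rank them by BIC = k·ln(n) − 2·ln(L̂), where k is the number of fitted parameters. The candidate list and the synthetic "peak concentration" sample below are illustrative, not the thesis's actual data or code.

```python
import numpy as np
from scipy import stats

def fit_best_pdf(data, candidates=("lognorm", "gamma", "norm")):
    """Fit candidate parametric PDFs and select the one with the
    lowest Bayesian information criterion."""
    n = len(data)
    best_name, best_bic, best_params = None, np.inf, None
    for name in candidates:
        dist = getattr(stats, name)
        params = dist.fit(data)                       # MLE fit
        loglik = np.sum(dist.logpdf(data, *params))
        if not np.isfinite(loglik):                   # guard degenerate fits
            continue
        bic = len(params) * np.log(n) - 2.0 * loglik
        if bic < best_bic:
            best_name, best_bic, best_params = name, bic, params
    return best_name, best_params

# synthetic sample, lognormal by construction with the reported parameters
rng = np.random.default_rng(1)
cp = np.exp(rng.normal(-6.01, 1.48, size=1000))
name, params = fit_best_pdf(cp)
```

On such a heavily skewed sample, BIC strongly favors the lognormal over the other candidates, mirroring the selection reported for C_p.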
The remarkable outcome of this result is that with a relatively small number of MC realizations, and consequently low computational cost, one can estimate the log-scale parameters of the lognormal distribution (μ, σ) and accordingly provide an estimate of the uncertainty of C_p. The parametric distribution allows us to evaluate the statistics of the maximum concentration of the solute plume and determine the probability that the concentration will exceed a regulatory established value, e.g. the maximum contaminant level (see [70]).

Figure 2.1: Empirical C_p PDF and the lognormal PDF fitted to the C_p data obtained from MC simulations for H=0.3.

Figure 2.2: Q-Q plot of the C_p data versus the quantiles of the fitted lognormal distribution for H=0.3.

A similar analysis has been performed for T_p, and the results revealed that for the numerical setup analyzed in this work, the best parametric PDF fitted to the T_p raw data, obtained from the MC simulations, is also lognormal, with log-scale parameters μ = 6.70 and σ = 1.43. The empirical and the lognormal PDFs for T_p are depicted in Fig. 2.3. The Q-Q plot of T_p and the fitted lognormal distribution, with log-scale parameters μ = 6.70 and σ = 1.43, is shown in Fig. 2.4 to further emphasize the lognormal behavior, although a small departure from lognormality is observed for large T_p values. Similarly, as in the case of the peak concentration, by simulating a small number of MC realizations and evaluating the μ and σ of T_p, an approximation of the statistics of T_p can be estimated at a given control plane.

Figure 2.3: Empirical T_p PDF and the lognormal PDF fitted to the T_p data obtained from MC simulations for H=0.3.
Figure 2.4: Q-Q plot of the T_p data versus the quantiles of the fitted lognormal distribution for H=0.3.

We have performed similar analyses on the T_e and T_l numerical data. Likewise, the results reveal that the lognormal PDF best captures the statistics of T_e and T_l. Furthermore, in order to better compare the uncertainty associated with the early, peak and late arrivals, the CDFs of the T_e, T_p and T_l data are plotted in Fig. 2.5. Additionally, by calculating the information entropy of the T_e, T_p and T_l data, it is noticed that the maximum uncertainty is associated with the late arrival time of the contaminant plume. The information entropy, which is a measure of uncertainty, of the EPMs is calculated using the following equation [71]:

E(\theta_i) = E\left[ -\ln\left( P(\theta_i) \right) \right],   (2.8)

where E[·] is the expected value operator, and E(θ_i) and P(θ_i) are the information entropy and probability mass function of the EPM θ_i, respectively. Recall that θ_i is given in Eq. (2.7). The entropy value (E) and the arithmetic variance (Var) corresponding to each EPM are reported in Table 2.3. The results in Table 2.3 indicate that, among the arrival times, the maximum variance or entropy corresponds to the late arrival time (T_l). Due to the presence of low conductivity zones close to the high conductivity zones in the K-fields with H = 0.3, portions of the solute can be trapped in the low conductivity zones, which impacts the late arrival time of the plume. Therefore, T_l can vary substantially from realization to realization, which results in a larger uncertainty. The opposite occurs for T_e, which is associated with fast flow paths that the solute plume will eventually encounter.

Figure 2.5: Comparison between the CDFs of the T_e, T_p and T_l data obtained from the MC simulations for H=0.3.
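Eq. (2.8) can be evaluated from MC samples by using a histogram as the probability mass function. A minimal sketch (the bin count, sample sizes and test distributions are assumptions of the illustration, not the thesis's settings):

```python
import numpy as np

def empirical_entropy(samples, bins=50):
    """Empirical information entropy E = E[-ln(P)] of a sample,
    using a histogram as the probability mass function (Eq. 2.8)."""
    counts, _ = np.histogram(samples, bins=bins)
    p = counts[counts > 0] / counts.sum()
    return -np.sum(p * np.log(p))

rng = np.random.default_rng(0)
broad = rng.uniform(size=10_000)              # spread-out ensemble
peaked = rng.normal(0.5, 0.01, size=10_000)   # tightly clustered ensemble
```

A broad ensemble carries more entropy (uncertainty) than a tightly peaked one, which is the sense in which Table 2.3 ranks the EPMs.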
Figure 2.6: Comparison between the CDFs of the C_e, C_p and C_l data obtained from the MC simulations for H=0.3.

Likewise, the CDFs of the C_e, C_p and C_l data obtained from the MC simulations are plotted in Fig. 2.6. The results indicate that C_e and C_l also follow a lognormal PDF for the scenario investigated in this study. For the purpose of a quantitative comparison between the uncertainty of the early, peak and late solute concentrations, the entropy and arithmetic variance are reported in Table 2.3. Computing the variance and entropy associated with C_e, C_p and C_l reveals that the maximum uncertainty corresponds to the peak concentration at the control plane. Moreover, it can be observed in Table 2.3 that the uncertainty corresponding to C_e is larger than that of C_l.

Table 2.3: Entropy and variance of the EPMs computed from the MC simulations.

  EPM  | E       | Var
  T_e  | 8.2036  | 2.0740 × 10^7
  T_p  | 8.4891  | 4.3682 × 10^7
  T_l  | 9.1711  | 1.5565 × 10^8
  C_e  | -4.8901 | 6.4013 × 10^-5
  C_p  | -4.1740 | 3.2542 × 10^-4
  C_l  | -6.0265 | 8.2147 × 10^-6

The reason can be attributed to the fact that the small variability in the ensemble is linked to the low values of C_l. These late-time concentrations are low since the plume is more diluted at that stage (i.e. the tail of the concentration breakthrough curve). Consequently, the sample-to-sample fluctuations of C_l are small, whereas C_e is driven by advection in fast flow channels and its values are larger and more variable within the MC realizations. The results shown in this section have direct implications for health risk assessment. For instance, the US Environmental Protection Agency (EPA) stipulates the concentration limits for contaminant disposal from different sources (e.g. industrial facilities, nuclear plants, etc.). These regulations customarily require simulating flow and transport and consequently the contaminant's concentration at various locations and times.
Furthermore, EPA regulations increasingly require the inclusion of uncertainty bounds in predictions [72]. Hence, in order to accurately predict the level of the contaminant's concentration, it is of considerable practical importance to evaluate the full statistics of the EPMs.

2.4.2 Impact of the Long Range Correlation K Structure on EPMs' Uncertainty

Next, we investigate the influence of the porous media's structure, specifically the persistent and anti-persistent correlation of the hydraulic conductivity, on the statistics of C_p and T_p. The entropy of the peak concentration numerical values for different H is depicted in Fig. 2.7, together with the entropy calculated from the lognormal distribution using the equation below [73]:

E(\theta_i) = \ln\left[ \sigma \, e^{\mu + \frac{1}{2}} \sqrt{2\pi} \right].   (2.9)

In Eq. (2.9), σ and μ are the log-scale parameters of the lognormal distribution associated with a specific EPM. The empirical entropy shown in Fig. 2.7 is evaluated by Eq. (2.8), using the empirical probability distribution of the numerical data (i.e. C_p) without assigning any known parametric PDF to the data. The lognormal entropy plotted in Fig. 2.7 is obtained by finding the best parametric PDF (i.e. lognormal) using the Bayesian information criterion and substituting the log-scale parameters associated with the PDF (μ and σ) into Eq. (2.9). As can be observed in Fig. 2.7, the empirical entropy of C_p, obtained for all values of H, is in good agreement with the lognormal entropy. Furthermore, the results reveal that the entropy of C_p increases as H grows. Next, to further grasp the effects of persistent and anti-persistent correlation of hydraulic conductivities on the statistics of C_p, the estimated mean value of the peak concentration (E[C_p]) and the 95% confidence interval (CI) of the estimated mean value for the MC realizations are presented in Fig. 2.8 for different H. As can be observed from Fig. 2.8, both E[C_p] and the variance of C_p increase with H.
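Eq. (2.9) is the differential entropy of a lognormal variable and can be sanity-checked against the standard identity "entropy of a lognormal = entropy of the underlying normal, 0.5·ln(2πe·σ²), shifted by μ". With the C_p parameters reported in Section 2.4.1 it gives about −4.20, consistent with the empirical value −4.1740 listed for C_p in Table 2.3. A short sketch:

```python
import numpy as np

def lognormal_entropy(mu, sigma):
    """Differential entropy of a lognormal variable, Eq. (2.9):
    E = ln( sigma * exp(mu + 1/2) * sqrt(2*pi) )."""
    return np.log(sigma * np.exp(mu + 0.5) * np.sqrt(2.0 * np.pi))

mu, sigma = -6.01, 1.48          # log-scale parameters reported for C_p
entropy_cp = lognormal_entropy(mu, sigma)          # ~ -4.20
normal_entropy = 0.5 * np.log(2.0 * np.pi * np.e * sigma**2)
```

The identity `entropy_cp == normal_entropy + mu` holds exactly, which confirms the algebraic form of Eq. (2.9).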
As mentioned in Section 2.2.2, persistent correlation in hydraulic conductivity values (H > 0.5) results in more channeling effects in the porous formation, which helps the plume reach the control plane quickly, before the solute plume dilutes substantially. Therefore, a larger C_p is obtained, on average, for higher values of H (i.e. H > 0.5) as opposed to simulations with lower values of H (i.e. H < 0.5). On the other hand, a less connected formation (H < 0.5) facilitates the entrapment of the solute plume, thus increasing its residence time within the formation. As a consequence, the effects of local-scale dispersion on the solute plume become noticeable and tend to smooth out high concentration gradients (e.g. [29]), and thus reduce the sample-to-sample fluctuations (within the MC ensemble) of C_p. The reduction of the variability of C_p within the MC realizations leads to less uncertainty and therefore a decrease of entropy and variance. Moreover, Fig. 2.7 shows that the rate of entropy growth for C_p decreases as H increases. This implies that the uncertainty of C_p is more sensitive to low values of H.

Figure 2.7: Information entropy of C_p as a function of H, obtained from the MC simulations using Eq. (2.8) for the empirical entropy and Eq. (2.9) for the lognormal entropy.

Figure 2.8: Mean value of the C_p data obtained from MC simulations together with the 95% CI of the mean value.

A similar analysis is performed for T_p, and the entropy results are depicted in Fig. 2.9. As can be noticed from Fig. 2.9, the entropy of T_p reduces as H increases. Likewise, the estimated mean value of the time corresponding to the peak concentration (E[T_p]) and the 95% confidence interval (CI) of the estimated mean value for N_mc realizations of the solute transport in the disordered porous media are shown in Fig.
2.10. As opposed to C_p, both E[T_p] and the variance of T_p decrease as H increases. Since larger H (i.e. H > 0.5) produces more connected paths and flow channeling, the solute plume reaches the control plane faster through these connected flow channels, and the T_p values, on average, are smaller and less variable compared to the T_p values at lower H (i.e. H < 0.5). Moreover, similar to the C_p results (see Fig. 2.7), the rate of decrease of the entropy reduces as H increases. These results indicate that the uncertainty of the EPMs is more sensitive to low values of H. In order to illustrate the impact of H in the context of risk assessment, Fig. 2.11 shows the probability of C_p being greater than a hypothetical critical value. The procedure of the analysis is as follows. First, for all values of H, the lognormal PDF is fitted to the data obtained from the N_mc realizations, and the log-scale parameters associated with the lognormal distribution (μ and σ) are evaluated.

Figure 2.9: Information entropy of T_p as a function of H, obtained from the MC simulations using Eq. (2.8) for the empirical entropy and Eq. (2.9) for the lognormal entropy.

Figure 2.10: Mean value of the T_p data obtained from MC simulations together with the 95% CI of the mean value.

Next, we compute the probability of C_p being greater than a specified value defined as C_c ≡ αC_0, where C_0 is the initial concentration of the solute at the source (see Table 2.2) and α is a factor bounded as 0 ≤ α < 1. Given that the peak flux-averaged concentration is lognormally distributed (see Fig. 2.1), we can calculate the probability of exceedance for different H values using the following analytical equation,

\Pr[C_p \geq C_c] = 1 - F(C_c) = \frac{1}{2} - \frac{1}{2} \, \mathrm{erf}\!\left( \frac{\ln(C_c) - \mu}{\sigma \sqrt{2}} \right)
, (2.10), where F(C_c) is the value of the CDF of the peak flux-averaged concentration evaluated at C_c. Eq. (2.10) is evaluated for various values of α and the results are illustrated in Fig. 2.11. Note that the lognormal parameters μ and σ can be obtained from the arithmetic mean and variance of C_p (E[C_p] and Var[C_p], respectively) through the equations,

\mu = \ln\left( \frac{(E[C_p])^2}{\sqrt{\mathrm{Var}[C_p] + (E[C_p])^2}} \right),   (2.11)

\sigma = \sqrt{ \ln\left( \frac{\mathrm{Var}[C_p]}{(E[C_p])^2} + 1 \right) },   (2.12)

where E[C_p] and Var[C_p] are evaluated from the numerical MC simulations. As illustrated in Fig. 2.11, the probability of C_p being greater than a specified value, e.g. Pr[C_p ≥ 0.001 C_0], increases with H. This is due to the fact that K-fields with persistent correlations (H > 0.5) display more connected paths, which increases the probability of observing higher concentration values when compared to K-fields with H < 0.5. Furthermore, Fig. 2.11 reveals that Pr[C_p ≥ C_c] is more sensitive to low values of H (i.e. H < 0.5).

Figure 2.11: Probability of C_p being larger than a critical value defined by C_c = αC_0 for α = 0.001, 0.005, 0.01, 0.05, 0.1.

2.5 Summary

In this chapter, we provided an overview of how the structure of the K-field affects the uncertainty of EPMs that are of relevance to probabilistic risk assessment. We have investigated how the uncertainty stemming from a spatially heterogeneous hydraulic conductivity field propagates to different EPMs. The statistics of the EPMs are evaluated by means of numerical MC simulations. The results indicate that the distribution of the EPMs considered in this study (C_p, C_e, C_l, T_p, T_e, T_l), evaluated at a control plane, follows a lognormal distribution.
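Eqs. (2.10)-(2.12) combine into a short exceedance-probability calculation: recover (μ, σ) from the arithmetic moments, then evaluate Pr[C_p ≥ C_c]. The sketch below uses the (μ, σ) reported for C_p at H = 0.3 and cross-checks the analytic probability against a direct MC estimate; all variable names are illustrative.

```python
import numpy as np
from math import erf, exp, log, sqrt

def lognormal_params_from_moments(mean, var):
    """Moment matching, Eqs. (2.11)-(2.12): log-scale parameters of a
    lognormal distribution from its arithmetic mean and variance."""
    sigma = sqrt(log(var / mean**2 + 1.0))
    mu = log(mean**2 / sqrt(var + mean**2))
    return mu, sigma

def prob_exceedance(cc, mu, sigma):
    """Eq. (2.10): Pr[C_p >= C_c] for a lognormally distributed C_p."""
    return 0.5 - 0.5 * erf((log(cc) - mu) / (sigma * sqrt(2.0)))

mu, sigma = -6.01, 1.48                   # parameters reported for C_p, H = 0.3
# Eqs. (2.11)-(2.12) exactly invert the lognormal moment relations:
mean = exp(mu + 0.5 * sigma**2)
var = (exp(sigma**2) - 1.0) * exp(2.0 * mu + sigma**2)
mu_back, sigma_back = lognormal_params_from_moments(mean, var)

# Cross-check Eq. (2.10) against a Monte Carlo estimate:
rng = np.random.default_rng(0)
x = rng.lognormal(mu, sigma, size=100_000)
cc = 0.001 * 1.0                          # C_c = alpha*C_0, alpha = 0.001, C_0 = 1 kg/m^3
p_analytic = prob_exceedance(cc, mu, sigma)
p_empirical = (x >= cc).mean()
```

For these parameters the exceedance probability is roughly 0.73, illustrating how likely a regulatory threshold violation can be even when the mean concentration is far below C_c.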
Since the uncertainty of these EPMs complies with a known parametric PDF (lognormal), the log-scale parameters, and consequently the entire distribution, can be estimated from a small number of MC realizations, which eventually leads to computationally cheap simulations. Furthermore, among the solute flux-averaged concentrations computed at different times, C_p exhibits the largest uncertainty, whereas among the arrival times of the plume, T_l (late arrivals) manifests the maximum uncertainty when compared with T_e and T_p. As opposed to previous works, which focused on the uncertainty of the resident concentration (e.g. [68]), we numerically quantified the PDF of the flux-averaged concentration. The impact of various spatially heterogeneous porous media on the statistics of the EPMs is scrutinized. Numerical simulations in multi-scale heterogeneous fields with a wide range of the structural parameter (H) are carried out to explore how the persistent/anti-persistent correlation structure in hydraulic conductivity governs the level of uncertainty in EPMs. The impact of H on purely advective transport is documented in Di Federico and Neuman [41]; however, these authors focused on the macrodispersion tensor. Our work analyzes the influence of H on metrics that are relevant to risk assessments. Results show that the mean value of C_p increases with H due to the presence of high-conductivity channels in fields with persistent correlation in K. In contrast, by increasing H, the mean value of T_p decreases, since the abundance of fast flow paths in persistent K-fields helps the plume reach the environmentally sensitive target faster. By computing the information entropy, our numerical results reveal that the uncertainty of C_p increases with H, whereas the uncertainty of T_p reduces with increasing H. However, in both cases (C_p and T_p), the rate of entropy change is larger when H is low. Finally, the probability of the solute concentration exceeding a critical value (e.g.
an environmentally regulated value) is calculated for various K-fields with different H. As a future research endeavor, chemically reactive contaminants can be used to expand the results of this study to human health risk analysis and to incorporate the uncertainty arising from human physiological variability. Moreover, an interesting question to be addressed is how the statistics of the EPMs behave for three-dimensional flow and transport processes in conductivity fields displaying long correlations. In addition, investigating the influence of the contaminant's source dimensions on the PDF of the EPMs would be a potential research extension to the topics discussed in this chapter. The results of the presented research can aid decision makers in accurately predicting the behavior of EPMs in the presence of disordered heterogeneous geological formations. It should be noted that, to achieve statistically meaningful EPMs and to explore the effect of a structural parameter of aquifers on such statistics, brute-force MC numerical simulations have been performed, which is computationally expensive. Although we show that, for the configurations analyzed in this study, the uncertainty of the EPMs can be quantified by cheaper simulations, for other numerical setups the entire process would have to be replicated. Therefore, a framework is required to efficiently allocate computational budgets in the stochastic numerical modeling of processes in spatially heterogeneous porous media.

Chapter 3: Optimal Allocation of Computational Resources in Hydrogeological Models under Uncertainty¹

In previous chapters, we illustrated how stochastic hydrogeological simulations can be time consuming. In this chapter, we focus on developing a strategy to alleviate the computational cost associated with stochastic simulations (such as the one displayed in Chapter 2). The interplay between different elements of the stochastic modeling of aquifers is discussed.
An optimal framework is introduced to help modelers distribute computational resources wisely.

¹ The material presented in this chapter is published in Advances in Water Resources (Moslehi, Rajagopal, de Barros, 2015).

3.1 Introduction

Hydrogeological properties, such as the hydraulic conductivity, are spatially variable and, due to the high costs of data acquisition and measurement errors, a fully detailed characterization is infeasible [5]. Therefore, hydrogeological predictions are subject to uncertainty, and stochastic methods, like Monte Carlo (MC), which repeatedly simulates the system, are required. The accuracy of the MC framework depends on the number of realizations (i.e. the statistical resolution) used in the analyses. Besides the stochastic complexity, hydrogeological systems need to be modeled numerically, as mentioned in Chapter 1.1. To precisely capture the spatio-temporal fluctuations of the system, hydrogeological models need to be discretized. Likewise, the accuracy of the EPMs depends on the size of the discretization (i.e. the physical resolution). Given the aforementioned reasons, the predicted hydrogeological model response contains two major sources of error. The first source of the overall error originates from the truncation error of the numerical approximations, and the second arises from the statistical error associated with the brute-force MC framework. Other stochastic approaches could be chosen whose statistical error decreases at a faster rate than that of the MC technique [74, 75, 76]. However, the simplicity of the MC framework has made it the most commonly used approach among hydrogeologists. By considering a higher statistical resolution (e.g. by increasing the number of MC runs), the accuracy of the solution can be improved and the statistical error can be reduced. On the other hand, by decreasing the size of the physical discretization (e.g. smaller numerical grid blocks), the truncation error will be diminished.
Due to limited computational resources, choosing high resolutions for both the physical and statistical discretizations is almost impractical. Even though the costs associated with numerical computations can be alleviated through the use of parallel computing, there is still a need to allocate computational resources in an optimal manner. Moreover, if a fine physical discretization is chosen but the statistical resolution is coarse, the truncation error may be negligible, yet the predicted solution is not statistically converged. The same situation can be experienced when many repetitions of the model evaluation are used within the MC framework but the physical discretization is coarse. In this case, the truncation error due to the physical discretization is dominant and the resulting solution cannot represent the physics of the model accurately. Consequently, there is a trade-off between the physical and statistical discretizations, and a method for optimally choosing these variables is required. Having this methodology at hand provides a step towards addressing the following fundamental question: given a limited computational resource, what are the optimum spatial and statistical resolutions that minimize the total error of the model prediction? Ababou et al. studied the impact of grid refinement in spatially heterogeneous subsurface flow fields [15]. Based on heuristic arguments, they proposed a rule to determine the number of numerical grid blocks per heterogeneity correlation scale of the log-conductivity field needed to achieve physically meaningful solutions and to capture the effects of velocity variability in solute transport. However, this method does not guarantee that the chosen grid resolution is the computationally optimal resolution (especially when confronted with the computational costs associated with MC). Cainelli et al.
investigated the accuracy of numerical schemes for solving the flow governing equation and noted that numerical schemes should be chosen carefully when dealing with spatially heterogeneous formations [77]. By comparing velocity fields calculated with different numerical schemes, Cainelli et al. proposed a methodology for defining the physical resolution that bounds the numerical error of the flow field. However, they did not consider the roles of computational resource limitations and statistical resolutions in their analysis. Investigations have been carried out to enhance the computational efficiency of numerical hydrogeological models. Leube et al. used the Temporal Moment method to reduce the complexity of the governing transport equation [20]. With the aim of improving the computational efficiency while keeping the physics of the problem accurate, Battiato et al. developed a hybrid model that accounts for processes at both the pore scale and the Darcy scale [78]. Parameter upscaling also helps alleviate the computational burden [3]. Examples include upscaling of the permeability field (e.g. [79]) and of the dispersion tensor (e.g. [80, 81]). Although these papers consider the influence of model reduction on the computational cost, the interplay between uncertainty reduction and computational cost is not investigated. In the context of uncertainty quantification, Ballio and Guadagnini suggested a methodology for the convergence analysis of the MC framework [82]. As a result, they provided an estimate of the number of MC realizations required to refine the accuracy of the model prediction by a given percentage of the confidence interval. Passetto et al. constructed reduced-order models for fully saturated heterogeneous subsurface formations [83].
They evaluated the accuracy of the reduced model as a function of the correlation scale and variance of the log-transmissivity, without considering the interaction between the computational budget and the physical and statistical resolutions. In a recent work, Leube et al. quantified the optimal computational resource allocation by jointly considering the trade-off between uncertainty quantification, discretization, and model reduction [84]. They obtained cost-to-error surfaces, generated by simulating different combinations of physical and statistical resolutions, which were then used to determine the optimal points for a given computational constraint. The limitation of the approach described in [84] is the need to run various numerical experiments in order to develop the cost-to-error surfaces. Thus, the process of finding the optimal physical and statistical resolutions (for a given computational budget) is a byproduct of these numerical investigations, which may lead to an excessive computational burden. In this chapter, we develop a methodology to efficiently distribute the available computational resources between the physical and statistical discretizations while minimizing the overall error. The main goal is to illustrate the existence of an appropriate choice of grid resolution and number of MC runs that accurately estimates the model's prediction with respect to the computational resource constraints. A key and novel component of our methodology lies in the definition of the overall error in the model response, which is based on the bias-variance trade-off [17]. The derived expression for the overall error explicitly accounts for the contributions stemming from the physical discretization and the stochastic resolution. As opposed to Leube et al. [84], the derived error metric does not rely on the assumption of statistical independence between the physical and stochastic errors.
Through a series of examples, we show that the derived overall model response error is capable of efficiently handling the following conflicting objectives: computational cost, discretization error, and statistical error. We test the framework on a commonly used stochastic spatially heterogeneous groundwater flow and transport model set-up. Results indicate that a rational allocation of resources can reduce the computational costs while keeping the overall error at a minimum. The optimal physical and statistical resolutions arising from our method are successfully compared with the optimal results obtained by extensive brute-force numerical simulations. Finally, we highlight the importance of the solute plume scale in controlling the trade-offs between the statistical and physical resolutions.

3.2 Formulation

Let Ω denote the output of a physically-based hydrogeological model. Because of the uncertainty associated with the model parameters, Ω must be treated stochastically. Due to the spatial and temporal variability of these models, the output Ω varies over the space (X) and time (T) domain. The output Ω can be considered an EPM such as human health risk, resident or flux-averaged concentration of a chemical mixture, or solute travel time [1]. Furthermore, the output Ω is a function of the model input parameters, which can be represented by the input vector Θ. A variety of parameters, such as the hydraulic conductivity, porosity, and geostatistical parameters, can be included in the input vector Θ. Thus, the output can be expressed as

Ω = f(x, t; Θ),  (3.1)

where x and t refer to a specific point in the space and time domain, respectively. The function f assigns each feasible point in space and time to a corresponding output value by considering the input parameters.
This function is the solution of a governing equation that can be a Partial Differential Equation (PDE) or, in more complex cases, a set of coupled PDEs that mathematically describe the physical model. The model output is a function of the physical configuration of the problem, such as boundary conditions and sink and source terms. In general, given the physical-(bio)chemical complexity of the problem, Ω is computed via numerical methods. The common feature of most numerical methods is the discretization of the physical domain of the model according to the resolution vector Δ. Here Δ contains the dimensions of the numerical grid block. The proper resolution depends not only on the physics of the problem but also on the numerical approach that is used [77], bearing in mind that fine resolutions can significantly increase the computational effort. Although conventional numerical methods require the definition of grid blocks, there exist grid-free methods such as the Moving Particle Semi-implicit method (MPS) [85] and Smoothed Particle Hydrodynamics (SPH) [86, 87]. However, our work and the upcoming results are based on discretization-based methods. In subsurface environments, hydrogeological features vary spatially over multiple scales [5]. The spatial variability of hydrogeological properties makes the full characterization of the site properties a difficult task. Therefore, the hydraulic parameters present in Θ are subject to uncertainty that arises from the lack of complete knowledge about the site properties. Consequently, Ω should be modeled as a random function. In order to take this randomness into account, the stochastic input parameters can be discretized into Monte Carlo (MC) realizations by sampling from the underlying probability distribution function (pdf) that best represents the uncertainty in Θ. In other words, the MC approach discretizes the stochastic space into N_mc equiprobable realizations Θ_1, ..., Θ_{N_mc}.
Finally, by integrating all realizations, an estimate of the model output statistics can be obtained. The process of evaluating the statistics of the model output (Ω) by considering the physical (Δ) and statistical (Θ) resolutions is illustrated in Fig. 3.1. The main difficulty associated with the process depicted in Fig. 3.1 lies in selecting the appropriate spatial resolution Δ and number of MC realizations N_mc to achieve accurate results given a computational time constraint. The main objective of this work is to tackle this challenge by developing a framework that illustrates the existence of an appropriate choice of Δ and N_mc that accurately estimates the model's output Ω with respect to the computational resource constraints. Our analysis is based on deriving an expression for the error in approximating the statistics and physical response of Ω. The second goal is to show that optimizing the derived error expression will yield near-optimal results for Δ and N_mc. Details of the proposed framework are described in Section 3.3.

Figure 3.1: Schematic process of a typical stochastic numerical hydrogeological simulation: (a) Different grid resolutions for the physical domain (here only spatial discretization); (b) For a specific physical resolution Δ_i, the MC framework is employed by sampling N_mc realizations from the pdf of the random field parameter Θ; (c) The hydrogeological model is solved numerically for each realization and the results are integrated to achieve the statistical characterization of the output corresponding to a specific Δ and N_mc value.

3.3 Methodology

3.3.1 Overall Error Estimation of the Model Response

Let the vector Δ denote the physical resolution of the numerical model,

Δ = (Δ_1, Δ_2, ..., Δ_J),  (3.2)

where J is the number of physical dimensions.
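The workflow of Fig. 3.1, namely fixing a grid, sampling N_mc realizations of the uncertain input, solving the model once per realization, and aggregating the outputs, can be sketched as follows. This is a minimal illustration under assumed stand-ins, not the dissertation's solver: `solve_model` and the lognormal input sampler are hypothetical placeholders for the numerical flow/transport model and the pdf of Θ.

```python
import numpy as np

def mc_statistics(solve_model, sample_theta, n_mc, rng):
    """Steps (b)-(c) of Fig. 3.1: sample n_mc input realizations, run the
    model once per realization, and return sample statistics of the output."""
    outputs = np.array([solve_model(sample_theta(rng)) for _ in range(n_mc)])
    return outputs.mean(), outputs.var(ddof=1)

# Hypothetical stand-ins: the "model" squares a lognormal conductivity sample.
rng = np.random.default_rng(0)
mean_out, var_out = mc_statistics(lambda k: k ** 2,
                                  lambda r: r.lognormal(0.0, 0.5), 1000, rng)
```

For a lognormal input with log-mean 0 and log-standard deviation 0.5, the exact value of E[K²] is exp(0.5) ≈ 1.65, so the MC estimate should land near that value for a moderate number of realizations.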
In this work, the focus is solely on the spatial discretization of the flow domain over a regularly spaced grid. Furthermore, the input parameters in Θ are random variables. Within the MC framework, the ensemble of Θ has a finite size N_mc:

Θ = (Θ_1, Θ_2, ..., Θ_{N_mc}),  (3.3)

where Θ_m corresponds to the m-th realization of the random field. As a consequence, substituting the ensemble by a finite-size sample and adopting a numerical grid will lead to an approximation of Ω. Our overall goal is to obtain the Δ and N_mc that minimize the error of the approximation of Ω for a given computational resource constraint. Thus, the optimization problem can be written as

minimize ε(Δ, N_mc) subject to T_tot(Δ, N_mc) ≤ B,  (3.4)

with variables Δ ∈ R^J_+ and N_mc ∈ N_+. Here, ε(Δ, N_mc) represents the objective function and T_tot(Δ, N_mc) is the resource function that determines the resources required for each physical and statistical resolution. T_tot(Δ, N_mc) can represent the computational time required by a specific processor to reach the desired outputs. The required computational resources are bounded by B, which represents the available resources (e.g. time). Hence, by solving this optimization problem, we can estimate the optimal Δ_opt and N_mc^opt such that the minimum error is obtained for the computational budget at hand. In order to perform the minimization in Eq. (3.4), the error expression needs to be defined. In the rest of this section, the method for estimating the overall error is described. Let us define V as the estimator of Ω, given by

V(x, t) ≡ E_Θ[Ω(x, t; Θ)],  (3.5)

where E_Θ[·] represents the expectation operator with respect to the random field Θ. As described in Section 3.2, numerical approaches are usually used to solve Eq. (3.1). Let Ω̂(x, t; Θ, Δ) be the output of the numerical method at a point in space and time for a model with input parameter Θ and resolution vector Δ.
Numerical errors are embedded in Ω̂, so it can be interpreted as an approximation of Ω. Consequently, the expected value of Ω̂ is given by

V̂(x, t; Δ) = E_Θ[Ω̂(x, t; Θ, Δ)].  (3.6)

In general, due to the complexity of the models, it is usually not tractable to calculate the above expectation analytically. Therefore, the sample mean can be used as an approximation of Eq. (3.6). Next, we define V̄ to be the sample mean of Ω̂ over N_mc realizations of Θ,

V̄(x, t; Δ, N_mc) = (1/N_mc) Σ_{m=1}^{N_mc} Ω̂(x, t; Θ_m, Δ).  (3.7)

Thus, the combined use of numerical methods for solving the governing equations and the MC framework leads to Eq. (3.7), which is an approximation of Eq. (3.5). This approximation is accurate as long as fine resolutions are considered (e.g. Δ → 0) and a large number of MC simulations is used (N_mc → ∞). The error of this approximation can be expressed as

e(x, t; Δ, N_mc) ≡ V̄(x, t; Δ, N_mc) − V(x, t)
= [V̄(x, t; Δ, N_mc) − V̂(x, t; Δ)] + [V̂(x, t; Δ) − V(x, t)].  (3.8)

In the following, the notation e(x, t; Δ, N_mc) is simplified to e(Δ, N_mc). The expected value of the squared error is

ε(Δ, N_mc) ≡ E_Θ[e²(Δ, N_mc)]
= E[(V̄(x, t; Δ, N_mc) − V̂(x, t; Δ))²] + E[(V̂(x, t; Δ) − V(x, t))²]
− E[2(V̄(x, t; Δ, N_mc) − V̂(x, t; Δ))(V̂(x, t; Δ) − V(x, t))].  (3.9)

Since V̂(x, t; Δ) and V(x, t) are constant values and not random variables (see Eqs. (3.5) and (3.6)), and the expectation of the sample mean V̄(x, t; Δ, N_mc) is the expected value of the population, V̂(x, t; Δ), the third term on the right-hand side (RHS) of Eq. (3.9) can be expressed as

E[2(V̄(x, t; Δ, N_mc) − V̂(x, t; Δ))(V̂(x, t; Δ) − V(x, t))]
= 2(V̂(x, t; Δ) − V(x, t)) E[V̄(x, t; Δ, N_mc) − V̂(x, t; Δ)] = 0.  (3.10)

Using Eq. (3.10) and the fact that V̂(x, t; Δ) and V(x, t) are constant values, Eq. (3.9) can be rewritten as

ε(Δ, N_mc) = E[(V̄(x, t; Δ, N_mc) − V̂(x, t; Δ))²] + (V̂(x, t; Δ) − V(x, t))².  (3.11)

The equation above is known as the bias-variance decomposition, where V̂(x, t; Δ) − V(x, t) is the bias.
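The decomposition in Eq. (3.11) can be checked numerically with a toy simulator whose bias and noise level are prescribed. The numbers below are illustrative only: the "model" returns the true value plus a fixed bias (standing in for the discretization error) plus zero-mean noise (standing in for the sampled input), and the mean squared error of the sample mean over many repeated MC experiments is compared against the variance term plus the squared bias.

```python
import numpy as np

rng = np.random.default_rng(42)
v_true, bias, sigma = 1.0, 0.05, 0.3   # illustrative values, not from the text
n_mc, n_trials = 50, 20000

# Toy "simulator" output: true value + fixed bias + zero-mean noise.
draws = v_true + bias + sigma * rng.standard_normal((n_trials, n_mc))
v_bar = draws.mean(axis=1)             # one sample mean per MC experiment

empirical_mse = float(np.mean((v_bar - v_true) ** 2))
predicted_mse = sigma**2 / n_mc + bias**2   # variance/N_mc + bias^2
```

With these values the prediction is 0.09/50 + 0.0025 ≈ 0.0043, and the empirical mean squared error over the repeated experiments agrees to within sampling noise.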
The first term on the RHS of Eq. (3.11) is the variance of the sample mean V̄(x, t; Δ, N_mc). Since the variance of the sample mean can be expressed as the population variance over the sample size, Eq. (3.11) can be written as

ε(Δ, N_mc) = Var[Ω̂(x, t; Θ, Δ)] / N_mc + {E[Ω̂(x, t; Θ, Δ)] − E[Ω(x, t; Θ)]}²
= Var[Ω̂(x, t; Θ, Δ)] / N_mc + {E[Ω̂(x, t; Θ, Δ) − Ω(x, t; Θ)]}²,  (3.12)

where Var[·] corresponds to the variance operator over Θ. The statistical dependence between the statistical and physical errors is manifested in both terms on the RHS of Eq. (3.12). Although the errors are not completely independent, as will be shown in the following examples, the first component of Eq. (3.12) can be viewed as the statistical error (MC error) and the second component is strongly linked to the discretization error. Moreover, the correlation between the errors will depend on the complexity of the PDE. In order to apply the error expression, we need to evaluate the RHS of Eq. (3.12). To achieve this, we will provide an approximation to Eq. (3.12) based on heuristic arguments. The first component of the error depends on Var[Ω̂], which can be approximated by the sample variance of Ω̂, denoted by S²_Ω̂. The sample variance can be estimated from some preliminary evaluations of the model. The second term on the RHS of Eq. (3.12), i.e. E[Ω̂ − Ω], is mainly associated with the truncation error of the discretization. For this component of the error, we hypothesize that it can be approximated by a polynomial Σ_{i=0}^{p} a_i ‖Δ‖^i; see Ch. 3 of [88]. Note that the nature of the discretization error will highly depend on the flow and transport solver used and on the numerical scheme adopted. Given the difficulty inherent in solving Eq.
(3.12), we assume that Eq. (3.12) can be approximated as

ε(Δ, N_mc) ≈ S²_Ω̂ / N_mc + (a_p ‖Δ‖^p + a_{p−1} ‖Δ‖^{p−1} + ··· + a_1 ‖Δ‖ + a_0)²,  (3.13)

where S²_Ω̂ is the sample variance of Ω̂(x, t; Θ, Δ) and a_i (with i = 0, 1, ..., p) are the coefficients of the polynomial. Details pertaining to the estimation of S²_Ω̂ and the parameters a_i, as well as the optimization process, are discussed in Section 3.3.2.

3.3.2 Description of the Optimization Process

The constrained optimization problem in Eq. (3.4) should be specified and solved to find the optimal physical (Δ) and statistical (N_mc) resolutions. The objective function, given by Eq. (3.13), contains the unknown parameters S²_Ω̂ and a_i (with i = 0, 1, ..., p). These parameters can be obtained by estimating the overall error in Eq. (3.13) from preliminary simulations. The first component of Eq. (3.13) contains S²_Ω̂ as an unknown parameter, which can be estimated by multiplying the MC error, obtained from preliminary simulations, by N_mc. In the following examples, it will be shown that S²_Ω̂ does not display a significant dependence on Δ. The second component of Eq. (3.13) contains the coefficients of a polynomial, which can be determined by fitting a polynomial to the discretization error of the empirical results (obtained from simulations). Once the estimates for the unknown coefficients of Eq. (3.13) are obtained, we can address the optimization problem in Eq. (3.4). There are various optimization methods in the literature based on mathematical methods (nonlinear programming) or heuristic algorithms (e.g. [89, 90, 91]). For this work, we optimized the objective function by a nonlinear programming method using MATLAB's fmincon built-in function, which enables us to find the optimal decision variables Δ and N_mc. The optimal parameters resulting from the minimization of Eq. (3.13), constrained by a computational time budget, are denoted by Δ_opt and N_mc^opt.
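The chapter performs this minimization with MATLAB's fmincon; a Python analogue with SciPy is sketched below. All numbers here (S²_Ω̂, the polynomial coefficients a_i, the per-realization cost model) are hypothetical placeholders for quantities that would be calibrated from preliminary runs. One simplification is used: since the error in Eq. (3.13) decreases monotonically in N_mc, the budget constraint is active at the optimum, so N_mc can be eliminated via N_mc = B / T̃(Δ) and the search reduces to one dimension over Δ.

```python
import numpy as np
from scipy.optimize import minimize_scalar

# Hypothetical calibrated quantities (would come from preliminary simulations):
S2 = 0.01                        # sample variance of the output, S^2
a = np.array([0.0, 0.02, 0.1])   # a_0, a_1, a_2 of the fitted quadratic (p = 2)
B = 100.0                        # available computational budget, seconds
t_one = lambda d: 1e-3 / d**2    # assumed cost of one realization at resolution d

def error_on_budget(d):
    """Eq. (3.13) with the budget constraint binding: N_mc = B / t_one(d)."""
    n_mc = B / t_one(d)
    poly = a[0] + a[1] * d + a[2] * d**2   # discretization-error polynomial
    return S2 / n_mc + poly**2

res = minimize_scalar(error_on_budget, bounds=(0.05, 1.0), method="bounded")
d_opt = res.x
n_opt = int(B / t_one(d_opt))    # statistical resolution implied by the budget
```

The trade-off is visible in the objective: shrinking Δ lowers the polynomial (bias) term but raises the per-run cost, which, at a fixed budget, lowers N_mc and inflates the variance term, so an interior optimum exists.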
3.4 Evaluating the Performance of the Method

In order to test the performance of the optimization based on Eq. (3.13), we will compare Δ_opt and N_mc^opt with the optimal results obtained from a series of brute-force numerical simulations. The procedure for evaluating the performance of the method is given in the following steps:

1. This step consists of solving the numerical model for different combinations of Δ and N_mc. For each combination of Δ and N_mc, we obtain a computational cost and an overall error. Details of the error computation are given further below using Eq. (3.14). The outcome of this intensive numerical investigation is a data set from which we can estimate the empirical optimal points for a given computational budget, e.g. the computational time B in Eq. (3.4). These empirical optimal points are denoted by Δ_opt,* and N_mc^opt,*. This is the approach adopted by Leube et al. [84] to construct the cost-to-error surfaces.

2. The second step aims at determining the parameters S²_Ω̂ and a_i in Eq. (3.13). This can be achieved by estimating the error through numerical simulations for a few discretizations and MC realizations.

3. Once we have estimates for S²_Ω̂ and a_i, we can evaluate and minimize Eq. (3.13) subject to the same computational budget B as in step 1. This allows us to estimate the optimal physical and statistical resolutions, namely Δ_opt and N_mc^opt.

4. Lastly, we compare Δ_opt and N_mc^opt (step 3) with the results obtained from the brute-force numerical investigation (step 1), namely Δ_opt,* and N_mc^opt,*.

We point out that the optimal results obtained from the full brute-force numerical investigation (step 1) should be based on an error formulation consistent with the proposed error definition given in Eq. (3.13). In other words, the error metric should be similar to Eq. (3.13) so that the results from step 1 and step 3 are comparable.
To construct the error formulation to be used in step 1, we need to analyze both terms of Eq. (3.13). The first component of Eq. (3.13), i.e. the residual variance, is mainly associated with the error of the statistical resolution. The residual variance is inversely proportional to the number of MC realizations, i.e. by increasing N_mc, the statistical error is diminished [92]. The second component of Eq. (3.13) evaluates the effect of the physical discretization error. Thus, the error metric, which is the sum of the MC error and the discretization error, is proposed as

ε*(Δ, N_mc) = [(1/N_mc) Σ_{i=1}^{N_mc} (Ω̂^(i) − Ω̂_mean)²] + |Ω̂_mean − Ω̂_ref|²,  (3.14)

where Ω̂^(i) is the output of the numerical method at a specific point in space and time for the i-th realization of Θ. The sample mean over all realizations of the model is given by Ω̂_mean, and Ω̂_ref denotes the model output for the maximum N_mc and the finest resolution, i.e. small Δ. Ω̂_ref is considered a reference point and, for this work, we evaluate it numerically. Evaluating Eq. (3.14) for a series of N_mc and Δ values allows us to perform the tasks described in both step 1 and step 2. In step 2, Eq. (3.14) is used to estimate the S²_Ω̂ and a_i needed for the error estimation in Eq. (3.13). The main drawback of using Eq. (3.14) to execute step 2 is that, in hydrogeological applications, we rarely have Ω̂_ref (especially when one wishes to estimate, a priori, the optimal N_mc and Δ). However, from a practical perspective, Ω̂_ref can represent a measurement from the actual field site. For example, Ω̂_ref can be estimated through the site investigation and conceptual model development processes and/or when calibrating the model. In other words, Ω̂_ref in Eq. (3.14) can be obtained from hydraulic head measurements or breakthrough curves measured from tracer tests. In addition to the error metric in Eq. (3.14), the computational effort should be quantified.
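The metric in Eq. (3.14) is straightforward to compute once the MC outputs and a reference value are available. The sketch below uses made-up numbers purely for illustration: four hypothetical outputs Ω̂^(i) from a coarse grid and an assumed reference Ω̂_ref.

```python
import numpy as np

def empirical_error(outputs, omega_ref):
    """Brute-force error metric of Eq. (3.14): the MC variance term plus the
    squared deviation of the sample mean from a reference solution."""
    outputs = np.asarray(outputs, dtype=float)
    omega_mean = outputs.mean()
    mc_error = np.mean((outputs - omega_mean) ** 2)   # (1/N) * sum of squares
    discretization_error = abs(omega_mean - omega_ref) ** 2
    return mc_error + discretization_error

# Hypothetical numbers: four MC outputs on a coarse grid, reference value 0.90.
err = empirical_error([0.80, 0.85, 0.92, 0.87], 0.90)
```

Here the sample mean is 0.86, so the metric evaluates to 0.00185 (variance term) plus 0.0016 (squared bias against the reference), i.e. 0.00345.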
The computational budget should be computed as a function of Δ and N_mc to be used as the constraint function of the optimization problem in Eq. (3.4). There are different budget norms, such as units of elapsed time or floating point operations (FLOPS), that can be used to measure the computational budget required by one realization of the model at a given physical discretization. For the total budget, the computational cost associated with one realization is simply multiplied by the total number of realizations, bearing in mind that different realizations are assumed to be independent. This assumption was tested and verified for the cases analyzed in this work. Therefore, the computational budget can be estimated by the following equation:

T_tot(Δ, N_mc) = N_mc · T̃(Δ),  (3.15)

where T_tot is the total budget, which is a function of Δ and N_mc, and T̃(·) is the computational time associated with simulating one realization of the model at a specific spatial discretization.

3.5 Simplistic Illustration

To illustrate the methodology described in Sections 3.3 and 3.4, we consider a simple ordinary differential equation (ODE). In this example, 1D steady-state transport of a non-reactive solute in an idealized horizontal homogeneous soil column of dimension x ∈ [0, L] is considered. The main physical processes are advection and local-scale dispersion. Hence, the advection-dispersion equation is given by

d²C(x)/dx² − α dC(x)/dx = 0,  (3.16)

with α = v/D, where v is the Darcy-scale velocity and D is the local-scale dispersion coefficient. C(x) corresponds to the concentration of the solute at point x. For the purpose of illustration, we assume that v is a random parameter with a uniform distribution bounded by 1 and 2 m/d, i.e. U[1, 2] m/d. The concentrations at x = 0 and x = L are equal to C_0 and C_L, respectively.
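A minimal version of this numerical experiment can be reproduced with a central-difference discretization of Eq. (3.16) and MC sampling of v, using the parameter values of this example (L = 10 m, C_0 = 1, C_L = 0, x_0 = 7.5 m, D = 1 m²/d). The code is an assumed implementation, not the dissertation's; with n = 40 cells the grid Peclet number α·Δ stays below 2 for all sampled velocities, so the central scheme is stable.

```python
import numpy as np

L, C0, CL, x0, D = 10.0, 1.0, 0.0, 7.5, 1.0

def c_at_x0(v, n):
    """Central-difference FDM solution of D*C'' - v*C' = 0 with C(0)=C0 and
    C(L)=CL on n uniform cells, evaluated at x0."""
    alpha, dx = v / D, L / n
    x = np.linspace(0.0, L, n + 1)
    A = np.zeros((n + 1, n + 1))
    b = np.zeros(n + 1)
    A[0, 0], b[0] = 1.0, C0          # Dirichlet boundary at x = 0
    A[n, n], b[n] = 1.0, CL          # Dirichlet boundary at x = L
    for i in range(1, n):            # interior nodes: central differences
        A[i, i - 1] = 1.0 / dx**2 + alpha / (2.0 * dx)
        A[i, i] = -2.0 / dx**2
        A[i, i + 1] = 1.0 / dx**2 - alpha / (2.0 * dx)
    return float(np.interp(x0, x, np.linalg.solve(A, b)))

# MC loop over the uncertain velocity v ~ U[1, 2] m/d
rng = np.random.default_rng(1)
samples = [c_at_x0(rng.uniform(1.0, 2.0), 40) for _ in range(200)]
mean_c, var_c = np.mean(samples), np.var(samples)
```

This ODE also admits the closed form C(x) = C_0 + (C_L − C_0)(e^{αx} − 1)/(e^{αL} − 1), which is what makes the example convenient for benchmarking the numerical error against a known reference (for v = 1 m/d, C(7.5) ≈ 0.918).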
The desired output (Ω) of this example is the concentration of the plume at a specific point x_0, denoted by C_{x0} ≡ C(x_0). The Finite Difference Method (FDM) is employed to solve Eq. (3.16) numerically within the MC framework to capture the effect of the uncertainty in v. Table 3.1 summarizes the relevant parameters associated with this example.

Table 3.1: Parameters for the ODE example

Parameter | Value | Units
L | 10 | m
C_0 | 1 | ppm
C_L | 0 | ppm
x_0 | 7.5 | m
D | 1 | m²/d
v | U[1, 2] | m/d
Δ | 0.66, 0.5, 0.4, 0.33, 0.28, 0.25, 0.22, 0.20 | m
N_mc | 10², 5×10², 10³, 5×10³, 10⁴ | -

Following the procedure described in Section 3.4, we have executed a large number of simulations with different grid resolutions (Δ = L/n, where n is the number of grid blocks) and MC realizations (N_mc). For each simulation, we recorded the computational error, using Eq. (3.14), and the computational time (step 1). The results of these simulations are illustrated in Figs. 3.2 and 3.3.

Figure 3.2: (a) The total error (Eq. (3.14)) versus the number of MC realizations (statistical resolution) for different spatial discretizations Δ. (b) Evolution of the total error (Eq. (3.14)) versus the spatial discretization for given numbers of MC realizations. (c) MC error (first term on the RHS of Eq.
(3.14)) versus the number of MC realizations for different spatial resolutions. (d) Discretization error (second term on the RHS of Eq. (3.14)) versus the spatial discretization for different numbers of MC realizations.

Figure 3.3: Error and computational time for different combinations of N_mc and Δ (spatial resolution L/Δ versus number of MC realizations, with error contour lines, time contour lines, and optimal points). The optimal points were computed by solving Eq. (3.4) subject to the following computational budgets (time): T_tot = [5, 10, 50, 100, 250] s. It can be observed that the empirical optimal points found by brute-force simulations and the optimal points achieved by solving Eq. (3.4) are almost identical.

The total error evaluated with Eq. (3.14) for different combinations of Δ and N_mc is shown in Figs. 3.2a-b (step 1). It can be observed that the total error diminishes as the statistical and spatial resolutions increase. The dependence of the MC error (first component of the RHS of Eq. (3.14)) on N_mc for various values of Δ is depicted in Fig. 3.2c. It is shown that the stochastic errors are nearly identical for all Δ values explored. This result indicates that, for this class of problem, the statistical error can be assumed to be independent of Δ. In order to estimate S²_Ω̂ in Eq. (3.13), the statistical error is multiplied by N_mc. The results show that S²_Ω̂ is almost constant (approximately equal to 0.01) and does not change with N_mc. As a result, we can use the set of simulations with the minimum computational demand (minimum N_mc and maximum Δ) to estimate S²_Ω̂ (step 2). The discretization error is computed using the second term on the RHS of Eq. (3.14), and the results are shown in Fig. 3.2d.
It can be observed that the trend of the discretization error is almost the same for any value of the statistical resolution. The evolution of the error is in agreement with the classic results from discretization error analysis [88]. For this problem, the discretization error is almost independent of N_mc (compare the curves for N_mc = 10² and 10⁴ in Fig. 3.2d). As mentioned in Section 3.3.1, a polynomial can be fitted to the discretization errors; in our work, we opted for a quadratic polynomial. Based on our simulations, the fitted polynomials obtained from different N_mc are almost identical (see Fig. 3.2d), thus the curve achieved for N_mc = 500 is considered as a representative for all cases. Therefore, the quadratic polynomial fitted with N_mc = 500 is used in the objective function Eq. (3.13) (i.e. p = 2).

By following the methodology described in Section 3.4, the parameters in Eq. (3.13), namely S²_Ω̂ and a_i (for i = 0, 1, 2), can be estimated (step 2). As a consequence, Eq. (3.13) can be optimized with respect to the computational resource constraints. The optimal solutions for different computational budget constraints are shown in Fig. 3.3 (step 3). In this figure we have also included the contour plots of the computational time and error empirically obtained from the results of the aforementioned simulations for all cases of N_mc and Δ (step 1). This figure verifies the proper performance of the methodology, since the optimal points found by solving the optimization problem described in step 3, Δ^opt and N_mc^opt, are close to the minimum error contour lines restricted to the time constraints (i.e. the Δ^{opt,*} and N_mc^{opt,*} of step 1 are located where the dashed blue lines cross the solid black lines).

Table 3.2: Comparison between the actual and estimated error for the ODE example

  N_mc   Δ (m)   ε           ε*
  10²    L/15    2.47×10⁻⁴   2.68×10⁻⁴
  10³    L/40    1.35×10⁻⁵   1.54×10⁻⁵
  10⁴    L/30    1.06×10⁻⁵   9.34×10⁻⁶

Next, we compare the error, ε*, obtained from the empirical results and Eq.
(3.14), with the estimated error, ε, calculated by Eq. (3.13), to quantitatively investigate the accuracy of the proposed overall error equation, Eq. (3.13). To show the error comparison, three different physical and statistical resolutions are chosen; the actual error (ε*) and the estimated error (ε) are presented in Table 3.2. As illustrated in Table 3.2, the values of ε and ε* are similar.

3.6 Heterogeneous Groundwater Flow and Transport Problem

In this section, the proposed methodology is applied to a heterogeneous groundwater flow model. For this illustration, we consider a regularly spaced grid, i.e. Δ_i ≡ Δ for i = 1, 2, …, J. As done in the previous section, the accuracy of the method is verified by comparing the optimal results from Eq. (3.4) (step 3) with the optimal results from the brute-force numerical simulations (step 1). For this section, we have selected a flow and transport configuration widely used in the stochastic hydrogeological literature. Two distinct transport scenarios are investigated and their details are presented in the remaining parts of this section.

3.6.1 Physical Formulation, Model Setup and Implementation

A two-dimensional aquifer with a steady-state incompressible horizontal flow field in the absence of sinks and sources is considered, with a Cartesian coordinate system x = (x_1, x_2). The transmissivity (T) is spatially variable, i.e. T = T(x), and isotropic. The porosity (φ) of the geological formation is considered uniform. Flow is governed by the following differential equation:

∇ · [T(x) ∇H(x)] = 0,    (3.17)

with no-flow boundary conditions at the transverse boundaries and a constant head gradient imposed at the longitudinal boundaries of the flow domain. This physical set-up was selected for the purpose of illustration; different types of boundary conditions (deterministic or random) can be incorporated. The hydraulic head is denoted by H.
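In the dissertation, Eq. (3.17) is solved with MODFLOW. Purely as an illustration of the discrete problem, the sketch below solves ∇·(T∇H) = 0 on a small block-centred grid with harmonic-mean inter-block transmissivities (the standard finite-difference choice), fixed heads on the longitudinal boundaries and no-flow transverse boundaries; the Gauss-Seidel iteration and the function name are assumptions made for brevity, not part of the actual workflow.

```python
import numpy as np

def solve_head(T, h_left=1.0, h_right=0.0, tol=1e-8, max_iter=20000):
    """Gauss-Seidel solution of div(T grad H) = 0 on a rectangular grid:
    fixed heads on the left/right (longitudinal) boundaries and
    no-flow top/bottom (transverse) boundaries."""
    ny, nx = T.shape
    H = np.ones((ny, 1)) * np.linspace(h_left, h_right, nx)  # initial guess
    harm = lambda a, b: 2.0 * a * b / (a + b)  # harmonic inter-block T
    for _ in range(max_iter):
        H_old = H.copy()
        for i in range(ny):
            for j in range(1, nx - 1):
                tw = harm(T[i, j], T[i, j - 1])
                te = harm(T[i, j], T[i, j + 1])
                tn = harm(T[i, j], T[i - 1, j]) if i > 0 else 0.0       # no-flow
                ts = harm(T[i, j], T[i + 1, j]) if i < ny - 1 else 0.0  # no-flow
                num = tw * H[i, j - 1] + te * H[i, j + 1]
                if i > 0:
                    num += tn * H[i - 1, j]
                if i < ny - 1:
                    num += ts * H[i + 1, j]
                H[i, j] = num / (tw + te + tn + ts)
        if np.abs(H - H_old).max() < tol:
            break
    return H
```

Once H is known, the Darcy velocities used by the transport step follow from differencing H, which is how the metric τ_p ultimately inherits the uncertainty of T(x).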
For this example, the environmental performance metric of interest will be the time of the peak concentration (τ_p) at a control plane downstream from the injection source. Transport of an instantaneously released dissolved non-reactive tracer along a line source of transverse dimension (L_2) is assumed to be governed by the advection-dispersion equation:

∂C(x, t)/∂t + u(x) · ∇C(x, t) = ∇ · (D_d ∇C(x, t)),    (3.18)

where D_d is the local-scale dispersion tensor, u(x) is the Darcy-scale velocity field and C(x, t) is the resident concentration of the plume. The velocity u(x) is determined via Darcy's law once the solution of Eq. (3.17) is obtained.

For this investigation, the major source of uncertainty stems from the spatial distribution of T. The log-transmissivity field, Y = ln T, is modeled as a statistically stationary Random Space Function (RSF) [5] with variance σ², correlation length λ and covariance C_Y. We consider an isotropic exponential covariance model for the upcoming illustrations. To cast Eqs. (3.17) and (3.18) within a MC framework, we need to randomly generate multiple (N_mc) realizations of Y. For each equiprobable realization of Y, we solve the flow and transport problems and obtain the statistics of τ_p. To randomly generate the spatially variable Y, we employ SGeMS [93]. Similar to the simulations in Chapter 2, flow is solved using MODFLOW [94] and solute transport is modeled through the random walk particle tracking code (RW3D) documented in [95, 96, 62, 64]. The concentration field within the random walk particle tracking framework is obtained using the methodology described in [63]. We performed extensive MC simulations using MODFLOW in a 200 × 200 m² square domain, in which the spatial resolutions in both directions are identical. Table 3.3 summarizes the flow, transport and geostatistical parameters associated with this computational example. The time associated with the peak concentration is evaluated at the control plane located at x_1 = 100 m.
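The dissertation generates the Y = ln T realizations with SGeMS. The sketch below shows the same geostatistical idea at toy scale: it draws a stationary Gaussian field with the isotropic exponential covariance C_Y(r) = σ² exp(−r/λ) by Cholesky factorization of the covariance matrix. This exact approach is only practical for small grids, and the function name and defaults (taken from Table 3.3) are illustrative assumptions.

```python
import numpy as np

def exponential_logT_field(n=24, domain=200.0, var=0.8, lam=6.667,
                           KG=1.0, seed=0):
    """One realization of T(x) = exp(Y) on an n x n grid, where Y = ln T
    is stationary Gaussian with mean ln(KG) and covariance
    C_Y(r) = var * exp(-r / lam)."""
    x = (np.arange(n) + 0.5) * domain / n            # cell-centre coordinates
    X, Yg = np.meshgrid(x, x)
    pts = np.column_stack([X.ravel(), Yg.ravel()])
    r = np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=-1)
    C = var * np.exp(-r / lam)                       # exponential covariance
    L = np.linalg.cholesky(C + 1e-10 * np.eye(n * n))  # jitter for stability
    rng = np.random.default_rng(seed)
    Y = (L @ rng.standard_normal(n * n)).reshape(n, n) + np.log(KG)
    return np.exp(Y)
```

Each call is one equiprobable realization; looping over seeds gives the N_mc ensemble on which flow and transport are solved.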
In these scenarios, the contaminant is released from a vertical line segment source located at x_1 = 25 m with length L_2. The impact of L_2 on the optimal resource allocation will be addressed in Section 3.6.3.

Table 3.3: Parameters of the flow and transport in the heterogeneous field example

  Parameter             Value                                  Units
  σ²                    0.8                                    —
  λ                     6.667                                  m
  K_G                   1                                      m/d
  φ                     0.3                                    —
  D_d,11                0.1                                    m²/d
  D_d,22                0.02                                   m²/d
  Hydraulic gradient    0.07                                   —
  Number of particles   30000                                  —
  Δ                     λ/3, λ/6, λ/10, λ/20                   m
  N_mc                  10, 50, 100, 500, 1000, 5000, 10000    —

3.6.2 Computational Results and Method Performance

The upcoming results are based on a contaminant line segment source of dimension L_2 = 2.5λ. Fig. 3.4 depicts the overall empirical error (Eq. (3.14)) and its two components (MC and spatial errors) for various combinations of statistical and physical resolutions (see step 1 in Section 3.4).

[Figure 3.4: four panels, (a)-(d); axes: number of MC realizations or spatial discretization size Δ versus error, for Δ = λ/3 to λ/20 and N_mc = 50 to 10000.]

Figure 3.4: (a) The total error (Eq. (3.14)) versus the number of MC realizations (statistical resolutions) for different spatial discretizations Δ. (b) Evolution of the total error (Eq. (3.14)) versus spatial discretization for given numbers of MC realizations. (c) MC error (first term of the RHS of Eq. (3.14)) versus the number of MC realizations for different spatial resolutions. (d) Discretization error (second term of the RHS of Eq. (3.14)) versus the spatial discretization for different numbers of MC realizations.
[Figure 3.5: contour map; axes: spatial resolution (λ/Δ) versus number of MC realizations; error contour lines, time contour lines, and optimal points are shown.]

Figure 3.5: Error and computational time for different combinations of N_mc and Δ. The optimal points were computed by solving Eq. (3.4) subject to the following computational budgets (times): T_tot = [10⁵, 2×10⁵, 3×10⁵] s. It can be observed that the empirical optimal points found by brute-force simulations and the optimal points achieved by solving Eq. (3.4) are almost identical.

As shown in Figs. 3.4b and 3.4d, the discretization error and the total error increase as the grid block is coarsened. As expected, the error decreases for a higher statistical resolution (see Figs. 3.4a and 3.4c). In addition, Fig. 3.4d reveals that the statistical and discretization errors are not independent. The dependency between the errors is explicitly taken into account in Eq. (3.12).

Next, we utilize the results in Figs. 3.4c and 3.4d to determine the coefficients S²_Ω̂ and a_i. These are required for the evaluation of the objective function (3.13); see step 2 in Section 3.4. As depicted in Fig. 3.4c, the curves obtained from all physical resolutions are quite similar. Thus, S²_Ω̂ can be evaluated based on simulations with a coarse numerical grid block. The coefficients a_i, in the second term of the RHS of Eq. (3.13), can be estimated by fitting a polynomial curve through the results presented in Fig. 3.4d. For this illustration, we fit the polynomial to the results computed for N_mc = 500, since additional N_mc leads to only an incremental decrease of the discretization error (see Fig. 3.4d). As opposed to the ODE example presented in Section 3.5, where the discretization error had a clear quadratic growth independent of the number of MC runs (see Fig. 3.2d), the results shown in Fig.
3.4d display a dependency on N_mc. This dependency is reduced for N_mc ≳ 500 (Fig. 3.4d). Notice that the error displayed in Fig. 3.4d accumulates the discretization error from both the MODFLOW software (based on the FDM) and the random walk particle tracking code [95, 96, 62]. The latter contains errors associated with the time step, velocity interpolation and smoothing of the dispersion tensor [95, 97]. We believe this to be the cause of the complex dynamics observed in Fig. 3.4d.

The overall error and the computational time for all values of Δ and N_mc are depicted in Fig. 3.5. This figure reflects the existence of a trade-off between the overall error and the computational budget. It also illustrates the comparison of the empirical results and the results obtained from optimizing the objective function subject to different resource constraints (steps 1 and 3 in Section 3.4). For each time constraint, the optimal point (Δ^opt and N_mc^opt) is close to the minimum error obtained empirically through extensive simulations (Δ^{opt,*} and N_mc^{opt,*}). This observation implies that the proposed error metric and framework provide an acceptable estimation of the optimal decision variables Δ and N_mc for a given limited computational budget.

Table 3.4 reports the actual error (ε*) and the estimated error (ε) for three different statistical and physical resolutions to investigate the accuracy of the proposed overall error estimation (Eq. (3.13)). As shown in Table 3.4, Eq. (3.13) provides an appropriate estimation of the actual error, since ε* and ε are almost identical for any resolution.

Table 3.4: Comparison between the actual and the estimated error in the heterogeneous flow and transport problem

  N_mc   Δ      ε       ε*
  10²    λ/20   963.8   1008.1
  10³    λ/10   128.9   130.8
  10⁴    λ/6    87.6    84.6

Next, we evaluate the performance of the methodology with commonly used grid-size values suggested by the literature, independent of the number of MC runs. Ababou et al.
recommend using a grid block value Δ = λ/4 to capture the effects of heterogeneity in groundwater flow and transport [15]. In addition, modelers tend to perform stochastic simulations with N_mc on the order of 10² to 10³. Running our flow and transport simulation with these commonly used parameters (Δ = λ/4, N_mc = 10³), we obtain an error of ε* = 290 and a computational cost of T_tot = 33960 s. Using T_tot as a constraint, i.e. B = T_tot, we adopt the proposed method defined in Section 3.4 (Eq. (3.13)) and obtain the following near-optimal values for the grid size and number of MC realizations: Δ = λ/6.65 and N_mc = 680, with ε = 206. Note that our method provides an overall error (ε = 206) smaller than the one estimated from the recommended Δ and N_mc values from the literature (about a 29% decrease in the total error is observed). We believe that this difference will be enhanced when dealing with higher levels of aquifer heterogeneity and model complexity. Furthermore, the near-optimum results for Δ^opt and N_mc^opt favor a finer grid resolution over a higher statistical resolution for the given time constraint adopted (B = T_tot).

3.6.3 Plume-Scale and Ergodicity

Next, we analyze the impact of the source dimension, L_2, on the optimal statistical and physical resolutions. We performed simulations with the same parameters as the previous scenario (Table 3.3) on the same Y fields generated by SGeMS in Section 3.6.2. The main distinction is the increase of the contaminant line segment source from L_2 = 2.5λ to L_2 = 9λ. By repeating the analysis, we obtain the new optimal resolutions. Figure 3.6 displays the computational time contour map (solid lines) as a function of the grid resolution and the number of Monte Carlo realizations, as well as the optimal points for L_2 = 2.5λ (red circles) and L_2 = 9λ (blue squares). The deviation of the optimal points in this scenario (L_2 = 9λ) from the optimal resolutions obtained from the previous scenario (L_2 = 2.5λ) is illustrated in Fig. 3.6.
[Figure 3.6: contour map; axes: spatial resolution (λ/Δ) versus number of MC realizations; optimal points for the two source-zone scenarios are shown.]

Figure 3.6: Optimal points of the two scenarios with different source zone dimensions, depicted subject to the 3 different time constraints: T_tot = [10⁵, 2×10⁵, 3×10⁵] s. Blue square symbols correspond to L_2 = 9λ, while red circle symbols are for L_2 = 2.5λ. Solid lines correspond to the computational cost.

As depicted in Fig. 3.6, by increasing the size of the contaminant source relative to λ, the prediction error for τ_p becomes more sensitive to the physical resolution than to the number of MC simulations. Fig. 3.6 shows that the optimal path (the path that includes the optimal points) has a smaller slope in the scenario with the larger plume. As a consequence, the model output is more sensitive to the physical discretization when compared to the previous scenario investigated (i.e. with the smaller source zone, L_2 = 2.5λ).

This shift in the optimal allocation is due to plume ergodicity. The results in Fig. 3.6 reveal that under near-ergodic conditions the model response (τ_p) is less sensitive to the statistical resolution, as opposed to Δ. This result is in agreement with stochastic hydrogeological theory, i.e. near-ergodic plumes are less prone to uncertainty [5, 98]. Investigating the sensitivity of the optimal resolution to the plume scale can assist decision makers in the better allocation of computational resources. A similar analysis was performed in the context of goal-oriented uncertainty reduction [13].

3.7 Summary

We examined the impact of both physical and stochastic resolutions on the overall error approximation of the model response subject to a computational budget constraint. In general, hydrogeological predictions of interest are large-scale with excessive computational complexity and uncertain characteristics.
A wide variety of numerical schemes combined with the Monte Carlo framework are widely used to handle the complexity associated with these models. However, in order to maintain accurate predictions, one needs to wisely allocate limited computational resources. In this chapter, by considering the trade-off between two conflicting objectives, i.e. the available computational resource and the accuracy of predictions, we proposed a framework based on bias-variance tradeoff analysis to determine the optimal physical and statistical resolutions that minimize the error of predictions for a given computational resource. Our objective is to illustrate the existence of an appropriate choice of physical and stochastic resolutions that accurately estimates the model's prediction with respect to a given computational resource constraint. To achieve this, we derived an error expression for the model response that accounts for both sources of error as well as their statistical dependency.

We applied the method to a series of examples in order to test its performance. Results show that the optimal points, achieved through the model error expression, are close to the actual optimal points obtained by extensive brute-force numerical simulation (i.e. examining all candidate resolutions to find the minimal error under the given time constraint). In addition, our analyses show that various factors, such as the physics of the problem, the numerical scheme and the available computational resources, affect the optimal resolutions and the allocation of computational resources. In particular, the optimal resolutions depend highly on the available computational budget. The initial dimension of the contaminant source plays a fundamental role in defining the allocation of computational resources. As the size of the contaminant source increases with respect to the correlation scale, the model output becomes more sensitive to the physical discretization and the significance of the statistical resolution is diminished.
This is an outcome of plume ergodicity: larger plumes are less prone to uncertainty since they sample more of the variability of the conductivity field [5, 98, 99].

Extending a similar analysis to investigate the impact of different error metrics on the allocation of computational resources should be considered in future studies. In addition, applying the ideas discussed in this chapter to a real case study would be a worthwhile future research endeavor. An interesting question to be addressed is how the tradeoffs for spatial and statistical resolutions behave for three-dimensional groundwater flows and for geological formations with both high heterogeneity and connectivity. Moreover, the results should be expanded to consider the tradeoffs for both spatial and temporal resolutions in the presence of uncertainty, and to investigate the impact of distinct boundary conditions on the final optimal design.

It should be noted that the optimal resource allocation is goal-oriented and depends on the level of statistics being sought. If one is interested in evaluating extreme values (e.g. rare events in contamination [11, 87]), the error metric needs to be modified appropriately. Capturing such rare events would require many MC runs and a refined numerical grid to accurately predict the peak concentration. Therefore, the target upon which decisions will be made plays a significant role in developing the error expression and, consequently, the resource allocation. As discussed in Section 3.4, the proposed approach requires determining, a priori, the reference solution. To avoid this problem, field data from preliminary site characterization (e.g. [100]) or, under certain conditions, analytical solutions or surrogate approximations (e.g. [83, 101]) can be utilized to evaluate the reference solution. Additional investigation should be carried out to propose an error metric without the requirement of having the reference point.
Moreover, spatial refinement depends on the smoothness of the flow field. In the presence of high velocity gradients, such as in the vicinity of pumping or injection wells and fractures, the discretization may need to become adaptive. Under these conditions, a higher physical resolution might be required, as opposed to a higher statistical resolution, since a fraction of the flow is controlled by a pumping or injection well. Hence, new strategies should be explored to determine the optimal discretization based on the physical setting of the problem (e.g. distinct boundary conditions, sinks and sources) and the geostatistical characteristics. The results shown in this study are limited to our simulation configurations and the numerical method adopted; however, the methodology can be applied and expanded to models of other environmental flows that entail numerical schemes and uncertainty.

We highlight that the computational effort associated with the physical discretization can be alleviated through upscaling techniques. For example, the computational burdens can be relieved by developing block-scale conductivity and block-scale dispersion coefficients [79, 80, 81, 102, 103]. Therefore, to propose an optimal framework for stochastic hydrogeological modeling, an efficient upscaling method is required that decreases computational burdens while maintaining the desired accuracy of model predictions.

Chapter 4

Upscaling of Solute Transport in Disordered Porous Media by Wavelet Transformations¹

Up to here, we have shown the complexity of the computational modeling of stochastic hydrogeological systems (Chapter 2) and proposed an optimal framework to distribute computational resources effectively (Chapter 3). In this chapter, we discuss how to reduce the computational costs associated with high-resolution numerical models using an efficient upscaling technique.
Our aim is to reduce the computational cost associated with fine numerical grids for simulating flow and transport processes.

4.1 Introduction

It is of fundamental and practical importance to incorporate the spatial heterogeneity of porous geological formations in models of flow and transport in such media [104, 5, 18]. At the field scale (e.g. on the order of 10²-10³ m), subsurface properties, such as the permeability, vary by many orders of magnitude across multiple length scales (e.g. from 1 m to 10³ m or larger) [104, 5, 18].

¹ The material presented in this chapter is published in Advances in Water Resources (Moslehi, de Barros, Ebrahimi, Sahimi, 2016).

It is well recognized that the spatial fluctuations of the permeability field, i.e. the many orders of magnitude difference between permeability values, have a significant effect on the spreading rates of solute plumes, as well as on estimates of their early or late arrival times. Thus, neglecting the effect of subsurface heterogeneity, and in particular the spatial distribution of the permeability, in numerical simulation leads to erroneous predictions of solute transport, which have severe consequences for health risk assessment [12, 105], the likelihood of extreme events [39, 34, 27], and reactive mixing [3, 106].

In general, to obtain accurate predictions for solute mixing at the field scale and to calculate the properties that characterize the process, such as the distribution of travel times and the dispersion coefficients, numerical simulations of flow and transport in large-scale aquifers with high-resolution computational grids are required. Simulating hydrogeological systems with high-resolution grids demands substantial computational resources, particularly when used within a Monte Carlo (MC) framework. Consequently, it is of considerable practical importance to allocate computational budgets efficiently to reduce simulation costs [19, 20, 21].
To alleviate the computational burden, upscaling methods are used. Upscaling of flow and transport in heterogeneous porous formations has been studied intensively for several decades. One may divide the existing methods into those that are based on volume [107, 108, 109, 110], ensemble [111, 112], or stochastic averaging [113, 114, 115, 80, 116]. In the context of stochastic averaging, many analytical methods have been developed to evaluate the effective transport properties in a coarse-scale heterogeneous model, including analytical perturbation [113, 117, 81, 118] and self-consistent [104, 119, 120] methods. Alternatively, numerical simulations [121, 122, 123] and renormalization group transformations [124, 125, 126] have been employed to compute the effective conductivity of field-scale porous media. Such works were reviewed by Renard and de Marsily [127], Wen and Gómez-Hernández [79], and Sahimi [18].

In general, most of the coarsening process is carried out by homogenizing the aquifer's model through, for example, the hydraulic conductivity or the permeability attributed to the blocks of a highly-resolved computational grid, such that the upscaled permeability or conductivity field has the same symmetries as those of the fine-scale field, as defined earlier [128, 129, 130]. The size of the upscaled grid blocks is determined by considering the available computational resources. A drawback associated with the homogenization techniques is that they coarsen the subsurface domain uniformly, which often leads to large errors in the predictions of the flow and transport properties, especially in the presence of fractured regions or sinks and sources. This is because many homogenization methods average out the effects of extreme events, such as fast flow paths or large flow barriers. To address this issue, Durlofsky et al.
[131] proposed to first scale up the permeability field by a homogenization technique and then identify the high-velocity regions by solving for single-phase flow in the homogenized grid. Such regions are then discretized back to the fine scale in order to capture the small-scale variability that the field contains. Although the issue of uniform blocks is addressed, the single-phase flow must still be computed prior to the upscaling process.

An alternative method for upscaling is based on the wavelet transforms (WTs) [132, 133, 134, 102, 135, 136, 137, 66], which have been used to upscale heterogeneous porous media by coarsening the permeability or hydraulic conductivity field. The method upscales the high-resolution geological model of a porous formation non-uniformly, which preserves the important information on the spatial distribution of the permeability field at all the relevant length scales, but coarsens those parts of the computational grid that contribute little to the flow field. Thus, the number of grid blocks in the computational grid and, hence, the number of flow and transport equations to be solved are reduced drastically without sacrificing any crucial information about the conductivity or permeability field. Use of the WTs in various applications has been the subject of intensive research over the past 25 years [132, 135, 138, 139, 140, 141], including the simulation of flow and transport in large-scale porous media, such as oil reservoirs [132, 137, 66, 142, 143, 144, 145].

The focus of this part of the thesis is to extend the upscaling method by the WTs, introduced originally by Mehrabi and Sahimi [133] and Ebrahimi and Sahimi [134], to the upscaling of solute transport in disordered porous media. Their work focused on upscaling a permeability or conductivity field. In contrast, our primary goal is to investigate the effect of multiple features, i.e.
structural and geometrical parameters, in the conductivity field on the overall transport behavior, such as the arrival time, peak concentration and spatial moments, and to demonstrate how the geostatistical parameters characterizing the heterogeneous fields influence the performance of the upscaling by wavelet transformations. Furthermore, we use the upscaled field to reconstruct the second central spatial moment of the plume, which represents global features of the transport process. Such spatial moment analysis has not been carried out in the previous works related to the WTs. We show that upscaling by the WTs efficiently coarsens the computational grid for simulating solute transport in subsurface domains, and reduces substantially the number of grid blocks. To demonstrate the accuracy of the upscaling procedure via the WTs, flow and transport are simulated in both the fine-resolution and the upscaled computational grids, and the results are compared.

The rest of this chapter is structured as follows. In Section 4.2 we formulate the class of problem under investigation. Section 4.3 describes the details pertaining to the WTs and the methodology used for upscaling the hydraulic conductivity or permeability field. We then present the details of the numerical simulations in Section 4.4. The results are presented and discussed in Section 4.5, where we test systematically the performance of the WT upscaling for a variety of disordered porous formations. The last section summarizes the chapter.

4.2 Problem Statement

Similar to the previous chapters, we consider fully-saturated steady-state flow of an incompressible Newtonian fluid through a heterogeneous geological formation. The hydraulic conductivity K(x) is spatially distributed, where x represents the Cartesian coordinate system.
We assume that the flow is slow enough that the fluid velocity v(x) follows Darcy's law,

v(x) = −[K(x)/P] ∇ϕ(x),    (4.1)

where ϕ(x) is the hydraulic head, and P is the porosity of the medium, assumed to be uniform. As usual, the hydraulic head is computed by solving the flow equation

∇ · [K(x) ∇ϕ(x)] = 0.    (4.2)

A constant hydraulic gradient in the mean sense is imposed on the system such that, on average, the flow is uniform along the longitudinal direction. An inert solute is released, and is advected and dispersed by the fluctuating velocity field and by diffusion. The spatiotemporal evolution of the concentration field is governed by the advection-dispersion equation,

∂C(x, t)/∂t + v(x) · ∇C(x, t) = ∇ · [D_o(x) ∇C(x, t)],    (4.3)

where C(x, t) is the solute concentration at point x at time t, and D_o(x) is the local dispersion tensor. Alternatively, transport can be formulated in terms of the Langevin equation [146],

dx(t|a)/dt = v[x(t)] + √(2 D_o) η(t),    (4.4)

where x(t) is the position of a particle, a is the initial location of the solute particle (i.e. x(t = 0|a) = a) and η(t) is a Gaussian white noise characterized by zero mean and covariance ⟨η_i(t) η_j(t′)⟩ = δ_ij δ(t − t′), where δ_ij is the Kronecker delta, δ(t) is the Dirac delta distribution and ⟨·⟩ indicates the average over all noise realizations (i.e. over η(t)). In order to obtain an accurate solution for C(x, t), a highly-resolved (i.e. fine-scale) representation of K(x) is generated, which is then upscaled while maintaining the key features of its variations that produce the concentration field.

4.3 Methodology

In this section, we briefly describe the WTs, and then explain the upscaling method used throughout this chapter.

4.3.1 Wavelet Transformations

For an excellent introduction to the WTs and their properties see Nievergelt [141].
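Equation (4.4) is precisely what a random walk particle tracking code integrates. The minimal Euler-Maruyama sketch below (with an isotropic, constant D_o and a user-supplied velocity function — both simplifying assumptions; the thesis uses the RW3D code) moves an ensemble of particles whose histogram approximates C(x, t):

```python
import numpy as np

def random_walk(v_func, D0, x0, dt, n_steps, n_particles, seed=0):
    """Euler-Maruyama integration of the Langevin equation (4.4):
    x_{k+1} = x_k + v(x_k) dt + sqrt(2 D0 dt) * xi,  xi ~ N(0, I).
    v_func maps an (n, 2) array of positions to an (n, 2) velocity array."""
    rng = np.random.default_rng(seed)
    x = np.tile(np.asarray(x0, dtype=float), (n_particles, 1))
    for _ in range(n_steps):
        x += v_func(x) * dt + np.sqrt(2.0 * D0 * dt) * rng.standard_normal(x.shape)
    return x

# Sanity check for a uniform velocity: after time t = n_steps * dt the plume
# centroid should sit near x0 + v t and each coordinate's variance near 2 D0 t.
v_unif = lambda x: np.tile(np.array([1.0, 0.0]), (x.shape[0], 1))
cloud = random_walk(v_unif, D0=0.1, x0=(0.0, 0.0), dt=0.01,
                    n_steps=100, n_particles=4000)
```

The spatial moments of `cloud` are exactly the plume moments analyzed later in the chapter.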
We define the WT of the spatially-varying permeability or hydraulic conductivity K(x), also referred to as the wavelet detail coefficient, by

D(a, b) = ∫_{−∞}^{+∞} K(x) ψ_{ab}(x) dx = (1/√a) ∫_{−∞}^{+∞} K(x) ψ[(x − b)/a] dx,    (4.5)

where ψ(x) is usually called the mother wavelet. The choice of ψ(x) depends on the intended application and, in fact, the possibility of developing separate wavelets for each application is a great advantage of the WT. Equation (4.5) makes it clear that a > 0 is a rescaling parameter, whereas b represents translation of the wavelet, and that using the WT of K(x) enables one to analyze the spatial distribution of the permeability at increasingly coarser (a > 1) or finer (a < 1) length scales. As is well known, D(a, b) contains information on the differences between two approximations of K(x) at two successive length scales. Information about K(x) at a fixed length scale is instead contained in another wavelet coefficient, the wavelet approximate or wavelet scale coefficient, defined by

S(a, b) = ∫_{−∞}^{+∞} φ_{ab}(x) K(x) dx,    (4.6)

where φ_{ab}(x) is called the wavelet scaling function and is orthogonal to ψ(x), with its definition being similar to that of ψ_{ab}(x). An important property of the WTs, which is most useful for upscaling, is that they are recursive; that is, they can be applied in succession to any set of properties produced by using the wavelets, to generate another level of averages and another level of details.

Although we simulate and upscale two-dimensional (2D) computational grids in this study, the WTs can be applied in 3D with equal facility [147]. We utilize a square grid, to each square block of which a permeability (or hydraulic conductivity) K is attributed. A one-level discrete WT is then applied to the permeability field. This means that we upscale the computational grid by a factor of 2 by joining neighboring blocks, except that the upscaling is not uniform.
If the center of a block is at x = (i_1, i_2), we associate with the WT of the permeabilities of that block a set of four wavelet coefficients, three of which are wavelet detail coefficients, with the fourth being the wavelet scale coefficient. More precisely, we compute

D^(d)_j(i_1, i_2) = ∫_Ω K(x, y) ψ^(d)_{j,i_1,i_2}(x, y) dx dy,  (4.7)

and

S_j(i_1, i_2) = ∫_Ω K(x, y) φ_{j,i_1,i_2}(x, y) dx dy,  (4.8)

where j is the upscaling level, such that j = 1 represents the initial resolved grid with which we begin the computations, and Ω represents the domain of the problem, or the computational grid. S_j(i_1, i_2) carries information about K(x) associated with a block with its center at x in the coarser grid, whereas D^(d)_j(i_1, i_2) is a measure of the difference between K(x) in the current upscaled grid and those of the block's neighbors in the previous finer-scale grid, with d = 1, 2, and 3 corresponding to the contrasts between the blocks in the x, y, and diagonal directions, respectively.

4.3.2 Upscaling

To implement the upscaling, we introduce two thresholds, ε_S and ε_D. The value of ε_S is a measure of the permeability of the grid blocks associated with the wavelet scale coefficient. For a given level of upscaling [a given j in Eq. (4.7)], we compute the scale coefficients, normalize them by the largest coefficient, and then set ε_S between 0 and 1 (or set it to be a fraction of the largest of such coefficients if we do not normalize them). Likewise, after computing the wavelet detail coefficients and normalizing them, the threshold ε_D, which contains information on the contrast between the permeabilities of neighboring blocks, is set between 0 and 1 (or a fraction of the largest detail coefficient in the computational grid, if the coefficients are not normalized). The upscaling procedure is as follows [133, 134]. The threshold ε_S is compared with the scale coefficient of each block with its center at (i_1, i_2).
If the (normalized) S_j(i_1, i_2) > ε_S, the block's permeability K is large and significant; we therefore do nothing and move on to the next block. If, however, the (normalized) S_j(i_1, i_2) < ε_S, the block's (normalized) detail coefficients D^(d)_j(i_1, i_2) are inspected and set to zero if they are smaller than their threshold ε_D. Doing so implies that the neighbor of the block centered at (i_1, i_2) corresponding to direction (d), which in the finer-scale computational grid is just one horizontal, vertical, or diagonal block away from (i_1, i_2), is merged with the block centered at (i_1, i_2) to form a larger block. Therefore, depending on the broadness and structure of the spatial distribution of K(x) and the numerical values of the two thresholds, a number of blocks in the finer-scale grid are upscaled.

We then take advantage of the recursive property of the WTs, and upscale the new grid further by applying the discrete WT to the scale coefficients obtained at the previous level j (recall that the scale coefficients contain information about the spatial distribution of the permeabilities at a fixed scale), computing a new set of four wavelet coefficients for each block of the grid in its current state. The newly computed (normalized) detail coefficients are again set to zero if they are smaller than the threshold ε_D, and the corresponding blocks in the grid are merged. The iterative upscaling process is repeated until no significant number of grid blocks is upscaled (merged). Typically, for given values of the two thresholds, three or four levels of coarsening j suffice to generate an upscaled grid that can no longer be upscaled significantly; the procedure therefore yields the final upscaled grid very quickly. It should be clear that larger thresholds result in a larger number of grid blocks being upscaled, and that the thresholds are set by both the required accuracy and the amount of computational time that one can afford.
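The block-selection logic of the procedure above can be sketched as a simple mask computation on the normalized coefficients (a minimal illustration only; the bookkeeping that actually merges blocks and tracks the non-uniform grid in the thesis codes is more involved, and the threshold names ε_S, ε_D follow the notation used here):

```python
import numpy as np

def merge_mask(S, D_list, eps_S=0.9, eps_D=0.9):
    """Decide which blocks may be merged at one upscaling level.

    S       : wavelet scale coefficients of the current level
    D_list  : the three detail-coefficient arrays D^(1), D^(2), D^(3)
    eps_S, eps_D : thresholds in (0, 1) applied to normalized coefficients

    A block is flagged for merging only if its normalized scale coefficient
    is below eps_S (small, insignificant permeability) AND all of its
    normalized detail coefficients are below eps_D (insignificant contrast
    with its neighbors, so those details are set to zero).
    """
    S_n = np.abs(S) / np.abs(S).max()
    small_perm = S_n < eps_S
    small_contrast = np.ones_like(small_perm)
    for D in D_list:
        # Guard against an all-zero detail array when normalizing.
        D_n = np.abs(D) / max(np.abs(D).max(), 1e-30)
        small_contrast &= D_n < eps_D
    return small_perm & small_contrast

# One high-conductivity block (top-left) among low ones, with no contrasts:
S_demo = np.array([[1.0, 0.1], [0.2, 0.05]])
zeros = np.zeros_like(S_demo)
mask = merge_mask(S_demo, [zeros, zeros, zeros])
```

With these inputs only the top-left block survives unmerged, since its normalized scale coefficient exceeds the threshold.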
The next step is to compute the effective permeabilities of the upscaled blocks. They may be computed by any of several methods. One way is by reconstructing the spatial distribution of the permeability of the upscaled grid; that is, we compute the inverse WT of K(x) after upscaling the grid (given that many of the detail and scale coefficients are now zero) and assign the permeability of the enlarged upscaled blocks based on the reconstructed distribution. What we use in this study is based on the analogy between the laws of electrical circuits and fluid flow. Thus, for example, an enlarged upscaled block that consists of four neighboring smaller blocks, each having its own permeability, is replaced after upscaling by a larger block with an equivalent permeability K_e given by

K_e = 4 (K_1 + K_3)(K_2 + K_4) [K_2 K_4 (K_1 + K_3) + K_1 K_3 (K_2 + K_4)] / [α (Σ_{i=1}^{4} K_i) + 3ζ],  (4.9)

with

α = K_2 K_4 (K_1 + K_3) + K_1 K_3 (K_2 + K_4),
ζ = (K_1 + K_2)(K_3 + K_4)(K_1 + K_3)(K_2 + K_4),

where K_i (i = 1-4) are the permeabilities of the four blocks [102]. The key attribute of this method, in addition to its simplicity, is that no important information about the permeability field is lost, because the permeabilities of the smaller blocks are used in a rigorous manner to compute K_e. If the permeabilities are direction-dependent, as in anisotropic porous media, then Eq. (4.9) is used once for each direction in order to compute the equivalent direction-dependent permeabilities.

4.3.3 The Wavelets

Because we work with 2D computational grids, we must specify 2D wavelets. One way of constructing 2D wavelets, used in the present study, is by tensor products of 1D wavelets, which produces a 2D wavelet scaling function, φ_{j,i_1,i_2}(x, y) = φ^j_{i_1}(x) φ^j_{i_2}(y), and three wavelets:

ψ^(1)_{j,i_1,i_2}(x, y) = φ^j_{i_1}(x) ψ^j_{i_2}(y),
ψ^(2)_{j,i_1,i_2}(x, y) = ψ^j_{i_1}(x) φ^j_{i_2}(y),  (4.10)
ψ^(3)_{j,i_1,i_2}(x, y) = ψ^j_{i_1}(x) ψ^j_{i_2}(y).
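Before specifying the 1D wavelets further, note that the equivalent-permeability rule of Eq. (4.9) in Section 4.3.2 is simple to transcribe directly (a sketch of the isotropic case only; the thesis also applies the rule per direction for anisotropic media):

```python
def equivalent_K(K1, K2, K3, K4):
    """Equivalent permeability of a merged 2x2 group of blocks, Eq. (4.9),
    based on the analogy between electrical circuits and fluid flow."""
    alpha = K2 * K4 * (K1 + K3) + K1 * K3 * (K2 + K4)
    zeta = (K1 + K2) * (K3 + K4) * (K1 + K3) * (K2 + K4)
    num = 4.0 * (K1 + K3) * (K2 + K4) * alpha
    return num / (alpha * (K1 + K2 + K3 + K4) + 3.0 * zeta)

# Sanity checks of the rule:
Ke_uniform = equivalent_K(2.0, 2.0, 2.0, 2.0)   # a uniform group gives Ke = 2
Ke_mixed = equivalent_K(1.0, 2.0, 3.0, 4.0)     # lies between min and max K
```

A quick consistency check: for four identical blocks the formula reduces to that same permeability, and for heterogeneous blocks K_e is bounded by the smallest and largest of the four values, as any physically meaningful averaging rule must be.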
The 1D discrete wavelets themselves are constructed by setting a = 2^j and b = 2^j i in Eqs. (4.5) and (4.6), where i and j are both integers. By rescaling and translating ψ(x) and φ(x) using ψ^j_i(x) = 2^{−j/2} ψ(2^{−j} x − i) and φ^j_i(x) = 2^{−j/2} φ(2^{−j} x − i), one also constructs orthonormal wavelets that are nonzero over only small intervals of x. The compact support of the wavelets is important, because it makes the computations extremely rapid.

An important set of wavelets, developed by Daubechies [148], consists of orthonormal wavelets of order M, referred to as the DBM wavelets. Their first M moments are zero, hence the name. Moreover,

φ(x) = √2 Σ_{i=0}^{L−1} h_i φ(2x − i),  (4.11)

ψ(x) = √2 Σ_{i=0}^{L−1} g_i φ(2x − i),  (4.12)

where L = 2M, with h_i and g_i, referred to as the filter coefficients, related by g_i = (−1)^i h_{L−i−1}, with i = 0, 1, ..., L − 1. These coefficients are usually nonzero for only a few values of i. For example, for the DB2 wavelet one has (h_0, h_1, h_2, h_3) = (1/(4√2))(1 + √3, 3 + √3, 3 − √3, 1 − √3), and (g_0, g_1, g_2, g_3) = (−h_3, h_2, −h_1, h_0). Note that in both cases the extra factor 1/√2 is necessary to ensure the orthonormality of φ and ψ. Press et al. [149] provide the numerical values of h_k for many wavelets. Based on our previous work on upscaling of flow and transport [132, 133, 134, 102, 135, 136, 137, 66], the results reported in this chapter were computed using the DB4 wavelet, for which the filter coefficients (h_0, h_1, h_2, h_3) are (1/8)[(√2 + √6), (3√2 + √6), (3√2 − √6), (√2 − √6)]. We have previously shown that the use of more complex wavelets does not result in much improved results. Note that the scale and translation parameters of the mother wavelet, i.e., a and b, are set by the physical constraints that one imposes on the wavelets, which in turn depend on the particular application.
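The filter-coefficient identities quoted above are easy to verify numerically. The following sketch builds the DB2 coefficients, derives g_i from the quadrature relation g_i = (−1)^i h_{L−i−1}, and checks the standard orthonormality conditions Σ h_i = √2, Σ h_i² = 1 and Σ g_i = 0 (the last one expresses the zero mean of the wavelet):

```python
import math

# DB2 filter coefficients, as quoted in the text.
s3 = math.sqrt(3.0)
h = [(1 + s3) / (4 * math.sqrt(2)),
     (3 + s3) / (4 * math.sqrt(2)),
     (3 - s3) / (4 * math.sqrt(2)),
     (1 - s3) / (4 * math.sqrt(2))]
L = len(h)                                   # L = 2M = 4 for DB2

# Quadrature-mirror relation between the scaling and wavelet filters.
g = [(-1) ** i * h[L - i - 1] for i in range(L)]

sum_h = sum(h)                   # should equal sqrt(2): preserves the mean
sum_h2 = sum(c * c for c in h)   # should equal 1: orthonormality
sum_g = sum(g)                   # should equal 0: the wavelet has zero mean
```

The same checks pass for the DB4 coefficients used in this chapter, since (1 + √3)/(4√2) = (√2 + √6)/8 and likewise for the other three entries; the two quoted forms are the same numbers.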
4.4 The Spatial Distribution of the Permeability and Numerical Simulation

We consider flow and transport in a spatially heterogeneous aquifer, and generate several broad distributions of the hydraulic conductivity on an initially uniform computational grid with square blocks of size Δx = Δy = Δ and various grid sizes L × L (i.e., the size of the computational domain). In all the simulations, and for all domain sizes L, we set Δ = 1 m. Similar to Chapter 2, we opt to generate the conductivity fields according to a fractional Brownian motion (fBm), a stochastic process that induces long-range correlations in the hydraulic conductivity field. Our choice is not a limitation; any other field with other types of correlation functions may be adopted. At the same time, we point out that many field data acquired for oil reservoirs and aquifers display fBm-type correlations [18, 40, 36, 150, 42, 43], after the original work of Hewett [43], who demonstrated the existence of such long-range correlations in porosity logs. The two-point correlation function of the fBm is given by

C(r) − C(0) ∼ r^{2H},  (4.13)

where r is the lag distance and H is the Hurst exponent. For 0 < H < 0.5, the field's hydraulic conductivities are negatively correlated, whereas for 0.5 < H < 1 they are positively correlated. Negative correlations imply that a grid block with a high or low conductivity is less likely to neighbor another grid block with a high or low conductivity; the opposite is true for positive correlations. To study the effect of the nature of the correlations, positive versus negative, simulations were carried out for four values of H that are listed in Table 4.1 and referred to as the "First scenario." Note that a conductivity field with H < 0.5 represents a much more heterogeneous porous medium than one with H > 0.5.
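An fBm-like field with a prescribed Hurst exponent can be generated in several ways; a common one is spectral synthesis, in which Fourier amplitudes are drawn with a power spectrum ~ k^−(2H+2), the 2D spectrum consistent with Eq. (4.13). The thesis does not specify which generator it used, so the sketch below is only one standard possibility:

```python
import numpy as np

def fbm_field(n, H, seed=0):
    """Spectral synthesis of an n x n fBm-like field with Hurst exponent H.

    Fourier amplitudes follow |k|^-(H + 1), i.e. a power spectrum
    ~ |k|^-(2H + 2), which in 2D corresponds to the correlation scaling
    C(r) - C(0) ~ r^(2H).  Returns a zero-mean, unit-variance field.
    """
    rng = np.random.default_rng(seed)
    kx = np.fft.fftfreq(n)
    k = np.sqrt(kx[:, None] ** 2 + kx[None, :] ** 2)
    k[0, 0] = 1.0                       # avoid division by zero at k = 0
    amp = k ** (-(H + 1.0))
    amp[0, 0] = 0.0                     # remove the mean (DC) component
    phase = np.exp(2j * np.pi * rng.random((n, n)))
    field = np.real(np.fft.ifft2(amp * phase))
    return (field - field.mean()) / field.std()

# A strongly heterogeneous (anti-correlated) log-conductivity field:
logK = fbm_field(256, H=0.2)
K = 10.0 ** (1.5 * logK)   # illustrative rescaling to span a few orders of
                           # magnitude in K; the factor 1.5 is an assumption
```

Scaling the unit-variance field before exponentiating controls the order of magnitude variations S of the resulting K values.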
Furthermore, in order to examine the effect of the broadness of the conductivity distribution on the effectiveness of the upscaling method, three distinct values of the order of magnitude variations S in the K values were considered in the numerical experiments; these are also listed in Table 4.1 and referred to as the "Second scenario." Finally, the performance of the WT upscaling was investigated for the various domain sizes listed in Table 4.1 and referred to as the "Third scenario."

Table 4.1: Summary of the parameters used in the simulations. R is the measure of the reduction in the number of grid blocks, defined by R = (n_f − n_c)/n_f × 100%, and x_CP is the location of the control plane where the concentration breakthrough curves are evaluated.

Scenario           L      H     S    R (%)    x_CP
First Scenario     512    0.2   3    97.80    350
                   512    0.4   3    98.08    350
                   512    0.6   3    98.11    350
                   512    0.8   3    98.27    350
Second Scenario    512    0.7   3    98.20    350
                   512    0.7   5    96.57    350
                   512    0.7   7    98.23    350
Third Scenario     256    0.7   3    97.71    150
                   512    0.7   3    98.40    350
                   1024   0.7   3    97.45    500

All the conductivity fields were upscaled with the thresholds ε_S = ε_D = 0.9. Such high thresholds imply that a very large fraction of the blocks are upscaled but, as we demonstrate below, even these produce very accurate results. In the numerical simulations described below the maximum level of upscaling was 3, since higher levels of upscaling did not result in much coarser computational grids. As an example, Figure 4.1 presents a fine-scale conductivity field used in the study, with L = 512 m, H = 0.2 and S = 3, together with the corresponding upscaled model. Let n_c and n_f denote, respectively, the number of grid blocks in the coarsened field and in the initial fine-grid structure. We define R = (n_f − n_c)/n_f × 100% as the measure of the reduction in the number of grid blocks achieved by the WT upscaling. This quantity is also listed in Table 4.1.
Figure 4.1: Comparison of (a) the conductivity field in the high-resolution model of size L = 512 m, with Hurst exponent H = 0.2 and order of magnitude variations of the K values S = 3, and (b) the corresponding upscaled field.

Flow is simulated by solving Eq. (4.2) in both the fine and coarse permeability fields. No-flow boundary conditions are used along the transverse boundaries, and a constant longitudinal head gradient is imposed between the leftmost and rightmost sides of the grid, with their values being 2 m and 1 m, respectively. As for the transport simulation, we solve for the case in which a non-reactive tracer is injected along a transverse line source. The tracer is released at a constant flow rate, q = 40 m²/d, at the left side of the computational grid. We evaluate the breakthrough concentration of the plume at a control plane (CP) at a longitudinal location x_CP from the line source. For each scenario, the value of x_CP is listed in Table 4.1. Solute transport is simulated by solving Eq. (4.3) or (4.4) by a random walk method, described in detail in [102, 151, 152]. The longitudinal and transverse dispersion coefficients at the local scale were 0.546 m²/d and 0.0045 m²/d, respectively. For each case, several realizations of the conductivity fields were generated and the results were averaged over all of them.

4.5 Results and Discussion

Figure 4.2 compares snapshots of the plume in both the high-resolution and upscaled computational grids at two distinct times. The temporal evolution of the concentration is expressed in terms of pore volume injected (PVI). The PVI is the fractional volume of the fluid injected relative to the total pore volume of the pore space. With uniform injection, as in this work, the PVI is the dimensionless time. The size of the grid is L = 512 m, with the Hurst exponent H = 0.2.
The order of magnitude variations in the hydraulic conductivity is S = 3. As Figure 4.2 indicates, the concentration profiles in the upscaled field reproduce very accurately the results obtained with the corresponding fine-scale geological model.

Figure 4.2: Solute plume snapshots for the high-resolution and coarse-scale conductivity fields. The concentration C(x,t) was computed in both the high-resolution and coarse-scale conductivity fields for the parameters L = 512 m, H = 0.2 and S = 3, for (a) the high-resolution field at PVI = 0.2; (b) the coarsened field at PVI = 0.2; (c) the high-resolution field at PVI = 0.8; and (d) the coarsened field at PVI = 0.8.

We also systematically analyzed the influence of three important parameters of the upscaling, namely the Hurst exponent H, the order of magnitude variations in the conductivities S, and the size L of the computational grid. The travel time of the plumes is controlled by the correlations and the contrast between the conductivities of the different zones of a porous medium. Thus, we highlight the effects of H and S, as the Hurst exponent characterizes the nature of the correlations between the conductivities, while S quantifies the broadness of their spatial distribution.

Figure 4.3 compares quantitatively the concentration profiles at the CP in the highly resolved and coarsened computational grids for four values of H. There is very little, if any, difference between the profiles, demonstrating the accuracy of the upscaling algorithm. Furthermore, in order to demonstrate the agreement between the results obtained from the fine-scale and upscaled fields, a goodness-of-fit analysis was carried out; it is depicted in Figure 4.4 for the case with L = 512 m, H = 0.8 and S = 3.
The equation of the straight line fitted to the contaminant concentration data in Figure 4.4 is y = 0.99x + 1.7 with R² = 0.9997, where y and x are associated with the upscaled and fine-scale data, respectively. As Figure 4.3 demonstrates, for the K fields with H > 0.5, in which the conductivities are positively correlated and better-connected flow and transport paths exist, the breakthrough occurs earlier than in the K fields with H < 0.5. Note also the ability of the algorithm to capture the precise first arrival times (in PVI), which are critical for risk analysis [27].

Figure 4.3: Results for the first scenario (see Table 4.1): concentration breakthrough curves for the high-resolution fields with S = 3 and dimension L = 512 m and the corresponding coarsened fields. Results are shown for (a) H = 0.2; (b) H = 0.4; (c) H = 0.6; and (d) H = 0.8.

Figure 4.4: Goodness-of-fit analysis between the concentration profiles obtained from the fine-scale and upscaled fields for the case with L = 512 m, H = 0.8 and S = 3. The straight line is represented by y = 0.99x + 1.7 with R² = 0.9997.

Figure 4.5 presents the dependence of R, the percentage reduction in the number of grid blocks in the final upscaled computational grid, on the Hurst exponent H. R is slightly larger for H < 0.5, since conductivity fields with negative long-range correlations represent more heterogeneous porous media.
But, overall, for all 0 < H < 1 the upscaled grid is extremely coarsened, having a very small number of enlarged grid blocks, and yet, as Figures 4.2 and 4.3 indicate, the accuracy of the concentration profiles in such grids is completely comparable to that of the fine-scale grids. As pointed out by Ebrahimi and Sahimi [134, 102, 136], the wavelet scale coefficients preserve the distribution of the high-permeability paths, while the wavelet detail coefficients preserve the grid resolution where the permeability contrasts are high, such as at the interface between the low and high permeabilities. These are, of course, precisely what is expected of any accurate upscaling scheme. Thus, the WT upscaling is completely suitable for geological domains displaying fast flow paths induced by the persistence of high conductivities, but also for more heterogeneous porous formations in which slow transport paths may develop.

Figure 4.5: The percent reduction R between the number of grid blocks in the upscaled fields and that of the grid representing the corresponding high-resolution fields, and its dependence on the Hurst coefficient H in the first scenario (see Table 4.1).

Next, we examine the sensitivity of the upscaling scheme to S, the order of magnitude variations in the hydraulic conductivity field, referred to as the second scenario in Table 4.1. Figure 4.6 compares the concentration profiles for S = 3, 5 and 7. Even when there are seven orders of magnitude variations in the values of K, the breakthrough points (BTPs) are predicted accurately by the upscaled scheme, and the concentration profiles for all three cases in the coarsened and fine-resolution grids agree excellently. There is only a slight discrepancy between the fine- and coarse-scale fields for S = 7 around PVI = 0.7, but all other main features of the profiles are captured (including the first arrival times), even for S = 7.
In their study of the frequency-dependent conductivity of highly heterogeneous semiconducting materials, in which the local conductivities vary over up to 15 orders of magnitude, Pazhoohesh et al. [147] demonstrated that, in fact, broader conductivity distributions lead to more accurate results and more efficient computations, because only a very small fraction of a highly heterogeneous medium contributes significantly to transport, and the WT upscaling identifies that fraction accurately.

Figure 4.6: Results obtained for the second scenario (see Table 4.1): comparison between the concentration breakthrough curves for the coarsened and fine-scale conductivity fields for H = 0.7 and grid size L = 512 m. Results are reported for (a) S = 3; (b) S = 5; and (c) S = 7.

We present in Figure 4.7 the effect of the domain size L on the upscaling procedure. It compares the concentration profiles in three computational grids of increasing size. Increasing the size of the grid results in more accurate, and hence computationally more efficient, results. For L = 1024 m, the two profiles are virtually identical.

Figure 4.7: Results for the third scenario (see Table 4.1): breakthrough curves for the high-resolution conductivity fields with H = 0.7 and S = 3 and the corresponding coarsened fields with (a) L = 256 m; (b) L = 512 m; and (c) L = 1024 m.
Since the dispersion coefficients essentially represent the second spatial moments of the concentration distribution, we also computed S_xx, the second central moment of the plume in the longitudinal direction, which represents the spread of the solute body around its centroid. The second central spatial moment S_xx is given by

S_xx(t) = μ_xx(t) − [μ_x(t)]²,  (4.14)

where

μ_xx(t) = [P/μ_0(t)] ∫_Ω x² C(x, y, t) dx dy,  (4.15)

μ_x(t) = [P/μ_0(t)] ∫_Ω x C(x, y, t) dx dy,  (4.16)

with μ_0(t) = P ∫_Ω C(x, y, t) dx dy, which equals the mass in the domain; the rest of the notation is as before. In Figure 4.8 we present the results for two cases. Figure 4.8(a) depicts the results for the case H = 0.2 and S = 3, while Figure 4.8(b) shows those for H = 0.7 and S = 7. The linear size of the grid in both cases was L = 512. It is clear that there is very little, if any, difference between the computed second moments in the high-resolution and upscaled grids, implying that the longitudinal dispersion coefficient is reproduced highly accurately by the upscaling method.

Note that in all the cases discussed, flow and transport were simulated in the absence of sinks and sources. As demonstrated by Rasaei and Sahimi [137, 66], who studied upscaling of two-phase flow in oil reservoirs, in the presence of sinks and sources, such as injection and production wells in such reservoirs, the grid-resolution strategy should be adaptive since, due to the high velocity gradients in their vicinity, more refined grid blocks are required around them. Likewise, fully refined grid blocks are required close to the boundaries, where we need to capture the details of the flow and transport processes.
To implement this, a simple constraint is imposed on the upscaling process that prevents marked areas around sources, sinks and boundaries from being coarsened.

Figure 4.8: Comparison of the second central spatial moment S_xx(t) of the concentration distribution in the high-resolution and upscaled conductivity fields. The results are for (a) H = 0.2 and S = 3, and (b) H = 0.7 and S = 7 (see Table 4.1). The size of the numerical grid is L = 512 and the porosity is equal to 0.54.

The results reveal that the wavelet-based upscaling method is accurate for a single disordered porous medium. This is not, however, a limitation, as the method can be extended to stochastic hydrogeology. Using a Monte Carlo framework, one can, after generating multiple realizations of the heterogeneous fields, upscale the conductivity fields; the simulation of flow and transport processes can then be carried out in the upscaled conductivity fields.

4.6 Efficiency of the Calculation

One purpose of the present work was to demonstrate that if one upscales a heterogeneous conductivity field by the WT method, simulation of solute transport in the upscaled field yields results for the characteristics of the transport phenomenon that are as accurate as those obtained with the original high-resolution grid. This is independent of the numerical method that one uses to solve the ADE. Clearly, if one uses an efficient and accurate numerical scheme for solving the ADE, then the speed-up in the computations will be very large. In the present work, we used a simple random walk method to solve the ADE, which is not necessarily the most efficient method, but has the advantage that it avoids, to a large extent, the numerical dispersion associated with other methods used to solve the ADE.
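The random walk method referred to above amounts to discretizing the Langevin equation (4.4). A minimal sketch of one explicit Euler step is shown below, with a uniform velocity and an isotropic local dispersion coefficient assumed purely for brevity (the thesis codes interpolate v(x) from the solved flow field and use the full anisotropic dispersion tensor):

```python
import numpy as np

def random_walk_step(x, v, D, dt, rng):
    """One Euler step of dx/dt = v + sqrt(2 D) * noise for many particles.

    x : (n, 2) particle positions, v : (n, 2) velocities at the particles,
    D : scalar local dispersion coefficient (isotropic, for illustration).
    """
    noise = rng.standard_normal(x.shape)
    return x + v * dt + np.sqrt(2.0 * D * dt) * noise

# Release particles on a transverse line source at x = 0 and track them.
rng = np.random.default_rng(0)
n = 10_000
x = np.zeros((n, 2))
x[:, 1] = np.linspace(0.0, 512.0, n)   # initial positions a along the source
v = np.array([1.0, 0.0])               # uniform mean flow (an assumption)
D, dt = 0.546, 0.1                     # local dispersion from Section 4.4
for _ in range(100):
    x = random_walk_step(x, np.broadcast_to(v, x.shape), D, dt, rng)

mean_drift = x[:, 0].mean()            # close to v_x * t = 1.0 * 10 = 10
```

Concentrations (and hence breakthrough curves at a control plane) follow by binning the particle positions, which is why the scheme largely avoids numerical dispersion: no concentration gradients are ever differenced.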
Previous work on upscaling of the flow of two immiscible fluids in large-scale porous media [137, 66], which used a finite-volume method, demonstrated that the speed-up in the computations can be as large as a factor of 5,000. We expect the same order of speed-up if a similar numerical scheme is used. It was also demonstrated in previous work [137, 66] how to eliminate possible numerical dispersion.

4.7 Summary

This chapter presented the application of a methodology for upscaling the spatial distribution of a hydraulic conductivity or permeability field in a computational grid for the simulation of solute transport in spatially heterogeneous geological formations. The methodology is based on wavelet transforms, which preserve the resolution of the high-conductivity paths, as well as that of the regions in which there are sharp contrasts between the conductivities. The performance of the method was tested over a wide range of the important parameters of the simulations, from the order of magnitude variations in the conductivities and the nature of their correlations (positive versus negative) to the size of the computational grid. We demonstrated in all cases that the upscaled grid constructed by the wavelet transform reproduces the concentration breakthrough curves very accurately when compared with those of the initial highly resolved grids. In particular, the breakthrough times are predicted extremely accurately. From a computational point of view, the wavelet transform upscaling method generates non-uniform grids with a number of grid blocks that represents, on average, only about 2.1% of the number of grid blocks in the original high-resolution models. Thus, given any reasonable method of solving the advection-dispersion equation, the upscaling method results in an enormous reduction in the computational time of such simulations.
Chapter 5

Performance Evaluation of EnKF-based Hydrogeological Site Characterization using Color Coherence Vectors¹

Data assimilation techniques have been used to estimate hydrogeological model parameters, such as hydraulic conductivity, for the purpose of reducing the uncertainty associated with hydrogeological models. Here, we propose a vision-based approach that can quantify the accuracy of the estimations by considering the spatial structure embedded in the estimated fields. For various factors of the data assimilation method, such as the number of measurements, we compare the performance of our proposed metric with commonly used criteria. By simulating flow and transport processes using the estimated and true model parameters, we observe that the proposed metric outperforms the other existing evaluation metrics.

¹ The material presented in this chapter will be submitted to Water Resources Research (Moslehi and de Barros, 2018).

5.1 Introduction

Hydrogeological properties such as hydraulic conductivity (K) and porosity are spatially heterogeneous and uncertain. As a consequence, dependent variables (e.g., the velocity field) exhibit spatial fluctuations throughout the domain. To capture these variabilities, models of flow and transport processes in the subsurface need to incorporate the structural patterns of the hydrogeological properties [104, 5]. However, high-resolution site characterization is both time-consuming and expensive, particularly when dealing with large-scale stochastic hydrosystems such as aquifers [60]. In real field conditions, only a limited amount of data is available. Therefore, efficient modeling techniques should be utilized to characterize model parameters (e.g., K) using the available sparse measurements. To tackle this challenge, inverse modeling is customarily employed to infer model parameters from observed data [6, 7, 8]. One well-known class of data assimilation techniques is the Kalman Filter (KF) method [153].
The KF is a sequential approach that obtains least-squares estimates of model parameters by requiring consistency with noisy observations and linear governing equations. However, it is not applicable to large-scale and nonlinear systems, due to the significant computational cost associated with the calculation of the error covariance matrices. The extended KF is a modification of the KF, developed to address nonlinear systems using first-order linearization [154]. The main drawback of the extended KF is its excessive computational cost when dealing with large-scale systems. To overcome the problems associated with nonlinear and large-scale models, Evensen (1994) introduced the Ensemble Kalman Filter (EnKF) method. The EnKF provides an efficient way to estimate the required covariance matrices using a limited ensemble of model realizations [155]. The EnKF has been applied in various disciplines, such as oceanography [155, 156], meteorology [157] and remote-sensing data assimilation in hydrology [158, 159, 160]. In particular, hydrogeologists and petroleum engineers have been using the EnKF to characterize porous media with the aim of reducing the uncertainty in model predictions [161, 162, 163, 164, 165, 166, 167, 168, 169]. The EnKF has also been used to study coupled surface-subsurface flow at the catchment scale [170]. In this study, we opt to use the EnKF since it is a commonly used data assimilation approach that sequentially estimates model parameters such as K using the available scarce measurements.

In the literature, different methods have been utilized to evaluate the accuracy of the model parameter estimations achieved by the EnKF. One simple way is to visually compare the estimations with the true values (or only the observed values). However, it is not always straightforward to perform such a visual comparison, especially when dealing with three-dimensional, large-scale model estimations.
Furthermore, the RMSE (root mean square error) has primarily been employed as a criterion to quantify the goodness of EnKF-based estimations by measuring the similarity between the estimated and true values [171, 172, 173]. The RMSE is usually accompanied by another metric, the ensemble spread, which computes the estimated uncertainty based on the ensemble [157, 167]. Alternatively, the rank histogram has been used to verify ensemble estimations by calculating the rank of an observation relative to the values of an ensemble sorted from lowest to highest [174]. However, these commonly used methods do not take into account the spatial information of the hydrogeological domain. Therefore, two K-fields with very different spatial structures can have similar rank histograms or RMSE (see Fig. 5.1 for an illustrative example).

Figure 5.1: (a) Reference K-field, together with two K-fields, (b) and (c), with similar RMSE (approximately equal to 0.55) with respect to the reference K-field.

In order to overcome the aforementioned drawbacks of the available evaluation metrics, we propose to adapt a vision-based approach that quantifies the accuracy of the estimations by considering the spatial structure embedded in the estimated fields. If we consider heterogeneous K-fields as images, where each pixel's color value represents the K value at the pixel's location, image processing tools can be utilized to verify the performance of EnKF-based K-field estimations. We adapt a novel metric, Color Coherence Vectors (CCV), to evaluate the performance of EnKF-based site characterizations. CCV was first introduced by Pass et al. (1997) [175] for the purpose of comparing images in the field of computer vision. In their study, the results of image comparison by the CCV technique are juxtaposed with the results obtained from color histograms to show the superiority of the CCV method.
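The coherence idea that distinguishes CCV can be sketched as follows: the field values are discretized into a few buckets, connected regions are found within each bucket, and a pixel is "coherent" if it belongs to a region larger than a size threshold τ; the CCV then records the coherent and incoherent pixel counts per bucket. The bucket count and τ below are illustrative choices, not values from Pass et al.:

```python
import numpy as np
from collections import deque

def ccv(field, n_buckets=4, tau=8):
    """Color Coherence Vector of a 2D field (a sketch of Pass et al.'s idea).

    Values are discretized into n_buckets equal ranges; within each bucket,
    4-connected components are found by flood fill, and pixels in components
    of size >= tau count as coherent.  Returns one (coherent, incoherent)
    pixel-count pair per bucket.
    """
    lo, hi = field.min(), field.max()
    buckets = np.minimum(((field - lo) / (hi - lo + 1e-30) * n_buckets)
                         .astype(int), n_buckets - 1)
    ny, nx = field.shape
    seen = np.zeros((ny, nx), dtype=bool)
    alpha = [0] * n_buckets   # coherent pixel counts
    beta = [0] * n_buckets    # incoherent pixel counts
    for i in range(ny):
        for j in range(nx):
            if seen[i, j]:
                continue
            b = buckets[i, j]
            size, q = 0, deque([(i, j)])
            seen[i, j] = True
            while q:                       # flood fill one component
                y, xx0 = q.popleft()
                size += 1
                for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    yy, xx = y + dy, xx0 + dx
                    if (0 <= yy < ny and 0 <= xx < nx and not seen[yy, xx]
                            and buckets[yy, xx] == b):
                        seen[yy, xx] = True
                        q.append((yy, xx))
            if size >= tau:
                alpha[b] += size
            else:
                beta[b] += size
    return list(zip(alpha, beta))

# One large high-K region plus one isolated high-K pixel: the large region
# is coherent, the lone pixel is incoherent, even though a plain histogram
# cannot tell them apart.
field = np.zeros((8, 8))
field[0:4, 0:4] = 1.0
field[7, 7] = 1.0
result = ccv(field, n_buckets=2, tau=8)
```

This is exactly the property that lets CCV distinguish the two K-fields of Fig. 5.1: both may have identical value histograms, but the sizes of their connected regions differ.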
A color histogram evaluates the distribution of colors by counting the number of pixels whose colors fall in each of a set of fixed color ranges spanning the image's colorspace [176]. One of the main advantages of CCV over the color histogram is that, as opposed to the color histogram, CCV considers information about the spatial configuration of the attribute, which makes it a proper tool for evaluating the performance of spatial data assimilation methods. CCV has been employed in several image processing and computer vision applications such as image retrieval [177, 178, 179] and image classification [180]. In order to examine whether a data assimilation method operates correctly, the estimated hydrogeological fields are usually compared with the true field using the various metrics discussed earlier. In the current study, through the use of a hydrogeological example, we show that CCV is a more accurate alternative to the commonly used metrics (e.g. RMSE) for evaluating the performance of spatial data assimilation techniques.

The chapter is organized as follows. First, the theoretical background on EnKF and CCV is briefly reviewed in Section 5.2. Furthermore, the methodology employed for implementing CCV in analyzing the behavior of EnKF-based site characterizations is described in Section 5.3. Next, the illustrative hydrogeological examples, their numerical setups and the results are discussed in Section 5.4. Finally, our results are summarized in Section 5.5.

5.2 Background

In this section, the theoretical background associated with the EnKF procedure and CCV is provided.

5.2.1 Ensemble Kalman Filter (EnKF)

EnKF sequentially estimates model parameters (e.g. K) by reconciling multiple sources of measurements (e.g. hydraulic head, h, and K) with the governing processes. It consists of two major steps, a forecast step and an assimilation step, that are
carried out recursively to eventually obtain the estimates of the model parameters. The forecast step reflects the governing equations of the physical system of interest (e.g. flow), while the assimilation step takes the available observations into account. EnKF is a Monte Carlo-based data assimilation approach that leverages an ensemble to calculate the statistics required for the KF. Since the EnKF concepts have been well studied in the literature [181, 182], only a brief description and formulation of EnKF is provided in this section. For a comprehensive review of KF and EnKF in hydrogeological systems, readers are referred to Chen et al. (2006) [167].

We start the EnKF process from an initial ensemble of state vectors consisting of model parameters (e.g. K) and model variables (e.g. h). The state vector corresponding to each realization (i.e. ensemble member) is iteratively updated using the available observations. The first step is to forward the current state vectors (X), provided from the previous iteration (i-1) or the initial ensemble, to obtain new state vectors for the next time step (i) using the forecast model (e.g. a flow simulator),

$$X^f_j(i) = \mathcal{F}\big[X^a_j(i-1)\big] + \epsilon^f_j(i), \qquad (5.1)$$

where $\mathcal{F}$ represents the forecast model (e.g. a flow simulator) that can be either linear or nonlinear, j corresponds to the jth realization in the ensemble, the superscript f refers to the forecasted state and $\epsilon^f_j$ is the error associated with the forecast model, which can differ between realizations. Next, the state covariance matrix ($P^f$) can be approximated by the following equations,

$$P^f(i) \approx \frac{1}{N_{mc}-1} \sum_{j=1}^{N_{mc}} \left[X^f_j(i) - \langle X^f(i)\rangle\right]\left[X^f_j(i) - \langle X^f(i)\rangle\right]^T, \qquad (5.2)$$

where

$$\langle X^f(i)\rangle \approx \frac{1}{N_{mc}} \sum_{j=1}^{N_{mc}} X^f_j(i), \qquad (5.3)$$

with $\langle X^f(i)\rangle$ denoting the mean of the state vectors over the $N_{mc}$ realizations. The state covariance matrix is required to evaluate the Kalman gain (G), which will be used to update the state vectors in the data assimilation step.
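The ensemble approximation of the forecast covariance in Eqs. (5.2)-(5.3) amounts to a few lines of NumPy. The sketch below is ours (the function name is illustrative, not part of the original study): column j of the input array holds the forecast state of ensemble member j.

```python
import numpy as np

def ensemble_covariance(Xf):
    """Ensemble approximation of the forecast covariance, Eqs. (5.2)-(5.3).

    Xf : (n_state, N_mc) array; column j holds the forecast state X^f_j(i).
    """
    N_mc = Xf.shape[1]
    Xmean = Xf.mean(axis=1, keepdims=True)   # <X^f(i)>, Eq. (5.3)
    A = Xf - Xmean                           # ensemble anomalies
    return A @ A.T / (N_mc - 1)              # P^f(i), Eq. (5.2)
```

Only the anomaly matrix A is ever needed in practice; forming the full n_state x n_state matrix, as done here for clarity, is exactly the cost that becomes prohibitive for large grids.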
Another term that influences G is the error covariance matrix of the observations (R), arising from the measurement error ($\epsilon^d_j$) associated with the observation vector ($d_j$),

$$d_j(i) = H X^t(i) + \epsilon^d_j(i), \qquad (5.4)$$

$$G(i) = P^f(i) H^T \left[H P^f(i) H^T + R(i)\right]^{-1}, \qquad (5.5)$$

where $X^t$ corresponds to the true value (e.g. the true K-field) and H is the observation operator, consisting of 0s and 1s, which indicates the relationship between the state vector and the observation vector. Then, the state vectors obtained from the forecast step ($X^f_j$) are updated using the equation below,

$$X^a_j(i) = X^f_j(i) + G(i)\left[d_j(i) - H X^f_j(i)\right]. \qquad (5.6)$$

Eq. (5.6) is called the data assimilation step, where the updated state vector for each realization ($X^a_j$) is evaluated. The assimilated state $X^a_j$ is then used in Eq. (5.1) to evaluate a new forecasted state for the next time step.

5.2.2 Color Coherence Vector (CCV)

CCV is a histogram-based technique for comparing images that incorporates spatial information. This specific property of CCV leads to superior results in image-related applications when compared to the regular color histogram method. A color histogram calculates the distribution of colors by counting the number of pixels whose colors fall in each of a set of fixed color ranges that extend across the image's colorspace. The main difference between CCV and the color histogram is that, in addition to reporting the number of pixels with a given color, the numbers of coherent and incoherent pixels within the given color are tallied. A color's coherence is defined as the extent to which pixels of the given color belong to a large group of pixels of similar color. Therefore, CCV classifies a pixel as coherent if it is a member of a large similarly-colored connected component; otherwise, it is classified as an incoherent pixel.
A connected component of a certain color is a maximal set of pixels such that for any two distinct pixels, there is a path between them consisting of a sequence of adjacent pixels of the same given color. For an excellent introduction to CCV and its properties, see [175]. Below, the steps of the CCV calculation are outlined.

1. First, the image is blurred such that the color value of each pixel, represented in the RGB colorspace, is replaced with the average value of a small local neighborhood.

2. Second, the colorspace is discretized so that there are only m distinct color values in the image, such that every pixel is labeled by a color i where 1 ≤ i ≤ m.

3. Next, each pixel is labeled as either coherent or incoherent depending on the number of pixels in the connected component it belongs to. If the number of pixels exceeds a specified threshold, $p_c$, the pixel is classified as coherent.

4. Last, the color coherence vector associated with the image is presented as $w = \{(\xi_1, \eta_1), \cdots, (\xi_i, \eta_i), \cdots, (\xi_m, \eta_m)\}$, where $\xi_i$ and $\eta_i$ are the numbers of coherent and incoherent pixels of the ith color, respectively.

In order to demonstrate the steps of the CCV calculation, a simple example is presented here. A grayscale image of size 5×5 is considered, such that all three color components (i.e. RGB) have the same value at every pixel. Consequently, every pixel's color can be represented by a single number. Let us assume that Fig. 5.2a shows the resulting color values at every pixel after blurring the input image. Suppose that the colorspace is discretized so that there are only three different color intensities in the image (i.e. m = 3), such that pixels with color values 10 through 19, 20 through 29 and 30 through 39 are represented by 1, 2 and 3, respectively. Therefore, Fig. 5.2b is the outcome of the second step of the CCV calculation.
Figure 5.2: A simple example of the CCV calculation steps: (a) color intensities of a grayscale image after blurring, (b) discretized colorspace with three distinct colors, (c) similarly-colored connected components and (d) pixels labeled as either coherent or incoherent.

Next, similarly-colored connected components are delineated in Fig. 5.2c with the letters A, B, C and D. As stated in step 3, each pixel needs to be classified as either coherent or incoherent depending on the number of pixels in the connected component it belongs to. In this example, the threshold is assumed to be four (i.e. $p_c = 4$). Thus, if a connected component contains more than four pixels, the corresponding pixels are considered coherent; otherwise, they are labeled as incoherent pixels. As shown in Fig. 5.2d, pixels in components A and B are classified as coherent (coh_i, where i = 1, 2, 3), and pixels in components C and D are labeled as incoherent (inc_i, where i = 1, 2, 3). The subscripts in Fig. 5.2d refer to the discretized color values obtained from step 2. Finally, for each discretized color value, the numbers of coherent and incoherent pixels, represented by ξ and η respectively, are specified in Table 5.1. Hence, the CCV of the image is summarized as $w = \{(7, 3), (13, 0), (0, 2)\}$.

Table 5.1: Summary of the discretized color values with their corresponding numbers of coherent and incoherent pixels.

Color    ξ     η
1        7     3
2        13    0
3        0     2

5.3 Methodology

In this section, we describe the overall procedure in which the accuracy of the parameter estimation method is quantified using CCV.
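The CCV computation for an already-blurred, already-discretized label image can be sketched as below. This is our own minimal implementation, not code from the original study: it assumes 4-connected pixel adjacency (the counts change under 8-connectivity) and follows the thesis's convention that a component is coherent when its size exceeds $p_c$.

```python
import numpy as np
from collections import deque

def ccv(img, p_c):
    """Color coherence vector of a discretized label image.

    img : 2-D integer array of color labels (already blurred/discretized)
    p_c : coherence threshold on connected-component size
    Returns {color: (xi, eta)} with coherent/incoherent pixel counts.
    """
    img = np.asarray(img)
    visited = np.zeros(img.shape, dtype=bool)
    counts = {int(c): [0, 0] for c in np.unique(img)}
    for i in range(img.shape[0]):
        for j in range(img.shape[1]):
            if visited[i, j]:
                continue
            # Breadth-first search over the 4-connected same-color component
            color, size, q = int(img[i, j]), 0, deque([(i, j)])
            visited[i, j] = True
            while q:
                r, c = q.popleft()
                size += 1
                for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    rr, cc = r + dr, c + dc
                    if (0 <= rr < img.shape[0] and 0 <= cc < img.shape[1]
                            and not visited[rr, cc] and img[rr, cc] == color):
                        visited[rr, cc] = True
                        q.append((rr, cc))
            # Coherent if the component is larger than p_c, else incoherent
            counts[color][0 if size > p_c else 1] += size
    return {c: tuple(v) for c, v in counts.items()}
```

For example, a 5×5 image consisting of a 3×3 block of color 2, a single corner pixel of color 3 and a background ring of color 1 yields, with $p_c = 4$, the vector {1: (15, 0), 2: (9, 0), 3: (0, 1)}: the lone color-3 pixel is the only incoherent one.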
For this work, and without loss of generality, we are interested in estimating the spatial distribution of the hydraulic conductivity (K) field. The steps in our methodology are as follows:

1. First, we estimate K-fields using the EnKF technique and the available hydrogeological measurements. It should be noted that the use of the EnKF method in this study is not a limitation; the estimation performance of any other data assimilation technique can be evaluated with CCV in the same way.

2. Second, each estimated K-field, discretized by numerical grid blocks, is considered as an image in which pixels correspond to grid blocks in the estimated field. Consequently, the value of the estimated K in each discretized block represents the color value in the corresponding image.

3. Next, the color coherence vectors of the images associated with the estimated fields and the true K-field are calculated by the procedure described in Section 5.2.2.

4. Finally, we assign a rank to each estimated field by comparing the CCVs corresponding to the estimated K-field (image) and the true K-field. The assigned rank $\Re$, which represents the similarity between each estimated field and the true field, is evaluated as follows [175],

$$\Re = \sum_{i=1}^{m} \left( \frac{\left|\xi_i - \xi^t_i\right|}{\xi_i + \xi^t_i + 1} + \frac{\left|\eta_i - \eta^t_i\right|}{\eta_i + \eta^t_i + 1} \right), \qquad (5.7)$$

where $\xi^t_i$ and $\eta^t_i$ are the elements of the CCV of the true K-field. As can be observed in Eq. (5.7), a lower rank indicates a higher similarity between the estimated field and the true field and, consequently, a better parameter estimation.

5.4 Illustration

In order to verify the performance of the CCV metric for the evaluation of EnKF-based parameter estimation, we carried out numerical experiments. First, the problem configuration considered in our EnKF procedure for obtaining the estimated K-fields is described in Section 5.4.1.
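Given two color coherence vectors, the rank of Eq. (5.7) is a one-liner. The sketch below is ours: it stores a CCV as a dict mapping each discretized color to its (ξ, η) pair and uses the absolute-difference form of Pass et al. [175].

```python
def ccv_rank(w_est, w_true):
    """CCV-based rank of Eq. (5.7); lower means the fields are more similar.

    w_est, w_true : dicts {color: (xi, eta)} over the same m colors.
    """
    r = 0.0
    for color, (xi_t, eta_t) in w_true.items():
        xi, eta = w_est.get(color, (0, 0))
        r += (abs(xi - xi_t) / (xi + xi_t + 1)
              + abs(eta - eta_t) / (eta + eta_t + 1))
    return r
```

The "+1" in each denominator keeps the terms finite when a color is absent from both images; an identical pair of CCVs yields a rank of exactly zero.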
Next, in Section 5.4.2, several metrics, including the CCV criterion, are employed to assess the performance of the parameter estimation. This assessment is done using the estimated K-fields from Section 5.4.1. Finally, the superiority of the CCV method over the other existing metrics is discussed in Section 5.4.2.

5.4.1 Problem Setup and Data Assimilation

We start by generating an initial ensemble of heterogeneous K-fields with $N_{mc}$ realizations. For the purpose of illustration, the log-conductivity field, Y = ln[K], is modeled as a statistically stationary random space function with mean μ, variance σ², correlation length λ and an isotropic exponential covariance model. To randomly generate the spatially variable Y, we employ SGeMS [183]. Furthermore, a reference field, denoted by Y_ref = ln[K_ref], is drawn from another randomly generated K-field with different properties, distinguished by the subscript "ref" (i.e. μ_ref, σ²_ref and λ_ref). This reference field serves as the synthetic true field. A two-dimensional regularly-spaced grid is considered for the K-fields with discretization size Δ in both directions. We mimic the numerical setup based on the grid and statistical convergence analyses performed in [19]. The length of the generated fields in each direction is denoted by L. The properties of the K-fields for the ensemble and the reference field, along with their values, are listed in Table 5.2.

Table 5.2: Summary of the parameters used in this study.

Parameter            Value    Unit
L                    50       m
Δ                    1        m
N_mc                 300      -
μ                    0        -
σ²                   1.2      -
λ                    10       m
μ_ref                0        -
σ²_ref               1.5      -
λ_ref                9        m
n_k (Scenario I)     16       -
n_h (Scenario I)     25       -
n_k (Scenario II)    30       -
n_h (Scenario II)    42       -
S_s                  10⁻⁴     -
h_in                 20       m
h_l                  20       m
h_r                  0        m
Δt                   0.1      d
N_iter               10       -
J                    0.05     -
α_L                  0.1      m²/d
α_T                  0.02     m²/d
φ                    0.3      -
ℓ                    10       m
y_0                  20       m
y_t                  30       m
x_s                  5        m
x_cp                 45       m

A fully saturated transient groundwater flow is considered in the absence of sinks and sources.
We consider no-flow boundary conditions in the transverse direction and an initial hydraulic head (h_in) throughout the domain. Then, a fixed hydraulic head gradient is imposed in the longitudinal direction by specifying hydraulic head values at the left (h_l) and right (h_r) boundaries (see Table 5.2). This setup ensures that flow is uniform-in-the-mean along the longitudinal direction at all time steps. Flow is governed by the following partial differential equation:

$$S_s \frac{\partial h(\mathbf{x},t)}{\partial t} = \nabla \cdot \left[K(\mathbf{x}) \nabla h(\mathbf{x},t)\right], \qquad (5.8)$$

where $S_s$ denotes the specific storage, $\mathbf{x}$ delineates the location in Cartesian coordinates and t represents time. The flow equation, which represents the forecast model in the EnKF, is solved numerically using FloPy, a Python package that runs MODFLOW [61]. The state vector used in the EnKF procedure is constructed by stacking the model parameters (K) and model variables (h) together. In order to assemble the observation vector in Eq. (5.4), $n_k$ measurements of K_ref are collected directly from the reference field, and $n_h$ measurements of h are drawn for each EnKF iteration step from the MODFLOW simulation based on the reference field. The measurement locations are uniformly selected across the reference field as shown in Fig. 5.3a. Moreover, the observations are perturbed by white noise, with zero mean and a standard deviation of 0.001, to represent measurement errors. The model error introduced in Eq. (5.1) is assumed to be zero for simplicity. The EnKF procedure is carried out for $N_{iter}$ iterations in total, and the forecast model is forwarded in time by Δt at each iteration. The values of all parameters used in our study are provided in Table 5.2.

Figure 5.3: Measurement locations for K and h in (a) Scenario I and (b) Scenario II.
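The forecast model in our setup is MODFLOW driven through FloPy. As a self-contained stand-in for readers without that toolchain, one backward-Euler step of Eq. (5.8) on a one-dimensional grid with fixed heads at both ends can be sketched as below; the harmonic-mean face conductivities are a common finite-difference choice, and none of this is the actual MODFLOW implementation.

```python
import numpy as np

def implicit_step(h, K, Ss, dx, dt, h_left, h_right):
    """One backward-Euler step of S_s dh/dt = d/dx(K dh/dx) on a 1-D
    cell-centered grid with fixed heads at both ends (a toy stand-in
    for the MODFLOW forecast model)."""
    n = len(h)
    # Harmonic mean of K at interior cell faces
    Kface = 2.0 * K[:-1] * K[1:] / (K[:-1] + K[1:])
    A = np.zeros((n, n))
    b = Ss * dx**2 / dt * np.asarray(h, dtype=float)
    for i in range(n):
        A[i, i] = Ss * dx**2 / dt
        if i > 0:
            A[i, i] += Kface[i - 1]
            A[i, i - 1] -= Kface[i - 1]
        else:                       # ghost cell at the left Dirichlet boundary
            A[i, i] += 2.0 * K[0]
            b[i] += 2.0 * K[0] * h_left
        if i < n - 1:
            A[i, i] += Kface[i]
            A[i, i + 1] -= Kface[i]
        else:                       # ghost cell at the right Dirichlet boundary
            A[i, i] += 2.0 * K[-1]
            b[i] += 2.0 * K[-1] * h_right
    return np.linalg.solve(A, b)
```

With uniform K, repeated steps relax the heads toward the linear steady-state profile between h_left and h_right, which is a convenient sanity check for any forecast model of this kind.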
Starting from the generated initial ensemble, the EnKF process with the aforementioned forecast model and observations is executed to estimate the K-field. Fig. 5.4b depicts the ensemble mean of the log-conductivity fields at the last iteration together with the reference log-conductivity field (Fig. 5.4a). The same EnKF procedure is repeated with a higher number of observations (i.e. larger $n_k$ and $n_h$), and the resulting ensemble mean of the estimated log-conductivity fields from the last iteration is depicted in Fig. 5.4c. From now on, the setups with the lower and higher numbers of measurements are referred to as Scenario I and Scenario II, respectively. The measurement locations associated with K and h in Scenario II are presented in Fig. 5.3b.

Figure 5.4: (a) Reference Y-field, (b) estimated Y-field in Scenario I and (c) estimated Y-field in Scenario II.

5.4.2 Performance Assessment

By visually comparing the results obtained from Scenarios I and II (Fig. 5.4b and Fig. 5.4c) with the reference field (Fig. 5.4a), it can be observed that Scenario II results in a better estimation of the reference K-field due to the presence of more measurements. To confirm this conjecture, flow and solute transport simulations are performed on Y_ref and on the estimated Y-fields obtained from Scenarios I and II. The idea is to investigate whether the model output responses resulting from hydrogeological processes on the estimated fields reproduce the results obtained from the reference K-field. One hydrogeological output response relevant to public health protection is the value of the flux-averaged peak contaminant concentration observed at an environmentally sensitive location [184]. This is important mainly because the ultimate goal of parameter estimation is to accurately predict such output responses obtained from hydrogeological processes occurring in the estimated K-fields.
Therefore, evaluating the similarity between such output responses, obtained from the estimated K-fields and the reference K-field, can be considered an informative criterion for assessing the performance of the parameter estimation.

In order to compare the model outputs for both fields, we now consider a steady-state incompressible horizontal flow field in the absence of sinks and sources. Consequently, the left-hand side of Eq. (5.8) equals zero and the governing flow equation reduces to

$$\nabla \cdot \left[K(\mathbf{x}) \nabla h(\mathbf{x})\right] = 0, \qquad (5.9)$$

with no-flow boundary conditions at the transverse boundaries and a constant head gradient, J, imposed at the longitudinal boundaries of the flow domain. Transport of an instantaneously released dissolved non-reactive tracer along a line source of transverse dimension ℓ, where ℓ = |y_t − y_0| with y_0 and y_t indicating the locations of the endpoints of the tracer line, released at x_s in the longitudinal direction, is assumed to be governed by the advection-dispersion equation:

$$\frac{\partial C(\mathbf{x},t)}{\partial t} + \mathbf{v}(\mathbf{x}) \cdot \nabla C(\mathbf{x},t) = \nabla \cdot \left[\mathbf{D}_d \nabla C(\mathbf{x},t)\right], \qquad (5.10)$$

where $C(\mathbf{x},t)$ is the solute resident concentration, $\mathbf{v}(\mathbf{x})$ is the Darcy-scale velocity field and $\mathbf{D}_d$ is the local-scale dispersion tensor, which depends on the longitudinal and transverse dispersivities, namely α_L and α_T. The porosity, φ, of the geological formation is assumed uniform. The values of all the parameters used in the steady-state flow and transport simulations are listed in Table 5.2.

Similar to Eq. (5.8), flow is simulated using MODFLOW, whereas solute transport is solved with the random walk particle tracking code (RW3D) documented in [54] and [65]. One of the outputs of the solute transport simulation is the flux-averaged concentration breakthrough curve (BTC) at a given transverse control plane located at x_cp in the longitudinal direction. From the concentration BTC, we extract the flux-averaged peak concentration, C_p, of the plume at x_cp. The location, x_cp, of the control plane is reported in Table 5.2.
Using the RW3D simulator, the concentration BTC and the value of C_p at x_cp are evaluated based on the reference and estimated K-fields. The results are presented in Fig. 5.5 and the values of C_p are summarized in Table 5.3.

Figure 5.5: The concentration BTCs obtained from the reference K-field and the estimated K-fields in Scenarios I and II.

As observed in Fig. 5.5, the concentration BTC obtained from Scenario II reproduces the concentration BTC of the reference field better than that obtained from Scenario I. Furthermore, as presented in Table 5.3, the value of C_p obtained from the transport simulations on the K-field estimated in Scenario II is closer to its counterpart for the true K-field (i.e. Y_ref) than that of Scenario I. Moreover, the relative error, $\varepsilon = \left|C_p - C_p^{ref}\right| / C_p^{ref}$, is reported in Table 5.3. These results corroborate the conjecture that Scenario II, conditioned on more observation points, produces a better estimate of the true K-field than Scenario I.

Table 5.3: Comparison of various criteria used to evaluate the performance of the parameter estimations.

Scenario       C_p      ε      RMSE    ℜ
Scenario I     0.0062   0.27   0.47    18.85
Scenario II    0.0081   0.05   0.55    12.15
Reference      0.0085   -      -       -

Next, the RMSE and the CCV-based metric ℜ (see Eq. (5.7)) of the estimated ensemble means resulting from Scenarios I and II are evaluated. The RMSE, a commonly used criterion for measuring the goodness of estimates, is defined as

$$\mathrm{RMSE} = \sqrt{\frac{1}{N_b}\sum_{z=1}^{N_b}\left(k^z_{\mathrm{ref}} - \langle k^z_{\mathrm{est}}\rangle\right)^2}, \qquad (5.11)$$

where z corresponds to the zth grid block and $N_b$ is the total number of grid blocks. Furthermore, $\langle k^z_{\mathrm{est}}\rangle$ and $k^z_{\mathrm{ref}}$ stand for the conductivity values associated with the zth grid block of the estimated ensemble mean and the reference field, respectively.
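Eq. (5.11) is a direct grid-block-wise comparison and can be sketched in one line of NumPy (the function name is ours):

```python
import numpy as np

def rmse(k_ref, k_est_mean):
    """RMSE between the reference conductivities and the estimated
    ensemble mean, Eq. (5.11); fields are flattened over grid blocks."""
    k_ref = np.asarray(k_ref, dtype=float).ravel()
    k_est_mean = np.asarray(k_est_mean, dtype=float).ravel()
    return float(np.sqrt(np.mean((k_ref - k_est_mean) ** 2)))
```

Note that the sum runs over grid blocks with no regard for where they sit in the domain, which is precisely why two fields with very different spatial structures can share the same RMSE.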
Although the RMSE has given good results in evaluating the accuracy of estimates in previous studies, we have encountered several scenarios, like the one presented in Table 5.3, in which the RMSE metric fails (i.e. RMSE_II > RMSE_I). The CCV metric, however, successfully quantifies the accuracy of the parameter estimations: when the C_p obtained from an estimated Y-field is closer to the C_p achieved from Y_ref, the rank ℜ is lower, which implies a better parameter estimation. Therefore, the newly introduced CCV criterion, which evaluates the accuracy of estimates by considering the spatial structure embedded in the estimated parameters, outperforms the other commonly used evaluation metric (i.e. the RMSE).

5.5 Summary

We proposed a new metric to evaluate the performance of parameter estimates obtained from spatial data assimilation techniques. By comparing our proposed CCV-based metric with the RMSE, the commonly used criterion in parameter estimation, we observed that the new metric outperforms the existing one. The main property of the CCV-based metric that makes it a proper tool for assessing the performance of spatial data assimilation methods is that, as opposed to other commonly used criteria, it considers the information on the spatial configuration of the parameter. Using the EnKF method and the available h and K measurements, we estimated the K-field, considered as the model parameter, throughout a two-dimensional geological subsurface domain. Customarily, the goodness of the parameter estimation in hydrogeological models is evaluated by calculating the RMSE or visually comparing the estimated and reference K-fields. However, it is not always possible to evaluate the estimation performance visually, especially when dealing with three-dimensional and large-scale systems.

We obtained two distinct estimates of the K-field using EnKF with different numbers of measurements.
Then, flow and transport processes were simulated to obtain a physical response, the peak concentration, in the reference and in both estimated K-fields. Finally, the accuracy of the estimates was evaluated via the RMSE, the CCV, and the similarity of the concentration BTCs and the relative error of the peak concentrations obtained from the estimated K-fields and from the reference K-field. As expected, the K-field estimated with more data results in a better estimation, which is confirmed by the CCV-based metric and by comparing the peak concentrations, while the RMSE metric fails to confirm it. Therefore, the CCV-based metric is a reliable and efficient criterion for evaluating the performance of spatial parameter estimations.

Chapter 6

Efficient Characterizations of Stochastic Hydrogeological Systems using Ensemble Kalman Filter Coupled with Wavelet Transformations Upscaling¹

In the previous chapter, we discussed that data assimilation methods, such as the Ensemble Kalman Filter (EnKF) technique, are customarily used to estimate hydrogeological parameters and consequently reduce the uncertainty associated with model predictions. However, the EnKF technique is computationally expensive since it is a Monte Carlo-based data assimilation method which sequentially estimates model parameters. In this chapter we propose an efficient data assimilation framework by coupling the EnKF method with upscaling techniques, which results in a tremendous reduction in computational costs. In order to achieve this goal, we utilize the WT to coarsen the computational grids associated with the hydrogeological properties.

¹The material presented in this chapter will be submitted to Physical Review E (Moslehi, de Barros, and Sahimi, 2018)

6.1 Introduction

Modeling flow and transport processes in subsurface domains is subject to uncertainty due to the presence of spatially heterogeneous hydrogeological properties such as hydraulic conductivity.
In order to capture the fluctuations of key environmental performance metrics (such as solute mass fluxes), a high-resolution hydrogeological site characterization is required. However, due to the high cost of data acquisition, a fully detailed characterization is not feasible and consequently only a limited amount of measurements is available. Therefore, efficient data assimilation techniques should be utilized to characterize model parameters (e.g. hydraulic conductivity) using the available sparse measurements. The measurement network at the field site also dictates the scale of the grid blocks used in the numerical models. In hydrogeological systems, various types of measurements, such as the hydraulic conductivity and the hydraulic head, are collected at different spatial and temporal scales. Thus, the challenge is to assimilate different types of data in a single framework to characterize the model parameters. One well-known class of data assimilation methods is the Kalman Filter (KF), a sequential approach used to obtain least-squares estimates of model parameters by requiring consistency with noisy observations and linear governing equations [153]. The KF has been applied in hydrogeology to approach inverse problems [185, 186]. However, because of the excessive computational cost associated with the KF process, it is not practical to employ the method in large-scale and nonlinear systems. To overcome this challenge, Evensen (1994) proposed the EnKF technique, a Monte Carlo-based approach that performs the required data assimilation calculations using a limited ensemble of model realizations [155]. EnKF has been applied in various disciplines such as oceanography [155, 156], meteorology [157], hydrology [158, 159, 160, 170], hydrogeology and petroleum engineering [161, 162, 163, 164, 165, 166, 167, 168, 169].
A key challenge in hydrogeological modeling is the inconsistency between the scale of the numerical grid blocks and the scale of the measurement support volume. In practice, hydrogeological data are collected at different support volumes [187, 188, 189]. To tackle the inconsistency between the measurement and numerical scales, one may need to consider numerical models at the scale of the measurements (i.e. fine scales). Consequently, the parameter estimation (i.e. EnKF) procedure needs to be performed at the scale of the measurements. However, this leads to high-resolution models that are computationally expensive. In order to alleviate the computational burden associated with these high-resolution data assimilation models, upscaling techniques can be utilized [133, 80]. In the literature, several upscaling methods have been combined with the EnKF procedure with the aim of reducing the computational cost. Peters et al. (2010) used diagonal tensor upscaling to coarsen fine-scale permeability fields and combined it with EnKF for history matching as a part of their study [190]. Li et al. (2012) modeled transient groundwater flow using EnKF coupled with a flow upscaling approach [173]. In that work, the entire domain is upscaled uniformly, and the measurement points and upscaled grids cannot be mapped precisely. Hence, the assimilated ensemble does not necessarily preserve the conditioning on the fine-scale observations. In order to address this problem, we propose to couple WT upscaling with the EnKF method (EnKF-WT). The WT upscaling method is capable of coarsening the high-resolution heterogeneous conductivity field non-uniformly, which preserves the important information on the spatial distribution of the conductivity field while coarsening those parts of the computational grid that contribute little to the flow field [133, 134, 132, 102, 136, 137, 66, 60].
Thus, the number of grid blocks in the computational mesh and, consequently, the cost of the simulations solved in the EnKF process are reduced drastically without sacrificing any crucial information about the conductivity field. An appealing characteristic of using the WT to upscale hydrogeological properties is that it generates non-uniform upscaled grids. The WT approach makes it possible to impose constraints that prevent specific zones of the geological formation from being upscaled. This feature makes WT upscaling a suitable choice for coupling with the EnKF, in which conditioning on small-scale observations needs to be preserved.

The main goal of this chapter is to develop a novel and efficient data assimilation framework that simultaneously reduces the uncertainty and the computational burden associated with hydrogeological models. We provide a unified framework that combines data assimilation using EnKF and upscaling via WT. For the theoretical background on the EnKF and the WT, readers are referred to Chapter 5, Section 5.2.1 and Chapter 4, Section 4.3, respectively. The rest of this chapter is structured as follows. Section 6.2 describes the steps of our proposed EnKF-WT framework. The details of our numerical example and the results of the EnKF-WT method are presented in Section 6.3. Finally, the concluding remarks are summarized in Section 6.4.

6.2 Methodology

In this section, we describe the overall EnKF-WT procedure in which the WT upscaling method is combined with the EnKF data assimilation technique. In this chapter, and without loss of generality, we are interested in estimating the spatial distribution of the K-field using the available hydraulic conductivity and hydraulic head measurements. The steps in the EnKF-WT framework are as follows:

1. Initial Ensemble: First, $N_{mc}$ high-resolution heterogeneous K-field realizations (model parameters) are generated.
This set of $N_{mc}$ realizations is denoted as the initial ensemble.

2. Upscaling: Second, each K-field grid is coarsened using the WT technique while the original resolution of the grid blocks at the specific measurement locations is preserved.

3. Forecast Model: Next, the flow process in the porous media is simulated on the upscaled K-fields.

4. State Vector Construction: Then, the upscaled K-fields and the desired model variables (i.e. the hydraulic head) obtained from Step 3 are mapped onto the high-resolution grid. This mapping is required because the state vectors must all have the same size in the assimilation step.

5. Data Assimilation: In this step, the state vectors are updated by reconciling the observed data with the results obtained from the forecast model (Step 3).

6. Iteration: Finally, the updated K-fields obtained from Step 5 are considered as a new ensemble and we return to Step 2 if a new set of data is available.

Additional information on the EnKF and WT upscaling methods can be found in Chapters 5 and 4, respectively.

6.3 Illustrative Example

In order to assess the performance of the proposed EnKF-WT framework for parameter estimation, we carried out numerical experiments. The problem configuration considered in our data assimilation procedure for obtaining the estimated K-fields is described in this section.

We consider a two-dimensional regularly-spaced grid for the heterogeneous K-fields. Following the statistical convergence analyses performed in Moslehi et al. (2015) [19], the discretization size of the numerical grid blocks in both directions is given by Δ. The length of the generated fields in each direction is denoted by L. We generate an initial ensemble of heterogeneous K-fields with $N_{mc}$ realizations (Step 1).
For the purpose of illustration, the log-conductivity field Y = ln[K] is modeled as a statistically stationary random space function with mean μ, variance σ², correlation length λ, and an isotropic exponential covariance model. For our numerical example, we consider a reference field, denoted by Y_ref = ln[K_ref], as the synthetic true conductivity field. The reference field is generated as a separate random Y-field with different properties, μ_ref, σ²_ref, and λ_ref. SGeMS is employed to randomly generate the described heterogeneous K-fields [183]. Next, each randomly-generated fine-scale K-field in the ensemble is coarsened using the WT upscaling technique (Step 2), and the obtained upscaled K-fields are used in the forecast model. In the upscaling process, the original resolution of the grid blocks at the measurement locations is preserved in order to maintain the conditioning on the fine-scale observations. To implement this, a constraint is imposed on the upscaling procedure that prevents marked areas around the measurement points from being coarsened. For the forecast step (Step 3), fully saturated transient groundwater flow is considered, with no-flow boundary conditions in the transverse direction and an initial hydraulic head (h_in) throughout the domain. In order to mimic the described transient flow, a fixed hydraulic head gradient is imposed in the longitudinal direction by stipulating hydraulic head values at the left (h_l) and right (h_r) boundaries. Flow is governed by the following partial differential equation in the absence of sinks and sources:

S_s ∂h(x,t)/∂t = ∇ · [K(x) ∇h(x,t)],   (6.1)

where S_s denotes the specific storage, x delineates the location in Cartesian coordinates, and t represents time.
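Eq. (6.1) can be advanced in time with a simple explicit finite-difference scheme; this is only a sketch of the forecast physics (the chapter itself solves the equation with MODFLOW-USG), and the grid size, log-K variance, and time step below are illustrative choices made small enough for explicit stability.

```python
import numpy as np

def step_flow(h, K, Ss, dt, dx=1.0):
    """One explicit finite-difference step of Eq. (6.1):
    Ss * dh/dt = div(K grad h), with fixed-head left/right columns
    and no-flow top/bottom boundaries."""
    # harmonic-mean conductivities on cell interfaces
    Kx = 2.0 * K[:, :-1] * K[:, 1:] / (K[:, :-1] + K[:, 1:])
    Ky = 2.0 * K[:-1, :] * K[1:, :] / (K[:-1, :] + K[1:, :])
    fx = Kx * (h[:, 1:] - h[:, :-1]) / dx       # fluxes across vertical faces
    fy = Ky * (h[1:, :] - h[:-1, :]) / dx       # fluxes across horizontal faces
    div = np.zeros_like(h)
    div[:, :-1] += fx; div[:, 1:] -= fx         # no-flow edges: missing faces add 0
    div[:-1, :] += fy; div[1:, :] -= fy
    h_new = h + dt / Ss * div / dx
    h_new[:, 0], h_new[:, -1] = h[:, 0], h[:, -1]   # reimpose h_l and h_r
    return h_new

# illustrative set-up loosely mirroring Table 6.1 (smaller grid, modest variance)
rng = np.random.default_rng(1)
n, Ss = 16, 1e-4
K = np.exp(rng.normal(0.0, 0.5, (n, n)))        # lognormal conductivity field
h = np.full((n, n), 20.0)                       # h_in = h_l = 20 m
h[:, -1] = 0.0                                  # h_r = 0 m
dt = 0.2 * Ss / (4.0 * K.max())                 # kept small for explicit stability
for _ in range(200):
    h = step_flow(h, K, Ss, dt)
```

Because each update is a convex combination of neighboring heads, the solution respects the maximum principle and relaxes from the uniform initial head toward the imposed longitudinal gradient.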
The flow equation, which represents the forecast model in the proposed EnKF-WT framework, is solved numerically using MODFLOW-USG. This numerical flow solver is based on control-volume finite-difference formulations and simulates flow in a wide variety of structured and unstructured grid types [191]. Since the state vectors must have the same size in the assimilation step, the upscaled K-fields and the corresponding hydraulic heads obtained from the forecast model are mapped onto the original high-resolution grid. Then, the state vector used in the EnKF-WT procedure is constructed by stacking the mapped fine-scale model parameters (K) and model variables (h) together (Step 4). In order to assemble the observation vector in Eq. (5.4), n_k measurements of K_ref are collected directly from the reference field. Furthermore, at each data assimilation step of the EnKF-WT procedure, the observed values of h are updated in the observation vector, since the flow is transient and the values of h vary dynamically throughout the domain. Therefore, at each EnKF-WT iteration, n_h measurements of h are drawn from the MODFLOW-USG simulation based on the reference field, so that they can be used in the construction of the observation vector. The measurement locations are uniformly selected across the reference field, as illustrated in Fig. 6.1.

Figure 6.1: Schematic representation of the computational domain and the measurement locations for both K and h.

Moreover, the observations are perturbed by white noise, with zero mean and 0.001 standard deviation, to represent measurement errors. The model error introduced in Eq. (5.1) is assumed to be zero for simplicity. The observed data and the results obtained from the forecast model are combined to update the state vector (Step 5). The EnKF procedure is carried out for a total of N_iter iterations (Step 6), and the forecast model advances in time by Δt at each iteration.
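The analysis of Step 5 combines the stacked state vectors with perturbed observations through the standard ensemble Kalman update. The sketch below is a generic stochastic-EnKF analysis step with a selection-matrix observation operator; the state, ensemble size, and observation indices are illustrative stand-ins, not the chapter's actual configuration.

```python
import numpy as np

def enkf_analysis(X, y, H, obs_std, rng):
    """One stochastic-EnKF analysis step (Step 5) with perturbed observations.
    X : (n_state, n_ens) forecast ensemble of stacked states (K and h values)
    y : (n_obs,) observed data
    H : (n_obs, n_state) observation operator (here, a selection matrix)"""
    n_obs, n_ens = len(y), X.shape[1]
    A = X - X.mean(axis=1, keepdims=True)            # ensemble anomalies
    S = H @ A                                        # anomalies in observation space
    C = S @ S.T / (n_ens - 1) + obs_std**2 * np.eye(n_obs)
    G = (A @ S.T / (n_ens - 1)) @ np.linalg.inv(C)   # ensemble Kalman gain
    Yp = y[:, None] + rng.normal(0.0, obs_std, (n_obs, n_ens))  # perturbed obs
    return X + G @ (Yp - H @ X)

rng = np.random.default_rng(2)
n_state, n_ens = 50, 40
truth = np.linspace(-1.0, 1.0, n_state)              # stand-in "reference" state
X = truth[:, None] + rng.normal(0.0, 0.5, (n_state, n_ens))  # forecast ensemble
obs_idx = [5, 25, 45]                                # observed state components
H = np.zeros((len(obs_idx), n_state))
H[np.arange(len(obs_idx)), obs_idx] = 1.0
y = truth[obs_idx]                                   # noise-free synthetic data
Xa = enkf_analysis(X, y, H, obs_std=1e-3, rng=rng)
```

With a small observation error, the analysis collapses the ensemble spread at the observed components and pulls the ensemble mean toward the data, which is exactly the conditioning behavior the EnKF-WT framework relies on.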
The aforementioned physical set-up is selected for the purpose of illustration, and different numerical configurations and boundary conditions can be incorporated. The properties of the parameters used in this chapter, along with their values, are reported in Table 6.1.

Table 6.1: Summary of the model parameters used in this study.

Parameter   Value   Unit
L           64      m
Δ           1       m
N_mc        300     -
μ           0       -
σ²          1.2     -
λ           10      m
μ_ref       0       -
σ²_ref      1.5     -
λ_ref       9       m
n_k         16      -
n_h         16      -
S_s         10^-4   -
h_in        20      m
h_l         20      m
h_r         0       m
Δt          0.1     d
N_iter      10      -

Starting from the generated initial ensemble, the EnKF-WT process with the aforementioned forecast model and observations is executed to estimate the K-field using the steps described in Section 6.2. Fig. 6.2c depicts the ensemble mean of the log-conductivity fields at the last iteration, together with the reference log-conductivity field (Fig. 6.2a). To evaluate the performance of the EnKF-WT framework, the results obtained from the EnKF-WT are compared with the regular EnKF data assimilation using the same initial K-field ensemble and measurements. The estimated ensemble mean of the log-conductivity fields using the EnKF method is presented in Fig. 6.2b.

Figure 6.2: Estimated ensemble mean of conductivity fields obtained from EnKF-WT (c) and from EnKF (b), together with the reference conductivity field (a).

As observed in Fig. 6.2, both estimated conductivity fields reproduce the true conductivity field successfully. In order to quantify the accuracy of the parameter estimations, the root mean square error (RMSE) and the CCV-based metric (<), described in the previous chapter (see Eq. 5.7 in Chapter 5), are calculated and reported in Table 6.2. As seen in Table 6.2, the RMSE and CCV-based metric (<) values obtained from the EnKF-WT framework are similar to their counterparts in the regular EnKF procedure.
However, when using our proposed EnKF-WT approach, the computational time is significantly reduced compared to the regular EnKF. The computational times of both data assimilation methods are presented in Table 6.2.

Table 6.2: Performance comparison of our proposed EnKF-WT framework with the regular EnKF method in terms of computational costs and the accuracy of parameter estimations.

Methods   RMSE     <       Runtime (sec)
EnKF-WT   0.7216   15.02   96.88
EnKF      0.7104   14.13   3448.05

As observed in Table 6.2, the computational time required by the EnKF-WT data assimilation technique is approximately two orders of magnitude less than that of the regular EnKF method. The results therefore indicate that using our proposed EnKF-WT framework leads to an enormous reduction in computational cost, without sacrificing the accuracy of the parameter estimations.

6.4 Summary

In this chapter, we proposed a computationally efficient data assimilation framework that combines the EnKF and WT upscaling to estimate model parameters. Upscaling the fine-scale K-fields using the WT in the EnKF forecast model results in a tremendous reduction in the number of computational grid blocks (about 94%). As an outcome, the computational cost associated with the EnKF procedure is reduced significantly. Furthermore, our proposed method leads to more accurate parameter estimation when compared with other upscaled EnKF methods, since the EnKF-WT does not upscale uniformly, which would perturb the data conditioning. This is because, in the WT upscaling approach, we can impose a constraint to maintain the original fine scale of the grid at the measurement locations, which results in more precise data conditioning and, consequently, better data assimilation.
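The constraint just described can be illustrated with one level of a 2D Haar transform: a 2x2 block collapses to its approximation coefficient (the block mean) only when its detail coefficients are small and the block contains no measurement cell. The threshold, single-level scheme, and flagging below are illustrative assumptions, not the thesis's exact multilevel WT criteria.

```python
import numpy as np

def haar_coarsen(Y, keep_mask, thresh=0.5):
    """One level of constrained Haar coarsening: a 2x2 block is replaced by
    its mean only when its Haar detail coefficients are below thresh AND it
    contains no protected (measurement) cell, so conditioning data survive."""
    Yc = Y.copy()
    n_merged = 0
    for i in range(0, Y.shape[0], 2):
        for j in range(0, Y.shape[1], 2):
            p = Y[i:i + 2, j:j + 2]
            lh = (p[0, 0] + p[0, 1] - p[1, 0] - p[1, 1]) / 2.0  # horizontal detail
            hl = (p[0, 0] - p[0, 1] + p[1, 0] - p[1, 1]) / 2.0  # vertical detail
            hh = (p[0, 0] - p[0, 1] - p[1, 0] + p[1, 1]) / 2.0  # diagonal detail
            if max(abs(lh), abs(hl), abs(hh)) < thresh \
                    and not keep_mask[i:i + 2, j:j + 2].any():
                Yc[i:i + 2, j:j + 2] = p.mean()
                n_merged += 1
    return Yc, n_merged

Y = 0.1 * np.add.outer(np.arange(8.0), np.arange(8.0))  # smooth log-K field
keep = np.zeros((8, 8), dtype=bool)
keep[3, 3] = True                                       # one measurement cell
Yc, merged = haar_coarsen(Y, keep)
```

On this smooth field, every block merges except the one containing the measurement cell; repeating the test across levels is what produces the non-uniform grids described above.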
In conclusion, we have developed an efficient EnKF-WT framework that simultaneously reduces the uncertainty and the computational costs associated with hydrogeological models and the predictions relevant to such models.

Chapter 7

Summary

The main goal of this thesis is to provide smart approaches for optimally dealing with stochastic numerical hydrogeological systems. In particular, we target enhancing the computational efficiency and reducing the uncertainty associated with hydrogeological models. In general, hydrogeological predictions of interest are large-scale, with excessive computational complexity and uncertain characteristics. Chapter 2 shows the cumbersomeness of typical problems in stochastic hydrogeology. We investigate how the structure of the heterogeneous conductivity field affects the uncertainty of model predictions that are of relevance to probabilistic risk assessment and aquifer remediation. The impact of spatially random heterogeneous porous media on the uncertainty of hydrogeological model predictions is scrutinized by means of numerical MC simulations. To achieve statistically meaningful model predictions (such as those depicted in Chapter 2) and to explore the effect of a structural parameter of aquifers on such statistics, brute-force high-resolution MC numerical simulations are commonly performed, which is computationally expensive. Therefore, a framework is required to efficiently allocate computational budgets in the stochastic numerical modeling of processes in spatially heterogeneous porous media. In Chapter 3, the impact of both numerical and stochastic resolutions on the overall error approximation of model predictions subject to a computational budget constraint is examined. By considering the trade-off between two conflicting objectives, i.e.
the available computational resources and the accuracy of predictions, we develop a framework based on bias-variance tradeoff analysis to determine the optimal physical and statistical resolutions that minimize the prediction error for a given computational resource. We focus mainly on identifying a choice of physical and stochastic resolutions that accurately estimates the model's prediction with respect to a given computational resource constraint. To achieve this, we derive an error expression for the model response that accounts for both sources of error as well as their statistical dependency. As highlighted in this chapter, extensive, computationally demanding stochastic numerical experiments are required to obtain accurate model predictions optimally. Therefore, to propose an optimal framework for stochastic hydrogeological modeling, an efficient upscaling method is required that alleviates the computational burden while maintaining the desired accuracy of model predictions. Chapter 4 develops the application of a methodology for upscaling the spatial distribution of the hydraulic conductivity (or permeability) field in a computational grid for the simulation of solute transport in spatially heterogeneous geological formations. The methodology is based on the WT, which preserves the resolution of the high-conductivity paths, as well as that of the regions in which there are sharp contrasts between the conductivities. The performance of the method was tested for a wide range of the important model parameters, from order-of-magnitude variations in the conductivities and the nature of their correlations (positive versus negative) to the size of the computational grid. We demonstrated in all
From a computational point of view, the WT upscaling method generates non-uniform grids with a number of grid blocks that represent, on aver- age, only about 2% of the number of grid blocks in the original high-resolution models. Hence, the WT upscaling method results in an enormous reduction in the computational time of such stochastic numerical hydrogeological simulations. As stated previously, the uncertainty of hydrogeological systems arises from the multi-scale heterogeneity and insufficient measurements of their underlying param- eters such as hydraulic conductivity. An inadequate characterization of hydrogeo- logical properties can significantly decrease the trustworthiness of numerical mod- els that predict groundwater flow and solute transport. Therefore, to reduce the uncertainty associated with hydrogeological model predictions, a variety of data assimilation methods have been proposed. Such methods attempt to estimate hydrogeological parameters from spatially scarce data by incorporating them into the governing physical models. In Chapter 5, we first review the commonly-used data assimilation approach, EnKF, which reconciles multiple sources of measure- ments to sequentially estimate model parameters such as the hydraulic conductiv- ity. Next, we propose a novel framework for evaluating the performance of these estimation methods. Several methods have been used in the literature to quantify the accuracy of the estimations obtained by EnKF, including RMSE and Ensem- ble Spread. However, these commonly used methods do not regard the spatial information and variability of geological formations. This can cause hydraulic con- ductivity fields with very different spatial structures to have similar histograms or RMSE. We develop a vision-based approach that can quantify the accuracy of 146 estimations by considering the spatial structure embedded in the estimated fields. 
Our new approach consists of adapting a new metric, based on the CCV, to evaluate the accuracy of the fields estimated by the EnKF. We represent the estimated fields as digital three-channel images and use the CCV to compare and quantify the accuracy of the estimations. The sensitivity of the CCV to spatial structure makes it a suitable metric for assessing the performance of spatial data assimilation techniques. The results in Chapter 5 show that the proposed CCV metric outperforms other existing evaluation metrics. As observed in Chapter 5, estimating hydrogeological properties using data assimilation methods such as the EnKF is computationally demanding, since it is an MC-based approach that iteratively updates the model parameters and variables until a desired accuracy is reached. Therefore, in Chapter 6 we develop an efficient framework that combines the EnKF technique with WT upscaling to simultaneously reduce the uncertainty of model predictions, the complexity of the numerical simulations, and, most importantly, the computational cost of data assimilation in stochastic hydrogeological modeling. The performance of our proposed framework is tested against the regular EnKF to verify the computational efficiency of the EnKF-WT method. The obtained results indicate that the proposed EnKF-WT framework outperforms the regular EnKF by decreasing the computational costs by several orders of magnitude, while preserving the accuracy of the model parameter estimation and, consequently, of the model predictions. Furthermore, our proposed method has an advantage over other upscaled EnKF methods in the literature, since the EnKF-WT does not upscale the hydrogeological fields uniformly, which prevents perturbation in the zones where data were collected.
This is because, in WT upscaling, we can impose a constraint to maintain the original fine scale of the grid at the measurement locations, which results in more precise data conditioning and, consequently, better data assimilation. In conclusion, we have developed a unified, efficient framework that simultaneously reduces the uncertainty and the computational costs associated with hydrogeological models and the predictions relevant to such models.

Reference List

[1] F. P. J. de Barros, S. Ezzedine, Y. Rubin, Impact of hydrogeological data on measures of uncertainty, site characterization and environmental performance metrics, Adv Water Resour 36 (2012) 51–63. doi:10.1016/j.advwatres.2011.05.004.
[2] M. Rolle, C. Eberhardt, G. Chiogna, O. A. Cirpka, P. Grathwohl, Enhancement of dilution and transverse reactive mixing in porous media: Experiments and model-based interpretation, J Contam Hydrol 110 (3–4) (2009) 130–142. doi:10.1016/j.jconhyd.2009.10.003.
[3] M. Dentz, T. Le Borgne, A. Englert, B. Bijeljic, Mixing, spreading and reaction in heterogeneous media: A brief review, J Contam Hydrol 120 (2011) 1–17. doi:10.1016/j.jconhyd.2010.05.002.
[4] F. P. J. de Barros, A. Fiori, F. Boso, A. Bellin, A theoretical framework for modeling dilution enhancement of non-reactive solutes in heterogeneous porous media, J Contam Hydrol 175–176 (2015) 72–83. doi:10.1016/j.jconhyd.2015.01.004.
[5] Y. Rubin, Applied Stochastic Hydrogeology, Oxford University Press, USA, 2003.
[6] P. K. Kitanidis, E. G. Vomvoris, A geostatistical approach to the inverse problem in groundwater modeling (steady state) and one-dimensional simulations, Water Resour Res 19 (3) (1983) 677–690. doi:10.1029/WR019i003p00677.
[7] W. W.-G. Yeh, Review of parameter identification procedures in groundwater hydrology: The inverse problem, Water Resour Res 22 (2) (1986) 95–108. doi:10.1029/WR022i002p00095.
[8] Y. Rubin, X. Chen, H. Murakami, M.
Hahn, A Bayesian approach for inverse modeling, data assimilation, and conditional simulation of spatial random fields, Water Resour Res 46 (10) (2010) W10523. doi:10.1029/2009WR008799.
[9] R. Helmig, et al., Multiphase flow and transport processes in the subsurface: a contribution to the modeling of hydrosystems, Springer-Verlag, 1997.
[10] D. Fernàndez-Garcia, D. Bolster, X. Sanchez-Vila, D. M. Tartakovsky, A Bayesian approach to integrate temporal data into probabilistic risk analysis of monitored NAPL remediation, Adv Water Resour 36 (2012) 108–120.
[11] D. M. Tartakovsky, Assessment and management of risk in subsurface hydrology: A review and perspective, Adv Water Resour 51 (2013) 247–260.
[12] R. M. Maxwell, W. E. Kastenberg, Y. Rubin, A methodology to integrate site characterization information into groundwater-driven health risk assessment, Water Resour Res 35 (9) (1999) 2841–2855. doi:10.1029/1999WR900103.
[13] F. P. J. de Barros, Y. Rubin, R. M. Maxwell, The concept of comparative information yield curves and its application to risk-based site characterization, Water Resour Res 45 (6) (2009). doi:10.1029/2008WR007324.
[14] A. Porporato, P. D'Odorico, F. Laio, L. Ridolfi, I. Rodriguez-Iturbe, Ecohydrology of water-controlled ecosystems, Adv Water Resour 25 (8) (2002) 1335–1348.
[15] R. Ababou, D. McLaughlin, L. Gelhar, A. Tompson, Numerical simulation of three-dimensional saturated flow in randomly heterogeneous porous media, Transport Porous Med 4 (6) (1989) 549–565. doi:10.1007/BF00223627.
[16] J.-R. de Dreuzy, A. Beaudoin, J. Erhel, Asymptotic dispersion in 2D heterogeneous porous media determined by parallel numerical simulations, Water Resour Res 43 (10) (2007) W10439. doi:10.1029/2006WR005394.
[17] S. Asmussen, P. W. Glynn, Stochastic Simulation: Algorithms and Analysis, Vol. 57, Springer, 2007.
[18] M.
Sahimi, Flow and Transport in Porous Media and Fractured Rock: From Classical Methods to Modern Approaches, John Wiley & Sons, 2011.
[19] M. Moslehi, R. Rajagopal, F. P. J. de Barros, Optimal allocation of computational resources in hydrogeological models under uncertainty, Adv Water Resour 83 (2015) 299–309. doi:10.1016/j.advwatres.2015.06.014.
[20] P. C. Leube, W. Nowak, G. Schneider, Temporal moments revisited: Why there is no better way for physically based model reduction in time, Water Resour Res 48 (11) (2012). doi:10.1029/2012WR011973.
[21] P. C. Leube, F. P. J. de Barros, W. Nowak, R. Rajagopal, Towards optimal allocation of computer resources: Trade-offs between uncertainty quantification, discretization and model reduction, Environ Modell Softw 50 (2013) 97–107. doi:10.1016/j.envsoft.2013.08.008.
[22] W. Nowak, Y. Rubin, F. P. J. de Barros, A hypothesis-driven approach to optimize field campaigns, Water Resour Res 48 (6). doi:10.1029/2011WR011016.
[23] R. Enzenhoefer, W. Nowak, R. Helmig, Probabilistic exposure risk assessment with advective-dispersive well vulnerability criteria, Adv Water Resour 36 (2012) 121–132. doi:10.1016/j.advwatres.2011.04.018.
[24] E. Frind, J. Molson, D. Rudolph, Well vulnerability: A quantitative approach for source water protection, Ground Water 44 (5) (2006) 732–742. doi:10.1111/j.1745-6584.2006.00230.x.
[25] Y. Rubin, G. Dagan, Conditional estimation of solute travel time in heterogeneous formations: Impact of transmissivity measurements, Water Resour Res 28 (4) (1992) 1033–1040. doi:10.1029/91WR02759.
[26] F. P. J. de Barros, Y. Rubin, R. M. Maxwell, The concept of comparative information yield curves and its application to risk-based site characterization, Water Resour Res 45 (6).
[27] C. V. Henri, D. Fernàndez-Garcia, F. P. J.
de Barros, Probabilistic human health risk assessment of degradation-related chemical mixtures in heterogeneous aquifers: Risk statistics, hot spots, and preferential channels, Water Resour Res 51 (6) (2015) 4086–4108. doi:10.1002/2014WR016717.
[28] V. Kapoor, P. K. Kitanidis, Concentration fluctuations and dilution in aquifers, Water Resour Res 34 (5) (1998) 1181–1193.
[29] A. Fiori, G. Dagan, Concentration fluctuations in aquifer transport: A rigorous first-order solution and applications, J Contam Hydrol 45 (1) (2000) 139–163.
[30] A. Fiori, The Lagrangian concentration approach for determining dilution in aquifer transport: Theoretical analysis and comparison with field experiments, Water Resour Res 37 (12) (2001) 3105–3114. doi:10.1029/2001WR000228.
[31] Y. Rubin, Transport in heterogeneous porous media: Prediction and uncertainty, Water Resour Res 27 (7) (1991) 1723–1738.
[32] V. Kapoor, L. W. Gelhar, Transport in three-dimensionally heterogeneous aquifers: 1. Dynamics of concentration fluctuations, Water Resour Res 30 (6) (1994) 1775–1788.
[33] Z. J. Kabala, G. Sposito, Statistical moments of reactive solute concentration in a heterogeneous aquifer, Water Resour Res 30 (3) (1994) 759–768.
[34] F. P. J. de Barros, A. Fiori, First-order based cumulative distribution function for solute concentration in heterogeneous aquifers: Theoretical analysis and implications for human health risk assessment, Water Resour Res 50 (5) (2014) 4018–4037. doi:10.1002/2013WR015024.
[35] X.-H. Wen, J. Gomez-Hernandez, Numerical modeling of macrodispersion in heterogeneous media: a comparison of multi-Gaussian and non-multi-Gaussian models, J Contam Hydrol 30 (1998) 129–156. doi:10.1016/S0169-7722(97)00035-1.
[36] D. A. Benson, R. Schumer, M. M. Meerschaert, S. W. Wheatcraft, Fractional dispersion, Lévy motion, and the MADE tracer tests, in: Dispersion in Heterogeneous Geological Formations, Springer, 2002, pp. 211–240.
doi:10.1007/978-94-017-1278-1_11.
[37] B. Zinn, C. F. Harvey, When good statistical models of aquifer heterogeneity go bad: A comparison of flow, dispersion, and mass transfer in connected and multivariate Gaussian hydraulic conductivity fields, Water Resour Res 39 (3).
[38] X. Sanchez-Vila, A. Guadagnini, D. Fernàndez-Garcia, Conditional probability density functions of concentrations for mixing-controlled reactive transport in heterogeneous aquifers, Math Geosci 41 (3) (2009) 323–351.
[39] M. Dentz, D. M. Tartakovsky, Probability density functions for passive scalars dispersed in random velocity fields, Geophys Res Lett 37 (24) (2010) L24406. doi:10.1029/2010GL045748.
[40] F. J. Molz, H. Rajaram, S. Lu, Stochastic fractal-based models of heterogeneity in subsurface hydrology: Origins, applications, limitations, and future research questions, Rev Geophys 42 (1) (2004) RG1002. doi:10.1029/2003RG000126.
[41] V. di Federico, S. P. Neuman, Transport in multiscale log conductivity fields with truncated power variograms, Water Resour Res 34 (5) (1998) 963–973. doi:10.1029/98WR00221.
[42] M. Sahimi, S. E. Tajer, Self-affine fractal distributions of the bulk density, elastic moduli, and seismic wave velocities of rock, Phys Rev E 71 (4) (2005) 046301. doi:10.1103/PhysRevE.71.046301.
[43] T. Hewett, et al., Fractal distributions of reservoir heterogeneity and their influence on fluid transport, in: SPE Annual Technical Conference and Exhibition, Society of Petroleum Engineers, 1986. doi:10.2118/15386-MS.
[44] M. Sahimi, Y. C. Yortsos, Applications of fractal geometry to porous media: a review, in: Annual Fall Meeting of the Society of Petroleum Engineers, New Orleans, LA, Vol. 3, 1990.
[45] G. Dagan, The significance of heterogeneity of evolving scales to transport in porous formations, Water Resour Res 30 (12) (1994) 3327–3336.
[46] A. Bellin, A.
Fiori, Non-ergodic solute transport in self-similar porous formations: the effect of conditioning, Adv Water Resour 26 (7) (2003) 759–771.
[47] A. Bellin, M. Pannone, A. Fiori, A. Rinaldo, On transport in porous formations characterized by heterogeneity of evolving scales, Water Resour Res 32 (12) (1996) 3485–3496.
[48] A. Fiori, H. Gehrels, N. Peters, E. Hoehn, K. Jensen, C. Leibundgut, J. Griffioen, B. Webb, W. Zaadnoordijk, et al., Analysis of the longitudinal dispersion of non-reactive solutes in long-range correlated permeability fields, IAHS Publication (2001) 255–262.
[49] V. di Federico, Y.-K. Zhang, Solute transport in heterogeneous porous media with long-range correlations, Water Resour Res 35 (10) (1999) 3185–3191. doi:10.1029/1999WR900021.
[50] M. Riva, A. Guadagnini, Effects of evolving scales of heterogeneity on hydraulic head predictions under convergent flow conditions, Hydrogeol J 17 (4) (2009) 817–825.
[51] K.-C. Chen, K.-C. Hsu, A general fractal model of flow and solute transport in randomly heterogeneous porous media, Water Resour Res 43 (12).
[52] K.-C. Hsu, K.-C. Chen, Multiscale flow and transport model in three-dimensional fractal porous media, Stoch Environ Res Risk Assess 24 (7) (2010) 1053–1065.
[53] M. Riva, M. Panzeri, A. Guadagnini, S. P. Neuman, Simulation and analysis of scalable non-Gaussian statistically anisotropic random functions, J Hydrol 531, Part 1 (2015) 88–95. doi:10.1016/j.jhydrol.2015.06.066.
[54] P. Salamon, D. Fernandez-Garcia, J. J. Gomez-Hernandez, A review and numerical assessment of the random walk particle tracking method, J Contam Hydrol 87 (3–4) (2006) 277–305. doi:10.1016/j.jconhyd.2006.05.005.
[55] B. B. Mandelbrot, J. W. Van Ness, Fractional Brownian motions, fractional noises and applications, SIAM Review 10 (4) (1968) 422–437.
[56] A. Fournier, D. Fussell, L. Carpenter, Computer rendering of stochastic models, Commun ACM 25 (6) (1982) 371–384.
[57] M. F.
Barnsley, R. L. Devaney, B. B. Mandelbrot, H. Peitgen, D. Saupe, R. F. Voss, Y. Fisher, M. McGuire, The Science of Fractal Images, Springer Publishing Company, Incorporated, 2011.
[58] A. Kreft, A. Zuber, On the use of the dispersion model of fluid flow, Appl Radiat Isotopes 30 (11) (1979) 705–708. doi:10.1016/0020-708X(79)90113-3.
[59] E. R. Siirila, R. M. Maxwell, A new perspective on human health risk assessment: Development of a time dependent methodology and the effect of varying exposure durations, Sci Total Environ 431 (2012) 221–232. doi:10.1016/j.scitotenv.2012.05.030.
[60] M. Moslehi, F. P. J. de Barros, F. Ebrahimi, M. Sahimi, Upscaling of solute transport in disordered porous media by wavelet transformations, Adv Water Resour 96 (2016) 180–189. doi:10.1016/j.advwatres.2016.07.013.
[61] A. W. Harbaugh, MODFLOW-2005, the US Geological Survey modular ground-water model: the ground-water flow process, US Department of the Interior, US Geological Survey, 2005.
[62] D. Fernàndez-Garcia, T. H. Illangasekare, H. Rajaram, Differences in the scale-dependence of dispersivity estimated from temporal and spatial moments in chemically and physically heterogeneous porous media, Adv Water Resour 28 (7) (2005) 745–759. doi:10.1016/j.advwatres.2004.12.011.
[63] D. Fernàndez-Garcia, X. Sanchez-Vila, Optimal reconstruction of concentrations, gradients and reaction rates from particle distributions, J Contam Hydrol 120–121 (2011) 99–114. doi:10.1016/j.jconhyd.2010.05.001.
[64] D. Pedretti, D. Fernàndez-Garcia, An automatic locally-adaptive method to estimate heavily-tailed breakthrough curves from particle distributions, Adv Water Resour 59 (2013) 52–65. doi:10.1016/j.advwatres.2013.05.006.
[65] C. V. Henri, D.
Fernàndez-Garcia, Toward efficiency in heterogeneous multispecies reactive transport modeling: A particle-tracking solution for first-order network reactions, Water Resour Res 50 (9) (2014) 7206–7230. doi:10.1002/2013WR014956.
[66] M. R. Rasaei, M. Sahimi, Upscaling of the permeability by multiscale wavelet transformations and simulation of multiphase flows in heterogeneous porous media, Comput Geosci 13 (2) (2009) 187–214. doi:10.1007/s10596-008-9111-0.
[67] H. Bhat, N. Kumar, On the derivation of the Bayesian information criterion, School of Natural Sciences, University of California.
[68] A. Bellin, D. Tonina, Probability density function of non-reactive solute concentration in heterogeneous porous formations, J Contam Hydrol 94 (1) (2007) 109–125. doi:10.1016/j.jconhyd.2007.05.005.
[69] M. B. Wilk, R. Gnanadesikan, Probability plotting methods for the analysis of data, Biometrika 55 (1) (1968) 1–17. doi:10.1093/biomet/55.1.1.
[70] J. W. Falco, R. V. Moraski, Methods Used in the United States for the Assessment and Management of Health Risk Due to Chemicals, Springer US, Boston, MA, 1989, pp. 37–60. doi:10.1007/978-1-4684-5604-2_4.
[71] C. E. Shannon, A mathematical theory of communication, ACM SIGMOBILE Mobile Computing and Communications Review 5 (1) (2001) 3–55.
[72] USEPA, Risk assessment forum white paper: Probabilistic risk assessment methods and case studies, EPA/100/R-09/001A, Washington, D.C.: Risk Assessment Forum, Office of the Science Advisor, USEPA.
[73] A. V. Lazo, P. Rathie, On the entropy of continuous probability distributions (Corresp.), IEEE T Inform Theory 24 (1) (1978) 120–122. doi:10.1109/TIT.1978.1055832.
[74] R. G. Ghanem, P. D. Spanos, Stochastic Finite Elements: A Spectral Approach, Springer, 1991.
[75] H. Li, D. Zhang, Probabilistic collocation method for flow in porous media: Comparisons with other stochastic methods, Water Resour Res 43 (9).
[76] S. Oladyshkin, W.
Nowak, Data-driven uncertainty quantification using the arbitrary polynomial chaos expansion, Reliab Eng Syst Safe 106 (2012) 179–190. doi:10.1016/j.ress.2012.05.002.
[77] O. Cainelli, A. Bellin, M. Putti, On the accuracy of classic numerical schemes for modeling flow in saturated heterogeneous formations, Adv Water Resour 47 (2012) 43–55.
[78] I. Battiato, D. M. Tartakovsky, A. M. Tartakovsky, T. D. Scheibe, Hybrid models of reactive transport in porous and fractured media, Adv Water Resour 34 (9) (2011) 1140–1150.
[79] X.-H. Wen, J. J. Gómez-Hernández, Upscaling hydraulic conductivities in heterogeneous media: An overview, J Hydrol 183 (1) (1996) ix–xxxii.
[80] Y. Rubin, A. Sun, R. Maxwell, A. Bellin, The concept of block-effective macrodispersivity and a unified approach for grid-scale- and plume-scale-dependent transport, J Fluid Mech 395 (1999) 161–180.
[81] F. P. J. de Barros, Y. Rubin, Modelling of block-scale macrodispersion as a random function, J Fluid Mech 676 (2011) 514–545. doi:10.1017/jfm.2011.65.
[82] F. Ballio, A. Guadagnini, Convergence assessment of numerical Monte Carlo simulations in groundwater hydrology, Water Resour Res 40 (4) (2004). doi:10.1029/2003WR002876.
[83] D. Pasetto, A. Guadagnini, M. Putti, A reduced-order model for Monte Carlo simulations of stochastic groundwater flow, Computat Geosci 18 (2) (2014) 157–169. doi:10.1007/s10596-013-9389-4.
[84] P. C. Leube, F. P. J. de Barros, W. Nowak, R. Rajagopal, Towards optimal allocation of computer resources: Trade-offs between uncertainty quantification, discretization and model reduction, Environ Modell Softw 50 (2013) 97–107. doi:10.1016/j.envsoft.2013.08.008.
[85] L. Surhone, M. Tennoe, S. Henssonow, Moving Particle Semi-Implicit Method, VDM Publishing, 2010.
[86] W. Benz, Smooth particle hydrodynamics: A review, in: J. Buchler (Ed.), The Numerical Modelling of Nonlinear Stellar Pulsations, Vol.
302 of NATO ASI Series, Springer Netherlands, 1990, pp. 269–288. doi: 10.1007/978-94-009-0519-1$\_$16. 156 [87] F. Boso, F. P. J. de Barros, A. Fiori, A. Bellin, Performance analysis of statistical spatial measures for contaminant plume characterization toward risk-based decision making, Water Resour Res 49 (6) (2013) 3119–3132. doi: 10.1002/wrcr.20270. [88] J. Tannehill, D. Anderson, R. Pletcher, Computational fluid mechanics and heat transfer, Series in computational and physical processes in mechanics and thermal sciences, Taylor and Francis, Washington, DC, 1997. [89] D. Goldberg, Genetic Algorithms in Search, Optimization, and Machine Learning, Artif Intell, Addison-Wesley, 1989. [90] M. Dorigo, V. Maniezzo, A. Colorni, Ant system: optimization by a colony of cooperating agents, IEEE Sys Man Cybern 26 (1) (1996) 29–41. doi: 10.1109/3477.484436. [91] A. Kaveh, M. Motie, M. Moslehi, Magnetic charged system search: a new meta-heuristic algorithm for optimization, Acta Mech 224 (1) (2013) 85–107. doi:10.1007/s00707-012-0745-6. [92] J. S. Liu, Monte Carlo strategies in scientific computing, springer, 2008. [93] N. Remy, A. Boucher, J. Wu, Applied geostatistics with SGeMS: A user’s guide, Cambridge University Press, 2009. [94] A. W. Harbaugh, MODFLOW-2005, the US Geological Survey modular ground-water model: The ground-water flow process, US Department of the Interior, US Geological Survey Reston, VA, USA, 2005. [95] P. Salamon, D. Fernandez-Garcia, J. J. Gomez-Hernandez, A review and numerical assessment of the random walk particle tracking method, J Con- tam Hydrol 87 (3-4) (2006) 277 – 305. doi:http://dx.doi.org/10.1016/ j.jconhyd.2006.05.005. [96] C. V. Henri, D. Fernandez-Garcia, Toward efficiency in heterogeneous mul- tispecies reactive transport modeling: A particle-tracking solution for first- order network reactions, Water Resour Res 50 (9) (2014) 7206–7230. doi: 10.1002/2013WR014956. [97] C. 
Zheng, Analysis of particle tracking errors associated with spatial dis- cretization, Groundwater 32 (5) (1994) 821–828. [98] G. Dagan, Transport in heterogeneous porous formations: spatial moments, ergodicity, and effective dispersion, Water Resour Res 26 (6) (1990) 1281– 1290. 157 [99] A. Fiori, On the influence of pore-scale dispersion in nonergodic transport in heterogeneous formations, Transport porous med 30 (1) (1998) 57–73. [100] C. A. Ferro, T. E. Jupp, F. H. Lambert, C. Huntingford, P. M. Cox, Model complexity versus ensemble size: allocating resources for climate prediction, Phil Trans R Soc A 370 (1962) (2012) 1087–1099. [101] D. Zhang, Z. Lu, An efficient, high-order perturbation approach for flow in random porous media via karhunen–loeve and polynomial expansions, J Comput Phys 194 (2) (2004) 773–794. [102] F. Ebrahimi, M. Sahimi, Multiresolution wavelet scale up of unstable mis- cible displacements in flow through heterogeneous porous media, Trans- port Porous Med 57 (1) (2004) 75–102. doi:10.1023/B:TIPM.0000032742. 05517.06. [103] A. E. Lawrence, Y. Rubin, Block-effective macrodispersion for numerical simulations of sorbing solute transport in heterogeneous porous formations, Adv Water Resour 30 (5) (2007) 1272–1285. [104] G. Dagan, Flow and transport in porous formations., Springer-Verlag GmbH & Co. KG., 1989. [105] F. P. J. de Barros, Y. Rubin, A risk-driven approach for subsurface site characterization, Water Resour Res 44 (1) (2008) W01414. doi:10.1029/ 2007WR006081. [106] J. Luo, M. Dentz, J. Carrera, P. Kitanidis, Effective reaction parameters for mixing controlled reactions in heterogeneous media, Water Resour Res 44 (2) (2008) W02416. doi:10.1029/2006WR005658. [107] S. Whitaker, The method of volume averaging. theory and applications of transport in porous media, The Netherlands: Kluwer Academic. [108] J. Wang, P. K. 
Kitanidis, Analysis of macrodispersion through volume aver- aging: comparison with stochastic theory, Stoch Envi Res Risk A 13 (1-2) (1999) 66–84. doi:10.1007/s004770050032. [109] B. D. Wood, F. Cherblanc, M. Quintard, S. Whitaker, Volume averaging for determining the effective dispersion tensor: Closure using periodic unit cells and comparison with ensemble averaging, Water Resour Res 39 (8) (2003) 1210. doi:10.1029/2002WR001723. [110] B. D. Wood, F. J. Valdés-Parada, Volume averaging: local and nonlocal closures using a green’s function approach, Adv Water Resour 51 (2013) 139–167. doi:10.1016/j.advwatres.2012.06.008. 158 [111] D. L. Koch, J. F. Brady, Dispersion in fixed beds, J Fluid Mech 154 (1985) 399–427. doi:10.1017/S0022112085001598. [112] J. Rubinstein, S. Torquato, Flow in random porous media: mathematical formulation, variational principles, and rigorous bounds, J Fluid Mech 206 (1989) 25–46. doi:10.1017/S0022112089002211. [113] L. W. Gelhar, C. L. Axness, Three-dimensional stochastic analysis of macrodispersion in aquifers, Water Resour Res 19 (1) (1983) 161–180. doi:10.1029/WR019i001p00161. [114] G. Dagan, Solute transport in heterogeneous porous formations, J Fluid Mech 145 (1984) 151–177. doi:10.1017/S0022112084002858. [115] S. P. Neuman, C. L. Winter, C. M. Newman, Stochastic theory of field-scale fickian dispersion in anisotropic porous media, Water Resour Res 23 (3) (1987) 453–466. doi:10.1029/WR023i003p00453. [116] S. Attinger, Generalized coarse graining procedures for flow in porous media, Computat Geosci 7 (4) (2003) 253–273. doi:10.1023/B:COMG.0000005243. 73381.e3. [117] A. L. Gutjahr, L. W. Gelhar, A. A. Bakr, J. R. MacMillan, Stochastic anal- ysis of spatial variability in subsurface flows: 2. evaluation and application, Water Resour Res 14 (5) (1978) 953–959. doi:10.1029/WR014i005p00953. [118] F. P. J. de Barros, M. 
Dentz, Pictures of blockscale transport: Effective versus ensemble dispersion and its uncertainty, Adv Water Resour 91 (2016) 11–22. doi:10.1016/j.advwatres.2016.03.004. [119] Y. Rubin, J. J. Gómez-Hernández, A stochastic approach to the problem of upscaling of conductivity in disordered media: Theory and unconditional numerical simulations, Water Resour Res 26 (4) (1990) 691–701. doi:10. 1029/WR026i004p00691. [120] A.Fiori, G.Dagan, I.Jankovic, Upscalingofsteadyflowinthree-dimensional highly heterogeneous formations, Multiscale Modeling & Simulation 9 (3) (2011) 1162–1180. doi:10.1137/110820294. [121] J. Warren, H. Price, et al., Flow in heterogeneous porous media, Soc Petrol Eng J 1 (03) (1961) 153–169. doi:10.2118/1579-G. [122] A. J. Desbarats, Numerical estimation of effective permeability in sand- shale formations, Water Resour Res 23 (2) (1987) 273–286. doi:10.1029/ WR023i002p00273. 159 [123] R. Ababou, Three-dimensional flow in random porous media, Ph.D. thesis, Massachusetts Institute of Technology (1988). [124] P. King, The use of renormalization for calculating effective permeability, Transport Porous Med 4 (1) (1989) 37–58. doi:10.1007/BF00134741. [125] P. King, A. Muggeridge, W. Price, Renormalization calculations of immis- cible flow, Transport Porous Med 12 (3) (1993) 237–260. doi:10.1007/ BF00624460. [126] S. Mukhopadhyay, M. Sahimi, Calculation of the effective permeabilities of field-scale porous media, Chem Eng Sci 55 (20) (2000) 4495–4513. doi: 10.1016/S0009-2509(00)00098-1. [127] P. Renard, G. De Marsily, Calculating equivalent permeability: a review, Adv Water Resour 20 (5) (1997) 253–278. doi:10.1016/S0309-1708(96) 00050-4. [128] A. Desbarats, Spatial averaging of hydraulic conductivity in three- dimensional heterogeneous porous media, Math Geol 24 (3) (1992) 249–267. doi:10.1007/BF00893749. [129] L. J. 
Durlofsky, Numerical calculation of equivalent grid block permeability tensors for heterogeneous porous media, Water Resour Res 27 (5) (1991) 699–708. doi:10.1029/91WR00107. [130] P. K. Kitanidis, Effective hydraulic conductivity for gradually vary- ing flow, Water Resour Res 26 (6) (1990) 1197–1208. doi:10.1029/ WR026i006p01197. [131] L. J. Durlofsky, R. C. Jones, W. J. Milliken, A nonuniform coarsen- ing approach for the scale-up of displacement processes in heterogeneous porous media, Adv Water Resour 20 (5) (1997) 335–347. doi:10.1016/ S0309-1708(96)00053-X. [132] M. Sahimi, Large-scale porous media and wavelet transformations, Comput Sci Eng 5 (4) (2003) 75–87. doi:10.1109/MCISE.2003.1208648. [133] A. R. Mehrabi, M. Sahimi, Coarsening of heterogeneous media: applica- tion of wavelets, Phys Rev Lett 79 (22) (1997) 4385–4388. doi:10.1103/ PhysRevLett.79.4385. [134] F. Ebrahimi, M. Sahimi, Multiresolution wavelet coarsening and analysis of transport in heterogeneous media, Physica A 316 (1) (2002) 160–188. doi:10.1016/S0378-4371(02)01199-8. 160 [135] A. Heidarinasab, B. Dabir, M. Sahimi, Multiresolution wavelet-based simu- lation of transport and photochemical reactions in the atmosphere, Atmos Environ38(37)(2004)6381–6397. doi:10.1016/j.atmosenv.2004.08.024. [136] F. Ebrahimi, M. Sahimi, Grid coarsening, simulation of transport pro- cesses in, and scale-up of heterogeneous media: Application of multireso- lution wavelet transformations, Mech Mater 38 (8) (2006) 772–785. doi: 10.1016/j.mechmat.2005.06.013. [137] M. R. Rasaei, M. Sahimi, Upscaling and simulation of waterflooding in het- erogeneous reservoirs using wavelet transformations: application to the spe- 10 model, Transport Porous Med 72 (3) (2008) 311–338. doi:10.1007/ s11242-007-9152-1. [138] C. K. Chui, Wavelet analysis and its applications, Academic press, 1992. [139] Y. Meyer, Wavelets and applications, Springer, 1992. [140] M. Holschneider, Wavelets, an analysis tool, Oxford University Press, 1995. 
[141] Y. Nievergelt, Y. Nievergelt, Wavelets made easy, Vol. 174, Springer, 1999. [142] G. Moridis, M. Nikolaou, Y. You, et al., The use of wavelet transforms in the solution of two-phase flow problems, Soc Petrol Eng J 1 (02) (1996) 169–178. doi:10.2118/29144-PA. [143] J. Kikani, M. He, et al., Multi-resolution analysis of long-term pressure tran- sient data using wavelet methods, in: SPE Annual Technical Conference and Exhibition, Society of Petroleum Engineers, 1998. doi:10.2118/48966-MS. [144] P. Lu, R. N. Horne, et al., A multiresolution approach to reservoir parameter estimation using wavelet analysis, in: SPE annual technical conference and exhibition, Society of Petroleum Engineers, 2000. doi:10.2118/62985-MS. [145] M. Sahimi, M. Hashemi, Wavelet identification of the spatial distribu- tion of fractures, Geophys Res lett 28 (4) (2001) 611–614. doi:10.1029/ 2000GL011961. [146] H. Risken, Fokker-planck equation, in: The Fokker-Planck Equation, Springer, 1984, pp. 63–95. [147] E. Pazhoohesh, H. Hamzehpour, M. Sahimi, Numerical simulation of ac con- duction in three-dimensional heterogeneous materials, Phys Rev B 73 (17) (2006) 174206. doi:10.1103/PhysRevB.73.174206. 161 [148] I. Daubechies, Orthonormal bases of compactly supported wavelets, Com- mun Pur App Math 41 (7) (1988) 909–996. doi:10.1002/cpa.3160410705. [149] W. Press, S. Teukolsky, W. Vetterling, B. Flannery, Numerical recipes in c3rd ed (2007). [150] S. P. Neuman, M. Riva, A. Guadagnini, On the geostatistical character- ization of hierarchical media, Water Resour Res 44 (2). doi:10.1029/ 2007WR006228. [151] U. G. Araktingi, F. Orr Jr, et al., Viscous fingering in heterogeneous porous media, SPE Advanced Technology Series 1 (01) (1993) 71–80. doi:10.2118/ 18095-PA. [152] M. J. King, H. Scher, Probability approach to multiphase and multicom- ponent fluid flow in porous media, Phys Rev A 35 (2) (1987) 929. doi: 10.1103/PhysRevA.35.929. [153] A. Gelb, Applied optimal estimation, MIT press, 1974. 
[154] W. Graham, D. McLaughlin, Stochastic analysis of nonstationary subsurface solute transport: 2. conditional moments, Water Resour Res 25 (11) (1989) 2331–2355. doi:10.1029/WR025i011p02331. [155] G. Evensen, Sequential data assimilation with a nonlinear quasi-geostrophic model using Monte Carlo methods to forecast error statistics, J Geophys Res 99 (C5) (1994) 10143–10162. doi:10.1029/94JC00572. [156] G. Evensen, The ensemble Kalman Filter: Theoretical formulation and practical implementation, Ocean Dynam 53 (4) (2003) 343–367. doi: 10.1007/s10236-003-0036-9. [157] P. L. Houtekamer, H. L. Mitchell, Data assimilation using an ensem- ble Kalman Filter technique, Mon Weather Rev 126 (3) (1998) 796–811. doi:10.1175/1520-0493(1998)126<0796:DAUAEK>2.0.CO;2. [158] S. A. Margulis, D. McLaughlin, D. Entekhabi, S. Dunne, Land data assim- ilation and estimation of soil moisture using measurements from the south- ern great plains 1997 field experiment, Water Resour Res 38 (12). doi: 10.1029/2001WR001114. [159] R.H.Reichle, D.B.McLaughlin, D.Entekhabi, Hydrologicdataassimilation with the ensemble Kalman Filter, Mon Weather Rev 130 (1) (2002) 103–114. doi:10.1175/1520-0493(2002)130<0103:HDAWTE>2.0.CO;2. 162 [160] R. H. Reichle, J. P. Walker, R. D. Koster, P. R. Houser, Extended ver- sus ensemble Kalman Filtering for land data assimilation, J Hydrome- teorol 3 (6) (2002) 728–740. doi:10.1175/1525-7541(2002)003<0728: EVEKFF>2.0.CO;2. [161] G. Naevdal, L. M. Johnsen, S. I. Aanonsen, E. H. Vefring, et al., Reservoir monitoring and continuous model updating using ensemble Kalman Filter, in: SPE Annual Technical Conference and Exhibition, Soc Pet Eng, 2003. doi:https://doi.org/10.2118/84372-MS. [162] G. S. Herrera, G. F. Pinder, Space-time optimization of groundwater quality sampling networks, Water Resour Res 41 (12) (2005) n/a–n/a, w12407. doi: 10.1029/2004WR003626. [163] R. J. Lorentzen, G. Naevdal, B. Valles, A. Berg, A.-A. 
Grimstad, et al., Analysis of the ensemble Kalman Filter for estimation of permeability and porosityinreservoirmodels, in: SPEAnnualTechnicalConferenceandExhi- bition, Soc Pet Eng, 2005. doi:https://doi.org/10.2118/96375-MS. [164] M. Zafari, A. C. Reynolds, et al., Assessing the uncertainty in reservoir description and performance predictions with the ensemble Kalman Filter, in: SPE Annual Technical Conference and Exhibition, Soc Pet Eng, 2005. doi:https://doi.org/10.2118/95750-MS. [165] N. Liu, D. S. Oliver, Ensemble Kalman Filter for automatic history matching of geologic facies, J Pet Sci Eng 47 (3) (2005) 147–161. doi:http://dx.doi. org/10.1016/j.petrol.2005.03.006. [166] Y. Gu, D. S. Oliver, The ensemble Kalman Filter for continuous updating of reservoir simulation models, J Energy Res Technol 128 (1) (2006) 79–87. doi:10.1115/1.2134735. [167] Y. Chen, D. Zhang, Data assimilation for transient flow in geologic forma- tions via ensemble Kalman Filter, Adv Water Resour 29 (8) (2006) 1107– 1122. doi:http://dx.doi.org/10.1016/j.advwatres.2005.09.007. [168] A. Schoniger, W. Nowak, H.-J. Hendricks Franssen, Parameter estimation by ensemble kalman filters with transformed data: Approach and application to hydraulic tomography, Water Resour Res 48 (4) (2012) n/a–n/a, w04502. doi:10.1029/2011WR010462. [169] M. Panzeri, M. Riva, A. Guadagnini, S. P. Neuman, Data assimilation and parameter estimation via ensemble Kalman Filter coupled with stochastic moment equations of transient groundwater flow, Water Resour Res 49 (3) (2013) 1334–1344. doi:10.1002/wrcr.20113. 163 [170] M. Camporese, C. Paniconi, M. Putti, P. Salandin, Ensemble Kalman Filter data assimilation for a process-based catchment scale model of surface and subsurface flow, Water Resour Res 45 (10) (2009) n/a–n/a, w10421. doi: 10.1029/2008WR007031. [171] G. Evensen, Sampling strategies and square root analysis schemes for the enkf, Ocean Dynam 54 (6) (2004) 539–560. doi:10.1007/ s10236-004-0099-2. [172] G. 
Liu, Y. Chen, D. Zhang, Investigation of flow and transport processes at the MADE site using ensemble Kalman Filter, Adv Water Resour 31 (7) (2008) 975 – 986. doi:http://dx.doi.org/10.1016/j.advwatres.2008. 03.006. [173] L. Li, H. Zhou, H.-J. Hendricks Franssen, J. J. Gomez-Hernandez, Mod- eling transient groundwater flow by coupling ensemble Kalman Filtering and upscaling, Water Resour Res 48 (1) (2012) n/a–n/a. doi:10.1029/ 2010WR010214. [174] T. M. Hamill, Interpretation of rank histograms for verifying ensemble forecasts, Mon Weather Rev 129 (3) (2001) 550–560. doi:10.1175/ 1520-0493(2001)129<0550:IORHFV>2.0.CO;2. [175] G.Pass, R.Zabih, J.Miller, Comparingimagesusingcolorcoherencevectors, in: Proceedings of the Fourth ACM International Conference on Multimedia, MULTIMEDIA ’96, ACM, New York, NY, USA, 1996, pp. 65–73. doi: 10.1145/244130.244148. [176] C. L. Novak, S. A. Shafer, Anatomy of a color histogram, in: Proceedings 1992 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 1992, pp. 599–605. doi:10.1109/CVPR.1992.223129. [177] G. Pass, R. Zabih, Histogram refinement for content-based image retrieval, in: Applications of Computer Vision, 1996. WACV ’96., Proceedings 3rd IEEE Workshop on, 1996, pp. 96–102. doi:10.1109/ACV.1996.572008. [178] L. Wen, G. Tan, Image retrieval using spatial multi-color coherence vectors mixing location information, in: 2008 ISECS International Colloquium on Computing, Communication, Control, and Management, Vol. 1, 2008, pp. 299–302. doi:10.1109/CCCM.2008.323. [179] M. Salmi, B. Boucheham, Content based image retrieval based on cell color coherence vector (cell-ccv), in: 2014 4th International Symposium ISKO- Maghreb: Concepts and Tools for knowledge Management (ISKO-Maghreb), 2014, pp. 1–5. doi:10.1109/ISKO-Maghreb.2014.7033451. 164 [180] A. Vailaya, A. Jain, H. J. Zhang, On image classification: city vs. landscape, in: Proceedings. IEEE Workshop on Content-Based Access of Image and Video Libraries (Cat. 
No.98EX173), 1998, pp. 3–8. doi:10.1109/IVL.1998. 694464. [181] G.Evensen, Dataassimilation: theensembleKalmanFilter, SpringerScience & Business Media, 2009. [182] S. I. Aanonsen, G. Nævdal, D. S. Oliver, A. C. Reynolds, B. Vallès, et al., The ensemble kalman filter in reservoir engineering–a review, SPE Journal 14 (03) (2009) 393–412. doi:https://doi.org/10.2118/117274-PA. [183] N. Remy, A. Boucher, J. Wu, Applied Geostatistics with SGeMS: A User’s Guide, Cambridge University Press, 2009. [184] M. Moslehi, F. P. J. de Barros, Uncertainty quantification of environmental performance metrics in heterogeneous aquifers with long-range correlations, J Contam Hydrol 196 (2017) 21 – 29. doi:https://doi.org/10.1016/j. jconhyd.2016.12.002. [185] Z. Şen, Adaptive pumping test analysis, J hydrol 74 (3-4) (1984) 259–270. [186] J. Wilson, P. Kitanidis, M. Dettinger, State and parameter estimation in groundwater models, applications of kalman filter to hydrology, hydraulics, and water resources, L. Chiu (1978) 657–679. [187] R. Beckie, Measurement scale, network sampling scale, and groundwater model parameters, Water Resour Res 32 (1) (1996) 65–76. [188] X. Sánchez-Vila, J. Carrera, J. P. Girardi, Scale effects in transmissiv- ity, J Hydrol 183 (1) (1996) 1 – 22. doi:https://doi.org/10.1016/ S0022-1694(96)80031-X. [189] W. Labaky, J. Devlin, R. Gillham, Probe for measuring groundwater velocity at the centimeter scale, Environmental Science & Technology 41 (24) (2007) 8453–8458. doi:10.1021/es0716047. [190] L. Peters, R. Arts, G. Brouwer, C. Geel, S. Cullick, R. J. Lorentzen, Y. Chen, N. Dunlop, F. C. Vossepoel, R. Xu, et al., Results of the brugge benchmark study for flooding optimization and history matching, SPE Reservoir Eval- uation & Engineering 13 (03) (2010) 391–405. [191] S. Panday, C. D. Langevin, R. G. Niswonger, M. Ibaraki, J. D. 
Hughes, Modflow-usg version 1.4.00: An unstructured grid version of modflow for simulating groundwater flow and tightly coupled processes using a control volumefinite-differenceformulation,Tech.rep.,USGeologicalSurvey(2017). 165
Abstract
Hydrogeological models, which represent flow and transport processes in subsurface domains, are subject to uncertainty owing to spatially multi-scale heterogeneous hydrogeological properties, such as hydraulic conductivity, and to the insufficient characterization of such domains. As a result, dependent variables such as the velocity field and contaminant concentration, and consequently hydrogeological predictions relevant to human health risk, are also uncertain. Stochastic numerical methods are therefore used to quantify the uncertainty of hydrogeological systems and model predictions. Such methods, however, are computationally expensive, and efficient approaches are needed to relieve the computational burden. This study examines the components that drive the computational complexity of hydrogeological models and, by considering the interaction among these components, develops novel frameworks that reduce the computational cost of stochastic numerical hydrogeological simulations. First, a method is proposed to optimally allocate computational resources in numerical hydrogeological models under uncertainty. Next, the numerical complexity of such models is reduced substantially through a wavelet-transformation upscaling technique. Finally, a unified framework that couples ensemble Kalman filter data assimilation with wavelet-transformation upscaling is developed to simultaneously reduce the computational cost of stochastic hydrogeological simulations and the uncertainty of model predictions. Results indicate that the proposed frameworks reduce computational cost by several orders of magnitude while maintaining the accuracy of model predictions.
The outcome of this research provides reliable tools that can help decision makers efficiently model complex hydrogeological systems and obtain accurate predictions relevant to aquifer remediation and human health risk.
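The unified framework described above builds on the standard ensemble Kalman filter analysis step. The minimal sketch below illustrates that step for a generic linear observation operator; it is not the dissertation's implementation, and the state layout, observation operator `H`, and error covariance `R` are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def enkf_update(X, H, d, R):
    """One ensemble Kalman filter analysis step (minimal sketch).

    X : (n_state, n_ens) forecast ensemble (e.g. log-conductivities, heads)
    H : (n_obs, n_state) linear observation operator
    d : (n_obs,) observed data (e.g. measured heads)
    R : (n_obs, n_obs) observation-error covariance
    """
    n_ens = X.shape[1]
    # Ensemble anomalies approximate the forecast covariance P_f
    A = X - X.mean(axis=1, keepdims=True)
    Pf = A @ A.T / (n_ens - 1)
    # Kalman gain: K = P_f H^T (H P_f H^T + R)^{-1}
    K = Pf @ H.T @ np.linalg.inv(H @ Pf @ H.T + R)
    # Perturb the observations so the analysis ensemble keeps proper spread
    D = d[:, None] + rng.multivariate_normal(np.zeros(len(d)), R, n_ens).T
    # Analysis ensemble: each member is pulled toward its perturbed data
    return X + K @ (D - H @ X)

# Tiny synthetic example: 4 state variables, 2 of them observed, 50 members
X = rng.normal(size=(4, 50))
H = np.array([[1.0, 0.0, 0.0, 0.0],
              [0.0, 0.0, 1.0, 0.0]])
d = np.array([0.5, -0.3])
R = 0.01 * np.eye(2)
Xa = enkf_update(X, H, d, R)
```

In the upscaled framework, the same update would act on a coarsened state vector, so `Pf` and `K` involve far fewer unknowns per assimilation cycle; that reduction in state dimension is the source of the computational savings.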