Reproducible and rapid computational approaches for assessing contamination in natural aquifers
I dedicate this thesis to Alberto, for the endless love, encouragement and support throughout my life.

Acknowledgements

I am grateful to many people for their support, understanding and help through my PhD study. First and above everyone else, I would like to thank my advisor, Dr Felipe P.J. de Barros, for being the best advisor that I could ever wish for. Dr de Barros is not only a brilliant, creative and knowledgeable professor, but he is also an empathetic, kind and generous soul. The combination of all those unique and rare qualities made his guidance through my PhD and through my life an unforgettable experience. His excitement for new ideas, constant availability for discussions and wise supervision made our research challenges a way to express my creativity, push my limits and reinforce my attitude in addressing difficult tasks. Through his intelligent insights and profound knowledge he was able to transform small ideas into bright and unique research adventures. He gave me support in every single step of my graduate studies, going above and beyond all my expectations and making every difficult moment an occasion to become a better person. His joyful and serene attitude transformed our meetings into moments of great laughs and pleasant discussions, and with his spontaneity and proactivity he created the circumstances for unforgettable research group gatherings. I conclude this journey knowing that I was extremely lucky to have the opportunity to work with Dr de Barros and that I will never be able to find a better person.

I would like to thank Dr Rizzo for his help and constructive input on my studies, and to acknowledge him for his research and computational software, which have been applied extensively in my PhD research. I would like to thank Dr Achiiro and Dr Nomura for their interest and help in developing an approach in an unfamiliar field of science. I would like to thank all the staff of the Sonny Astani Department of Civil and Environmental Engineering for their constant help and for making our life as PhD students in the department so much better and more fun. A special thanks to Christine Hsieh for being constantly present throughout all the phases of my PhD studies and for being a solid and kind support for every challenge that I encountered. My most sincere thanks to my PhD colleagues and friends for being next to me in the last year and for creating a supportive and enjoyable environment in our offices. To conclude, I want to thank my family. My parents and my sisters are and have been a constant source of unconditional love, and thanks to them I have always found assistance and have never felt alone.

Table of Contents

Dedication
Acknowledgements
List of Tables
List of Figures
Abstract
Chapter 1: Introduction
  1.1 Motivation
  1.2 Research Overview and Scope
Chapter 2: Improving the computational efficiency of first arrival time uncertainty estimation using a connectivity-based ranking Monte Carlo method
  2.1 Introduction
  2.2 Minimum Hydraulic Resistance
  2.3 Methodology
    2.3.1 Generating a connectivity-based ranked hydraulic conductivity ensemble
    2.3.2 Implementation
  2.4 Illustration
    2.4.1 Physical set-up and input data
    2.4.2 Test cases
  2.5 Summary
Chapter 3: A Scalable Parallel Algorithm for Reactive Particle Tracking
  3.1 Introduction
  3.2 Reactive Random Walk Particle Tracking: An Overview
  3.3 Methodology
    3.3.1 An Optimized Kernel Function
    3.3.2 Particles' Mass Re-allocation
    3.3.3 Evaluation of Reaction Probability
    3.3.4 Computation of the Reaction Product C
  3.4 Performance Assessment and Comparison with other Works
    3.4.1 Accuracy and Speedup Analysis
    3.4.2 Validation
      3.4.2.1 Reactive Transport in a Well-Mixed Flow Scenario
      3.4.2.2 Reactive Transport in a Plug Flow Scenario
      3.4.2.3 Reactive Transport in a Viscous Flow Scenario
  3.5 Illustrative Examples
    3.5.1 Darcy Flow Past a Permeable Circular Inclusion
    3.5.2 Vortical Flow
  3.6 Summary
Chapter 4: Computationally Efficient Characterization of Groundwater Contaminant Transport for Supporting Risk-Based Decision Making
  4.1 Introduction
  4.2 Problem Statement
  4.3 Methods and Implementation
  4.4 Tutorial
  4.5 Application to Risk and Resilience
    4.5.1 Probability of Concentration Exceedance and Resilience Loss Maps
    4.5.2 Impact of contaminant source efficiency on aquifer reliability
  4.6 Summary
Chapter 5: Summary
References
Appendices
  A Appendix: Flow and transport formulations
  B Appendix: Derivation of the kernel function
  C Appendix: Comparison with an Eulerian solution for a 2-D non-well-mixed reactive scenario
  D Appendix: Flow and transport equations

List of Tables

2.1 Input parameters used in the simulations
3.1 Input parameters used in the simulations.
Left column reports the parameters’ values used for the Darcy flow past a permeable inclusion (DPC). Right column reports the parameters’ values for the Taylor-Green vortex (TGV) flow. . . . . . . . 53 4.1 Input parameters used in the simulations. . . . . . . . . . . . . . . . . . . . . . . . 67 4.2 Schematic summary of what needed and what has to be installed by the user to run the tutorial that reproduces the data showed in the analysis carried out in this work. 70 vii List of Figures 2.1 Illustration of the correlation between first arrival times t 1% and the minimum hy- draulic resistanceR m . An ensemble of 2047 spatially random two-dimensional hydraulic conductivity fields were used for each level of heterogeneity. Heterogene- ity is measured through the log-conductivity varianceσ 2 Y . Results are obtained for σ 2 Y = 3 (blue) andσ 2 Y = 1 (red), using the parameters reported in Table 2.1. . . . . 10 2.2 Example of the connectivity-based ranked distribution methodology for N= 7. (a) The randomly generated K fields, each one of them characterized by a value of minimum hydraulic resistivityR m . (b) The K fields are reordered for increasing R m values. (c) A balanced binary search tree is built. (d) The subsets are cre- ated searching the tree through BFS algorithm that screens the tree following the directions indicated by the red dotted line in (c). The three generated subsets are highlighted by the horizontal square brackets in (d). . . . . . . . . . . . . . . . . . 13 2.3 Convergence analysis of the first arrival ( t 1% ) CDF. The probability that t 1% will take a value less than a specified value τ as a function of the number of Monte Carlo realizations N (where 1≤ N≤ 2047) is shown for spatially heterogeneous hydraulic conductivity fields characterized by σ 2 Y = 3. All curves reach their respective asymptotic value around N= 1500. . . . . . . . . . . . . . . . . . . . . 18 2.4 Convergence analysis for the (a) meanµ t 1% and (b) standard deviationσ t 1% of t 1% obtained from the connectivity-based ranked Monte Carlo (light blue) and for the classic Monte Carlo (dark blue). Results obtained for hydraulic conductivity fields characterized byσ 2 Y = 3. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19 2.5 Relative errorE of (a) µ t 1% and (b) σ t 1% as a function of the ensemble size N. Results for the connectivity-based ranked Monte Carlo (red solid line) and for the classic Monte Carlo ensemble randomly shuffled 200 times (gray lines). The black solid line corresponds to the mean of the relative error over all 200 shuffles, while the black dotted line represents the confidence interval. . . . . . . . . . . . . . . . 20 2.6 Comparison of the convergence rate of the probabilities obtained from the traditional Monte Carlo method (solid line) and the proposed connectivity-based ranked Monte Carlo (dashed line) method. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21 viii 2.7 Convergence rate for the (a) mean and (b) standard deviation of t 1% forσ 2 Y = 1,3 and 4. Comparison between the proposed connectivity-based Monte Carlo method (filled markers) with the traditional Monte Carlo method (void markers). . . . . . . 22 2.8 Relative errorE of (a)µ t 1% and (b)σ t 1% as a function of the ensemble size N of 3D K fields. Results for the connective-based ranked Monte Carlo (red solid line) and for the classic Monte Carlo ensemble randomly shuffled 200 times (gray lines). 
The black solid line corresponds to the mean of the relative error over all 200 shuffles, while the black dotted line represents the confidence interval. . . . . . . . . . . . . 23 3.1 Illustration of the kernel function representing a generic particle P A a with a = 1,..,N A and N A denoting the total number of A particles in the computational domain covered by the grid G. The red grid shows the shape of the local support of the kernel function, the red dot represents P A a andλ g is the size of the squared grid cell 33 3.2 Graphical comparison between a Gaussian function and a tent function that share the same varianceσ 2 = 2D∆t. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33 3.3 Schematic representation of one quarter of the kernel function of height H. The value of the kernel function at point Q, defined by the distance from the center of the kernel r, and the angle α, is h. Here d represents the length of the segment connecting the center of the kernel to the edge of its local support, passing through the point Q. The resolution of the grid is given byλ g and corresponds to half the side of the kernel function’s local support. . . . . . . . . . . . . . . . . . . . . . . 35 3.4 A (red) and B (blue) reagents in a 2D domain. (a.) A and B particles positions at the initial time denoted by t 0 , (b.) A and B particles positions after moving at the first simulation time step t 0 +∆t, (c.) representation of the grid G necessary for the reagents mass particles allocation. . . . . . . . . . . . . . . . . . . . . . . . . . . 36 3.5 (a.) Mass re-allocation process of three particles of A reagent P A 1 , P A 2 and P A 3 . Each particle is represented by the kernel and its mass M A,a (with a=1,2,3) is distributed in each grid vertex accordingly to the value of the kernel on that grid vertex, centered where P A a is located and having H = M A,a , such that: M A,a =Σ j∈C(i) h a j , where i denotes the index of the cell where P A a is located and j the indexes of the i th cell’s vertices. (b.) New mass distribution configuration, after the mass re-allocation process. The mass of the reagent A located in a vertex of the grid is equal to the sum of the kernels’ values in that vertex. The A mass allocated to a generic vertex j is equal to M A j =∑ N A a=1 h a j with a= 1,..,N A . . . . . . . . . . . . . . . . . . . . . . 37 3.6 (a). Four optimized kernels centered on the vertices of the red grid cell. (b). V olume resulting from the sum of the values of the four kernels illustrated in a in each point of the red cell. (c). Graphical explanation of the equivalence between the sum of the values of a kernel, centered in the black dot, on the four intercepted grid vertices, h 1 , h 2 , h 3 and h 4 , and the sum of the values that the kernels 1 2 3 and 4, respectively centered in the four grid vertices, assumes in the black dot position. . . . . . . . . 38 ix 3.7 Graphical representation of the algorithm steps. (a). shows how the mass of one species (in this case A, for illustrative reasons) is re-allocated on the vertices of the grid G over the domain. (b). clarifies how the mass of A is distributed over the domain after the re-allocation step. (c). represents the structure of the reactive step, where the B particles (in blue) react proportionally to the amount of A mass available, depicted by the product of the probability density function h ∗ j and the masses of A in the vertices j of the grid, M A j . . . . . . . . . . . . . . . . . . . . . . 39 3.8 Computation of the algorithm reactive step. 
Visualization of the other species influence (in this case A, the red one) on the probability of reaction of one species particles (B, the blue one). Given two B particles in a certain cell, only the masses allocated in the vertices of the cell where the two B particles are located will affect their probability of reaction. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 40 3.9 Methodologies’ accuracy comparison. PARxn 2 , LSS and DCC approaches’ mean absolute errors (MAE) are computed for increasing number of particles (N P ). The shaded areas represent the 95% confidence interval of the simulation ensemble. . . 46 3.10 Methodologies runtime comparison. PARxn 2 , LSS and DCC approaches’ simula- tion ensembles mean runtimes T m are computed for increasing number of particles (N P ). The markers represent the values of the measured T m , while the solid line corresponds to the best fit. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 46 3.11 Comparison between the numerical results for the normalized concentration of the reagent A obtained by the reactive RWPT algorithm PARxn 2 for different Dahmk¨ ohler numbers. Results are compared with the analytical solution provided in equation (3.14). Results displayed for the well-mixed flow scenario. . . . . . . . 49 3.12 Comparison between the numerical results for m C , obtained by the reactive RWPT algorithm PARxn 2 for different Dahmk¨ ohler numbers, and the analytical solution given by equation (3.11). The vertical light green line marks the diffusion time scaleτ D . Results shown for a plug flow scenario. . . . . . . . . . . . . . . . . . . 51 3.13 Comparison between the numerical results for m C , obtained by the reactive RWPT algorithm PARxn 2 for different Dahmk¨ ohler numbers, and the analytical solution given by equation (3.11). Results shown for the viscous laminar flow scenario. Theoretical results are shown for both early (t<τ v ) and late (t>τ D ) time regimes. The early and late times regimes are marked by the vertical dark and light green lines respectively. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 53 3.14 Temporal evolution of the product mass m C in a Darcy flow past a permeable circular inclusion. Results obtained with PARxn 2 . The top row of figures shows six snapshot of the 2D plumes for reagents A (red) and B (blue) as well as the plume for the product C (purple). The flow domain is characterized by a permeability contrast ofκ = 0.5. The streamlines for this flow system are illustrated in the inset on the bottom right vertex of the plot (see eq. 3.16). . . . . . . . . . . . . . . . . . . . . 54 x 3.15 Temporal evolution of the product mass m C in a Taylor-Green vortex flow. Results obtained with PARxn 2 . The top row of figures shows five snapshot of the 2D plumes for reagents A (red) and B (blue) as well as the plume for the product C (purple). An inset is included on the bottom right vertex of the plot to illustrate streamlines. . 55 3.16 Illustration of the temporal evolution of m A , m B and m C in the Darcy flow past a permeable circular inclusion. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 57 4.1 Jupyter Notebook schematic representation to illustrate the tutorial framework. Each number indicates one of the Jupyter Notebook code cells, and the description clarifies its output. Colored are the names of the employed software and N MC stands for the number of Monte Carlo realizations. . . . . . . . . . . . . . . . . . . . . . 
68 4.2 Jupyter Notebook schematic representation to illustrate the code cells of the pro- posed tutorial. In the Jupyter Notebook, code cells are one after the other and more extended, compared to the illustration above. . . . . . . . . . . . . . . . . . . . . 69 4.3 Schematic representation of the contaminant transport simulations code cell output. The image schematizes a possible output (depending on the declared parameters in the YAML file) of a randomly chosen iteration among the Monte Carlo realizations, the 47 th . This realization outputs: result-47.csv, a file that contains the data related to the contaminant’s cumulative breakthrough curve at the evaluated control plane, in the image the red vertical line. In addition to that, snapshot files are generated every 400 simulation time steps. snap-0-47.csv, snap-400-47.csv, snap-800-47.csv, and snap-1200-47.csv containing respectively the positions (x=[x,y,z]) of each particle representing the contaminant plume. For visualization purposes, the position of the 1 st ,2 nd and 6478 th particles are highlighted by red circles. . . . . . . . . . . 74 4.4 Schematic representation of the contaminant plume, in green, of a Monte Carlo realization, characterized by the illustrated hydraulic conductivity field. The figure shows where observations wells (circles) are positioned over the target area (rectan- gle) and how the locations of the leading edge of the plume (blue cross) and of the maximum contaminant concentration (red cross), experienced at each time steps of the simulation, evolve in time. As we can observe, the leading edge of the plume tends to follow a trajectory, highlighted by the dotted blue line, while the maximum concentration experienced in the field bounces from a location to another. . . . . . 76 4.5 Maps of the probability of concentration exceedance (top), and its uncertainty (bottom) at different instant of time in the simulation (columns). Plots of the top row shows how⟨Ψ⟩ [-] evolves in space respectively after 0, 10 and 30 days from the beginning of the contamination process.⟨Ψ⟩ is expressed as a probability, as indicated by its values on the color bar on the right. The bottom row showsσ 2 Ψ , as a measure of the uncertainty related to the information given by the plots in the row above. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 77 xi 4.6 Maps of the average loss of resilience (top) in the model domain, and its variance values (bottom) throughout the whole simulation. The plot on the top shows how ⟨R L ⟩ [d] evolves in space, expressing the amount of days necessary for each point in the domain to recover up to a reliable status. The amount of days associated with the different sheds of blue are indicated by the values on the color bar on the right of the top plot. The plot at the bottom shows the values of the variance of R L ,σ 2 R L , as a measure of the uncertainty related to the information given by the plot above. . 78 4.7 Maximum value of resilience loss (R L [d]) experienced within the target area versus contaminant source efficiency ( η [-] ) for all the Monte Carlo realizations. Data (gray dots) are fitted by a logarithmic distribution, the blue line in the plot. The red dots highlight three Monte Carlo realizations, the 33 th , 34 th and 102 nd , in which the shape of the contaminant plume is investigated and illustrated in the insets respectively connected to the simulated scenario by the black arrow. 
The black rectangles in the insets are representative of the target area location. . . . . . . . . 81 5.1 Graphical visualization of the summation values of four optimized kernel functions positioned on the corners of a cell (projected on the top of the Figure in red for clarity) of dimensionsλ g × λ g . Over the area of interest, i.e. the central cell, the summation of the four kernel functions is uniformly equal to H with a maximum error equal to 0.15%. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 98 5.2 Graphical representation of how the b parameter could vary withα. The blue line shows the linear behaviour, while the green and the red ones illustrate the parabolic variation, respectively in a convex and concave way. . . . . . . . . . . . . . . . . 100 5.3 Errorε estimates provided by eq. (B.11) versus the parameterϑ. The inset displays the error withinϑ∈[0.8,1.2]. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 102 5.4 Comparison between the numerical results for m C , obtained by the reactive RWPT algorithm PARxn 2 for different number of particles N P , and the Eulerian solution integrated over the space domain. . . . . . . . . . . . . . . . . . . . . . . . . . . . 103 5.5 The mean absolute errors (MAE) between PARxn 2 and the Eulerian solution for different number of particles (N P ). . . . . . . . . . . . . . . . . . . . . . . . . . . 104 xii Abstract The ubiquitous presence of multi-scale heterogeneity in hydrological properties is the cause of complex subsurface flow patterns that impact the transport behavior of a solute plume. Fluctuations in the velocity field lead to increased solute spreading which enhances mass transfer mechanisms and impact solute arrival times. This thesis proposes a series of methods which accounts for the effects of aquifer heterogeneity on transport observables which are essential for risk analysis, performance assessment of waste disposal facilities and the selection of optimal remediation cleanup strategies. The approaches proposed in this dissertation are computationally rapid and reproducible. The first contribution of this thesis consists of the development of a novel aquifer connectivity-ranked Monte Carlo method that accelerates the statistical convergence of the statistics of the first arrival times of a solute body in an environmentally sensitive location. Secondly, I propose an innovative kernel-based reactive random walk particle tracking method to improve the computational efficiency associated with reactive transport in spatially variable groundwater flows. Finally, we present a computational package that links the various components relevant for the estimation of the concentration of a pollutant at an environmentally sensitive target and its uncertainty to support decision making in risk analysis. xiii Chapter 1 Introduction 1.1 Motivation Modeling flow and contaminant transport in subsurface formations is a computationally expensive task [1]. In fact, since full characterization of contaminated aquifers is unfeasible [2], due to their large-scale and the high cost of data acquisition, groundwater models are complex and subject to uncertainties. Among the sources of uncertainties that plague hydrogeological models, a major one is the incomplete description of the heterogeneous natural porous formations. 
Subsurface properties such as the hydraulic conductivity field and porosity vary over many orders of magnitude in space, which leads to spatio-temporal fluctuations in the flow field and consequently impacts the dispersive behaviour of the solute plume [2]. As a result, in order to address the risk related to groundwater contamination and optimize the selection of the most appropriate remediation strategy, solute transport has to be characterized in detail. Therefore, computationally expensive modeling techniques are needed to quantify the uncertainties associated with hydrogeological model parameters that have been inferred from the limited amount of available data.

Estimating the uncertainty due to data scarcity in solute transport predictions can be achieved by adopting probabilistic tools in combination with geostatistical techniques [3]. There are several approaches that aim to quantify the uncertainty of model predictions in the subsurface environment. Existing approaches consist of perturbation methods, Monte Carlo simulations and polynomial chaos expansion [4]. Stochastic methods are implemented to quantify uncertain quantities of interest such as solute travel times from source to receptor, the maximum contaminant concentration at a target location, aquifer resilience and estimates of adverse human health effects. The actual prediction of the spatio-temporal dynamics of the solute concentration relies on the solution of the reactive advection-dispersion equation (rADE) using analytical or numerical approaches [2]. In general, analytical solutions are limited to low levels of heterogeneity [5–7], and consequently numerical solutions have been widely used to address the nonlinearities and complexities of most hydrogeological models. However, numerical solutions are often grid-based approaches and suffer from problems such as numerical (non-physical) dispersion and artificial oscillations [8]. To avoid those problems, smaller simulation time steps and higher grid resolution are necessary, implying longer execution times and a very high computational burden. Numerical Monte Carlo (MC) simulation is considered the main tool used in groundwater hydrology to quantify the uncertainty in flow and transport predictions [9]. This approach implies repeated simulation of the hydrogeological model, using numerical methods, to provide statistically meaningful results [10]. In addition to the computational expense of this modeling approach, in hydrology many subjective and interpretive steps go into the translation of data to models [11], leading to the most diversified simulation paths.

To summarize the above, groundwater contamination scenarios are often modeled through a Monte Carlo framework in which numerical methods are used to solve the flow and transport equations. This modeling approach is needed to quantify uncertainties related to the heterogeneous nature of natural aquifers and other incompletely characterized model parameters, and to deal with the nonlinearities and complexities that characterize hydrogeological systems. Model complexity, together with the need to cast the simulations in a Monte Carlo loop, leads to extremely computationally expensive routines and to the involvement of subjective and interpretive steps in translating real data into model parameters. For these reasons, first, stochastic frameworks such as the Monte Carlo one have to be structured more efficiently to speed up the uncertainty quantification analysis.
Second, rapid computational tools are needed to reduce the computational burden associated with the numerical simulations themselves. And finally, consistency and reproducibility should be the basis of hydrogeological models, to ensure efficiency and uniformity when generating the results on which risk analyses are based.

1.2 Research Overview and Scope

In this section we provide an overview of the research tasks that are presented in this dissertation. The purpose of this thesis consists in developing approaches to address the limitations of hydrogeological models outlined above (Section 1.1). The first step of this work consisted in accelerating the uncertainty quantification of the first arrival time of a solute plume migrating from its source to an environmentally sensitive target, by restructuring the typically implemented Monte Carlo framework. This is done by taking advantage of the strong correlation between static and dynamic modeling components. In fact, first arrival times are correlated with the hydraulic connectivity properties of spatially heterogeneous porous formations. Hydraulic connectivity leads to the presence of preferential flow paths which in turn control the transport dynamics of the leading edge of the solute plume and therefore the first arrival times. In applications, these arrival times are subject to uncertainty given the lack of a detailed site characterization. The Monte Carlo method is commonly adopted to estimate the uncertainty of solute arrival times; however, it leads to a considerable computational burden. In this work, we build upon the existing knowledge regarding the correlation between connectivity and first arrival times to propose an innovative connectivity-based ranking Monte Carlo approach to quantify the uncertainty of first arrival times. The proposed method is tailored to predict first arrival times and alleviates the computational costs when compared to the traditional Monte Carlo method. Our method consists of ranking the randomly generated spatially heterogeneous hydraulic conductivity fields according to their connectivity. The connectivity metric adopted is based on the concept of the minimum hydraulic resistance and can be obtained at a very low computational cost through the use of graph theory. We illustrate the methodology by analyzing the convergence rate of the first arrival time mean and standard deviation. We compare the convergence rate of the first arrival time statistics obtained through the proposed methodology with those computed through the traditional Monte Carlo method. Overall, our results indicate that the proposed methodology ensures a faster convergence of the considered quantities, thus reducing the time required for their estimation and the associated computational burden.

Next, a novel modeling approach is introduced with the objective of alleviating the computational burden associated with the numerical simulation of reactive contaminant transport. Modeling chemical reactions on the basis of probabilistic rules has been employed in Lagrangian schemes to simulate the transport of solutes in porous media. Reactive random walk particle tracking allows reagent particle pairs to interact in proportion to their separation distance and the nature of the chemical process. However, most of the available algorithms turn out to be computationally prohibitive, since their efficiency depends on the number of interacting particle pairs.
In this work, a novel framework for modeling reactive solute transport is introduced, with the goal of reducing the computational burden characteristic of existing approaches. Our method introduces an innovative optimal kernel function and permits the parallelization of a bimolecular reaction modeling scheme, which leads to computational speedup. To test the performance of our method, we carry out a computational speedup analysis against other approaches reported in the literature. The proposed method is successfully validated against exact analytical solutions. We illustrate the applicability of the method by simulating reactive transport in two-dimensional flow fields that are relevant for environmental applications. The computational time of the proposed methodology scales linearly with the number of particles employed in the simulation, a feature that makes the algorithm suitable for simulating reactive transport with large numbers of particles.

As a concluding effort to facilitate the assessment of contaminated natural aquifers, an efficient and reproducible modeling framework is presented and made available to the hydrogeological community, to support contamination uncertainty quantification and related risk analysis. Obtaining accurate and deterministic predictions of the risks associated with the presence of contaminants in aquifers is an elusive goal given the heterogeneity in hydrological properties and the limited site characterization data. For such reasons, a probabilistic framework is needed to quantify the risks in groundwater systems. In this work, we present a computational toolbox, VisU-HydRA, that aims to statistically characterize and visualize metrics that are relevant in risk analysis, with the ultimate goal of supporting decision making. The VisU-HydRA software toolbox is an open-source Python package that can be linked to a series of existing codes such as MODFLOW and PAR², a GPU-accelerated transport simulator. To illustrate the capabilities of the computational toolbox, we simulate flow and transport in a heterogeneous aquifer within a Monte Carlo framework. The software toolbox allows us to compute the probability of a contaminant's concentration exceeding a safe threshold value as well as the uncertainty associated with the loss of resilience of the aquifer. To ensure consistency and a reproducible workflow, a step-by-step tutorial is provided and available in a GitHub repository.

The organization of the thesis is as follows. In Chapter 2, the computational efficiency of contaminant first arrival time uncertainty estimation is improved using a connectivity-based ranking Monte Carlo method. A scalable parallel algorithm for simulating bimolecular reactions through reactive particle tracking is then introduced in Chapter 3. Chapter 4 shows how to characterize groundwater contaminant transport in a computationally efficient manner, making a toolbox available for supporting risk-based decision making. Finally, Chapter 5 reflects on the general message of this dissertation.

Chapter 2
Improving the computational efficiency of first arrival time uncertainty estimation using a connectivity-based ranking Monte Carlo method

2.1 Introduction

Computing the uncertainty in the first arrival times of a solute body at an environmentally sensitive location is a key component of subsurface risk analysis. The hydraulic properties characterizing the subsurface, such as the hydraulic conductivity K, are spatially heterogeneous over a multitude of length scales [2].
The heterogeneous structure of the subsurface environment leads to the presence of preferential flow paths that can significantly impact the spatiotemporal dynamics of transport [12]. These preferential flow paths are an outcome of well-connected, highly permeable porous materials, which in turn control early solute breakthrough at a given location downstream from the source zone [13, 14]. These first arrival times (or first passage times) are subject to a significant amount of uncertainty given the limited resources for site characterization.

The impact of hydrogeological heterogeneity on solute arrival times has been a topic of study in the past [15–21]. Highly connected channels in the spatially heterogeneous K-field have been shown to be key factors in controlling early contaminant arrival times [22–26]. Sahimi et al. [27] showed how first arrival times in a disordered porous medium under two-phase flow were affected by the degree of saturation. Bianchi et al. [13] investigated the geological connectivity at the Macrodispersion Experiment (MADE) site (Mississippi, USA) and how it impacted the asymmetry of the solute breakthrough curves. Harvey and Gorelick [28] showed how the temporal moments of the concentration breakthrough curve were affected by heterogeneity. The importance of solute arrival times for the estimation of human health risk has also been reported in the literature [29–33]. For example, Henri et al. [25, 31] showed how preferential flow paths could be beneficial or detrimental to human health risks due to the presence of chlorinated solvents in groundwater. The impact of both permeability and porosity spatial heterogeneities on first and late arrival times has also been a topic of investigation [34]. Within the context of groundwater management of non-point sources, it was shown that the homogenization of the hydraulic conductivity field heterogeneity affected the uncertainties in travel times [35, 36]. The study of Andricevic and Cvetkovic [29] illustrates how travel times can be used as indicators to quantify how geologic heterogeneity influences the amount of released contaminant mass that crosses a control plane.

Several metrics have been employed to quantify connectivity in porous media [37–43]. Measures of connectivity fall within two categories: static and dynamic [38]. Static connectivity indicators are based on the K-field, whereas dynamic connectivity indicators use quantities based on flow and/or transport features. Geological objects were defined by Deutsch [22] in order to identify the preferential paths in a heterogeneous field. An alternative connectivity indicator, based on the hydraulic response time in a pumping test, was proposed by Trinchero et al. [23]. Savoy et al. [44] investigated the impact of different geological conceptualizations on both connectivity and early arrival times. Rizzo and de Barros [26] used graph theory (i.e. Dijkstra's algorithm [45]) to compute the least resistance path and used the minimum hydraulic resistance [24, 46] as a connectivity metric. The approach proposed by Rizzo and de Barros [26, 47] is computationally efficient and was tested in both two- and three-dimensional heterogeneous porous media for generic source-to-receptor conditions (i.e. hydraulic connectivity between a point source and a control plane, a point source and a point receptor, etc.). Knudby and Carrera [48] also employed Dijkstra's algorithm to study connectivity in a two-dimensional aquifer.
The minimum hydraulic resistance has been shown to be an indicator of the first solute breakthrough times [24, 26, 46]. For example, Rizzo and de Barros [26] showed that the minimum hydraulic resistance provides a lower bound estimate of the first arrival times. However, due to challenges in site characterization, the minimum hydraulic resistance (and therefore the first arrival times) is subject to uncertainty. As highlighted in Chapter 9 of Rubin [2], early arrival times are subject to the largest uncertainty (see also [49]). To address this challenge, the graph theory-based minimum hydraulic resistance introduced in Rizzo and de Barros [26] has recently been used to estimate the uncertainty of the minimum hydraulic resistance and to improve data acquisition campaigns that aim to reduce the uncertainty in first arrival times [47].

There are several approaches that aim to quantify the uncertainty of model predictions in the subsurface environment. Existing approaches consist of perturbation methods, Monte Carlo simulations and polynomial chaos expansion (for a review of the use of these methods in subsurface hydrology, see Zhang et al. [4]). Given its conceptually straightforward approach, the Monte Carlo method is considered the standard for estimating uncertainty. However, it is computationally demanding and suffers from slow convergence: the statistical accuracy of a given model prediction depends on the number of K field realizations utilized [9, 50] and on the degree of heterogeneity of the subsurface environment [51, 52]. As a consequence, full-blown stochastic modeling based on exhaustive Monte Carlo leads to a significant computational burden. Several works have reported methodologies to reduce the computational time associated with the uncertainty estimation of solute arrival times. Berrone et al. [53] proposed to combine a multilevel Monte Carlo (MLMC) method and a graph-based primary subnetwork identification algorithm (see [54]) to estimate the mean and variance of first passage times (i.e. first arrival times) in fractured media. Gotovac et al. [55] employed a maximum entropy algorithm based on Fup basis functions within a Monte Carlo framework to characterize the uncertainty in solute travel times.

In this paper, we propose a hydraulic connectivity-based methodology to speed up Monte Carlo simulations that aim to predict the uncertainty of the first arrival times of a solute plume in a spatially heterogeneous porous medium. To achieve our goals, our methodology combines elements of the graph theory-based connectivity concept proposed in Rizzo and de Barros [26] with the connectivity ranking concepts introduced in Deutsch [22] in order to accelerate the Monte Carlo convergence of the statistical moments of the first arrival times.

2.2 Minimum Hydraulic Resistance

We are interested in computing the first arrival times of a solute released in a spatially heterogeneous flow through a porous medium. Heterogeneity stems from the spatially variable, locally isotropic, hydraulic conductivity field K(x), where x = (x_1, ..., x_d) represents the Cartesian coordinate system and d is the dimensionality of the flow domain. The first arrival times are associated with solute particles that travel through well-connected zones of high K values.
In order to identify these well-connected zones, we will use the concept of the Minimum Hydraulic Resistance (MHR) [24, 26]. The MHR is a static connectivity measure [24] that relies solely on the K-field and has units of time. The hydraulic resistance is an indicator of the resistance encountered by a solute particle traveling along a path denoted by Γ. If the path Γ follows a segment of well-connected high-conductivity zones, it will likely be associated with a low hydraulic resistance. That implies that the time needed for a solute particle (following the path Γ) to reach a given receptor will be low. Let us consider a solute source zone characterized by a volume V_S and a receptor (i.e. an environmentally sensitive target) of volume V_T. The MHR between V_S and V_T is mathematically expressed as

    R_m = \min_{\Gamma \in P_{V_S}^{V_T}} \int_{\Gamma} \frac{1}{K} \, d\gamma,    (2.1)

where P_{V_S}^{V_T} denotes all the existing paths from every point within V_S to every point within V_T. The path \hat{\Gamma}, along which the hydraulic resistance is minimized, is defined as the Least Resistance Path (LRP), and its hydraulic resistance value is equal to the MHR R_m (see Rizzo and de Barros [26] for further details). The MHR R_m in (2.1) is equivalent to a shortest path problem and can be computed within a graph theory framework and solved with a variation of Dijkstra's algorithm [26]. Details regarding the algorithm and the procedure to transform the K-field into a graph can be found in Rizzo and de Barros [26, 47]. The tool denoted "Lazy Mole" was used to compute R_m between two given locations in a heterogeneous K field and is shared openly on GitHub (https://github.com/GerryR/lazymole) [26].

As initially shown by Tyukhova and co-workers [24, 46], there is a correlation between R_m and the solute first arrival time. As depicted in fig. 3 of Tyukhova and Willmann [24] and fig. 7 of Rizzo and de Barros [26], larger values of R_m correspond to larger solute first arrival times. For the sake of completeness, this correlation is plotted in Fig. 2.1, which shows R_m versus the first arrival times (i.e. the time at which 1% of the solute mass arrives at a control plane) for two levels of heterogeneity in the conductivity fields. For every K-field, there is one value of R_m and a corresponding first arrival time computed through a flow and transport simulator.

Figure 2.1: Illustration of the correlation between first arrival times t_1% and the minimum hydraulic resistance R_m. An ensemble of 2047 spatially random two-dimensional hydraulic conductivity fields was used for each level of heterogeneity. Heterogeneity is measured through the log-conductivity variance σ²_Y. Results are obtained for σ²_Y = 3 (blue) and σ²_Y = 1 (red), using the parameters reported in Table 2.1.
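To make the graph formulation of eq. (2.1) concrete, the sketch below shows one way the minimum hydraulic resistance could be approximated on a gridded two-dimensional K field with a multi-source Dijkstra search. It is only an illustration under simplifying assumptions — the source zone is taken as the inflow face, the target as the control-plane face, and each edge weight lumps the half-cell resistances of the two neighbouring cells — and it is not the Lazy Mole implementation used in this work; the function name and discretization are hypothetical.

import heapq
import numpy as np

def minimum_hydraulic_resistance(K, dx=1.0):
    """Approximate the minimum hydraulic resistance R_m of eq. (2.1) on a 2D
    gridded conductivity field K (shape n1 x n2, axis 0 = mean flow direction).

    Each cell is a graph node; the weight of the edge between neighbouring cells
    i and j is (dx/2) * (1/K_i + 1/K_j), i.e. the two half-cell resistances in
    series.  A multi-source Dijkstra search from the inflow face (i = 0) returns
    the least resistance needed to reach the control-plane face (i = n1 - 1)."""
    n1, n2 = K.shape
    half_res = 0.5 * dx / K                    # half-cell resistance of each cell
    dist = np.full(K.shape, np.inf)
    heap = []
    for j in range(n2):                        # source zone: every inflow-face cell
        dist[0, j] = 0.0
        heapq.heappush(heap, (0.0, 0, j))
    while heap:
        d, i, j = heapq.heappop(heap)
        if d > dist[i, j]:
            continue                           # stale queue entry, already relaxed
        for di, dj in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            ni, nj = i + di, j + dj
            if 0 <= ni < n1 and 0 <= nj < n2:
                nd = d + half_res[i, j] + half_res[ni, nj]
                if nd < dist[ni, nj]:
                    dist[ni, nj] = nd
                    heapq.heappush(heap, (nd, ni, nj))
    return dist[n1 - 1, :].min()               # minimum resistance over the target face

For example, Rm = minimum_hydraulic_resistance(np.exp(Y), dx=1.0) would return an approximate R_m for a generated log-conductivity field Y; in the remainder of this chapter, R_m is computed with the Lazy Mole tool referenced above.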
2.3 Methodology

2.3.1 Generating a connectivity-based ranked hydraulic conductivity ensemble

As mentioned in Rizzo and de Barros [47], one of the key advantages of using a static connectivity metric such as R_m is that important features of the hydrogeological system can be extracted without resorting to flow and transport simulations and, therefore, without much computational cost. This is a relevant advantage when dealing with stochastic systems that rely on the use of a Monte Carlo framework [9]. The computational costs associated with Monte Carlo simulations are particularly high when the quantity of interest is the first arrival time, which is subject to the largest uncertainty [2]. Given the positive correlation between R_m and first arrival times (see Fig. 2.1), in the present study we propose a methodology which consists of ranking the hydraulic conductivity fields based on their R_m values, with the goal of speeding up the statistical convergence of the mean and variance of the first arrival time (t_1%), defined as the time required for the first 1% of the contaminant mass to reach a given location downstream from the source zone.

Deutsch [22] introduced the idea of combining static quantities (related to connectivity) with a ranking process to estimate specific dynamic field characteristics. The idea consisted of the following steps: 1) defining an indicator variable that depends on several geological parameters, such as lithofacies, porosity and permeability, in order to determine which geo-objects within the domain are well connected; and 2) ranking the given geostatistical realizations according to their connectivity level. In our work, we employ Deutsch's ranking process [22] in order to improve the Monte Carlo convergence of the statistical moments of t_1%. Our methodology is based on the following steps:

1. Hydraulic conductivity field generation: We start by randomly generating a large ensemble of size N of log-conductivity fields, i.e. Y = log(K). These log-conductivity fields can be generated using any random space function model, with no restrictions regarding its statistical nature, i.e. multi-Gaussian or non multi-Gaussian.

2. R_m computation: Once the solute source V_S and target V_T zones are defined (which are identical for all N realizations of the conductivity field), the "Lazy Mole" tool [26] is employed to evaluate the R_m value between those two locations for each generated conductivity field.

3. Ranking: Based on the previously computed values of R_m (see item 2), the set of N conductivity fields is reordered according to the following procedure: (i) sort, in descending or ascending order, the N conductivity fields according to their minimum hydraulic resistance R_m; (ii) rearrange the N conductivity fields into a balanced binary search tree (i.e., for each node of the binary search tree, the heights of the left and right sub-trees differ by at most 1) [56]; (iii) reorder the fields by traversing the balanced binary search tree using a Breadth-First Search (BFS).

4. Subsets generation: This step consists of generating subsets from the ranked hydraulic conductivity field ensemble. From the ranked set of N hydraulic conductivity fields, α subsets are generated, where α = log₂(N + 1); each of the α subsets includes the first 2ⁿ − 1 conductivity fields, with n = 1, ..., α.

5. Flow and transport simulations: For each generated hydraulic conductivity field, groundwater flow and solute transport are simulated. For each K field, we compute the first arrival time t_1%.

6. Evaluation of the first arrival time statistics: We compute the mean and variance of the first arrival times for every given subset of the hydraulic conductivity field ensemble. The convergence is analyzed by utilizing the α subsets generated from the ranked set of N hydraulic conductivity fields.

Figure 2.2: Example of the connectivity-based ranked distribution methodology for N = 7. (a) The randomly generated K fields, each one of them characterized by a value of the minimum hydraulic resistance R_m. (b) The K fields are reordered by increasing R_m values. (c) A balanced binary search tree is built.
(d) The subsets are created by searching the tree with the BFS algorithm, which screens the tree following the directions indicated by the red dotted line in (c). The three generated subsets are highlighted by the horizontal square brackets in (d).

To test the performance of the methodology, we compare the convergence rate of the mean and variance of t_1% between the ranked approach described above and the traditional Monte Carlo method without ranking (i.e. running flow and transport simulations on the non-ranked K fields, see item 1 of the list above). Additional details regarding the ranking and the creation of the subsets are provided in Section 2.3.2.

2.3.2 Implementation

The goal of our ranking procedure is to obtain subsets of K fields that lead to a quicker convergence of the statistical moments of t_1%. In order to achieve this goal, a ranking procedure based on the R_m value (see equation 2.1) of each K field is applied such that the subsets are balanced. A balanced subset is defined as a subset in which the number of K fields with higher values of R_m is matched by an equal number of K fields with lower R_m values. To illustrate the implementation of the methodology, Fig. 2.2 shows the procedure used to rank N = 7 hydraulic conductivity fields following item 3 listed in Section 2.3.1. As shown in Fig. 2.2.a, initially the R_m values of the randomly generated K fields do not follow any specific order. The first step of our ranking procedure is represented in Fig. 2.2.b, where the hydraulic conductivity fields are sorted in increasing order of R_m values; after that, a balanced binary search tree is constructed from the fields in Fig. 2.2.b, starting from the center and continuing with the centers of the left and right subsets, creating the tree structure reported in Fig. 2.2.c. Each subset is finally created by searching the tree with the BFS algorithm (red dotted line in Fig. 2.2.c), as highlighted by the horizontal square brackets displayed in Fig. 2.2.d. If, as in the tree sort algorithm [57], the elements of the generated binary search tree are sorted following the in-order traversal, then by adopting the BFS the sorting process 1) starts from the root of the tree (see level 1 in Fig. 2.2), 2) then moves to the next tree level (see level 2 in Fig. 2.2), and 3) proceeds as indicated by the red dotted line in Fig. 2.2.d.

To better grasp the rationale behind the proposed connectivity-based ranking methodology, consider that the first subset contains one single conductivity field, the one whose value of R_m is the median of the whole distribution shown in Fig. 2.2.b. Based on the relationship between R_m and t_1% (see Fig. 2.1), we can assume that this field corresponds to a value of t_1% close to the mean value of t_1% of the K ensemble of size N. To build the second subset, we consider the remaining fields as partitioned into two groups: those with R_m values lower than the median value (the one associated with the field that constitutes the first subset), and those with higher R_m values; in each of these two groups, the hydraulic conductivity field with the median value of R_m is added to the first subset to generate the second one. In general, the β fields contained in a subset divide the complete set of fields (in increasing order of R_m) into β + 1 groups. The field corresponding to the median value of R_m in each one of these groups is added to the existing subset to create the subsequent one.
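As a concrete illustration of steps 3 and 4 of Section 2.3.1 and of the implementation described above, the short Python sketch below builds the connectivity-ranked order and the balanced subsets from a vector of R_m values. It is a minimal sketch written for this text — the function names and the example R_m values are hypothetical — and not the code used to produce the results of this chapter.

from collections import deque
import numpy as np

def connectivity_ranked_order(Rm_values):
    """Steps 3(i)-(iii): sort the ensemble by R_m, arrange the sorted fields into
    an implicit balanced binary search tree (every node is the median of its
    sub-range), and read the tree back in breadth-first (level-by-level) order.
    Returns the indices of the original ensemble in connectivity-ranked order."""
    order = np.argsort(Rm_values)                  # (i) ascending R_m
    ranked = []
    queue = deque([(0, len(order) - 1)])           # (ii) implicit balanced BST
    while queue:                                   # (iii) breadth-first traversal
        lo, hi = queue.popleft()
        if lo > hi:
            continue
        mid = (lo + hi) // 2                       # median of the current sub-range
        ranked.append(int(order[mid]))
        queue.append((lo, mid - 1))
        queue.append((mid + 1, hi))
    return ranked

def balanced_subsets(ranked):
    """Step 4: the n-th subset collects the first 2**n - 1 ranked fields,
    with n = 1, ..., alpha and alpha = log2(N + 1)."""
    alpha = int(round(np.log2(len(ranked) + 1)))
    return [ranked[: 2 ** n - 1] for n in range(1, alpha + 1)]

# Example with the N = 7 ensemble of Fig. 2.2 (hypothetical R_m values):
Rm = [4.1, 0.7, 2.3, 9.6, 1.5, 3.2, 6.8]
subsets = balanced_subsets(connectivity_ranked_order(Rm))
# subsets[0] holds the field with the median R_m, subsets[1] adds the medians of
# the lower and upper halves, and subsets[-1] contains all N fields.

Because the subsets are nested, the flow and transport simulations of step 5 only need to be run once for the full ranked ensemble and can then be reused for every subset.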
2.4 Illustration

2.4.1 Physical set-up and input data

For the upcoming illustrations, we test our methodology on two- and three-dimensional (2D and 3D) random K-fields. The porous domain has dimensions ℓ_i along the i-th direction, where i = 1, ..., d. The log-conductivity field Y = log K is modelled as a multi-Gaussian process characterized by its mean value μ_Y, variance σ²_Y, correlation length λ_i along the i-th direction and an exponential spatial correlation function. The K fields are randomly generated utilizing the SGeMS tool [58], which is based on a sequential Gaussian simulation model. All the generated Y fields are characterized by the same values of μ_Y, σ²_Y and λ_i. All values used to generate the conductivity fields are reported in Table 2.1.

In order to compute the first arrival times, we simulate flow and transport on the randomly generated K fields. The governing equations for flow and transport are provided in Appendix A. Flow is considered to be at steady state and in the absence of sinks and sources, with permeameter-like boundary conditions, i.e. prescribed hydraulic heads at the entrance and exit of the domain and no-flow conditions at the remaining boundaries. The hydraulic heads at the entrance and exit of the porous medium are denoted by h_in and h_out, respectively. The flow field is simulated numerically using MODFLOW [59] together with the Python interface FloPy [11]. An inert solute is instantaneously injected along a source zone of area A_o = s_1 × s_2 (for the 2D case) or volume V_o = s_1 × s_2 × s_3 (for the 3D case). We consider both advective and local dispersive mechanisms for the transport problem. The solute plume is simulated through the use of a random walk particle tracking (RWPT) code denoted PAR² [1]. PAR² is an open-source GPU-accelerated RWPT simulator and can be downloaded following the instructions provided in Rizzo et al. [1]. For this work, we compute the cumulative mass breakthrough curve at a control plane located at a longitudinal distance L_1 from the solute source zone. From the mass breakthrough curve, we extract the first arrival time, denoted here as t_1%. Values for the parameters used in the flow and transport simulations are provided in Table 2.1, which also reports the grid resolution and the number of particles used in the numerical simulations.

Table 2.1: Input parameters used in the simulations

  Y = log K random generator
    Symbol                          Value                 Units
    ℓ_1 × ℓ_2 (2D)                  205 × 100             [m]
    ℓ_1 × ℓ_2 × ℓ_3 (3D)            181 × 91 × 31         [m]
    μ_Y                             1.6                   [m/day]
    K_G = exp[μ_Y]                  5                     [m/day]
    σ²_Y (2D)                       1, 3, 4               [-]
    σ²_Y (3D)                       3                     [-]
    λ_1, λ_2 (2D)                   8, 8                  [m]
    λ_1, λ_2, λ_3 (3D)              10, 10, 5             [m]
  Flow simulations
    h_in, h_out                     10, 0                 [m]
    Δx_1 × Δx_2 (2D)                1 × 1                 [m]
    Δx_1 × Δx_2 × Δx_3 (3D)         1 × 1 × 1             [m]
  Transport simulations
    α_1, α_2 (2D)                   0.01, 0.001           [m]
    α_1, α_2, α_3 (3D)              0.01, 0.001, 0.001    [m]
    D_m                             8.6 × 10⁻⁵            [m²/day]
    s_1, s_2 (2D)                   1, 100                [m]
    s_1, s_2, s_3 (3D)              1, 91, 31             [m]
    L_1 (2D)                        200                   [m]
    L_1 (3D)                        180                   [m]
    N_p (2D)                        10⁵                   [-]
    N_p (3D)                        10⁷                   [-]
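Since t_1% is read directly from the cumulative mass breakthrough curve at the control plane, a minimal sketch of this post-processing step is given below. It assumes only two arrays per realization — the simulation times and the corresponding cumulative mass that has crossed the control plane — and interpolates linearly at the 1% threshold; the function name is hypothetical and the sketch is independent of the particular transport simulator used to produce the curve.

import numpy as np

def first_arrival_time(times, cumulative_mass, injected_mass=1.0, fraction=0.01):
    """Extract t_1%: the earliest time at which the cumulative breakthrough at the
    control plane reaches `fraction` of the injected mass, using linear
    interpolation between the two samples that bracket the threshold."""
    frac = np.asarray(cumulative_mass, dtype=float) / injected_mass
    times = np.asarray(times, dtype=float)
    k = np.searchsorted(frac, fraction)        # first sample with frac >= 1%
    if k == 0:
        return times[0]                        # threshold already exceeded at the first sample
    if k == len(times):
        return np.nan                          # threshold never reached in the simulated window
    w = (fraction - frac[k - 1]) / (frac[k] - frac[k - 1])
    return times[k - 1] + w * (times[k] - times[k - 1])

Applying this function to each of the N realizations, taken in the connectivity-ranked order of Section 2.3, yields the t_1% samples whose subset statistics are analyzed next.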
Prior to illustrating the features of the proposed connectivity-based ranked distribution method- ology, we need to ensure that the mean and standard deviation of t 1% are converged in the classic Monte Carlo approach. Fig. 2.3 presents the convergence analysis of the t 1% cumulative distri- bution function (CDF). Convergence is assumed to occur when the fluctuations in the probability Pr[t 1% <τ] (withτ = 60, 90, 100, 120 and 200 days) become negligible. The results depicted in Fig. 2.3 also show that the probabilities evaluated for the selected values ofτ do not change significantly for N≳ 1500. With a conservative mindset, we consider that the first two statistical moments of t 1% are converged for N≳ 2000. Moreover, the number of K field realizations has been chosen in order for the last balanced subset to include all the N fields according to the relationship for α provided in item 4 of section 2.3.1. For such reasons, N has to be compliant with: N> 2000 (2.2) N= 2 α − 1, (2.3) thus we setα = 11 and consequently N = 2047. Therefore, for all upcoming results, we set the mean and standard deviation of t 1% evaluated at N= 2047 to be the reference values in order to test the performance of the connectivity-based ranked distribution methodology presented in Section 2.3. From the aforementioned N= 2047 realizations, we can obtain the number of evaluated subsets, namely α = 11 (see Section 2.3.2) which is needed to apply the proposed connectivity-based ranking Monte Carlo method. To keep the comparison with the classic (non-ranked) Monte Carlo unbiased, we only report values of the mean and standard deviation of t 1% obtained from the Monte Carlo samples of size equal to the number of realizations in theα subsets. In Fig. 2.4 we show how the first two statistical moments of t 1% at the control plane (see Table 2.1) vary as a function of sample size N of the selected subsets of K fields. The results reported in 17 Figure 2.3: Convergence analysis of the first arrival ( t 1% ) CDF. The probability that t 1% will take a value less than a specified value τ as a function of the number of Monte Carlo realizations N (where 1≤ N≤ 2047) is shown for spatially heterogeneous hydraulic conductivity fields characterized by σ 2 Y = 3. All curves reach their respective asymptotic value around N= 1500. Fig. 2.4 show that both the meanµ t 1% (Fig. 2.4.a) and standard deviationσ t 1% (Fig. 2.4.b) reach convergence at N≈ 500 with the proposed connectivity-based ranked distribution methodology, while the results obtained from the classic Monte Carlo present a more unstable trend. The insets shows how the fluctuations of the classic Monte Carlo approach are more pronounced. To analyze the error associated the Monte Carlo simulations, we define the error metric E Ω as: E Ω (N)= Ω| N=2 n − 1 − Ω| N=2047 Ω| N=2047 × 100, (2.4) withΩ=[µ t 1% ,σ t 1% ] and n= 1,...,α. Note that equation (2.4) represents the relative error measure with respect to the reference values obtained for the statistical moments (i.e. using N=2047, see Fig. 2.3). The behaviour of the relative error given by equation (2.4) is illustrated in Fig. 2.5, where its value for the ranked methodology is compared to the values of the classic Monte Carlo approach. 
Note that for this error analysis, the classic Monte Carlo ensemble was randomly shuffled 200 18 Figure 2.4: Convergence analysis for the (a) mean µ t 1% and (b) standard deviation σ t 1% of t 1% obtained from the connectivity-based ranked Monte Carlo (light blue) and for the classic Monte Carlo (dark blue). Results obtained for hydraulic conductivity fields characterized by σ 2 Y = 3. times, i.e. all N= 2047 K-field realizations have been randomly reordered in 200 different ways. This is done in order to account for of all possible convergence behaviours since the randomness of the K-field generation process can impact the convergence rate of a given quantity of interest. For example, each “gray” line depicted in Fig. 2.5 corresponds to a convergence rate of the first arrival time statistics for a specific random sampling of the K field. For completeness, we include the average value and confidence interval of the convergence rate obtained over all randomly reordered fields. Close inspection of Fig. 2.5 reveals that the proposed methodology leads to a faster error reduction for both the mean and standard deviation of the first arrival times. Furthermore, the connectivity-based ranked distribution methodology removes the randomness effect of the convergence rate of the output statistics (compare “red” solid curve with the “gray” curves in Figs. 2.5.a and b). The trend of the relative error (2.4) obtained forµ t 1% is reported in Fig. 2.5.a. It can be seen that our methodology leads to a much faster convergence of µ t 1% , having an error consistently lower than the average error obtained from the traditional Monte Carlo approach. The only exception is observed for one N value for which the values of the error for the proposed connectivity-based 19 Figure 2.5: Relative errorE of (a)µ t 1% and (b)σ t 1% as a function of the ensemble size N. Results for the connectivity-based ranked Monte Carlo (red solid line) and for the classic Monte Carlo ensemble randomly shuffled 200 times (gray lines). The black solid line corresponds to the mean of the relative error over all 200 shuffles, while the black dotted line represents the confidence interval. ranked scheme and the average error curve obtained from the classic Monte Carlo shuffled ensembles coincide (see overlapping red and black solid lines). Fig. 2.5.b shows that for N= 31 the error for σ t 1% is kept under 5% for the ranked methodology (red solid line), while the mean of the traditional Monte Carlo shuffled ensembles (black solid line) needs more realizations ( N≈ 255) to reach the above mentioned error value. Both Figs. 2.5.a and 2.5.b show that the proposed connectivity-based ranked scheme leads to a smaller error (in the average sense) when compared to the traditional Monte Carlo method. Similar to Fig. 2.3, we now compute the probability Pr[t 1% <τ] forτ = 60, 90, 100, 120 and 200 days using the connectivity-based ranking method and the classic Monte Carlo approach (see Fig. 2.6). Fig. 2.6 displays the convergence rate of the probabilities for both approaches. We remark that the connectivity-based ranking method leads to a faster convergence of the probabilities. With N≈ 100, the probabilities obtained by the proposed method are almost converged. 
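The probability curves in Figs. 2.3 and 2.6 are simply running empirical CDF estimates evaluated at the selected thresholds. A minimal sketch, with hypothetical array names, is:

import numpy as np

def running_probabilities(t1_ordered, thresholds=(60, 90, 100, 120, 200)):
    """Running estimate of Pr[t_1% < tau] as a function of the number of
    realizations N, for each threshold tau (in days)."""
    N = np.arange(1, len(t1_ordered) + 1)
    return {tau: np.cumsum(t1_ordered < tau) / N for tau in thresholds}

Applied to the connectivity-ranked ordering and to the classic ordering of the same ensemble, this gives the pairs of curves whose convergence rates are compared in Fig. 2.6.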
As a consequence, we can assume that the shape of the CDF reaches a stable structure at reasonably small N values, meaning that not only the first two statistical moments ( µ t 1% andσ t 1% ), but also the 20 higher order moments such as skewness and kurtosis can be evaluated using a smaller number of realizations N if the proposed methodology is applied. Figure 2.6: Comparison of the convergence rate of the probabilities obtained from the traditional Monte Carlo method (solid line) and the proposed connectivity-based ranked Monte Carlo (dashed line) method. Next, we apply our methodology for porous formations displaying different levels of hetero- geneity in the hydraulic conductivity field. Heterogeneity is epitomized by the log-conductivity varianceσ 2 Y . Fig. 2.7 reports the convergence analysis forσ 2 Y = 1, 3 and 4. Results displayed in Fig. 2.7 reveal that the performance of proposed connectivity-based ranking method improves whenσ 2 Y increases. For example, Fig. 2.7.a and b show that bothµ t 1% andσ t 1% computed forσ 2 Y = 4 tend to their converged values at lower N (i.e. N≈ 100) (and with less oscillations) for the proposed connectivity-based ranking scheme when compared to the values obtained via the traditional Monte Carlo approach. The key reasons for this are as follows: when heterogeneity increases, 1) the likelihood of the occurrence of well-connected highly conductive channels increases and 2) the corresponding least resistance paths are better delineated given the higher contrasts in K values within the domain. Therefore, the performance of the graph theory methodology used to estimate R m [26, 47] improves, which is the foundation of the proposed connectivity-based ranked scheme, 21 see Section 2.3. Additionally, there is more variability in the first arrival time predictions when considering larger values ofσ Y . On one hand, this leads to more oscillations on the convergence rate of the statistical output of the first arrival times when the traditional Monte Carlo approach is adopted. On the other hand, for higher heterogeneity, the ranking procedure (which makes use of balanced subsets) reduces this noisy effect leading to faster convergence of the first two statistical moments. Figure 2.7: Convergence rate for the (a) mean and (b) standard deviation of t 1% forσ 2 Y = 1,3 and 4. Comparison between the proposed connectivity-based Monte Carlo method (filled markers) with the traditional Monte Carlo method (void markers). For completeness, we now briefly illustrate the performance of the methodology for a 3D case. Parameter values used in the 3D simulations are reported in Table 2.1. Similar to the 2D case, the 3D scenario analysis takes into account an ensemble of N= 2047 hydraulic conductivity fields with σ 2 Y = 3. In Fig. 2.8 we present the same error analysis employed for the 2D scenario (Fig. 2.5). As shown in Fig. 2.8.a, the adoption of our ranked methodology grants that the error is consistently lower than the mean value of the shuffled classic Monte Carlo ensembles for µ t 1% . Fig. 2.8.b reveals that the error forσ t 1% is lower than the average (over all shuffled classic Monte Carlo ensembles) error for N> 15. As previously mentioned, it is important to note that the ranking procedure in our approach removes the randomness effect of the sampling on the convergence of the model 22 output statistics. The results depicted in Fig. 
2.8 show the potential of the proposed methodology in reducing the computational burden associated with the estimation of first arrival times in 3D simulations. Figure 2.8: Relative errorE of (a) µ t 1% and (b)σ t 1% as a function of the ensemble size N of 3D K fields. Results for the connective-based ranked Monte Carlo (red solid line) and for the classic Monte Carlo ensemble randomly shuffled 200 times (gray lines). The black solid line corresponds to the mean of the relative error over all 200 shuffles, while the black dotted line represents the confidence interval. 2.5 Summary This work provides a new methodology to improve the computational efficiency of Monte Carlo simulations aimed at estimating the uncertainty of first arrival times of a solute plume in a spatially heterogeneous porous formation. As shown in Ch. 9 of Rubin [2], first arrival times are subject to largest uncertainty. The methodology proposed in this work is based on ranking the randomly generated hydraulic conductivity fields according to their connectivity and re-ordering the ranked ensemble through a balanced binary tree sampling procedure. We employ the minimum hydraulic resistance as a connectivity metric given 1) its strong correlation with first arrival times and 2) that it can be 23 obtained at a very low computational cost through graph theory (see details in Rizzo and de Barros [26]). Due to the latter, the processing time needed to perform the ranking procedure is negligible. We test the proposed methodology against the results obtained through the traditional Monte Carlo method (i.e. non ranked). Our results show that the mean and standard deviation of the first arrival times (obtained from the ranked and re-ordered ensemble) converge faster for both 2D and 3D domains and with less oscillations. Furthermore, we show how the level of heterogeneity of the porous formation impacted the performance of the connectivity-ranked Monte Carlo simulations. The results reported in this work show that the performance of the proposed method increases with the increasing level of heterogeneity since the connectivity metric employed in our work becomes a better indicator of the first arrival time [26]. Future research directions consist of applying this methodology to real sites. In this work, the proposed methodology was illustrated in an ensemble of unconditional hydraulic conductivity fields. As shown in Rizzo and de Barros [47], the graph theory-based connectivity metric is also a random variable and can be obtained in both unconditional and conditional random fields. Details regarding the statistical features of the minimum hydraulic resistance can be found in the literature [47]. If the geostatistical model is uncertain, our approach can be expanded by adopting Bayesian averaging concepts to generate the conductivity fields [60] needed to estimate the connectivity metric ensemble. 24 Chapter 3 A Scalable Parallel Algorithm for Reactive Particle Tracking 3.1 Introduction Reactive solute transport plays a key role in numerous applications in geological media such as CO 2 storage [61, 62], aquifer remediation [63–65], groundwater contaminant risk analysis [25, 66, 67], enhanced oil recovery [68, 69] and long-term performance of nuclear waste repositories [70, 71]. The spatiotemporal dynamics of a solute body in a porous medium is controlled by the interplay between advection and local-scale dispersion mechanisms as well as chemical reactions. 
The prediction of the spatiotemporal dynamics of the solute concentration relies on the solution of the reactive advection-dispersion equation (rADE). However, predicting transport dynamics is a major challenge since the hydraulic properties characterizing natural porous formations are spatially heterogeneous over a multitude of scales and uncertain [72]. As an outcome of geological heterogeneity, transport is characterized by early breakthrough and long tailing behavior of the solute concentration at late times [73, 74]. Heterogeneous flows in porous media also enhance dilution and reactive mixing [5, 75, 76]. Numerical solutions for the rADE can be classified into Eulerian, Lagrangian and hybrid methods. Typical grid-based methods are, for example, finite-element, finite-volume and finite difference schemes [8]. These grid based approaches suffer of numerical (non-physical) dispersion and artificial oscillations as a consequence of the discretization scheme [77]. Hence, smaller time steps and higher grid resolution are necessary, implying longer execution times and cumbersome 25 processes to capture zones with high concentration gradients which are critical for quantifying solute mixing of heterogeneous flows [78–80]. Combinations of Eulerian and Lagrangian methods can alleviate these problems [81, 82]. These hybrid approaches treat the advection part using a Lagrangian scheme, while the dispersion component is tackled through an Eulerian scheme [79, 83]. Although these methods attempt to avoid drawbacks related to both Eulerian and Lagrangian schemes, they are not computationally efficient. The approximation of the advection term requires to sacrifice a great deal of computational efficiency since each solute particle has to carry information about concentration at each time step [83]. Differently from the Eulerian-based methods, Lagrangian approaches do not suffer from numer- ical dispersion and are well-suited for advection-dominated conditions. The Random Walk Particle Tracking (RWPT) method is a typical example of a Lagrangian framework and it treats the transport of solute mass M 0 by considering a large number of moving particles N P . Each particle carries a fraction of the solute mass equal to M P = M 0 /N P . These mass particles are moved accordingly to the velocity field to simulate advection. To simulate diffusion and dispersion, a random displacement is added [79, 84]. Not only numerical problems, typically associated with the direct solution of the transport equation, are avoided, but also the implementation of the transport problem becomes easier [83]. In addition, since each particle is seen as a discrete mass of solute, global mass conservation is always satisfied [79]. From a computer memory point of view, the absence of a grid structure makes Lagrangian methods computationally more appealing and suitable for simulations of transport in large and heterogeneous flow systems [84–87]. Since RWPT is based on particles’ motion, numerical difficulties can be encountered when there are high irregularities or many stagnation zones in the geological system [79, 88]. Moreover, the main drawback of this method is the discrete nature of the concentration field. The concentration field, at each time step, is estimated from the masses carried by particles. Thus, the quality of the RWPT result is strictly related to the number of particles used during the simulation. Choosing the number of particles in any particle-based methodology is not a trivial task. 
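To make the dependence on N_P concrete, the simplest (box-counting) reconstruction of a 1D concentration profile from particle positions and masses can be sketched as follows; the function and variable names are illustrative, and the porosity scaling is an assumption of this sketch:

import numpy as np

def concentration_from_particles(x_p, m_p, edges, porosity=1.0):
    """Estimate a 1D concentration profile by summing the particle masses m_p
    that fall into each bin defined by `edges` and dividing by the bin (pore)
    volume. Coarser bins or fewer particles give a noisier estimate."""
    mass_per_bin, _ = np.histogram(x_p, bins=edges, weights=m_p)
    bin_volume = np.diff(edges) * porosity
    return mass_per_bin / bin_volume

# Illustrative use: a total mass M0 = 1 g split evenly over N_P particles
N_P = 10_000
x_p = np.random.normal(loc=2.5, scale=0.3, size=N_P)   # particle positions [cm]
m_p = np.full(N_P, 1.0 / N_P)                          # M_P = M_0 / N_P
edges = np.linspace(0.0, 5.0, 101)
c = concentration_from_particles(x_p, m_p, edges)

Repeating this reconstruction with progressively smaller N_P illustrates how the discreteness of the particle representation degrades the smoothness of the recovered concentration field.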
The number of particles impacts the smoothness of the concentration field and the ability model incomplete mixing [89–91]. 26 In the context of reactive transport, the noise associated with particle number may reflect the actual physics of the system [92] or in some cases, it may represent a numerical artifact [89]. However, a high number of particles is required to accurately represent the concentration field of a solute plume in realistic hydrological applications ( e.g. spatially heterogeneous hydrological systems, typically characterized by scales on the order of 10 2 ∼ 10 3 m) [64, 79, 80, 93–95]. Handling a high number of particles is now possible due to the development of computationally efficient codes such as RW3D [96], SLIM-FAST [97, 98] and PAR 2 [1, 99]. The work of Rizzo et al. [1] stands out for its low computational time. Rizzo et al. [1] developed the PAR 2 platform which is a GPU-accelerated RWPT code that takes advantage of the parallelizable structure of the RWPT method to compute the particles positions all at the same time. GPU-based computations for transport can also be found in Hansen et al. [100]. Reactive transport problems are even more sensible to the number of particles employed in simulations. In the context of reactive mixing, each particle usually represents a large group of potentially reactive molecules [89] and consequently a large number of particles is necessary to properly characterize the mixing process, responsible for reactants’ contact and thus reaction [101]. In addition, as already mentioned, the discrete nature of RWPT (i.e., particle based) methods makes it difficult to estimate macroscopic quantities, such as concentration and its spatiotemporal derivatives [94], leading to an inaccurate prediction of the reaction rate [see discussion in 89]. Furthermore, the definition of reaction rules applicable to a finite number of particles that still properly represent the ones observed at the macroscale is a challenging task [89]. A large variety of methods have been proposed in the literature to incorporate multispecies chemical reactions into RWPT methods. Optimal reconstruction of concentrations has been de- veloped through Kernel Density Estimators (KDE) by Fernandez-Garcia and Sanchez-Vila [94]. The KDE approach allows to reconstruct the probability density function (PDF) of a variable of interest (such as mass, concentration etc). Sole-Mari et al. [102] developed a grid-projected KDE methodology to compute local densities from particle positions. The KDE method in [102] is also locally adaptive to account for boundary effects. Schmidt et al. [91] used a Gaussian kernel to tackle 27 problems associated with particle numbers in diffusion-driven bimolecular reactions. Tompson & Dougherty’s [103] introduced a particle-in-cell model, which consists of using RWPT to diffuse and/or advect solute particles; these particles are subsequently mapped on a grid, to determine the changes in mass due to the reaction process through numerical integration. This approach is computationally intensive given that it requires the computation of the concentration values at the grid points at every time step in order to calculate the reaction term [103]. Schmidt et al. [104] studied the accuracy of RWPT-based mass-transfer algorithms. The challenge of defining reaction rules for particles that respect the macroscale behaviour was addressed by Benson & Meerschaert [105]. 
In the approach proposed by Benson & Meerschaert [105], particle pairs react following a probabilistic rule that is a function of the reaction rate coefficient and the particles' relative distance, avoiding the need to reconstruct concentration fields at each iteration. Here, reactions are simulated as a birth/death process of particles, leading to segregation problems and poor convergence of the particle tracking algorithm at late times. To circumvent this drawback, Bolster & Benson [90] modified the algorithm by preserving the number of particles through the whole simulation and accounting for the progress of the reaction by decreasing the mass of reactants carried by the particles. To facilitate the implementation of arbitrarily complex reactions, Benson & Bolster [106] extended the method by allowing each particle to carry more than a single constituent. Consequently, reactions take place over particles and not between particles. Engdahl et al. [107] simulated complex reactions by coupling the algorithm provided in [106] with the PhreeqcRM reaction module [108]. Slightly different is the method proposed and implemented by Perez et al. [109], where particle pairs still react following a probabilistic rule. The probabilistic rule provided in Perez et al. [109] is a function of the reaction rate coefficient, the total number of particles, and an interaction radius. Rahbaralam et al. [89] combined the ideas of using KDE [94] and probabilistic rules to simulate reactions through particle tracking methods. Kernels are used as the weighting function for particle interaction, following the probabilistic rule of Benson & Meerschaert [105]. Sole-Mari et al. [110] extended the methodology to account for complex kinetic bimolecular reactions.

Most applications of these algorithms are limited to one-dimensional scenarios since the computational cost associated with multidimensional domains is high. The computational time complexity of some of the above mentioned approaches is O(N_P²), meaning that, in order to evaluate how a mass-carrying particle evolves at a given time step, the algorithm must iterate over all the other particles in the domain. Higher computational efficiency is reached through approaches such as an interaction radius, KD-trees or domain decomposition (DDC) [92, 111, 112]. Nevertheless, these methodologies require the reaction probabilities of all particles, or of groups of particles (as in the DDC approach), to be computed in serial. That means it is necessary to loop over a large number of particles at every time step of the simulation.

Conscious of the pros and cons of the aforementioned reactive RWPT methods, this work aims to present PARxn², a computationally efficient reactive particle tracking algorithm based on a re-organization of the structure of the simulated problem. In our work, high-resolution reactive RWPT performance is achieved through a novel computational framework and solution approach for problems involving bimolecular reactions. This framework permits the application of a parallelizable algorithm with O(N_P) time complexity, and consequently the use of a number of particles large enough to address multidimensional problems. Ideally, the algorithm is suitable for full parallelization, so that each particle's reaction probability can be computed independently, avoiding the need to iterate over all the other particles in the domain.
We verify our methodology against previous results from the literature and illustrate the applicability of our approach for different spatially variable flow fields.

3.2 Reactive Random Walk Particle Tracking: An Overview

In this section we provide an overview of the reactive random walk particle tracking (RWPT) procedure. We start by considering the following bimolecular irreversible chemical reaction:

\[
A + B \rightarrow C. \qquad (3.1)
\]

Here we consider a reactive system initially composed of the reagents A and B. For the purpose of illustrating the algorithm, the reagents are characterized by the same diffusion coefficient. The two species, A and B, react kinetically and irreversibly with unit stoichiometric coefficients. In order to predict the evolution of the reactive transport process, the system of reactive advection-dispersion equations (ADE) is employed [111]:

\[
\frac{\partial c_k}{\partial t} = -\nabla \cdot \left( \mathbf{u}\, c_k - \mathbf{D} \nabla c_k \right) - k_f\, c_A c_B, \qquad k = A, B, \qquad (3.2)
\]

where c_k is the concentration of the k-th species, namely k = A, B. The velocity field is denoted by u, D is the local dispersion (or diffusion) tensor, and k_f is the reaction rate constant. Given the equivalence between the ADE and the Langevin equation [113], multiple works have used Lagrangian particle tracking methods as an approximation to equation (3.2) [see 105, 114, 115, and references therein].

Let N_A and N_B denote the number of particles for species A and B, respectively. All particles of species A have identical mass M_A and all particles of species B have identical mass M_B. The masses of reagents A and B initially present in the system are given by M*_A = M_A N_A and M*_B = M_B N_B. We denote by P^A_a and P^B_b the a-th and b-th particles belonging to species A and B. The particles are transported by two physical mechanisms, namely advection and local-scale dispersion, and the trajectory X_P of a generic particle is given by the Itô-Taylor integration scheme [113]:

\[
\mathbf{X}_P(t+\Delta t) = \mathbf{X}_P(t) + \left[ \mathbf{u}(\mathbf{X}_P(t)) + \nabla \cdot \mathbf{D}(\mathbf{X}_P(t)) \right] \Delta t + \mathbf{B}(\mathbf{X}_P(t)) \cdot \boldsymbol{\xi}(t)\, \sqrt{\Delta t}, \qquad (3.3)
\]

where ∆t is the simulation time step, u is the velocity field and D is the local dispersion tensor. The tensor B expresses the strength of dispersion, and ξ is a vector of normally distributed random variables with zero mean and unit variance representing the random displacement associated with the dispersion mechanism [79, 83, 84].

Without loss of generality, we work under the assumption of a constant local-scale dispersion tensor, i.e. D = diag[D]. Therefore equation (3.3) simplifies to

\[
\mathbf{X}_P(t+\Delta t) = \mathbf{X}_P(t) + \mathbf{u}(\mathbf{X}_P(t))\, \Delta t + \sqrt{2 D \Delta t}\; \boldsymbol{\xi}(t). \qquad (3.4)
\]

At each time step, after the trajectories of the solute particles are computed, reactions between particles are implemented in the simulation procedure. Benson and Meerschaert [105] proposed to quantify chemical reactions without regard to local concentrations, by considering probabilistic rules of particle interaction and association. Given the bimolecular irreversible reaction in equation (3.1), Benson and Meerschaert [105] defined the probability that the a-th particle P^A_a reacts with the b-th particle P^B_b as the combination of the probability that the two particles are collocated and the thermodynamic conditional probability that those two particles will react (for additional details see [105, 107]):

\[
\mathbb{P}(\text{reaction}) = \mathbb{P}(\text{collocation})\, \mathbb{P}(\text{thermodynamic reaction} \mid \text{collocation}), \qquad (3.5)
\]

where P(·) denotes the probability of an event occurring.
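The transport step in equation (3.4) amounts to a single vectorized update per time step. A minimal NumPy sketch, assuming a constant isotropic D and an illustrative velocity function (all names and values below are hypothetical), is:

import numpy as np

rng = np.random.default_rng(42)

def rwpt_step(X, velocity, D, dt):
    """Advance all particle positions X (shape: n_particles x 2) by one time
    step of equation (3.4): advection by the local velocity plus an isotropic
    Brownian displacement with variance 2*D*dt per direction."""
    xi = rng.standard_normal(X.shape)              # zero-mean, unit-variance noise
    return X + velocity(X) * dt + np.sqrt(2.0 * D * dt) * xi

# Illustrative use with a uniform plug-flow velocity u = (0.6, 0) cm/s
def plug_flow(X):
    u = np.zeros_like(X)
    u[:, 0] = 0.6
    return u

X = rng.uniform(0.0, 1.0, size=(10_000, 2))        # initial particle positions [cm]
X = rwpt_step(X, plug_flow, D=1.75e-3, dt=0.01)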
Here, the collocation probability is a function of the separation between the two considered particles, the diffusion and/or local dispersion coefficient and the simulation time step ∆t. The thermodynamic reaction probability depends on∆t and on the reaction rate coefficient k f . Traditionally, the simulation approach consists in considering a particle, calculating its probabil- ity of reaction with every other particles (the ones carrying the mass of the other chemical species) and comparing this probability to a random number sampled from a uniform distribution [105, 116]. If the random number is lower than the calculated probability, the two particles will react (generating a reaction product particle) and they will be both removed from the system (particle killing algorithm). Otherwise, the probability of reaction with the other next particle is evaluated. As a direct consequence of the application of the particle killing algorithm, numerical resolution issues (linked to low concentration zones) arise [110]. To overcome this problem, Rahbaralam et al. 31 [89] equipped each particle with a time varying kernel function reflecting the probability density function (PDF) of the location of the carried mass at each time step. The kernel function shape changes not only accordingly to the diffusion and/or dispersion process (constant in time) [105], but also to the continuously diminishing amount of particles in the domain, preventing the formation of segregated areas as the simulation proceeds [89, 110]. Issues related to particle segregation (and therefore, incomplete mixing) can occur when the particle killing algorithm is employed. Further details about this problem can be found in literature [89]. In most of the cases, Gaussian kernels are preferred due to mathematical advantages [89, 117], but a variety of different kernel functions can be adopted. It is important to notice that a common feature of all the current algorithms for simulating reactions between solute particles is that the probabilities of reaction of all, or a group of, parti- cles have to be computed in serial. This means that particles’ reaction probabilities have to be processed one after the other at every∆t, i.e., the algorithms are serial implemented with O(N 2 P ) or O(N 2 P /N groups ) computational time complexity, where N groups is the number of particles’ groups individually analyzed [112]. 3.3 Methodology Next we provide the details regarding the structure of the proposed reactive RWPT algorithm in a two dimensional (2D) computational domain. 3.3.1 An Optimized Kernel Function It is common to adopt a Gaussian kernel function with varianceσ 2 = 2D∆t to represent the particle’s position probability density function [105]. In our approach, we opt to employ a kernel that 1) approximates the Gaussian function and 2) its local support is a finite domain with dimensions proportional to the sizeλ g of the squared cells within the grid G (see Fig. 3.1). 32 Figure 3.1: Illustration of the kernel function representing a generic particle P A a with a= 1,..,N A and N A denoting the total number of A particles in the computational domain covered by the grid G. The red grid shows the shape of the local support of the kernel function, the red dot represents P A a andλ g is the size of the squared grid cell . Figure 3.2: Graphical comparison between a Gaussian function and a tent function that share the same varianceσ 2 = 2D∆t. 33 To fulfill these requirements we impose the kernel’s base length to be 2λ g = 2 √ 12D∆t. 
To support the reasoning behind this choice, consider a 1D scenario and a tent (i.e. triangular) kernel function with a finite local support shown in Fig. 3.2. Fig. 3.2 shows that both tent and Gaussian functions (characterized by the same variance) evolve similarly. Thus, the local support is given by 2λ g = 2 √ 2πσ 2 = 2 √ 4πD∆t≃ 2 √ 12D∆t, leading toλ g = √ 12D∆t. By the same token, this approximation is also adopted for the 2D scenario. For such reasons, we construct our kernel function with a local support with dimensions l x × l y = 2λ g × 2λ g , whereλ g denotes the dimension of the squared grid cell (i.e., resolution of the grid G) shown in Fig. 3.1. The dimension of the edge of the grid cell,λ g , is computed as a combination of the variables characterizing the dynamics of the physical processes involved in the reaction simulation and is defined as λ g = √ 12D∆t, (3.6) where D is the local-dispersion coefficient and ∆t is the time step of the simulation. The functional shape ofλ g comes from the fact that, in general, the diffusion of a chemical species is modelled by a Brownian random walk motion with zero mean and varianceσ 2 = 2D∆t [89, 113]. Therefore, the mass carried by a particle spreads while taking into account the simulated physical process (i.e., not-discrete nature of the the solute mass), the simulation parameters (as already done by the Gaussian function), and proportionally to the cells of the grid G. The remaining number of particles over which the reagents’ masses are assigned throughout the whole simulation does not influence the geometry of the kernel function. Note that issues related to the particle segregation, see [89, 110], are not a limit in our approach since we are working with a considerable number of particles. Thus, the local support of our kernel function does not have to adjust accordingly to N P . The functional formulation of the optimized kernel’s shape is the result of a modification of a squared pyramid. Fig. 3.3 shows one quarter of the optimized kernel function in which triangular shape sections are maintained along its local support midsections, while parabolic sections are 34 Figure 3.3: Schematic representation of one quarter of the kernel function of height H. The value of the kernel function at point Q, defined by the distance from the center of the kernel r, and the angleα, is h. Here d represents the length of the segment connecting the center of the kernel to the edge of its local support, passing through the point Q. The resolution of the grid is given byλ g and corresponds to half the side of the kernel function’s local support. employed elsewhere. The expression of a generic value of the kernel, at given r andα, is (see Fig. 3.3): h(r,α) = − H d(α) 2 + (q 1 Hα 2 + q 2 Hα− H) d(α)λ g r 2 + q 1 Hα 2 + q 2 Hα− H λ g r+ H, (3.7) where h is the value of the kernel function at a given point Q defined by the distance from the center of the kernel, r, and the angle α (see Fig. 3.3). Here, d is the length of the segment connecting the center of the kernel to the edge of its local support with dimensions 2λ g × 2λ g , passing through the point Q, H is the height of the kernel and q 1 and q 2 are respectively 0.650239 and− 1.03809. Details regarding the full derivation of the shape of the kernel function are provided in the Appendix. 35 Figure 3.4: A (red) and B (blue) reagents in a 2D domain. (a.) A and B particles positions at the initial time denoted by t 0 , (b.) 
A and B particles positions after moving at the first simulation time step t 0 +∆t, (c.) representation of the grid G necessary for the reagents mass particles allocation. 3.3.2 Particles Mass’ Re-allocation In our approach, denoted here as PARxn 2 , the 2D computational domain is firstly covered by a squared grid G, as showed in Fig. 3.4c. With the purpose of illustration, the initial concentrations of the two reagents create a segregated field (Fig. 3.4a). Subsequently, the chemical species are transported by advection and local-scale dispersion at each time step (Fig. 3.4b). At each time step, after the particles’ displacement, the reactant mass carried by each particle (M A,a or M B,b , with a=1,..,N A and b=1,..,N B ) is re-allocated on the cells’ vertices of the grid G, where each grid cell has dimensionsλ g × λ g . To better illustrate the mass re-allocation process, let us focus on the species A. As shown in Fig. 3.5a, the amount of M A,a carried by a generic particle P A a , with a=1,..,N A , is assigned to each grid vertex of G accordingly to the value of the novel kernel function (see section 3.3.1) (centered where P A a is located and having H = M A,a ) at the vertex position. The kernel values at the vertices’ positions are denoted as h a j and∑ j∈C(i) h a j = M A,a , whereC(i) is a function that given the index i of the cell where the particle P A a is located, returns the associated j indices of the vertices of the cell. At this point, the masses on the vertices of G are treated as the “new” particles of the species A in the system, as depicted in Fig. 3.5b. 36 Figure 3.5: (a.) Mass re-allocation process of three particles of A reagent P A 1 , P A 2 and P A 3 . Each particle is represented by the kernel and its mass M A,a (with a=1,2,3) is distributed in each grid vertex accordingly to the value of the kernel on that grid vertex, centered where P A a is located and having H = M A,a , such that: M A,a =Σ j∈C(i) h a j , where i denotes the index of the cell where P A a is located and j the indexes of the i th cell’s vertices. (b.) New mass distribution configuration, after the mass re-allocation process. The mass of the reagent A located in a vertex of the grid is equal to the sum of the kernels’ values in that vertex. The A mass allocated to a generic vertex j is equal to M A j =∑ N A a=1 h a j with a= 1,..,N A . 37 Figure 3.6: (a). Four optimized kernels centered on the vertices of the red grid cell. (b). V olume resulting from the sum of the values of the four kernels illustrated in a in each point of the red cell. (c). Graphical explanation of the equivalence between the sum of the values of a kernel, centered in the black dot, on the four intercepted grid vertices, h 1 , h 2 , h 3 and h 4 , and the sum of the values that the kernels 1 2 3 and 4, respectively centered in the four grid vertices, assumes in the black dot position. Given the kernel function, see equation (3.7), and the illustrations provided in Figs. 3.5-3.6, we highlight the following points: 1. Due to the shape of the local support of the kernel function, it is possible to state that for every possible position of a particle P A a or P B b (with a=1,..,N A and b=1,..,N B ) in the domain, the kernel function representative of M A,a or M B,b will always intercept just the four vertices of the cell where P A a or P B b is positioned and not any other vertex of the grid (Fig. 3.1). 
As an outcome of this feature, the particle mass allocation process is just limited to the allocation of the mass in the four vertices of the cell where the particle carrying the mass is located. 2. Due to the functional form of the kernel, conservation of mass is always guaranteed in the particle’s mass allocation process. Fig. 3.6b illustrates that the summation of four optimized kernel in Fig. 3.6a (coincident with the sum of four values of a single kernel over the intercepted vertices of a cell, as shown in Fig. 3.6c) is constant over the red cell, meaning that no matter where the particle is in the cell, the kernel re-allocates its mass respecting mass conservation, feature that with any other kernel form would have not been respected. 38 Other features of the proposed kernel function will be highlighted in the upcoming sections. In addition, all the above mentioned characteristics are fulfilled by a tent function with base length 2λ g in a 1D setting (see Fig. 3.2). The tent shaped kernel function is adopted in the 1D scenarios explored in this work. Figure 3.7: Graphical representation of the algorithm steps. (a). shows how the mass of one species (in this case A, for illustrative reasons) is re-allocated on the vertices of the grid G over the domain. (b). clarifies how the mass of A is distributed over the domain after the re-allocation step. (c). represents the structure of the reactive step, where the B particles (in blue) react proportionally to the amount of A mass available, depicted by the product of the probability density function h ∗ j and the masses of A in the vertices j of the grid, M A j . 3.3.3 Evaluation of Reaction Probability Next we provide details associated with the computation of the chemical reaction in the RWPT method. We implement our reaction by applying probabilistic rules of particles interaction and 39 Figure 3.8: Computation of the algorithm reactive step. Visualization of the other species influence (in this case A, the red one) on the probability of reaction of one species particles (B, the blue one). Given two B particles in a certain cell, only the masses allocated in the vertices of the cell where the two B particles are located will affect their probability of reaction. association, e.g., see Rahbaralam et al. [89]. The probability that a generic particle P B b (in congruence with the previously explained methodological steps in Section 3.3.2) reacts in the simulation time step∆t is [89, 110] P(Reaction of P B b → P C c )= k f ∆t V ∑ j=1 M A j h ∗ j | P B b , (3.8) where k f is the forward reaction coefficient, V is the total number of vertices associated with the used grid G, h ∗ j | P B b is the value of kernel function (see eq. (3.7)) with unit volume (and consequently with H = 3/4λ 2 g , based on the approximation that the volume of the kernel is V K =(4Hλ 2 g )/3, as for a squared pyramid), centered on the j th vertex and evaluated at the position of the considered P B b . M A j is the amount of mass A located on the j th vertex. The product between h ∗ j | P B b and M A j can be physically interpreted as the effective amount of M A j available to enable the particle P B b to react (Fig. 3.7c), while the mass carried by the evaluated particle P B b is considered localized in the position of the particle itself. 
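The vertex masses M_A_j entering equation (3.8) are assembled in the re-allocation step of Section 3.3.2. A minimal 1D sketch of that step, using the tent kernel with base 2λ_g adopted for 1D scenarios (the function and variable names are illustrative, not part of PARxn²), is:

import numpy as np

def reallocate_mass_1d(x_p, m_p, x0, lam_g, n_vertices):
    """Distribute each particle's mass onto the two vertices of the grid cell
    containing it, using a tent kernel of base 2*lam_g centred on the particle.
    The two weights sum to one, so total mass is conserved."""
    M_vertex = np.zeros(n_vertices)
    i = np.floor((x_p - x0) / lam_g).astype(int)      # index of left vertex of each cell
    w_right = (x_p - x0) / lam_g - i                  # tent-kernel weight on the right vertex
    np.add.at(M_vertex, i, m_p * (1.0 - w_right))     # left vertex share
    np.add.at(M_vertex, i + 1, m_p * w_right)         # right vertex share
    return M_vertex

# Illustrative use on a 5 cm domain with D = 1.75e-3 cm^2/s and dt = 0.01 s
D, dt = 1.75e-3, 0.01
lam_g = np.sqrt(12.0 * D * dt)                        # grid resolution, eq. (3.6)
x0, L = 0.0, 5.0
n_vertices = int(np.ceil(L / lam_g)) + 1
x_p = np.random.uniform(0.0, L, size=50_000)          # positions of species A particles
m_p = np.full(x_p.size, 1.0 / x_p.size)               # equal particle masses M_A
M_A_vertex = reallocate_mass_1d(x_p, m_p, x0, lam_g, n_vertices)
assert np.isclose(M_A_vertex.sum(), m_p.sum())        # mass-conservation check

Because the two tent-kernel weights always sum to one, the final assertion simply verifies the mass-conservation property discussed above.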
However, given that the kernel function has a local support proportional to the cell dimensions of the grid G, equation (3.8) can be re-written as

\[
\mathbb{P}(\text{reaction of } P^{B}_{b} \rightarrow P^{C}_{c}) = k_f\, \Delta t \sum_{j \in \mathcal{C}(i)} M^{A}_{j}\, h^{*}_{j}\big|_{P^{B}_{b}}, \qquad (3.9)
\]

where the values of j correspond to the indices (returned by the function C(i)) of the vertices associated with the cell i where P^B_b is located. Equation (3.9) is justified by the fact that, when computing the probability of reaction, the B particles located in a generic cell i only perceive the mass associated with A particles located on the vertices of that cell. This is illustrated in Fig. 3.8. We remark that the h* value associated with any other A particle in the domain is zero at the location of P^B_b, since all the other A particles sit on grid vertices that are too far away. Note that the summation in equation (3.9) considers only the particles of species A located on the four vertices of the grid cell where P^B_b is located, therefore simplifying the computation of the probability of reaction.

At this point, the value of P(reaction of P^B_b → P^C_c) is compared with a random number generated from a uniform distribution ζ ∼ U(0,1). If the probability calculated in equation (3.9) is higher than ζ, the considered B particle becomes a C particle. It is important to notice that no A particles are converted to C particles when a B particle reacts. By choosing a small simulation time step and using a large number of particles, the same number of A and B particles react over the time interval ∆t (in an average sense). Thus, the reactive process can be considered balanced, meaning that a comparable number of A and B particles react (in ∆t) without the need to consider the conversion of a particle of the other species, leading to an evenness condition. We would have an uneven condition if a larger number of A particles reacted than B particles, or vice versa. On average, the same number of A and B particles react in a given time step, as shown in Section 3.5.1.

3.3.4 Computation of the Reaction Product C

Details regarding the structure of the proposed reactive RWPT algorithm are now presented. If we consider the irreversible bimolecular reaction A + B → C, the reaction product C can be computed through the PARxn² algorithm, in a parallelized way, as follows:

Algorithm PARxn²
 1: build the grid G(λ_g)                                   ▷ (see Section 3.3.1)
 2: for each time step do
 3:     for each particle do                                ▷ (see Section 3.3.2)
 4:         reassign particle mass to G(λ_g) corners
 5:     end for
 6:     for each particle do                                ▷ (see Section 3.3.3)
 7:         if particle is A or B then
 8:             p ← compute probability of reaction using masses in G(λ_g)
 9:             ζ ← generate a uniform random number between 0 and 1
10:             if ζ ≤ p then
11:                 particle becomes C
12:             end if
13:         end if
14:     end for
15: end for

The first step of the algorithm defines the grid G with resolution equal to λ_g (see eq. 3.6). Subsequently, a for loop is employed to evaluate the reactive process in time. At each time step, all the masses carried by the particles are re-allocated on the vertices of the grid G using the kernel function h provided in equation (3.7), as explained in Section 3.3.2. Thus, each vertex of G is now characterized by a certain amount of A mass, equal to the sum of all A particles' kernel function values at the position of the vertex (and analogously by a certain amount of B mass).
The next step of the iteration consists in computing (for all particles), the probability of reaction for the masses carried by the particles (prior to the re-allocation step) in the domain, using the re-allocated masses of the other species (see eq. (3.9)), as defined in Section 3.3.3. All the probabilities are concurrently compared to the randomly generatedζ values (obtained from a uniform distribution between 0 and 1) and turned into the product C according to the if statement. 42 From the pseudo code it is possible to infer that the algorithm is linear with the number of particles. In fact, there are no for loops on the number of particles nested in other for loops. In addition, each for loop on particles can be parallelized, since there is no dependency among the computed steps. To conclude, it is important to highlight that when a particle reacts, it becomes a particle of species C, without the actual conversion of the other species particle. This is possible due to the usage of a small time step and a large number of particles, that guarantee conservation of mass at each time step. In this way, the algorithm can be computed in parallel. We highlight that the algorithm has been written ad hoc for an irreversible bimolecular reaction. In this work, the proposed algorithm PARxn 2 is computed on the CPU and solved with the support of the Python libraryNumPy that enables the implementation of numerical computations as vectorizing computations [118]. Due to vectorized computations, it is possible to pseudo-parallelize operations, avoiding the usage of loop structure, and thus leading to high computational efficiency ( ∼ O(N A + N B )). We point out that the structure of this algorithm is suited for GPU parallelization, where the hardware architecture is made with the purpose of parallel operations [1]. 3.4 Performance Assessment and Comparison with other Works This section is divided in two main parts. The first part focuses on testing the performance of PARxn 2 by comparing its accuracy and computational efficiency against other reactive RWPT algorithms. In the second part of this section, the proposed algorithm will be validated with existing analytical solutions. 3.4.1 Accuracy and Speedup Analysis Here we evaluate the performance of the proposed methodology by comparing it to other reactive particle tracking approaches. Engdahl et al. [112] developed two algorithms, respectively based on sparse search (Looping Sparse Search, LSS) and domain decomposition (DDC) techniques, to speed up the computation of the reactive step. The authors showed how LSS and DCC schemes, 43 both under serial execution, offer a significant computational time reduction with respect to the approach proposed by Benson and Bolster [106]. The two methodologies presented in Engdahl et al. [112] are based on the partitioning/division of the simulation domain to avoid that a single particle in an ensemble of N particles has N− 1 possible pairs that have to be evaluated, at each time step, to compute its probability of reaction. Thus, particles are grouped into bins (DCC) or assigned to branches (LSS), such that the interaction among them is limited and ensembles of particles can be evaluated independently. For further details about these methodologies, we refer the reader to Engdahl et al. [112]. In order to investigate PARxn 2 ’s accuracy and efficiency, the result and computational time of its serial execution are compared with the ones of the above mentioned DCC and LSS methodologies, also serially executed. 
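For the 1D comparisons considered below, the reactive step reduces to a few vectorized array operations. The following is a minimal sketch under stated assumptions: the 1D analogue of h* is taken here as a unit-area tent kernel of height 1/λ_g (by analogy with the unit-volume 2D kernel), and all names and placeholder values are illustrative rather than part of the PARxn² code:

import numpy as np

def reaction_probability_1d(x_B, M_A_vertex, x0, lam_g, kf, dt):
    """Vectorized 1D analogue of eq. (3.9): each B particle only sees the A mass
    re-allocated to the two vertices of its cell, weighted by a unit-area tent
    kernel (height 1/lam_g) centred on each vertex."""
    i = np.floor((x_B - x0) / lam_g).astype(int)       # left vertex index of each cell
    w_right = (x_B - x0) / lam_g - i                   # normalized distance to the left vertex
    h_left = (1.0 - w_right) / lam_g                   # kernel of vertex i evaluated at x_B
    h_right = w_right / lam_g                          # kernel of vertex i+1 evaluated at x_B
    return kf * dt * (M_A_vertex[i] * h_left + M_A_vertex[i + 1] * h_right)

# Illustrative use (placeholder vertex masses; in practice they come from
# the re-allocation step of Section 3.3.2)
D, dt, kf, L = 1.75e-3, 0.01, 200.0, 5.0
lam_g = np.sqrt(12.0 * D * dt)
n_vertices = int(np.ceil(L / lam_g)) + 1
M_A_vertex = np.full(n_vertices, 1e-5)
x_B = np.random.uniform(0.0, L, size=50_000)
p = reaction_probability_1d(x_B, M_A_vertex, 0.0, lam_g, kf, dt)
becomes_C = np.random.uniform(size=x_B.size) <= p      # these B particles turn into C

Because every particle's probability depends only on the four (here, two) vertex masses of its own cell, the whole update is expressed with array indexing and elementwise arithmetic, which is what yields the near-linear scaling reported below.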
For this analysis, we consider a 1D domain of length L = 5 cm in which the reactant species are initially segregated, with A uniformly distributed on the left half and B on the right half of the domain, with identical concentrations [A_0] = [B_0] = 1/5 g/cm. Since we are simulating a 1D system here, the kernel function used by PARxn² is the tent function shown in Fig. 3.2, with base length 2λ_g (see Section 3.3.1 for additional details). The considered scenario is inspired by the soil column laboratory experiment reported by Gramling et al. [119], and is also characterized by a constant flow velocity, here equal to u = 0.65 cm/s. The diffusion coefficient is set to D = 1.75 × 10⁻³ cm²/s, the reaction rate constant is k_f = 200 M⁻¹s⁻¹ and the total simulation time is t = 10 s with a time step ∆t = 0.01 s. All the simulations are carried out on a desktop computer equipped with an Intel Core i7-9700K vPro CPU. The LSS and DCC algorithms are written in Matlab (version R2020a), reusing sections of the code reported by Engdahl et al. [112] in the open repository on GitHub (see http://doi.org/10.5281/zenodo.1476680), while PARxn² is written in Python (version 3.7.6).

Simulation scenarios with 1 × 10³ to 9 × 10³ particles (with N_P = N_A + N_B) are used to investigate the computational efficiency of the methodologies, through the evaluation of the mean runtimes (T_m), and their accuracy, through the computation of the mean absolute error (MAE) defined as

\[
\mathrm{MAE} = \frac{1}{N_{\text{steps}}} \sum_{s=1}^{N_{\text{steps}}} \left|\, m_C(t)\big|_{t=s\Delta t} - \hat{m}_C(t)\big|_{t=s\Delta t} \right|, \qquad (3.10)
\]

where N_steps is the number of numerical simulation time steps, ∆t is the length of the time step, and m_C and m̂_C are, respectively, the numerical (given by the evaluated reactive RWPT algorithm) and the analytical values of the mass of reaction product C. For this model set-up, the exact analytical solution for the total mass of product C, i.e. m̂_C(t), is available (see details in [119]) and is given by

\[
\hat{m}_C(t) = 2 c_0 \sqrt{\frac{D t}{\pi}}\; \mathcal{M}_C\, A_T, \qquad (3.11)
\]

where c_0 is the initial concentration of either of the two reactants A or B (indifferently, since they are equal), M_C is the molar mass of the reaction product C, in this case with unit value, and A_T is the cross-sectional area of the considered domain.

Both MAE and T_m are defined as mean values. Due to the randomness of the evaluated methodologies (e.g. Brownian motion), multiple realizations are computed for each scenario in order to obtain average values of m_C (and consequently MAE) and of the runtime T_m. Following Engdahl et al. [112], an ensemble of 75 realizations is sufficient to reach convergence of the above mentioned target parameters for each given scenario. In addition, the decomposition level used in the DCC approach is 17, the highest possible, in order to maximize not only its speedup but also its scaling efficiency (see details in [112]).

Fig. 3.9 shows that the MAE decreases with increasing N_P for all three approaches. As expected, for N_P = 1 × 10³, both LSS and DCC outperform PARxn² in terms of accuracy. As stated in Section 3.3, the proposed approach is based on the assumption that a large number of particles is needed to satisfy conservation of total mass. When this assumption is violated, the other methodologies achieve better accuracy than PARxn². When 4 × 10³ ≲ N_P ≲ 7 × 10³, the MAE values are similar for all the considered algorithms, with a tendency of the LSS method to be less accurate and almost constant.
When N P > 8× 10 3 , PARxn 2 and DCC farther reduce the MAE 45 Figure 3.9: Methodologies’ accuracy comparison. PARxn 2 , LSS and DCC approaches’ mean absolute errors (MAE) are computed for increasing number of particles (N P ). The shaded areas represent the 95% confidence interval of the simulation ensemble. Figure 3.10: Methodologies runtime comparison. PARxn 2 , LSS and DCC approaches’ simulation ensembles mean runtimes T m are computed for increasing number of particles (N P ). The markers represent the values of the measured T m , while the solid line corresponds to the best fit. 46 values, reaching the best accuracy among the all the simulated scenarios (i.e. at different amount of N P ). Fig. 3.10 depicts the values of the mean (over the ensemble) runtimes, T m , as a function of N P . The T m values obtained from the RWPT are represented by the markers while the solid line corresponds to the best fit curve (i.e., power law) of the computed T m . The expression for the best fit curve is: T m (N P )=S o N γ P , (3.12) whereS o andγ are the fitting coefficients. This fitting analysis is done in order to investigate the computational efficiency of the evaluated algorithms. By examining the value of the exponent γ, we can investigate how T m scales as a function of problem size N P . The values ofγ associated with the three algorithms are respectively: γ = 2.8884 (for DCC), γ = 1.6163 (for LSS) andγ = 0.9847 (for PARxn 2 ). Contrary to the results for MAE (Fig. 3.9), Fig. 3.10 displays a significant difference in T m among the evaluated approaches. As stated by Engdahl et al. [112], the LSS method tends to be faster than the DCC one (at the price of its accuracy) since its computational efficiency is less affected by the increment of N P (theγ value obtained in the LSS is lower than theγ for the DCC), while the speedup provided by PARxn 2 results to be quite decisive. The value ofγ obtained by PARxn 2 is equal to 0.9847, meaning that the cost of running the algorithm scales linearly with N P (i.e., O(N P )), while the T m values of the DDC and LSS result to be much impacted by N P , scaling respectively almost in a cubic and quadratic way. In addition, for PARxn 2 , each particle reaction probability can be computed separately from the other particles, leading to the possibility of using an N P number of processors. The other approaches have limits to the number of usable processors since each bin (DCC) or branch (LSS) has to contain a sufficient number of particles. 47 3.4.2 Validation Next, we validate our approach by considering a forward bimolecular irreversible reaction A+B→C in both 1D and 2D systems. We report results for different values of the Dahmk¨ ohler number. The Dahmk¨ ohler number (Da) is defined as [111]: Da= τ D τ r = [A 0 ]k f l 2 c D ; (3.13) whereτ D = l 2 c /D is the characteristic diffusion time scale, expressed as a function of the diffusion coefficient D and the initial size of the concentration perturbation l c , and τ r = 1/([A 0 ]k f ) is the characteristic reaction time, with[A 0 ] denoting the initial concentration of the reagent A and k f the reaction rate constant. First, the algorithm is tested in a 1D domain with dimension L x = 64 cm and uniform initial concentrations [A 0 ]=[B 0 ]= 0.005 g/cm (well-mixed conditions). In the 2D illustrations, we simulate reactive transport in a plug flow and laminar viscous flow in a computational domain of size L x × L y ={(x,y)|x∈ L x and y∈ L y }. 
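Returning to the runtime scaling in equation (3.12), the exponent γ and prefactor S_o can be estimated by a least-squares fit in log-log space. A minimal sketch with purely hypothetical runtime measurements (the numbers below are illustrative, not the values behind Fig. 3.10):

import numpy as np

# Hypothetical (N_P, T_m) pairs; in practice these are the ensemble-mean
# runtimes measured for each particle count.
N_P = np.array([1e3, 3e3, 5e3, 7e3, 9e3])
T_m = np.array([0.9, 2.8, 4.6, 6.5, 8.4])            # seconds, illustrative only

# Least-squares fit of T_m = S_o * N_P**gamma in log-log space, eq. (3.12)
gamma, log_S_o = np.polyfit(np.log(N_P), np.log(T_m), 1)
S_o = np.exp(log_S_o)
print(f"gamma = {gamma:.3f}, S_o = {S_o:.2e}")

A fitted γ close to one indicates the O(N_P) behaviour claimed for PARxn², whereas γ near two or three indicates the quadratic or cubic growth observed for the pairwise approaches.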
In both plug flow and laminar viscous flow cases, we set L x = 5 cm and L y = 0.2 cm. The initial segregation of the reactant species is uniformly distributed on the left half of the domain (reactant A) and on the right half of the domain (reactant B) as illustrated in Fig. 3.4a. Initial concentrations are considered to be identical[A 0 ]=[B 0 ]= 1/(5× 0.2) g/cm 2 . 3.4.2.1 Reactive Transport in a Well-Mixed Flow Scenario Similar to the analysis carried out by Benson & Meerschaert [105], we investigate the performance of the algorithm PARxn 2 in a well-mixed flow scenario. The analytical expression of the temporal evolution for the normalized concentration for A is [see details in 89, 105]: [A](t) [A 0 ] = 1 1+ k f t ; (3.14) where, in this specific example, k f is set equal to 50 M − 1 s − 1 . Using the exact same simulation variables employed in Benson & Meerschaert [105], we test our reactive particle tracking method for 48 different values of Da, ranging from 10 − 5 to 10 1 . The initial size of the concentration perturbation l c is here defined as the ratio between the length of the domain and the initial number of particles, i.e. L x /N P . The variation of Da is performed by changing the diffusion coefficient D (10 − 2 cm 2 /s or 10 − 5 cm 2 /s) and/or the initial number of employed particles (N P = 4× 10 4 , 4× 10 3 or 4× 10 2 ) [see 105, for further details]. As showed in Fig. 3.11 (see case for Da= 10 − 5 ), the reactive flow scenario is dominated by mixing, leading to a numerical solution that coincides with the analytical one (eq. 3.14) for most of the simulation time. Increasing Da generates anomalies due to diffusion limited conditions, which implies segregation of particles and consequently a reduction in the system’s reactivity potential [89]. The temporal dynamics of the normalized concentration computed by our method matches quite well the trend of the results reported in Benson & Meerschaert [see fig. 2 of Ref. 105] thus demonstrating the ability of our algorithm to simulate mixing driven reactive transport. Figure 3.11: Comparison between the numerical results for the normalized concentration of the reagent A obtained by the reactive RWPT algorithm PARxn 2 for different Dahmk¨ ohler numbers. Results are compared with the analytical solution provided in equation (3.14). Results displayed for the well-mixed flow scenario. 49 3.4.2.2 Reactive Transport in a Plug Flow Scenario We consider here a plug flow scenario that is representative of the soil column laboratory experiment reported by Gramling et al. [119]. The exact analytical solution for this scenario (mass of reaction product C) is known and reported in equation (3.11). The (constant) velocity characterizing the plug flow system is u 0 = 0.60 cm/s and the diffusion coefficient is set to D= 1.75× 10 − 3 cm 2 /s. As for the particle tracking simulation, the number of particles used is N P = 2× 10 4 and the total simulation time is t= 100 s with a time step∆t= 0.01 s. We report results for different values for Da (i.e. by varying the reaction rate constant k f ). Note that the Gramling et al.’s [119] experiment is carried out with a k f in the order of 10 4 M − 1 s − 1 , since the reaction is instantaneous. However, due to the proportionality of the probability of reaction with k f (see eq. 3.9), this high value for the reaction rate constant would lead to probability values larger than unity. Sanchez-Vila et al. 
[120] stated that if the system is characterized by a Da> 100 (i.e., instantaneous reactions), the system reaches local equilibrium instantaneously and the reaction coefficient can be arbitrary chosen. Consequently, in the simulation characterized by a Da= 10 2 , we adopt a k f = 200 M − 1 s − 1 . We also report simulation results for Da values ranging from 10 1 to 10 − 1 . Here, we define l c , in eq. 3.13, as H/π [see 121]. The results depicted in Fig. 3.12 indicate that Da values in the order of 10 2 are necessary to correctly simulate the variation of product’s mass in time for instantaneous/fast reactions (in agreement with Sanchez-Vila et al. [120]). As expected, a reduction in the Da number leads to a slow down of the initial system’s reactivity. However, when t>τ D (see vertical green line Fig. 3.12), the system can be considered well mixed and all curves converge. 3.4.2.3 Reactive Transport in a Viscous Flow Scenario For our next comparison, we investigate transport in a steady-state viscous flow field within a 2D channel (i.e. Hagen-Poiseuille law) in the absence of gravity effects. Reactive transport in this flow 50 Figure 3.12: Comparison between the numerical results for m C , obtained by the reactive RWPT algorithm PARxn 2 for different Dahmk¨ ohler numbers, and the analytical solution given by equation (3.11). The vertical light green line marks the diffusion time scaleτ D . Results shown for a plug flow scenario. field was subject of investigation in multiple works [109, 121, 122]. For this case, the laminar flow field is characterized by the well-known parabolic velocity profile, u(y)= 3 2 u m 1− y 2 ℓ 2 y ! (3.15) whereℓ y = L y /2 and u m is the cross-sectional mean velocity. For this computational illustration, we set u m = 0.4cm/s whereas all the parameters are identical to the plug flow scenario investigated in section 3.4.2.2. Similar to the work of Perez et al. [109], we compare the results from our methodology with the analytical results for m C at early and late times. The diffusive and the advective times scales are defined as τ D = l 2 c /(D) andτ v = H/u m respectively [121]. Early times are defined by t<τ v whereas late times are defined by t>τ D . Initially, when t<τ v , the species are transported mainly by diffusion and the product of mass m C generated by the reaction of the reagents A and B for 51 instantaneous reactions is given by equation (3.11). While for late times (i.e., t>τ D , Taylor regime), the reagents’ transport can be characterized by an advection-dispersion equation parameterized by the mean velocity u m and a Taylor (effective) dispersion coefficient D eff = D+ 2v 2 m ℓ 2 y /(105D) [123]. The generated product of mass C is expressed by equation (3.11), with the difference that the diffusion coefficient D is substituted by the effective dispersion coefficient D eff in the large time regime. Fig. 3.13 shows the agreement between the numerical solution for Da in the order of 10 2 as well as the analytical solution (3.11) at both early and late time regimes. Similar to the plug flow scenario previously discussed, the value of the Da number impacts the agreement between the particle tracking method and the analytical solution. At late times (i.e. t>τ D ), well mixed conditions are reached along the surface of contact of the two reagents and even small values of k f are sufficient to generate m C as predicted by the analytical solution. Finally, the results depicted in Fig. 
3.13 confirm that the numerical solution provided by PARxn 2 coincides with the analytical solutions at both early and late times for conditions where the reaction can be classified as instantaneous (i.e. Da≥ 10 2 ). For completeness, we include in Appendix B a comparison between the solution obtained by PARxn 2 and an Eulerian based solution. In Appendix B, we also show the effect of the number of particles used in the PARxn 2 code on the mean absolute error of the solution with respect to the Eulerian results. 3.5 Illustrative Examples In the following, we illustrate the reactive RWPT code in two distinct 2D flow fields. The first example consists of a Darcy flow field past a permeable inclusion and the second example relates to a vortical flow. In both cases, the flow field is at steady-state and is spatially variable. 3.5.1 Darcy Flow Past a Permeable Circular Inclusion This illustration consists of an unbounded Darcy flow past a permeable circular inclusion (DPC). The permeable circular inclusion is characterized by a radius a o . The ratio between the inclusion 52 Figure 3.13: Comparison between the numerical results for m C , obtained by the reactive RWPT algorithm PARxn 2 for different Dahmk¨ ohler numbers, and the analytical solution given by equation (3.11). Results shown for the viscous laminar flow scenario. Theoretical results are shown for both early (t<τ v ) and late (t>τ D ) time regimes. The early and late times regimes are marked by the vertical dark and light green lines respectively. Table 3.1: Input parameters used in the simulations. Left column reports the parameters’ values used for the Darcy flow past a permeable inclusion (DPC). Right column reports the parameters’ values for the Taylor-Green vortex (TGV) flow. DPC TGV L x 0.0001 L x π cm L y 0.2 cm L y π cm [A 0 ] 1/(0.2× 0.0001) g/cm 2 [A 0 ] 1/π 2 g/cm 2 [B 0 ] 1/(0.2× 0.0001) g/cm 2 [B 0 ] 1/π 2 g/cm 2 D 1.75× 10 − 3 cm 2 /s D 1.75× 10 − 3 cm 2 /s u b 0.85 cm/s u o 1 cm/s ∆t 0.01 s ∆t 0.01 s N P 2× 10 4 N P 8× 10 4 k f 200 M − 1 s − 1 k f 200 M − 1 s − 1 a o 0.5 ν 10 − 2 cm 2 /s κ 0.5 L o π cm 53 Figure 3.14: Temporal evolution of the product mass m C in a Darcy flow past a permeable circular inclusion. Results obtained with PARxn 2 . The top row of figures shows six snapshot of the 2D plumes for reagents A (red) and B (blue) as well as the plume for the product C (purple). The flow domain is characterized by a permeability contrast ofκ = 0.5. The streamlines for this flow system are illustrated in the inset on the bottom right vertex of the plot (see eq. 3.16). 54 Figure 3.15: Temporal evolution of the product mass m C in a Taylor-Green vortex flow. Results obtained with PARxn 2 . The top row of figures shows five snapshot of the 2D plumes for reagents A (red) and B (blue) as well as the plume for the product C (purple). An inset is included on the bottom right vertex of the plot to illustrate streamlines. 55 permeability to the permeability of the background porous matrix is given byκ and u b is the Darcy scale velocity in the background porous matrix. The flow field is mathematically described by the following velocity potential (see details in Eames and Bush [124]): φ(r,θ)= u b 1+ d o r 2 r cosθ, r> a o ; 2u b κ (1+κ) r cosθ, r< a o ; (3.16) with d o = a 2 o (1− κ)/(1+κ). Whenκ> 1, the circular body is more permeable than the surrounding porous matrix whereas whenκ< 1, we have the opposite. The velocity field for this system is given by u=∇φ. 
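To make Eq. (3.16) concrete, a minimal NumPy sketch of the resulting velocity field u = ∇φ is given below. This is an illustration only, not the PARxn 2 implementation; the inclusion is assumed to be centred at the origin, and the parameter values (u_b, a_o, κ) follow the DPC column of Table 3.1.

import numpy as np

# Minimal sketch (not the PARxn2 code): velocity field u = grad(phi) for the Darcy flow
# past a permeable circular inclusion, Eq. (3.16). Parameters follow the DPC column of
# Table 3.1; the inclusion is assumed to be centred at the origin.
u_b, a_o, kappa = 0.85, 0.5, 0.5                  # background velocity [cm/s], radius, permeability ratio
d_o = a_o**2 * (1.0 - kappa) / (1.0 + kappa)

def velocity(x, y):
    """Cartesian velocity components (u_x, u_y) at point(s) (x, y)."""
    r = np.maximum(np.hypot(x, y), 1e-12)         # avoid division by zero at the origin
    theta = np.arctan2(y, x)
    outside = r > a_o
    # radial and tangential components of grad(phi) in polar coordinates
    u_r = np.where(outside, u_b * (1.0 - d_o / r**2),
                   2.0 * u_b * kappa / (1.0 + kappa)) * np.cos(theta)
    u_t = -np.where(outside, u_b * (1.0 + d_o / r**2),
                    2.0 * u_b * kappa / (1.0 + kappa)) * np.sin(theta)
    # rotate (u_r, u_t) back to Cartesian components
    return (u_r * np.cos(theta) - u_t * np.sin(theta),
            u_r * np.sin(theta) + u_t * np.cos(theta))

# Far upstream of the inclusion the background Darcy velocity u_b is recovered
print(velocity(-10.0, 0.0))                       # approximately (0.85, 0.0)

Evaluating the analytical gradient in polar coordinates and rotating back to Cartesian components keeps the sketch short; a finite-difference gradient of φ would serve equally well.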
To analyze the transport behaviour in this flow system, we consider an instantaneous injection of a line source covering a transverse direction L y . The line source is positioned at the beginning of the computational domain (x= 0) with a transverse dimension equal to L y . The top half of the line source contains A particles whereas the other half contains B particles. The parameters used in this simulation, as well as the specifications of the flow domain, are reported on the left column of Table 4.1. In our illustrative example, depicted in Fig. 3.14, we consider a circular inclusion characterized by a permeability value smaller than the permeability of the surrounding porous media (i.e. κ< 1, see Table 4.1). The top part of Fig. 3.14 shows six snapshots of the solute plume evaluated at different simulation time steps, respectively indicated on the horizontal axis of the plot at the snapshot position. The presence of the low permeability inclusion increases the contact surface between the two reagents, leading to enhanced mixing; this is confirmed by an increase in the slope of m C . Fig. 3.16 illustrates the temporal evolution of masses A, B and C obtained in this test case and provides evidence that the same number of particles of A and B reacts in a given time step. Close inspection of Fig. 3.16 shows that the temporal evolution of the masses A and B are in good 56 Figure 3.16: Illustration of the temporal evolution of m A , m B and m C in the Darcy flow past a permeable circular inclusion. agreement for the bimolecular reaction under consideration. The result reported in Fig. 3.16 shows that the conservation of mass is respected. 3.5.2 Vortical Flow For the second illustrative example, we consider a well-established flow model known as the Taylor- Green vortex (TGV) in an unbounded domain [125]. This 2D flow is characterized by the following velocity field [126]: u(x)= u o h − e − 2tν/(L o u o ) cosxsiny, e − 2tν/(L o u o ) sinxcosy, 0 i (3.17) with L o denoting the spacing between the vortices,ν represents the fluid’s kinematic viscosity and u o is a characteristic velocity. All parameters are listed in the right column of Table 4.1. For this illustration, we set L o = L x = L y . The flow field consists of counter-rotating vortices with maximum velocity at the center of the vortices. The two reagents A and B are initially distributed over an area L x × L y , as illustrated in the first snapshot on the top of Fig. 3.15. As shown in Fig. 3.15, at t= 0, 57 the lower diagonal half of the source area is occupied by A particles and the upper diagonal half of it contains B particles. Similar to the first illustrative example, Fig. 3.15 shows five snapshots of the solute plumes at different simulation time steps, respectively indicated on the horizontal axis of the plot at the snapshot position. Due to the initial position of the reagents and the shape of the velocity field, the contact surface between A and B increases with time which controls the production rate of m C . 3.6 Summary In this work we propose a new approach to quantify reactive transport in spatially heterogeneous flow fields through the use of the random walk particle tracking technique. These flow fields are typically encountered in many environmental applications such as subsurface hydrology, reservoir engineering and surface water flows. In our work, we focus on bimolecular reactions. In the following, we summarize the key advantages of our proposed approach. 1. 
PARxn 2 ’s computational time scales linearly with the number of particles N P . This leads to a computational complexity of O(N P ) which makes the introduced algorithm suitable for simulations involving large number of particles. 2. We developed a new geometrical concept for the kernel function. The shape of this kernel function is optimized to fit the needs of the proposed algorithm implementation. 3. PARxn 2 can be parallelized. Due to the algorithm framework, the probability of reaction of each particle can be computed independently. This eliminates restrictions related to the maximum number of processors usable in the parallelization process and simplifies its implementation by avoiding the definition of sub-domains, radius of influences etc. 4. PARxn 2 ’s flexibility makes the algorithm suitable for simulating different scenarios, char- acterized by different domain geometry and velocity fields. We illustrate how the proposed method is applied to solve problems in 1D and 2D domains with diverse velocity fields. 58 In order to test the proposed algorithm, a computational speedup analysis was performed and a comparison with existing approaches was carried out. As shown in Fig. 3.10, the proposed algorithm is computationally efficient, between 73% and 96% faster than the other methodologies used for benchmark. We also validated the particle tracking results with existing analytical solutions for reactive transport. Finally, future research is necessary in order to expand the algorithm to account for more complex chemical reactions and to address transport phenomena in 3D domains. Regarding the latter, a 4D kernel function will be required. 59 Chapter 4 Computationally Efficient Characterization of Groundwater Contaminant Transport for Supporting Risk-Based Decision Making 4.1 Introduction Assessing the risks associated with the presence of pollutants in groundwater often times relies on the use of mathematical models. The key challenge is that model predictions in the subsurface environment are subject to significant amount of uncertainty. These uncertainties arise due to insuf- ficient site characterization and our inability to fully resolve the spatial fluctuations of hydrological properties at multiple scales. The combined effect of these factors leads to uncertainty in model input parameters which render model outputs to be uncertain [127]. Quantifying this uncertainty and understanding how it propagates to quantities of interest, such as environmental performance metrics [128], are critical in risk analysis as well as for decision makers to better allocate resources toward uncertainty reduction [129] and define optimal aquifer remediation strategies [130]. A number of computational approaches have been proposed to quantify the effects of aquifer heterogeneity on the spatiotemporal dynamics of a solute plume (see the following review articles, [72, 131, 132]) and the associated uncertainty (e.g., [133–139]). The computational stochastic frameworks provided in most of the above-mentioned works are of analytical or semi-analytical 60 nature (i.e. based on perturbation theory). Many of these analytical approaches have been applied to human health probabilistic risk analysis ([29, 129]) and successfully compared with concentration field data (i.e.,[140]). Fully numerical approaches allow to relax on simplifying assumptions and uncertainty is typically quantified through the Monte Carlo framework (e.g., [67, 141–143]). 
An in-depth analysis of the reliability assessment of the computed statistical moments obtained from Monte Carlo simulations within the context of subsurface hydrology is provided in [9]. In addition to Monte Carlo methods, other approaches for uncertainty quantification are reported in the literature (see [144, 145] and references therein). A review comparing the advantages of different numerical stochastic methodologies used for uncertainty quantification can be found in [4]. Despite significant contributions in computational methods in the field of stochastic hydro- geology (see [127]), many of the developed computational tools are not easily accessible to the hydrological community. With the exception of few tools [e.g., [146–148] amongst others], most existing computational tools are difficult to access and are not open source. There is an ever- increasing need within the hydrological community for models’ transparency and reproducibility. Being able to reproduce numerical results is, nowadays, an essential feature that needs to be present in tools used for environmental modeling, risk analysis and data management (e.g., [149, 150]). The usage of collaborative coding environments (such as GitHub [151]), where scripts are written in open-source languages (e.g., Python [152]) has the potential of creating clear, shareable and reproducible knowledge. As stated by [153], the absence of those characteristics can reduce the credibility of the model as a decision support tool and hamper resource management efforts. In this work, we present a computational package that links the various components relevant for the estimation of the pollutant concentration at an environmentally sensitive target and its associated uncertainty. The computational framework builds upon existing computational tools such as HYDRO GEN [154], FloPy [11], and a GPU-based random walk particle tracking code [1]. The key features of this computational toolbox are as follows: • The proposed computational toolbox is fully open source, including coding language and utilized software, to enhance accessibility of the modeling workflow. 61 • All the utilized software are run through Python scripts, to provide a complete, transparent and repeatable record of the modeling process following the ideas put forth in [11]. • All the steps necessary to construct the model, compute risks and uncertainty are smoothly connected in a unified script, to ensure efficiency in computing the Monte Carlo iterations, for uncertainty quantification. • All the software have been selected for their precision, robustness, compatibility among each other, user-friendliness and transparency. • All files are shared on a GitHub repository, to be constantly accessible, editable and expand- able as a communal effort in creating a consistent and efficient modeling framework. As mentioned by [11], models are commonly constructed with a graphical user interface (GUI), due to their interactive environment and guided structure in populating the model and post process the results. However when GUI are utilized for constructing and post processing numerical groundwater flow and transport models, no records of the modeling process are available, limiting repeatability and accessibility of the employed modeling framework. On the other hand, Python scripts can be seen as a complete, clear, easy and readable structure of the modeling approach. 
It serves as a documentation of the model input data and provides the hydrological community a cooperative script-based workflow easy to access [155]. The visual and interactive nature of our workflow and software package enhance the accessibility and understanding of model predictions, overcoming the communication limits of static documentation, with the final goal of assisting decision makers [156]. For such reasons, we provide a step-by-step tutorial on how to utilize the proposed package and illustrate how this computational tool can be employed to 1) perform risk assessment and 2) improve our fundamental understanding of the role of aquifer heterogeneity in the physics of contaminant transport. 62 4.2 Problem Statement We start by considering a scenario where an hazardous substance is released into an aquifer with ambient base flow rate Q b . The contaminant plume originating from the source zone will undergo a series of physical and (bio)chemical processes until it reaches a receptor, e.g. a compliance plane or pumping wells. Due to limited site characterization of the subsurface environment, the spatiotemporal dynamics of the solute plume is subject to uncertainty. In this work, the main source of uncertainty stems from the randomness of the hydraulic conductivity field, denoted by K. Under these conditions, decision makers are interested in determining the probability that the contaminant concentration C at an environmentally sensitive location will exceed a threshold value C ∗ established by a regulatory agency, namely Prob[C> C ∗ ]. The concentration estimate C at a given location x=[x 1 ,...,x d ], with d denoting the dimensionality of the flow domain, and time t is typically obtained by solving the partial differential equations governing the physics of transport. The concept of reliability, traditionally defined as the probability of non-failure, can be employed to formulate this problem. Letξ(t) denote the aquifer reliability [-] evaluated at an environmentally sensitive target exposed to contamination for a given realization of the hydraulic conductivity field. In this work, we define that aquifer reliability function at the environmentally sensitive target as follows: ξ(t;V T )= 1, C(x∈V T ,t)≤ C ∗ , 0, otherwise, (4.1) where t is time andV T is the geometric configuration (i.e. volume, area, line or point) that characterizes the dimensions of the environmentally sensitive location. In order to measure the capability of the aquifer to recover its reliability as a potable water source, we will rely on the concept of loss of resilience. The aquifer resilience loss, denoted here by R L , over a given time period t 0 ≤ t≤ t 0 +t f is computed at the target locationV T as follows: 63 R L (t;V T )= Z t 0 +t f t 0 Ψ(t;V T )dt, (4.2) where Ψ(t;V T )= 1− ξ(t;V T ). (4.3) Due to the inherent uncertainty in the parameters characterizing the aquifer system, such as the K-field, the functions C,ξ and R L are regarded as random functions. As a consequence, bothΨ and R L are characterized in terms of their statistical moments and their probability density functions (or cumulative density functions). For this work, we will compute uncertainty inΨ and R L through Monte Carlo framework. With the goal of ensuring transparency and reproducibility of the results presented in this work, we provide all the codes necessary to compute Equations (4.2) - (4.3) for a fully worked illustration. 
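As a pointer to what those codes compute, the following minimal sketch (not the toolbox code itself) evaluates Equations (4.1)-(4.3) for a single realization, given a concentration breakthrough curve at the target; the breakthrough curve used here is synthetic and only serves as an illustration.

import numpy as np

# Minimal sketch of Eqs. (4.1)-(4.3) for one realization, given C(t) at the target.
def resilience_loss(t, C, C_star):
    """Return reliability xi(t), the failure indicator Psi(t) = 1 - xi(t) and the
    resilience loss R_L obtained by integrating Psi over the simulation window."""
    xi = (C <= C_star).astype(float)                       # Eq. (4.1)
    psi = 1.0 - xi                                         # Eq. (4.3)
    R_L = np.sum(0.5 * (psi[1:] + psi[:-1]) * np.diff(t))  # Eq. (4.2), trapezoidal rule
    return xi, psi, R_L

# Hypothetical breakthrough curve exceeding the threshold for part of the simulation
t = np.linspace(0.0, 1000.0, 251)                          # [days]
C = 5e-3 * np.exp(-0.5 * ((t - 300.0) / 80.0) ** 2)        # synthetic pulse [mg/L]
xi, psi, R_L = resilience_loss(t, C, C_star=1e-3)
print(f"Resilience loss at this target: {R_L:.1f} days")

Within the Monte Carlo framework described below, this computation is simply repeated for every realization of the conductivity field, yielding the ensemble statistics of Psi and R_L.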
We include a detailed description of the files and scripts for the illustrations, following the mindset presented in some works within the hydrological community (e.g.,[149, 153]). 4.3 Methods and Implementation Our approach for modeling the resilience loss of an aquifer consists of following four components: the generation of the hydraulic conductivity fields, a groundwater flow and contaminant transport model accompanied by an initially known contaminant injection zone, the computation of Equation (4.2) and a Monte Carlo framework to evaluate the uncertainty associated with the resilience loss. For our computational illustrations, we will assume that groundwater flow and contaminant transport take place in a hypothetical two-dimensional (2D) aquifer. The computational domain has dimensionsℓ i along the i th direction where i= 1,2. Below we provide details regarding each step. 1. Geology and Geostatistics: First, a description of the site geology is needed. This description is based on a grid of hydraulic conductivity values that will serve as input for the groundwater flow model. The log-conductivity field is assumed to be isotropic and spatially heterogeneous 64 Y(x)= logK(x). As previously mentioned, Y is uncertain given the incomplete information of the hydrogeological system. Therefore, Y is regarded as a random space function (RSF) [127, 157] and considered here as statistically stationary and multi-variate Gaussian. The RSF model for Y is therefore characterized by the mean (µ Y ), variance (σ 2 Y ) and spatial covariance C Y (r) of Y with r denoting the lag-distance. We adopt an exponential model forC Y with isotropic correlation lengthλ. The ensemble of random Y fields used in the simulations is generated using the robust HYDRO GEN tool [154]. 2. Groundwater Flow Field: Groundwater flow is assumed to be at steady-state and far from the presence of sinks and sources. The governing equation for the flow field is provided in the Appendix A, see Equation (D.1). Permeameter-like boundary conditions are assumed, i.e. prescribed hydraulic heads at the inlet (h in ) and outlet (h out ) along the longitudinal direc- tion of the computational domain and no-flux conditions at the remaining boundaries. The groundwater balance equations are solved numerically and the solution of hydraulic head in the computational domain is obtained. Groundwater fluxes are computed using MODFLOW [59] together with the Python library FloPy [11]. The dimensions of the numerical grid block are∆x i , with i= 1,2. The velocity field can be calculated through Darcy’s law by combining the computed hydraulic head with the hydraulic conductivity and with the knowledge of the porosity. 3. Solute Transport: A contaminant is instantaneously released along a source zone with area A o =∆s 1 × ∆s 2 . The initial concentration of the contaminant is given by C 0 . We assume that transport is non-reactive and governed by the advection-dispersion equation (see Appendix A, Equation D.2). Transport is solved through the use of a Lagrangian-based simulator, i.e., Random Walk Particle Tracking (RWPT). The transport simulator is a parallelized GPU-based RWPT dubbed PAR 2 [1]. The transport simulations will allow to compute the concentration breakthrough curve at a given target location (i.e. a protection zone or an observation well). 65 The concentration values at the target location are then compared to a given regulatory thresh- old value, C ∗ , for the pollutant of interest. 
Based on the solute breakthrough curves and C ∗ , we can then calculate the resilience loss, Equation (4.2). 4. Uncertainty Quantification : To estimate the uncertainty associated with the quantities of interest, we employ a Monte Carlo (MC) framework. In this approach, a series of hydraulic conductivity field realizations are generated, based upon some geostatistical representation of the subsurface environment (see step 1). Then, the groundwater flow and contaminant transport equations are solved for each realization of the K field. This results in a statistical description of the concentration breakthrough curves at a given location and consequently, the resilience loss, Equation (4.2). In our fully worked out example, we evaluate an ensemble consisting of five hundred realizations of the conductivity field ( N MC = 500). 4.4 Tutorial Here, we describe the structure and the contents of the computational code adopted in our study and a tutorial. For our case study, we provide an open-source code to ensure the reproducibility and re- usability of its model outputs [155]. We will make use of the appealing features of Jupyter Notebooks (i.e., web-based applications) to write a simple and transparent code. The proposed package is aimed at Visualizing Uncertainty for Hydrological Risk Analysis and it is calledVisU-HydRA. 66 Table 4.1: Input parameters used in the simulations. Random Space Function Model for Y = logK Symbol Value Units ℓ 1 × ℓ 2 170× 150 [m] µ Y 1.6 [m/day] K G = exp[µ Y ] 5 [m/day] σ 2 Y 3 [-] λ 1 ,λ 2 8,8 [m] Flow Simulations h in , h out 1,0 [m] ∆x 1 × ∆x 2 1× 1 [m] t TOT 1000 [days] ∆t 4 [days] Transport Simulations α 1 ,α 2 0.01,0.001 [m] D m 8.6× 10 − 5 [m 2 /day] s 0 1 , s 0 2 25,65 [m] ∆s 1 ,∆s 2 12,20 [m] ν 0 1 ,ν 0 2 117,55 [m] ∆ν 1 ,∆ν 2 12,40 [m] C 0 1 [mg/L] C ∗ 0.001 [mg/L] N p 10 5 [-] 67 Figure 4.1: Jupyter Notebook schematic representation to illustrate the tutorial framework. Each number indicates one of the Jupyter Notebook code cells, and the description clarifies its output. Colored are the names of the employed software and N MC stands for the number of Monte Carlo realizations. 68 Figure 4.2: Jupyter Notebook schematic representation to illustrate the code cells of the proposed tutorial. In the Jupyter Notebook, code cells are one after the other and more extended, compared to the illustration above. 69 Table 4.2: Schematic summary of what needed and what has to be installed by the user to run the tutorial that reproduces the data showed in the analysis carried out in this work. What you need What to install Output K Fields Generation •hydrogen linux or hydrogen mac •hydrogen input.txt •hydrogen.py •numpy library •Kfield Hydrogen.npy Flow Simulations •mf2005dbl.exe •flopy library •N MC model-*.ftl files in the folder tmp Contaminant Transport Simulations •NVIDIA GPU •config.yaml •config-tmp.yaml •par2.exe •yaml library •os library •subprocess library •N MC result-*.csv files in the output folder •snap-{}-*.cvs files for each chosen simulation time step ({}) Uncertainty Quantification & Risk Analysis •RAUQ function.py •matplotlib library •data output folder 70 TheVisU-HydRA package is available on GitHub repository (https://github.com/mariamorvillo/ VisU-HydRA). A Jupyter Notebook, named “Tutorial MC F&T.ipynb”, contains all the scripts necessary to produce the results and analysis associated with the case study presented in this work (see further details in Section 4.3). 
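The third notebook cell described below relies on HYDRO_GEN to generate the correlated Y = ln K fields. As a rough, self-contained illustration of the underlying random space function model of Table 4.1 (and only as a stand-in for HYDRO_GEN), the sketch below draws a single multi-Gaussian field with an exponential covariance on a small grid; a dense Cholesky factorization does not scale to the full ensemble, which is why HYDRO_GEN is used in the toolbox.

import numpy as np

# Illustrative stand-in for the HYDRO_GEN step: one multi-Gaussian log-conductivity field
# with an isotropic exponential covariance, using the geostatistical parameters of Table 4.1.
mu_Y, var_Y, lam = 1.6, 3.0, 8.0           # mean, variance and correlation length of Y = ln K
nx, ny, dx = 40, 40, 1.0                   # coarse demonstration grid [m]

x, y = np.meshgrid(np.arange(nx) * dx, np.arange(ny) * dx)
pts = np.column_stack([x.ravel(), y.ravel()])
lag = np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=-1)
C_Y = var_Y * np.exp(-lag / lam)           # exponential covariance model C_Y(r)
L = np.linalg.cholesky(C_Y + 1e-10 * np.eye(len(pts)))

rng = np.random.default_rng(42)
Y = mu_Y + L @ rng.standard_normal(len(pts))
K = np.exp(Y).reshape(ny, nx)              # hydraulic conductivity field [m/day]
print(K.mean(), K.min(), K.max())

In the actual tutorial, the ensemble produced by HYDRO_GEN is stored in a single ".npy" file and reshaped onto the MODFLOW grid inside the Monte Carlo loop.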
All other functions and files are also made available to support the user in better understanding the features of the tutorial and to eventually apply all (or some) of the available tools to scenarios of their interest. As shown in Figure 4.1, the tutorial consist of six components. These components are subdivided into code cells (see Figure 4.2) in the Jupyter Notebook as follow: 1. Import Libraries: The first tutorial code cell imports libraries which are needed in order to run some of the tools offered by Python. A Python library is simply a collection of codes, or modules of codes, that can be used in a program for specific operations. Among the most commonly used libraries,NumPy supports large matrices, multi-dimensional data and consists of in-built mathematical functions to facilitate the computations [158]. Further libraries are needed in order to generate spatially random K fields, to assist in the execution of flow and contaminant transport simulations and in post-processing the output data to compute relevant quantities such as the reliability and resilience loss of the aquifer. 2. Model domain & Grid Definition : In the second code cell, the dimensions of the compu- tational domain and related contaminant source and target zone areas as well as the charac- teristics of the numerical mesh (i.e. the discretization scheme) are declared. The hydraulic properties of the aquifer, such as the hydraulic conductivity, are specified for each cell which will serve as input for the flow simulator (such as MODFLOW). In any finite-difference based flow simulation, such as the one employed in this analysis through MODFLOW, the hydraulic heads are calculated at discrete points in space; those points are termed the nodes of the Finite Difference or Finite V olume grid with dimensions n col × n row × n lay (e.g., [83]). 71 3. Hydraulic Conductivity Fields Generation: This code cell contains the first step of the MC loop (see Figure 4.1). As stated in Section 4.3, this analysis considers 500 realizations of the hydraulic conductivity field, namely N MC = 500. To start, spatially heterogeneous log-conductivity fields Y≡ ln[K] are generated, with the characteristics indicated in Table 4.1. To produce the ensemble of spatially correlated Y field realizations, we use HYDRO GEN [154]. The HYDRO GEN executable is available for Linux and Mac platforms, meaning that if a Windows platform is used, the user needs to generate a file containing the K fields realizations (’Kfileds Hydrogen.npy’) on a different platform. The output of the code cell is a “.npy” file containing the K values related to all the generated fields. Each of these K fields are distributed along the file columns. The user has to declare only the number of MC realizations and the used operating system in the code cell. All the information, such as the geostatistical parameters used to generate the random K fields, have to be indicated on the ”hydrogen info.txt” file. This “.txt” file can be easily compiled following the instructions included in the “manual hydrogen.pdf” file made available by the software creators [154] and included in the GitHub repository, as all the documents discussed in this work (see Table 4.2). 4. Flow Simulations: The fourth code cell computes the second step of the MC framework, by processing the flow simulations using the randomly generated heterogeneous K fields. The flow field is computed using numerical simulator MODFLOW [59]. 
To create, run, and post-process MODFLOW-based models, the Python package FloPy [11] is employed. The FloPy library is imported in the first step on this tutorial. The MODFLOW exe- cutable, “mf2005dbl.exe”, is needed in order to run this code cell (in the GitHub repos- itory, the one for Windows platform), while all the variables related to the flow simula- tion (see [59] for details regarding MODFLOW and its packages0, have to be specified in this cell’s code. Documentation on how to install the FloPy package is available on https://github.com/modflowpy/flopy, in addition to the full description of all the func- tions available in the just mentioned package, needed to run different MODFLOW’s features. 72 The cell gives as output, in the folder named “tmp”, N MC “.flt” files (“model-*.flt”), each of them containing respectively the flow simulation output related to the “*” MC iteration. 5. Contaminant Transport Simulation: PAR 2 [1] is used to run the contaminant transport simu- lations. It requires, as input, the groundwater flow velocities originating from the previous code cell output. PAR 2 is a GPU-accelerated solute transport simulator, as explained in Section 4.3, and consequently needs to be run on a platform equipped with an NVIDIA GPU. Simulation parameters can be easily defined through a YAML configuration file. Thus, to run this code cell, the user needs the PAR 2 executable, “par2.exe”(in the GitHub repository, the one for Windows platform) and two YAML files. Note that “config-tmp.yaml” is a temporary file that is modified at each MC iteration, following the structure of “config.yaml”, by substituting the “*” symbol with the number of the current MC iteration and the “{}” symbol with the evaluated simulation time step. The user need to provide the input simulation parameters in both the YAML files. Further information on how to compile these files can be found at “PAR2Info”. The output of this code cell, in the folder “output”, are N MC .csv files (“result-*.csv”), each of them containing respectively the data related to the contaminant’s cumulative breakthrough curves at the selected control planes for the “*” MC iteration . Snap files (i.e., “snap- {}-*.csv”), for each MC iteration, can be an output of the simulation as well. Those files contain respectively the positions of the particles in which the contaminant has been discretized into, at the “{}” time step selected by the user through the YAML file. A schematic representation of the potential output of a contaminant transport simulation with PAR 2 is shown in Figure 4.3. As mentioned above, snapshot files are created for each simulation of the MC ensemble. The time required for a simulation is, among other variables, proportional to the number of generated snapshot files. As a consequence, the user should choose this variable wisely, to avoid too prolonged simulations. 73 Figure 4.3: Schematic representation of the contaminant transport simulations code cell output. The image schematizes a possible output (depending on the declared parameters in the YAML file) of a randomly chosen iteration among the Monte Carlo realizations, the 47 th . This realization outputs: result-47.csv, a file that contains the data related to the contaminant’s cumulative breakthrough curve at the evaluated control plane, in the image the red vertical line. In addition to that, snapshot files are generated every 400 simulation time steps. 
snap-0-47.csv, snap-400-47.csv, snap-800- 47.csv, and snap-1200-47.csv containing respectively the positions (x=[x,y,z]) of each particle representing the contaminant plume. For visualization purposes, the position of the 1 st ,2 nd and 6478 th particles are highlighted by red circles. 74 6. Uncertainties Quantification & Risk Analysis : This component of the tutorial consists of post-processing the previous data and generating the graphical output that can support the decision-making process. In order to run this last part of the Jupyter Notebook, the “.py” file containing the scripts of the implemented functions is necessary to elaborate the data coming from the previous sections. The file is called “RAUQ function.py”, written following a basic and intuitive structure to allow the user to access and easily modify it accordingly. This section of the package allows the user to visualize the geometry of the model, the location of the source of contaminant and target zone (A T =∆ν 1 × ∆ν 2 ) and the positions of observation wells. The user can also visualize the spatiotemporal evolution of the solute plume (including the positions of the leading edge of the plume and the maximum concentration) and the hydraulic conductivity field (see Fig. 4.4). For this plume visualization step, the user can choose a specific MC realization and the specific snapshot in time ([days]). The user can also visualize the ensemble statistics of the concentration field in both space and time as well as other risk-related metrics (e.g., probability of contaminant concentration exceedance and the statistics of the maximum concentration, resilience loss, reliability, etc). Further detail on the generated files and other information can be found in the text included in the Jupyter Notebook. 4.5 Application to Risk and Resilience 4.5.1 Probability of Concentration Exceedance and Resilience Loss Maps We will now useVisU-HydRA to investigate the risks associated with an accidental benzene spill. For this hypothetical case study, we will consider a 2D simulation as previously described. All parameter values employed in the upcoming results are reported in Table 4.1. Benzene is a wildly used chemical for industrial solvents and for constituents of fossil fuels and is considered to be a major threat to groundwater resources and human health [159]. Benzene spills are typically associated with transportation and storage tank leakages. Due to its potential health risk (e.g., 75 Figure 4.4: Schematic representation of the contaminant plume, in green, of a Monte Carlo realization, characterized by the illustrated hydraulic conductivity field. The figure shows where observations wells (circles) are positioned over the target area (rectangle) and how the locations of the leading edge of the plume (blue cross) and of the maximum contaminant concentration (red cross), experienced at each time steps of the simulation, evolve in time. As we can observe, the leading edge of the plume tends to follow a trajectory, highlighted by the dotted blue line, while the maximum concentration experienced in the field bounces from a location to another. [160]), the State of California (USA) set the Maximum Contaminant Level, i.e. the highest level of a contaminant that is allowed in drinking water, to C ∗ = 10 − 3 [mg/L] [161]. Note that benzene’s degradation in water is extremely slow (i.e., [159]), and consequently, for the purpose of our illustration, we will assume transport to be non-reactive (see Equation D.2). 
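Before turning to the resulting maps, the following minimal post-processing sketch (not RAUQ function.py itself) shows how the ensemble statistics of Psi can be assembled once the gridded concentration fields of all Monte Carlo realizations are available; the array used here is a synthetic stand-in, since reading and gridding the PAR 2 output files is omitted.

import numpy as np

# Ensemble statistics of Psi for the benzene threshold C* = 1e-3 mg/L.
# `conc` is a synthetic stand-in shaped (N_MC, n_times, n_row, n_col).
C_star = 1e-3                                   # [mg/L], California MCL for benzene
rng = np.random.default_rng(0)
conc = 2e-3 * rng.random((50, 3, 150, 170))     # demo ensemble (the study uses N_MC = 500)

psi = (conc > C_star).astype(float)             # Eq. (4.3): 1 where C exceeds the threshold
mean_psi = psi.mean(axis=0)                     # <Psi>(x, t), cf. Figure 4.5 (top row)
var_psi = psi.var(axis=0)                       # variance of Psi, cf. Figure 4.5 (bottom row)
print(mean_psi.shape, var_psi.shape)            # one map per output time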
Figure 4.5 depicts the spatial map of the expected value of Ψ (Figure 4.5, top row), namely ⟨Ψ⟩, as well as its variance σ²_Ψ (Figure 4.5, bottom row) at different time snapshots. The statistics of Ψ (Eq. 4.3) are computed over all N_MC realizations of the hydraulic conductivity field. Given that ξ (Eq. 4.1) is a Bernoulli random variable, Ψ (Eq. 4.3) also follows a Bernoulli distribution (see Section 2.2 of [162]). Therefore, the expected value of Ψ is equal to the probability of concentration exceedance at position x and time t; in other words, ⟨Ψ(x,t)⟩ ≡ Pr[C(x,t) > C∗]. The results shown in Figure 4.5 (top row) show that the probability of being at risk decreases with time as an outcome of the enhanced plume dilution due to macroscale spreading (e.g., [131, 143, 163, 164]). The maps depicted in Figure 4.5 can be used by decision makers for evaluating the risks associated with contamination, identifying sampling locations and allocating resources towards uncertainty reduction. Figure 4.5: Maps of the probability of concentration exceedance (top) and its uncertainty (bottom) at different instants of time in the simulation (columns). The plots in the top row show how ⟨Ψ⟩ [-] evolves in space after 0, 10 and 30 days from the beginning of the contamination process; ⟨Ψ⟩ is expressed as a probability, as indicated by the values on the color bar on the right. The bottom row shows σ²_Ψ as a measure of the uncertainty associated with the information given by the plots in the row above. Next, we analyze the loss of resilience of the aquifer system. Figure 4.6 shows the spatial maps of the first two statistical moments of R_L, see Eq. (4.2). The results provide information regarding the locations where the expected resilience loss will be largest (Figure 4.6, top) and its corresponding uncertainty (Figure 4.6, bottom). R_L is a measure of the number of days necessary for the aquifer to recover to a state where the risks associated with the contamination can be considered negligible. As expected, the values of ⟨R_L⟩ increase with travel distance from the source location. This is explained by the increase of the macroscale dispersion as the plume moves downstream from the source. An increase in plume dispersion leads to long tails in the solute breakthrough curve; the residence time of the plume while crossing a given location therefore increases, leading to an increase in the averaged resilience loss. As observed in Figure 4.6 (top), the maximum number of days necessary for the right boundary of the flow domain to recover is approximately 80. Figure 4.6: Maps of the average loss of resilience (top) in the model domain and of its variance (bottom) over the whole simulation. The top plot shows how ⟨R_L⟩ [d] evolves in space, expressing the number of days necessary for each point in the domain to recover to a reliable status; the number of days associated with the different shades of blue is indicated by the values on the color bar on the right. The bottom plot shows the variance of R_L, σ²_RL, as a measure of the uncertainty associated with the information given by the plot above. The results at the bottom of Figure 4.6 quantify the uncertainties related to the estimated resilience loss values. As shown in Figure 4.6 (bottom), larger uncertainty in R_L is observed at large distances from the source.
One possible explanation for this phenomenon may be due to the fact the contaminant plume has sampled more fluctuations of the velocity field (i.e. the plume travels through many correlation scales) thus leading to increased uncertainty in the solute breakthrough curves at those locations. 4.5.2 Impact of contaminant source efficiency on aquifer reliability The computational packageVisU-HydRA can also be employed to improve our understanding on the impact of groundwater flow physics and decision making metrics such as R L . Here we illustrate how the hydraulics conditions within the contaminant source zone could be used as an indicator of R L . [165] introduced the source zone efficiency η η = Q sz Q b , (4.4) where Q sz is the volumetric flow rate [ m 3 /d] crossing the contaminant source zone while Q b indicates the total background volumetric flow rate passing through the entire aquifer’s cross section. [165] showed that Eq. (4.4) controls the overall plume dispersion downstream from the source. Similar results are reported in [143] in the context of risk analysis, where the authors showed that high water flux crossing the source zone leads to decrease in the magnitude of human health risk due to the presence of chlorinated solvents. [166] also report experimental evidence regarding the importance of source zone hydraulics on transport behavior. Figure 4.7 shows the scatter plot of the maximum value of resilience loss R L experienced within the target area (A T =∆ν 1 × ∆ν 2 , see Table 4.1) and η. Each point in Figure 4.7 represents the R L obtained for each realization of hydraulic conductivity ensemble. The data are fitted by a logarithmic curve, the blue line, that has been found through genetic programming (see [167]). The 79 data suggests that scenarios characterized by highη values correspond to lower estimates of R L . For completeness, we include the three plot insets of the plume. These insets belong to the red dots in Figure 4.7 which correspond to 33 th , 34 th and 102 nd Monte Carlo realizations. The results depicted in the plot insets suggest thatη has a clear impact in controlling the overall longitudinal macrodispersion of the plume, which in turn will impact the value of R L . Close inspection of Figure 4.7 reveals that a realization characterized by a high value ofη, such as realization number 102, is characterized by a compact (along the longitudinal direction) plume and therefore a lower R L value when compared to realization number 33. When the strength of the contaminant source area decreases (η< 1), the plume is more dispersed. Increased plume spreading leads to larger plume residence times at an environmentally sensitive target and therefore higher R L values. Note that the aforementioned conclusions are limited to the groundwater flow and transport scenario adopted, the initial concentration of the contaminant and the threshold concentration C ∗ . Nevertheless, the analysis carried out in this section re-emphasis the importance of 1)η in decision making and 2) the characterization of the source zone in risk analysis. 4.6 Summary In this work we provide VisU-HydRA, an open source, documented, computationally efficient toolbox to characterize specific features of the contaminant plume transport. The proposed package serves as a user-ready toolbox and allows to compute the uncertainty associated with metrics typically used in risk analysis. 
VisU-HydRA consists of a collection of open-source software tools that have been assembled to deliver a rapid, reproducible and transparent modeling framework. Computational efficiency is ensured by the use of a GPU-accelerated solute transport simulator [1] and by the automated structure of the iterative processes. Figure 4.7: Maximum value of resilience loss (R_L [d]) experienced within the target area versus contaminant source efficiency (η [-]) for all the Monte Carlo realizations. The data (gray dots) are fitted by a logarithmic curve, the blue line in the plot. The red dots highlight three Monte Carlo realizations, the 33rd, 34th and 102nd, for which the shape of the contaminant plume is investigated and illustrated in the insets, each connected to the corresponding simulated scenario by a black arrow. The black rectangles in the insets indicate the location of the target area. Reproducibility and transparency are guaranteed by the open-source coding language, the user-friendly interface of the Python code and the availability of a well-documented, easy-to-follow tutorial made available through a web-based application. In order to support the decision-making process, the computational toolbox includes a visualization component that allows users to generate probability maps of aquifer resilience loss and risk hot spots. Furthermore, a GitHub repository was created that contains all the material related to this work. The results reported here are limited to a two-dimensional application in which the hydraulic conductivity field is the only source of uncertainty. As a consequence, the computational toolbox can be expanded to account for other sources of uncertainty as well as for three-dimensional models. Chapter 5 Summary Reproducible and rapid computational approaches for assessing contamination in hydrogeological systems are the outcomes of this dissertation. The work reduces the computational burden associated with predicting quantities of interest in the context of probabilistic contaminant risk analysis. The impossibility of fully characterizing contaminated aquifers, due to their large scale and the cost of data acquisition, is the reason why hydrogeological models are often computationally complex, subject to uncertainties and ultimately challenging to characterize from a contamination-risk point of view. In Chapter 2, we show how to improve the uncertainty quantification of one of the key components in subsurface risk analysis: the contaminant first arrival time at a vulnerable target location. Based on the known correlation between this dynamic contaminant transport quantity and the minimum hydraulic resistance, a static connectivity metric, an innovative connectivity-based Monte Carlo approach has been developed. The methodology simply requires ranking the randomly generated hydraulic conductivity fields, associated with the evaluated synthetic realities, according to their minimum hydraulic resistance before performing the steps required by the classic Monte Carlo stochastic framework. The methodology is tested on two- and three-dimensional hydrogeological domains, showing faster convergence of the statistical moments of the quantity of interest at almost no additional computational cost. In fact, thanks to the efficiency with which the connectivity metric can be estimated, this methodology consistently reduces the time required for first arrival time uncertainty quantification.
Since the efficiency of an uncertainty quantification analysis is directly linked to the efficiency of each simulated scenario, in Chapter 3 we introduce a novel modeling approach to reduce the computational cost associated with reactive transport numerical simulations. We first review the commonly used Lagrangian reactive random walk particle tracking methodology, which models chemical reactions on the basis of probabilistic rules, and highlight the impractical computational cost of serially evaluating the reaction probability of each reagent particle. Next, we show how PARxn 2, a novel algorithm for simulating irreversible bimolecular reactions in two-dimensional scenarios with a computational time complexity that is linear in the number of reagent particles, is suitable for parallelization. The computational efficiency of the algorithm is tested against other computationally efficient variations of the original methodology, and its applicability is demonstrated by simulating environmentally relevant flow fields. As an outcome of the speedup analysis, the novel modeling methodology is up to 96% faster than the methodologies against which it was benchmarked, making this approach an effective way to reduce the computational burden associated with numerical simulations and, consequently, with the associated uncertainty quantification. In Chapter 4, we combine computationally efficient tools to create a reproducible and rapid modeling framework for assessing contaminant transport risk in two-dimensional model domains. The uncertainty of specific features of the contaminant plume is efficiently characterized and presented in an innovative and visually powerful way. Among the evaluated environmental performance metrics, probability maps displaying risk hot spots and aquifer resilience are generated to rapidly and intuitively inform decision makers about the spatio-temporal evolution of the risk associated with the contamination. To promote reproducibility, efficiency and consistency in translating data to models, a step-by-step tutorial is provided in combination with the toolbox mentioned above. All sections of the proposed modeling framework are included in the tutorial, well documented and suitable for iterative execution, to easily address uncertainty quantification. In addition, the scripts are written in an open-source language (Python), the tutorial is illustrated through a user-friendly web-based application (Jupyter Notebook), and everything is made available on a GitHub repository, in the hope that this can be a starting point for a general effort within the hydrogeological community to enhance modeling efficiency and consistency. In conclusion, we propose a diversified set of approaches that aim to improve computational efficiency by restructuring, innovating and combining available methodologies, to reduce numerical modeling costs and ultimately better quantify contamination uncertainties. References 1. Rizzo, C. B., Nakano, A. & de Barros, F. P. J. PAR2: Parallel Random Walk Particle Tracking Method for solute transport in porous media. Computer Physics Communications 239, 265–271 (2019). 2. Rubin, Y. Applied stochastic hydrogeology (Oxford University Press, 2003). 3. Allard, D. J.-P. Chilès, P. Delfiner: Geostatistics: Modeling Spatial Uncertainty (2013). 4. Zhang, D., Shi, L., Chang, H. & Yang, J. A comparative study of numerical approaches to risk assessment of contaminant transport.
Stochastic Environmental Research and Risk Assessment 24, 971–984 (2010). 5. De Barros, F. P. J., Fiori, A., Boso, F. & Bellin, A. A theoretical framework for modeling dilution enhancement of non-reactive solutes in heterogeneous porous media. Journal of Contaminant Hydrology 175, 72–83 (2015). 6. Fiori, A. The Lagrangian concentration approach for determining dilution in aquifer transport: Theoretical analysis and comparison with field experiments. Water resources research 37, 3105–3114 (2001). 7. Bellin, A., Rubin, Y . & Rinaldo, A. Eulerian-Lagrangian approach for modeling of flow and transport in heterogeneous geological formations. Water Resources Research 30, 2913–2924 (1994). 8. Steefel, C. I., Appelo, C. A. J., Arora, B., Jacques, D., Kalbacher, T., Kolditz, O., et al. Reac- tive transport codes for subsurface environmental simulation. Computational Geosciences 19, 445–478 (2015). 9. Ballio, F. & Guadagnini, A. Convergence assessment of numerical Monte Carlo simulations in groundwater hydrology. Water Resources Research 40 (2004). 10. Asmussen, S. & Glynn, P. W. Stochastic simulation: algorithms and analysis (Springer, 2007). 11. Bakker, M., Post, V ., Langevin, C. D., Hughes, J. D., White, J., Starn, J., et al. Scripting MODFLOW model development using Python and FloPy. Groundwater 54, 733–739 (2016). 12. Geng, X. & Michael, H. A. Preferential flow enhances pumping-induced saltwater intrusion in volcanic aquifers. Water Resources Research 56, e2019WR026390 (2020). 13. Bianchi, M., Zheng, C., Wilson, C., Tick, G. R., Liu, G. & Gorelick, S. M. Spatial connectivity in a highly heterogeneous aquifer: From cores to preferential flow paths. Water Resources Research 47 (2011). 14. Fuks, O., Ibrahima, F., Tomin, P. & Tchelepi, H. A. Analysis of travel time distributions for uncertainty propagation in channelized porous systems. Transport in Porous Media 126, 115–137 (2019). 15. Shapiro, A. M. & Cvetkovic, V . D. Stochastic analysis of solute arrival time in heterogeneous porous media. Water Resources Research 24, 1711–1718 (1988). 86 16. Rubin, Y . & Dagan, G. Conditional estimation of solute travel time in heterogeneous forma- tions: Impact of transmissivity measurements. Water Resources Research 28, 1033–1040 (1992). 17. Bellin, A., Salandin, P. & Rinaldo, A. Simulation of dispersion in heterogeneous porous formations: Statistics, first-order theories, convergence of computations. Water Resources Research 28, 2211–2227 (1992). 18. Riva, M., S´ anchez-Vila, X., Guadagnini, A., De Simoni, M. & Willmann, M. Travel time and trajectory moments of conservative solutes in two-dimensional convergent flows. Journal of Contaminant Hydrology 82, 23–43 (2006). 19. Gotovac, H., Cvetkovic, V . & Andricevic, R. Flow and travel time statistics in highly hetero- geneous porous media. Water resources research 45 (2009). 20. Fiori, A., Jankovic, I. & Dagan, G. The impact of local diffusion upon mass arrival of a passive solute in transport through three-dimensional highly heterogeneous aquifers. Advances in water resources 34, 1563–1573 (2011). 21. Bianchi, M. & Pedretti, D. Geological entropy and solute transport in heterogeneous porous media. Water Resources Research 53, 4691–4708 (2017). 22. Deutsch, C. V . Fortran programs for calculating connectivity of three-dimensional numerical models and for ranking multiple realizations. Computers & Geosciences 24, 69–76 (1998). 23. Trinchero, P., S´ anchez-Vila, X. & Fern` andez-Garcia, D. 
Point-to-point connectivity, an abstract concept or a key issue for risk assessment studies? Advances in water resources 31, 1742–1753 (2008). 24. Tyukhova, A. R. & Willmann, M. Connectivity metrics based on the path of smallest resistance. Advances in Water Resources 88, 14–20 (2016). 25. Henri, C., Fern` andez-Garcia, D. & de Barros, F. Probabilistic human health risk assessment of degradation-related chemical mixtures in heterogeneous aquifers: Risk statistics, hot spots, and preferential channels. Water Resources Research 51, 4086–4108 (2015). 26. Rizzo, C. B. & de Barros, F. P. Minimum hydraulic resistance and least resistance path in heterogeneous porous media. Water Resources Research 53, 8596–8613 (2017). 27. Sahimi, M., Davis, H. T. & Scriven, L. Dispersion in disordered porous media. Chemical Engineering Communications 23, 329–341 (1983). 28. Harvey, C. F. & Gorelick, S. M. Temporal moment-generating equations: Modeling transport and mass transfer in heterogeneous aquifers. Water Resources Research 31, 1895–1911 (1995). 29. Andriˇ cevi´ c, R. & Cvetkovi´ c, V . Evaluation of risk from contaminants migrating by ground- water. Water Resources Research 32, 611–621 (1996). 30. Maxwell, R. & Kastenberg, W. Stochastic environmental risk analysis: an integrated method- ology for predicting cancer risk from contaminated groundwater. Stochastic Environmental Research and Risk Assessment 13, 27–47 (1999). 31. Henri, C., Fern` andez-Garcia, D. & de Barros, F. Assessing the joint impact of DNAPL source- zone behavior and degradation products on the probabilistic characterization of human health risk. Advances in Water Resources 88, 124–138 (2016). 32. De Barros, F., Bellin, A., Cvetkovic, V ., Dagan, G. & Fiori, A. Aquifer heterogeneity controls on adverse human health effects and the concept of the hazard attenuation factor. Water Resources Research 52, 5911–5922 (2016). 87 33. Jabbari, N., Aminzadeh, F. & de Barros, F. P. Hydraulic fracturing and the environment: risk assessment for groundwater contamination from well casing failure. Stochastic Environmental Research and Risk Assessment 31, 1527–1542 (2017). 34. Libera, A., Henri, C. & de Barros, F. Hydraulic conductivity and porosity heterogeneity controls on environmental performance metrics: Implications in probabilistic risk analysis. Advances in Water Resources 127, 1–12 (2019). 35. Henri, C. & Harter, T. Stochastic Assessment of Nonpoint Source Contamination: Joint Impact of Aquifer Heterogeneity and Well Characteristics on Management Metrics. Water Resources Research 55, 6773–6794 (2019). 36. Henri, C. V ., Harter, T. & Diamantopoulos, E. On the conceptual complexity of non-point source management: impact of spatial variability. Hydrology & Earth System Sciences 24 (2020). 37. S´ anchez-Vila, X., Carrera, J. & Girardi, J. P. Scale effects in transmissivity. Journal of Hydrology 183, 1–22 (1996). 38. Knudby, C. & Carrera, J. On the relationship between indicators of geostatistical, flow and transport connectivity. Advances in Water Resources 28, 405–421 (2005). 39. Le Goc, R., de Dreuzy, J.-R. & Davy, P. Statistical characteristics of flow as indicators of channeling in heterogeneous porous and fractured media. Advances in Water Resources 33, 257–269 (2010). 40. Fiori, A. & Jankovic, I. On preferential flow, channeling and connectivity in heterogeneous porous formations. Mathematical Geosciences 44, 133–145 (2012). 41. Renard, P. & Allard, D. Connectivity metrics for subsurface flow and transport. 
Advances in Water Resources 51, 168–196 (2013). 42. Gershenzon, N. I., Soltanian, M. R., Ritzi Jr, R. W., Dominic, D. F., Keefer, D., Shaffer, E., et al. How does the connectivity of open-framework conglomerates within multi-scale hierarchical fluvial architecture affect oil-sweep efficiency in waterflooding? Geosphere 11, 2049–2066 (2015). 43. Jimenez-Martinez, J. & Negre, C. F. Eigenvector centrality for geometric and topological characterization of porous media. Physical Review E 96, 013310 (2017). 44. Savoy, H., Kalbacher, T., Dietrich, P. & Rubin, Y . Geological heterogeneity: Goal-oriented simplification of structure and characterization needs. Advances in Water Resources 109, 1–13 (2017). 45. Dijkstra, E. W. et al. A note on two problems in connexion with graphs. Numerische mathematik 1, 269–271 (1959). 46. Tyukhova, A. R., Kinzelbach, W. & Willmann, M. Delineation of connectivity structures in 2-D heterogeneous hydraulic conductivity fields. Water Resources Research 51, 5846–5854 (2015). 47. Rizzo, C. & de Barros, F. Minimum Hydraulic Resistance Uncertainty and the Development of a Connectivity-Based Iterative Sampling Strategy. Water Resources Research 55, 5593– 5611 (2019). 48. Knudby, C. & Carrera, J. On the use of apparent hydraulic diffusivity as an indicator of connectivity. Journal of Hydrology 329, 377–389 (2006). 49. Zimmerman, D., De Marsily, G., Gotway, C. A., Marietta, M. G., Axness, C. L., Beauheim, R. L., et al. A comparison of seven geostatistically based inverse approaches to estimate 88 transmissivities for modeling advective transport by groundwater flow. Water Resources Research 34, 1373–1413 (1998). 50. Loll, P. & Moldrup, P. A new two-step stochastic modeling approach: Application to water transport in a spatially variable unsaturated soil. Water Resources Research 34, 1909–1918 (1998). 51. Leube, P. C., de Barros, F. P. J., Nowak, W. & Rajagopal, R. Towards optimal allocation of computer resources: Trade-offs between uncertainty quantification, discretization and model reduction. Environmental modelling & software 50, 97–107 (2013). 52. Moslehi, M., Rajagopal, R. & de Barros, F. P. J. Optimal allocation of computational resources in hydrogeological models under uncertainty. Advances in Water Resources 83, 299–309 (2015). 53. Berrone, S., Hyman, J. & Pieraccini, S. Multilevel Monte Carlo predictions of First passage times in three-dimensional discrete fracture networks: A graph-based approach. Water Resources Research 56, e2019WR026493 (2020). 54. Hyman, J. D., Hagberg, A., Srinivasan, G., Mohd-Yusof, J. & Viswanathan, H. Predictions of first passage times in sparse discrete fracture networks using graph-based reductions. Physical Review E 96, 013304 (2017). 55. Gotovac, H., Cvetkovic, V . & Andricevic, R. Significance of higher moments for complete characterization of the travel time probability density function in heterogeneous porous media using the maximum entropy principle. Water resources research 46 (2010). 56. Booth, A. D. & Colin, A. J. On the efficiency of a new method of dictionary construction. Information and Control 3, 327–334 (1960). 57. Knuth, D. E. The Art of Computer Programming, vol 1: Fundamental. Algorithms. Reading, MA: Addison-Wesley (1968). 58. Remy, N., Boucher, A. & Wu, J. Applied geostatistics with SGeMS: a user’s guide (Cambridge University Press, 2009). 59. Harbaugh, A. W. 
MODFLOW-2005, the US Geological Survey modular ground-water model: the ground-water flow process (US Department of the Interior, US Geological Survey, Reston, VA, 2005).
60. Neuman, S. P. Maximum likelihood Bayesian averaging of uncertain model predictions. Stochastic Environmental Research and Risk Assessment 17, 291–305 (2003).
61. Gaus, I., Audigane, P., André, L., Lions, J., Jacquemet, N., Durst, P., et al. Geochemical and solute transport modelling for CO2 storage, what to expect from it? International Journal of Greenhouse Gas Control 2, 605–625 (2008).
62. Soltanian, M. R., Amooie, M. A., Dai, Z., Cole, D. & Moortgat, J. Critical dynamics of gravito-convective mixing in geological carbon sequestration. Scientific Reports 6, 1–13 (2016).
63. De Barros, F. P. J., Fernàndez-Garcia, D., Bolster, D. & Sanchez-Vila, X. A risk-based probabilistic framework to estimate the endpoint of remediation: Concentration rebound by rate-limited mass transfer. Water Resources Research 49, 1929–1942 (2013).
64. Ding, D., Benson, D. A., Fernàndez-Garcia, D., Henri, C. V., Hyndman, D. W., Phanikumar, M. S., et al. Elimination of the Reaction Rate “Scale Effect”: Application of the Lagrangian Reactive Particle-Tracking Method to Simulate Mixing-Limited, Field-Scale Biodegradation at the Schoolcraft (MI, USA) Site. Water Resources Research 53, 10411–10432 (2017).
65. Libera, A., de Barros, F. P. J., Faybishenko, B., Eddy-Dilek, C., Denham, M., Lipnikov, K., et al. Climate change impact on residual contaminants under sustainable remediation. Journal of Contaminant Hydrology 226, 103518 (2019).
66. Zarlenga, A., de Barros, F. P. J. & Fiori, A. Uncertainty quantification of adverse human health effects from continuously released contaminant sources in groundwater systems. Journal of Hydrology 541, 850–861 (2016).
67. Im, J., Rizzo, C. B. & de Barros, F. P. J. Resilience of groundwater systems in the presence of Bisphenol A under uncertainty. Science of The Total Environment 727, 138363 (2020).
68. Bijeljic, B., Mostaghimi, P. & Blunt, M. J. Signature of non-Fickian solute transport in complex heterogeneous porous media. Physical Review Letters 107, 204502 (2011).
69. Dai, Z., Middleton, R., Viswanathan, H., Fessenden-Rahn, J., Bauman, J., Pawar, R., et al. An integrated framework for optimizing CO2 sequestration and enhanced oil recovery. Environmental Science & Technology Letters 1, 49–54 (2014).
70. Neretnieks, I. Solute transport in fractured rock: applications to radionuclide waste repositories. Tech. rep. (Swedish Nuclear Fuel and Waste Management Co., 1990).
71. Gustafson, G., Gylling, B. & Selroos, J.-O. The Äspö Task Force on groundwater flow and transport of solutes: bridging the gap between site characterization and performance assessment for radioactive waste disposal in fractured rocks. Hydrogeology Journal 17, 1031–1033 (2009).
72. Fiori, A., Bellin, A., Cvetkovic, V., de Barros, F. P. J. & Dagan, G. Stochastic modeling of solute transport in aquifers: From heterogeneity characterization to risk analysis. Water Resources Research 51, 6622–6648 (2015).
73. Srinivasan, G., Tartakovsky, D. M., Dentz, M., Viswanathan, H., Berkowitz, B. & Robinson, B. Random walk particle tracking simulations of non-Fickian transport in heterogeneous media. Journal of Computational Physics 229, 4304–4314 (2010).
74. Rizzo, C. B. & de Barros, F. P. J. Minimum hydraulic resistance and least resistance path in heterogeneous porous media. Water Resources Research 53, 8596–8613 (2017).
75.
Dentz, M., Le Borgne, T., Englert, A. & Bijeljic, B. Mixing, spreading and reaction in heterogeneous media: A brief review. Journal of Contaminant Hydrology 120, 1–17 (2011).
76. Dentz, M. & de Barros, F. P. J. Mixing-scale dependent dispersion for transport in heterogeneous flows. Journal of Fluid Mechanics 777, 178–195 (2015).
77. Ferziger, J. H., Perić, M. & Street, R. L. Computational methods for fluid dynamics (Springer, 2002).
78. Delay, F., Ackerer, P. & Danquigny, C. Simulating solute transport in porous or fractured formations using random walk particle tracking: a review. Vadose Zone Journal 4, 360–379 (2005).
79. Salamon, P., Fernàndez-Garcia, D. & Gómez-Hernández, J. J. A review and numerical assessment of the random walk particle tracking method. Journal of Contaminant Hydrology 87, 277–305 (2006).
80. Boso, F., Bellin, A. & Dumbser, M. Numerical simulations of solute transport in highly heterogeneous formations: A comparison of alternative numerical schemes. Advances in Water Resources 52, 178–189 (2013).
81. Neuman, S. P. Adaptive Eulerian–Lagrangian finite element method for advection–dispersion. International Journal for Numerical Methods in Engineering 20, 321–337 (1984).
82. Younes, A. & Ackerer, P. Solving the advection-diffusion equation with the Eulerian–Lagrangian localized adjoint method on unstructured meshes and non uniform time stepping. Journal of Computational Physics 208, 384–402 (2005).
83. Zheng, C., Bennett, G. D., et al. Applied contaminant transport modeling (Wiley-Interscience, New York, 2002).
84. Kinzelbach, W. & Uffink, G. in Transport Processes in Porous Media 761–787 (Springer, 1991).
85. Kitanidis, P. K. Particle-tracking equations for the solution of the advection-dispersion equation with variable coefficients. Water Resources Research 30, 3225–3227 (1994).
86. LaBolle, E. M., Fogg, G. E. & Tompson, A. F. B. Random-walk simulation of transport in heterogeneous porous media: Local mass-conservation problem and implementation methods. Water Resources Research 32, 583–593 (1996).
87. Michalak, A. M. & Kitanidis, P. K. Macroscopic behavior and random-walk particle tracking of kinetically sorbing solutes. Water Resources Research 36, 2133–2146 (2000).
88. Zheng, C. Analysis of particle tracking errors associated with spatial discretization. Groundwater 32, 821–828 (1994).
89. Rahbaralam, M., Fernàndez-Garcia, D. & Sanchez-Vila, X. Do we really need a large number of particles to simulate bimolecular reactive transport with random walk methods? A kernel density estimation approach. Journal of Computational Physics 303, 95–104 (2015).
90. Bolster, D., Paster, A. & Benson, D. A. A particle number conserving Lagrangian method for mixing-driven reactive transport. Water Resources Research 52, 1518–1527 (2016).
91. Schmidt, M. J., Pankavich, S. & Benson, D. A. A kernel-based Lagrangian method for imperfectly-mixed chemical reactions. Journal of Computational Physics 336, 288–307 (2017).
92. Paster, A., Bolster, D. & Benson, D. A. Connecting the dots: Semi-analytical and random walk numerical solutions of the diffusion–reaction equation with stochastic initial conditions. Journal of Computational Physics 263, 91–112 (2014).
93. Henri, C. V. & Fernàndez-Garcia, D. Toward efficiency in heterogeneous multispecies reactive transport modeling: A particle-tracking solution for first-order network reactions. Water Resources Research 50, 7206–7230 (2014).
94. Fernàndez-Garcia, D. & Sanchez-Vila, X.
Optimal reconstruction of concentrations, gradients and reaction rates from particle distributions. Journal of Contaminant Hydrology 120, 99–114 (2011).
95. Henri, C. V., Harter, T. & Diamantopoulos, E. On the conceptual complexity of non-point source management: impact of spatial variability. Hydrology and Earth System Sciences 24, 1189–1209 (2020).
96. Fernàndez-Garcia, D., Illangasekare, T. H. & Rajaram, H. Differences in the scale dependence of dispersivity and retardation factors estimated from forced-gradient and uniform flow tracer tests in three-dimensional physically and chemically heterogeneous porous media. Water Resources Research 41 (2005).
97. Maxwell, R. & Tompson, A. F. B. SLIM-FAST: A user's manual. Rep. UCRL-SM 225092 (2006).
98. Cui, Z., Welty, C. & Maxwell, R. M. Modeling nitrogen transport and transformation in aquifers using a particle-tracking approach. Computers & Geosciences 70, 1–14 (2014).
99. Rizzo, C. B. & de Barros, F. P. J. Minimum Hydraulic Resistance Uncertainty and the Development of a Connectivity-Based Iterative Sampling Strategy. Water Resources Research 55, 5593–5611 (2019).
100. Hansen, S. K., Haslauer, C. P., Cirpka, O. A. & Vesselinov, V. V. Direct breakthrough curve prediction from statistics of heterogeneous conductivity fields. Water Resources Research 54, 271–285 (2018).
101. Wright, E. E., Richter, D. H. & Bolster, D. Effects of incomplete mixing on reactive transport in flows through heterogeneous porous media. Physical Review Fluids 2, 114501 (2017).
102. Sole-Mari, G., Bolster, D., Fernàndez-Garcia, D. & Sanchez-Vila, X. Particle density estimation with grid-projected and boundary-corrected adaptive kernels. Advances in Water Resources 131, 103382 (2019).
103. Tompson, A. F. & Dougherty, D. Particle-grid methods for reacting flows in porous media with application to Fisher's equation. Applied Mathematical Modelling 16, 374–383 (1992).
104. Schmidt, M. J., Pankavich, S. D. & Benson, D. A. On the accuracy of simulating mixing by random-walk particle-based mass-transfer algorithms. Advances in Water Resources 117, 115–119 (2018).
105. Benson, D. A. & Meerschaert, M. M. Simulation of chemical reaction via particle tracking: Diffusion-limited versus thermodynamic rate-limited regimes. Water Resources Research 44 (2008).
106. Benson, D. A. & Bolster, D. Arbitrarily complex chemical reactions on particles. Water Resources Research 52, 9190–9200 (2016).
107. Engdahl, N. B., Benson, D. A. & Bolster, D. Lagrangian simulation of mixing and reactions in complex geochemical systems. Water Resources Research 53, 3513–3522 (2017).
108. Parkhurst, D. L. & Wissmeier, L. PhreeqcRM: A reaction module for transport simulators based on the geochemical model PHREEQC. Advances in Water Resources 83, 176–189 (2015).
109. Perez, L. J., Hidalgo, J. J. & Dentz, M. Reactive Random Walk Particle Tracking and Its Equivalence With the Advection-Diffusion-Reaction Equation. Water Resources Research 55, 847–855 (2019).
110. Sole-Mari, G., Fernàndez-Garcia, D., Rodríguez-Escales, P. & Sanchez-Vila, X. A KDE-based random walk method for modeling reactive transport with complex kinetics in porous media. Water Resources Research 53, 9019–9039 (2017).
111. Ding, D., Benson, D. A., Paster, A. & Bolster, D. Modeling bimolecular reactions and transport in porous media via particle tracking. Advances in Water Resources 53, 56–65 (2013).
112. Engdahl, N. B., Schmidt, M. J. & Benson, D. A.
Accelerating and Parallelizing Lagrangian Simulations of Mixing-Limited Reactive Transport. Water Resources Research 55, 3556–3566 (2019).
113. Risken, H. in The Fokker-Planck Equation 63–95 (Springer, 1996).
114. Tartakovsky, A. M. & Meakin, P. Pore scale modeling of immiscible and miscible fluid flows using smoothed particle hydrodynamics. Advances in Water Resources 29, 1464–1478 (2006).
115. Paster, A., Bolster, D. & Benson, D. Particle tracking and the diffusion-reaction equation. Water Resources Research 49, 1–6 (2013).
116. Ding, D. & Benson, D. A. Simulating biodegradation under mixing-limited conditions using Michaelis–Menten (Monod) kinetic expressions in a particle tracking model. Advances in Water Resources 76, 109–119 (2015).
117. Pedretti, D. & Fernàndez-Garcia, D. An automatic locally-adaptive method to estimate heavily-tailed breakthrough curves from particle distributions. Advances in Water Resources 59, 52–65 (2013).
118. Walt, S., Colbert, S. C. & Varoquaux, G. The NumPy array: a structure for efficient numerical computation. Computing in Science & Engineering 13, 22–30 (2011).
119. Gramling, C. M., Harvey, C. F. & Meigs, L. C. Reactive transport in porous media: A comparison of model prediction with laboratory visualization. Environmental Science & Technology 36, 2508–2514 (2002).
120. Sanchez-Vila, X., Dentz, M. & Donado, L. D. Transport-controlled reaction rates under local non-equilibrium conditions. Geophysical Research Letters 34 (2007).
121. Kapoor, V., Gelhar, L. W. & Miralles-Wilhelm, F. Bimolecular second-order reactions in spatially varying flows: Segregation induced scale-dependent transformation rates. Water Resources Research 33, 527–536 (1997).
122. Benson, D. A., Pankavich, S. & Bolster, D. On the separate treatment of mixing and spreading by the reactive-particle-tracking algorithm: An example of accurate upscaling of reactive Poiseuille flow. Advances in Water Resources 123, 40–53 (2019).
123. Taylor, G. I. Dispersion of soluble matter in solvent flowing slowly through a tube. Proceedings of the Royal Society of London. Series A. Mathematical and Physical Sciences 219, 186–203 (1953).
124. Eames, I. & Bush, J. W. M. Longitudinal dispersion by bodies fixed in a potential flow. Proceedings of the Royal Society of London. Series A: Mathematical, Physical and Engineering Sciences 455, 3665–3686 (1999).
125. Taylor, G. I. & Green, A. E. Mechanism of the production of small eddies from large ones. Proceedings of the Royal Society of London. Series A, Mathematical and Physical Sciences 158, 499–521 (1937).
126. Sengupta, T. K., Sharma, N. & Sengupta, A. Non-linear instability analysis of the two-dimensional Navier-Stokes equation: The Taylor-Green vortex problem. Physics of Fluids 30, 054105 (2018).
127. Rubin, Y. Applied stochastic hydrogeology (Oxford University Press, 2003).
128. De Barros, F. P. J., Ezzedine, S. & Rubin, Y. Impact of hydrogeological data on measures of uncertainty, site characterization and environmental performance metrics. Advances in Water Resources 36, 51–63 (2012).
129. De Barros, F. P. J. & Rubin, Y. A risk-driven approach for subsurface site characterization. Water Resources Research 44 (2008).
130. Cardiff, M., Liu, X., Kitanidis, P. K., Parker, J. & Kim, U. Cost optimization of DNAPL source and plume remediation under uncertainty using a semi-analytic model. Journal of Contaminant Hydrology 113, 25–43 (2010).
131. Dentz, M., Le Borgne, T., Englert, A. & Bijeljic, B.
Mixing, spreading and reaction in heterogeneous media: A brief review. Journal of Contaminant Hydrology 120, 1–17 (2011).
132. Neuman, S. P. & Tartakovsky, D. M. Perspective on theories of non-Fickian transport in heterogeneous media. Advances in Water Resources 32, 670–680 (2009).
133. Kapoor, V. & Kitanidis, P. K. Concentration fluctuations and dilution in aquifers. Water Resources Research 34, 1181–1193 (1998).
134. Fiori, A. & Dagan, G. Concentration fluctuations in aquifer transport: A rigorous first-order solution and applications. Journal of Contaminant Hydrology 45, 139–163 (2000).
135. Dentz, M. & Tartakovsky, D. M. Probability density functions for passive scalars dispersed in random velocity fields. Geophysical Research Letters 37 (2010).
136. Meyer, D. W., Tchelepi, H. A. & Jenny, P. A fast simulation method for uncertainty quantification of subsurface flow and transport. Water Resources Research 49, 2359–2379 (2013).
137. De Barros, F. P. J. & Fiori, A. First-order based cumulative distribution function for solute concentration in heterogeneous aquifers: Theoretical analysis and implications for human health risk assessment. Water Resources Research 50, 4018–4037 (2014).
138. Ciriello, V. & de Barros, F. P. J. Characterizing the influence of multiple uncertainties on predictions of contaminant discharge in groundwater within a Lagrangian stochastic formulation. Water Resources Research 56, e2020WR027867 (2020).
139. Boso, F. & Tartakovsky, D. M. The method of distributions for dispersive transport in porous media with uncertain hydraulic properties. Water Resources Research 52, 4700–4712 (2016).
140. De Barros, F. P. J. & Fiori, A. On the Maximum Concentration of Contaminants in Natural Aquifers. Transport in Porous Media 140, 273–290 (2021).
141. Maxwell, R. M., Carle, S. F. & Tompson, A. F. B. Contamination, risk, and heterogeneity: on the effectiveness of aquifer remediation. Environmental Geology 54, 1771–1786 (2008).
142. Siirila, E. R., Navarre-Sitchler, A. K., Maxwell, R. M. & McCray, J. E. A quantitative methodology to assess the risks to human health from CO2 leakage into groundwater. Advances in Water Resources 36, 146–164 (2012).
143. Henri, C. V., Fernàndez-Garcia, D. & de Barros, F. P. J. Assessing the joint impact of DNAPL source-zone behavior and degradation products on the probabilistic characterization of human health risk. Advances in Water Resources 88, 124–138 (2016).
144. Oladyshkin, S. & Nowak, W. Data-driven uncertainty quantification using the arbitrary polynomial chaos expansion. Reliability Engineering & System Safety 106, 179–190 (2012).
145. Ciriello, V., Lauriola, I., Bonvicini, S., Cozzani, V., Di Federico, V. & Tartakovsky, D. M. Impact of hydrogeological uncertainty on estimation of environmental risks posed by hydrocarbon transportation networks. Water Resources Research 53, 8686–8697 (2017).
146. Li, S.-G. & Liu, Q. Interactive ground water (IGW). Environmental Modelling & Software 21, 417–418 (2006).
147. Maxwell, R. M., Condon, L. E. & Kollet, S. J. A high-resolution simulation of groundwater and surface water over most of the continental US with the integrated hydrologic model ParFlow v3. Geoscientific Model Development 8, 923–937 (2015).
148. Hammond, G. E., Lichtner, P. C. & Mills, R. T. Evaluating the performance of parallel subsurface simulators: An illustrative example with PFLOTRAN. Water Resources Research 50, 208–228 (2014).
149. Fienen, M. N. & Bakker, M.
HESS Opinions: Repeatable research: what hydrologists can learn from the Duke cancer research scandal. Hydrology and Earth System Sciences 20, 3739–3743 (2016).
150. Fienen, M. N., Corson-Dosch, N. T., White, J. T., Leaf, A. T. & Hunt, R. J. Risk-Based Wellhead Protection Decision Support: A Repeatable Workflow Approach. Groundwater 60, 71–86 (2022).
151. Dabbish, L., Stuart, C., Tsay, J. & Herbsleb, J. Social coding in GitHub: transparency and collaboration in an open software repository. In Proceedings of the ACM 2012 Conference on Computer Supported Cooperative Work, 1277–1286 (2012).
152. van Rossum, G. Python reference manual. Department of Computer Science [CS] (1995).
153. White, J. T., Foster, L. K., Fienen, M. N., Knowling, M. J., Hemmings, B. & Winterle, J. R. Toward reproducible environmental modeling for decision support: a worked example. Frontiers in Earth Science 8, 50 (2020).
154. Bellin, A. & Rubin, Y. HYDRO_GEN: A spatially distributed random field generator for correlated properties. Stochastic Hydrology and Hydraulics 10, 253–278 (1996).
155. Peñuela, A., Hutton, C. & Pianosi, F. An open-source package with interactive Jupyter Notebooks to enhance the accessibility of reservoir operations simulation and optimisation. Environmental Modelling & Software 145, 105188 (2021).
156. Woodruff, M. J., Reed, P. M. & Simpson, T. W. Many objective visual analytics: rethinking the design of complex engineered systems. Structural and Multidisciplinary Optimization 48, 201–219 (2013).
157. Kitanidis, P. K. Introduction to geostatistics: applications in hydrogeology (Cambridge University Press, 1997).
158. Oliphant, T. E. A guide to NumPy (Trelgol Publishing, USA, 2006).
159. Sivasankar, V., Gopalakrishna, G., et al. Quantification of benzene in groundwater sources and risk analysis in a popular South Indian Pilgrimage City - a GIS based approach. Arabian Journal of Chemistry 10, S2523–S2533 (2017).
160. Logue, J. N. & Fox, J. M. Residential health study of families living near the Drake Chemical Superfund site in Lock Haven, Pennsylvania. Archives of Environmental Health: An International Journal 41, 222–228 (1986).
161. Proctor, C. R., Lee, J., Yu, D., Shah, A. D. & Whelton, A. J. Wildfire caused widespread drinking water distribution network contamination. AWWA Water Science 2, e1183 (2020).
162. Mood, A. M., Graybill, F. A. & Boes, C. D. Introduction to the Theory of Statistics. International Student Edition (McGraw-Hill, 1974).
163. De Barros, F. P. J., Fiori, A., Boso, F. & Bellin, A. A theoretical framework for modeling dilution enhancement of non-reactive solutes in heterogeneous porous media. Journal of Contaminant Hydrology 175, 72–83 (2015).
164. Ye, Y., Chiogna, G., Cirpka, O. A., Grathwohl, P. & Rolle, M. Enhancement of plume dilution in two-dimensional and three-dimensional porous media by flow focusing in high-permeability inclusions. Water Resources Research 51, 5582–5602 (2015).
165. De Barros, F. P. J. & Nowak, W. On the link between contaminant source release conditions and plume prediction uncertainty. Journal of Contaminant Hydrology 116, 24–34 (2010).
166. Gueting, N. & Englert, A. Hydraulic conditions at the source zone and their impact on plume behavior. Hydrogeology Journal 21, 829–844 (2013).
167. Im, J., Rizzo, C. B., de Barros, F. P. J. & Masri, S. Application of genetic programming for model-free identification of nonlinear multi-physics systems. Nonlinear Dynamics 104, 1781–1800 (2021).
168. Wolfram Research, Inc.
Mathematica, Version 12.3.1 (Champaign, IL, 2021).

Appendices

A Appendix: Flow and transport formulations

We consider steady-state flow in a spatially heterogeneous aquifer. The flow field is governed by

\nabla \cdot \left( K \nabla h \right) = 0, \qquad (A.1)

with h denoting the hydraulic head and K the hydraulic conductivity. For all our 2D and 3D simulations, we consider permeameter-like boundary conditions to ensure that the flow is uniform-in-the-mean. That is achieved by setting Dirichlet boundary conditions on the inflow and outflow boundaries of the domain and no-flow Neumann conditions on the remaining boundaries.

An inert solute is instantaneously released along a line source perpendicular to the mean flow direction. Transport is assumed to be governed by the advection-dispersion equation

\frac{\partial c}{\partial t} + \mathbf{u} \cdot \nabla c = \nabla \cdot \left( \mathbf{D} \nabla c \right), \qquad (A.2)

where c is the resident concentration, \mathbf{u} is the velocity field and \mathbf{D} is the local-scale dispersion tensor, assumed to be anisotropic and defined as

\mathbf{D} = \left( \alpha_T |\mathbf{u}| + D_m \right) \mathbf{I} + \frac{\alpha_L - \alpha_T}{|\mathbf{u}|}\, \mathbf{u}\mathbf{u}^{\top}, \qquad (A.3)

where D_m is the molecular diffusion coefficient, \alpha_L is the longitudinal dispersivity and \alpha_T is the transverse dispersivity; in our study, x_1 denotes the longitudinal dimension and x_2 and x_3 the transverse ones. Transport is solved with a random walk particle tracking (RWPT) code. The RWPT code used in our work is GPU-based and is denoted PAR² [1].

B Appendix: Derivation of the kernel function

We provide details regarding the optimal kernel function reported in Section 3.3.1. Our goal is to obtain an optimal shape of the kernel function such that the summation of four kernel distributions centered on the cell's vertices equals a constant (arbitrarily chosen) value H over the whole cell (see Fig. 5.1). In this way, as explained in Section 3.3.1, we ensure that mass conservation is always guaranteed in the particle's mass-allocation process (see the analogy depicted in Fig. 3.6c). The proposed kernel function originates from the modification of a square pyramid, in which triangular sections are maintained along its local support midsections, while parabolic sections are employed elsewhere.

Figure 5.1: Graphical visualization of the summation of four optimized kernel functions positioned on the corners of a cell (projected at the top of the figure in red for clarity) of dimensions \lambda_g \times \lambda_g. Over the area of interest, i.e. the central cell, the summation of the four kernel functions is uniformly equal to H, with a maximum error of 0.15%.

To compute the value of the kernel function h at a point Q, defined by the distance r from the center of the kernel and the angle \alpha (see Fig. 3.3), the following equation is used (see Section 3.3.1):

h(r,\alpha) = \left[ -\frac{H}{d(\alpha)^{2}} - \frac{q_1 H \alpha^{2} + q_2 H \alpha - H}{d(\alpha)\,\lambda_g} \right] r^{2} + \frac{q_1 H \alpha^{2} + q_2 H \alpha - H}{\lambda_g}\, r + H, \qquad (B.1)

with q_1 = 0.650239, q_2 = -1.03809 and d(\alpha) the length of the segment connecting the center of the kernel to the edge of the cell of dimensions \lambda_g \times \lambda_g (see Fig. 3.3), passing through the point Q. We can re-write Equation (B.1) in the compact form

h(r,\alpha) = a(\alpha)\, r^{2} + b(\alpha)\, r + c(\alpha). \qquad (B.2)
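To make the kernel of Eq. (B.1) concrete, the short Python sketch below evaluates it and checks two of its defining properties: h = H at the kernel center and h = 0 on the edge of its local support. It is a minimal illustration, not the PARxn² implementation; it assumes d(α) = λ_g/cos(α) for α folded into [0, π/4] (the expression used at the end of this appendix), and it builds the coefficients from Eqs. (B.8) and (B.15)–(B.16) derived below. Function names are hypothetical.

```python
import numpy as np

Q1, Q2 = 0.650239, -1.03809  # fitted constants of Eq. (B.1)

def d_support(alpha, lam_g):
    """Distance from the kernel center to the edge of its local support,
    d(alpha) = lambda_g / cos(alpha), for alpha in [0, pi/4]."""
    return lam_g / np.cos(alpha)

def kernel(r, alpha, H, lam_g):
    """Evaluate the parabolic-section kernel h(r, alpha) of Eqs. (B.1)-(B.2)."""
    alpha = np.abs(alpha) % (np.pi / 2)
    alpha = np.where(alpha > np.pi / 4, np.pi / 2 - alpha, alpha)  # fold by symmetry
    d = d_support(alpha, lam_g)
    b = (Q1 * H * alpha**2 + Q2 * H * alpha - H) / lam_g           # Eq. (B.15)
    c = H                                                          # Eq. (B.16)
    a = -c / d**2 - b / d                                          # Eq. (B.8): enforces h(d) = 0
    h = a * r**2 + b * r + c
    return np.where(r <= d, h, 0.0)                                # compact local support

if __name__ == "__main__":
    H, lam_g = 1.0, 1.0
    print(kernel(0.0, 0.3, H, lam_g))                     # -> H at the kernel center
    print(kernel(d_support(0.3, lam_g), 0.3, H, lam_g))   # -> 0 on the support edge
    # Sum of the four corner kernels at the cell center: each contributes H/4.
    r_c = np.sqrt(2.0) / 2.0 * lam_g
    print(4.0 * kernel(r_c, np.pi / 4, H, lam_g))         # -> H
```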
From geometric considerations, it is clear that only two kernels intersect at the points belonging to the borders of the red cell (see Fig. 3.6), whereas all four kernels have the same value of h at the center of the cell. Thus, it is straightforward to compute the coefficients of the parabolic section (namely a, b and c in Eq. (B.2)) for the sections containing the points that belong to the cell's borders (\alpha = 0) and diagonals (\alpha = \pi/4):

for \alpha = 0:
\begin{aligned}
H &= a(0)\cdot 0 + b(0)\cdot 0 + c(0),\\
\tfrac{H}{2} &= a(0)\left(\tfrac{d(0)}{2}\right)^{2} + b(0)\,\tfrac{d(0)}{2} + c(0),\\
0 &= a(0)\,d(0)^{2} + b(0)\,d(0) + c(0);
\end{aligned} \qquad (B.3)

for \alpha = \pi/4:
\begin{aligned}
H &= a(\tfrac{\pi}{4})\cdot 0 + b(\tfrac{\pi}{4})\cdot 0 + c(\tfrac{\pi}{4}),\\
\tfrac{H}{4} &= a(\tfrac{\pi}{4})\left(\tfrac{d(\pi/4)}{2}\right)^{2} + b(\tfrac{\pi}{4})\,\tfrac{d(\pi/4)}{2} + c(\tfrac{\pi}{4}),\\
0 &= a(\tfrac{\pi}{4})\,d(\tfrac{\pi}{4})^{2} + b(\tfrac{\pi}{4})\,d(\tfrac{\pi}{4}) + c(\tfrac{\pi}{4}).
\end{aligned} \qquad (B.4)

The values resulting from the solution of systems (B.3) and (B.4) are:

\alpha = 0 \;\Rightarrow\; a = 0; \quad b = -\frac{H}{d(0)}; \quad c = H; \qquad (B.5)

\alpha = \frac{\pi}{4} \;\Rightarrow\; a = \frac{H}{d(\pi/4)^{2}}; \quad b = -\frac{2H}{d(\pi/4)}; \quad c = H. \qquad (B.6)

Using these coefficients, we can write

\sum_{j=1}^{S_K} h_j(r, \alpha = 0) = \sum_{j=1}^{S_K} h_j(r, \alpha = \pi/4) = H, \qquad (B.7)

where S_K denotes the number of kernels centered on the vertices of the evaluated cell (S_K = 4) and h_j are the values of the kernels, defined by Equation (B.2), at the points belonging to the borders (\alpha = 0) and to the diagonal (\alpha = \pi/4) of the cell.

Our next step is to compute the coefficients a, b and c for the other values of \alpha (i.e., \alpha \neq 0 and \alpha \neq \pi/4). The coefficients are defined by varying their values from those assumed at \alpha = 0 to those computed at \alpha = \pi/4. Here, we consider that b varies quadratically with \alpha, while c is kept constant, since it does not vary between the two already calculated values (i.e., c = H for \alpha = 0 and \alpha = \pi/4). The coefficient a is then determined from b and c. In order to respect the geometric constraint that the kernel must vanish at the points belonging to the border of its local support, i.e. h(r = d, \alpha) = a d^{2} + b d + c \equiv 0 \;\forall \alpha, a has the following analytical form:

a(\alpha) = -\frac{c(\alpha)}{d(\alpha)^{2}} - \frac{b(\alpha)}{d(\alpha)} \quad \forall \alpha. \qquad (B.8)

Figure 5.2: Graphical representation of how the b parameter could vary with \alpha. The blue line shows the linear behaviour, while the green and red lines illustrate the parabolic variation, in a convex and a concave way, respectively.

Fig. 5.2 illustrates the dependence of b on \alpha. This functional dependence is mathematically described by

b(\alpha) = b_1 \alpha^{2} + b_2 \alpha + b_3 \quad \forall \alpha, \qquad (B.9)

where b_1, b_2 and b_3 are determined from

\begin{aligned}
b(\alpha = 0) &= -\frac{H}{d(0)} = b_3;\\
b(\alpha = \pi/4) &= -\frac{2H}{d(\pi/4)} = b_1 \left(\frac{\pi}{4}\right)^{2} + b_2 \frac{\pi}{4} + b_3;\\
b(\alpha = \pi/8) &= -\frac{3H}{2\, d(\pi/8)}\, \vartheta_{\mathrm{opt}} = b_1 \left(\frac{\pi}{8}\right)^{2} + b_2 \frac{\pi}{8} + b_3,
\end{aligned} \qquad (B.10)

where b(\alpha = 0) and b(\alpha = \pi/4) are the values between which the parabolic variation of b is computed, while b(\alpha = \pi/8) is found through an optimization process. The multiplicative factor -3H/(2 d(\pi/8)) in the last line of (B.10) is the value that b(\alpha = \pi/8) would assume if b varied linearly (see the blue curve in Fig. 5.2), while \vartheta_{\mathrm{opt}} is the multiplicative correction parameter found by minimizing the error between H and the sum of the four kernel functions over the considered cell:

\varepsilon = \sum_{\mathrm{cell}} \left| H - \sum_{j}^{S_K} h_j(\vartheta) \right|, \qquad (B.11)

\vartheta_{\mathrm{opt}} = \arg\min_{\vartheta} [\varepsilon], \qquad (B.12)

where the sum over the cell (represented by \sum_{\mathrm{cell}}) is performed on every point of a finer sub-grid inside the cell itself. For example, if b varies linearly or in a convex manner, as displayed by the blue and green lines in Fig. 5.2, the summation of the kernel functions over the cell is far from being constantly equal to H. The concave quadratic variation of b leads to a summation that is uniform over the cell and optimal for \vartheta_{\mathrm{opt}} = 0.9434. This value has been found numerically, and Fig. 5.3 shows how \vartheta_{\mathrm{opt}} minimizes the value of \varepsilon (Eq. B.11).
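The calibration of ϑ_opt in Eqs. (B.10)–(B.12) can be reproduced with a simple grid search. The sketch below is a minimal illustration under stated assumptions: d(α) = λ_g/cos(α) (given just below), H = 1, a uniform 41 × 41 sub-grid inside the cell and a search range ϑ ∈ [0.8, 1.2]; these choices, and the function names, are illustrative rather than the code used in this thesis.

```python
import numpy as np

def b_coeffs(theta, H, lam_g):
    """Quadratic b(alpha) = b1*alpha^2 + b2*alpha + b3 through the three
    conditions of Eq. (B.10), with d(alpha) = lam_g / cos(alpha)."""
    alphas = np.array([0.0, np.pi / 8, np.pi / 4])
    d = lam_g / np.cos(alphas)
    targets = np.array([-H / d[0], -1.5 * H / d[1] * theta, -2.0 * H / d[2]])
    A = np.vstack([alphas**2, alphas, np.ones(3)]).T  # 3x3 system for (b1, b2, b3)
    return np.linalg.solve(A, targets)

def kernel_sum(theta, H=1.0, lam_g=1.0, n=41):
    """Sum of the four corner kernels, Eq. (B.2) with Eq. (B.8), on a sub-grid of the cell."""
    b1, b2, b3 = b_coeffs(theta, H, lam_g)
    xg = np.linspace(0.0, lam_g, n)
    X, Y = np.meshgrid(xg, xg)
    total = np.zeros_like(X)
    for cx, cy in [(0.0, 0.0), (lam_g, 0.0), (0.0, lam_g), (lam_g, lam_g)]:  # cell vertices
        dx, dy = np.abs(X - cx), np.abs(Y - cy)
        r = np.hypot(dx, dy)
        alpha = np.arctan2(np.minimum(dx, dy), np.maximum(dx, dy))  # folded into [0, pi/4]
        d = lam_g / np.cos(alpha)
        b = b1 * alpha**2 + b2 * alpha + b3
        a = -H / d**2 - b / d                                       # Eq. (B.8) with c = H
        total += np.where(r <= d, a * r**2 + b * r + H, 0.0)
    return total

def epsilon(theta):
    """Objective of Eq. (B.11): summed absolute deviation from H over the sub-grid (H = 1)."""
    return np.abs(1.0 - kernel_sum(theta)).sum()

if __name__ == "__main__":
    thetas = np.linspace(0.8, 1.2, 401)
    errs = np.array([epsilon(t) for t in thetas])
    print("theta_opt ~", thetas[errs.argmin()])  # expected to be close to 0.9434 reported above
```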
Given that d(\alpha) = \lambda_g / \cos(\alpha) (for geometrical reasons), the solution of the system of equations in (B.10) is

b_1 = \frac{q_1 H}{\lambda_g}; \quad b_2 = \frac{q_2 H}{\lambda_g}; \quad b_3 = -\frac{H}{\lambda_g}, \qquad (B.13)

with q_1 = 0.650239 and q_2 = -1.03809.

Figure 5.3: Error \varepsilon estimates provided by Eq. (B.11) versus the parameter \vartheta. The inset displays the error within \vartheta \in [0.8, 1.2].

To summarize, the coefficients that determine the value of the kernel function expressed in Equation (B.2) are

a = -\frac{H}{d(\alpha)^{2}} - \frac{q_1 H \alpha^{2} + q_2 H \alpha - H}{d(\alpha)\,\lambda_g}, \qquad (B.14)

b = \frac{q_1 H \alpha^{2} + q_2 H \alpha - H}{\lambda_g}, \qquad (B.15)

c = H. \qquad (B.16)

C Appendix: Comparison with an Eulerian solution for a 2-D non-well-mixed reactive scenario

In this appendix we provide a comparison between the proposed particle-based method PARxn² and an Eulerian solution of the system of reactive advection-diffusion equations given in Eq. (3.2). As in the previous cases, we consider a bimolecular reaction A + B → C in a 2-D, transient and non-well-mixed scenario. The problem investigated is identical to the reactive transport problem reported in Kapoor et al. [121] and Benson et al. [122] and addressed in Section 3.4.2.3. The domain is a 2-D channel in which the flow field is characterized by a parabolic velocity profile, and, as in the work by Kapoor et al. [121], the input model parameters are: L_y = 200 mm, L_x = 1 mm, [A_0] = [B_0] = 1 M, u_m = 1.0132 mm/s (see Eq. (3.15)), k_f = 0.0987 M⁻¹ s⁻¹, and the diffusion coefficient is set to D = 1.0 × 10⁻³ mm²/s. The results are shown for t = 100 s. The system of partial differential equations, see Eq. (3.2), was solved using Mathematica's NDSolve, which is based on the method of lines combined with a finite difference scheme [168]. The byproduct mass m_C is obtained by integrating the concentration given by the well-resolved Eulerian solution in both the longitudinal and transverse directions.

Figure 5.4: Comparison between the numerical results for m_C, obtained by the reactive RWPT algorithm PARxn² for different numbers of particles N_P, and the Eulerian solution integrated over the space domain.

In Fig. 5.4, we compare the Eulerian and Lagrangian (i.e., obtained through PARxn²) solutions for the temporal evolution of the mass of C. The results displayed in Fig. 5.4 demonstrate the convergence of the PARxn² solution to the Eulerian one with increasing number of particles N_P.

Figure 5.5: Mean absolute errors (MAE) between PARxn² and the Eulerian solution for different numbers of particles N_P.

Fig. 5.5 depicts the mean absolute error (MAE) between the PARxn² and Eulerian solutions for different numbers of particles N_P. Equation (3.10) provides details on the computation of the MAE, where, for this case, \hat{m}_C represents the mass of reaction product C computed through the Eulerian approach. The results show that the error decreases with increasing number of particles (see Fig. 5.5).
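The post-processing behind Figs. 5.4 and 5.5 amounts to two simple operations: integrating the Eulerian concentration field of C over the domain to obtain m_C, and computing the MAE between the particle-based and Eulerian mass histories. The NumPy sketch below illustrates both; array names, shapes and the placeholder field are assumptions for illustration, and the exact form of Eq. (3.10) in the main text may differ in detail.

```python
import numpy as np

def byproduct_mass(c_C, x, y):
    """Integrate a snapshot of the Eulerian concentration of C over the 2-D domain
    (trapezoidal rule in the longitudinal and transverse directions)."""
    return np.trapz(np.trapz(c_C, y, axis=-1), x, axis=-1)

def mean_absolute_error(m_C_particles, m_C_eulerian):
    """MAE between the Lagrangian and Eulerian mass histories sampled at the same times."""
    return np.mean(np.abs(m_C_particles - m_C_eulerian))

if __name__ == "__main__":
    # Synthetic illustration: a coarse grid and a made-up concentration field.
    x = np.linspace(0.0, 1.0, 51)      # longitudinal coordinate [mm]
    y = np.linspace(0.0, 200.0, 201)   # transverse coordinate [mm]
    X, Y = np.meshgrid(x, y, indexing="ij")
    c_C = 1e-3 * np.exp(-((Y - 100.0) / 20.0) ** 2)  # placeholder field [M]
    print("m_C =", byproduct_mass(c_C, x, y))
    print("MAE =", mean_absolute_error(np.array([0.9, 1.1]), np.array([1.0, 1.0])))
```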
D Appendix: Flow and transport equations

For the computational illustrations, we simulated steady-state, fully saturated, incompressible flow in a spatially heterogeneous aquifer in the absence of sinks or sources. Flow is 2D and, in our computational 2D domain, x_1 denotes the longitudinal dimension and x_2 the transverse one. The steady flow field is governed by

\nabla \cdot \left[ K(\mathbf{x}) \nabla h(\mathbf{x}) \right] = 0, \qquad (D.1)

with h denoting the hydraulic head and K the hydraulic conductivity. For all our simulations, we considered permeameter-like boundary conditions for the flow field. That is, no-flux boundary conditions are imposed on the transverse boundaries, and constant heads, respectively h_{in} and h_{out}, are adopted at the inflow and outflow boundaries of the computational domain.

For the transport simulation we consider an instantaneous release of benzene within a rectangular source zone of area A_o = \Delta s_1 \times \Delta s_2. The spatiotemporal evolution of the concentration field is assumed to be governed by the advection-dispersion equation

\frac{\partial C(\mathbf{x},t)}{\partial t} + \mathbf{u}(\mathbf{x}) \cdot \nabla C(\mathbf{x},t) = \nabla \cdot \left[ \mathbf{D}(\mathbf{x}) \nabla C(\mathbf{x},t) \right], \qquad (D.2)

where C is the resident concentration, \mathbf{u} is the velocity field and \mathbf{D} is the local-scale dispersion tensor, assumed to be anisotropic and defined as

\mathbf{D}(\mathbf{x}) = \left( \alpha_T |\mathbf{u}(\mathbf{x})| + D_m \right) \mathbf{I} + \frac{\alpha_L - \alpha_T}{|\mathbf{u}(\mathbf{x})|}\, \mathbf{u}(\mathbf{x})\mathbf{u}(\mathbf{x})^{\top}, \qquad (D.3)

where D_m is the molecular diffusion coefficient, \alpha_L is the longitudinal (along x_1) dispersivity and \alpha_T is the transverse (along x_2) dispersivity.
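Equation (D.3), like Eq. (A.3), builds the local-scale dispersion tensor from the velocity vector and the dispersivities. The short NumPy function below assembles it for a single velocity vector; it is a minimal sketch with hypothetical argument names, not part of the solvers used in this thesis.

```python
import numpy as np

def dispersion_tensor(u, alpha_L, alpha_T, D_m):
    """Local-scale anisotropic dispersion tensor of Eq. (D.3) for one velocity vector u.

    u       : velocity vector, shape (2,) or (3,)
    alpha_L : longitudinal dispersivity
    alpha_T : transverse dispersivity
    D_m     : molecular diffusion coefficient
    """
    u = np.asarray(u, dtype=float)
    speed = np.linalg.norm(u)
    iso = (alpha_T * speed + D_m) * np.eye(u.size)  # isotropic part
    if speed == 0.0:
        return iso                                   # pure diffusion when the fluid is at rest
    return iso + (alpha_L - alpha_T) / speed * np.outer(u, u)

if __name__ == "__main__":
    D = dispersion_tensor(u=[1.0, 0.0], alpha_L=0.1, alpha_T=0.01, D_m=1e-9)
    print(D)  # diagonal ~ [alpha_L*|u| + D_m, alpha_T*|u| + D_m] when u is aligned with x_1
```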
Abstract
The ubiquitous presence of multi-scale heterogeneity in hydrological properties gives rise to complex subsurface flow patterns that affect the transport behavior of a solute plume. Fluctuations in the velocity field lead to increased solute spreading, which enhances mass transfer mechanisms and affects solute arrival times.
This thesis proposes a series of methods that account for the effects of aquifer heterogeneity on transport observables that are essential for risk analysis, performance assessment of waste disposal facilities and the selection of optimal remediation strategies. The approaches proposed in this dissertation are computationally rapid and reproducible.
The first contribution of this thesis is the development of a novel aquifer connectivity-ranked Monte Carlo method that accelerates the convergence of the statistics of the first arrival time of a solute body at an environmentally sensitive location. Secondly, I propose a kernel-based reactive random walk particle tracking method that improves the computational efficiency of simulating reactive transport in spatially variable groundwater flows. Finally, I present a computational package that links the components relevant for estimating the concentration of a pollutant at an environmentally sensitive target, and its uncertainty, to support decision making in risk analysis.