Feature Learning for Imaging and Prior Model Selection

By Azarang Golmohammadi

A Dissertation Presented to the
FACULTY OF THE USC GRADUATE SCHOOL
VITERBI SCHOOL OF ENGINEERING
In Partial Fulfillment of the Requirements for the Degree
Doctor of Philosophy
(ELECTRICAL ENGINEERING)

UNIVERSITY OF SOUTHERN CALIFORNIA
August 2018

© Copyright 2018 Azarang Golmohammadi

To the memory of my grandfathers, Hasan Golmohammadi and AbbasAli Farshbaf Dadvar.

ABSTRACT

Imaging inverse problems in subsurface environments are typically ill-posed and highly nonlinear. As a result, a large number of unknown parameters must be estimated from limited response measurements. When the underlying images (i.e., parameters) form spatially complex connectivity patterns, classical covariance-based geostatistical techniques cannot describe the underlying connectivity structures. In addition to the complexity of representing connectivity patterns, uncertainty in geologic scenarios further complicates these problems. Since data limitations, modeling assumptions, and subjective interpretations can lead to significant uncertainty in the adopted geologic scenarios, flow and transport data may also be useful for constraining that uncertainty. Constraining geologic scenarios with flow-related data opens an interesting and challenging research area that goes beyond traditional subsurface inverse modeling formulations, where the geologic scenario is assumed to be given.

In this work, we employ group-sparsity regularization as an effective formulation for constraining the uncertainty in prior geologic scenarios in subsurface imaging problems. We demonstrate that this regularization term can be used to eliminate inconsistent geologic scenarios that are not supported by the measurements used for inversion, e.g., flow and transport observations. We also introduce a novel supervised machine learning algorithm to enforce feasibility constraints in subsurface imaging problems. To this end, we develop an inverse modeling framework in which the solution is initially obtained on a parameterization domain. A supervised machine learning algorithm is then employed to learn the properties of the feasible set and enforce them onto the parameterized solution. Based on this formulation, the parameterized and feasible solutions are obtained iteratively and gradually converge to the optimal solution. We demonstrate that adopting feasibility constraints can further enhance the capability of group-sparsity regularization in selecting relevant geologic scenarios.

ACKNOWLEDGMENT

I would like to extend thanks to all the people who so generously contributed to the work presented in this thesis. First and foremost, I would like to express my sincere gratitude to my Ph.D. advisor, Professor Behnam Jafarpour, for the continuous support of my Ph.D. studies and research, and for his patience, enthusiasm, motivation, and immense knowledge. The door of his office was always open whenever I had a question about my research, and his guidance helped me throughout my research and the writing of this thesis. Besides my Ph.D. advisor, my profound gratitude goes to the rest of my thesis committee: Professor Mahta Moghaddam, Professor Iraj Ershaghi, Professor Keith Jenkins, and Professor Krishna S. Nayak. I would like to specially thank Professor Behnam Jafarpour and Professor Iraj Ershaghi for providing me with opportunities to research and teach at the Departments of Petroleum and Electrical Engineering at USC.
Their recommendations played a critical role in securing the financial support I needed during my Ph.D. studies. In addition, I would like to extend thanks to the institutions that provided the funding for my research, including Energi Simulation (Foundation CMG). I thank all my fellow lab-mates in the SEES laboratory: Siavash Hakim Elahi, Atefeh Jahandideh, Wei Ma, Shiva Navabi Sohi, and Entao Liu. In particular, I would like to thank M-Reza Khaninezhad, the senior Ph.D. student who mentored me with unbelievable patience during my Ph.D. studies.

The support I have received from my friends, from all over the world, was incredible and helped me overcome the difficulties of living far from my family. To this end, I would like to thank my friends Shahram Farhadi, Reza Rokni, Mohammad Evazi, Reza Bohlouli, Ali Mostafavi, Behnam Farshbaf, Mohammad Rafiei, and Pouya Mahdavi. Without any doubt, my achievements today are a result of the patience, motivation, and enthusiasm I have received from my previous teachers and advisors. Therefore, my deepest gratitude goes to all the professors at the University of Southern California and Sharif University of Technology who have influenced the growth of my scientific knowledge. My special thanks go to Professor Farokh Marvasti, who advised me during my B.Sc. studies at Sharif University of Technology.

Last, but by no means least, my sincere thanks and appreciation go to my father, mother, and brother, Mohammad Golmohammadi, Azar Farshbaf Dadvar, and Hamed Golmohammadi, for their unbelievable support at every step of my life. My wife, Shiva Salmasi, has been extremely supportive during my Ph.D. studies, and the love I have received from her has been my greatest motivation. She has always stood beside me, and I have always found in her my best friend. My father and mother are my first advisors, and my wife is my inspiration for the future. Therefore, I dedicate this thesis in their honour.

CONTENTS

ABSTRACT
ACKNOWLEDGMENT
LIST OF TABLES
LIST OF FIGURES
1. INTRODUCTION
1.1. PDE-Constrained Inverse Problems
    Physics Laws in Subsurface Environments
    Inverse Problem Formulation
1.2. Challenges in Solving Subsurface Inverse Problems
1.3. Parameterization and Regularization Techniques
1.3.1. Spatial Parameterization/Regularization
    Zonation Parameterization
    Tikhonov Regularization
    Total-Variation
1.3.2. Compressive Transform
    Generic Compressive Transform
    Learned Compressive Transform
        Principal Component Analysis (PCA)
        Sparse Dictionary Learning ($k$-SVD)
1.4. Uncertainty in Prior Geologic Scenarios
1.5. Geologic Feasibility Constraints
1.6. Scope of Work and Dissertation Outline
2. GROUP-SPARSE FEATURE LEARNING FOR PARAMETERIZATION PURPOSES
2.1. Sparse Parameter Estimation
2.2. Mixed $\ell_1/\ell_2$-Norm for Promoting Group-Sparsity
2.3. Learning Group-Sparse Parameterization
    Grouping with Wavelet Tree Structures
    Grouping with Sparse PCA
2.4. Numerical Results
    Tomographic Inversion
    Pumping Test in Single Phase Flow, Effect of Observation Noise
        Case 1: Group-Sparsity with Wavelet Tree Structure
        Case 2: Group-Sparsity Using SPCA with DCT Basis
    Subsurface Flow Model Calibration: Nonlinear Inversion
2.5. Scalability, Computational Costs and Uncertainty Quantification
2.6. Summary and Discussion
3. GROUP-SPARSE FEATURE LEARNING FOR GEOLOGIC SCENARIO SELECTION
3.1. Geologic Scenario Identification with Group-Sparsity
3.2. Further Exploration of Group-Sparse Formulation
3.3. Numerical Results
    Tomographic Inversion
        Case 1: Multi-Gaussian Slowness Map
        Case 2: Channel-Type Slowness Map
    Aquifer Pumping Test in Single Phase Flow
        Case 3: 2D Channel-Type Transmissivity
        Case 4: 3D Multi-Gaussian Transmissivity
        Case 5: 3D Channel-Type Transmissivity
3.4. Summary and Discussion
4. PROMOTING DISCRETE FEATURES THROUGH POTENTIAL FUNCTIONS
4.1. Connectivity-Preserving Constraints
4.2. Discrete Regularization
4.3. Optimization Approach
4.4. Numerical Results
    Tomographic Inversion (Example 1)
    Integration of Dynamic and Static Data (Example 2)
    Large-Scale Pumping Test with Complex Geology (Example 3)
4.5. Summary and Discussion
5. PATTERN-BASED FEATURE LEARNING FOR HONORING FEASIBILITY CONSTRAINTS
5.1. Problem Statement and Formulation
5.2. Enforcing Feasibility Constraints for Pattern Learning
    Mapping onto Feasible Set
        $k$-NN Classification
        Aggregation
5.3. Computational Complexities
5.4. Numerical Results
    Tomographic Inversion
        Effect of within Facies Heterogeneity
    2D Pumping Test
    Two-Phase Flow in 3D Formations
5.5. Summary and Discussion
6. PATTERN-BASED FEATURE LEARNING UNDER UNCERTAINTY IN GEOLOGIC SCENARIOS
6.1. Problem Statement and Formulation
6.2. Optimization Approach
6.3. Exploring the Proposed Algorithm
    Mapping onto the Refined Feasible Set
    Effect of Regularization Terms
6.4. Numerical Results
    Travel-Time Tomographic Inversion
    Large Scale 2D Pumping Test
6.5. Summary and Discussion
7. SUMMARY, CONCLUSIONS AND FUTURE WORK
7.1. Summary
7.2. Conclusions
7.3. Future Work
NOMENCLATURE
BIBLIOGRAPHY
APPENDICES
    Appendix A: IRLS Algorithm for $\ell_1$-Regularized Problems
    Appendix B: An Iterative Approach for $\ell_1/\ell_2$-Regularized Problems
    Appendix C: Sparse PCA Algorithm
    Appendix D: Pseudo Code for Model Selection Using Group-Sparsity Formulation
    Appendix E: $k$-SVD Dictionary Learning

LIST OF TABLES

Table 1.1. A summary of physical properties and their definitions.
Table 2.1. Different cases for the chosen dictionary and their corresponding tree structure in Section 2.4 (tomographic inversion).
Table 2.2. Average error between the estimated transmissivity and pressure heads and the true ones for different levels of noise ($e_1 = \frac{1}{n}\sum_{i=1}^{n}(u_i - \hat{u}_i)^2$ and $e_2 = \frac{1}{n}\sum_{i=1}^{n}|u_i - \hat{u}_i|$) in Section 2.4 (pumping test), Case 2.
Table 2.3. Parameters of flow equations in Section 2.4 (waterflooding example).
Table 3.1. Variogram parameters for Case 4 in Section 3.3 (3D pumping test).
Table E1. $k$-SVD algorithm.

LIST OF FIGURES

Figure 1.1. A two-phase flow system (waterflooding): (a) schematic of a reservoir with injection (production) wells on the left (right) side of the domain; (b) the intrinsic permeability distribution in the reference model consisting of high-permeability fluvial channels (red) and low-permeability background shale (blue); snapshots of pressure (c) and saturation (d) profiles after 10, 20, and 30 months.
Figure 1.2. Schematic of parameter representation via linear expansion: spatial zonation with predefined regions with similar parameter values.
Figure 1.3. Examples of generic (pre-computed) compressive transform bases: (a) sample low-frequency basis elements from the DCT basis; (b) sample basis elements from the discrete Haar wavelet. The example is shown for a 64×64 two-dimensional image. The basis elements are separated with black boxes.
Figure 1.4. Examples of learned expansion images using prior training data: (a) prior (training) models used for constructing linear expansion images; (b) $S = 20$ leading PCA basis elements; (c) sample $k$-SVD dictionary elements with $S = 20$ and $k = 200$. Examples are shown for an $n_x \times n_y = 100\times100$ two-dimensional model. The images are separated using white borders.
Figure 1.5. Three prior scenarios with meandering (left), intersecting (middle), and straight (right) channel features. The training images are proposed to model the geologic scenarios, and each of them suggests specific types of connectivity.
Figure 2.1. Comparison between the balls of group-sparsity and regular sparsity-inducing norms in three dimensions, that is, for $\mathbf{v} = [v_1\ v_2\ v_3]$: (a) $\ell_1/\ell_2$-norm using two groups $[v_1, v_2]$ and $[v_3]$, resulting in $(v_1^2 + v_2^2)^{1/2} + |v_3|$; and (b) $\ell_1$-norm, i.e., $|v_1| + |v_2| + |v_3|$.
Figure 2.2. Comparison between reconstruction results with group-sparsity and $\ell_1$-norm regularization for different numbers of observations: (a) original group-sparse signal; (b1)-(b5) reconstruction results for group-sparsity and regular $\ell_1$-norm sparsity-inducing regularization for 10, 20, 40, 60 and 80 observations, respectively.
Figure 2.3. Reconstruction results for the group-sparse signal in Figure 2.2 for: (a) groups of size 5, obtained by splitting the original groups of size 15 (overlapping case); (b) groups of size 10. In the latter case, the grouping structure is inconsistent with the structure in the reference model.
Figure 2.4. (a) Reconstruction results for the case with 60 measurements (Figure 2.2(b4)) with different regularization parameters; (b) data match for the reconstructed signal with the corresponding regularization parameter in (a).
Figure 2.5. Tree structure of the wavelet basis: (a) the tree structure of the existing basis elements at different scales; (b, top) the position of the coefficients of each basis element, where 0 is the DC basis component in $V_0$ and $(j,i)$ is the coefficient for $\boldsymbol{\phi}_{j,i}$; (b, bottom) three examples of groups that represent variability in the horizontal, diagonal and vertical directions, respectively.
Figure 2.6. SPCA algorithm for grouping the basis elements in the DCT transform. Left: the prior permeability distribution with vertically-oriented channels. Right top: 6 groups out of the total 200 generated groups, with 1, 4, 8, 17, 11, and 17 members, which are the basis functions in groups 1, 2, 3, 100, 155, and 200, respectively (for example, group 155 has 11 bases). Starting from left to right in the first row, and in the same direction in the other rows, the maps between two consecutive blank squares are members of the same group (for example, the 4 basis functions between the first and second blank squares are members of group 2). Right bottom: the corresponding $d_i$'s for the first 100 generated groups.
Figure 2.7. Tomographic inversion in Section 2.4: (a) configuration for tomographic inversion; (b) best achievable maps (in RMSE) for Cases 1, 2, 3 and 4 with the wavelet transform; (c) recoverability in tomographic inversion with the wavelet bases; recovered solutions for Case 1 (c1), Case 2 (c2), Case 3 (c3), and Case 4 (c4) are shown. For each case, the solutions with group-sparsity (top) and regular $\ell_1$-norm minimization (bottom) are shown.
Figure 2.8. Pumping test (a) and waterflooding (b) in Section 2.4: (a1) configuration of the pumping test; (a2) reference transmissivity, the optimal transmissivity map that can be captured by the parameterization, and the reference and initial pressure head fields for the two cases discussed; (b1) configuration of the 9-spot waterflooding, as well as the reference permeability and the optimal one that can be captured by 256 DCT bases; (b2) saturation and pressure profiles at different time steps, as labeled. The pressure and saturation data are obtained from these two profiles at the locations of the injection and production wells.
Figure 2.9. (a) Locations of monitoring (dots) and pumping (square) wells for the 3 cases in Section 2.4 (pumping test), Case 1; reconstructed transmissivity and obtained pressure head fields for (b) group-sparsity, (c) regular $\ell_1$, and (d) first-order Tikhonov regularizations.
Figure 2.10. (a) Prior transmissivity realizations in Section 2.4 (pumping test), Case 2; right: the initial transmissivity field and the obtained initial pressure head.
Figure 2.11. Reconstructed transmissivity and obtained pressure head fields for different levels of noise in Section 2.4 (pumping test), Case 2. Each column represents the results of simulations at a specific level of noise variance, defined based on $\alpha$, for three methods: group-sparsity (a), regular $\ell_1$ (b), and weighted $\ell_1$ (c).
Figure 2.12. (a) Prior permeability realizations in Section 2.4 (waterflooding); (b) the recovered permeability maps with, from top to bottom rows, group-sparsity (b1), regular $\ell_1$ (b2), weighted $\ell_1$ (b3), first-order Tikhonov (b4), and total variation (b5) regularizations, respectively (the columns show the solution at different iterations, as labeled).
Figure 2.13. Water-cut data match and predictions for Producer 3 (a) and Producer 8 (b), based on the solutions obtained from group-sparsity, weighted $\ell_1$, $\ell_1$, first-order Tikhonov, and total variation penalties.
Figure 3.1. (a) Three TIs with meandering (left), intersecting (middle), and straight (right) channel features; (b) four sample realizations simulated from each TI. The TI with the meandering channel is considered with two rotation angles ($\theta = 0^{\circ}$ and $\theta = 90^{\circ}$), while the other two TIs each lead to four alternative scenarios with rotation angles $\theta = 0^{\circ}, 45^{\circ}, 90^{\circ}, 135^{\circ}$; (c) sample TSVD basis elements corresponding to different geologic scenarios, i.e., $\boldsymbol{\Phi} = [\boldsymbol{\Phi}_1\ \boldsymbol{\Phi}_2 \ldots \boldsymbol{\Phi}_{10}]$, and the TSVD basis obtained after combining all prior realizations, i.e., $\overline{\boldsymbol{\Phi}}$.
Figure 3.2. (a) Eigen-spectrum of the prior models for groups $\mathbf{U}_1$ (meandering channel) and $\mathbf{U}_4$ (intersecting channel); (b) corresponding TSVD approximation quality for different levels of energy.
Figure 3.3. Approximation of sample models with TSVD bases from different groups: (a) meandering channel example; (b) intersecting channel example; (c) bar plots of the normalized data misfit (RMSE) and the $\ell_2$-norm of the best approximation coefficients.
Figure 3.4. Best approximate representations of three models using group-sparsity regularization: (a) reference model consistent with group $\mathbf{U}_4$; (b) reference model with features similar to those in groups $\mathbf{U}_3$ and $\mathbf{U}_5$; (c) reference model with completely different geologic features than those in the prior groups.
Figure 3.5. The convergence behavior of group-sparsity during sample iterations, including the misfit term, the regularization terms, the reconstruction coefficients and the spatial maps for three different initializations: Case 1, the initial coefficients are uniformly distributed across the groups and have relatively small values; Case 2, the initial coefficients are uniformly distributed and have relatively large values; Case 3, the initial coefficients are zero for all groups except for an inconsistent group, for which relatively large values are assigned.
Figure 3.6. (a) Tomographic inversion setup; (b) three anisotropic variogram models with specified ranges; (c) four sample realizations (out of 500) from 12 different groups that are obtained by assigning four anisotropy directions $\theta = 0^{\circ}, 45^{\circ}, 90^{\circ}, 135^{\circ}$ to each variogram model. The realizations are generated using the sgsim algorithm.
Figure 3.7. Tomographic inversion results for: (a) the multi-Gaussian example; and (b) the channel example. In each case, the figures on the left show the reference models (first row) and the best approximation maps (assuming full knowledge of all grid cells) with $\boldsymbol{\Phi}$ (second row) and $\overline{\boldsymbol{\Phi}}$ (third row), while the plots in the second column show the reconstructed maps with $\ell_1/\ell_2$-$\boldsymbol{\Phi}$ (top), $\ell_1$-$\boldsymbol{\Phi}$ (middle) and $\ell_1$-$\overline{\boldsymbol{\Phi}}$ (bottom); the corresponding reconstructed coefficients are shown in the rightmost column.
Figure 3.8. Effect of data content and model complexity in the tomographic inversion example; two different meandering channels are used as reference slowness maps to illustrate the effect of the data acquisition configuration (when two channels are intersected by the same ray path). The example in (a) shows the limitation of the TSVD basis for parameterization of complex meandering channels in ill-posed problems.
Figure 3.9. Model calibration results for a pumping test experiment: (a) 2D configuration of the pumping test; (b) the reference transmissivity field and the best achievable approximation of it (top), and the corresponding reference and initial pressure head fields (bottom); (c) transmissivity fields and pressure head after calibration using group-sparsity ($\ell_1/\ell_2$-norm) and regular $\ell_1$-norm sparsity-inducing reconstruction.
Figure 3.10. The behavior of the group-sparsity reconstruction during model calibration in the 2D pumping test: the first eight rows show the reconstructed coefficients (left) with their corresponding spatial maps (middle) and group contributions (right) for selected iterations. The last row displays the evolution of the misfit term, the regularization term, and the overall objective function.
Figure 3.11. Reconstruction results for the 3D pumping test: (a) 3D aquifer configuration for the pumping test; (b) the reference transmissivity field and the best achievable approximation (assuming perfect knowledge of the field), as well as the reference and initial pressure head fields; (c) reconstruction results for the transmissivity and pressure head fields with group-sparsity ($\ell_1/\ell_2$-norm) and regular sparsity ($\ell_1$-norm) regularization.
Figure 3.12. Two sample realizations (top and bottom) from prior scenarios 1, 2, 16, and 27 for the channel-type 3D transmissivity field in Case 5. The training images TI$_1$, TI$_2$ and TI$_3$ represent meandering, intersecting and straight channels, respectively (as depicted in Figure 3.1(a)).
Figure 3.13. Reference transmissivity and pressure field for the 3D pumping test with channel-type connectivity. The locations of the monitoring and pumping wells are shown with black dots and a red star, respectively.
Figure 3.14. Reconstruction results: (a) evolution of the transmissivity field with iterations (each row shows the results for layers 1-3, 4-6 and 7-9); (b) evolution of the $\ell_2$-norm of the coefficients corresponding to the 27 prior groups (geologic scenarios) with iterations; (c) the pressure head field predicted from the final transmissivity field; (d) the initial and final active coefficients.
Figure 4.1. (a) A discrete regularization function; (b)-(c) sample discrete regularization functions for three-valued and four-valued facies; (d) the behavior of alternative discrete regularization functions in the neighborhood of the $i$th discrete level.
Figure 4.2. The behavior of the discrete penalty function $(\theta_i - u_i)^2 + \beta u_i^2 (u_i - 1)^2$, $u_i \in [0,1]$, for different values of $\beta$ and input argument $\theta_i$. In each plot, the x and y axes show the $u_i$ values and the value of the objective function, respectively. As the value of $\beta$ decreases, the minimum of this function gets closer to the discrete values 0 or 1, depending on the value of the continuous variable $\theta_i$.
Figure 4.3. Travel-time tomography example: (a) transmitter/receiver setup; (b) reference three-facies model; (c) initial model realizations generated using sequential indicator simulation; and (d) the corresponding $k$-SVD dictionary atoms learned from the realizations (with $k = 1000$, $S = 30$).
Figure 4.4. Travel-time tomography reconstruction results with different discrete regularization function forms: (a)-(c) show examples in which more weight is given to identification of shale, sand, and both shale and sand facies types, respectively, as discrete values; the results in (d) are for the locally shifted Tukey's bi-weight function and are quite similar to the results in (c), suggesting a similar effect for the two regularization forms.
Figure 4.5. Well configuration (a), reference facies model (b), and its corresponding pressure head map (c) for Example 2; (d) and (e) show sample prior (training) models (left) used for constructing SVD bases (middle) and $k$-SVD (right) sparse dictionaries, without and with conditioning on hard data, respectively.
Figure 4.6. Calibration results for Example 2 without (a1)-(d1) and with (a2)-(d2) hard data conditioning.
Columns (1)-(4) contain the continuous and discrete solutions using SVD, SVD+discrete regularization, $k$-SVD, and $k$-SVD+discrete regularization, respectively.
Figure 4.7. Data match and forecasts for pressure head at two sample monitoring wells (left and middle), as well as the water extraction rate at the pumping well (right), in Example 2 without (a) and with (b) hard data conditioning.
Figure 4.8. Well configuration for Example 3: (a) locations of the extraction (pumping) and monitoring wells; (b) reference hydraulic conductivity model; (c) mean map of the prior (training) hydraulic conductivity models conditioned on hard data at well locations (both extraction and monitoring wells).
Figure 4.9. Low-rank representation error measures: (a) RMSE; (b) normalized RMSE and computation time; (c) prior (training) hydraulic conductivity models used for constructing sparse dictionaries with hard-data conditioning for Example 3; SVD elements with ranks 1-9 (d) and 492-500 (e); (f)-(k) depict example $k$-SVD dictionary atoms using rank-$T$ prior model representations followed by $k$-SVD dictionary learning for different $T$, $k$, and $S$ values: (f) $T=500$, $k=500$, $S=40$; (g) $T=500$, $k=500$, $S=60$; (h) $T=500$, $k=500$, $S=100$; (i) $T=500$, $k=1000$, $S=40$; (j) $T=500$, $k=1000$, $S=60$; (k) $T=500$, $k=1000$, $S=100$.
Figure 4.10. Calibrated hydraulic conductivity models for Example 3 (with hard-data conditioning) using: (a) truncated SVD parameterization; (b) truncated SVD parameterization with discrete regularization; (c) $k$-SVD based sparse reconstruction followed by thresholding; (d) $k$-SVD based sparse reconstruction with discrete regularization. (a1)-(d1) show the continuous solutions; (a2)-(d2) display the corresponding discrete solutions ((a2) and (c2) show discrete solutions after thresholding); (a3)-(d3) depict the estimation error (difference between the reference model and the solutions obtained in each case).
Figure 5.1. The schematic of mapping a parameterized solution onto the feasible set $\boldsymbol{\Omega}_{\mathbf{p}}$. Mapping $\boldsymbol{\Phi}\mathbf{v}^k$ onto the feasible set $\boldsymbol{\Omega}_{\mathbf{p}}$ results in $\mathbf{u}^k$.
Figure 5.2. Illustration of the mapping operator using a supervised learning approach. The pairs $(\mathbf{r}_i, \mathbf{u}_i)$ are the learning samples, where $\mathbf{u}_i$ is the mapping of $\mathbf{r}_i$ onto $\boldsymbol{\Omega}_{\mathbf{p}}$, which are used to learn the mapping operator. The learned mapping operator is then applied to $\boldsymbol{\Phi}\mathbf{v}^k$ to obtain $\mathbf{u}^k$.
Figure 5.3. Construction of the learning data pairs $(\mathbf{r}_i, \mathbf{u}_i)$ from the training image: the feasible models are generated from the training image (shown on the left), while the corresponding parameterized samples $\mathbf{r}_i$ are PCA approximations of the feasible samples. The rank of the PCA parameterization is the same for the training data and Step (i) of the inversion approach.
Figure 5.4. Illustration of the neighborhood templates (left) and the feature/label vectors (right). The feature space is defined by the parameterized representation of the patterns inside the templates, while the label vectors correspond to the feasible patterns.
Figure 5.5. The schematic of the $k$-NN classifier, which is used to replace parameterized patterns with their corresponding feasible samples in the feasible set.
For each cell in the model $\boldsymbol{\Phi}\mathbf{v}^k$, a local template is used to extract pattern features and identify its $k$ closest feature vectors and their corresponding labels in the learning dataset.
Figure 5.6. Implementation of the mapping method with aggregation in the second step of the algorithm. The aggregation step stores the entire spatial labels (instead of the center cells), thereby including a significantly larger number of sample labels for each cell and extending the spatial connectivity beyond a single template size. In this example, the samples for the cell indicated with a black circle are generated by considering the labels identified for all the templates that include this cell. Without aggregation, only the $k$ labels corresponding to the template centered at this cell location would be used. The plot on the right shows the generated sample conditional PDF for this cell.
Figure 5.7. Results of mapping with discrete thresholding (top right), mapping onto the feasible set without aggregation (middle), and with aggregation (bottom). The results are shown for template sizes $r = 5$ and $r = 25$ and $k$ values (in the $k$-NN classifier) of $k = 1$ and $k = 5$.
Figure 5.8. Spatially uniform and full samples within the templates (a), and random subsampling (b) to speed up the implementation of the learning approach. The mapping results (classification + aggregation) for different subsampling rates are shown in (c). In this case, 1000 feature vectors are used in the learning dataset, and the neighborhood templates are squares of size 50×50. The value of $k$ in the $k$-NN classifier is set to 1.
Figure 5.9. Straight-ray travel time tomographic inversion example: (a) source-receiver configuration; (b) reference slowness map; (c) best parameterized approximation of the reference model with 40 leading PCA basis functions; (d) the initial guess for model calibration, which is the mean of the prior realizations.
Figure 5.10. (a) Prior training image for a meandering channel; (b) four corresponding sample realizations from the resulting feasible set $\boldsymbol{\Omega}_{\mathbf{p}}$; and (c) the first 49 leading PCA basis functions generated from the prior realizations in $\boldsymbol{\Omega}_{\mathbf{p}}$.
Figure 5.11. Tomographic inversion iterations, showing the parameterized solution $\boldsymbol{\Phi}\mathbf{v}$ in Step (i) (top) and the feasible solution $\mathbf{u}$ obtained in Step (ii) (bottom).
Figure 5.12. Tomographic inversion with within-facies variability: (a) reference slowness map; (b) best parameterized approximation of the reference model with 100 leading PCA basis functions; (c) the initial guess for model calibration, which is the mean of the prior realizations; (d) histogram of cell values in the reference slowness map; and (e) 4 sample prior model realizations.
Figure 5.13. Tomographic inversion with within-facies variability: inversion iterations showing the parameterized solution $\boldsymbol{\Phi}\mathbf{v}$ in Step (i) (left) and the feasible solution $\mathbf{u}$ obtained in Step (ii) (right).
Figure 5.14. Experimental setup for the constant-rate pumping test: (a) pumping and monitoring well configuration; (b) reference discrete hydraulic conductivity map; (c) best rank-40 continuous PCA approximation of the reference model; (d) the initial model of hydraulic conductivity in the inversion; (e)-(f) the pressure maps corresponding to the reference and initial hydraulic conductivity models.
Figure 5.15. (a) Prior training image for intersecting fluvial channels; (b) four sample realizations of the facies model in the training data; and (c) 49 leading PCA basis functions generated from the training samples in $\boldsymbol{\Omega}_{\mathbf{p}}$.
Figure 5.16. Model calibration results for the pumping test using the proposed machine-learning-based solution approach. The parameterized solution $\boldsymbol{\Phi}\mathbf{v}$ (top) and discrete solution $\mathbf{u}$ (bottom) are shown for 10 iterations. The feasible solution is obtained by mapping the parameterized solution onto the feasible set at each iteration.
Figure 5.17. Model calibration results for the pumping test using thresholding to obtain feasible solutions. The parameterized solution $\boldsymbol{\Phi}\mathbf{v}$ (top) and feasible solution $\mathbf{u}$ (bottom) are shown for 10 iterations. The feasible solution is obtained by using a thresholding scheme to replace continuous values with their closest discrete level values.
Figure 5.18. Experimental setup for waterflooding: (a) well configuration; (b) reference permeability field (shown for layers 1-4 and 5-8); (c) evolution of the non-wetting (e.g., hydrocarbon) phase saturation distribution; and (d) evolution of the pressure distribution.
Figure 5.19. (a) Prior model realizations; (b) 12 leading PCA basis images; and (c) rank-50 PCA approximation of the reference model.
Figure 5.20. Model calibration results for Example 3, involving two-phase flow in a 3D formation. The two columns on the left show the results for layers 1-4, and the two columns on the right display the results for layers 5-8.
Figure 5.21. Data match and model predictions for the waterflooding example.
Figure 6.1. Experimental setup for the constant-rate pumping test: (a) pumping and monitoring well configuration; (b) reference discrete hydraulic conductivity map; (c) best PCA approximation of the reference model using the basis functions of the true geologic scenario; (d) the initial model of hydraulic conductivity in the inversion; (e)-(f) the pressure maps corresponding to the reference and initial hydraulic conductivity models.
Figure 6.2. (a) Three TIs with meandering (left), intersecting (middle), and straight (right) channel features; (b) four sample realizations simulated from each TI. The TI with the meandering channel is considered with two rotation angles ($\theta = 0^{\circ}$ and $\theta = 90^{\circ}$), while the other two TIs each lead to four alternative scenarios with rotation angles $\theta = 0^{\circ}, 45^{\circ}, 90^{\circ}, 135^{\circ}$; (c) sample TSVD basis elements corresponding to different geologic scenarios, i.e., $\boldsymbol{\Phi} = [\boldsymbol{\Phi}_1\ \boldsymbol{\Phi}_2 \ldots \boldsymbol{\Phi}_{10}]$; and (d) initial coefficients, $\mathbf{v}_0$.
Figure 6.3. Model calibration results for the pumping test using the proposed algorithm: the parameterized solution $\boldsymbol{\Phi}\mathbf{v}$ (a) and feasible solution $\mathbf{u}$ (b) are shown for 25 iterations (the results are shown for iterations 0-9, 12, 15, 18, 22 and 25); (c) final coefficients $\mathbf{v}_{25}$; and (d) the pressure obtained from the final feasible solution $\mathbf{u}_{25}$.
Figure 6.4. Selected geologic scenarios based on their contribution to representing the parameterized solution: here, similar to Chapter 3, the $\ell_2$-norm of the coefficients within each group, i.e., $\|\mathbf{v}_i\|_2$, is calculated as the measure of activeness of each geologic scenario. The top and middle rows correspond to iterations 0-9, and the bottom row corresponds to iterations 12, 15, 18, 22 and 25.
Figure 6.5. Effect of refining the learning dataset on the mapping step: (a) selected geologic scenarios based on their contribution to the representation of the parameterized solution; (b) the parameterized solution in iteration 3 and a sample feature vector constructed from the depicted spatial template; and (c) selected feature vectors and their corresponding label vectors based on the entire ($\boldsymbol{\Omega}_{\mathbf{p}}$, left) and refined ($\boldsymbol{\Omega}_{\mathbf{p}}^{\text{refined}}$, right) learning datasets.
Figure 6.6. The result of mapping and the obtained empirical probability maps for the two cases in which the learning dataset is constructed based on (i) $\boldsymbol{\Omega}_{\mathbf{p}} = \{\boldsymbol{\Omega}_{\mathbf{p}}^{1}, \ldots, \boldsymbol{\Omega}_{\mathbf{p}}^{10}\}$ (top), and (ii) the refined geologic scenarios $\boldsymbol{\Omega}_{\mathbf{p}}^{\text{refined}}$ (bottom).
Figure 6.7. Tomographic inversion setup: (a) configuration of the tomographic inversion; (b) reference slowness map; (c) best achievable solution through the parameterization; and (d) initial starting model $\boldsymbol{\Phi}\mathbf{v}_0$.
Figure 6.8. Tomographic inversion example: (a) parameterized ($\boldsymbol{\Phi}\mathbf{v}^k$) and (b) feasible ($\mathbf{u}^k$) solutions in iterations $k = 0, \ldots, 9$.
Figure 6.9. (a) Coefficients of the parameterization, i.e., $\mathbf{v}$, in iterations 0, 1, 2, 7 and 9 (from top to bottom); (b) $\ell_2$-norm of the coefficients within the 10 geologic scenarios during the same iterations.
Figure 6.10. Experimental setup for the constant-rate pumping test: (a) pumping and monitoring well configuration; (b) reference hydraulic conductivity map, as well as the locations of the pumping (green stars) and monitoring (black dots) wells; (c) pressure map corresponding to the reference hydraulic conductivity map; (d) best achievable solution using the parameterizing basis functions; (e) the initial model of hydraulic conductivity in the inversion; and (f) the pressure map corresponding to the initial hydraulic conductivity model.
Figure 6.11. (a) The proposed training image to model the connectivity patterns; and (b) 4 geologic scenarios initiated based on uncertainty in the direction of continuity.
Figure 6.12. Inversion results in iterations 1-5, 10, 15, 25, 30 and 40: (a) parameterized solution ($\boldsymbol{\Phi}\mathbf{v}^k$); and (b) feasible solution ($\mathbf{u}^k$).
Figure 6.13. Coefficients of the parameterization, i.e., $\mathbf{v} = [\mathbf{v}_1\ \mathbf{v}_2\ \mathbf{v}_3\ \mathbf{v}_4]$: (a) $\mathbf{v}_0$ and (b) $\mathbf{v}_{40}$; (c) $\ell_2$-norm of the coefficients within the 4 geologic scenarios during iterations 1-5, 10, 15, 25, 30 and 40.

CHAPTER 1
INTRODUCTION

1.1. PDE-Constrained Inverse Problems

Inverse problems are frequently encountered in many areas of science and engineering, where observations are used to estimate the parameters of a system (or a model). In several practical applications, the dynamic processes that take place in a physical system are described using a set of partial differential equations (PDEs), which are typically nonlinear and coupled. The inverse problems that arise in such dynamical systems ought to be constrained to honour the governing PDEs.
The observable responses of these dynamical systems can usually be described as a function of their state variables, which in turn depend on the model inputs, including controls, initial/boundary conditions, and parameters. In general, the functional relation between model input parameters and observable responses can be expressed as a (typically nonlinear) mapping that involves the solution of the underlying PDEs. Examples of such physical systems include fluid flow and heat transfer processes [Patankar, 1980], electromagnetic systems [Peterson et al., 1998], the motion of planets in the solar system [Murray and Dermott, 1999], and the human neural system [Kandel et al., 2000]. The exponential increase in computing power has enabled considerable advances in the numerical simulation of complex processes in large-scale physical systems governed by high-dimensional PDEs. Advances in computing power have also led to the development of powerful inverse modelling algorithms, some of which may require thousands of forward model simulations, which was once considered infeasible for such systems.

The parameters that appear in the governing PDEs of physical systems are either directly observable or need to be inferred from indirect and often limited observable quantities (outputs) of the system [Tarantola and Valette, 1982; Zimmerman et al., 1998; Vogel, 2002; Oliver and Chen, 2011; Liu and Kitanidis, 2011]. In some cases, a spatially distributed physical property may only be directly observable at a finite number of points in space (typically known as hard data), requiring spatial interpolation techniques (e.g., kriging methods) to predict the unobserved parameter values. In general, estimation of model parameters from limited output measurements of the system leads to an inference framework known as an inverse problem [Tarantola, 2005; Mueller and Siltanen, 2012]. In many cases, the inverse modelling formulation involves a minimization problem in which the objective function represents the mismatch between model-predicted and observed data, as well as other terms that penalize departure from prior (explicit or implicit) knowledge about the solution. When the system outputs depend on the solution of PDEs that establish physical laws (e.g., mass/momentum/energy balance), the resulting inverse problem needs to include the PDEs as constraints, thus leading to a PDE-constrained inverse problem. Including the PDE constraints ensures that the solution of the resulting inverse problem honours the underlying governing equations (i.e., well-established physical laws such as mass/momentum conservation).

Inverse problems that arise in many practical applications are ill-posed in the sense of Hadamard: a solution consistent with the measured data may not exist, the solution may not be unique, or the solution may be unstable under small disturbances in the data. When there are fewer measurements from a system than there are unknown model parameters, a situation that is commonly encountered in practice, the problem is "underdetermined" and cannot have a unique solution. Additional (prior) information is often needed to constrain the solution by eliminating implausible outcomes.
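To make this non-uniqueness concrete, the following minimal sketch (a hypothetical 2-by-5 linear system, not one of the examples in this thesis) constructs two distinct parameter vectors that reproduce the same observations exactly:

```python
import numpy as np

# Underdetermined linear inverse problem d = G u: 5 unknowns, 2 measurements.
rng = np.random.default_rng(0)
G = rng.standard_normal((2, 5))          # forward operator (m = 2 < n = 5)
u_true = rng.standard_normal(5)          # "true" parameters
d_obs = G @ u_true                       # noise-free observations

# Minimum-norm solution via the pseudo-inverse.
u_min = np.linalg.pinv(G) @ d_obs

# Adding any null-space vector of G leaves the predicted data unchanged.
_, _, Vh = np.linalg.svd(G)              # rows 2..4 of Vh span null(G)
u_alt = u_min + 3.0 * Vh[2]

print(np.allclose(G @ u_min, d_obs))     # True
print(np.allclose(G @ u_alt, d_obs))     # True: a different, equally valid solution
```

Any regularization or prior information must therefore select among the infinitely many candidates that differ only along the null space of the forward operator.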
A common approach to addressing solution non-uniqueness is to adopt a probabilistic (Bayesian) inverse modeling framework [Gavalas et al., 1976; Gómez-Hernández et al., 1997; Tarantola, 2005; Aanonsen et al., 2009; Lochbühler et al., 2015], where the elements of the inverse problem (parameters, data, and forward model) are represented with their respective uncertainties, using probability density functions (PDFs). In this thesis, we mainly focus on deterministic inverse problems.

To formulate a general inverse problem, consider collecting the observations of a physical system in a vector $\mathbf{d}_{\text{obs}} \in \mathbf{R}^{m\times 1}$. These observations are related to the parameters of the system through a (generally nonlinear) mapping, i.e., $\mathbf{d}_{\text{obs}} = \mathbf{g}(u_1, u_2, \ldots, u_n)$. Here, $u_1, u_2, \ldots, u_n$ are the parameters of the system, and $\mathbf{g}(\cdot)$ is the nonlinear function that maps the parameter space onto the observation space. We assume that the observations $\mathbf{d}_{\text{obs}}$ and the parameters $\mathbf{u}$ are vectors in $\mathbf{R}^{m\times 1}$ and $\mathbf{R}^{n\times 1}$, respectively.

Physics Laws in Subsurface Environments: Fluid flow and transport in underground (subsurface) porous rock formations play a key role in developing the related energy and water resources of these systems. Mathematical modelling of the underlying physical processes is commonly used to predict the response of these systems to perturbations (forcing) introduced during resource development (extraction or injection of fluids). The description of the physical processes that take place in these systems leads to high-dimensional and coupled nonlinear PDEs, which include various rock properties as unknown parameters. It is common to formulate inverse problems to estimate the unknown parameters of these PDEs from observations of the dynamical response of these systems.

An important example of PDE-constrained inverse problems involves the multi-phase flow equations in subsurface environments. The spatiotemporal evolution of multi-phase fluid flow can be expressed as a special form of the Navier-Stokes equations [Chorin, 1968; Constantin and Foias, 1988]. Conservation of mass, momentum and energy are the three fundamental principles in the Navier-Stokes equations, which yield the following PDEs, respectively:

$$\frac{\partial \rho}{\partial t} + \nabla\cdot(\rho\mathbf{v}) = 0$$
$$\frac{\partial \mathbf{v}}{\partial t} + (\mathbf{v}\cdot\boldsymbol{\nabla})\mathbf{v} = -\frac{1}{\rho}\boldsymbol{\nabla}P + \mathbf{F} + \frac{\mu}{\rho}\boldsymbol{\nabla}^2\mathbf{v} \qquad (1.1)$$
$$\rho\left(\frac{\partial E}{\partial t} + \mathbf{v}\cdot\boldsymbol{\nabla}E\right) - \boldsymbol{\nabla}\cdot(K\,\boldsymbol{\nabla}T) + \rho P\,\nabla\cdot\mathbf{v} = 0,$$

where $\mathbf{v}$, $E$, $P$, $T$, $\rho$, $\mu$, $K$ and $\mathbf{F}$ correspond to velocity, internal thermodynamic energy, pressure, temperature, density, viscosity, heat conduction coefficient and external force per unit mass, respectively (see Table 1.1 for definitions).

Table 1.1. A summary of physical properties and their definitions.
Phase mobility: the ratio of effective permeability to phase viscosity.
Phase density: the density of the fluids, i.e., oil or water.
Formation volume factor: the volume of the phase at the in-situ pressure relative to its volume at standard surface conditions.
Permeability: the ability of fluids (gas or liquid) to flow through porous rock.
Porosity: the ratio of void space to total rock volume.
Phase saturation: the ratio of pore volume occupied by a specific fluid phase.
Flux: the flow rate per unit area.
Pore volume: the total void volume of the reservoir.
Wetting phase: the phase with more tendency to maintain contact with the solid surface.
A special case of the Navier-Stokes equations is an incompressible and immiscible two-phase flow system, for which the governing PDEs can be expressed by combining mass balance and Darcy's law (representing the momentum balance) [Chen and Doolen, 1998; Efendiev et al., 2000] as:

$$\boldsymbol{\nabla}\cdot\left[\lambda_w \mathbf{u}\,(\boldsymbol{\nabla}P_w - \gamma_w \boldsymbol{\nabla}Z)\right] = \frac{\partial}{\partial t}\left(\frac{\phi S_w}{B_w}\right) + q_w$$
$$\boldsymbol{\nabla}\cdot\left[\lambda_n \mathbf{u}\,(\boldsymbol{\nabla}P_n - \gamma_n \boldsymbol{\nabla}Z)\right] = \frac{\partial}{\partial t}\left(\frac{\phi S_n}{B_n}\right) + q_n \qquad (1.2)$$

In the above equations, the subscripts $w$ and $n$ represent the wetting and non-wetting phases, and $\lambda$, $\gamma$, $B$, $\mathbf{u}$, $\phi$, $Z$, $S$ and $q$ correspond to the phase mobility, phase density, formation volume factor, intrinsic rock permeability, rock porosity, gravity potential, phase saturation and flux, respectively (see Table 1.1 for definitions). The governing PDEs in Equation (1.2) involve four unknown dynamic state variables: $P_w$, $S_w$, $P_n$ and $S_n$. Two additional equations are needed to close the PDE system. These are the constitutive equations on the pressures and saturations and are typically expressed as:

$$P_n - P_w = P_c(S_w) \qquad (1.3)$$
$$S_w + S_n = 1, \qquad 0 \le S_w, S_n \le 1 \qquad (1.4)$$

The first equation describes the capillary pressure (the difference between the non-wetting and wetting phase pressures) as a function of the wetting phase saturation, while the second imposes a physical constraint on the saturations of the two phases in a fully saturated medium. With specified rock and fluid properties, initial and boundary conditions, and other input parameters and control forcing, the coupled PDE system can be discretized and solved numerically. In practice, the resulting discretized system can be high-dimensional (~10^6-10^7 unknowns) and computationally demanding to solve.

A simple example of immiscible two-phase flow is waterflooding of oil reservoirs, as depicted in Figure 1.1. Figure 1.1(a) shows a two-dimensional ($1000\times1000\,\text{m}^2$) reservoir, which is discretized into $100\times100$ cells of the same size. A series of injection wells is placed on the left side of the domain to displace the hydrocarbons toward a similar array of production wells placed on the right side.

Figure 1.1. A two-phase flow system (waterflooding): (a) schematic of a reservoir with injection (production) wells on the left (right) side of the domain; (b) the intrinsic permeability distribution in the reference model consisting of high-permeability fluvial channels (red) and low-permeability background shale (blue); snapshots of pressure (c) and saturation (d) profiles after 10, 20, and 30 months.

In this example, the capillary pressure is set to zero everywhere, that is, $P_n(x,t) = P_w(x,t)$. Figure 1.1(b) depicts the intrinsic permeability distribution for this model, which shows a fluvial channel system with high-permeability (red) channels embedded in a low-permeability (blue) background shale. As shown in the saturation plots of Figure 1.1(d), fluids move faster in the high-permeability channel sections. Figures 1.1(c) and 1.1(d) display the solution of the PDE system as snapshots of the pressure and saturation ($S_w$) fields at different times within the first 30 months of the simulation.

Waterflooding is used as a secondary recovery mechanism, following the natural depletion of reservoir oil (due to high in-situ pressure), to maintain a high reservoir pressure and prevent gas from leaving the live oil. In the configuration of Figure 1.1(a), the production wells (on the right) produce water and oil, and the injection wells (on the left) inject water into the reservoir. Initially, the reservoir is fully saturated with the non-wetting phase (oil). Water injection displaces the oil from the left side toward the production wells on the right side, where the mixture of oil and water is extracted.
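As a small illustration of how the closure relations in Equations (1.3)-(1.4) are applied cell by cell, the sketch below recovers the non-wetting-phase quantities from the wetting-phase ones. The Brooks-Corey-type capillary-pressure curve and all parameter values here are hypothetical placeholders, not the properties used in this work:

```python
import numpy as np

def capillary_pressure(S_w, p_entry=1.0e4, lam=2.0, S_wr=0.1):
    """Hypothetical Brooks-Corey-type curve: P_c = p_entry * S_eff^(-1/lam)."""
    S_eff = np.clip((S_w - S_wr) / (1.0 - S_wr), 1e-6, 1.0)  # normalized saturation
    return p_entry * S_eff ** (-1.0 / lam)

S_w = np.array([0.2, 0.5, 0.9])          # wetting-phase saturation in three cells
P_w = np.full(3, 2.0e7)                  # wetting-phase pressure [Pa]

S_n = 1.0 - S_w                          # Eq. (1.4): saturations sum to one
P_n = P_w + capillary_pressure(S_w)      # Eq. (1.3): P_n - P_w = P_c(S_w)
print(S_n, P_n)
```

In the waterflooding example above, the capillary pressure is set to zero, so the two phase pressures coincide.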
"Straight-ray travel-time tomography" and "pumping tests in single-phase flow systems" are other examples of physics-based approaches to subsurface modeling. In travel-time tomography, several sources are used to transmit acoustic waves through a medium. The arrival times of these waves are recorded by receivers at known distances from the transmitters. The governing equation relating the recorded arrival times to the slowness (the unknown parameters) of the medium can be derived as:

$$t - \int s(\vec{x})\,d\vec{x} = 0, \qquad s(\vec{x}) = \frac{1}{v(\vec{x})} \qquad (1.5)$$

where $t$ denotes the arrival time of each wave, and $s(\vec{x})$ represents the slowness (1/velocity) of the rock at the spatial locations specified by the coordinate vector $\vec{x}$. Under simplified conditions, a discretized version of Equation (1.5) can be used to form a linear system of equations between the recorded arrival times (i.e., observations) and the discretized rock slowness map (i.e., parameters), i.e., $\mathbf{d}_{\text{obs}} = \mathbf{G}\mathbf{u}$, where $\mathbf{d}_{\text{obs}}$ represents the observed data, and $\mathbf{u}$ is the parameter of interest (here, slowness).

In groundwater pumping tests, the observations, typically hydraulic conductivity and pressure samples at the locations of monitoring wells, are used to estimate the full map of the hydraulic conductivity (or, alternatively, transmissivity) distribution in an aquifer. The governing equations of the pumping test problem are derived from the classical equations of single-phase flow in a saturated porous medium, i.e., conservation of mass and Darcy's law. Assuming that mass fluxes due to dispersion and diffusion are negligible, the underlying equations can be stated as:

$$\frac{\partial(\phi\rho)}{\partial t} = -\nabla\cdot(\rho\mathbf{v}) + q$$
$$\mathbf{v} = -\frac{1}{\mu}\,\mathbf{K}\,(\nabla P - \rho g \nabla Z) \qquad (1.6)$$

where $\phi$, $\rho$, $\mathbf{v}$, $P$ and $q$ represent porosity, density, velocity, pressure and source/sink terms (injection/extraction rates), respectively. The notations $\mathbf{K}$ and $\mu$ refer to permeability and viscosity, respectively. The hydraulic conductivity is defined as $\mathbf{K}_h = \mathbf{K}\frac{\rho g}{\mu}$ and is used more commonly in hydrogeology. For pumping test problems, the forward model $\mathbf{d}_{\text{obs}} = \mathbf{g}(\mathbf{u} = \mathbf{K}_h)$ relates the full map of hydraulic conductivity, $\mathbf{u} = \mathbf{K}_h$, to the observed pressure head values at well locations, $\mathbf{d}_{\text{obs}}$.

The forward equations described above, i.e., $\mathbf{g}(\cdot)$ or $\mathbf{G}$, are used to predict the spatiotemporal evolution of the dynamical states of the system for a given set of input parameters and controls. The state variables of the system (pressure and saturation distributions) are only observable through indirect measurements (e.g., flow rates and pressures) at scattered well locations. The related inverse problem can then be posed to find the unknown parameters of the system (e.g., rock flow properties) from these limited, indirect, and nonlinear measurements.
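A minimal sketch of the discretized travel-time relation $\mathbf{d}_{\text{obs}} = \mathbf{G}\mathbf{u}$ is shown below for an idealized acquisition geometry with one horizontal ray per grid row; the grid dimensions, cell size, and slowness values are illustrative and not taken from the experiments in this thesis:

```python
import numpy as np

nx, ny = 10, 10                          # grid rows and columns
h = 1.0                                  # cell size [m]
u = np.full(nx * ny, 0.5)                # flattened slowness map [s/m]
u[42:47] = 2.0                           # a slow anomaly inside row 4

# A horizontal ray through row i crosses each of that row's ny cells over a
# length h, so row i of G has the value h in those columns and 0 elsewhere.
G = np.zeros((nx, nx * ny))
for i in range(nx):
    G[i, i * ny:(i + 1) * ny] = h

d_obs = G @ u                            # predicted arrival times [s]
print(d_obs)                             # ray 4 is delayed by the anomaly
```

In realistic configurations, each row of $\mathbf{G}$ instead holds the lengths of the intersections between one (generally oblique) ray path and the grid cells it crosses.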
The simplest form of inverse problem is obtained when observations and model parameters are related linearly, i.e., 𝐝_obs = 𝐆𝐮 + 𝛜. Here, 𝐮 is the parameter of interest, 𝐆 is the linear mapping from parameter space to observation space, and 𝛜 is the observation noise, which is usually considered to be independent of the parameters 𝐮. In the linear case, the inverse problem in Equation (1.8) is expressed as:

$$\min_{\mathbf{u}} \ \| \mathbf{G}\mathbf{u} - \mathbf{d}_{obs} \|_2^2 \qquad \text{s.t.} \quad \mathbf{u} \in U \qquad (1.9)$$

with a simple quadratic objective function. For noisy data in practical applications, the least-squares term in Equation (1.9) is generalized to $\| \mathbf{C}_{\boldsymbol\epsilon}^{-1/2} (\mathbf{G}\mathbf{u} - \mathbf{d}_{obs}) \|_2^2$, where 𝛜 represents the measurement noise vector with a (usually diagonal) noise covariance matrix 𝐂_𝛜. For ill-posed linear inverse problems, the formulation often takes the form:

$$\min_{\mathbf{u}} \ J(\mathbf{u}) \qquad \text{s.t.} \quad \| \mathbf{C}_{\boldsymbol\epsilon}^{-1/2} (\mathbf{G}\mathbf{u} - \mathbf{d}_{obs}) \|_2^2 \le \sigma^2 \qquad (1.10a)$$
$$\min_{\mathbf{u}} \ J(\mathbf{u}) + \frac{1}{\lambda^2} \left( \| \mathbf{C}_{\boldsymbol\epsilon}^{-1/2} (\mathbf{G}\mathbf{u} - \mathbf{d}_{obs}) \|_2^2 - \sigma^2 \right) \qquad (1.10b)$$
$$\min_{\mathbf{u}} \ \| \mathbf{C}_{\boldsymbol\epsilon}^{-1/2} (\mathbf{G}\mathbf{u} - \mathbf{d}_{obs}) \|_2^2 + \lambda^2 J(\mathbf{u}) \qquad (1.10c)$$

In Equation (1.10a), the constraint $\| \mathbf{C}_{\boldsymbol\epsilon}^{-1/2} (\mathbf{G}\mathbf{u} - \mathbf{d}_{obs}) \|_2^2 \le \sigma^2$ is added to the objective function by the penalty method, and the resulting Equation (1.10b) is rewritten as Equation (1.10c) after multiplying the objective function by $\lambda^2$ (and dropping the constant term $\sigma^2$). In Equation (1.10), $J(\mathbf{u})$ is a function that restricts (regularizes) the behaviour/structure of 𝐮, and $\sigma^2$ is a bound on the observation error. For example, if 𝐮₀ is a prior belief about the parameter 𝐮, minimizing $J(\mathbf{u}) = \|\mathbf{u} - \mathbf{u}_0\|_2^2$ results in a solution with minimum departure from 𝐮₀ [Tarantola, 2005]. A classic example of a regularization function is the Tikhonov form [Tikhonov and Arsenin, 1974], in which $J(\mathbf{u})$ is defined as the squared 𝑙₂-norm of the first or second derivatives of the parameters (to promote solution smoothness or flatness, respectively). It is also important to note that the regularization parameter 𝜆 is not trivial to specify. For linear problems, cross-validation [Golub et al., 1979] and 𝐿-curve [Hansen, 1992] methods have been proposed for finding an optimal value of the regularization parameter.
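As a minimal numerical sketch of Equation (1.10c) with $J(\mathbf{u}) = \|\mathbf{u}-\mathbf{u}_0\|_2^2$, the solution can be computed in closed form from the normal equations; the problem sizes, the prior 𝐮₀, and the value of λ² below are illustrative placeholders (in practice λ² would be selected by cross-validation or the L-curve).

```python
import numpy as np

# Sketch of the regularized linear inversion in Equation (1.10c) with a
# Tikhonov penalty J(u) = ||u - u0||_2^2. All names/sizes are illustrative.
np.random.seed(1)
n, m = 50, 15                       # parameters, measurements (m << n: ill-posed)
G = np.random.randn(m, n)
u_true = np.linspace(0.0, 1.0, n)
sigma = 0.05
d_obs = G @ u_true + sigma * np.random.randn(m)
C_inv = np.eye(m) / sigma**2        # inverse noise covariance C_eps^-1

u0 = 0.5 * np.ones(n)               # prior belief about the parameters
lam2 = 1.0                          # regularization weight (placeholder value)

# Normal equations of ||C^-1/2 (G u - d)||^2 + lam2 ||u - u0||^2:
A = G.T @ C_inv @ G + lam2 * np.eye(n)
b = G.T @ C_inv @ d_obs + lam2 * u0
u_hat = np.linalg.solve(A, b)
print("relative error:", np.linalg.norm(u_hat - u_true) / np.linalg.norm(u_true))
```

The regularization term makes the otherwise rank-deficient system $\mathbf{G}^T\mathbf{C}_{\boldsymbol\epsilon}^{-1}\mathbf{G}$ invertible, which is the algebraic face of the well-posedness that $J(\mathbf{u})$ restores.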
In nonlinear inverse problems, the relationship between the observed data and model parameters is nonlinear, i.e., 𝐝_obs = 𝐠(𝐮) + 𝛜 [Tarantola and Valette, 1982; Snieder, 1998]. A simple formulation of an ill-posed nonlinear inverse problem can be expressed as:

$$\min_{\mathbf{u}} \ J(\mathbf{u}) \qquad \text{s.t.} \quad \| \mathbf{C}_{\boldsymbol\epsilon}^{-1/2} (\mathbf{d}_{obs} - \mathbf{g}(\mathbf{u})) \|_2^2 \le \sigma^2 \qquad (1.11a)$$
$$\min_{\mathbf{u}} \ \| \mathbf{C}_{\boldsymbol\epsilon}^{-1/2} (\mathbf{d}_{obs} - \mathbf{g}(\mathbf{u})) \|_2^2 + \lambda^2 J(\mathbf{u}) \qquad (1.11b)$$

For physical systems in which the evolution of the state variables is determined by solving PDE systems, the resulting inverse problems include the PDEs as constraints; that is,

$$\min_{\mathbf{u}} \ \| \mathbf{C}_{\boldsymbol\epsilon}^{-1/2} (\mathbf{d}_{obs} - \mathbf{g}(\mathbf{u})) \|_2^2 + \lambda^2 J(\mathbf{u}) \qquad \text{s.t.} \quad f(\mathbf{u}, \mathbf{x}(\mathbf{u})) = 0 \qquad (1.12)$$

where $f(\mathbf{u},\mathbf{x}(\mathbf{u})) = 0$ represents the PDE system. We note that the measurement operator 𝐠(𝐮) is usually a function of the state vector 𝐱(𝐮), which is not explicitly expressed in Equation (1.12) for compactness. It is common to enforce the constraint by first solving the PDE system to obtain the state variables and then using them to predict the measurements. In other words, the PDE equations can be included within the nonlinear measurement operator. In practice, nonlinear inverse problems do not lend themselves to analytical solutions, and iterative numerical optimization techniques must be employed. In iterative solution schemes, given the current iterate 𝐮^(k), an updated solution is sought by expanding the nonlinear function 𝐠(𝐮) around the current iterate, using either first- or second-order Taylor expansions. For example, when a linear approximation is used, the resulting objective function to be minimized takes the form:

$$\mathbf{u}^{(k+1)} = \arg\min_{\mathbf{u}} \ \left\| \mathbf{C}_{\boldsymbol\epsilon}^{-1/2} \left( \mathbf{d}_{obs} - \mathbf{g}(\mathbf{u}^{(k)}) - \mathbf{G}_{\mathbf{u}} (\mathbf{u} - \mathbf{u}^{(k)}) \right) \right\|_2^2 + \lambda^2 J(\mathbf{u}) \qquad (1.13)$$

where 𝐆_𝐮 is the Jacobian matrix containing the first-order derivatives of the multivariate vector function 𝐠(𝐮) with respect to the entries of 𝐮, evaluated at 𝐮 = 𝐮^(k). The linearized objective function in Equation (1.13) can be readily minimized to find 𝐮^(k+1), and the process is continued until the algorithm converges to a solution [Tarantola, 2005].

Probabilistic inversion methods are alternative approaches that rely on sampling from the posterior distribution (conditional probability). To this end, the posterior distribution of the parameters 𝐮 is constructed as:

$$\text{prob}_{\mathbf{u}|\mathbf{d}_{obs}}(\mathbf{u}|\mathbf{d}_{obs}) = \frac{\text{prob}_{\mathbf{d}_{obs}|\mathbf{u}}(\mathbf{d}_{obs}|\mathbf{u}) \, \text{prob}_{\mathbf{u}}(\mathbf{u})}{\int \text{prob}_{\mathbf{d}_{obs}|\mathbf{u}}(\mathbf{d}_{obs}|\mathbf{u}) \, \text{prob}_{\mathbf{u}}(\mathbf{u}) \, d\mathbf{u}} \propto \text{prob}_{\mathbf{d}_{obs}|\mathbf{u}}(\mathbf{d}_{obs}|\mathbf{u}) \, \text{prob}_{\mathbf{u}}(\mathbf{u}) = \text{prob}_{\boldsymbol\epsilon}(\mathbf{d}_{obs} - \mathbf{g}(\mathbf{u})) \, \text{prob}_{\mathbf{u}}(\mathbf{u}) \qquad (1.14)$$

where $\text{prob}_{\mathbf{u}}(\mathbf{u})$ and $\text{prob}_{\boldsymbol\epsilon}(\boldsymbol\epsilon)$ are the prior probability distribution of 𝐮 and the probability distribution of the observation noise, respectively (considering the relationship 𝐝_obs = 𝐠(𝐮) + 𝛜). As a result, a conditional probability distribution is obtained (or approximated) for the parameters 𝐮. Finally, either multiple model realizations are sampled from this conditional probability, or an accuracy measure is adopted to obtain a single estimate from the posterior distribution. For example, the minimum mean-square solution, which results in a single estimate (the conditional mean) of the posterior distribution, is obtained by solving the following optimization problem:

$$\hat{\mathbf{u}} = \arg\min_{\hat{\mathbf{u}}} \ \mathbf{E}_{\mathbf{u}|\mathbf{d}_{obs}} \left\{ \|\mathbf{u} - \hat{\mathbf{u}}\|_2^2 \right\} \qquad (1.15)$$

In practice, it is challenging to obtain the exact form of the posterior distribution when 𝐠(.) is nonlinear or the parameters are non-Gaussian. Therefore, sampling techniques, e.g., variants of Monte Carlo methods, have demonstrated great value in solving such problems [Burgers et al., 1998; Evensen, 1994; Evensen, 2003; Oliver et al., 2008].
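The following is a minimal random-walk Metropolis sketch of sampling the posterior in Equation (1.14), assuming Gaussian observation noise and a Gaussian prior; the toy forward model g and all settings are illustrative stand-ins for the PDE-based models discussed above. The mean of the retained samples approximates the minimum mean-square estimate of Equation (1.15).

```python
import numpy as np

# Random-walk Metropolis sketch for the posterior in Equation (1.14).
# Assumptions: Gaussian noise and a standard-normal prior; toy forward model.
np.random.seed(2)

def g(u):                          # mildly nonlinear toy forward model
    return np.array([u[0]**2 + u[1], u[0] - u[1]**2])

u_true = np.array([0.8, -0.3])
sigma = 0.05
d_obs = g(u_true) + sigma * np.random.randn(2)

def log_post(u):                   # log prob_eps(d - g(u)) + log prob_u(u)
    return (-0.5 * np.sum((d_obs - g(u))**2) / sigma**2
            - 0.5 * np.sum(u**2))

u, samples = np.zeros(2), []
for _ in range(20000):
    u_prop = u + 0.1 * np.random.randn(2)          # random-walk proposal
    if np.log(np.random.rand()) < log_post(u_prop) - log_post(u):
        u = u_prop                                  # accept the move
    samples.append(u.copy())

post_mean = np.mean(samples[5000:], axis=0)         # Eq. (1.15) approximation
print("posterior mean estimate:", post_mean)
```

The burn-in discard and the proposal step size are tuning choices; in realistic subsurface problems each posterior evaluation requires a full forward simulation, which is what motivates the ensemble-based approximations cited above.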
1.2. Challenges in Solving Subsurface Inverse Problems

Subsurface inverse problems involve estimating the spatial distribution of rock properties, typically in the form of 2D or 3D images/maps, from limited available measurements, with the goal of improving future model predictions for optimization and management purposes. Reproducing the measured data and preserving the expected characteristics of the geologic formation are two general requirements in these problems [Zimmerman et al., 1998; Oliver et al., 2008; Zhou et al., 2014]. Problem ill-posedness, non-uniqueness and multiple sources of uncertainty, nonlinearity and computational complexity, non-Gaussianity, and geologic plausibility constraints complicate the solution of subsurface inverse problems [Gómez-Hernández and Journel, 1993; McLaughlin and Townley, 1996; Oliver et al., 1997; Oliver et al., 2008; Vrugt et al., 2008]. Application of inverse modeling to the characterization of subsurface environments has largely been challenged by the discrepancy between the desire to develop high-resolution models and the limitations imposed by the low spatial resolution and coverage of the available data to adequately constrain such detailed models. This discrepancy often leads to ill-posed inverse problems in which an overwhelmingly large number of subsurface model parameters (2D-3D maps) has to be estimated from scattered response data. Consequently, many geologically distinct solutions may be found that reproduce the observed data but provide different predictions [McLaughlin and Townley, 1996; Feyen and Caers, 2006; Oliver et al., 2008; Suzuki and Caers, 2008; Khaninezhad et al., 2012]. In general, additional information is needed to constrain the solutions to a geologically plausible set.

Underdetermined inverse problems are solved either by reducing the number of model parameters, typically achieved through parameterization [Jacquard, 1965; Carrera and Neuman, 1986; Kitanidis, 1997; Bhark et al., 2011; Chen and Oliver, 2012], or by increasing the amount of information, often accomplished through regularization [Engl et al., 1996; Tarantola, 2005; Vrugt et al., 2008]. In both cases prior knowledge is used either implicitly, i.e., via information about solution structure (e.g., roughness) [Tikhonov and Arsenin, 1974], or explicitly, using existing model(s) to constrain the solution. Probabilistic inversion methods address solution non-uniqueness by representing the uncertainty in data, model, and prior knowledge through probability density functions (PDFs), and by characterizing the solution with a posterior PDF or several realizations of it. In particular, ensemble-based subsurface model calibration offers a practical approach for reflecting the non-uniqueness of subsurface inverse problem solutions by providing multiple realizations of subsurface properties and their underlying flow and transport predictions [Evensen, 1994; Aanonsen et al., 2009].

In subsurface inverse modeling problems, prior knowledge is available through quantitative and qualitative data, expert knowledge and interpretations, and process-based modeling of depositional environments. Integration of these sources of information usually leads to a geologic continuity scenario, which is typically used as prior knowledge [Strebelle, 2002; Caers and Zhang, 2004]. Traditionally, the adopted geologic scenario is used to constrain the solution of the inverse problem, for example by providing a variogram model [Cressie and Hawkins, 1980] to describe the continuity in heterogeneous subsurface properties, or a training image (TI) [Caers, 2003; Arpat, 2005; Zhang et al., 2006; Hu and Chugunova, 2008] that prescribes the expected connectivity patterns in more complex geologic formations. Such prior models constrain the form of continuity in subsurface properties and allow for variability in the exact spatial locations of the existing patterns. A major flaw in this approach is neglecting the uncertainty in the adopted geologic scenario, which can be quite significant. In general, provided with identical sources of information for a given geologic environment, different geologists may form different geologic scenarios and conceptual models of the formation that cannot be refuted. Therefore, it is important to acknowledge and incorporate the uncertainty in the geologic continuity model, i.e., model structure and geologic scenario, in solving subsurface inverse problems [Suzuki and Caers, 2008; Khodabakhshi and Jafarpour, 2011; Park et al., 2013; Rousset, 2015; Shirangi and Durlofsky, 2015].

Non-Gaussianity (complex geologic connectivity patterns) of the subsurface model parameters complicates the inversion process further. These complexities result from the inability to describe non-Gaussian PDFs in mathematically convenient forms and, consequently, to convey them to the inversion process.
Training image-based simulation techniques have been developed to facilitate constraining the connectivity patterns in solving these problems. However, in these techniques, incorporating the provided connectivity patterns into the inversion process is nontrivial (especially when the measurements are nonlinearly related to the model parameters). A major contribution of this thesis is in developing systematic inversion methods for estimating non-Gaussian parameters. Next, we provide a brief introduction to the parameterization and regularization techniques that are frequently used in the following chapters.

1.3. Parameterization and Regularization Techniques

Inverse problems often involve high-dimensional parameters with complex relations that need to be estimated from low-resolution nonlinear data. In addition to numerical stability issues (due to the high-dimensional and low-rank nature of the matrices involved) in solving such ill-posed inverse problems, several non-unique solutions can be found that reproduce the (limited) available data but fail to predict the future response of the system. In some physical systems, the parameters may represent a spatially distributed material property with specific spatial features (patterns). In that case, in addition to dealing with high parameter dimensionality, it is important to preserve the expected spatial structure of the parameters. Parameterization and regularization are two common approaches that aim to achieve these goals by reducing parameter dimensionality and imparting pre-specified attributes on the solution, respectively.

Techniques for regularizing the solution of ill-posed inverse problems have been extensively studied in the literature (e.g., see [Engl et al., 1996; Tarantola, 2005; Vrugt et al., 2008]). Regularization is usually implemented by minimizing a penalty function ($J(\mathbf{u})$ in Equations (1.10)-(1.13)) that promotes an attribute of interest in the solution, e.g., penalizing solution roughness to obtain smooth solutions. By imposing certain patterns/attributes on the solution, regularization creates correlation structures that, in effect, implicitly reduce the dimension of the parameter space.

Inverse problem formulations are directly influenced by the choice of parameters (i.e., parameterization). Parameterization (or re-parameterization) refers to changing the original parameters of an inverse problem to a (typically much smaller) set of new parameters that facilitate the search for a solution. It is often used to explicitly reduce the number of unknown parameters while capturing their main characteristics, with the purpose of alleviating problem ill-posedness. Parameterization can also provide more compact descriptions of complex parameter structures and facilitate their reconstruction. In solving inverse problems, choosing an appropriate domain that affords an effective description of the parameters is complicated by the lack of complete knowledge about the solution. However, a reasonable choice for the parameter domain may be deduced from knowledge about the physics of the system under analysis. Parameterization can be performed either in the original domain (space/time), in which the PDEs are solved, or by transforming the parameters into a different (often abstract) domain with certain desirable properties.
A linear parameterization [Yeh, 1986; Vo and Durlofsky, 2015; Golmohammadi and Jafarpour, 2016] can generally be expressed as:

$$\mathbf{u} = \boldsymbol{\Phi}\mathbf{v} = \sum_{i=1}^{k} \boldsymbol{\phi}_i v_i \qquad (1.16)$$

where 𝐮 and 𝐯 are vectors of original and transformed model parameters, respectively, and 𝚽 is the linear transformation matrix whose columns correspond to basis functions that are linearly combined, using the entries of 𝐯 as coefficients, to yield 𝐮. The matrix 𝚽 can be viewed as a linear mapping of the transformed parameters 𝐯 onto the original parameters 𝐮. Alternative choices of 𝚽 lead to different parameterization bases (domains) with distinct properties that can be exploited in formulating the inverse problem. Using the linear relation 𝐮 = 𝚽𝐯, it is straightforward to rewrite the inverse problem objective function in (1.13) in terms of 𝐯 as follows:

$$\mathbf{v}^{(k+1)} = \arg\min_{\mathbf{v}} \ \left\| \mathbf{C}_{\boldsymbol\epsilon}^{-1/2} \left( \mathbf{d}_{obs} - \mathbf{g}(\mathbf{v}^{(k)}) - \mathbf{G}_{\mathbf{v}} (\mathbf{v} - \mathbf{v}^{(k)}) \right) \right\|_2^2 + \lambda^2 J(\mathbf{v}) \qquad (1.17)$$

where $J(\mathbf{v})$ defines a regularization constraint on the transform-domain coefficients 𝐯 (more details are provided in the subsequent chapters and sections). Note that the transformation matrix 𝚽 is assumed to be constant and is dropped for brevity. Furthermore, 𝐆_𝐯 in Equation (1.17) denotes the Jacobian matrix of the observations with respect to the transformed coefficients and can be simply calculated through the chain rule as:

$$\mathbf{G}_{\mathbf{v}} = \frac{\partial}{\partial \mathbf{v}} \mathbf{g}(\mathbf{v}) \Big|_{\mathbf{v} = \mathbf{v}^{(k)}} = \mathbf{G}_{\mathbf{u}} \boldsymbol{\Phi} \qquad (1.18)$$

A nonlinear form of parameterization can also be generally expressed as 𝐮 = ϕ(𝐯), where the mapping ϕ(.) represents a general nonlinear transformation. A common form of nonlinear transformation is kernel-based methods, in which a nonlinear mapping is used to define a new set of parameters, on which a linear transformation is subsequently applied. Kernel-based methods use kernel functions to operate in high-dimensional feature spaces without computing the coordinates of the feature space [Schölkopf and Smola, 2002; Nasrabadi, 2007; Sarma et al., 2008; Vo and Durlofsky, 2016]. Instead, they compute the inner products of the images of all pairs of data in the feature space. Using this approach, the inner product of vectors in the nonlinear space is calculated by kernel functions, $k(\mathbf{v},\mathbf{v}') = \langle \phi(\mathbf{v}), \phi(\mathbf{v}') \rangle$, where ϕ is a feature map (e.g., a polynomial). The kernel $k(\mathbf{v},\mathbf{v}')$ is a function of 𝐯 and 𝐯′, and it eliminates the need for a nonlinear expansion of the parameters. A major difficulty that arises in implementing nonlinear transformations is the lack of a unique back transformation, due to the nonlinear form of the transform ϕ(.). In this thesis, linear transforms are discussed.
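A minimal sketch of the transform-domain update in Equations (1.16)-(1.18) follows; the toy forward model g, its Jacobian, and the orthonormal basis 𝚽 are illustrative placeholders, and the λ²-damped step is a Levenberg-style stand-in for the regularized update in Equation (1.17).

```python
import numpy as np

# Sketch of one linearized update cycle in the transform domain,
# Equations (1.16)-(1.18). All models, sizes, and names are illustrative.
np.random.seed(3)
n, k, m = 100, 10, 20                       # parameters, coefficients, data
Phi = np.linalg.qr(np.random.randn(n, k))[0]  # orthonormal basis columns
A = np.random.randn(m, n)

def g(u):                                   # mildly nonlinear toy forward model
    return A @ u + 0.01 * (A @ u)**2
def jac_u(u):                               # Jacobian G_u of g evaluated at u
    return A + 0.02 * np.diag(A @ u) @ A

u_true = Phi @ np.random.randn(k)
d_obs = g(u_true)

v, lam2 = np.zeros(k), 1e-3
for _ in range(10):
    u = Phi @ v                             # back-transform, Equation (1.16)
    G_v = jac_u(u) @ Phi                    # chain rule, Equation (1.18)
    r = d_obs - g(u)
    v += np.linalg.solve(G_v.T @ G_v + lam2 * np.eye(k), G_v.T @ r)
print("final data misfit:", np.linalg.norm(d_obs - g(Phi @ v)))
```

Working with the k-dimensional 𝐯 instead of the n-dimensional 𝐮 shrinks the linear system solved at each iteration from n×n to k×k, which is the practical payoff of the parameterization.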
1.3.1. Spatial Parameterization/Regularization

Zonation Parameterization: Zonation [Jacquard, 1965] is a simple parameterization technique in which subsets of the parameter vector 𝐮 are assumed to have (approximately) identical values and can be aggregated into a single parameter. In imaging applications where 𝐮 is a spatial image (of an unknown property distribution), subsets of entries of 𝐮 that correspond to a local neighbourhood in the image form a segment or zone with identical parameter values. By aggregating multiple such entries into a single parameter, zonation can significantly reduce the number of parameters. Figure 1.2 depicts a sample parameter distribution (shown in the 𝑥−𝑦 plane) that consists of 𝑘 regions or zones ($\mathbf{R}_1, \ldots, \mathbf{R}_k$). If the parameter values in each region are similar, the number of parameters can be reduced to 𝑘 ≪ 𝑛. This parameterization can be effectively expressed using a general linear expansion representation, consisting of basis vectors $\boldsymbol{\phi}_l,\ 1 \le l \le k$, in which only the entries corresponding to region $\mathbf{R}_l$ are nonzero (ones) and the remaining entries are zero. In Figure 1.2, this linear expansion form is illustrated pictorially.

Figure 1.2. Schematic of parameter representation via linear expansion: spatial zonation with predefined regions with similar parameter values.

Using zonation, the formulation of the inverse problem reduces to:

$$\min_{\mathbf{v}} \ \| \mathbf{C}_{\boldsymbol\epsilon}^{-1/2} (\mathbf{d}_{obs} - \mathbf{g}(\mathbf{u})) \|_2^2 + \lambda^2 J(\mathbf{u}) \qquad \text{s.t.} \quad \mathbf{u} = \sum_{i=1}^{k} \boldsymbol{\phi}_i v_i \qquad (1.19a)$$

where, with the new parameters $[v_1\ v_2\ \ldots\ v_k]$, the problem is better posed (only 𝑘 unknowns). If the number of regions is sufficiently small, the problem can become well-posed even without the regularization term $J(\mathbf{u})$ in Equation (1.19a). Therefore, a simpler version of the problem is obtained by eliminating the regularization term:

$$\min_{\mathbf{v}} \ \| \mathbf{C}_{\boldsymbol\epsilon}^{-1/2} (\mathbf{d}_{obs} - \mathbf{g}(\mathbf{u})) \|_2^2 \qquad \text{s.t.} \quad \mathbf{u} = \sum_{i=1}^{k} \boldsymbol{\phi}_i v_i \qquad (1.19b)$$

Although zonation is a simple and intuitive parameterization approach, it suffers from a few shortcomings. First, it is not trivial to define the zones for an unknown map a priori. Adaptive multi-resolution zonation techniques [Grimstad et al., 2003] have been developed that allow the zones to be redefined (updated) during inversion. Second, the sharp boundaries that separate the zones are geologically unrealistic. Finally, eliminating the variability (heterogeneity) within each region can result in the unintended elimination of local, but important, flow-related features and introduce bias into future predictions. Several other parameterization methods have been developed to alleviate the ill-posedness of inverse problems. Examples of these methods include transform-domain methods, e.g., the Principal Component Analysis (PCA), the Fourier-based Discrete Cosine Transform (DCT), and the Discrete Wavelet Transform (DWT).
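A minimal construction of the zonation expansion in Equation (1.19) is sketched below; the region map is an arbitrary illustration.

```python
import numpy as np

# Sketch of the zonation expansion u = sum_i phi_i v_i: each basis vector is
# the indicator of one region. The region map here is an arbitrary example.
ny, nx = 20, 20
region = np.zeros((ny, nx), dtype=int)
region[:, 10:] = 1                  # zone 1: right half of the domain
region[5:15, 5:15] = 2              # zone 2: central block (overwrites others)
k = region.max() + 1

# Columns of Phi are indicator vectors of the k zones (ones inside, zeros out).
Phi = np.stack([(region == r).ravel().astype(float) for r in range(k)], axis=1)
v = np.array([1.0, 3.0, 10.0])      # one property value per zone
u = Phi @ v                         # full n-dimensional parameter field
print(u.reshape(ny, nx)[0, 0], u.reshape(ny, nx)[12, 12])   # 1.0 and 10.0
```

With this 𝚽, the chain-rule relation of Equation (1.18) still applies, so any gradient-based scheme written for 𝐮 carries over to the k zonal values directly.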
Tikhonov Regularization: Tikhonov regularization [Tikhonov and Arsenin, 1974] is achieved by minimizing the zeroth-, first-, or second-order derivative of the solution to promote minimum-length, smooth, or flat solutions, respectively. Tikhonov regularization has been widely applied to inverse problems in several imaging applications where the parameters are expected to show some degree of continuity. The reason for this attribute is that the images representing the parameters are often related to physical properties that naturally exhibit some continuity in their formation. To illustrate how Tikhonov regularization works, consider the local operator that approximates the first-order directional derivatives for entry $u_{i,j}$ of the parameter vector 𝐮 (defined on two-dimensional 𝑥−𝑦 coordinates):

$$(\nabla \mathbf{u})_{i,j} \approx \begin{bmatrix} u_{i,j} - \tfrac{1}{2}(u_{i-1,j} + u_{i+1,j}) \\[2pt] u_{i,j} - \tfrac{1}{2}(u_{i,j-1} + u_{i,j+1}) \end{bmatrix}$$

This notation denotes the central finite-difference approximation to the first-order directional derivatives. Minimizing $\int \|\nabla \mathbf{u}\|_2^2 \, d\mathbf{V} \approx \Delta \times \sum_{i,j} \|(\nabla \mathbf{u})_{i,j}\|_2^2$, where Δ denotes a small spatial perturbation and 𝐕 represents the spatial integration volume/domain, corresponds to solutions that exhibit smooth transitions (minimum second-order spatial derivative of grid property values) from $u_{i,j}$ to its neighbouring grid cells. With the first-order Tikhonov regularization, the inverse problem objective function takes the form:

$$\min_{\mathbf{u}} \ \| \mathbf{C}_{\boldsymbol\epsilon}^{-1/2} (\mathbf{d}_{obs} - \mathbf{g}(\mathbf{u})) \|_2^2 + \lambda^2 \int \|\nabla \mathbf{u}\|_2^2 \, d\mathbf{V} \qquad (1.20)$$

For discrete problems, the spatial derivatives in the regularization function can be written as a linear operator 𝐖; that is, the regularization term simplifies to $\int \|\nabla \mathbf{u}\|_2^2 \, d\mathbf{V} = \|\mathbf{W}\mathbf{u}\|_2^2$.

Total Variation: Total variation [Rudin et al., 1992; Lee and Kitanidis, 2013; Gholami, 2015] is a regularization technique used to promote piecewise-smooth solutions, that is, solutions that are generally smooth but can have discontinuities in certain parts. This form of regularization is implemented by applying a milder penalty to the derivatives of the parameters. In this case, the 𝑙₁-norm (instead of the 𝑙₂-norm) of the first-order derivative of the solution is minimized. The 𝑙₁-norm is less sensitive to large entries and tends to tolerate discontinuities, which often manifest as large directional derivatives. In implementing total variation, one seeks to minimize the following regularized least-squares form:

$$\min_{\mathbf{u}} \ \| \mathbf{C}_{\boldsymbol\epsilon}^{-1/2} (\mathbf{d}_{obs} - \mathbf{g}(\mathbf{u})) \|_2^2 + \lambda^2 \int \sqrt{\textstyle\sum_j (\nabla_j \mathbf{u})^2} \, d\mathbf{V} \qquad (1.21a)$$

where the index $j$ runs over the directional derivatives, and $\nabla_j \mathbf{u}$ is the derivative of 𝐮 in the direction specified by $j$. Total variation regularization can be implemented for any specified directions. In its standard implementation, the directions $j$ are the three Cartesian coordinates, i.e., 𝑥, 𝑦, and 𝑧. In this case, the formulation in (1.21a) is rewritten as:

$$\min_{\mathbf{u}} \ \| \mathbf{C}_{\boldsymbol\epsilon}^{-1/2} (\mathbf{d}_{obs} - \mathbf{g}(\mathbf{u})) \|_2^2 + \lambda^2 \int \sqrt{(\nabla_x \mathbf{u})^2 + (\nabla_y \mathbf{u})^2 + (\nabla_z \mathbf{u})^2} \, d\mathbf{V} \qquad (1.21b)$$
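The following sketch contrasts the two penalties on a one-dimensional profile, using the first-difference operator 𝐖 mentioned after Equation (1.20); all settings are illustrative.

```python
import numpy as np

# Contrast of the Tikhonov (l2) and total-variation (l1) penalties on a 1D
# profile with a jump, using the first-difference operator W.
n = 100
u_smooth = np.linspace(0.0, 1.0, n)              # smooth ramp
u_jump = (np.arange(n) >= 50).astype(float)      # piecewise-constant profile

W = np.zeros((n - 1, n))                         # first-difference operator
W[np.arange(n - 1), np.arange(n - 1)] = -1.0
W[np.arange(n - 1), np.arange(1, n)] = 1.0

for name, u in [("smooth ramp", u_smooth), ("sharp jump", u_jump)]:
    print(name,
          " Tikhonov ||Wu||_2^2 =", round(float(np.sum((W @ u)**2)), 3),
          " TV ||Wu||_1 =", round(float(np.sum(np.abs(W @ u))), 3))
# The l2 penalty scores the jump about 100x worse (~0.01 vs 1.0), while the TV
# value is 1.0 for both profiles, which is why TV tolerates discontinuities.
```

This is the quantitative sense in which the 𝑙₁ penalty is "milder" on large derivatives: the cost of a discontinuity does not grow with its sharpness once the total variation is fixed.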
1.3.2. Compressive Transforms

Compressive transforms are used to compactly represent/approximate the most salient features of images and signals. In inverse problems, it may be possible to apply a transformation to the original parameters to achieve an effective low-rank representation. The main steps in transform-domain low-rank representation are (i) choosing an appropriate transformation (expansion functions), (ii) performing the forward transformation to obtain the transformed representation of the original parameters, (iii) identifying and retaining only the significant coefficients of the transformed representation, and (iv) back-transforming to the original domain using only the retained coefficients. The compressive nature of these transforms implies that the transformed representation is sparse, that is, very few of the transformed coefficients are significant. In this section, we present some of the important compressive transforms that have been used for parameterization. The choice of an appropriate basis to compactly represent model parameters is intimately related to prior knowledge about the characteristics of the underlying properties of the model, e.g., existing correlation/connectivity structures or possible discontinuous features. In fact, when specific prior models are available, one can construct a specialized transformation that is learned from those models and training data. Examples of specialized transform basis functions that are learned from prior information include the Principal Component Analysis (PCA) [Jolliffe, 1986] and the 𝑘-SVD [Aharon et al., 2006; Khaninezhad et al., 2012] for sparse dictionary learning. In many situations, however, explicit prior models or training data may not be available. In those cases, generic transforms that are used for image compression provide an attractive option for parameterization. Examples of popular transformation methods for parameterization include the Fourier transform [Bracewell, 1986], its practical and efficient variation known as the Discrete Cosine Transform (DCT) [Ahmed et al., 1974; Jafarpour and McLaughlin, 2008; Jafarpour and McLaughlin, 2009], and the Wavelet transform [Mallat, 1989; Talukder and Harada, 2010; Jafarpour, 2011; Golmohammadi et al., 2015].

Generic Compressive Transforms: Generic compressive transforms consist of 𝑛 linearly independent basis vectors in $\mathbf{R}^n$ that can be used to span any length-𝑛 vector. While a complete representation of a length-𝑛 parameter vector is possible in a compressive basis, the objective is to obtain an approximate representation using only 𝑘 ≪ 𝑛 significant basis elements. Suppose that the set $\{\boldsymbol{\phi}_i : i = 1,\ldots,n\}$ contains all the basis vectors needed for perfect representation in $\mathbf{R}^n$, and a subset $\boldsymbol{\Phi} = \{\boldsymbol{\phi}_i : i = 1,\ldots,k\}$, with no particular order, provides an acceptable approximation of a vector of interest 𝐮. Selection of the subset with 𝑘 elements depends on the original vector to be approximated and the choice of basis.

Fourier basis functions describe a signal in terms of its frequency content. In this case, if an $n = \prod_i n_i$ dimensional signal 𝐮 is defined in $\mathbf{R}^{n_1 \times \ldots \times n_d}$, its Fourier transform at frequency $(f_1, \ldots, f_d)$ is calculated as:

$$\mathbf{v}(f_1,\ldots,f_d) = \sum_{i_1=0}^{n_1-1} \cdots \sum_{i_d=0}^{n_d-1} \mathbf{u}(i_1,\ldots,i_d)\, e^{-\mathrm{i} 2\pi \left( \sum_{l=1}^{d} \frac{f_l i_l}{n_l} \right)} \qquad (1.22)$$

The back transformation that returns 𝐮 can be expressed as:

$$\mathbf{u}(i_1,\ldots,i_d) = \frac{1}{n} \sum_{f_1=0}^{n_1-1} \cdots \sum_{f_d=0}^{n_d-1} \mathbf{v}(f_1,\ldots,f_d)\, e^{\mathrm{i} 2\pi \left( \sum_{l=1}^{d} \frac{f_l i_l}{n_l} \right)} \qquad (1.23)$$

If the main features of 𝐮 are captured by low-frequency elements, which is especially true for smooth and correlated vectors, one can approximate 𝐮 by truncating the basis elements with frequencies exceeding a certain threshold. The (𝑛−𝑘) coefficients corresponding to frequencies higher than the specified threshold are then set to zero.

The DCT is closely related to the Fourier transform and retains only its real part. The DCT transformation is carried out by keeping the real part of $e^{-\mathrm{i} 2\pi (\sum_l f_l i_l / n_l)}$, which is $\cos\{2\pi (\sum_{l=1}^{d} \frac{f_l i_l}{n_l})\}$. Hence, the transformation takes the form:

$$\mathbf{v}(f_1,\ldots,f_d) = \sum_{i_1=0}^{n_1-1} \cdots \sum_{i_d=0}^{n_d-1} \mathbf{u}(i_1,\ldots,i_d) \cos\left\{ 2\pi \left( \sum_{l=1}^{d} \frac{f_l i_l}{n_l} \right) \right\} \qquad (1.24)$$

Similar to the Fourier transform, an approximation of the original signal 𝐮 is obtained by truncating the frequencies above a certain threshold (that is, by setting the coefficients corresponding to those elements to zero). Fourier-based transforms can only represent information in either the space or the frequency domain. This means that once a signal is transformed to the Fourier domain, it loses the spatial information, and vice versa. Hence, the Fourier basis elements are global and do not encode local information. Unlike the Fourier transform, the basis elements of the Wavelet transform contain both space and frequency information. This implies that each basis vector is localized in space and represents a certain frequency content. Therefore, for any spatial location, one can retain (or truncate) the specific frequency components that are significant (insignificant).
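Returning to the DCT truncation described above, a minimal sketch follows; it assumes the SciPy package is available for the multidimensional DCT, and the smooth test field is an illustrative placeholder.

```python
import numpy as np
from scipy.fft import dctn, idctn

# Sketch of transform-domain truncation with the DCT: keep only the
# lowest-frequency coefficients of a smooth 2D field and reconstruct.
ny = nx = 64
x, y = np.meshgrid(np.linspace(0, 1, nx), np.linspace(0, 1, ny))
u = np.sin(2 * np.pi * x) * np.cos(np.pi * y)     # smooth, correlated field

V = dctn(u, norm='ortho')                         # forward DCT
mask = np.zeros_like(V, dtype=bool)
mask[:8, :8] = True                               # retain 64 of 4096 coefficients
u_hat = idctn(np.where(mask, V, 0.0), norm='ortho')

err = np.linalg.norm(u_hat - u) / np.linalg.norm(u)
print("relative error with 64/4096 coefficients:", round(float(err), 4))
```

For smooth fields like this one, the relative error is small even at roughly 1.5% of the coefficients, which is exactly the compressibility that transform-domain parameterization exploits.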
Figures 1.3(a) and 1.3(b) show 64 sample basis elements of the DCT and Haar wavelet transforms in $\mathbf{R}^{64\times64}$, respectively. As can be verified, the basis images of the DCT transform are not localized in space, while those of the discrete Haar wavelet clearly exhibit localized patterns. While generic compressive transforms have useful properties that make them very desirable under the right circumstances, in applications where prior knowledge about the solution is available, one may be able to construct more specific transforms with better performance.

Learned Compressive Transforms: Pre-constructed compressive bases achieve excellent compression performance in representing smooth and piecewise-smooth images. Hence, for most natural images only a small subset of the transformed coefficients is sufficient to capture the main features of an image. This implies that most natural images have sparse approximations in these compression transform domains. However, compressed representation of complex image features with generic transforms may require too many coefficients, which is not desirable for parameterization. Furthermore, in ill-posed inverse problems, the limitation in data resolution typically does not allow for the estimation of high-resolution features. Hence, a more sophisticated approach is needed to capture complex features in certain applications. As an example, in subsurface modelling, where extensive efforts go into data collection and site surveys to construct prior models, more specialized transform-domain representations that are tailored to the information in the prior knowledge are more suitable.

Figure 1.3. Examples of generic (pre-computed) compressive transform bases: (a) sample low-frequency basis elements from the DCT basis; (b) sample basis elements from the discrete Haar wavelet. The example is shown for a 64×64 two-dimensional image. The basis elements are separated with black boxes.

Principal Component Analysis (PCA): The PCA is widely used for dimensionality reduction in a wide range of applications. The PCA basis functions capture the main variability and structures in multivariate datasets, which can be exploited to compactly represent/approximate them with minimum loss of information. When the PCA is applied to the covariance matrix of a stochastic process, it leads to its diagonalization and can be used to define a new (often more desirable) uncorrelated random process. In this case, the PCA provides an orthogonal transformation matrix with decorrelating power that contains, in its columns, the eigenvectors of the covariance matrix. The strong decorrelating property of the PCA is advantageous in eliminating existing correlations (redundancies) and reducing dimensionality. In fact, the PCA sets the standard for dimension reduction with linear transforms, as it gives the minimum error (in the least-squares sense) in approximating an 𝑛-dimensional signal with 𝑘 ≪ 𝑛 basis elements. Parameterization with PCA follows the same format as Equation (1.16), i.e., $\mathbf{u} = \boldsymbol{\Phi}\mathbf{v} = \sum_{i=1}^{k} \boldsymbol{\phi}_i v_i$, where the basis functions $\boldsymbol{\phi}_i$ are the eigenvectors of the covariance matrix of 𝐮. Denoting an 𝑛×1-dimensional random variable as 𝐮 and its covariance matrix as $\mathbf{C}_{\mathbf{u}}$, the eigenvalue decomposition of the covariance matrix provides the following diagonalization:

$$\mathbf{C}_{\mathbf{u}} = \boldsymbol{\Phi} \boldsymbol{\Lambda} \boldsymbol{\Phi}^{T} \qquad (1.25)$$

where 𝚲 is a diagonal matrix (with the eigenvalues of $\mathbf{C}_{\mathbf{u}}$ on its diagonal) and 𝚽 is an orthonormal (transformation) matrix that has the eigenvectors of $\mathbf{C}_{\mathbf{u}}$ in its columns.
If sample realizations of 𝐮 are collected into a data matrix $\mathbf{U}_{n\times L} = [\mathbf{u}_1 \ldots \mathbf{u}_i \ldots \mathbf{u}_L]$, the sample covariance matrix $\mathbf{C}_{\mathbf{u}}$ can be computed as:

$$\mathbf{C}_{\mathbf{u}} = \frac{1}{L-1} (\mathbf{U} - \bar{\mathbf{u}}\mathbf{1}_{1\times L})(\mathbf{U} - \bar{\mathbf{u}}\mathbf{1}_{1\times L})^{T} \qquad (1.26)$$

where $\bar{\mathbf{u}}$ denotes the mean of 𝐔, that is, $\bar{\mathbf{u}} = \frac{1}{L}\sum_{i=1}^{L} \mathbf{u}_i$. The term $\frac{1}{\sqrt{L-1}}(\mathbf{U} - \bar{\mathbf{u}}\mathbf{1}_{1\times L})$ can be expressed in terms of its singular value decomposition (SVD) as:

$$\frac{1}{\sqrt{L-1}} (\mathbf{U} - \bar{\mathbf{u}}\mathbf{1}_{1\times L}) = \boldsymbol{\Psi} \boldsymbol{\Sigma} \mathbf{V}^{T} \qquad (1.27)$$

where 𝚿 and 𝐕 are orthonormal matrices containing the left and right singular vectors of $\frac{1}{\sqrt{L-1}}(\mathbf{U} - \bar{\mathbf{u}}\mathbf{1}_{1\times L})$, respectively. Combining (1.26) and (1.27) yields:

$$\mathbf{C}_{\mathbf{u}} = (\boldsymbol{\Psi}\boldsymbol{\Sigma}\mathbf{V}^{T})(\boldsymbol{\Psi}\boldsymbol{\Sigma}\mathbf{V}^{T})^{T} = \boldsymbol{\Psi}\boldsymbol{\Sigma}\mathbf{V}^{T}\mathbf{V}\boldsymbol{\Sigma}\boldsymbol{\Psi}^{T} = \boldsymbol{\Psi}\boldsymbol{\Sigma}^{2}\boldsymbol{\Psi}^{T} \qquad (1.28)$$

which reveals that 𝚿 = 𝚽; that is, the left singular vectors of $\frac{1}{\sqrt{L-1}}(\mathbf{U} - \bar{\mathbf{u}}\mathbf{1}_{1\times L})$ are identical to the eigenvectors of the sample covariance matrix $\mathbf{C}_{\mathbf{u}}$. This relation shows that for high-dimensional variables the PCA transformation matrix can be computed more efficiently by obtaining the left singular vectors of $\frac{1}{\sqrt{L-1}}(\mathbf{U} - \bar{\mathbf{u}}\mathbf{1}_{1\times L})$, that is, the matrix containing the scaled and mean-removed sample realizations. One can therefore see the correspondence between the left singular vectors of the sample data matrix and the eigenvectors of the data covariance. It is relatively straightforward to show that, among all 𝑘-term (rank-𝑘) linear approximations of 𝐔, the expansion using its 𝑘 leading left singular vectors (denoted $\boldsymbol{\Phi}_{n\times k}$) gives the smallest root-mean-square error (RMSE).
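The SVD route of Equations (1.26)-(1.28) can be sketched in a few lines; the random low-rank "realizations" below are illustrative stand-ins for geostatistically simulated models.

```python
import numpy as np

# Sketch of building the PCA basis from realizations via the SVD of the
# scaled, mean-removed data matrix (Equations (1.26)-(1.28)).
np.random.seed(5)
n, L, k = 400, 100, 10
# Illustrative stand-in for an ensemble: rank-5 structure plus noise.
U = np.random.randn(n, 5) @ np.random.randn(5, L) + 0.1 * np.random.randn(n, L)

u_bar = U.mean(axis=1, keepdims=True)
X = (U - u_bar) / np.sqrt(L - 1)          # scaled, mean-removed data matrix
Psi, Sv, _ = np.linalg.svd(X, full_matrices=False)
Phi = Psi[:, :k]                          # k leading eigenvectors of C_u

# Rank-k approximation of one realization: project, then reconstruct.
u = U[:, 0:1]
u_hat = u_bar + Phi @ (Phi.T @ (u - u_bar))
print("rank-%d relative error:" % k,
      np.linalg.norm(u_hat - u) / np.linalg.norm(u))
```

Computing the SVD of the n×L data matrix instead of the eigendecomposition of the n×n covariance is what makes this route tractable when n is in the millions and L is only a few hundred.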
Sparse Dictionary Learning (𝑘-SVD): While the PCA offers a very efficient decorrelating basis for compact representations, it is a linear transform in which the significant basis elements are predetermined and fixed. Recent developments in sparse signal processing have led to growing interest in sparse dictionary learning algorithms. A major distinction between the PCA and sparse dictionaries is the way the significant elements are selected. In sparse reconstruction, the significant elements are neither predetermined (ranked) nor fixed; rather, they must be identified independently for each instance of the parameter vector. To construct sparse dictionaries from a training dataset with 𝐿 elements, $\mathbf{U}_{n\times L} = [\mathbf{u}_1 \ldots \mathbf{u}_i \ldots \mathbf{u}_L]$, one can solve either of the following optimization problems [Aharon et al., 2006; Khaninezhad et al., 2012]:

$$\min_{[\mathbf{v}_1 \ldots \mathbf{v}_L],\, \boldsymbol{\Phi}} \ \sum_{i=1}^{L} \|\mathbf{v}_i\|_0 \qquad \text{s.t.} \quad \sum_{i=1}^{L} \|\mathbf{u}_i - \boldsymbol{\Phi}\mathbf{v}_i\|_2^2 \le \epsilon \qquad (1.29a)$$
$$\min_{[\mathbf{v}_1 \ldots \mathbf{v}_L],\, \boldsymbol{\Phi}} \ \sum_{i=1}^{L} \|\mathbf{u}_i - \boldsymbol{\Phi}\mathbf{v}_i\|_2^2 \qquad \text{s.t.} \quad \|\mathbf{v}_i\|_0 \le S, \ i \in 1:L \qquad (1.29b)$$

where $\boldsymbol{\Phi} \in \mathbf{R}^{n\times k}$, and $\|\mathbf{v}_i\|_0$ refers to the number of non-zero entries of $\mathbf{v}_i$ (i.e., at most 𝑆). Equations (1.29a) and (1.29b) are alternative formulations of dictionary learning. In Equation (1.29a), a maximum allowable representation error is used as a constraint while the level of sparsity in representing each realization of the prior models is minimized. In Equation (1.29b), the level of sparsity is constrained while the approximation error (in the least-squares sense) in representing each realization is minimized. Finding the exact solution of the problems in (1.29) is intractable. However, heuristic methods such as the 𝑘-SVD algorithm provide practical approximate solutions. We note that in our notation 𝑆 refers to the sparsity level (the number of significant elements retained in the approximation), and 𝑘 is the dictionary size (the total number of dictionary elements), with 𝑆 ≪ 𝑘.

We briefly describe the 𝑘-SVD algorithm as one approach to learning sparse geologic dictionaries from a training model dataset (more details can be found in the original publication [Aharon et al., 2006]). The 𝑘-SVD algorithm takes its name from the 𝑘-means clustering algorithm: while the latter computes 𝑘 means at each iteration, the former applies 𝑘 SVD operations at each iteration. The 𝑘-SVD algorithm constructs a dictionary 𝚽 of size 𝑛×𝑘 from 𝐿 samples $\mathbf{u}_i$, while ensuring that the projection of each $\mathbf{u}_i$ on 𝚽 is 𝑆-sparse, the problem formalized in Equation (1.29). To construct 𝚽 and 𝐕 from 𝐔, the 𝑘-SVD algorithm iteratively solves the problem specified in (1.29). Each iteration of the algorithm consists of two steps: Step 1, sparse coding, finds the sparse representations of the entire prior library (i.e., 𝐕) with 𝚽 fixed; and Step 2, dictionary updating, finds a new 𝚽 with the sparse representation 𝐕 fixed. While no formal convergence proof has been given for this algorithm, numerical experiments show that it is generally robust [Aharon et al., 2006]. It is important to note that the 𝑘-SVD algorithm is computationally demanding, especially as the dimension of the dictionary increases. Each iteration of the 𝑘-SVD algorithm requires 𝑘 Orthogonal Matching Pursuit (OMP) [Tropp and Gilbert, 2007] sparse coding and 𝑘 rank-one SVD operations, both computationally expensive. However, the computations related to the construction of a sparse dictionary are performed offline and can be considered part of the training step. In addition, the original 𝑘-SVD algorithm is typically applied to small image segments.

Figure 1.4. Examples of learned expansion images using prior training data: (a) prior (training) models used for constructing linear expansion images; (b) 𝑆 = 20 leading PCA basis elements; (c) sample 𝑘-SVD dictionary elements with 𝑆 = 20 and 𝑘 = 200. Examples are shown for an 𝑛ₓ×𝑛ᵧ = 100×100 two-dimensional model. The images are separated using white borders.

Figure 1.4 shows an example of dictionary learning in geoscience applications. Figure 1.4(a) depicts samples from the training data that represent two-dimensional fluvial channel configurations (generated using the SNESIM conditional simulation algorithm [Strebelle, 2002]). In this figure, the red regions represent fluvial channels composed of sandstone with very high permeability values, while the blue regions describe shale or mudstone with very low permeability. The high permeability values manifest their importance in fluid flow and displacement patterns by creating preferential flow paths within the channel regions. Figure 1.4(b) presents the first 𝑘 = 20 PCA basis (eigen) images corresponding to this training data, and Figure 1.4(c) shows sample elements from the corresponding 𝑘-SVD dictionary, using 𝑆 = 20 and 𝑘 = 200.
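A compact sketch of OMP, the sparse-coding step used inside 𝑘-SVD, is given below; the Gaussian dictionary and sizes are illustrative, and with these settings the true support is typically recovered.

```python
import numpy as np

# Orthogonal Matching Pursuit (OMP) sketch: greedily select the dictionary
# atom most correlated with the residual, then refit on the selected support.
np.random.seed(6)
n, k, S = 64, 200, 5
Phi = np.random.randn(n, k)
Phi /= np.linalg.norm(Phi, axis=0)            # unit-norm dictionary atoms
v_true = np.zeros(k)
v_true[np.random.choice(k, S, replace=False)] = np.random.randn(S)
u = Phi @ v_true                              # S-sparse synthetic signal

support, r = [], u.copy()
for _ in range(S):
    j = int(np.argmax(np.abs(Phi.T @ r)))     # most correlated atom
    support.append(j)
    coef, *_ = np.linalg.lstsq(Phi[:, support], u, rcond=None)
    r = u - Phi[:, support] @ coef            # orthogonalized residual

v_hat = np.zeros(k)
v_hat[support] = coef
print("true support recovered:",
      sorted(support) == sorted(np.flatnonzero(v_true).tolist()))
```

Because the residual is re-orthogonalized against all selected atoms at each step, no atom is picked twice, and the loop terminates after exactly S selections.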
1.4. Uncertainty in Prior Geologic Scenarios

In subsurface inverse modeling, an important implication of adopting a geologic scenario prior to inversion is that dynamic flow and monitoring data are not used to constrain it, resulting in a lost opportunity to potentially correct geologic scenarios that are not supported by the dynamic data. Therefore, an interesting and important problem is to incorporate the flow data into prior geologic scenario selection [Feyen and Caers, 2006; Suzuki and Caers, 2006; Jafarpour and Tarrahi, 2011; Riva et al., 2011; Khodabakhshi and Jafarpour, 2013; Khaninezhad and Jafarpour, 2014; Rousset and Durlofsky, 2014; Golmohammadi and Jafarpour, 2016]. Figure 1.5 demonstrates an example of geologic scenario uncertainty, where three different sets of prior information are provided to model and constrain the connectivity types. When a set of geologic scenarios is proposed as prior knowledge, one could implement an inversion process using each scenario and generate a set of feasible solutions depending on the prior geologic scenario. However, this approach requires multiple model calibration runs, which is not feasible when several possible scenarios are proposed. Additionally, this method does not provide any insight into possible modifications of the proposed models, for example when the true geologic scenario is not consistent with any of the proposed models, or when multiple prior models should be combined to capture the existing features. An alternative approach, proposed in this thesis, is to develop an inversion formulation that can simultaneously incorporate several prior geologic scenarios and, in addition to inferring model parameters, can automatically select the geologic scenarios that are supported by the inversion data.

Figure 1.5. Three prior scenarios with meandering (left), intersecting (middle), and straight (right) channel features. The training images are proposed to model the geologic scenarios, and each of them suggests specific types of connectivity.

One of the main contributions of this thesis is developing an inference framework for simultaneous parameter estimation and prior geologic scenario identification, to address an important challenge in subsurface flow modeling. The developed method, which is constructed mainly on the idea of group-sparsity regularization, is capable of incorporating several uncertain prior geologic scenarios and selecting the ones that contribute significantly to reproducing the observed dynamic data. To formulate an inversion framework that identifies consistent prior geologic scenarios, we take advantage of group-sparsity regularization using the mixed 𝑙₁/𝑙₂ norm [Eldar et al., 2010; Jenatton et al., 2010; Golmohammadi and Jafarpour, 2016]. The mixed 𝑙₁/𝑙₂-norm penalty provides an effective regularization form for recovering block-sparse signals [Stojnic et al., 2009; Eldar et al., 2010; Zhang and Rao, 2013; Fang et al., 2015]. Block-sparse signals are a subset of sparse signals whose components are classified into predefined groups of variables with the following properties: (1) all the elements within a group are expected to be either collectively active (non-zero) or inactive (zero); and (2) only a minimum number of groups can be active (hence the group-sparse property). In promoting group-sparsity with the 𝑙₁/𝑙₂-norm, the 𝑙₂-norm is applied to the elements within each group (to represent the contribution of that group) while the 𝑙₁-norm operates across the groups to promote sparsity (a minimum number of active groups). To screen geologic scenarios with group-sparsity, we apply the 𝑙₂-norm to first quantify the contribution of each group (geologic scenario) and then apply the 𝑙₁-norm across the groups to promote sparsity.
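A minimal sketch of the mixed 𝑙₁/𝑙₂ measure follows; the equal-sized groups and unit-energy test vectors are illustrative.

```python
import numpy as np

# Sketch of the mixed l1/l2 group measure: the l2-norm scores each group's
# contribution and the l1-norm (here a plain sum of the nonnegative group
# scores) operates across groups. Equal-sized groups are an illustration only.
np.random.seed(8)

def mixed_l1l2(v, group_size):
    return float(np.sum(np.linalg.norm(v.reshape(-1, group_size), axis=1)))

v1 = np.concatenate([np.random.randn(10), np.zeros(20)])
v1 /= np.linalg.norm(v1)                 # unit energy, one active group
v2 = np.random.randn(30)
v2 /= np.linalg.norm(v2)                 # unit energy, spread over three groups
print(mixed_l1l2(v1, 10), mixed_l1l2(v2, 10))   # 1.0 vs about sqrt(3)
# For the same total energy, the mixed norm is smallest when the energy
# collapses into few groups, which is what its minimization promotes.
```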
The grouping is achieved through a truncated SVD (TSVD) parameterization of model realizations for each geologic scenario prior to model calibration. To this end, each proposed geologic scenario is first used to generate an ensemble of model realizations (e.g., using geostatistical simulation techniques). The simulated models within each group are used to construct a TSVD approximation basis for each scenario. The 𝑙₂-norm of the TSVD coefficients within each group quantifies the contribution of that group. Once the contribution of each group is computed, group-sparsity is promoted by minimizing the 𝑙₁-norm of the quantified contributions of the groups, which is known to have a sparsity-inducing property. This ensures that only groups (geologic scenarios) with significant contributions to the solution are retained, while irrelevant groups are eliminated. This formulation is implemented by minimizing a regularized inversion objective function that consists of a dynamic data mismatch term and a mixed 𝑙₁/𝑙₂-norm regularization term to remove inconsistent geologic scenarios. Further implementation details and important properties, applications, and behavior of the developed inversion techniques are presented and discussed (mainly) in Chapters 2, 3 and 6 of the thesis.
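A sketch of assembling the grouped TSVD basis follows; the random matrices below stand in for the geostatistical realizations of each scenario.

```python
import numpy as np

# Sketch of assembling the hybrid group basis: a truncated SVD (TSVD) basis is
# built from the realizations of each proposed geologic scenario, and the
# per-scenario bases are concatenated. Random stand-ins replace real ensembles.
np.random.seed(7)
n, L, r = 400, 200, 20                     # cells, realizations/scenario, rank

def tsvd_basis(realizations, r):
    mean = realizations.mean(axis=1, keepdims=True)
    Psi, _, _ = np.linalg.svd(realizations - mean, full_matrices=False)
    return Psi[:, :r]                      # r leading left singular vectors

scenarios = [np.random.randn(n, L) for _ in range(3)]   # three prior scenarios
Phi = np.hstack([tsvd_basis(S, r) for S in scenarios])  # hybrid basis, n x 3r
print("hybrid basis shape:", Phi.shape)
# During inversion, the coefficients multiplying each r-column block form one
# group of the mixed l1/l2 penalty, so unsupported scenarios are zeroed out.
```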
1.5. Geologic Feasibility Constraints

Geostatistical methods have been used to constrain the solution of subsurface inverse problems. While the classical variogram-based (or two-point) geostatistical methods are appropriate for simulating Gaussian processes, they are not sufficient for representing more complex connectivity patterns. Object-based simulation techniques [Koltermann and Gorelick, 1996; Lopez, 2003], such as marked point processes [Deutsch and Wang, 1996], were introduced for modeling complex facies patterns. While geologically intuitive and appealing, these methods are quite cumbersome for data conditioning, primarily due to the lack of flexibility in morphing existing objects. Modern geostatistical modeling techniques that incorporate higher-order statistics, a.k.a. multiple-point statistics (MPS), have been developed to generate complex geological patterns from a given training image [Guardiano and Srivastava, 1993; Caers, 2003; Arpat and Caers, 2007; Comunian et al., 2012; Strebelle, 2012; Zhou et al., 2012; Pyrcz and Deutsch, 2014]. The training image (three different training images are shown in Figure 1.5) is a conceptual model of connectivity and can be provided by geologists after integrating/interpreting available measurements (core data, well logs, and outcrops), and in some cases using process-based geo-modeling [Caers and Zhang, 2004; Hu and Chugunova, 2008; Michael et al., 2010; Renard and Allard, 2013; Mariethoz and Caers, 2014]. For sequential simulation, using a user-specified template, the possible connectivity patterns and their frequencies in the training image are used to generate local conditional probabilities that are stored in a large search tree. For simulation at each unsampled grid cell, the local data patterns in the neighboring cells are used to locate the appropriate conditional probability in the search tree to sample from. Although conditioning MPS simulation on hard (spatial measurements) and soft (such as seismic measurements) data is straightforward, incorporating the training image patterns into inversion processes is nontrivial.

The Probability Perturbation Method (PPM) [Caers and Hoffman, 2006] and the Probability Conditioning Method (PCM) [Jafarpour and Khodabakhshi, 2011] are two direct simulation methods that attempt to generate flow-conditioned facies realizations from training images. In indirect conditioning methods, unconditional facies models are generated and updated using a model calibration method. Indirect methods encounter difficulties in estimating complex geologic facies patterns in (i) preserving the complex connectivity patterns during model updating, and (ii) honoring solution discreteness. Fulfilling each of these requirements by itself is mathematically challenging and cannot be achieved using simple parameterization and regularization techniques. Several studies have focused on preserving higher-order statistics in subsurface flow inverse problems. Nonlinear variants of the PCA, including kernel PCA [Sarma et al., 2007; Sarma et al., 2008; Vo and Durlofsky, 2016] and O-PCA [Vo and Durlofsky, 2014; Vo and Durlofsky, 2015], are examples in which the goal has been to fulfill these requirements. Another set of techniques that has been used to address the problem is discrete-to-continuous transformations, including the level set method [Cardiff and Kitanidis, 2009; Xie, 2012] and the distance transform [Elahi and Jafarpour, 2017]. While these methods can produce discrete solutions, they do not have a mechanism to preserve the complex patterns that exist in the training image.

In the inverse modeling of channelized environments, connectivity patterns impose nonconvexity constraints on the feasible set; therefore, relying on conventional parameterization and regularization techniques, which typically carry convexity assumptions, is generally not appropriate for these types of problems. In the presence of prior geologic scenarios with discrete connectivity patterns, although the size of the feasible set is relatively small compared to the subspace defined by a linear parameterization, e.g., PCA, honoring the characteristics of the feasible set is still a challenging problem. Discrete tomography [Herman et al., 2012] is used to estimate model parameters that take discrete values from a predefined set $S = \{l_i : i = 1,\ldots,k\}$. A wide range of reconstruction algorithms has been developed for the discrete tomography problem. A general approach is to simultaneously promote structural attributes (e.g., Tikhonov) and discreteness through two different regularization terms [Schüle et al., 2005; Lukić, 2011; Khaninezhad and Jafarpour, 2017]. When prior knowledge about the expected shapes of the discrete objects is available, several algorithms, including the Discrete Algebraic Reconstruction Technique (DART) [Batenburg and Sijbers, 2011], have been introduced, in which an approximate continuous solution is first sought and then projected onto the prior set to enhance the quality of the reconstruction. Markov random field (MRF) [Zhang et al., 2001] and conditional random field (CRF) [Lafferty et al., 2001] theories were introduced to model the joint probability distribution of a set of discrete variables. In CRF, which is a generalized version of MRF, the posterior distribution of a set of discrete variables is adapted to a predefined or learned graph structure. Then, an exponential function is used to model the posterior distribution of the discrete variables that are connected to each other in the graph structure, namely a clique.
In CRF, a set of input/output learning samples, where the inputs are typically continuous variables and the outputs are the corresponding discrete ones, is used to learn (infer) the parameters of the posterior function; thus, CRF is typically viewed as a supervised machine learning approach for learning the posterior distribution of a set of discrete variables. The method of potential functions, which is applied to promote discreteness in the discrete tomography problem, is essentially derived from CRF theory under simplifying assumptions, such as independence between the grid-block values on the adopted graph structure. The CRF and MRF theories are widely applied in image segmentation problems; however, the complexities, which are essentially a consequence of the form of the posterior function and the graph connectivity structure, make it difficult to apply these techniques to nonlinear inverse problems.

In this thesis, we develop two approaches for promoting feasibility constraints in solving subsurface inverse problems. In the first approach, which is discussed in Chapter 4, we use potential functions along with connectivity-preserving constraints to honor feasibility constraints. The proposed method is able to enforce pixel-level discreteness through potential functions; combined with the connectivity-preserving constraints, the method promotes global connectivity patterns and feasibility. In Chapter 5 of the thesis, we develop a framework for estimating complex facies models by learning the mapping from the parameterized space onto the feasible set, which is done through a machine learning technique. The inversion method introduced in this chapter facilitates promoting feasibility constraints by learning complex connectivity patterns and conveying them to the inversion process. For this purpose, we define a feasible set for the model parameters that includes a large number of prior model realizations summarizing the expected spatial statistics of the solution. To implement the feasibility constraints, we formulate a regularized least-squares problem, in which the regularization term imposes the feasibility constraints. Then, we split the objective function into two sequential optimization sub-problems (stages). In the first sub-problem, an approximate solution is obtained in the parameterization space. In the second stage, a machine-learning-based mapping is introduced to convert the approximate solution (obtained in stage 1) to the most relevant candidate from the feasible set.

1.6. Scope of Work and Dissertation Outline

As discussed previously, (i) "uncertainty in geologic scenarios" and (ii) "feasibility constraints" are among the important challenges that need to be addressed in subsurface inverse problems. Although it has been shown that neglecting geologic scenario uncertainty can result in undesired estimation biases, or even instabilities, in solving subsurface inverse problems, it has become a tradition to build subsurface inversion techniques on the assumption of a single geologic scenario. One of the main objectives of this thesis is to take geologic scenario uncertainty into account and develop systematic inversion methods that are able to decrease this type of uncertainty during the inference process. Considering "feasibility constraints", which we define as conditions that characterize the solution and its validity, is another important topic that we cover in this thesis.
For that purpose, we develop an inversion formulation that consists of two different versions of the same parameters: (i) the feasible solution, and (ii) its approximation in a predefined parameterization space (e.g., PCA). We take advantage of an alternating directions algorithm to iteratively solve for the solution in the parameterization space and on the feasible set.

In Chapter 2, we focus on prior knowledge about the sparsity structure of the solution and adopt group-sparsity regularization for constraining feasibility in ill-posed linear and nonlinear inverse problems. Group-sparsity regularization is a special type of sparsity-promoting function, applied to honor sparsity where the active elements are assembled in predefined groups or blocks (i.e., in block-sparse signals, only a small number of groups is needed to represent the solution). In this chapter, we introduce two schemes for grouping the variables: one based on the Wavelet tree structure, and another using the sparse PCA (SPCA) method. The tree structure of the multiresolution Wavelet decomposition establishes parent-children relationships across the scales, which allows for various forms of group definitions. In the second approach, a set of prior model realizations is used, along with the sparse PCA algorithm, to learn the grouping. Several examples are presented to show that the group-sparse methods outperform the standard 𝑙₁-norm-based sparsity regularization when the sparsity structure is properly identified and enforced.

In the next chapter, we show that a particularly useful property of group-sparsity is its ability to discriminate among multiple prior geologic scenarios that account for the uncertainty in the knowledge of geologic continuity. To this end, Chapter 3 is devoted to developing a systematic framework for simultaneously identifying relevant geologic scenarios and estimating parameters. The method generates several hundred realizations from each geologic scenario and uses them to construct a TSVD basis that separately parameterizes the models within each group. By combining the TSVD bases of all groups, a large hybrid basis is generated that is capable of compactly representing models from any group, or a model with features that cannot be found in a single group but may be represented using multiple groups. However, since the groups are vastly different from each other, only a small number of groups is needed to accurately represent a given model (or the inversion solution). This property translates into a group-sparse behavior that can be effectively induced using a mixed 𝑙₁/𝑙₂-norm regularization term, which is minimized concurrently with a data mismatch objective function in an inverse problem.

In Chapter 4, we propose a novel formulation for the estimation of complex geologic facies by imposing appropriate constraints to recover plausible solutions that honor (i) the spatial connectivity, and (ii) the discreteness of facies models. To incorporate prior connectivity patterns, we learn plausible geologic features from available training models. This is achieved by learning spatial patterns from training data, e.g., via 𝑘-SVD sparse learning or the traditional Principal Component Analysis (PCA). To impose solution discreteness, we introduce discrete regularization penalty functions (using potential functions) that encourage piecewise-constant behavior in the solution while minimizing the mismatch between observed and predicted data.
We solve the resulting regularized least-squares minimization problem by invoking variable splitting to arrive at a flexible and efficient gradient-based alternating directions algorithm. Numerical results show that incorporating facies discreteness as regularization leads to geologically consistent solutions that improve facies calibration quality.

In Chapter 5, we introduce a machine learning algorithm to incorporate feasibility constraints, e.g., forms of connectivity patterns, in estimating subsurface parameters. This goal is achieved by splitting the problem formulation into two iteration steps: a parameterized approximation of the solution, obtained by solving a regularized least-squares inversion (while maintaining the expected connectivity of the patterns), followed by a machine-learning-based mapping of the parameterized solution onto the feasible set defined by the prior models. The second step involves a machine learning approach that uses offline training to implement the mapping. The offline learning process uses the 𝑘-Nearest Neighbor (𝑘-NN) algorithm to construct local pattern (feature) vectors and compare them with the feature vectors in the training dataset. For each spatial template, the 𝑘 most similar feature vectors in the learning dataset are selected, and their corresponding label vectors (i.e., multivariate discrete patterns) are identified and stored. Once all local patterns have been scanned and processed using a defined template size, an aggregation step is applied to the overlapping templates to collectively incorporate the multiple-point statistics patterns in assigning feasible values to each grid block.

Chapter 6 addresses "feasibility constraints" and "geologic scenario uncertainty" simultaneously. Considering "uncertainty in geologic scenarios" is the major difference between the methodology developed in this chapter and that of Chapter 5. In Chapter 5, we assume that the prior geologic scenario does not contain uncertainty, and the inversion method is developed based on the assumption of a single geologic scenario. The formulation proposed in Chapter 6, however, considers uncertain geologic scenarios and adopts group-sparsity regularization (similar to Chapter 3) for the purpose of geologic scenario selection. To impose feasibility constraints, we take advantage of the formulation developed in Chapter 5 of the thesis. An alternative approach to addressing "feasibility constraints" and "geologic scenario uncertainty" is to break the original problem into two independent sequential subproblems: (i) adopting the approach from Chapter 3 to select the relevant geologic scenarios, and (ii) limiting the feasible set to the selected scenarios and using the methodology introduced in Chapter 5 to address "feasibility constraints" based on the refined geologic scenarios.

Chapter 7 includes a summary, conclusions, and recommendations for future work in the area of subsurface inverse modeling in the presence of "uncertainty in geologic scenarios" and "feasibility constraints". The results of this work have been published in "The IMA Volumes in Mathematics and its Applications" and in the journals "Water Resources Research" and "Advances in Water Resources".
We have presented our work at several conferences, workshops and meetings, including the "American Geophysical Union Fall Meeting (2013 and 2017)", the "Society for Industrial and Applied Mathematics Annual Meeting (2016)", the "Society for Industrial and Applied Mathematics Conference on Mathematical & Computational Issues in the Geosciences (2016)", the "Institute for Mathematics and its Applications Workshop on Frontiers in PDE-constrained Optimization (2016)", the "Institute for Pure and Applied Mathematics Workshop on Data Assimilation, Uncertainty Reduction, and Optimization for Subsurface Flow (2017)", the "Society for Industrial and Applied Mathematics Conference on Uncertainty Quantification (2018)", and the "Society of Petroleum Engineers Western Regional Meeting (2018)".

CHAPTER 2

GROUP-SPARSE FEATURE LEARNING FOR PARAMETERIZATION PURPOSES

Sparse representations provide a flexible and parsimonious description of high-dimensional model parameters for reconstructing subsurface flow property distributions from limited data. While sparsity alone is a form of prior information, in many cases it is possible to identify and impose additional structural constraints on the sparse solution, e.g., by exploiting possible correlations among sparse elements. Group-sparsity regularization is designed to take advantage of possible relations among the entries of unknown sparse parameters. This relation is either learned or expected, depending on the available prior information. In this chapter, group-sparsity is implemented by transforming the spatial parameters of the subsurface flow model into groups of parameters with the following properties: (i) the elements within each group are either collectively active or inactive; and (ii) only a small subset of groups is needed to approximate the spatial parameters of interest. These properties give rise to a particular structural regularization in solving the inverse problem, known as group-sparsity, in which sparsity is promoted only across the groups, and not within each group. Two implementations of group-sparsity regularization are presented: one based on the expected multi-resolution tree structure of the Wavelet decomposition, and another learned from explicit prior model realizations using sparse principal component analysis (SPCA). Locally structured basis functions in the Wavelet decomposition are rarely able to estimate the global connectivity and the local details simultaneously in the inversion process. Group-sparsity regularization imposes further constraints on these basis functions so that they are not only locally efficient but also able to capture the global connectivity in the parameter field. In the SPCA case, the principal components are used to classify the parameters of the inverse problem into groups with specific connectivity features, and group-sparsity takes advantage of the classified features in each group to capture the connectivity more efficiently than regular sparsity-based regularizations. Several numerical experiments are presented to demonstrate the advantages of group-sparsity in the presence of low-resolution data, where other regular regularizations are less efficient in parameter estimation.
2.1. Sparse Parameter Estimation

Selection of a small subset of significant dictionary elements, i.e., entries of $\mathbf{v}$ in $\mathbf{u}=\boldsymbol{\Phi}\mathbf{v}$, out of a large set is posed as a sparse reconstruction problem. A signal $\mathbf{v}\in\mathbb{R}^k$ is considered sparse if a large fraction of its entries is (approximately) zero. A signal is $S$-sparse if it has at most $S$ nonzero entries. A signal that may not appear sparse (in space or time) may have a sparse representation in a different (transform) domain. For instance, in many cases a parameter vector $\mathbf{u}$ may not be sparse, but it can have a sparse representation $\mathbf{v}$ after transformation through $\boldsymbol{\Phi}$, that is, $\mathbf{u}=\boldsymbol{\Phi}\mathbf{v}$. Depending on the application, identification of significant dictionary elements can be based either on complete or on incomplete knowledge about the unknown parameters. In inverse problems, often limited measurements are available for identification of the significant dictionary elements and estimation of their corresponding expansion coefficients. In image compression, compressed sensing (also called compressive sensing or compressive sampling) [Donoho, 2006; Baraniuk, 2007; Candès and Wakin, 2008] is a relatively new paradigm that provides an alternative to the well-known Shannon sampling theory. Compressed sensing adopts sparsity as prior knowledge about signals, whereas Shannon theory was designed for frequency band-limited signals. The widespread application of compressed sensing is in part due to the universality of the sparsity property, which is encountered in a wide range of natural phenomena (especially images). In many cases, sparsity may not be immediately apparent, and certain manipulations (e.g., transformations) of the original parameters may be necessary for their sparsity to emerge. For instance, natural images that contain various elements with spatial correlations do not exhibit sparsity in the space domain, but they are highly compressible and are well known to have sparse representations in the Wavelet or DCT domains. One of the main contributors to the widespread application of compressed sensing is its direct applicability to solving underdetermined inverse problems, such as tomographic image reconstruction. Compressed sensing gives strong theoretical support and an efficient solution algorithm (under appropriate conditions) for solving otherwise intractable (NP-hard) inverse problems that can have sparse solutions. To recover a sparse solution $\mathbf{v}$ from a set of linear measurements $\mathbf{d}_{obs}=\mathbf{G}\mathbf{v}$, one can solve the following minimization problem:

$$\min_{\mathbf{v}} J(\mathbf{v})=\|\mathbf{v}\|_0 \quad \text{s.t.} \quad \mathbf{d}_{obs}=\mathbf{G}\mathbf{v} \qquad (2.1)$$

where $\|\mathbf{v}\|_0$ is the $\ell_0$-norm of the vector $\mathbf{v}$ ($\ell_0$ does not conform to the norm definition and is loosely referred to as a norm) and represents its cardinality. In this formulation, the optimization problem searches for a solution that reproduces the observed data (constraint) while having a minimum number of non-zero entries (support). The $\ell_0$-norm is not a differentiable function and does not lend itself to solution with standard gradient-based optimization methods.
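As a concrete illustration of how the relaxed problem can be attacked numerically, the sketch below minimizes the penalized form $0.5\,\|\mathbf{d}_{obs}-\mathbf{G}\mathbf{v}\|_2^2+\lambda\|\mathbf{v}\|_1$ with iterative soft-thresholding (ISTA), a standard proximal-gradient method. This is an illustrative substitute for the IRLS algorithm of Appendix A; the random operator, penalty weight and iteration count are placeholder assumptions.

import numpy as np

def soft_threshold(x, t):
    # Entry-wise soft-thresholding: the proximal operator of t * ||x||_1.
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

def ista_l1(G, d_obs, lam, n_iter=500):
    # Minimize 0.5 * ||d_obs - G v||_2^2 + lam * ||v||_1 by iterative
    # soft-thresholding (proximal gradient with a fixed step of 1/L).
    L = np.linalg.norm(G, 2) ** 2          # Lipschitz constant of the gradient
    v = np.zeros(G.shape[1])
    for _ in range(n_iter):
        grad = G.T @ (G @ v - d_obs)       # gradient of the quadratic term
        v = soft_threshold(v - grad / L, lam / L)
    return v

# Example: recover a 20-sparse vector in R^200 from 80 random projections.
rng = np.random.default_rng(0)
G = rng.standard_normal((80, 200))
v_true = np.zeros(200)
v_true[rng.choice(200, size=20, replace=False)] = rng.standard_normal(20)
v_hat = ista_l1(G, G @ v_true, lam=0.05)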
In practice, two types of approximate algorithms have been developed to solve Equation (2.1): (i) greedy pursuit algorithms, such as OMP [Tropp and Gilbert, 2007], CoSaMP [Needell and Tropp, 2009], IHT [Blumensath and Davies, 2009] or IMAT [Marvasti et al., 2012]; and (ii) convex approximations, in which the non-convex $\ell_0$-norm is replaced with its convex relaxations, e.g., the $\ell_1$-norm in basis pursuit [Chen et al., 2001] or a heuristically defined exponential norm in [Mohimani et al., 2009]. Compressed sensing derives the solution by replacing the $\ell_0$-norm with the $\ell_1$-norm, and it offers conditions under which an exact solution to the original problem is guaranteed (see [Donoho, 2006; Candès, 2008] for details). In this case, the optimization problem takes the form:

$$\min_{\mathbf{v}} J(\mathbf{v})=\|\mathbf{v}\|_1 \quad \text{s.t.} \quad \mathbf{d}_{obs}=\mathbf{G}\mathbf{v}. \qquad (2.2)$$

The fundamental importance of this formulation is that it converts an NP-hard problem into a linear programming problem, which can be solved efficiently. In practice, it can be demonstrated that the $\ell_p$-norms, $0\le p\le 1$, while non-convex for $p<1$, have a similar sparsity-promoting property; however, in addition to the solution complexity, the mathematical proof and the required conditions for this case are not well understood. In many applications, the conditions required to guarantee an exact solution may not be met. A particular example of departure from those conditions, often encountered in physical systems, is when the measurements are not adequate or the measurement operator is nonlinear. In those cases, it may still be possible to exploit the sparsity-promoting property of the $\ell_1$-norm to formulate and solve an inverse problem. The selection property of the $\ell_1$-norm penalty offers an important regularization form that can be used to enhance the solution of nonlinear inverse problems when applicable. When the measurement equations are nonlinear, the resulting sparse reconstruction problem takes the form:

$$\min_{\mathbf{v}} J(\mathbf{v})=\|\mathbf{v}\|_1 \quad \text{s.t.} \quad \left\|\mathbf{C}_{\boldsymbol{\epsilon}}^{-\frac{1}{2}}\left(\mathbf{d}_{obs}-\mathbf{g}(\mathbf{v})\right)\right\|_2^2 \le \sigma^2 \qquad (2.3)$$

where $\mathbf{g}(\mathbf{v})$ is a nonlinear operator. Appendix A discusses an iteratively reweighted least-squares (IRLS) algorithm for solving the $\ell_1$-norm regularized minimization problem.

When the signal of interest $\mathbf{v}$ has block-sparse behavior, the mixed $\ell_1/\ell_2$-norm can have a superior reconstruction performance compared to the standard $\ell_1$-norm. In block-sparse signals, the entries are collected in predefined groups and the sparsity penalty is applied across the groups. In this case, the $\ell_2$-norm is applied to the elements inside each group to quantify the group contribution, and the $\ell_1$-norm operates on the computed $\ell_2$-norms of the groups to impart sparsity. Mathematically, if the $\mathbf{v}_i$'s are subsets of $\mathbf{v}$ and $\bigcup_i \mathbf{v}_i=\mathbf{v}$, then the $\ell_1/\ell_2$-norm is defined as $\|\mathbf{v}\|_{1,2}=\sum_i \|\mathbf{v}_i\|_2$. In this case, the inverse problem formulation minimizes the $\ell_1/\ell_2$-norm of the solution while honoring the measurement constraint, that is:

$$\min_{\mathbf{v}} J(\mathbf{v})=\|\mathbf{v}\|_{1,2} \quad \text{s.t.} \quad \left\|\mathbf{C}_{\boldsymbol{\epsilon}}^{-\frac{1}{2}}(\mathbf{d}_{obs}-\mathbf{G}\mathbf{v})\right\|_2^2 \le \sigma^2 \qquad \text{(linear)} \qquad (2.4a)$$

$$\min_{\mathbf{v}} J(\mathbf{v})=\|\mathbf{v}\|_{1,2} \quad \text{s.t.} \quad \left\|\mathbf{C}_{\boldsymbol{\epsilon}}^{-\frac{1}{2}}\left(\mathbf{d}_{obs}-\mathbf{g}(\mathbf{v})\right)\right\|_2^2 \le \sigma^2 \qquad \text{(nonlinear)} \qquad (2.4b)$$

A detailed description of group-sparsity regularization and its properties is presented in the next section. In this case, the objective is to select a small set of the groups within $\mathbf{v}$ that have significant contributions to the solution. In other words, sparsity is applied to the groups and not to individual entries.
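Before the formal treatment in the next section, the mixed norm itself can be stated in a few lines of code. The sketch below evaluates $\|\mathbf{v}\|_{1,2}$ for an arbitrary (possibly overlapping) grouping supplied as index lists; the example vector and grouping are placeholder assumptions.

import numpy as np

def mixed_l12_norm(v, groups):
    # ||v||_{1,2} = sum_i ||v_i||_2, where v_i gathers the entries of v
    # indexed by groups[i]; groups may overlap and their union covers v.
    return sum(np.linalg.norm(v[idx]) for idx in groups)

# Example: a vector in R^6 split into two groups of three entries each.
v = np.array([3.0, 4.0, 0.0, 0.0, 0.0, 12.0])
groups = [np.arange(0, 3), np.arange(3, 6)]
print(mixed_l12_norm(v, groups))   # ||[3,4,0]||_2 + ||[0,0,12]||_2 = 5 + 12 = 17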
2.2. Mixed $\ell_1/\ell_2$-Norm for Promoting Group-Sparsity

Consider a length-$n$ group-sparse signal $\mathbf{v}$ that is composed of $p$ distinct groups of coefficients $\mathbf{v}_{i=1:p}$, i.e., $\mathbf{v}=[\mathbf{v}_1^T,\mathbf{v}_2^T,\ldots,\mathbf{v}_p^T]^T$, where only $z\ll p$ groups can be active (have non-zero coefficients). In general, the groups can have overlapping elements, with union $\mathbf{v}$. While this signal is clearly a sparse signal, it has an additional property: the sparsity has a group structure, and the groups of coefficients are collectively active or inactive. The group-sparse structure (which, in our application, will be ensured by construction) can indeed provide additional constraining power to help with signal reconstruction from incomplete observations. If the $\ell_2$-norm is used to define the contribution of the coefficients of group $i$, i.e., $w_i=\|\mathbf{v}_i\|_2=\left(\sum_{k=1}^{k_i}|v_k|^2\right)^{\frac{1}{2}}$, then $\mathbf{w}=[w_1,w_2,\ldots,w_p]^T$ is a sparse vector. Hence, estimating the group-sparse signal $\mathbf{v}$ can be accomplished by applying regular sparsity-promoting algorithms to $\mathbf{w}$, which can be achieved through the celebrated $\ell_1$-norm minimization. The resulting regularization function takes the form:

$$\|\mathbf{w}\|_1=\sum_{i=1}^{p}|w_i|=|w_1|+|w_2|+\cdots+|w_p| \qquad (2.5)$$

which, after substituting $w_i=\|\mathbf{v}_i\|_2$, leads to the mixed $\ell_1/\ell_2$-norm regularization function:

$$\|\mathbf{w}\|_1=\|\mathbf{v}\|_{1,2}=\|\mathbf{v}_1\|_2+\|\mathbf{v}_2\|_2+\cdots+\|\mathbf{v}_p\|_2=\sum_{i=1}^{p}\|\mathbf{v}_i\|_2. \qquad (2.6)$$

Minimizing the above mixed $\ell_1/\ell_2$-norm regularization along with a data-mismatch term, i.e.,

$$\min_{\mathbf{v}} J(\mathbf{v})=\sum_{i=1}^{p}\|\mathbf{v}_i\|_2 \quad \text{s.t.} \quad \left\|\mathbf{C}_{\boldsymbol{\epsilon}}^{-\frac{1}{2}}\left(\mathbf{d}_{obs}-\mathbf{g}(\mathbf{v})\right)\right\|_2^2 \le \sigma^2 \qquad (2.7a)$$

$$\min_{\mathbf{v}} \left\|\mathbf{C}_{\boldsymbol{\epsilon}}^{-\frac{1}{2}}\left(\mathbf{d}_{obs}-\mathbf{g}(\mathbf{v})\right)\right\|_2^2+\lambda^2\sum_{i=1}^{p}\|\mathbf{v}_i\|_2 \qquad (2.7b)$$

promotes group-sparse solutions, which is a stronger condition than $\ell_1$-norm sparsity regularization. Equation (2.7b) is equivalent to the optimization problem in (2.7a), as the constraint is added to the objective function $J(\mathbf{v})$ using the multipliers method (similar to Equation (1.10)). The above formulation assumes that the groups are equally important. If prior information is available to suggest different weights for the groups, and also for each basis element within them, the formulation can be generalized into:

$$\min_{\mathbf{v}} J(\mathbf{v})=\sum_{i=1}^{p}\mu_i\|\mathbf{K}_i\mathbf{v}_i\|_2 \quad \text{s.t.} \quad \left\|\mathbf{C}_{\boldsymbol{\epsilon}}^{-\frac{1}{2}}\left(\mathbf{d}_{obs}-\mathbf{g}(\mathbf{v})\right)\right\|_2^2 \le \sigma^2 \qquad (2.8a)$$

$$\min_{\mathbf{v}} \left\|\mathbf{C}_{\boldsymbol{\epsilon}}^{-\frac{1}{2}}\left(\mathbf{d}_{obs}-\mathbf{g}(\mathbf{v})\right)\right\|_2^2+\lambda^2\sum_{i=1}^{p}\mu_i\|\mathbf{K}_i\mathbf{v}_i\|_2 \qquad (2.8b)$$

where $\mu_i$ is the relative weight given to the $i$-th group and $\mathbf{K}_i^T\mathbf{K}_i$ is a diagonal weight matrix for the elements within the $i$-th group. In the absence of any prior information, the weights $\mu_i$ and $\mathbf{K}_i^T\mathbf{K}_i$ are set to 1 and the identity matrix, respectively. Appendix B presents an iteratively reweighted algorithm for solving optimization problems with group-sparsity regularization.

Figure 2.1. Comparison between the balls of group-sparsity and regular sparsity-inducing norms in three dimensions, that is, for $\mathbf{v}=[v_1\ v_2\ v_3]$: (a) $\ell_1/\ell_2$-norm using two groups $[v_1,v_2]$ and $[v_3]$, resulting in $(v_1^2+v_2^2)^{\frac{1}{2}}+|v_3|$; and (b) $\ell_1$-norm, i.e., $|v_1|+|v_2|+|v_3|$ (figure adapted from [Bach et al., 2012]).

A simple example illustrates the main distinction between the mixed $\ell_1/\ell_2$-norm and the regular $\ell_1$-norm minimizations. Consider the unknown vector of parameters $\mathbf{v}=[v_1\ v_2\ v_3]$ grouped into $\mathbf{v}_1=[v_1,v_2]$ and $\mathbf{v}_2=[v_3]$.
Based on this grouping, the mixed $\ell_1/\ell_2$ and regular $\ell_1$ norms are defined as $(v_1^2+v_2^2)^{\frac{1}{2}}+|v_3|$ and $|v_1|+|v_2|+|v_3|$, respectively. Figure 2.1 depicts the geometric interpretations of $(v_1^2+v_2^2)^{\frac{1}{2}}+|v_3|=c_1$ and $|v_1|+|v_2|+|v_3|=c_1$. For linear measurement constraints, that is, $\mathbf{d}_{obs}=\mathbf{G}\mathbf{v}$, the solution should lie at the intersection of a hyperplane with the balls shown in Figure 2.1. With the $\ell_1$-norm penalty, a sparse solution is more likely to occur on one of the axes (Figure 2.1(b)). On the other hand, for the $\ell_1/\ell_2$ function, the solution does not have to occur on one of the axes, since $v_1$ and $v_2$ can simultaneously assume non-zero values at the minimum. Hence, group-sparsity allows the parameters within each group to take non-zero values without affecting the regularization term.

Figure 2.2. Comparison between reconstruction results with group-sparsity and $\ell_1$-norm regularization for different numbers of observations: (a) original group-sparse signal; (b1)-(b5) reconstruction results for group-sparsity and regular $\ell_1$-norm sparsity-inducing regularization for 10, 20, 40, 60 and 80 observations, respectively.

Although group-sparsity regularization can be applied to reconstruct group-sparse signals, an important assumption is the knowledge of the grouping and the group-sparse structure. To compare the performance of group-sparsity with regular $\ell_1$-norm sparsity regularization, Figure 2.2 shows the reconstruction results for a group-sparse signal. In this example, the group-sparse signal $\mathbf{v}$ has 150 entries that are clustered into 10 groups of 15 consecutive elements (Figure 2.2(a)). Only Groups 3 and 8 are active in the reference model. A linear measurement of the form $\mathbf{d}_{obs}=\mathbf{G}\mathbf{v}$ is assumed, where $\mathbf{G}$ has entries drawn from an independent and identically distributed Gaussian probability density function. We use the resulting random measurements (not shown) to evaluate the reconstruction results in a linear least-squares formulation containing a data-mismatch term $\|\mathbf{d}_{obs}-\mathbf{G}\mathbf{v}\|_2^2$ with regular sparsity and group-sparsity regularization, to highlight the different behavior of the two. Figures 2.2(b1)-2.2(b5) depict the reconstruction results using $\ell_1/\ell_2$ regularization (left) and $\ell_1$ regularization (right) with 10, 20, 40, 60 and 80 measurements, respectively. When fewer observations are available, as in subsurface flow model calibration problems, group-sparsity provides superior constraining power relative to regular $\ell_1$-norm sparsity regularization. The two methods give similar results when a large number of observations is used. To explore the effect of incorrect grouping, Figure 2.3 shows the results of two sets of experiments. In the first set, Figures 2.3(a1)-2.3(a5) (left column), 30 groups with 5 elements in each group are assumed (every three consecutive groups correspond to one group in the original example). The results show departure from the reference model, especially when fewer observations are used. In the second case, Figures 2.3(b1)-2.3(b5), each group contains 10 elements with a completely misplaced grouping structure. In this case, several irrelevant elements are also selected, especially when fewer observations are used. These two experiments highlight the importance of consistent grouping in the successful implementation of group-sparsity regularization.
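The experiment above can be reproduced, up to random seeds, along the following lines. The block soft-thresholding proximal-gradient solver shown here is one standard way to minimize the penalized linear form of Equation (2.7b); the reweighted algorithm actually used in this work is described in Appendix B, so the sketch should be read as an illustrative substitute with assumed seeds, noise-free data and penalty weight.

import numpy as np

def group_soft_threshold(v, groups, t):
    # Proximal operator of t * sum_i ||v_i||_2 for non-overlapping groups:
    # each block is shrunk toward zero as a unit, so its entries become
    # collectively active or inactive.
    out = np.zeros_like(v)
    for idx in groups:
        nrm = np.linalg.norm(v[idx])
        if nrm > t:
            out[idx] = (1.0 - t / nrm) * v[idx]
    return out

def group_sparse_recover(G, d_obs, groups, lam, n_iter=2000):
    # Minimize 0.5 * ||d_obs - G v||_2^2 + lam * sum_i ||v_i||_2 with a
    # proximal-gradient iteration (block soft-thresholding at each step).
    L = np.linalg.norm(G, 2) ** 2
    v = np.zeros(G.shape[1])
    for _ in range(n_iter):
        v = group_soft_threshold(v - G.T @ (G @ v - d_obs) / L, groups, lam / L)
    return v

# Synthetic test patterned after Figure 2.2: 150 entries in 10 groups of 15
# consecutive elements, with only Groups 3 and 8 active in the reference.
rng = np.random.default_rng(1)
groups = [np.arange(15 * i, 15 * (i + 1)) for i in range(10)]
v_true = np.zeros(150)
for g in (2, 7):                        # zero-based indices of Groups 3 and 8
    v_true[groups[g]] = rng.standard_normal(15)
G = rng.standard_normal((40, 150))      # 40 i.i.d. Gaussian measurements
v_hat = group_sparse_recover(G, G @ v_true, groups, lam=0.1)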
Another important factor in the group-sparsity formulation is the regularization parameter $\lambda$. In addition to controlling the final solution, the value of $\lambda$ can have an impact on the selected groups. Figure 2.4 shows the sensitivity of the selected groups to the regularization parameter $\lambda$ in the range $[10^{-15}, 800]$, when 60 observations are used. Intuitively, for very small (large) values of $\lambda$, the regularization (data-mismatch) term is neglected, resulting in overfitting (underfitting) of the data. However, the behavior of the solution for midrange $\lambda$ values is not easy to predict. Interestingly, the selected groups show little sensitivity to the specified $\lambda$ value within a large range. It appears that the variability in $\lambda$ only affects the magnitude of the selected coefficients without changing the selected groups. Figure 2.4(b) displays the data match for the solutions obtained with different values of $\lambda$, which shows better data matches as $\lambda$ decreases.

Figure 2.3. Reconstruction results for the group-sparse signal in Figure 2.2 for: (a) groups of size 5, obtained by splitting the original groups of size 15 (overlapping case); (b) groups of size 10. In the latter case, the grouping structure is inconsistent with the structure in the reference model.

Figure 2.4. (a) Reconstruction results for the case with 60 measurements (Figure 2.2(b4)) with different regularization parameters; (b) data match for the reconstructed signal with the corresponding regularization parameter in (a).

These results suggest that, within a wide range, the group selection property of $\ell_1/\ell_2$ regularization is robust against the choice of $\lambda$.

2.3. Learning Group-Sparse Parameterization

To formulate subsurface inverse problems with group-sparsity regularization, we denote by $\boldsymbol{\Phi}=\{\boldsymbol{\phi}_1,\boldsymbol{\phi}_2,\ldots,\boldsymbol{\phi}_k\}$ the sparsifying basis. We define the set of $p$ pre-specified (based on prior knowledge) groups of basis elements as $\Omega=\{\zeta_1,\zeta_2,\ldots,\zeta_p\}$, where $\zeta_i=\{\tilde{\boldsymbol{\phi}}_{i,1},\tilde{\boldsymbol{\phi}}_{i,2},\ldots,\tilde{\boldsymbol{\phi}}_{i,k_i}\}$, $1\le i\le p$, represents group $i$, consisting of $k_i$ elements ($1\le k_i\le k$) of $\boldsymbol{\Phi}$. Since the groups can have overlapping elements, we collect all the group elements to define an expanded dictionary $\tilde{\boldsymbol{\Phi}}_{n\times K}=[\zeta_1,\zeta_2,\ldots,\zeta_p]=[\tilde{\boldsymbol{\phi}}_1,\tilde{\boldsymbol{\phi}}_2,\ldots,\tilde{\boldsymbol{\phi}}_K]$, where $K=\sum_{i=1}^{p}k_i$. The linear expansion can still be expressed as:

$$\mathbf{u}\approx\sum_{i=1}^{K}\tilde{\boldsymbol{\phi}}_i\bar{v}_i=\tilde{\boldsymbol{\Phi}}\bar{\mathbf{v}} \qquad (2.9)$$

where $\bar{\mathbf{v}}=[\bar{v}_1,\bar{v}_2,\ldots,\bar{v}_K]$ are the coefficients corresponding to all the groups. For overlapping basis elements, the final coefficient for each $\boldsymbol{\phi}_i$, $1\le i\le k$, is obtained by summing the corresponding repeated coefficients in all groups. We can now write the group-sparsity regularized minimization objective function as follows:

$$\min_{\bar{\mathbf{v}}} J(\bar{\mathbf{v}})=\left\|\mathbf{C}_{\boldsymbol{\epsilon}}^{-\frac{1}{2}}\left(\mathbf{d}_{obs}-\mathbf{g}(\tilde{\boldsymbol{\Phi}}\bar{\mathbf{v}})\right)\right\|_2^2+\lambda^2\sum_{i=1}^{p}\mu_i\|\mathbf{K}_i\bar{\mathbf{v}}_i\|_2. \qquad (2.10)$$

Here, $\mu_i$ is a positive weight for the $i$-th group, and $\mathbf{K}_i^T\mathbf{K}_i$ is a diagonal matrix for group $i$, with its $j$-th diagonal entry specifying the weight given to the $j$-th basis element in the corresponding group ($\|\mathbf{K}_i\bar{\mathbf{v}}_i\|_2=(\bar{\mathbf{v}}_i^T\mathbf{K}_i^T\mathbf{K}_i\bar{\mathbf{v}}_i)^{\frac{1}{2}}$).

An important observation from Section 2.2 is that, in group-sparsity regularization, even when the number of measurements is too low to allow perfect recovery, the active groups are correctly identified (see Figure 2.2).
This has important practical implications, as it suggests that, in ill-posed problems with limited data, it is possible to identify the correct structure even though the reconstruction itself is not exact. For example, when the groups represent certain modes of a system or different types of proposed priors, the method is able to identify the consistent mode(s) or prior(s), although the final solution is not captured exactly (due to data limitations). On the other hand, when regular sparsity is used with limited data, the reconstruction is not able to identify the solution structure from the available data. However, the performance of group-sparsity regularization depends on the similarity between the sparse structures in the prior model and the solution. This similarity can be easy to establish in some applications. In a geologic context, for example, distinct connectivity features can form separate groups. A remaining question is how to identify the groups and their elements from prior information. In this chapter, we consider two special cases that are inspired by multi-resolution Wavelet tree structures and by the sparse principal component analysis [Shen and Huang, 2008] that was recently introduced for multivariate analysis and regression problems.

Grouping with Wavelet Tree Structures: The discrete Wavelet transform in $\mathbb{R}^n$ ($n=2^r$, $r\in\mathbb{N}$) decomposes the space into orthogonal and complementary subspaces $V_0$, $W_0$, $W_{-1}$, $W_{-2}$, ..., $W_{-r+1}$ such that their combination $V_0\oplus W_0\oplus W_{-1}\oplus W_{-2}\oplus\cdots\oplus W_{-r+1}$ spans $\mathbb{R}^n$. Unlike the Fourier basis, Wavelets are localized both in space (time) and in frequency. That is, the Wavelet-transformed representation of a signal contains information about the frequency content of the signal at different spatial locations, whereas a Fourier-transformed signal does not retain the spatial information. A Wavelet basis function at scale $j$ ($W_j$) is merely a shifted and dilated version of the basis function at scale $j+1$ ($W_{j+1}$), implying a direct relation between the basis elements at different scales. Therefore, the Wavelet decomposition offers a multi-scale structural relationship that lends itself to defining sparse groups. For the basis function $\boldsymbol{\phi}_0$ in $V_0$ (usually the DC component), and $\boldsymbol{\phi}_{ji}$ as the $i$-th basis element at scale $j$, every Wavelet basis element at scale $j+1$ (i.e., $\boldsymbol{\phi}_{j+1,i}$) is parent to four basis elements at scale $j$. This parent/children relation is repeated from the root to the leaves of the tree. Figure 2.5(a) depicts the tree structure of the Wavelet basis for an $8\times 8$ two-dimensional image (Figure 2.5(b), top) defined by the subspaces $V_0\oplus W_0\oplus W_{-1}\oplus W_{-2}$. Starting from the leaves, the finest scale ($W_{-2}$), there exists a parent basis at the coarser scale ($W_{-1}$), whose parent belongs to the coarser scale $W_0$, and so on. The parent/children relations across the scales can define alternative grouping structures, including, for example, groups consisting of all the basis elements along each path from the leaves to the root of the tree. Figure 2.5(b) (bottom) depicts three such groups for the Haar Wavelet, each corresponding to a path taken from the root to one of the leaves.
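One way to enumerate such root-to-leaf groups for a two-dimensional Wavelet quadtree is sketched below. It assumes the standard pyramid layout of Figure 2.5(b), in which a detail coefficient at position (r, c) has its parent at (r // 2, c // 2); this indexing convention is an assumption of the sketch rather than a prescription from the text.

import numpy as np

def wavelet_tree_groups(n):
    # Enumerate root-to-leaf groups in the standard n-by-n (n = 2**r) 2-D
    # Wavelet coefficient layout: start from each finest-scale coefficient,
    # climb parent links (r, c) -> (r // 2, c // 2) up to the W_0 corner,
    # and terminate every path with the DC coefficient at (0, 0).
    groups = []
    for r in range(n):
        for c in range(n):
            if max(r, c) < n // 2:
                continue                     # not a finest-scale leaf
            path, rr, cc = [], r, c
            while max(rr, cc) >= 1:
                path.append((rr, cc))
                if max(rr, cc) == 1:         # reached a W_0 element
                    break
                rr, cc = rr // 2, cc // 2
            path.append((0, 0))              # DC component in V_0
            groups.append(path)
    return groups

groups = wavelet_tree_groups(8)
print(len(groups), len(groups[0]))           # 48 groups of 4 elements (Table 2.1, Case 1)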
Figure 2.5. Tree structure of the Wavelet basis: (a) the tree structure of the basis elements at different scales; (b, top) the position of the coefficient of each basis element, where 0 is the DC basis component in $V_0$ and $(j,i)$ is the coefficient of $\boldsymbol{\phi}_{j,i}$; (b, bottom) three examples of groups that represent variability in the horizontal, diagonal and vertical directions, respectively.

The implication of the Wavelet-based grouping is that if a basis element at a given scale is determined to be significant, all its offspring and parents are also likely to be significant. This multi-resolution tree structure of Wavelets has been extensively exploited in image processing applications. For spatial images, the stated grouping combines basis elements pertaining to a local region of the model domain into a group. With this grouping constraint, if the available data for inversion exhibit significant sensitivity to coarse-scale features in certain parts of the solution, this information can propagate to fine-scale basis elements of the same regions. Hence, the group-sparsity regularization in Equation (2.10) provides a mechanism to incorporate the structural prior information of the Wavelet tree in recovering the solution. In this case, since the grouping is an intrinsic property of the tree structure, without any additional prior information about the importance of each tree path, each group is considered equally important. Therefore, the weights $\mu_i$ and $\mathbf{K}_i^T\mathbf{K}_i$ in Equation (2.10) are set to 1 and the identity matrix, respectively. Next, we consider a grouping structure based on explicit prior information in the form of a training dataset consisting of several hundred prior model realizations. Such prior models are typically generated using geostatistical simulation techniques.

Grouping with Sparse PCA: We adopt the sparse PCA algorithm to implement a grouping scheme for group-sparsity regularization based on prior model realizations as training data. Denoting a collection of $L$ prior model realizations as $\mathbf{U}_{n\times L}=[\mathbf{u}_1,\mathbf{u}_2,\ldots,\mathbf{u}_L]$, where each column of $\mathbf{U}$ is one realization of the vectorized model parameters, the sparse representation of each realization can be obtained through a linear expansion with the dictionary $\boldsymbol{\Phi}_{n\times k}$ as $\mathbf{u}_i=\boldsymbol{\Phi}\mathbf{v}_i$, $1\le i\le L$. The columns of $\mathbf{V}_{k\times L}=[\mathbf{v}_1,\mathbf{v}_2,\ldots,\mathbf{v}_L]$ each represent a single realization of the $k$ (sparse) transform-domain coefficients, whereas each row of $\mathbf{V}_{k\times L}$ contains the $L$ realizations of one of the $k$ coefficients corresponding to the $k$ basis elements. We define a grouping scheme for the $k$ (sparse) basis elements based on their contribution to the leading principal components of the transformed representations of the training dataset. The standard PCA algorithm sequentially searches for an orthogonal set of vectors $\mathbf{w}_j$ that maximize the variance of $\mathbf{z}_j=\mathbf{V}^T\mathbf{w}_j$. That is:

$$\mathbf{w}_j=\underset{\|\mathbf{w}\|_2=1}{\arg\max}\ \|\mathbf{V}^T\mathbf{w}\|_2=\underset{\|\mathbf{w}\|_2=1}{\arg\max}\ \mathbf{w}^T\mathbf{V}\mathbf{V}^T\mathbf{w} \qquad j=1 \qquad (2.11a)$$

$$\mathbf{w}_j=\underset{\|\mathbf{w}\|_2=1}{\arg\max}\ \|\tilde{\mathbf{V}}_{j-1}^T\mathbf{w}\|_2=\underset{\|\mathbf{w}\|_2=1}{\arg\max}\ \mathbf{w}^T\tilde{\mathbf{V}}_{j-1}\tilde{\mathbf{V}}_{j-1}^T\mathbf{w} \qquad j\neq 1 \qquad (2.11b)$$

where

$$\tilde{\mathbf{V}}_j^T=\mathbf{V}^T\left(\mathbf{I}-\sum_{k=1}^{j-1}\mathbf{w}_k\mathbf{w}_k^T\right) \qquad (2.11c)$$

Each $\mathbf{z}_j$ ($=\mathbf{V}^T\mathbf{w}_j$) and the corresponding $\mathbf{w}_j$ are called the PC scores and the corresponding PC loadings, respectively. The PCA algorithm does not impose any constraint on the PC loadings $\mathbf{w}_j$ other than their orthogonality.
A well-known drawback of PCA is that each PC $\mathbf{z}_j$ is a linear combination of all the input data, which is not desirable for classification. For grouping, one needs to represent each PC with only a few (sparse) input data in $\mathbf{V}^T$. Sparse PCA was developed to achieve this goal by adding a sparsity constraint on the PC loadings. A recent formulation of SPCA, known as SPCA-rSVD [Shen and Huang, 2008], is based on the singular value decomposition. In this formulation, each leading PC is a linear combination of a small number of columns of $\mathbf{V}^T$. The following optimization problem can be solved to sequentially find the vectors $\mathbf{z}_j$ (PCs) and $\mathbf{w}_j$ (loadings) and the constants $\rho_j$:

$$\min_{\mathbf{z}_j,\mathbf{w}_j,\rho_j} J(\mathbf{z}_j,\mathbf{w}_j,\rho_j)=\left\|\mathbf{R}_j-\mathbf{z}_j\rho_j\mathbf{w}_j^T\right\|_F^2+\alpha\|\mathbf{w}_j\|_1 \quad \text{subject to} \quad \|\mathbf{w}_j\|_2=1 \ \text{and} \ \|\mathbf{z}_j\|_2=1 \qquad (2.12)$$

where $\mathbf{R}_j=\mathbf{V}^T-\sum_{k=1}^{j-1}\mathbf{z}_k\rho_k\mathbf{w}_k^T$ for $j\ge 2$ and $\mathbf{R}_1=\mathbf{V}^T$. The weights $\rho_i$ represent the energy carried by the active elements in the $i$-th group and can be used to define the weights (contributions) of the groups ($\mu_j=\frac{1}{\rho_j}$). Similarly, the absolute values of the active elements of $\mathbf{w}_j$ represent the contribution of each active element to its group (the diagonal entries of $\mathbf{K}_i$ in Equation (2.8)). A more detailed description of this algorithm is presented in Appendix C.

In our context, the prior model realizations of the unknown parameters are expected to have similarities and contrasts that can be used to classify them. Therefore, the relations between the sparse coefficients of the prior model realizations can be exploited to define a group-sparsity constraint. To this end, the sparse representation of the prior model realizations $\mathbf{V}^T$ can be expressed in terms of the SPCA decomposition:

$$\mathbf{V}^T\approx\sum_{j=1}^{p}\mathbf{z}_j\rho_j\mathbf{w}_j^T \qquad (2.13)$$

where $\mathbf{V}^T$ is approximated with $p$ PCs and their corresponding sparse loadings. Hence, the best rank-1 approximation to the residual $\mathbf{R}_j$ ($j\ge 2$) can be written as

$$\mathbf{R}_j\approx\mathbf{z}_j\rho_j\mathbf{w}_j^T \ \Rightarrow \ \mathbf{z}_j\approx\frac{1}{\rho_j}\mathbf{R}_j\mathbf{w}_j. \qquad (2.14)$$

Since the loading $\mathbf{w}_j$ corresponding to the $j$-th PC ($\mathbf{z}_j$) is a sparse vector, only a few columns of $\mathbf{R}_j$ contribute to the representation of the related PC ($\mathbf{z}_j$). Therefore, we consider the active columns of $\mathbf{R}_j$ in representing each leading PC as a distinct group. The number of groups is the same as the number of leading PCs considered. To illustrate the proposed grouping procedure, we use an example based on the DCT basis, as depicted in Figure 2.6. The prior model in this example is represented by $L=1000$ realizations of fluvial channel permeability maps with known log-permeability values of $\log(10)$ and $\log(600)$ for the low- and high-permeability facies, respectively. These realizations are generated from a training image using the SNESIM algorithm of the SGeMS software. We choose the full-rank DCT basis to represent the $64\times 64$ parameter fields ($\boldsymbol{\Phi}_{4096\times 4096}$). Figure 2.6 shows 24 sample prior permeability maps (Figure 2.6(a)) and six groups corresponding to PCs $i=\{1,2,3,100,155,200\}$ (Figure 2.6(b), top), as well as the $\rho_i$ values for the first 100, out of a total of 200, PCs (Figure 2.6(b), bottom). The first group consists of a single basis element, the DC component of the dictionary, which carries a significant level of energy. The groups associated with the first PCs consist of basis elements of the dictionary that carry the global features of the field.
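A compact sketch of the resulting sequential rank-one scheme is given below: each pass alternates a unit-norm score update with a soft-thresholded loading update, then deflates the residual, following the structure of Equations (2.12)-(2.14). The initialization, penalty $\alpha$ and iteration counts are illustrative choices, not the settings used in the experiments.

import numpy as np

def soft(x, t):
    # Entry-wise soft-thresholding.
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

def spca_rsvd(VT, n_groups, alpha, n_iter=100):
    # Sequential sparse rank-one decomposition VT ~ sum_j z_j rho_j w_j^T
    # (Equation (2.13)): alternate the unit-norm score z_j with a
    # soft-thresholded loading, then deflate the residual R_j.
    R = VT.copy()                                    # R_1 = V^T, shape (L, k)
    scores, loadings, rho = [], [], []
    for _ in range(n_groups):
        z = np.linalg.svd(R, full_matrices=False)[0][:, 0]   # plain SVD start
        b = np.zeros(R.shape[1])
        for _ in range(n_iter):
            b = soft(R.T @ z, alpha / 2.0)           # sparse, unnormalized loading
            if np.linalg.norm(b) == 0.0:
                break
            z = R @ b / np.linalg.norm(R @ b)
        r = np.linalg.norm(b)
        if r == 0.0:
            break                                    # penalty killed the component
        scores.append(z); loadings.append(b / r); rho.append(r)
        R = R - np.outer(z, b)                       # deflate: R_{j+1} = R_j - z_j rho_j w_j^T
    return scores, loadings, rho

# The support of each loading w_j defines group j, its nonzero magnitudes act
# as within-group element weights, and mu_j = 1 / rho_j weights the group.
rng = np.random.default_rng(0)
scores, loadings, rho = spca_rsvd(rng.standard_normal((200, 64)), n_groups=10, alpha=2.0)
groups = [np.flatnonzero(w) for w in loadings]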
Hence, these leading groups have larger $\rho_j$ values (more significant contributions to representing the prior models). Since the prior permeability maps show sudden changes in the E-W direction (more continuity in the S-N direction), the DCT basis elements with higher oscillation in the E-W direction (more continuity in the S-N direction) have a more significant contribution in approximating the prior models.

Finally, since group-sparsity with SPCA takes advantage of prior model realizations to train the DCT coefficients, for comparison we also consider a weighted $\ell_1$-norm sparsity regularization. In this case, we assign to each DCT basis element the weight $weight_i=\frac{1}{\sqrt{E\{v_i^2\}}}$, where $E\{\cdot\}$ is the expectation operator and $v_i$ is the DCT coefficient (other choices, such as $weight_i=\frac{1}{E\{|v_i|\}}$, may also be used). To calculate $\sqrt{E\{v_i^2\}}$ from the prior realizations, we project them onto the same low-rank basis that is used for SPCA learning, i.e., $\boldsymbol{\Phi}_{4096\times k}$ for DCT, and use $\sqrt{E\{v_i^2\}}\approx\left(\frac{1}{L}\sum_{j=1}^{L}v_{ij}^2\right)^{\frac{1}{2}}$ to calculate the weights ($v_{ij}$ is the coefficient of the $j$-th prior realization corresponding to the $i$-th basis element). With this approach, smaller weights assign a lower penalty to the corresponding basis coefficients, allowing them to assume larger values. When the prior information is reliable, the weighted $\ell_1$-norm regularization is expected to perform better than the standard $\ell_1$-norm regularization.

Figure 2.6. SPCA algorithm for grouping the basis elements of the DCT transform. Left: the prior permeability distribution with vertically oriented channels. Right, top: 6 groups, out of a total of 200 generated groups, with 1, 4, 8, 17, 11 and 17 members, which are the basis functions in groups 1, 2, 3, 100, 155 and 200, respectively. Starting from left to right in the first row, and in the same direction in the other rows, the maps between two consecutive blank squares are members of the same group (for example, the 4 basis functions between the first and second blank squares are members of group 2). Right, bottom: the corresponding $\rho_i$ values for the first 100 generated groups.

2.4. Numerical Results

We consider three sets of numerical examples to evaluate the performance of the methods discussed in Section 2.3. The first experiment is a linear inverse problem involving straight-ray travel-time tomography. The second experiment is based on a pumping test in a groundwater aquifer with single-phase flow. This example is used to evaluate the effect of observation noise on the robustness of group-sparsity regularization. The last experiment represents a nonlinear inverse problem in a two-phase (oil/water) flow system, where the heterogeneous permeability distribution in the formation is estimated from multiphase flow and pressure data.

Figure 2.7. Tomographic inversion in Section 2.4: (a) configuration of the tomographic inversion; (b) best achievable maps (in RMSE) for Cases 1, 2, 3 and 4 with the Wavelet transform; (c) recoverability in tomographic inversion with the Wavelet bases; recovered solutions for Case 1 (c1), Case 2 (c2), Case 3 (c3), and Case 4 (c4) are shown. For each case, the solutions with group-sparsity (top) and regular $\ell_1$-norm minimization (bottom) are shown.

Tomographic Inversion: We first examine the performance of the proposed methods in a linear tomographic problem [Bregman et al., 1989].
The setup for the tomographic inversion is shown in Figure 2.7(a), with sample straight ray paths from the source arrays (on the left) to the receiver arrays (on the right). The last column of Figure 2.7(b) (second row) shows a $64\times 64$ slowness map of the domain with two different rock types. The domain consists of 4096 cells, each with dimensions $10\times 10\ \text{m}^2$. The objective of the tomography is to reconstruct the reference slowness map from arrival-time data for different source-receiver pairs. For this experiment, we use smooth Daubechies Wavelets with 10 vanishing moments and solve the inversion problem for four different experimental setups, summarized in Table 2.1. The Wavelet decomposition is iterated for five levels, corresponding to $V_0\oplus W_0\oplus\cdots\oplus W_{-5}$. Four different search subspaces are considered (see Figure 2.7(b)); for the $i$-th case, the initial dictionary $\boldsymbol{\Phi}$ consists of the subspaces spanned by $V_0\oplus W_0\oplus\cdots\oplus W_{-i-1}$. In each case, the inverse problem is solved for five different source-receiver configurations, including [8, 8], [12, 12], [20, 20] and [32, 32], corresponding to 64, 144, 400 and 1024 arrival-time data, respectively. In total, 20 inverse problems are solved to cover different numbers of parameters and available data. The second row of Figure 2.7(b) shows the solutions that could be obtained, for each case, if all the pixel values of the reference map were known.

Table 2.1. Dictionary choices and their corresponding tree structures in Section 2.4 (tomographic inversion).

Case   Window    Bases in Φ   Groups   Bases per group   Bases in expanded dictionary
1      8×8       64           48       4                 192
2      16×16     256          192      5                 960
3      32×32     1024         768      6                 4608
4      64×64     4096         3072     7                 21504

Figure 2.7(c) illustrates the recovered maps for all four cases with different numbers of observations, using group-sparsity (top rows) and regular $\ell_1$-norm regularization (bottom rows). The reconstruction results, especially for smaller observation-to-parameter size ratios, clearly show the superior performance of group-sparsity over $\ell_1$-norm regularization. The plot in Figure 2.7(c1) shows the results for a small number of parameters. In this case, because of under-parameterization, increasing the number of data without including additional parameters does not improve the reconstruction quality. For large parameter sizes (Cases 3 and 4), however, the solution shows sensitivity to the data size. For smaller numbers of observations, group-sparsity provides better solutions than $\ell_1$-norm regularization does (Cases 2, 3 and 4). As the number of available data increases (and the importance of regularization decreases), the difference in reconstruction quality diminishes.

Figure 2.8. Pumping test (a) and waterflooding (b) in Section 2.4: (a1) configuration of the pumping test; (a2) reference transmissivity, the optimal transmissivity map that can be captured by the parameterization, the reference pressure-head field, and the initial pressure-head field for the two cases discussed; (b1) configuration of the 9-spot waterflooding, as well as the reference permeability and the optimal one that can be captured by 256 DCT bases; (b2) saturation and pressure profiles at different time steps, as labelled. The pressure and saturation data are obtained from these two profiles at the locations of the injection and production wells.
Pumping Test in Single-Phase Flow, Effect of Observation Noise: In this example, we study a pumping test in a groundwater aquifer. The experimental setup is depicted in Figure 2.8(a1). A no-flow boundary condition is applied to the top and bottom boundaries, while constant pressure heads of 10 m and 20 m are applied to the right and left boundaries, respectively. This setup establishes a pressure gradient (background velocity field) from the left side to the right side. A single well in the middle of the domain pumps water at a constant rate of 0.0578 $\text{m}^3/\text{s}$ under steady-state flow conditions. The spatial distribution of log-transmissivity (unit: $\log_{10}(\text{m}^2/\text{s})$) is assumed to be the only unknown parameter to be estimated. The log-transmissivity values of the two facies types are -1.7047 and -0.4037, respectively. Pressure head and transmissivity are measured at the monitoring well locations surrounding the pumping well. Two cases are considered: one using the Wavelet tree structure, and another using the DCT basis with the SPCA grouping method.

Case 1: Group-Sparsity with the Wavelet Tree Structure: In this case, the aquifer of size $640\times 640\ \text{m}^2$ is uniformly discretized into $64\times 64$ cells, and the pumping well is located at cell (32, 32). A total of 256 (out of 4096) low-frequency Daubechies-10 Wavelet basis elements are used. The reference transmissivity field and its compressed approximation, as well as the reference and initial pressure-head fields, are depicted in Figure 2.8(a2) (Case 1). Three experiments with different numbers of monitoring wells are considered. The number of randomly located monitoring wells in these three experiments, Cases 1, 2 and 3, is 10, 20 and 30, respectively (top row of Figure 2.9(a)). Here, the measurement noise is set to zero. The reconstructed transmissivity and the corresponding pressure fields for each case are depicted in Figures 2.9(b)-2.9(d) for group-sparsity, regular $\ell_1$-norm and first-order Tikhonov regularization, respectively. For Case 1, with very few observations, none of the three regularization methods is able to capture the connectivity of the entire transmissivity field; however, the reconstructed map from group-sparsity regularization captures the channel feature at the top, and Tikhonov regularization captures the global connectivity of the field without accuracy in detecting the channel shape and its local curvature. In all cases, group-sparsity is better able to identify the correct connectivity in the reference transmissivity field, although the Tikhonov regularization can also capture the global connectivity without extracting the width and shape of the high-transmissivity regions. The reconstruction results show that the difference between the reconstruction qualities of the three methods decreases as the number of observations increases.

Figure 2.9. (a) Locations of monitoring (dots) and pumping (square) wells for the three cases in Section 2.4 (pumping test), Case 1; reconstructed transmissivity and computed pressure-head fields for (b) group-sparsity, (c) regular $\ell_1$, and (d) first-order Tikhonov regularization.
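For readers who wish to experiment with this test, the forward operator, steady-state single-phase flow with mixed boundary conditions, can be prototyped with a standard five-point finite-difference scheme as sketched below. The grid size, boundary heads and pumping rate mirror the description above, but the discretization details (unit grid spacing, harmonic inter-cell averaging, thickness folded into the transmissivity) are our own simplifying assumptions, not the simulator used in this work.

import numpy as np
import scipy.sparse as sp
import scipy.sparse.linalg as spla

def steady_head(T, h_left, h_right, q):
    # Five-point finite-difference solution of steady single-phase flow on a
    # rectangular grid with unit spacing: Dirichlet heads on the left/right
    # boundaries, no-flow on the top/bottom, harmonic inter-cell averages.
    ny, nx = T.shape
    idx = lambda i, j: i * nx + j
    A = sp.lil_matrix((nx * ny, nx * ny))
    b = np.asarray(q, dtype=float).ravel().copy()   # source/sink per cell
    harm = lambda a, c: 2.0 * a * c / (a + c)
    for i in range(ny):
        for j in range(nx):
            k = idx(i, j)
            for di, dj in ((0, 1), (0, -1), (1, 0), (-1, 0)):
                ii, jj = i + di, j + dj
                if 0 <= ii < ny and 0 <= jj < nx:   # interior face coupling
                    t = harm(T[i, j], T[ii, jj])
                    A[k, k] += t
                    A[k, idx(ii, jj)] -= t
            if j == 0:                              # Dirichlet edges enter via
                A[k, k] += 2.0 * T[i, j]            # a half-cell transmissibility
                b[k] += 2.0 * T[i, j] * h_left
            if j == nx - 1:
                A[k, k] += 2.0 * T[i, j]
                b[k] += 2.0 * T[i, j] * h_right
    return spla.spsolve(A.tocsr(), b).reshape(ny, nx)

# Example mirroring the setup above: 64 x 64 grid, 20 m (left) and 10 m
# (right) boundary heads, and a 0.0578 m^3/s pumping well at cell (32, 32).
T = np.full((64, 64), 10.0 ** -1.0)                 # uniform transmissivity field
q = np.zeros((64, 64))
q[32, 32] = -0.0578                                 # negative rate = pumping (sink)
h = steady_head(T, h_left=20.0, h_right=10.0, q=q)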
A truncated dictionary consisting of 32 or 64 Wavelet basis functions (out of 4096) can also be a valid parameterization space; however, with this level of parameterization only the global connectivity will be captured, local estimation will not be possible, and group-sparsity will not be very useful (based on the sensitivity analysis in Section 2.3). Simulation results (not depicted here) indicate that with under-parameterization of the basis functions, using DCT or Wavelets, the global connectivity can be captured, but for small numbers of observations, such as the three cases here, many artifacts appear in the estimated map. Group-sparsity provides constraints that eliminate these local artifacts for both under-parameterized and over-parameterized Wavelet-based dictionaries. In conclusion, while the tree structure of Wavelets provides constraining prior information that improves the solution, it does not provide any information about the existence or shape of the channels. Hence, this level of prior information (the tree structure) may not provide enough constraints on the parameter estimation for small numbers of observations. In fact, group-sparsity (in the case of the Wavelet tree structure) can enhance the recovery if locally preserving basis functions are added to the initial dictionary to capture the local features while preserving the global connectivity.

Figure 2.10. (a) Prior transmissivity realizations in Section 2.4 (pumping test), Case 2; (b) initial transmissivity field and the corresponding initial pressure head.

Case 2: Group-Sparsity Using SPCA with the DCT Basis: We now consider a $500\times 500\ \text{m}^2$ domain, uniformly discretized into $50\times 50$ cells, for the pumping test and assume that 1000 prior realizations of the transmissivity model are available for grouping with SPCA. Figure 2.10(a) shows 24 of the 1000 prior transmissivity realizations. A total of 256 low-frequency DCT basis elements ($\boldsymbol{\Phi}_{2500\times 256}$), out of the complete set of 2500, are used to approximately represent the prior training data with sparse coefficients. Including additional high-frequency DCT basis components would result in only minimal improvement. The SPCA algorithm is then applied to generate 150 groups. The grouped dictionary $\tilde{\boldsymbol{\Phi}}$ consists of 2920 elements (roughly 20 bases per group). Figure 2.8(a2) (Case 2) illustrates the reference transmissivity, its compressed representation, and the reference pressure-head field. The initial transmissivity and pressure-head fields are depicted in Figure 2.10(b).

Figure 2.11. Reconstructed transmissivity and computed pressure-head fields for different levels of noise in Section 2.4 (pumping test), Case 2. Each column shows the simulation results for a specific noise variance, defined through $\alpha$, for the three methods: group-sparsity (a), regular $\ell_1$ (b) and weighted $\ell_1$ (c).

Noisy measurements of transmissivity and steady-state pressure heads from 16 monitoring wells are used for inversion (see Figure 2.8(a2) (Case 2) for the well locations). The observation noise is modeled using a zero-mean Gaussian random variable with standard deviation proportional to the measurement, with proportionality constant $\alpha$. The inversion is performed for different values of $\alpha$ to investigate the algorithm's robustness against the noise level.
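This noise model is straightforward to reproduce; the sketch below perturbs a vector of noise-free observations with zero-mean Gaussian noise whose standard deviation is $\alpha$ times each measurement magnitude (the random seed is an arbitrary choice).

import numpy as np

def add_proportional_noise(d_clean, alpha, seed=0):
    # Zero-mean Gaussian noise with standard deviation alpha * |d| per entry.
    rng = np.random.default_rng(seed)
    return d_clean + alpha * np.abs(d_clean) * rng.standard_normal(d_clean.shape)

# e.g., d_noisy = add_proportional_noise(d_clean, alpha=0.05)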
The reconstructed transmissivity and the resulting steady-state pressure-head distributions over the entire field, for different values of $\alpha$, are summarized in Figure 2.11. The results are shown for three methods: group-sparsity, regular $\ell_1$-norm sparsity, and weighted $\ell_1$-norm sparsity regularization. Table 2.2 lists the reconstruction errors for transmissivity and pressure head using two measures, $e_1=\frac{1}{n}\sum_{i=1}^{n}(u_i-\hat{u}_i)^2$ and $e_2=\frac{1}{n}\sum_{i=1}^{n}|u_i-\hat{u}_i|$ (where $u_i$ and $\hat{u}_i$ are, respectively, the reference and estimated parameters at cell $i$). It is clear from Table 2.2 and Figure 2.11 that the group-sparsity regularization (Figure 2.11(a)) outperforms both the regular (Figure 2.11(b)) and the weighted (Figure 2.11(c)) $\ell_1$-norm regularization methods. In fact, the regular $\ell_1$ penalty cannot capture the geologic connectivity of the transmissivity field for the number of data used in this experiment; with more data, the channel features begin to appear in the solution (results not shown). While incorporation of prior training data through the weighted $\ell_1$-norm regularization significantly improves the results relative to the regular $\ell_1$-norm reconstruction, it is outperformed by group-sparsity regularization.

Table 2.2. Average errors between the estimated and true transmissivity and pressure heads for different levels of noise, with $e_1=\frac{1}{n}\sum_{i=1}^{n}(u_i-\hat{u}_i)^2$ and $e_2=\frac{1}{n}\sum_{i=1}^{n}|u_i-\hat{u}_i|$, in Section 2.4 (pumping test), Case 2 ("GS" = group-sparsity; "l1-W" = weighted $\ell_1$).

α                            0        0.02     0.035    0.05     0.10     0.18
Transmissivity (e1)   GS     0.2315   0.2260   0.2463   0.3947   0.3860   0.3566
                      l1-W   0.3230   0.2945   0.5100   0.4041   0.5002   1.1472
Transmissivity (e2)   GS     0.3815   0.3833   0.3950   0.4793   0.4786   0.4780
                      l1-W   0.4574   0.4198   0.5661   0.5104   0.5537   0.8341
Pressure head (e1)    GS     0.0401   0.0504   0.0854   0.2569   0.6192   0.6644
                      l1-W   0.0470   0.0901   0.2124   0.3664   2.0715   1.5912
Pressure head (e2)    GS     0.1500   0.1844   0.2209   0.3562   0.6436   0.6213
                      l1-W   0.1678   0.2252   0.3461   0.4433   1.0916   0.9440

Subsurface Flow Model Calibration, Nonlinear Inversion: We now apply the group-sparsity regularization to a nonlinear inverse problem involving a two-phase (oil/water) flow system. For this experiment, 256 low-frequency DCT basis elements are selected, using a square truncation of size $16\times 16$, to represent the $64\times 64$ permeability field ($\boldsymbol{\Phi}_{4096\times 256}$). A water injection well is placed in the middle of the field, and eight production wells are located symmetrically along the edges of the domain. The simulation time is 8 years. For the first four years, pressure and water-cut (the ratio of water to total fluid produced) data, collected every 72 days, are used for inversion. Data from the remaining 4 years are used to evaluate the forecast performance of the calibrated models. Snapshots of the saturation and pressure profiles for 5 time steps are depicted in Figure 2.8(b2). The parameters of the two-phase flow simulation are provided in Table 2.3.

Table 2.3. Parameters of the flow equations in Section 2.4 (waterflooding example).

Flooding type             9-spot            Flow system               2-phase (oil/water)
Simulation time           8 years           Observation interval      72 days
Field grid                64×64×1           Grid-cell dimensions      10×10×10 m³
Field dimensions          640×640×10 m³     Porosity                  0.2
Initial oil saturation    1                 History-match period      4 years
Inj. volume (4 years)     0.8 PV            Prediction period         4 years
Inj. volume (8 years)     1.6 PV

A total of 1000 prior model realizations are generated using the SNESIM algorithm.
Figure 2.12(a) shows sample prior realizations for this example. The low-rank DCT coefficients of these prior realizations are used as training data for grouping with SPCA and for finding the weights in the weighted $\ell_1$-norm method. Using the SPCA-rSVD algorithm, 256 groups ($p=256$ in Equation (2.13)) are generated, resulting in an expanded dictionary with 5442 elements (roughly 21 elements per group). Figure 2.8(b1) shows the experimental setup, the reference permeability map, and its best rank-256 approximation with the truncated DCT basis ($\boldsymbol{\Phi}_{4096\times 256}$). This truncation level sufficiently preserves the channel features in the reference model (in practice, a reasonable truncation level can be determined by testing the approximation quality on the prior models).

Figure 2.12. (a) Prior permeability realizations in Section 2.4 (waterflooding); (b) the recovered permeability maps with, from top to bottom rows, group-sparsity (b1), regular $\ell_1$ (b2), weighted $\ell_1$ (b3), first-order Tikhonov (b4) and total variation (b5) regularization, respectively (the columns show the solution at different iterations, as labelled).

In this example, group-sparsity, weighted and regular $\ell_1$-norm, first-order Tikhonov, and total variation regularizations are compared (Figure 2.12(b)). The Tikhonov regularization extracts a smooth map that represents the general shape and connectivity of the reference model without recovering the exact location and geometry (thickness) of the channels. The total variation regularization tends to provide a sharper solution than the Tikhonov method; however, it is not able to capture the exact geometric attributes and thickness of the channels. The solution from $\ell_1$-norm regularization in this case can also capture the general connectivity trend (with less confidence at the channel bifurcation location). The weighted $\ell_1$-norm regularization leads to a better reconstruction than the other three methods and is second in performance only to the group-sparsity regularization. We note that the weighted $\ell_1$-norm and group-sparsity regularization methods are expected to outperform the regular $\ell_1$-norm, Tikhonov, and total variation regularizations, simply because they use consistent prior model realizations to train the reconstruction coefficients.

Figure 2.13. Water-cut data match and predictions for Producer 3 (a) and Producer 8 (b), based on the solutions obtained with group-sparsity, weighted $\ell_1$, regular $\ell_1$, first-order Tikhonov, and total variation penalties.

Figure 2.13 presents the water-cut data match and prediction results for Producers #3 and #8 for the different methods. From the connectivity features in the field, it is clear that Producers #1, #4, #5 and #6 experience early water breakthrough (during the calibration period), while Producers #3 and #8 have later water breakthrough times (during the forecast period). Consistent with the permeability estimates, the results for Producers #3 and #8 reveal that the group-sparsity regularization (compared with the weighted $\ell_1$ case) is better able to predict the water breakthrough time and the shape of the water-cut curve.

2.5. Scalability, Computational Costs and Uncertainty Quantification

An important aspect of any inverse modeling approach is its scalability to large-scale problems (e.g., with $n>1{,}000{,}000$).
Without a reduced-order parameterization, the calibration can require the inversion of $n\times n$ matrices ($O(n^l)$, $2<l\le 3$). Even matrix multiplication for $n\times n$ systems is computationally quite demanding ($O(n^l)$, $2<l\le 3$). It is, therefore, important to note that the effective dimension of such models is far less than the number of cells. Hence, a first-order approximation is to describe such models in a less redundant basis than the typical Cartesian system. Parameterization with $k$ ($k\sim 10^{3}$-$10^{4}$) basis functions ($\boldsymbol{\Phi}_{n\times k}$) reduces the complexity of the matrix inversion and multiplication to $O(k^l)$, $2<l\le 3$, and approximately $O(nkm)$ (where $m$ is the number of observations), respectively. For $\tilde{\boldsymbol{\Phi}}_{n\times K}$ ($K\sim 10^{4}$), the complexity of these operations increases to $O(K^l)$, $2<l\le 3$, and $O(nKm)$, respectively. In addition to the computational complexity, memory usage may impose another constraint on the choice of $k$ and $K$. The computation required in Equation (2.12) involves $\mathbf{V}$, $\mathbf{R}_j$, $\mathbf{z}_j$ and $\mathbf{w}_j$ and is, hence, related to $L$ and $k$ and not to the model size $n$ (see Appendix C). Therefore, for reasonable values of $k$ (parameterization dimension) and $L$ (number of prior realizations), the related computations are far more affordable. Based on a set of numerical examples, we conclude that the computation time is most strongly affected by the size of the expanded dictionary (i.e., $K$), which depends on the size of the initial dictionary $\boldsymbol{\Phi}_{n\times k}$. Hence, by using a low-rank representation for the prior models, the computation time is drastically reduced (see [Liu and Jafarpour, 2013] for details). In terms of computational cost, the only difference between the linear and nonlinear inverse problems is the additional average time of computing the parameter sensitivities (the adjoint-based gradient), $\mathbf{G}_{\mathbf{u}}$ in Equation (1.13), which is usually the most computationally demanding part of the process.

An important extension of the proposed deterministic group-sparsity regularization is a probabilistic formulation to enable uncertainty quantification. In some cases, a probabilistic formulation can be readily recognized from the form of the regularization, and a Bayesian formulation may be possible by assuming a Gaussian likelihood function and a prior probability density function (PDF) that promotes the same behavior as the regularization term in the deterministic case. In that case, the parameters estimated by minimizing the regularized objective function in the deterministic case represent the maximum a-posteriori (MAP) solution. For $\ell_1$-norm regularization, the prior model can be described using a Laplace PDF [Kotz et al., 2001; Li and Jafarpour, 2010; Khaninezhad and Jafarpour, 2013]. Approximate sampling techniques, such as randomized maximum likelihood (RML) [Kitanidis, 1995; Oliver et al., 1996; Lee and Kitanidis, 2013], may then be used to quantify (approximately) the uncertainty in the solution and in future predictions. This approach is taken in [Khaninezhad and Jafarpour, 2013] for standard $\ell_1$-norm regularization using a Laplace prior distribution. Finally, group-sparsity can be used for prior model identification, to select a consistent prior geologic scenario from a set of proposed scenarios (e.g., a collection of variogram models or training images), which we discuss in the next chapter.
2.6. Summary and Discussion

This chapter presented a group-sparsity regularization for constraining ill-posed linear and nonlinear inverse problems based on prior knowledge about the sparsity structure of the solution. While $\ell_1$-norm regularization offers an effective relaxation of the original $\ell_0$-norm sparsity regularization under the conditions outlined in compressed sensing, additional constraints may be used to further restrict the solution of ill-posed inverse problems. The $\ell_1$-norm regularization allows the components of the sparse solution to vary independently to satisfy the measurement constraint. This flexibility is helpful when the available data provide sufficient constraints to uniquely identify the solution. In many applications, such as subsurface flow model calibration, the inverse problem is known to be severely ill-posed, and the components of the sparse solution may exhibit certain structures and relations. In such cases, the existing sparse structures can be exploited to further constrain the solution of the inverse problem.

Group-sparsity regularization is a special case of structured sparse models where sparsity is present across groups or blocks of components (i.e., only a small number of groups is needed to represent the solution). This chapter introduced two grouping schemes for this purpose: one based on the Wavelet tree structure, and another using the recently introduced sparse PCA method. The tree structure of the multiresolution Wavelet decomposition establishes parent/children relations across the scales, which allows for various forms of group definitions. For example, groups can be defined based on the tree paths from each leaf to the root, combining variables that belong to the same family and are expected to be correlated. The sparse PCA algorithm was used as a second approach, to learn the grouping from a set of prior model realizations. Several examples were presented to show that the group-sparse methods outperform the standard $\ell_1$-norm-based sparsity regularization when the sparse structure is properly identified and enforced.

Group-sparsity is a general concept with applications beyond subsurface model calibration. While the grouping scheme is application dependent, the grouping methods proposed in this chapter demonstrated the importance of group-sparsity in incorporating additional prior information (constraints) into the solution of ill-posed subsurface flow inverse problems. In general, group-sparsity can enhance the inversion solution when the presumed grouping represents the solution structure consistently; however, imposing inconsistent group structures can bias the solution. Therefore, as with any prior information, the key to successful implementation of group-sparsity regularization is the correct identification of the expected group structures. In many cases, sparse structures can be identified or learned in several ways, including from the physics of the problem, the intrinsic properties of the coordinate systems (bases) used for representation, and available prior knowledge (training data).
CHAPTER 3

GROUP-SPARSE FEATURE LEARNING FOR GEOLOGIC SCENARIO SELECTION

Adopting representative geologic connectivity scenarios is critical for reliable modeling and prediction of flow and transport processes in subsurface environments. Geologic scenarios are often developed by integrating several sources of information, including knowledge of the depositional environment, qualitative and quantitative data such as outcrops and well logs, and process-based geologic modeling. In general, flow and transport response data are not included in constructing geologic scenarios for a basin. Instead, these data are typically matched using a given prior geologic scenario as a constraint. Since data limitations, modeling assumptions and subjective interpretations can lead to significant uncertainty in adopted geologic scenarios, flow and transport data may also be useful for constraining the uncertainty in proposed geologic scenarios. Constraining geologic scenarios with flow-related data opens an interesting and challenging research area that goes beyond the traditional model calibration formulations where the geologic scenario is assumed given. In this chapter, group-sparsity regularization is proposed as an effective formulation to constrain the uncertainty in the prior geologic scenario during subsurface flow model calibration. Given a collection of model realizations from several plausible geologic scenarios, the proposed method first applies the truncated singular value decomposition (TSVD) to compactly represent the models from each geologic scenario. The TSVD basis representing each scenario forms a distinct group. The proposed approach searches over these groups (i.e., geologic scenarios) to eliminate inconsistent groups that are not supported by the observed flow/pressure data.

In the remainder of this chapter, we take advantage of the group selection property of the $\ell_1/\ell_2$ regularization to formulate a simultaneous geologic scenario identification and subsurface flow model calibration approach. The primary objective is to simultaneously perform model calibration and identify consistent geologic scenario(s) from several proposed plausible scenarios. A natural way to proceed is to place each geologic scenario into a group, as discussed below.

3.1. Geologic Scenario Identification with Group-Sparsity

A significant source of uncertainty that can bias parameter estimation and future predictions originates from the adopted geologic scenario, which is often derived from available data with considerable interpretation and subjectivity. It is not uncommon for different geologists to provide alternative conceptual models of connectivity when the available data are very limited. However, despite its significant impact, the uncertainty in the geologic scenario, which is responsible for the global flow behavior, is usually neglected in the formulation of model calibration [Feyen and Caers, 2006; Suzuki and Caers, 2006; Jafarpour and Tarrahi, 2011; Riva et al., 2011; Khodabakhshi and Jafarpour, 2013; Khaninezhad and Jafarpour, 2014; Rousset and Durlofsky, 2014; Golmohammadi and Jafarpour, 2016]. In this section, a group-sparsity formulation is developed to constrain the geologic scenario with flow data.
We use several realizations from a distinct geologic scenario to form a group that represents the respective geologic scenario. While geologic scenarios are conceptual and qualitative representations of connectivity, process-based geologic modeling can be applied to transform such conceptual models into quantitative representations. Furthermore, geostatistical simulation techniques are available to integrate the resulting geologic scenarios (e.g., training images) with available data to derive many conditional realizations that represent the likely distribution of rock formations. Given $p$ plausible prior geologic scenarios (denoted by $\{\mathbf{\Omega}_i\}_{i=1,\dots,p}$) that are proposed for model calibration, each scenario is first used to generate an ensemble of model realizations (typically using geostatistical methods). Collecting the realizations for geologic scenario $i$ in the columns of a matrix $\mathbf{U}_i$ results in $p$ such matrices, i.e., $\mathbf{U}_1 = [\mathbf{u}_{11}\ \mathbf{u}_{12}\ \dots\ \mathbf{u}_{1L}]$, $\mathbf{U}_2 = [\mathbf{u}_{21}\ \mathbf{u}_{22}\ \dots\ \mathbf{u}_{2L}]$, ..., $\mathbf{U}_p = [\mathbf{u}_{p1}\ \mathbf{u}_{p2}\ \dots\ \mathbf{u}_{pL}]$. Prior to model calibration, a compact representation of the model realizations in each group $\mathbf{U}_i$ can be obtained by applying TSVD parameterization to each group and forming the basis $\mathbf{\Phi}_i$ (of dimension $n \times k_i$). Note that the truncation level $k_i$ can be different for different groups, to account for the variable complexity of the geologic scenarios. A typical approach is to choose the number of singular vectors to maintain a certain fraction of the total variance (energy) in the original model, which determines the desired level of approximation. In addition, the number of realizations in each prior set should be sufficient to capture the variability and complexity of the connectivity model that they represent. A simple procedure to ensure that the included realizations are sufficient is cross-validation, where the TSVD parameterization is used to approximate members from each group that are intentionally left out of the training set used to construct the singular vectors. Combining the TSVD bases for all groups leads to a hybrid basis $\mathbf{\Phi} = [\mathbf{\Phi}_1, \mathbf{\Phi}_2, \dots, \mathbf{\Phi}_p] = [\mathbf{\Phi}_{n \times k_1}, \mathbf{\Phi}_{n \times k_2}, \dots, \mathbf{\Phi}_{n \times k_p}]$. Hence, a typical model from these groups can be approximated as:

$$\mathbf{u} = \mathbf{\Phi}\mathbf{v} = [\mathbf{\Phi}_1\ \mathbf{\Phi}_2\ \cdots\ \mathbf{\Phi}_p]\,[\mathbf{v}_1^T\ \mathbf{v}_2^T\ \cdots\ \mathbf{v}_p^T]^T \qquad (3.1)$$

where $\mathbf{\Phi}$ denotes the combined dictionary and $[\mathbf{v}_1^T\ \mathbf{v}_2^T\ \cdots\ \mathbf{v}_p^T]^T$ is a $(\sum_{i=1}^{p} k_i) \times 1$ vector that contains the expansion coefficients representing $\mathbf{u}$. Hence, the vector $\mathbf{v}_i = [v_{i1}, v_{i2}, \dots, v_{ik_i}]^T$, $1 \le i \le p$, contains the representation coefficients for the basis elements within each group $\mathbf{\Phi}_i$. With this definition, if the groups are distinct and the solution belongs to one of the groups, the coefficients corresponding to the basis elements within that group will have a significantly higher contribution to the expansion. It is possible for more than a single group to have a significant contribution to the solution; however, if the groups are distinct, it is unlikely that many groups contribute to the solution, implying the desired group-sparsity property.
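To make this construction concrete, the following is a minimal sketch (not the implementation used in this work) of building the per-group TSVD bases and the hybrid dictionary; the realization matrices below are random placeholders for the geostatistical (e.g., SNESIM or sgsim) output, and the 80% energy threshold anticipates the truncation rule adopted later in this chapter.

```python
import numpy as np

def group_tsvd_basis(U_i, energy=0.80):
    """TSVD basis for one geologic scenario.
    U_i: (n, L) matrix whose columns are vectorized realizations.
    energy: fraction of the total variance (energy) to retain."""
    Phi_i, s, _ = np.linalg.svd(U_i, full_matrices=False)
    k_i = int(np.searchsorted(np.cumsum(s**2) / np.sum(s**2), energy)) + 1
    return Phi_i[:, :k_i], s[:k_i]

# Placeholder realization matrices, one per scenario (geostatistical output in practice).
n, L, p = 2500, 100, 3
groups = [np.random.rand(n, L) for _ in range(p)]

bases, singvals = zip(*(group_tsvd_basis(U_i) for U_i in groups))
sizes = [B.shape[1] for B in bases]      # truncation level k_i may differ per group
Phi = np.hstack(bases)                   # hybrid dictionary [Phi_1 Phi_2 ... Phi_p]
```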
In this case, using the mixed $\ell_1/\ell_2$-norm for group-sparsity, the regularized objective function of the inverse problem can be expressed as:

$$\min_{\mathbf{v}}\ J(\mathbf{v}) = \sum_{i=1}^{p} \|\mathbf{K}_i \mathbf{v}_i\|_2 \quad \text{s.t.}\quad \left\|\mathbf{C}_\epsilon^{-1/2}\left(\mathbf{d}_{obs} - \mathbf{g}(\mathbf{u})\right)\right\|_2^2 \le \sigma^2, \quad \mathbf{u} = \mathbf{\Phi}\mathbf{v} = [\mathbf{\Phi}_1\ \mathbf{\Phi}_2\ \cdots\ \mathbf{\Phi}_p]\,[\mathbf{v}_1; \mathbf{v}_2; \dots; \mathbf{v}_p] \qquad (3.2a)$$

or,

$$\min_{\mathbf{v}}\ \left\|\mathbf{C}_\epsilon^{-1/2}\left(\mathbf{d}_{obs} - \mathbf{g}(\mathbf{u})\right)\right\|_2^2 + \lambda^2 \sum_{i=1}^{p} \|\mathbf{K}_i \mathbf{v}_i\|_2 \quad \text{s.t.}\quad \mathbf{u} = \mathbf{\Phi}\mathbf{v} = [\mathbf{\Phi}_1\ \mathbf{\Phi}_2\ \cdots\ \mathbf{\Phi}_p]\,[\mathbf{v}_1; \mathbf{v}_2; \dots; \mathbf{v}_p]. \qquad (3.2b)$$

After solving this inverse problem, the solution $\mathbf{u}$ and the geologic scenario(s) that significantly contribute to it are identified simultaneously. Appendix B presents the details of solving the optimization problem in Equation (3.2).

In this chapter, we consider the $\ell_2$-norm of the coefficients of each group, i.e., $\|\mathbf{v}_i\|_2$, as the measure of activeness of the corresponding geologic scenario provided prior to model calibration, i.e., $\mathbf{U}_i$. In fact, the groups with larger $\ell_2$-norm contribute more to the final solution and indicate higher relevance of the corresponding geologic scenario (if the basis functions are normalized). Alternative model selection criteria [Ye et al., 2008; Riva et al., 2011; Foglia et al., 2013], including information-based techniques such as AIC [Akaike, 1974] and AICc [Hurvich and Tsai, 1989], and Bayesian methods such as BIC [Schwarz, 1978] and KIC [Kashyap, 1982], can be used to assign a probability to each geologic scenario. In these cases, $\mathbf{u}_i = \mathbf{\Phi}_i \mathbf{v}_i$ is the contribution of the $i^{th}$ geologic scenario to the representation of the final solution, i.e., $\mathbf{u} = [\mathbf{\Phi}_1\ \mathbf{\Phi}_2\ \cdots\ \mathbf{\Phi}_p]\,[\mathbf{v}_1^T\ \mathbf{v}_2^T\ \cdots\ \mathbf{v}_p^T]^T$. Hence, comparing $\mathbf{u}_i = \mathbf{\Phi}_i \mathbf{v}_i$ with $\mathbf{u}$ using the criteria defined in these methods can provide a probability of contribution for each geologic scenario. The inactive (active) groups will result in a larger (smaller) error term in the comparison of $\mathbf{u}$ and $\mathbf{u}_i$; consequently, the criteria in these methods will assign a smaller (larger) probability to the corresponding geologic scenarios. With the above grouping of geologic scenarios, a group-sparse formulation, e.g., Equation (3.2), can be invoked to identify consistent geologic scenarios from a set of uncertain prior models. A pseudo-code of the overall inversion procedure can be found in Appendix D. In the TSVD representation, the singular vectors within each group are ordered based on the magnitude of the singular values, which provides a reasonable choice for the weight matrix: $\mathbf{K}_i^T \mathbf{K}_i$ is given by the diagonal matrix $(\mathbf{\Lambda}\mathbf{\Lambda}^T)^{-1}$, where $\mathbf{\Lambda}$ contains the singular values. With this choice, the leading elements of the TSVD basis (with larger singular values) are penalized less harshly during inversion.
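The activeness measure $\|\mathbf{v}_i\|_2$ and the weighted penalty $\sum_i \|\mathbf{K}_i \mathbf{v}_i\|_2$ can be sketched as follows, under the assumption that `sizes` and `singvals` come from a per-group TSVD construction such as the one sketched above; the function names are illustrative.

```python
import numpy as np

def group_norms(v, sizes):
    """l2-norm of the coefficients of each group: the 'activeness' measure
    used to rank geologic scenarios after inversion."""
    bounds = np.cumsum([0] + list(sizes))
    return np.array([np.linalg.norm(v[a:b]) for a, b in zip(bounds[:-1], bounds[1:])])

def group_sparse_penalty(v, sizes, singvals):
    """Mixed l1/l2 regularizer sum_i ||K_i v_i||_2 with K_i = diag(1/s_i),
    i.e., K_i^T K_i = (Lambda Lambda^T)^{-1}, so leading TSVD elements
    (larger singular values) are penalized less harshly."""
    bounds = np.cumsum([0] + list(sizes))
    return sum(np.linalg.norm(v[a:b] / s)
               for (a, b), s in zip(zip(bounds[:-1], bounds[1:]), singvals))

# Example use: rank scenarios by their contribution to a coefficient vector v,
# e.g., ranking = np.argsort(-group_norms(v, sizes)).
```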
3.2. Further Exploration of Group-Sparse Formulation

To illustrate the effectiveness of group-sparsity regularization in the presence of multiple distinct prior geologic scenarios, we consider the three training images (TIs) in Figure 3.1(a), which represent different types of connectivity structures. Assuming that the direction of channel continuity is unknown, Figure 3.1(b) shows a total of 10 different scenarios (groups).

Figure 3.1. (a) Three TIs with meandering (left), intersecting (middle), and straight (right) channel features; (b) four sample realizations simulated from each TI. The TI with the meandering channel is considered with two rotation angles ($\theta = 0^\circ$ and $\theta = 90^\circ$), while the other two TIs each lead to four alternative scenarios with rotation angles $\theta = 0^\circ, 45^\circ, 90^\circ, 135^\circ$; (c) sample TSVD basis elements corresponding to the different geologic scenarios, i.e., $\mathbf{\Phi} = [\mathbf{\Phi}_1\ \mathbf{\Phi}_2\ \cdots\ \mathbf{\Phi}_{10}]$, and the TSVD basis obtained after combining all prior realizations, i.e., $\mathbf{\Phi}_m$.

Note that for the meandering channels only two directions are selected, i.e., $\theta = 0^\circ$ and $\theta = 90^\circ$, while for the non-meandering channels four distinct channel directions are considered, i.e., $\theta = 0^\circ, 45^\circ, 90^\circ, 135^\circ$. Figure 3.1(b) depicts four sample realizations (out of $L = 500$) for each set. These realizations are generated from the corresponding TIs using the Single Normal Equation Simulation (SNESIM) algorithm [Strebelle, 2002], with the specified global rotations. Figure 3.1(c) shows the first 25 leading TSVD basis elements for each geologic scenario. The plot to the right in Figure 3.1(c) displays the leading TSVD basis elements, denoted as $\mathbf{\Phi}_m$, that would be obtained if all the prior model realizations from the different geologic scenarios were mixed before applying the SVD. While the individual bases $\mathbf{\Phi}_i$, $1 \le i \le 10$, preserve the main geologic features of each scenario, the mixed basis $\mathbf{\Phi}_m$ tends to aggregate geologic features from distinct scenarios, resulting in a loss of channel connectivity.

Figure 3.2. (a) Eigen-spectrum of the prior models for groups $\mathbf{U}_1$ (meandering channel) and $\mathbf{U}_4$ (intersecting channel); (b) corresponding TSVD approximation quality for different levels of energy.

An important aspect of the TSVD representation is the number of retained basis elements. This choice typically depends on the decay rate of the singular values, which carry the variance (energy) of the underlying spatial images. To retain the same level of variance in a truncated approximation, complex geologic features such as meandering channels tend to require more expansion terms than straight channels. Figure 3.2(a) plots, for samples from two geologic scenarios, the cumulative energy against the number of retained basis elements. Since the meandering channel feature is more complex than the intersecting channels, a larger number of basis elements is needed to represent the meandering channel at the same energy level. Figure 3.2(b) shows the approximation quality for the two cases at different energy levels. In this chapter, we chose to retain 80% of the energy in truncating the TSVD bases for each group.

Figure 3.3. Approximation of sample models with TSVD bases from different groups: (a) meandering channel example; (b) intersecting channel example; (c) bar plots of the normalized data misfit (RMSE) and the $\ell_2$-norm of the best approximation coefficients.

The central part of the group-sparsity formulation is the mixed $\ell_1/\ell_2$-norm minimization. The use of this mixed norm implies that, among all the proposed groups, the correct group provides the lowest $\ell_2$-norm for the same approximation quality. This is explored in Figure 3.3, where, for a sample model from the first (top) and fourth (bottom) groups, the best approximation with each geologic scenario is shown on the left; a minimal sketch of this comparison is given below.
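In outline, each sample model is projected onto each group's (orthonormal) TSVD basis, and the per-group RMSE and coefficient norm are recorded. This is a hedged sketch that assumes `bases` holds the per-group bases from the earlier construction; the function name is illustrative.

```python
import numpy as np

def best_group_fit(u, bases):
    """Least-squares approximation of a model u in each group's TSVD basis.
    Returns the per-group RMSE and the l2-norm of the coefficients,
    mirroring the bar plots of Figure 3.3(c)."""
    rmse, vnorm = [], []
    for Phi_i in bases:
        v_i = Phi_i.T @ u        # orthonormal columns -> projection coefficients
        u_hat = Phi_i @ v_i      # best approximation within the group subspace
        rmse.append(np.sqrt(np.mean((u - u_hat) ** 2)))
        vnorm.append(np.linalg.norm(v_i))
    return np.array(rmse), np.array(vnorm)

# The consistent group is expected to give the lowest coefficient norm
# among the groups that achieve a comparable RMSE.
```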
The corresponding root-mean-squared error (RMSE) and the $\ell_2$-norm of the expansion coefficients are also displayed as bar plots. In each case, the best data match is achieved by the correct group. To further explore the robustness of the group-sparsity algorithm, Figures 3.4(a) through 3.4(c) (first column) depict three reference permeability fields that are approximated (compressed) using the 10 prior geologic scenarios shown in Figure 3.1. Hence, in this example, the observed data are the parameter values at all grid blocks ($\mathbf{d}_{obs} = \mathbf{u}$) and $\mathbf{g}(\mathbf{\Phi}\mathbf{v}) = \mathbf{\Phi}\mathbf{v}$ in Equation (3.2). Also, in this case, no noise is added to the observed data.

Figure 3.4. Best approximate representations of three models using group-sparsity regularization: (a) reference model consistent with group $\mathbf{U}_4$; (b) reference model with features similar to those in groups $\mathbf{U}_3$ and $\mathbf{U}_6$; (c) reference model with completely different geologic features than those in the prior groups.

Figure 3.4(a) shows the reference and best achievable permeability map for the case where the connectivity structure is consistent with those in Group 4 (intersecting channels in the $\theta = 45^\circ$ direction). In this case, group-sparsity is able to pick the basis elements in the correct group. In Figure 3.4(b), the reference model has intersecting channel features in both the $\theta = 90^\circ$ and $\theta = 135^\circ$ directions. From the reconstruction results, the active bases are from Group 4 and Group 6, which is consistent with the features in the reference model. Finally, Figure 3.4(c) shows a multi-Gaussian reference permeability model, which clearly does not belong to any of the groups. The reconstructed coefficients show that several groups are selected to provide a rough approximation for this model, suggesting that a group-sparse formulation is not appropriate. This typically indicates that a similar solution cannot be obtained by using only a small subset of the prior models. One approach to resolve this problem is to revise the initial geologic scenarios, based on the obtained results as well as additional geologic input, and repeat the solution.

Figure 3.5. The convergence behavior of group-sparsity during sample iterations, including the misfit term, the regularization term, the reconstruction coefficients and the spatial maps for three different initializations: Case 1, the initial coefficients are uniformly distributed across the groups and have relatively small values; Case 2, the initial coefficients are uniformly distributed and have relatively large values; Case 3, the initial coefficients are zero for all groups except for an inconsistent group, for which relatively large values are assigned.

To investigate the behavior of the misfit and regularization terms in the objective function, Figure 3.5 shows, for different initializations, the evolution of these terms along with the expansion coefficients, the resulting reconstructions, and the contribution ($\ell_2$-norm) of each group to the reconstruction. Three different initializations are shown: very small coefficients (Case 1), very large coefficients (Case 2), and coefficients assigned to an inconsistent group (Case 3). While the final solution and the selected groups are the same, the behavior of the regularization and misfit terms depends on the initialization.
When the initial value of the regularization term is small (Case 1), this term first increases as a few potential groups are activated and later decreases as only the consistent group is retained. On the other hand, when the regularization term is initially large and has several active groups (Case 2), the algorithm tends to reduce the regularization term by eliminating groups with insignificant contributions. In Case 3, the solution is initialized with one inconsistent active group, resulting in a large regularization term. In this case, the regularization term also decreases as the contribution of the inconsistent group is reduced while other groups are slightly activated. The regularization term continues to decrease as irrelevant groups are eliminated and the consistent geologic scenario is identified. A similar behavior is observed in several other examples, which is also in agreement with the discussion in the previous section. Overall, regardless of the initialization, the solution algorithm first activates a relatively large number of basis elements (groups) and then selects the groups that provide a better match to the data and a minimum length. At later iterations, the group selection stabilizes and the coefficients are refined around the final selected group(s).

3.3. Numerical Results

Two sets of numerical examples are discussed in this section to examine the effectiveness of group-sparsity regularization for geologic scenario identification. The first experiment involves travel-time tomographic inversion as a linear inverse problem, while the second consists of a groundwater well pumping test in which the transmissivity of an aquifer is estimated from pressure and transmissivity observations. For the pumping test, three separate examples are discussed: a channel-type transmissivity field in a 2D aquifer, a 3D test case with a multi-Gaussian transmissivity field, and another 3D example with channel-type structures. The results are presented and compared for group-sparsity (mixed $\ell_1/\ell_2$-norm) and regular $\ell_1$-norm sparsity regularizations.

Figure 3.6. (a) Tomographic inversion setup; (b) three anisotropic variogram models with specified ranges; (c) four sample realizations (out of 500) from 12 different groups obtained by assigning four anisotropy directions $\theta = 0^\circ, 45^\circ, 90^\circ, 135^\circ$ to each variogram model. The realizations are generated using the sgsim algorithm.

Tomographic Inversion: In straight-ray tomography, the slowness (1/velocity) map of a subsurface environment is inferred from observations of acoustic wave travel times along straight paths (see Section 1.1 for the forward equations). Figure 3.6(a) shows the configuration of this example with a set of transmitter (left) and receiver (right) arrays. A $1000 \times 1000\ \mathrm{m}^2$ domain is discretized into $100 \times 100$ cells of dimensions $10 \times 10\ \mathrm{m}^2$. Two sets of experiments are presented, one with a multi-Gaussian reference model and prior geologic scenarios (variograms), and a second with complex channel connectivity patterns in the reference and prior geologic scenarios (i.e., TIs). In both cases, 8 transmitters and 8 receivers are used, resulting in 64 arrival-time observations. Figures 3.6(b) and 3.6(c) display the parameters of the prior variogram models and their corresponding model realizations.
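Since this problem is linear, the forward model reduces to a path-length matrix $\mathbf{G}$ with $\mathbf{d} = \mathbf{G}\mathbf{s}$ for a vectorized slowness map $\mathbf{s}$. The sketch below approximates $\mathbf{G}$ by sampling points along each straight ray; the geometry follows the setup above, while the sampling resolution and function name are illustrative, and a production code would use exact cell-ray intersection lengths.

```python
import numpy as np

def ray_matrix(sources, receivers, nx, ny, h, n_samp=400):
    """Approximate path-length matrix G for straight rays (one row per
    source-receiver pair): travel times are d = G @ s for slowness s."""
    rows = []
    for src in sources:
        for rec in receivers:
            src_a, rec_a = np.asarray(src, float), np.asarray(rec, float)
            seg = np.linalg.norm(rec_a - src_a) / n_samp   # length per sample
            g = np.zeros(nx * ny)
            for t in np.linspace(0.0, 1.0, n_samp, endpoint=False):
                x, y = (1.0 - t) * src_a + t * rec_a
                i = min(int(y // h), ny - 1)               # cell row
                j = min(int(x // h), nx - 1)               # cell column
                g[i * nx + j] += seg                       # accumulate ray length
            rows.append(g)
    return np.vstack(rows)

# 8 transmitters on the left edge and 8 receivers on the right edge of the
# 1000 m x 1000 m domain (100 x 100 cells of 10 m), giving 64 rays.
src = [(0.0, yy) for yy in np.linspace(50.0, 950.0, 8)]
rec = [(1000.0, yy) for yy in np.linspace(50.0, 950.0, 8)]
G = ray_matrix(src, rec, nx=100, ny=100, h=10.0)           # shape (64, 10000)
```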
Case 1: Multi-Gaussian Slowness Map: This example demonstrates the application of the proposed method to distinguish between variogram models that control the connectivity of flow properties. The reference slowness map for this case is shown in Figure 3.7(a) (top left). This map is generated using sequential Gaussian simulation with the variogram model $\gamma_2(h)$ in Figure 3.6(b). The anisotropy direction in the reference model is $\theta = 45^\circ$; however, the variogram model parameters and the direction of maximum continuity are assumed unknown. Hence, prior model realizations are generated using three variogram models $\gamma_1(h)$, $\gamma_2(h)$ and $\gamma_3(h)$ (Figure 3.6(b)), each with four anisotropy directions, $\theta = 0^\circ, 45^\circ, 90^\circ, 135^\circ$, resulting in a total of 12 variogram models (groups). Each variogram is used to generate 500 prior realizations, which are vectorized and used as the columns of matrices $\mathbf{U}_1, \dots, \mathbf{U}_{12}$ (see Figure 3.6(c) for sample realizations). These realizations are conditioned on hard data; however, other data types, such as tomographic ray arrival times, are not incorporated and will be used for variogram identification. The TSVD parameterization is then separately applied to each group to construct the 12 different bases, which are combined into $\mathbf{\Phi} \in \mathbf{R}^{10000 \times 1086}$. In each group, the number of retained basis elements is selected to preserve 80% of the variance within that group. Also, to construct $\mathbf{\Phi}_m$, the SVD is performed after mixing the realizations from all groups, and the leading 100 basis components are selected. The left column in Figure 3.7(a) displays the reference model (top) and the best achievable solutions with $\mathbf{\Phi}$ (middle) and $\mathbf{\Phi}_m$ (bottom).

Figure 3.7. Tomographic inversion results for: (a) the multi-Gaussian example; and (b) the channel example. In each case, the figures on the left show the reference models (first row) and the best approximation maps (assuming full knowledge of all grid cells) with $\mathbf{\Phi}$ (second row) and $\mathbf{\Phi}_m$ (third row), while the plots in the second column show the reconstructed maps with $\ell_1/\ell_2\,\text{-}\,\mathbf{\Phi}$ (top), $\ell_1\,\text{-}\,\mathbf{\Phi}$ (middle) and $\ell_1\,\text{-}\,\mathbf{\Phi}_m$ (bottom); the corresponding reconstructed coefficients are shown in the rightmost column.

The correct variogram model for this example belongs to group $\mathbf{U}_6$. The reconstruction with the $\ell_1/\ell_2$-norm minimization mainly picks $\mathbf{\Phi}_6$ as the relevant basis, with some active elements from the $\mathbf{\Phi}_8$ basis. This suggests that some of the elements in $\mathbf{\Phi}_8$ have an important contribution to reproducing the data. Note that the anisotropy ratio for $\mathbf{\Phi}_6$ and $\mathbf{\Phi}_8$ is the same; however, these groups have different directions of anisotropy. Also, the top left corner of the reference map contains features similar to those in $\mathbf{\Phi}_8$. When $\ell_1$-norm regularization is used instead of the $\ell_1/\ell_2$-norm (middle row), the reconstruction algorithm selects several basis elements from different (and inconsistent) variograms, lacking a major contribution from the consistent basis $\mathbf{\Phi}_6$. Although the estimated parameter field in this case is similar to that obtained with the $\ell_1/\ell_2$-norm regularization, the model selection property is not observed. The reconstruction with $\mathbf{\Phi}_m$ (obtained by aggregating all prior models) does not reveal the main continuity feature in the reference model, demonstrating the sensitivity of the TSVD parameterization to prior model uncertainty.
Case 2: Channel-Type Slowness Map: The reference slowness map for this case is shown in Figure 3.7(b) (top left). The low (blue regions) and high (red regions) slowness values are 0.5 and 1 ms/m, respectively. The prior TIs and their corresponding realizations are those depicted in Figure 3.1. The TSVD parameterization is designed to preserve 80% of the energy within each prior group. The group-sparsity basis consists of 466 elements, i.e., $\mathbf{\Phi} = [\mathbf{\Phi}_1\ \mathbf{\Phi}_2\ \cdots\ \mathbf{\Phi}_{10}] \in \mathbf{R}^{10000 \times 466}$. For the basis generated from the mixed realizations, $\mathbf{\Phi}_m$, the 76 leading elements are selected. The reference map and its best representations in these two bases are depicted in Figure 3.7(b) (left column) and show acceptable approximations.

The reference model in this experiment has connectivity patterns similar to those in the fifth group, i.e., $\mathbf{U}_5$. The slowness map reconstructed with the group-sparsity ($\ell_1/\ell_2$-norm) regularization clearly detects the straight left-to-right channels, with a less pronounced but clearly visible connectivity between them. The reconstruction coefficients are primarily selected from the consistent basis $\mathbf{\Phi}_5$. On the other hand, when $\ell_1$-norm regularization is used (Figure 3.7(b), second row and second column), the reconstruction quality is not as good. More importantly, the reconstructed coefficients are distributed across several groups and do not identify any group as particularly relevant. The inversion with the mixed basis $\mathbf{\Phi}_m$ fails to capture the correct connectivity pattern, which is consistent with the previous outcomes.

Figure 3.8. Effect of data content and model complexity in the tomographic inversion example; two different meandering channels are used as reference slowness maps to illustrate the effect of the data acquisition configuration (when two channels are intersected by the same ray path). The example in (a) shows the limitation of the TSVD basis for parameterization of complex meandering channels in ill-posed problems.

Complex geologic patterns, such as meandering channels, present a parameterization challenge. While SVD-based parameterization is effective for multi-Gaussian variables, and possibly straight channels, it is not suitable for preserving more complex meandering or curvilinear connectivity structures. This was demonstrated in Figure 3.2, where the TSVD basis elements for the meandering features do not contain these connectivity patterns. Figure 3.8(a) illustrates this issue with the tomographic reconstruction results, where a meandering channel is reconstructed using group-sparsity. Two meandering channels serve as the reference slowness models, which are consistent with groups $\mathbf{U}_1$ or $\mathbf{U}_2$. The reconstruction results are shown in Figures 3.8(a) and 3.8(b). A number of important observations from these two experiments are in order. First, the inadequacy of the TSVD parameterization in representing meandering features can be seen in Figure 3.8(a). Although mainly the second group, which consists of meandering features, is selected, the captured connectivity pattern in the center of the field is not crisp. This implies that the proposed group-sparsity formulation is affected by the limitations of the TSVD parameterization. While this limitation affects the performance of the group-sparsity formulation, it is caused by the parameterization and not by the regularization method.
Moreover, from the results in Figure 3.8, it can be seen that when the channels are parallel to the ray paths (Figure 3.8(a)), the solution can capture the existing channels. However, when two channels are perpendicular to the ray paths, e.g., in Figure 3.8(b), the arrival-time data alone may not be sufficient to reveal the presence of two distinct channels.

Aquifer Pumping Test in Single-Phase Flow: We now shift our discussion to nonlinear problems, specifically single-phase flow in groundwater aquifers. Three pumping-test experiments are considered, one using a 2D model (Case 3) and two with 3D models (Cases 4 and 5). In each case, a pumping well is placed at the center of the model domain and extracts water at a constant flow rate of $0.0578\ \mathrm{m}^3/\mathrm{s}$ under steady-state aquifer conditions. The boundaries on the left and right of the domain are assigned constant pressure heads of $p_1 = 20$ m and $p_2 = 10$ m, respectively, resulting in a background flow from left to right. The unknown parameters of interest are the spatial distribution of log-transmissivity and the relevant prior geologic scenario(s), which are to be inferred from observations of transmissivity and pressure head at scattered monitoring well locations surrounding the pumping well. The governing equations of the forward model are derived from Darcy's law and the mass balance principle for single-phase flow in a heterogeneous and saturated porous environment (see Section 1.1).

Case 3: 2D Channel-Type Transmissivity: The experimental setup for this case is depicted in Figure 3.9(a). The aquifer has dimensions $1000\ \mathrm{m} \times 1000\ \mathrm{m} \times 10\ \mathrm{m}$ and is discretized into a domain with $100 \times 100 \times 1$ uniform grid cells. The reference transmissivity field is shown in Figure 3.9(b) (top left) and has low and high log-transmissivity values of -1.7047 (background) and -0.4037 (channel), respectively. The corresponding reference and initial pressure head fields are also shown in Figure 3.9(b). The prior realization sets are the same as those depicted in Figure 3.1, with a total of 10 geologic scenarios (groups). The TSVD parameterization is used to retain 80% of the energy for each group, resulting in $\mathbf{\Phi} \in \mathbf{R}^{10000 \times 466}$. The best transmissivity map that can be constructed in the subspace defined by these bases is displayed in Figure 3.9(b) (top right). Data from 30 monitoring wells, randomly located around the pumping well (see Figure 3.9(b), bottom left), are used for inversion. For this experiment, the reference model does not belong to any of the existing TIs; rather, it contains both intersecting and straight channel features, which can be reconstructed with a combination of geologic scenarios 7 and 9, that is, using the $\mathbf{\Phi}_7$ and $\mathbf{\Phi}_9$ bases.

Figure 3.9. Model calibration results for a pumping test experiment: (a) 2D configuration of the pumping test; (b) the reference transmissivity field and the best achievable approximation of it (top), and the corresponding reference and initial pressure head fields (bottom); (c) transmissivity and pressure head fields after calibration using group-sparsity ($\ell_1/\ell_2$-norm) and regular $\ell_1$-norm sparsity-inducing reconstruction.
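The forward model $\mathbf{g}(\cdot)$ in these pumping tests maps a transmissivity field to steady-state pressure heads. The following is a rough finite-difference sketch of such a solver under the boundary conditions described above; it is not the simulator used in this work, and the source-term scaling and units are illustrative only.

```python
import numpy as np
from scipy.sparse import lil_matrix
from scipy.sparse.linalg import spsolve

def pumping_test_heads(T, h_left=20.0, h_right=10.0, q_well=0.0578):
    """Steady single-phase flow (Darcy + mass balance) on an ny x nx grid:
    fixed heads on the left/right boundaries, no-flow top/bottom, and an
    extraction well at the domain center. T is the (ny, nx) transmissivity
    field; a 5-point scheme with harmonic interface averaging is used."""
    ny, nx = T.shape
    N = ny * nx
    A = lil_matrix((N, N))
    b = np.zeros(N)
    idx = lambda i, j: i * nx + j
    for i in range(ny):
        for j in range(nx):
            k = idx(i, j)
            if j == 0 or j == nx - 1:              # Dirichlet (fixed-head) columns
                A[k, k] = 1.0
                b[k] = h_left if j == 0 else h_right
                continue
            for di, dj in ((0, 1), (0, -1), (1, 0), (-1, 0)):
                ii, jj = i + di, j + dj
                if 0 <= ii < ny and 0 <= jj < nx:  # missing neighbors -> no flow
                    t = 2.0 * T[i, j] * T[ii, jj] / (T[i, j] + T[ii, jj])
                    A[k, k] += t
                    A[k, idx(ii, jj)] -= t
    b[idx(ny // 2, nx // 2)] -= q_well             # extraction sink (illustrative units)
    return spsolve(A.tocsr(), b).reshape(ny, nx)

# Example use: heads = pumping_test_heads(10.0 ** logT) for a log-transmissivity field.
```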
Figure 3.9(c) depicts the reconstruction results for the log-transmissivity and pressure head fields using $\mathbf{\Phi}$ with $\ell_1/\ell_2$-norm (top) and $\ell_1$-norm (bottom) regularization. The reconstruction with the group-sparsity regularization is clearly superior to that obtained with the $\ell_1$-norm regularization. The stronger constraining power of group-sparsity shows its advantage in this example, as the limited scattered measurements result in an underdetermined inverse problem. The group-sparse solution identifies significant coefficients from three groups, $\mathbf{\Phi}_7$, $\mathbf{\Phi}_6$ and $\mathbf{\Phi}_9$. While groups $\mathbf{\Phi}_7$ and $\mathbf{\Phi}_9$ are clearly relevant to the solution, group $\mathbf{\Phi}_6$ also has channel elements along the $\theta = 135^\circ$ direction, which is consistent with the channel directionality in parts of the reference model. On the other hand, the inversion with $\ell_1$-norm regularization selects several elements from various inconsistent groups and fails to reconstruct an acceptable connectivity pattern or identify a consistent prior model.

Figure 3.10 displays the behavior of the group-sparsity solution method as the gradient-based iterations proceed. The initial solution is picked from Group 1, which is incorrect. In early iterations, the group-sparsity algorithm uses the sensitivity information to activate (with very small coefficients) a number of potentially relevant groups, while reducing the contribution of Group 1. This leads to an overall decrease in the regularization term and a drop in the misfit term. As the iterations proceed, the data misfit term increases and the regularization term decreases in such a way that the overall objective function decreases with every iteration. The decrease in the objective function at later iterations is achieved by retaining and fine-tuning the final group(s) and removing the inconsistent ones.

An important issue that affects the final solution is the well configuration. For a given number of wells, the solution of the inverse problem depends on the distribution and relative positions of the wells. However, the pressure and flow data tend to provide more global information about the communication among the wells and the connectivity of the flow property fields, which may be used to discriminate between alternative scenarios.

Figure 3.10. The behavior of the group-sparsity reconstruction during model calibration in the 2D pumping test: the first eight rows show the reconstructed coefficients (left) with their corresponding spatial maps (middle) and group contributions (right) for selected iterations. The last row displays the evolution of the misfit term, the regularization term, and the overall objective function.

Case 4: 3D Multi-Gaussian Transmissivity: In this experiment, an aquifer with dimensions $1000\ \mathrm{m} \times 1000\ \mathrm{m} \times 10\ \mathrm{m}$ is discretized into a $100 \times 100 \times 10$ uniform grid system.

Figure 3.11. Reconstruction results for the 3D pumping test: (a) 3D aquifer configuration for the pumping test; (b) reference transmissivity field and the best achievable approximation (assuming perfect knowledge of the field), as well as the reference and initial pressure head fields; (c) reconstruction results for the transmissivity and pressure head fields with group-sparsity ($\ell_1/\ell_2$-norm) and regular sparsity ($\ell_1$-norm) regularization.

Figure 3.11(a) depicts the schematic of the domain and the pumping test configuration.
The reference log-transmissivity field has a multi-Gaussian structure and is shown in Figure 3.11(b) (top left), along with the best achievable reconstruction and the corresponding pressure heads. The reference log-transmissivity map is generated using Variogram 1 in Table 3.1 with a $\theta = 45^\circ$ direction of anisotropy. Three variogram models with the parameters defined in Table 3.1 are used to represent the anisotropy ratios. For each variogram, four anisotropy directions, i.e., $\theta = 0^\circ, 45^\circ, 90^\circ, 135^\circ$, are considered. As a result, 12 variogram models are used to represent the uncertainty in the prior geologic continuity scenario. Hence, 12 different prior distributions with various means and covariance matrices are suggested as alternative prior models. After stacking the bases generated by TSVD parameterization to retain 80% of the energy within each group, 1884 basis elements are included in $\mathbf{\Phi}$. In general, to preserve 80% of the energy in each group, the prior groups with large-scale continuity patterns (Variogram 1) contain fewer basis elements than those with small-scale features (Variogram 3); see Figure 3.11(c) (right column). The observed data are noise-corrupted (with 2.5% noise) log-transmissivity and pressure head values at the monitoring well locations.

Table 3.1. Variogram parameters for Case 4 in Section 3.3 (3D pumping test).

Property     $\gamma_1$   $\gamma_2$   $\gamma_3$
$a_{max}$    800          400          200
$a_{med}$    400          100          50
$a_{min}$    400          100          50

The reconstruction results with group-sparsity and regular $\ell_1$-norm regularization are shown in Figure 3.11(c). The solution is initialized by giving equal weights to each group. At convergence, the group-sparsity regularization is able to eliminate the inconsistent groups (Variograms 2 and 3) and retain mainly Group 2, corresponding to Variogram 1. On the other hand, the solution with $\ell_1$-norm regularization contains several groups. While the $\ell_1$-norm solution recovers some of the trends that are similar to the spatial continuity features in the reference model, the overall connectivity patterns are not captured as accurately as with group-sparsity. A comparison of the RMSE values from the two solutions also reveals the superiority of the group-sparsity results. Finally, as in the previous examples, no dominant prior model is identified with $\ell_1$-norm regularization, a behavior completely different from group-sparsity, in which distinct geologic scenarios with significant contributions to reproducing the data are selected.

Figure 3.12. Two sample realizations (top and bottom) from prior scenarios 1, 2, 16, and 27 for the channel-type 3D transmissivity field in Case 5. The training images $TI_1$, $TI_2$ and $TI_3$ represent meandering, intersecting and straight channels, respectively (as depicted in Figure 3.1(a)).

Case 5: 3D Channel-Type Transmissivity: In this example, an aquifer with dimensions $1000\ \mathrm{m} \times 1000\ \mathrm{m} \times 10\ \mathrm{m}$ is discretized into $100 \times 100 \times 9$ cells (see Figure 3.11(a) for a schematic of the experiment). Each set of three consecutive layers in the formation is assumed to belong to the same geologic unit with distinct connectivity patterns. Three distinct TIs (Figure 3.1(a)) are used to represent the connectivity in the geologic units; however, it is not known a priori which TI represents each of the three geologic units (the enumeration of the resulting scenarios is sketched below).
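Under an assumed lexicographic numbering of the scenarios (one TI per geologic unit), the $3^3 = 27$ combinations can be enumerated as in the short sketch below; this ordering is consistent with the prior-model numbers cited in the discussion that follows (e.g., scenario 16).

```python
from itertools import product

TIs = ["TI_1 (meandering)", "TI_2 (intersecting)", "TI_3 (straight)"]

# One TI per geologic unit (layers 1-3, 4-6, 7-9) -> 3^3 = 27 prior scenarios.
scenarios = list(product(TIs, repeat=3))
print(len(scenarios))    # 27

# Scenario 16 (1-based) assigns intersecting, straight, and meandering channels
# to the three units, matching the reference model described below.
print(scenarios[15])
```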
With this setup, a total of 27 possible prior models are proposed, and 500 realizations are generated to represent each prior model (see Figure 3.12 for two sample realizations). The TSVD parameterization basis for each prior is designed to preserve 80% of the energy, leading to a combined total of 2411 basis elements, that is, $\mathbf{\Phi} \in \mathbf{R}^{90000 \times 2411}$. The reference transmissivity field is depicted in Figure 3.13(a). The first (layers 1-3), second (layers 4-6) and third (layers 7-9) geologic units consist of intersecting, straight and meandering channels, respectively. Detailed plots of the reference and initial transmissivity models and their corresponding pressure head fields are displayed in Figure 3.13(b). The initial model is assumed to have a meandering channel structure in all layers. The reference model is similar in its connectivity structure to the models belonging to group number 16. The observations are collected from wells that fully penetrate the entire formation and are distributed in the field as shown in Figures 3.13(a)-(b). As a result, there are $30 \times 9 = 270$ measurements of pressure head that can be used for prior model identification and model calibration.

Figure 3.13. Reference transmissivity and pressure fields for the 3D pumping test with channel-type connectivity. The locations of the monitoring and pumping wells are shown with black dots and a red star, respectively.

Figure 3.14(a) depicts the reconstruction results for the transmissivity field at different iterations of the model calibration optimization. The connectivity types in layers 1-3 and 4-6 of the solution are consistent with those in the reference model. Layers 7-9 contain complex meandering channel features that are not amenable to PCA representation. Figure 3.14(b) illustrates the $\ell_2$-norm of the coefficients for the 27 prior models over the iterations. The final result (the last row of Figure 3.14(b)) indicates that the most dominant coefficients belong to group number 16; however, elements from groups 10, 11, 12 and 22 also contribute to the solution. The layers (1-3, 4-6, 7-9) of prior models 10, 11, 12 and 22 are generated from the training images $(TI_2, TI_1, TI_1)$, $(TI_2, TI_1, TI_2)$, $(TI_2, TI_1, TI_3)$ and $(TI_3, TI_2, TI_1)$, respectively. The basis functions corresponding to layers with intersecting and straight channels are better able to represent the common patterns in their group. On the other hand, the basis functions for the meandering channels do not capture the meandering features properly and tend to contribute to the representation of non-meandering channels. The pressure field corresponding to the solution is depicted in Figure 3.14(c) and shows a significant improvement over the initial pressure field in Figure 3.13(b).

Figure 3.14. Reconstruction results: (a) evolution of the transmissivity field with iterations (each row shows the results for layers 1-3, 4-6 and 7-9); (b) evolution of the $\ell_2$-norm of the coefficients corresponding to the 27 prior groups (geologic scenarios) with iterations; (c) the pressure head field predicted from the final transmissivity field; (d) the initial and final active coefficients.
3.4. Summary and Discussion

Using flow-related data to constrain prior geologic scenarios is not typically done in conventional subsurface flow model calibration. By exploiting the concept of group-sparsity from the sparse reconstruction literature, we developed a systematic framework for simultaneously identifying geologic scenarios and estimating a calibrated model. The method generates several hundred realizations from each geologic scenario and uses them to construct a TSVD basis that separately parameterizes the models within each group. By combining the TSVD bases for all groups, a large hybrid basis is generated that is capable of compactly representing models from any single group, or a model with features that cannot be found in a single group but may be represented using multiple groups. However, since the groups are vastly different from each other, only a small number of groups is needed to accurately represent a given model (or the inversion solution). This property translates into a group-sparse behavior that can be effectively induced using a mixed $\ell_1/\ell_2$-norm regularization term, which is minimized concurrently with a data mismatch objective function in an inverse problem.

The results in this chapter clearly illustrate the properties of group-sparse regularization and its advantages over regular $\ell_1$-norm-based sparsity regularization. It is shown that promoting group-sparsity can be more powerful in constraining the solution of ill-posed model calibration problems. A particularly useful property of group-sparsity is its ability to discriminate among multiple prior geologic scenarios that account for the uncertainty in the knowledge of geologic continuity. Uncertainty in the prior geologic scenario is an important topic that has not received its well-deserved attention in the literature. Traditionally, a given geologic scenario, which is derived from limited available data, subjective interpretation, and imperfect modeling assumptions, is used to constrain the solution of model calibration problems. As such, the uncertainty in the geologic scenario is rarely studied in the context of model calibration. Interestingly, this source of uncertainty can have a detrimental impact on the quality of model calibration solutions and the related future predictions. Furthermore, the constraining power of flow data can be very informative in selecting plausible geologic scenarios and developing sound predictive models.

While in this study geologic continuity was represented with either TIs or variogram models, in general, uncertainty in the geologic scenario can arise from various sources in subsurface flow model development workflows. Prior geologic scenarios can be derived from stochastic representations of process-based geologic models, from alternative interpretations of the global connectivity structure by different geologists, or from a combination of the two. The main message of this chapter is that, regardless of how the uncertainty in the geologic scenario is introduced and represented, the uncertainty in global geologic parameters can and should be incorporated in the model calibration formulation, as it can have significant consequences for predicting the flow and transport behavior of weakly-constrained and poorly-observed subsurface models. The group-sparsity formulation proposed in this chapter offers an effective workflow to account for uncertainty in prior geologic scenarios.
More importantly, it illustrates the feasibility of constraining uncertain geologic scenarios with flow-related data, a novel concept that goes beyond conventional subsurface model calibration formulations.

CHAPTER 4

PROMOTING DISCRETE FEATURES THROUGH POTENTIAL FUNCTIONS

Subsurface flow model calibration problems involve many more unknowns than measurements, leading to ill-posed problems with non-unique solutions. To alleviate the non-uniqueness issue, the problem is regularized by constraining the solution space based on prior knowledge. In certain sedimentary environments, such as fluvial systems, the contrast in the hydraulic properties of different facies types tends to dominate the flow and transport behavior, with within-facies heterogeneity playing a less significant role. Hence, flow model calibration in such formations reduces to delineating the spatial structure and connectivity of the different lithofacies types and their boundaries. A major difficulty in calibrating such models is honoring the discrete, or piecewise constant, nature of the facies distribution. The problem becomes more challenging when complex spatial connectivity patterns with higher-order statistics are involved. In this chapter, we propose a novel formulation for the calibration of complex geologic facies by imposing appropriate constraints to recover plausible solutions that honor the spatial connectivity and discreteness of facies models.

To incorporate prior connectivity patterns, we learn plausible geologic features from available training models. This is achieved by learning spatial patterns from training data, e.g., with $k$-SVD sparse learning or the traditional Principal Component Analysis (PCA). To impose solution discreteness, we introduce discrete regularization penalty functions that encourage piecewise constant behavior in the solution while minimizing the mismatch between observed and predicted data. We solve the resulting regularized least-squares minimization problem by invoking variable splitting to arrive at a flexible and efficient gradient-based alternating directions algorithm. Numerical results show that incorporating facies discreteness as regularization leads to geologically consistent solutions that improve facies calibration quality.

4.1. Connectivity-Preserving Constraint

Seeking high-resolution model parameters from limited observations leads to model calibration problems that are underdetermined. Several inversion techniques have been developed in the subsurface flow modeling literature to estimate unknown rock properties (model parameters), e.g., hydraulic conductivity, from observed response data. To tackle the resulting underdetermined inverse problem, various parameterization and regularization techniques have been introduced to constrain the solution. An important goal of parameterization and regularization methods is to impart expected structural attributes on the solution. In this chapter, we consider two such attributes in developing our method: (i) preserving pre-specified spatial connectivity patterns in the solution, and (ii) solution discreteness. Specifically, we demonstrate that discrete regularization terms can better enforce connectivity and reduce the ill-posedness of subsurface inverse problems.
When prior information on the expected connectivity patterns in the solution is available, it is important to constrain the solution to preserve such patterns. While simple mathematical expressions, such as Tikhonov regularization forms, exist to promote generic attributes, e.g., smoothness, of the solution, specialized constraints that can be learned from reliable prior information are more effective. A classical approach to learning compact representations of spatial connectivity patterns from prior models is low-rank Singular Value Decomposition (SVD). In [Khaninezhad et al., 2012], training data were used to learn sparse representations to incorporate and preserve connectivity patterns in solving inverse problems. Specifically, the $k$-SVD algorithm [Aharon et al., 2006] was used to learn sparse representations of complex spatial patterns in rock property distributions from prior training data. The advantages of using the $k$-SVD algorithm to learn sparse representations are reported in [Khaninezhad et al., 2012], and more details about the $k$-SVD algorithm are provided in Appendix E.

The SVD and $k$-SVD expansion functions are constructed from a set of $L$ training model realizations that provide spatial connectivity constraints for the parameters. These models can be constructed by combining static measurements of rock properties with a conceptual model of spatial continuity, using geostatistical simulation techniques. In the case of SVD, a reduced subspace is predefined by the $S$ leading left singular vectors of the training data matrix. The coordinates of this subspace compactly describe high-dimensional spatial patterns in the training data. In the $k$-SVD algorithm, on the other hand, the linear expansion functions are learned (i.e., derived) from the parameter training dataset to provide sparse expansion coefficients, without any particular order assigned to the expansion functions. Thus, the solution subspace is not predetermined and has to be found during model calibration. Sparse reconstruction algorithms have been developed to search a relatively large number of $k$-SVD expansion functions to identify a subspace (i.e., a small subset of expansion functions) that best represents the solution. Sparse reconstruction algorithms use sparsity-promoting regularization forms with a selection property to identify the best combination of expansion functions based on the observed response data [Aharon et al., 2006; Li and Jafarpour, 2010; Khaninezhad et al., 2012; Elsheikh et al., 2013; Elsheikh et al., 2014; Khaninezhad and Jafarpour, 2014; Golmohammadi et al., 2015; Golmohammadi and Jafarpour, 2016]. In our formulation, we use both SVD and $k$-SVD methods to preserve geologic connectivity (Appendix E provides a brief overview of the $k$-SVD algorithm). When the $k$-SVD representation is used in our formulation, a sparse reconstruction algorithm is applied to find the solution, whereas for the SVD parameterization no regularization term is needed, as the expansion functions are predetermined.
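As a hedged illustration of this dictionary-learning step, the sketch below uses scikit-learn's MiniBatchDictionaryLearning as a readily available stand-in for the $k$-SVD algorithm of Aharon et al. (2006); the training matrix is a random placeholder for the geostatistical realizations, and the values $k = 1000$ and $S = 30$ are borrowed from the tomography example later in this chapter.

```python
import numpy as np
from sklearn.decomposition import MiniBatchDictionaryLearning

# L vectorized training realizations in the rows (sklearn's sample convention).
L, n = 500, 64 * 64
X = np.random.rand(L, n)          # placeholder for geostatistical realizations

# MiniBatchDictionaryLearning learns an overcomplete dictionary with S-sparse
# codes, playing the role of k-SVD in this sketch.
dico = MiniBatchDictionaryLearning(
    n_components=1000,            # k dictionary atoms
    transform_algorithm="omp",    # orthogonal matching pursuit for the codes
    transform_n_nonzero_coefs=30, # sparsity level S
    random_state=0,
).fit(X)

Phi = dico.components_.T          # n x k dictionary, one atom per column
v = dico.transform(X[:1])         # S-sparse coefficients for one training model
```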
The least-squares objective function that combines the data mismatch and the parameterization is expressed as:

$$\hat{\mathbf{v}}, \hat{\mathbf{u}} = \arg\min_{\mathbf{v},\mathbf{u}}\ \left\|\mathbf{C}_\epsilon^{-1/2}\left(\mathbf{d}_{obs} - \mathbf{g}(\mathbf{\Phi}\mathbf{v})\right)\right\|_2^2 + \lambda_v^2 J(\mathbf{v}) + \lambda_{\mathbf{\Phi}}^2 \|\mathbf{\Phi}\mathbf{v} - \mathbf{u}\|_2^2 \qquad (4.1)$$

Here, $J(\mathbf{v})$ is a general regularization constraint on the transformed coefficients $\mathbf{v}$, and $\mathbf{\Phi}$ represents the matrix containing the expansion functions; $\mathbf{u}$ represents the discrete parameters, and $\mathbf{\Phi}\mathbf{v}$ denotes its continuous approximation. The regularization constraint $J(\mathbf{v})$ depends on the characteristics of the basis functions in $\mathbf{\Phi}$. For example, for $k$-SVD basis functions, the $\ell_1$-norm can be used to promote sparsity, i.e., $J(\mathbf{v}) = \|\mathbf{v}\|_1$. This regularization is used to retain only the significant $k$-SVD expansion functions during model calibration. When SVD parameterization is used, $J(\mathbf{v})$ is not needed, as the significant SVD basis functions are predetermined as the $S$ leading left singular vectors of the matrix containing the prior parameter realizations in its columns. It is important to note that, in each case, the expansion functions $\mathbf{\Phi}$ are derived (learned) from parameter training data and contain the expected spatial connectivity patterns, while the expansion coefficients $\mathbf{v}$ are the corresponding weights given to each element. The penalty term $\lambda_{\mathbf{\Phi}}^2 \|\mathbf{\Phi}\mathbf{v} - \mathbf{u}\|_2^2$ is introduced to ensure that the approximation with the linear expansion remains close to the discrete solution. The regularization parameters $\lambda_v^2$ and $\lambda_{\mathbf{\Phi}}^2$ are positive weights that control the importance of their corresponding regularization terms.

4.2. Discrete Regularization

The objective function in Equation (4.1) needs additional regularization terms to promote solution discreteness. When the discrete values that the solution can take (the number of facies and their corresponding property values) are known a priori, $\mathbf{C} = \{l_1, \dots, l_c\}$, we can introduce an integer constraint that requires each component of the solution to belong to the set $\mathbf{C}$, that is, $u_i \in \mathbf{C} = \{l_1, \dots, l_c\}$ [Hemmecke et al., 2010]. A hard integer constraint can be shown to lead to an NP-hard problem [Hemmecke et al., 2010]. A soft version of the constraint can be introduced through discrete regularization. A simple regularization function, denoted as $D(\mathbf{u})$, can be considered in which the function value is zero when the model parameters $\mathbf{u}$ assume discrete values (facies indicators) and positive otherwise. A typical discreteness regularization consists of the sum of non-negative penalties for each single grid block of the model, i.e., for each $u_i$ in $\mathbf{u}$, that is [Lukić, 2011]:

$$D(\mathbf{u}) = \sum_{i=1}^{n} D_p(u_i) \qquad (4.2)$$

where $n$ is the total number of grid blocks, and $D_p(u_i)$ is a single-variable, locally convex (well potential) function with its minima at the (discrete) facies values $l_1, \dots, l_c$. Such a function can be generally defined as (the subscript $p$ refers to the type of function, i.e., well potential in this case):

$$\begin{cases} D_p(u_i) = 0 & \text{if } u_i \in \mathbf{C} = \{l_1, \dots, l_c\} \\ D_p(u_i) > 0 & \text{if } u_i \notin \mathbf{C} = \{l_1, \dots, l_c\} \\ \exists\, 0 < \epsilon \ll 1 : \forall z \in [l_j - \epsilon, l_j + \epsilon],\ D_p(z) \text{ is convex} \end{cases} \qquad (4.3)$$

After adding the discrete regularization function as a constraint, the objective function in Equation (4.1) can be expressed as:

$$\hat{\mathbf{v}}, \hat{\mathbf{u}} = \arg\min_{\mathbf{v},\mathbf{u}}\ \left\|\mathbf{C}_\epsilon^{-1/2}\left(\mathbf{d}_{obs} - \mathbf{g}(\mathbf{\Phi}\mathbf{v})\right)\right\|_2^2 + \lambda_v^2 J(\mathbf{v}) + \lambda_D^2 D(\mathbf{u}) + \lambda_{\mathbf{\Phi}}^2 \|\mathbf{\Phi}\mathbf{v} - \mathbf{u}\|_2^2 \qquad (4.4)$$
Figure 4.1. (a) A discrete regularization function; (b) sample discrete regularization functions for three-valued and (c) four-valued facies; (d) the behavior of alternative discrete regularization functions in the neighborhood of the $i^{th}$ discrete level.

Figure 4.1(a) depicts an example of $D_p(u)$ for $u \in [l_j, l_{j+1}]$. In addition, Figures 4.1(b) and 4.1(c) demonstrate the general behavior of $D_p(u)$ for 3 and 4 discrete levels, respectively. In this chapter, we consider fourth-order well potential functions of the form:

$$D_p(u) = \begin{cases} h_j (u - l_{j-1})^2 (u - l_j)^2 & \text{for } u \in [l_{j-1}, l_j],\ j \in \{2, \dots, c\} \\ \text{undefined} & \text{otherwise} \end{cases} \qquad (4.5)$$

where $h_j$ denotes the weight given to each part of the function to control the penalty for deviating from the different discrete levels. For two-facies models with $l_1 = 0$ and $l_2 = 1$, the regularization function takes the form (assuming $h_j = 1$):

$$D_p(u) = \begin{cases} u^2 (u - 1)^2 & \text{for } u \in [0, 1] \\ \text{undefined} & \text{otherwise} \end{cases} \qquad (4.6a)$$

When three-facies models are considered, with $l_1 = 0$, $l_2 = 2$, and $l_3 = 3$, the function is expressed as:

$$D_p(u) = \begin{cases} h_1 u^2 (u - 2)^2 & \text{for } u \in [0, 2] \\ h_2 (u - 2)^2 (u - 3)^2 & \text{for } u \in [2, 3] \\ \text{undefined} & \text{otherwise} \end{cases} \qquad (4.6b)$$

We note that the regularization functions in Equations (4.6a) and (4.6b) are zero at the discrete values, and they are convex and positive in a closed neighborhood of the discrete values. Furthermore, the size of the penalty can be controlled by the parameters $h_1$ and $h_2$. Geologically, by adding the regularization term $D(\mathbf{u})$ to the objective function, the continuous values assigned to the distribution of rock permeability $\mathbf{u}$ in Equation (4.4) are forced to converge to one of the discrete facies levels. Figure 4.1(d) depicts a few alternative functions that have been introduced in the literature to promote solution discreteness in inverse problems.
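A minimal sketch of the well potential in Equation (4.5), vectorized over the grid, is given below; the function name and interface are illustrative.

```python
import numpy as np

def well_potential(u, levels, h=None):
    """Fourth-order well potential D_p(u) of Equation (4.5): zero at each facies
    value in `levels`, positive and locally convex in between; `h` holds the
    per-interval weights h_j (all 1 by default)."""
    levels = np.sort(np.asarray(levels, dtype=float))
    h = np.ones(len(levels) - 1) if h is None else np.asarray(h, dtype=float)
    u = np.asarray(u, dtype=float)
    out = np.full(u.shape, np.nan)           # undefined outside [l_1, l_c]
    for j in range(len(levels) - 1):
        lo, hi = levels[j], levels[j + 1]
        m = (u >= lo) & (u <= hi)
        out[m] = h[j] * (u[m] - lo) ** 2 * (u[m] - hi) ** 2
    return out

# Two-facies case of Equation (4.6a): D_p(0) = D_p(1) = 0, peak at u = 0.5.
print(well_potential([0.0, 0.25, 0.5, 1.0], levels=[0, 1]))
```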
4.3. Optimization Approach

To solve the optimization problem in Equation (4.4), we use a two-step alternating directions algorithm that iteratively solves the following optimization sub-problems to update $\mathbf{v}$ and $\mathbf{u}$:

$$\text{Step (1):}\quad \mathbf{v}^{k+1} = \arg\min_{\mathbf{v}} \left\| \mathbf{C}_{\boldsymbol{\epsilon}}^{-1/2}\left(\mathbf{d}_{\mathrm{obs}} - \mathbf{g}(\boldsymbol{\Phi}\mathbf{v})\right) \right\|_2^2 + \lambda_J^2\, J(\mathbf{v}) + \lambda_{\boldsymbol{\Phi}}^2 \left\| \boldsymbol{\Phi}\mathbf{v} - \mathbf{u}^{k} \right\|_2^2$$
$$\text{Step (2):}\quad \mathbf{u}^{k+1} = \arg\min_{\mathbf{u}} \lambda_D^2\, D(\mathbf{u}) + \lambda_{\boldsymbol{\Phi}}^2 \left\| \boldsymbol{\Phi}\mathbf{v}^{k+1} - \mathbf{u} \right\|_2^2 \qquad (4.7)$$

We note that, for compactness, the constant terms in each of the above objective functions that do not affect the optimization solution are eliminated. It is also important to note that the term $\lambda_{\boldsymbol{\Phi}}^2 \|\boldsymbol{\Phi}\mathbf{v} - \mathbf{u}\|_2^2$ couples the two optimization steps and ensures that the updates are applied gradually, to avoid drifting too far from the solution of the previous update. We also note that the solutions of the two optimization sub-problems depend on the regularization parameters $\lambda_J^2$, $\lambda_D^2$, and $\lambda_{\boldsymbol{\Phi}}^2$. As in other nonlinear regularized least-squares problems, reasonable choices for these regularization parameters can be identified through sensitivity analysis and cross-validation. Our results suggest that, within a reasonably wide range, the solution is not very sensitive to these parameters (see the next section for a discussion).

The main advantage of the alternating directions algorithm is that it divides a complex objective function into simpler functions that are easier to solve. For instance, the second step of the optimization in Equation (4.7) does not involve the data mismatch term, which is complex and computationally demanding to evaluate (due to the required forward simulation). In some cases, some of the optimization sub-problems may have closed-form solutions. Furthermore, since the optimization problems are solved multiple times (and sequentially), it is not necessary to find the exact solution at each iteration, and a few iterations of each optimization sub-problem may be sufficient to produce an acceptable starting point for the next iteration. However, it is recommended that this property be verified for any given problem before its adoption. It is also worthwhile to mention that the proposed algorithm has some similarity with the well-known Alternating Directions Method of Multipliers (ADMM) [Boyd et al., 2011].

For model calibration, Step (1) of the above algorithm can be solved using a standard gradient-based method, such as the Gauss-Newton, conjugate gradient, or BFGS algorithms. In our examples, we have solved Step (1) using the Gauss-Newton algorithm for only 2 iterations to obtain a "close-enough" approximate solution. Step (2) of the above algorithm can be separated into $n$ grid-cell-level optimization problems because the regularization term $D(\mathbf{u}) = \sum_{i=1}^{n} D_p(u_i)$ operates on individual cells of $\mathbf{u}$. Therefore, if we define $\boldsymbol{\theta}^{k+1} = [\theta_1^{k+1}\; \theta_2^{k+1} \cdots \theta_n^{k+1}]^{T} = \boldsymbol{\Phi}\mathbf{v}^{k+1}$, Step (2) of the optimization problem can be divided into $n$ sub-problems as:

$$\forall i \in \{1,2,\ldots,n\}: \quad u_i^{k+1} = \arg\min_{u_i} \lambda_D^2\, D_p(u_i) + \lambda_{\boldsymbol{\Phi}}^2 \left(\theta_i^{k+1} - u_i\right)^2 \qquad (4.8)$$

By defining $\beta = \lambda_D^2 / \lambda_{\boldsymbol{\Phi}}^2$, the optimization problems in Equation (4.8) are further simplified to:

$$\forall i \in \{1,2,\ldots,n\}: \quad u_i^{k+1} = \arg\min_{u_i} \left(\theta_i^{k+1} - u_i\right)^2 + \beta\, D_p(u_i) \qquad (4.9)$$

which can be easily solved by the following rules:

(i) $u_i^{k+1} = \min\{l_1,\ldots,l_c\}$ if $\theta_i^{k+1} \le \min\{l_1,\ldots,l_c\}$;
(ii) $u_i^{k+1} = \max\{l_1,\ldots,l_c\}$ if $\theta_i^{k+1} \ge \max\{l_1,\ldots,l_c\}$; and
(iii) $u_i^{k+1} = \arg\min_{u_i} \left(\theta_i^{k+1} - u_i\right)^2 + \beta\, D_p(u_i, l_j, l_{j+1})$ if $l_j \le \theta_i^{k+1} \le l_{j+1}$. $\qquad (4.10)$

Note that $D_p(u_i, l_j, l_{j+1})$ is defined as:

$$D_p(u_i, l_j, l_{j+1}) = \begin{cases} D_p(u_i) & \text{for } u_i \in [l_j, l_{j+1}] \\ \text{undefined} & \text{otherwise} \end{cases} \qquad (4.11)$$

Figure 4.2. The behaviour of the discrete penalty function $(\theta_i - u_i)^2 + \beta\, u_i^2 (u_i - 1)^2$, $u_i \in [0,1]$, for different values of $\beta$ and input argument $\theta_i$. In each plot, the x and y axes show the $u_i$ values and the value of the objective function, respectively. As the value of $\beta$ increases, the minimum of this function gets closer to the discrete values 0 or 1, depending on the value of the continuous variable $\theta_i$.

Figure 4.2 depicts the objective function $(\theta_i^{k+1} - u_i)^2 + \beta\, u_i^2 (u_i - 1)^2$ for different values of $\theta_i^{k+1} \in [0,1]$ and $\beta \in \{100, 0.1, 0.01, 0.001\}$, for discrete levels 0 and 1. As shown in Figure 4.2, the solution of the optimization problem approaches the discrete levels 0 and 1 when $\beta$ is large. For smaller values of $\beta$, the solution depends on the value of $\theta_i^{k+1}$ (it converges to a discrete level only when $\theta_i^{k+1}$ is close to a discrete value).
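As a concrete illustration of the rules in Equation (4.10), the sketch below solves the scalar problem of Equation (4.9) for each cell; a bounded numerical minimizer is used in place of an analytical root of the resulting polynomial gradient equation, and all names are illustrative.

```python
import numpy as np
from scipy.optimize import minimize_scalar

def update_cell(theta_i, levels, beta, h=None):
    """Solve Equation (4.9) for one cell:
    min over u of (theta_i - u)^2 + beta * D_p(u), via rules (i)-(iii) of
    Equation (4.10): clip at the extreme facies values, otherwise minimize
    over the bracketing interval [l_j, l_{j+1}].
    """
    levels = np.asarray(sorted(levels), dtype=float)
    if h is None:
        h = np.ones(len(levels) - 1)
    if theta_i <= levels[0]:                 # rule (i)
        return levels[0]
    if theta_i >= levels[-1]:                # rule (ii)
        return levels[-1]
    j = np.searchsorted(levels, theta_i) - 1          # l_j <= theta_i <= l_{j+1}
    lo, hi = levels[j], levels[j + 1]
    obj = lambda u: (theta_i - u)**2 + beta * h[j] * (u - lo)**2 * (u - hi)**2
    return minimize_scalar(obj, bounds=(lo, hi), method="bounded").x

# Step (2) applied cell by cell to a parameterized map theta = Phi @ v
theta = [-0.2, 0.35, 0.8, 1.4]
print([round(update_cell(t, levels=[0.0, 1.0], beta=10.0), 3) for t in theta])
```

Because the problems decouple across cells, this update is embarrassingly parallel and its cost is negligible compared with the forward simulations required in Step (1).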
4.4. Numerical Results

Three sets of numerical experiments are discussed to examine the effectiveness of the proposed discrete regularization for identification of lithofacies. The first experiment is a linear inverse problem involving straight-ray travel-time tomography. The remaining examples involve a groundwater pumping test with constant draw-down, in which the horizontal hydraulic conductivity of the aquifer is estimated from pressure and hydraulic conductivity observations. For the pumping tests, three separate examples with alluvial/fluvial channel connectivity patterns in a 2D layer of an aquifer are discussed. For these experiments, in addition to discrete regularization, we use either reduced-order parameterization in a truncated SVD basis or sparsity-promoting regularization with sparse $k$-SVD dictionaries, both serving to preserve the facies connectivity learned from prior models. We note that without discrete regularization, low-rank SVD and $k$-SVD approximations provide continuous solutions. In these experiments, zero-mean Gaussian observation noise with standard deviation equal to 5% of the observed data is added to the simulated data, which are generated using the reference cases.

Tomographic Inversion (Example 1): We first examine the performance of the proposed discrete regularization framework in a linear tomographic reconstruction problem.

Figure 4.3. Travel-time tomography example: (a) transmitter/receiver setup; (b) reference three-facies model; (c) initial model realizations generated using sequential indicator simulation; and (d) the corresponding $k$-SVD dictionary atoms learned from the realizations (with $k = 1000$, $S = 30$).

An array of sources/receivers for the tomographic problem is shown in Figure 4.3(a), in which straight ray paths from the source array (on the top) to the receiver array (on the bottom) are shown. Figure 4.3(b) shows the reference slowness map used to generate the synthetic tomographic observations. The reference map has dimensions of 6400×6400×10 ft³, discretized into a 64×64×1 grid. The objective of this example is to reconstruct the reference slowness map from the 6×6 = 36 noisy travel-time measurements at the receivers, using the proposed regularization functions. The realizations of the prior model are generated using the Sequential Indicator Simulation technique [Deutsch and Journel, 1992; Liu, 2006]. The initial model realizations (the first 20 out of 2000), as well as the corresponding $k$-SVD elements (for $k = 1000$, $S = 30$), are shown in Figures 4.3(c) and 4.3(d), respectively. In this chapter, we primarily focus on the discrete regularization technique. Additional details about the above parameterization methods can be found in [Khaninezhad et al., 2012].

Figure 4.4. Travel-time tomography reconstruction results with different discrete regularization function forms: (a)-(c) show examples in which more weight is given to identification of shale, sand, and both shale and sand facies types, respectively, as discrete values; the results in (d) are for the locally shifted Tukey's bi-weight function, which is quite similar to the results in (c), suggesting a similar effect for the two regularization forms.

The reconstructed models with the above tomography setup are shown in Figure 4.4 under three main weighting assumptions. In this example, facies types {0, blue}, {2, yellow} and {3, red} represent the low, medium and high fluid conductivity regions, respectively. The resulting regularization function from Equation (4.5) takes the form:

$$D_p(u) = \begin{cases} h_1\, u^2 (u - 2)^2 & \text{for } u \in [0, 2] \\ h_2\, (u - 2)^2 (u - 3)^2 & \text{for } u \in [2, 3] \\ \text{undefined} & \text{otherwise} \end{cases} \qquad (4.12)$$

For $(h_1 = 100,\, h_2 = 1)$, $(h_1 = 1,\, h_2 = 100)$, and $(h_1 = 1,\, h_2 = 1)$, the resulting regularization functions are shown in Figures 4.4(a)-4.4(c), respectively.
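A brief sketch of how the weights $h_1$ and $h_2$ reshape the penalty of Equation (4.12); the values and names are illustrative.

```python
import numpy as np

def d_p(u, h1, h2):
    """Equation (4.12): piecewise quartic well potential with levels {0, 2, 3}."""
    u = np.asarray(u, dtype=float)
    return np.where(u <= 2.0,
                    h1 * u**2 * (u - 2.0)**2,
                    h2 * (u - 2.0)**2 * (u - 3.0)**2)

u = np.linspace(0.0, 3.0, 7)
for h1, h2 in [(100, 1), (1, 100), (1, 1)]:
    print(f"h1={h1:3d}, h2={h2:3d}:", np.round(d_p(u, h1, h2), 2))
# A large h1 makes deviations between levels 0 and 2 (shale vs. channel)
# costly, sharpening the shale boundaries; a large h2 instead sharpens the
# separation between levels 2 and 3.
```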
In each case the corresponding solutions are shown next to the regularization function. In Figure 4.4(a), the regularization function is designed to give more weight to the correct identification of facies type {0} than to differentiating between facies {2} and {3}. This can be verified by noting that the boundaries of the yellow and red facies are smeared. Similarly, in Figure 4.4(b), where $h_1 = 1$ and $h_2 = 100$, the rock type with the highest property value (e.g., the sand facies) is reconstructed more sharply (the facies in red has sharp boundaries, while the boundary between the yellow and blue facies is blurred). Figure 4.4(c) shows the regularization function for the case with $h_1 = h_2 = 1$. To compare the performance of different discrete regularization functions, Figure 4.4(d) shows the reconstruction results for the locally shifted Tukey's function (see Figure 4.1(d)), which assigns the same penalty (the flat region of the function) to deviations beyond a certain distance from the discrete values. Our results show that the reconstruction outcomes with these regularization functions are quite similar. In fact, the use of the other discrete regularization functions shown in Figure 4.1(d) resulted in similar outcomes (not shown), suggesting that the results are not very sensitive to the exact shape of the discrete regularization function. However, we point out that this observation cannot be generalized and may depend on the weights given to the regularization parameters.

Integration of Dynamic and Static Data (Example 2): The experimental setup in this example is depicted in Figure 4.5(a). A no-flow boundary condition is applied to the top and bottom boundaries of the domain, while constant pressure heads are applied to the right and left boundaries to establish a pressure gradient (background velocity field) from left to right. A single well in the middle of the domain pumps water with a constant draw-down head of 14 m under steady-state flow conditions.

Figure 4.5. Well configuration (a), reference facies model (b), and its corresponding pressure head map (c) for Example 2; (d) and (e) show sample prior (training) models (left) used for constructing the SVD basis (middle) and $k$-SVD sparse dictionaries (right), without and with conditioning on hard data, respectively.

The spatial distribution of log-hydraulic conductivity (unit: log(m/s)) is assumed to be the only unknown parameter to estimate. The log-hydraulic conductivity values for the two facies types are -4 and -2.9, respectively. Pressure heads at monitoring wells surrounding the pumping well and the water extraction rate at the pumping well are measured. The reference hydraulic conductivity model and its corresponding pressure head distribution are depicted in Figures 4.5(b) and 4.5(c). In this example, the aquifer is of size 2107×2107×10 m³ and is uniformly discretized into 64×64×1 cells. The pumping well is located in cell (32, 32). A set of 2000 facies model realizations with representative discrete patterns is used as prior (learning) data. Figure 4.5(d1) shows twenty sample prior facies models generated using multiple-point geostatistics (SGeMS) without conditioning on hard data. Shown in Figure 4.5(e1) are twenty samples that are conditioned on hard data at the pumping and monitoring wells.
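For reference, a minimal sketch of how a truncated SVD basis $\boldsymbol{\Phi}$ can be constructed from an ensemble of prior realizations, as described for Equation (4.1); the random matrix stands in for actual facies realizations, and the sizes are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
n_cells, n_real, S = 64 * 64, 500, 30         # grid cells, realizations, rank
U_prior = rng.standard_normal((n_cells, n_real))   # stand-in for facies models

# Columns of Phi: the S leading left singular vectors of the prior matrix
Phi, _, _ = np.linalg.svd(U_prior, full_matrices=False)
Phi = Phi[:, :S]

# A model is represented by its expansion coefficients v, with u ~ Phi @ v
v = Phi.T @ U_prior[:, 0]          # project the first realization
u_approx = Phi @ v                 # its rank-S approximation
rel_err = np.linalg.norm(U_prior[:, 0] - u_approx) / np.linalg.norm(U_prior[:, 0])
print(Phi.shape, round(rel_err, 3))
```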
The corresponding SVD basis functions and $k$-SVD dictionary atoms for each case are shown in Figures 4.5(d2)-(d3) and 4.5(e2)-(e3), respectively. To conduct these experiments, extensive sensitivity analysis was performed to identify regularization parameters that provided stable solutions (i.e., several trials were used to bracket regularization parameters that provided similar solutions while minimizing the terms in the objective function).

Figure 4.6 summarizes the reconstruction results using only dynamic pressure head data at the monitoring wells and the water extraction rate at the pumping well. The calibrated models shown in the first row of Figure 4.6, from left to right, correspond to the standard SVD parameterization (Figure 4.6(a1)), the continuous solution from SVD parameterization with discrete regularization (Figure 4.6(b1)), $k$-SVD sparse reconstruction (Figure 4.6(c1)), and the continuous solution from $k$-SVD sparse reconstruction with discrete regularization (Figure 4.6(d1)). Note that these solutions are continuous and do not represent the final desired discrete maps. The results indicate that promoting discreteness in the solution enhances the structural connectivity in the continuous (non-discrete) estimated models. The final discrete solution for each case is shown under the corresponding continuous solution in Figures 4.6(a1)-(d1); these include simple post-processing of the continuous SVD (Figure 4.6(a1)) and $k$-SVD (Figure 4.6(c1)) solutions, and the results obtained from the proposed formulations (Figures 4.6(b1) and 4.6(d1), respectively).

Figure 4.6. Calibration results for Example 2 without (a1)-(d1) and with (a2)-(d2) hard data conditioning. Columns (1)-(4) contain the continuous and discrete solutions using SVD, SVD + discrete regularization, $k$-SVD, and $k$-SVD + discrete regularization, respectively.

Comparing the results from these four cases indicates that promoting discreteness improves the facies representation and enhances the structural connectivity in the estimated model. It is also evident that the results obtained using the $k$-SVD representation are superior to those generated by SVD parameterization, an outcome that is consistent with the results reported in [Khaninezhad et al., 2012].

Figure 4.7. Data match and forecasts for pressure head at two sample monitoring wells (left and middle), as well as the water extraction rate at the pumping well (right) in Example 2, without (a) and with (b) hard data conditioning.

The proposed discrete imaging framework simultaneously constrains the solution to preserve structural connectivity and discreteness. In contrast, the standard SVD and $k$-SVD methods produce continuous solutions that are converted to discrete maps through thresholding. This post-processing step, however, can perturb the data match obtained in the model calibration step and therefore degrade the predictive accuracy of the model. Unlike post-processing, the proposed approach simultaneously incorporates prior information about connectivity and discreteness in the reconstruction process without affecting the flow data match.
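The thresholding baseline referred to above simply snaps each cell of a continuous solution to the nearest facies value; a minimal sketch (the facies values are taken from this example, and the function name is illustrative):

```python
import numpy as np

def threshold_to_facies(u_cont, levels):
    """Snap each cell of a continuous solution to the nearest facies value.

    Unlike the iterative discrete regularization, this one-shot step ignores
    the flow data, which is why it can degrade the calibrated data match.
    """
    levels = np.asarray(levels, dtype=float)
    idx = np.argmin(np.abs(np.asarray(u_cont)[..., None] - levels), axis=-1)
    return levels[idx]

u_cont = np.array([-4.1, -3.6, -3.2, -2.95])        # continuous log-conductivity
print(threshold_to_facies(u_cont, levels=[-4.0, -2.9]))   # [-4.  -4.  -2.9 -2.9]
```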
Figure 4.7(a), from left to right, displays the estimated pressure heads and water rate forecasts for monitoring wells #1 (north-west corner) and #2 (south-west corner) and the pumping well for the different methods. For the SVD and $k$-SVD solutions, the post-processed discrete results are used to generate the future forecasts. Visual and quantitative (RMSE) comparison of the forecast plots indicates that data match violations due to post-processing can be significant. Since each iteration of the formulation involves a data match step followed by a discrete regularization step that keeps the solution close to the continuous solution, the discrete regularization constraint improves the results in two ways: first, by providing a discrete solution, and second, by consistently improving the structural connectivity of the SVD and $k$-SVD continuous parameterization outcomes (i.e., by enforcing discreteness to obtain binary connectivity features).

We also consider the same pumping test scenario in which noisy hydraulic conductivity measurements are included in the model calibration. Figures 4.6(a2)-(d2) show the final results for calibration with conditioning on hard data. As expected, these figures show improved results in all cases due to the incorporation of hard data. The solution with $k$-SVD and discrete regularization is noticeably superior to the other methods. These results also indicate that, for both SVD and $k$-SVD, promoting discreteness in identifying the facies distribution enhances the structural connectivity in the estimated models. Figure 4.7(b) shows the data match for the different methods in Example 2. For the traditional SVD and $k$-SVD solutions, simple thresholding results are used to generate the production forecasts. As can be seen from the solution with SVD parameterization, thresholding can impact the data match quality.

Large-Scale Pumping Test with Complex Geology (Example 3): As our last example in this chapter, we consider a larger and more complex facies model. The aquifer in this example is of size 6583×6583×10 m³ and is uniformly discretized into 200×200×1 cells. Figure 4.8(a) shows the well configuration, consisting of 5 pumping wells under constant draw-down with a pressure head of 42.2 m, and 20 monitoring wells distributed throughout the aquifer. A no-flow boundary condition is applied to the top and bottom boundaries of the domain, while constant pressure heads are specified at the right and left boundaries to establish a pressure gradient (background flow) from left to right. The hydraulic conductivity measurements from the reference model are shown in Figure 4.8(b). These hard data are used to generate 3000 prior model realizations with the SNESIM algorithm, using the same training image that was used to generate the reference map. The mean map, showing the average hydraulic conductivity of each cell across all 3000 realizations, is shown in Figure 4.8(c).

Figure 4.8. Well configuration for Example 3: (a) locations of the extraction (pumping) and monitoring wells; (b) reference hydraulic conductivity model; (c) mean map of the prior (training) hydraulic conductivity models conditioned on hard data at the well locations (both extraction and monitoring wells).

For large-scale problems, learning algorithms such as $k$-SVD can become computationally demanding. Moreover, in many cases the spatial representation of the training data does not provide a suitable domain for learning the spatial connectivity.
Liu and Jafarpour [2013] proposed dictionary learning in a low-dimensional subspace (such as DCT or SVD) that preserves the main connectivity in the spatial domain. The goal here is to eliminate the redundancy in the description of the training data, which is compactly represented prior to learning. For this example, we used a low-rank representation of the training data to speed up the learning computation. To this end, the training data were projected onto the subspace defined by the $T$ leading left singular vectors of the training dataset. To identify the dimension of the projection subspace, we followed the approach presented in [Liu and Jafarpour, 2013] by considering the trade-off between the subspace dimension and the projection (approximation) quality in terms of RMSE. Figure 4.9(a) shows the expected total RMSE between the low-rank approximated prior models and their full-rank representations.

Figure 4.9. Low-rank representation error measures: (a) RMSE; (b) normalized RMSE and computation time; (c) prior (training) hydraulic conductivity models used for constructing sparse dictionaries with hard-data conditioning for Example 3; (d) SVD elements with ranks 1-9, and (e) 492-500; (f)-(k) example $k$-SVD dictionary atoms using rank-$T$ prior model representations followed by $k$-SVD dictionary learning for different $T$, $k$, and $S$ values: (f) $T=500$, $k=500$, $S=40$; (g) $T=500$, $k=500$, $S=60$; (h) $T=500$, $k=500$, $S=100$; (i) $T=500$, $k=1000$, $S=40$; (j) $T=500$, $k=1000$, $S=60$; (k) $T=500$, $k=1000$, $S=100$.

Figure 4.9(b) displays the approximation performance by considering both the computational cost and the total RMSE associated with the low-rank representation. Based on this figure, we projected all the training realizations onto the subspace defined by the 500 leading singular vectors of the training data and applied the $k$-SVD learning to the resulting 500-dimensional dataset.
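A sketch of this pre-projection step is shown below; the stand-in random matrix and the scaled-down sizes are illustrative, and the $k$-SVD training call is left as a placeholder since its implementation is not reproduced here.

```python
import numpy as np

rng = np.random.default_rng(1)
# Scaled-down stand-in; the example itself uses 200*200 cells, 3000 models, T=500
n_cells, n_real, T = 4000, 300, 50
U_prior = rng.standard_normal((n_cells, n_real))

# Rank-T basis of the training data (T leading left singular vectors)
B, _, _ = np.linalg.svd(U_prior, full_matrices=False)
B = B[:, :T]

# Learn the dictionary in the T-dimensional coefficient space, not on the grid
coeffs = B.T @ U_prior                 # each column: a T-dimensional sample
# dictionary = ksvd(coeffs, k=1000, S=60)   # placeholder for the k-SVD step
print(coeffs.shape)                    # (50, 300)
```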
Figure 4.9(c) shows 9 sample prior model realizations for this experiment. Sample SVD basis functions #1-#9 and #31-#40 are shown in Figures 4.9(d) and 4.9(e), respectively. Figures 4.9(f)-4.9(k) display $k$-SVD dictionary elements for different values of $k$ (the total number of dictionary elements) and $S$ (the sparsity level). From analysis of the computation time and the estimation quality of the training models, we chose a rank-500 SVD representation for the inversion. The $k$-SVD sparse dictionary learning was performed with $k = 1000$ and $S = 60$.

Figure 4.10. Calibrated hydraulic conductivity models for Example 3 (with hard-data conditioning) using: (a) truncated SVD parameterization; (b) truncated SVD parameterization with discrete regularization; (c) $k$-SVD based sparse reconstruction followed by thresholding; (d) $k$-SVD based sparse reconstruction with discrete regularization. (a1)-(d1) show the continuous solutions; (a2)-(d2) display the corresponding discrete solutions (in (a2) and (c2), after thresholding); (a3)-(d3) depict the estimation error (the difference between the reference model and the solution obtained in each case).

The model calibration results with discrete regularization for SVD parameterization ($T = 60$) and $k$-SVD (with $T = 500$, $k = 1000$, $S = 60$) are shown in Figures 4.10(b2) and 4.10(d2), respectively. The corresponding continuous models for the SVD and $k$-SVD methods are shown in Figures 4.10(b1) and 4.10(d1), respectively. Comparing Figure 4.10(a1) with 4.10(c1), and the corresponding discrete images in Figure 4.10(b1) with 4.10(d1), indicates a superior performance for $k$-SVD in capturing the correct connectivity patterns. The same observation can be made by comparing the related cell-level mismatch rates for the two models (reported underneath each plot). Figures 4.10(a3)-4.10(d3) show the solution errors (the difference between the reference and calibrated models) for each case. The solutions from the discrete imaging formulations show superior performance to those from the continuous parameterization methods.

4.5. Summary and Discussion

In this chapter, we developed a new framework for reconstruction of geological facies distributions from limited linear/nonlinear measurements. The formulation uses parameterized prior models to retain the connectivity structures in the facies distribution and discrete regularization functions to honor the categorical nature of facies. A regularized least-squares formulation of the problem leads to multiple regularization terms. The resulting complex objective function is decomposed and solved using variable splitting and an alternating directions optimization method. The optimization algorithm involves sequential minimization of two simpler objective functions: one involving a data mismatch function with a constraint on parameter spatial connectivity, and the other consisting of a discrete regularization function with a constraint on the solution obtained from the first problem.

The proposed discrete regularization and alternating directions solution framework for reconstruction of geologic facies is a novel approach for dynamic facies characterization from flow data. While other methods for facies calibration, such as the level set method and pluri-Gaussian techniques, have been developed in the literature, these methods are not amenable to efficient gradient-based model calibration due to the complex relation between the facies models and the continuous parameters (e.g., sign functions) used to tune them. Furthermore, representing complex geological patterns, such as curvilinear fluvial shapes, is difficult to accomplish with simple sign functions or Gaussian distributions. Alternatively, the use of continuous methods combined with post-processing steps (such as thresholding) to convert continuous parameters to discrete facies can deteriorate the quality of the data match and model predictions, which is not desirable. The proposed method extends the classical regularized least-squares formulation to calibration of geologic facies by promoting solution discreteness.

Overall, the results indicate that the proposed approach can simultaneously preserve structural connectivity and promote facies discreteness while integrating nonlinear flow data. The proposed method is flexible and can be applied to reconstruct the distribution of multiple facies types from well head and flow-rate data. Several examples were used to show the superiority of the proposed approach to traditional post-processing methods. Our results in this chapter show that discrete regularization provides an intuitive method for promoting facies reconstruction from dynamic data.
In its simplest form, a discrete regularization function operates at the cell level and forces the solution to assume discrete values with minimal impact on the resulting data match. In addition to generating discrete solutions, including discrete regularization in the objective function can result in improved estimation of the connectivity patterns in the continuous solution; moreover, variable splitting with the alternating directions method of optimization divides a complex objective function into sub-problems with simpler objective functions, which are easier to solve. We also compared the performance of the proposed fourth-order polynomial discrete regularization function with locally shifted Tukey's functions and observed similar outcomes.

The method developed in this chapter is straightforward to apply to large-scale 3D models in practice, as it is based on the solution of existing model calibration methods with an additional discrete regularization function. In particular, the use of the alternating directions method can further simplify the solution by separating the discrete regularization formulation into an additional step that can be included in existing optimization methods. A key point to emphasize is that the discrete regularization method has to be implemented iteratively, and not as a post-processing step after finding a continuous solution. Our experience shows that iterative application of the discrete regularization tends to improve the connectivity patterns that are identified in the continuous solution.

As a final observation, we note that discrete regularization promotes solution discreteness at the cell level. An important improvement can be achieved by including additional information from neighboring cells in the discrete regularization. Combining the discreteness of each cell with information about the expected local patterns around the cell is expected to improve the performance of the proposed method. One way to incorporate such mechanisms is by introducing higher-order statistical information, e.g., in the form of conditional probabilities, in the regularization function. The conditional probabilities can be quantified using prior datasets or a training image. The next chapter is focused on moving from a cell-level discrete penalty to a spatially informed discrete penalty that incorporates, from available prior information, the expected statistical patterns for the neighboring cells.

CHAPTER 5

PATTERN-BASED FEATURE LEARNING FOR HONORING FEASIBILITY CONSTRAINTS

Calibration of heterogeneous subsurface flow models usually leads to ill-posed nonlinear inverse problems, where too many unknown parameters must be estimated from limited response measurements. When the underlying parameters form complex (non-Gaussian) structured spatial connectivity patterns, classical variogram-based geostatistical techniques cannot describe the underlying distributions. Modern pattern-based geostatistical methods that incorporate higher-order spatial statistics are more suitable for describing such complex spatial patterns.
Moreover, when the unknown parameters are discrete (e.g., geologic facies distributions), conventional model calibration techniques that are designed for continuous parameters cannot be applied directly. In this chapter, we introduce a novel pattern-based model calibration method to reconstruct spatially complex facies distributions from dynamic flow response data. To reproduce complex connectivity patterns during model calibration, we impose a geologic feasibility constraint that ensures the solution honors the expected higher-order spatial statistics. For model calibration, we adopt a regularized least-squares formulation involving (i) data mismatch, (ii) pattern connectivity, and (iii) feasibility constraint terms. Using an alternating directions optimization algorithm, the regularized objective function is divided into a parameterized model calibration sub-problem, which is solved via gradient-based optimization. The resulting parameterized solution is then mapped onto the feasible set, using the $k$ nearest neighbors ($k$-NN) method as a supervised machine learning approach, to honor the expected spatial statistics. The two steps of the model calibration formulation are repeated until the convergence criterion is met. Several numerical examples are used to evaluate the performance of the developed method.

In this chapter, we denote the set of feasible models (all valid samples corresponding to a prior geologic scenario, e.g., a given training image) by $\boldsymbol{\Omega}_{\mathbf{p}}$. When closed-form probabilistic models are available to describe the distribution of rock properties, they can be used to completely characterize the feasible set $\boldsymbol{\Omega}_{\mathbf{p}}$. However, except for very special cases, realistic scenarios are far more complex and do not lend themselves to descriptions with closed-form models. In that case, the feasible set can be defined either by a conceptual model that represents the expected spatial statistics (e.g., a training image) or by a very large set of model realizations that approximately represent the statistical patterns in such a model.

5.1. Problem Statement and Formulation

A simple model calibration problem conditioned on a feasible set $\boldsymbol{\Omega}_{\mathbf{p}}$ can be stated as a constrained weighted least-squares formulation:

$$\min_{\mathbf{u}} \left\| \mathbf{C}_{\boldsymbol{\epsilon}}^{-1/2}\left(\mathbf{d}_{\mathrm{obs}} - \mathbf{g}(\mathbf{u})\right) \right\|_2^2 = \left(\mathbf{d}_{\mathrm{obs}} - \mathbf{g}(\mathbf{u})\right)^{T} \mathbf{C}_{\boldsymbol{\epsilon}}^{-1} \left(\mathbf{d}_{\mathrm{obs}} - \mathbf{g}(\mathbf{u})\right) \quad \text{s.t.} \quad \mathbf{u} \in \boldsymbol{\Omega}_{\mathbf{p}} \qquad (5.1)$$

where $\mathbf{u} \in \boldsymbol{\Omega}_{\mathbf{p}}$ restricts the least-squares solution to the predefined feasible set. Here, $\mathbf{d}_{\mathrm{obs}}$ denotes the observed data, and $\mathbf{C}_{\boldsymbol{\epsilon}}$ is a weight matrix that accounts for data noise. In this chapter, we assume that $\mathbf{C}_{\boldsymbol{\epsilon}}$ is a diagonal matrix with diagonal elements that are proportional to the expected magnitudes of the entries of $\mathbf{d}_{\mathrm{obs}}$; therefore, we assign a larger noise variance to the data type(s) with larger expected magnitude. The feasibility constraint can also be expressed using an indicator function of the form:

$$I\left(\mathbf{u}, \boldsymbol{\Omega}_{\mathbf{p}}\right) = \begin{cases} 0 & \text{if } \mathbf{u} \in \boldsymbol{\Omega}_{\mathbf{p}} \\ \infty & \text{if } \mathbf{u} \notin \boldsymbol{\Omega}_{\mathbf{p}} \end{cases} \qquad (5.2)$$

which can be used to rewrite Equation (5.1) as:

$$\min_{\mathbf{u}} \left\| \mathbf{C}_{\boldsymbol{\epsilon}}^{-1/2}\left(\mathbf{d}_{\mathrm{obs}} - \mathbf{g}(\mathbf{u})\right) \right\|_2^2 \quad \text{s.t.} \quad I\left(\mathbf{u}, \boldsymbol{\Omega}_{\mathbf{p}}\right) = 0 \qquad (5.3)$$

For special cases, such as a multi-Gaussian distribution of parameters, it may be possible to express the constraint $I(\mathbf{u}, \boldsymbol{\Omega}_{\mathbf{p}}) = 0$ analytically (for example, to honor the variogram or covariance function of the underlying distribution).
However, in more complex problems where an analytical expression cannot be used to describe the constraint $I(\mathbf{u}, \boldsymbol{\Omega}_{\mathbf{p}}) = 0$, it is not trivial to express this constraint or enforce it during model calibration. Furthermore, exhaustively searching the feasible set for the minimizer of the objective function is computationally impractical. In this chapter, we develop an indirect method for solving the constrained minimization problem by defining a two-step alternating directions method. The first step uses a standard gradient-based method to find an approximate parameterized solution to the problem. The second step maps the parameterized solution onto the feasible set while ensuring that the updated solution remains close to the parameterized solution from the first step. These two steps are repeated until no further improvement in the objective function is obtained. We use a machine learning algorithm to implement the mapping in the second step (see Section 5.2).

To develop the two-step solution approach, we first expand the parameters into a set of feasible values $\mathbf{u}$ and the corresponding parameterized approximation using the linear expansion $\tilde{\mathbf{u}} \approx \boldsymbol{\Phi}\mathbf{v}$. Since the transformation matrix $\boldsymbol{\Phi}$ is known, we choose the expansion coefficients $\mathbf{v}$ to represent $\tilde{\mathbf{u}}$. Hence, we can write the new objective function for $\mathbf{v}$ and $\mathbf{u}$ as follows:

$$\min_{\mathbf{v},\mathbf{u}} \left\| \mathbf{C}_{\boldsymbol{\epsilon}}^{-1/2}\left(\mathbf{d}_{\mathrm{obs}} - \mathbf{g}(\boldsymbol{\Phi}\mathbf{v})\right) \right\|_2^2 \quad \text{s.t.} \quad \left\|\boldsymbol{\Phi}\mathbf{v} - \mathbf{u}\right\|_2^2 \le \epsilon^2 \;\;\text{and}\;\; I\left(\mathbf{u}, \boldsymbol{\Omega}_{\mathbf{p}}\right) = 0 \qquad (5.4)$$

Note that the constraint $\|\boldsymbol{\Phi}\mathbf{v} - \mathbf{u}\|_2^2 \le \epsilon^2$ is used to ensure that the solution in the linear expansion space does not deviate significantly from the sample space of the feasible set. In addition, $\mathbf{g}(\mathbf{u})$ is replaced by $\mathbf{g}(\boldsymbol{\Phi}\mathbf{v})$ in the data mismatch term to (i) reduce the problem dimensionality, and (ii) separate $\mathbf{g}(\mathbf{u})$ and $I(\mathbf{u}, \boldsymbol{\Omega}_{\mathbf{p}})$ in the alternating steps (Equations (5.6) and (5.7)). An alternative form of the objective function in Equation (5.4) can be written as:

$$\min_{\mathbf{v},\mathbf{u}} \left\| \mathbf{C}_{\boldsymbol{\epsilon}}^{-1/2}\left(\mathbf{d}_{\mathrm{obs}} - \mathbf{g}(\boldsymbol{\Phi}\mathbf{v})\right) \right\|_2^2 + I\left(\mathbf{u}, \boldsymbol{\Omega}_{\mathbf{p}}\right) + \lambda^2 \left\| \boldsymbol{\Phi}\mathbf{v} - \mathbf{u} \right\|_2^2 \qquad (5.5)$$

Here, $\lambda^2$ is a regularization parameter that controls the linear-expansion approximation error. For linear inverse problems, i.e., when $\mathbf{g}(\cdot)$ is a linear operator, cross-validation and L-curve methods have been developed to properly set the value of $\lambda$. However, these methods do not extend to nonlinear problems, for which practical approaches such as trial-and-error and sensitivity analysis are commonly used. In some cases, the solution may not show sensitivity to a wide range of $\lambda$, and a simple sensitivity analysis identifies this range. To minimize the objective function in Equation (5.5), we use the following two-step alternating directions algorithm (similar to Chapter 4), where the first step updates the parameterized solution through $\mathbf{v}$, while the second step maps this solution onto the closest feasible solution $\mathbf{u}$ in the feasible set:

$$\text{(i)}\quad \mathbf{v}^{k} = \arg\min_{\mathbf{v}} \left\| \mathbf{C}_{\boldsymbol{\epsilon}}^{-1/2}\left(\mathbf{d}_{\mathrm{obs}} - \mathbf{g}(\boldsymbol{\Phi}\mathbf{v})\right) \right\|_2^2 + I\left(\mathbf{u}^{k-1}, \boldsymbol{\Omega}_{\mathbf{p}}\right) + \lambda^2 \left\| \boldsymbol{\Phi}\mathbf{v} - \mathbf{u}^{k-1} \right\|_2^2$$
$$\text{(ii)}\quad \mathbf{u}^{k} = \arg\min_{\mathbf{u}} \left\| \mathbf{C}_{\boldsymbol{\epsilon}}^{-1/2}\left(\mathbf{d}_{\mathrm{obs}} - \mathbf{g}(\boldsymbol{\Phi}\mathbf{v}^{k})\right) \right\|_2^2 + I\left(\mathbf{u}, \boldsymbol{\Omega}_{\mathbf{p}}\right) + \lambda^2 \left\| \boldsymbol{\Phi}\mathbf{v}^{k} - \mathbf{u} \right\|_2^2 \qquad (5.6)$$

which can be simplified to:

$$\text{(i)}\quad \mathbf{v}^{k} = \arg\min_{\mathbf{v}} \left\| \mathbf{C}_{\boldsymbol{\epsilon}}^{-1/2}\left(\mathbf{d}_{\mathrm{obs}} - \mathbf{g}(\boldsymbol{\Phi}\mathbf{v})\right) \right\|_2^2 + \lambda^2 \left\| \boldsymbol{\Phi}\mathbf{v} - \mathbf{u}^{k-1} \right\|_2^2$$
$$\text{(ii)}\quad \mathbf{u}^{k} = \arg\min_{\mathbf{u}} I\left(\mathbf{u}, \boldsymbol{\Omega}_{\mathbf{p}}\right) + \lambda^2 \left\| \boldsymbol{\Phi}\mathbf{v}^{k} - \mathbf{u} \right\|_2^2 \qquad (5.7)$$
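In outline, the alternation in Equation (5.7) can be organized as follows; `solve_step_i` and `map_to_feasible_set` stand for the gradient-based update and the pattern-learning mapping developed in Section 5.2, and the tolerances and names are illustrative.

```python
import numpy as np

def alternating_directions(v0, u0, Phi, solve_step_i, map_to_feasible_set,
                           max_iter=20, tol=1e-3):
    """Two-step iteration of Equation (5.7).

    Step (i):  update v by minimizing data mismatch + lam^2 ||Phi v - u||^2
               (solve_step_i encapsulates the gradient-based solver)
    Step (ii): map the parameterized solution Phi v onto the feasible set
               (map_to_feasible_set encapsulates the k-NN pattern mapping)
    """
    v, u = v0, u0
    for _ in range(max_iter):
        v_new = solve_step_i(u)                      # e.g., Gauss-Newton update
        u_new = map_to_feasible_set(Phi @ v_new)     # Section 5.2
        if (np.linalg.norm(u_new - u) < tol and
                np.linalg.norm(v_new - v) < tol):    # simple convergence test
            return v_new, u_new
        v, u = v_new, u_new
    return v, u
```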
The minimization problem in Step (i) of Equation (5.7) can be carried out using standard gradient-based optimization techniques. For this purpose, $\mathbf{g}(\boldsymbol{\Phi}\mathbf{v})$ is usually approximated by a linear Taylor expansion. With this approximation, the nonlinear optimization in Step (i) reduces to a convex quadratic problem, which can be solved by setting its gradient to zero. The update in Step (ii) is implemented by mapping the parameterized solution $\tilde{\mathbf{u}}^{k} = \boldsymbol{\Phi}\mathbf{v}^{k}$ onto the closest feasible model, using a machine learning algorithm that is discussed in the next section. Simple convergence criteria, such as thresholds on $\|\mathbf{u}^{k} - \mathbf{u}^{k-1}\|_2^2$ and $\|\mathbf{v}^{k} - \mathbf{v}^{k-1}\|_2^2$ or on the objective function value, can be used to stop the iterations in Equation (5.7).

In the two-step alternating directions approach of Equation (5.7), Step (i) updates $\mathbf{v}$ by minimizing the data mismatch while keeping the parameterized approximation error, i.e., $\lambda^2 \|\boldsymbol{\Phi}\mathbf{v} - \mathbf{u}^{k-1}\|_2^2$, small. This regularization term defines a closed neighborhood surrounding $\mathbf{u}^{k-1}$, which is the estimate obtained from the feasible set in the most recent iteration. Consequently, the updated parameter in Step (i), i.e., $\tilde{\mathbf{u}}^{k} = \boldsymbol{\Phi}\mathbf{v}^{k}$, remains close to $\mathbf{u}^{k-1}$ while seeking a new solution that improves the data match. Step (ii) uses the updated parameterized solution from Step (i), i.e., $\mathbf{v}^{k}$, to update $\mathbf{u}$ inside the feasible set. Since $\boldsymbol{\Phi}\mathbf{v}^{k}$ is forced to remain close to $\mathbf{u}^{k-1}$, the new mapping result $\mathbf{u}^{k}$ is updated gradually. Therefore, the regularization term $\lambda^2 \|\boldsymbol{\Phi}\mathbf{v} - \mathbf{u}^{k-1}\|_2^2$ provides the gradual transition from the feasible set to the parameterization space and vice versa. In addition, this regularization term provides an opportunity to overcome the ill-posedness of the problem.
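Concretely, writing $\mathbf{g}(\boldsymbol{\Phi}\mathbf{v}) \approx \mathbf{g}_0 + \mathbf{J}\boldsymbol{\Phi}(\mathbf{v} - \mathbf{v}_0)$ around the current iterate makes the Step (i) objective quadratic in $\mathbf{v}$, and setting its gradient to zero yields linear normal equations. A sketch, assuming a diagonal $\mathbf{C}_{\boldsymbol{\epsilon}}$ and a user-supplied sensitivity matrix $\mathbf{J}$ (e.g., from adjoint or perturbation runs); all names are illustrative.

```python
import numpy as np

def step_i_update(v0, u, d_obs, g0, J, Phi, c_inv, lam):
    """One linearized update for Step (i) of Equation (5.7).

    Uses g(Phi v) ~ g0 + J Phi (v - v0), where g0 = g(Phi v0) and J is the
    sensitivity (Jacobian) of g at Phi v0; c_inv holds the diagonal of
    C_eps^{-1}. Setting the gradient of the quadratic objective to zero
    gives the normal equations solved below.
    """
    A = J @ Phi                        # sensitivities with respect to v
    AtC = A.T * c_inv                  # A^T C_eps^{-1} (diagonal C_eps)
    H = AtC @ A
    lhs = H + lam**2 * (Phi.T @ Phi)
    rhs = AtC @ (d_obs - g0) + H @ v0 + lam**2 * (Phi.T @ u)
    return np.linalg.solve(lhs, rhs)
```

When $\boldsymbol{\Phi}$ is orthonormal (as for the truncated PCA basis used later), $\boldsymbol{\Phi}^T\boldsymbol{\Phi}$ reduces to the identity and the regularization term simply damps $\mathbf{v}$ toward the projection coefficients of $\mathbf{u}$.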
5.2. Enforcing the Feasibility Constraint for Pattern Learning

To enforce the complex feasibility constraint in the second step of the alternating directions algorithm, we develop a machine learning approach, which is discussed in this section. Machine learning algorithms are data-driven modeling approaches that can learn and predict the functionality of models from data [Kotsiantis et al., 2007; Nasrabadi, 2007; Witten et al., 2016]. In general, these techniques are classified into supervised, unsupervised, and semi-supervised or reinforcement learning. In supervised learning, a set of input/output (i.e., feature/label) samples is observed (or constructed) and used to learn an approximation of the mapping function between the inputs and outputs [Kotsiantis et al., 2007]. In unsupervised learning, no output samples (labels) are available, and the input samples are utilized to learn the hidden patterns in the data. Classification and clustering [Jain and Dubes, 1998] are well-known examples of supervised and unsupervised learning problems, respectively. In semi-supervised learning, a small portion of the samples are labeled, and the primary goal is similar to that of unsupervised learning [Chapelle et al., 2009]. Reinforcement learning [Sutton and Barto, 1998] couples the system's functionality with the dynamic environment to reach a certain goal, which is usually accomplished by maximizing a reward function.

In this chapter, we use a classification algorithm (a supervised learning approach) to implement the mapping in Step (ii) of the solution approach. Several algorithms, such as $k$-nearest neighbors ($k$-NN), linear discriminant analysis, support vector machines, decision trees, logistic regression, and neural networks [Nasrabadi, 2007; Witten et al., 2016], have been proposed for classification problems, each with specific applications. The $k$-nearest neighbors ($k$-NN) algorithm [Larose, 2005; Ripley, 2007], which is implemented in this chapter, is a simple classification technique in which the decision about the label of a feature vector is made by investigating the behavior of the most similar feature vectors in the learning dataset. The algorithm selects the $k$ most similar feature vectors, using a distance measure, from the training dataset and assigns the most frequent label among these feature vectors as the output label. Despite its simplicity, the $k$-NN algorithm enjoys a strong theoretical foundation as a classification technique [Ripley, 2007]. We present the developed $k$-NN algorithm for our application next.

Figure 5.1. Schematic of mapping a parameterized solution onto the feasible set $\boldsymbol{\Omega}_{\mathbf{p}}$. Mapping $\boldsymbol{\Phi}\mathbf{v}^{k}$ onto the feasible set $\boldsymbol{\Omega}_{\mathbf{p}}$ results in $\mathbf{u}^{k}$.

Mapping onto the Feasible Set: Figure 5.1 depicts a simple schematic of mapping a sample parameterized solution $\tilde{\mathbf{u}}^{k} = \boldsymbol{\Phi}\mathbf{v}^{k}$ onto $\boldsymbol{\Omega}_{\mathbf{p}}$. In this figure, $\tilde{\mathbf{u}}^{k} = \boldsymbol{\Phi}\mathbf{v}^{k}$ is shown on the top left, while the feasible set $\boldsymbol{\Omega}_{\mathbf{p}}$ is shown on the right. For this example, we have taken the basis $\boldsymbol{\Phi}$ to be the truncated PCA basis, i.e., the leading eigenvectors of the sample covariance matrix associated with the feasible set. As demonstrated in Figure 5.1, the objective is to identify a corresponding sample in the discrete feasible set for the parameterized map. We denote by $\mathbf{r} \perp \boldsymbol{\Omega}_{\mathbf{p}}$ the function that maps $\mathbf{r}$, which lies outside $\boldsymbol{\Omega}_{\mathbf{p}}$, onto the set $\boldsymbol{\Omega}_{\mathbf{p}}$. Additionally, a set of learning samples, i.e., $\mathcal{D} = \{(\mathbf{r}_i, \mathbf{u}_i): i = 1,\ldots,N\}$, is provided, where $\mathbf{u}_i$ is the mapping of $\mathbf{r}_i$ onto the set $\boldsymbol{\Omega}_{\mathbf{p}}$. The goal is to use $\mathcal{D}$ to learn the mapping operator, which can then be used to find the feasible solution $\mathbf{u}$ from the continuous approximation $\tilde{\mathbf{u}}$.

Figure 5.2 demonstrates the schematic of a supervised learning approach for learning the mapping onto the set $\boldsymbol{\Omega}_{\mathbf{p}}$. We assume that the outputs of the learning samples, i.e., the $\mathbf{u}_i$'s, are representative realizations from $\boldsymbol{\Omega}_{\mathbf{p}}$, and the corresponding input samples, i.e., the $\mathbf{r}_i$'s, are the representations of the $\mathbf{u}_i$'s in the parameterization space $\boldsymbol{\Phi}$. For example, the $\mathbf{r}_i$'s can be calculated as $\mathbf{r}_i = \boldsymbol{\Phi}(\boldsymbol{\Phi}^{T}\mathbf{u}_i)$ if the basis functions are orthogonal, e.g., PCA. This way of generating the learning dataset ensures that the input learning samples $\mathbf{r}_i$ and the sample to be mapped during inversion, i.e., $\boldsymbol{\Phi}\mathbf{v}^{k}$, are defined in the same basis.

Figure 5.2. Illustration of the mapping operator using a supervised learning approach. The pairs $(\mathbf{r}_i, \mathbf{u}_i)$ are the learning samples, where $\mathbf{u}_i$ is the mapping of $\mathbf{r}_i$ onto $\boldsymbol{\Omega}_{\mathbf{p}}$, which are used to learn the mapping operator. The learned mapping operator is then applied to $\boldsymbol{\Phi}\mathbf{v}^{k}$ to obtain $\mathbf{u}^{k}$.

Figure 5.3 depicts how the parameterized learning samples are constructed to correspond to the same level and type of parameterization as in Step (i) of the model calibration approach.
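Following the formula $\mathbf{r}_i = \boldsymbol{\Phi}(\boldsymbol{\Phi}^T\mathbf{u}_i)$ given above, a minimal sketch of how the paired learning samples can be generated (the binary random maps stand in for actual feasible realizations; sizes are illustrative):

```python
import numpy as np

rng = np.random.default_rng(2)
n_cells, N, rank = 100 * 100, 500, 40
U_feas = (rng.random((n_cells, N)) > 0.7).astype(float)  # stand-in facies models

# Orthonormal PCA-type basis from the feasible samples
Phi, _, _ = np.linalg.svd(U_feas, full_matrices=False)
Phi = Phi[:, :rank]

# Learning pairs (r_i, u_i): r_i = Phi (Phi^T u_i) lives in the same
# subspace that Step (i) of the inversion searches over
R = Phi @ (Phi.T @ U_feas)
print(R.shape)        # (10000, 500): one parameterized column per sample
```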
While this procedure ensures that similar parameterizations are used for the learning samples and the parameterized model calibration, the inversion result from Step (i) of Equation (5.7), i.e., $\boldsymbol{\Phi}\mathbf{v}^{k}$, may not have connectivity patterns similar to those in the training data. However, the mapping, which is discussed next, is based on local connectivity patterns and identifies the closest pattern to what is in the parameterized solution $\boldsymbol{\Phi}\mathbf{v}^{k}$.

Figure 5.3. Construction of the learning data pairs $(\mathbf{r}_i, \mathbf{u}_i)$ from the training image: the feasible models are generated from the training image (shown on the left), while the corresponding parameterized samples $\mathbf{r}_i$ are PCA approximations of the feasible samples. The rank of the PCA parameterization is the same for the training data and Step (i) of the inversion approach.

A major criterion in learning the mapping from the parameterized input samples $\mathbf{r}_i$ to their corresponding feasible output samples $\mathbf{u}_i$ is preserving the statistical consistency of the patterns. A simple approach is to associate the full-size parameterized maps with their corresponding feasible maps and, during inversion, find the closest (using a similarity measure) parameterized map to the current iterate and select the corresponding feasible map as the solution. However, this strategy is unlikely to work in automatic model calibration, for two reasons: first, arbitrary updates are applied to the solution at each iteration of model calibration, which can result in significant deviation from the global connectivity patterns in the training models; second, it requires an extremely large set of training models to cover all possible global pattern configurations, which is neither necessary nor practical.

A more flexible alternative is to incorporate the statistical information in the prior data based on local patterns, which we implement in two stages; a similar approach is used in multiple-point geostatistical simulation techniques such as SNESIM [Strebelle, 2003] and SIMPAT [Arpat and Caers, 2007]. In the first stage, using a specified template size, we scan the obtained parameterized image $\boldsymbol{\Phi}\mathbf{v}^{k}$ (i.e., the current iterate) and compute the corresponding feature vectors (local patterns inside the defined template). For each instance of the scanning template, we use the $k$-NN classification algorithm to identify the $k$ closest feature vectors in the parameterized dataset, which is generated from $\{\mathbf{r}_i: i = 1,\ldots,N\}$. The corresponding $k$ feasible label vectors in the training dataset are then stored for the cells covered by the scanned area (without assigning final feasible values). In the second stage, an aggregation approach is used to combine all the stored facies instances in each model cell to represent the conditional distribution for that cell. Each cell is then assigned the feasible value with the highest (empirical) frequency. The resulting map constitutes the feasible solution obtained in Step (ii) of Equation (5.7) and is passed to Step (i) to continue the iterations.

We note that the feature vectors do not need to be the exact spatial descriptions of the patterns and could be defined based on various considerations, including computation. For instance, one class of feature vectors consists of the projection coefficients of the patterns onto a user-defined subspace (e.g., PCA). The feature vector could also be defined by considering a subset of the grid cells within the template (selected either randomly or deterministically), which can result in computational savings.
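A sketch of the construction of the pattern dataset $\mathcal{D}_p$ from the paired images, using a square scanning window in place of the circular template for brevity (window size, counts, and names are illustrative):

```python
import numpy as np

def build_pattern_dataset(R_imgs, U_imgs, half=5, n_per_image=50, seed=0):
    """Build D_p = {(x_i, y_i)}: features x_i from the parameterized images,
    labels y_i from the corresponding feasible (discrete) images.

    R_imgs, U_imgs : sequences of paired 2D arrays of equal shape
    half           : template half-width; the window is (2*half+1)^2 cells
    """
    rng = np.random.default_rng(seed)
    feats, labels = [], []
    for r_img, u_img in zip(R_imgs, U_imgs):
        ny, nx = r_img.shape
        for _ in range(n_per_image):   # random template centers, not exhaustive
            cy = rng.integers(half, ny - half)
            cx = rng.integers(half, nx - half)
            win = np.s_[cy - half:cy + half + 1, cx - half:cx + half + 1]
            feats.append(r_img[win].ravel())
            labels.append(u_img[win].ravel())
    return np.array(feats), np.array(labels)
```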
Figure 5.4. Illustration of the neighbourhood templates (left) and the feature/label vectors (right). The feature space is defined by the parameterized representation of the patterns inside the templates, while the label vectors correspond to the feasible patterns.

$k$-NN Classification: Figure 5.4 (left) demonstrates the learning stage, where a circular template is used to scan prior image realizations, i.e., pairs $(\mathbf{r}_i, \mathbf{u}_i)$, to generate corresponding segments of the parameterized and feasible input images. In this case, the feature vectors, denoted $\mathbf{x}_i$, are defined as the grid-block values located inside the templates, and the corresponding discrete labels are denoted $\mathbf{y}_i$. As shown in Figure 5.4, in this chapter we create the feature vectors by extracting the exact grid-block values that are located inside the neighborhood templates. The pattern dataset $\mathcal{D}_p = \{(\mathbf{x}_i, \mathbf{y}_i): i = 1,\ldots,N_p\}$ is used to implement the mapping onto the feasible set in Step (ii) of Equation (5.7).

Figure 5.5 demonstrates the schematic of the first stage of the segmentation procedure. The classification approach, which is based on the $k$-NN classifier, explores the feature space of the training dataset to find the $k$ feature vectors that are most similar to each scanned portion of $\boldsymbol{\Phi}\mathbf{v}^{k}$, and stores the corresponding discrete label vectors for the scanned region. As shown in Figure 5.5, the same template that is used to construct the learning dataset is applied to extract the feature vectors in $\boldsymbol{\Phi}\mathbf{v}^{k}$. Using a similarity measure, e.g., the norm of the error $\|\mathbf{x} - \mathbf{x}_i\|_2$ between $\mathbf{x}$ (in the inversion solution $\boldsymbol{\Phi}\mathbf{v}^{k}$) and the $\mathbf{x}_i$'s (in $\mathcal{D}_p$), the $k$ most similar feature vectors in $\mathcal{D}_p$ are chosen, and the corresponding label vectors $\mathbf{y}_i$ are stored for the corresponding template in the solution. We note that the first stage results in a significant amount of overlap between the scanned regions and provides a large number of samples for each cell in the model.

Figure 5.5. Schematic of the $k$-NN classifier, which is used to replace parameterized patterns with their corresponding feasible samples in the feasible set. For each cell in the model $\boldsymbol{\Phi}\mathbf{v}^{k}$, a local template is used to extract pattern features and identify its $k$ closest feature vectors and their corresponding labels in the learning dataset.

Aggregation: After the initial classification and storage of the label vectors, an aggregation (or voting) step is used to combine all the assignments to each cell and decide on the feasible value for each grid block. Figure 5.6 depicts the aggregation approach, where for a given cell (marked with a dot) the information from scanning different regions is combined to generate the conditional facies probability. In this chapter, the feasible value with the highest (empirical) frequency is assigned to each grid block. The aggregation step ensures that the facies assignment to each grid block takes into account the spatial statistics (and connectivity patterns) from an extended neighborhood (approximately twice the size of the template in each direction). Moreover, aggregation provides far more samples (than $k$) for each grid cell, which substantially increases the accuracy and spatial consistency of the method. Figure 5.7 illustrates this point using a simple example.
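Combining the two stages, a sketch of the Step (ii) mapping: brute-force $k$-NN classification of each template of the parameterized map followed by cell-wise aggregation of the stored labels (all names are illustrative, and border cells are left at the default label for simplicity):

```python
import numpy as np

def map_to_feasible(img, feats, labels, levels, half=5, k=5):
    """Step (ii) mapping: k-NN classification of each scanned template of the
    parameterized map, followed by cell-wise aggregation of the stored labels.

    feats, labels : pattern dataset D_p (e.g., from build_pattern_dataset)
    levels        : admissible facies values
    """
    ny, nx = img.shape
    w = 2 * half + 1
    votes = {lev: np.zeros((ny, nx)) for lev in levels}
    for cy in range(half, ny - half):
        for cx in range(half, nx - half):
            win = np.s_[cy - half:cy + half + 1, cx - half:cx + half + 1]
            x = img[win].ravel()
            d = np.linalg.norm(feats - x, axis=1)    # l2 similarity measure
            for idx in np.argsort(d)[:k]:            # k closest features
                y = labels[idx].reshape(w, w)
                for lev in levels:                   # store labels over the
                    votes[lev][win] += (y == lev)    # whole template (aggregation)
    stacked = np.stack([votes[lev] for lev in levels])
    return np.asarray(levels, float)[np.argmax(stacked, axis=0)]
```

The random subsampling discussed in Section 5.3 amounts to looping over a random subset of the template centers instead of all of them.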
Figure 5.6. Implementation of the mapping method with aggregation in the second step of the algorithm. The aggregation step stores the entire spatial labels (instead of only the centre cells), thereby including a significantly larger number of sample labels for each cell and extending the spatial connectivity beyond a single template size. In this example, the samples for the cell indicated with a black circle are generated by considering the labels identified for all the templates that include this cell. Without aggregation, only the $k$ labels corresponding to the template centred at this cell location would be used. The plot on the right shows the generated sample conditional PDF for this cell.

Figure 5.7. Results of mapping with discrete thresholding (top right), mapping onto the feasible set without aggregation (middle), and with aggregation (bottom). The results are shown for template sizes $r = 5$ and $r = 25$ and $k$ values (in the $k$-NN classifier) of $k = 1$ and $k = 5$.

The parameterized map $\boldsymbol{\Phi}\mathbf{v}^{k}$ from Figure 5.1 is shown at the top left of Figure 5.7, along with a discrete version obtained by thresholding (top right). Clearly, thresholding does not incorporate spatial statistics and results in a poor reconstruction of the parameterized map. The middle and bottom rows of Figure 5.7 show the results of the proposed mapping approach without and with the aggregation step, respectively. The results are shown for different template sizes, i.e., $r = 5$ and $r = 25$, and different $k$ values (in the $k$-NN classifier), $k = 1$ and $k = 5$. In both cases, larger values of the template size and $k$ lead to improved reconstructions. Without aggregation, the results are less accurate and random discontinuities emerge in the discrete solution. Aggregation improves the connectivity by exploiting additional spatial information about the extended local neighborhood. Furthermore, when aggregation is used, less sensitivity to $k$ is observed, as aggregation provides many more samples than can be achieved by a modest increase in $k$ (many overlapping templates cover the same cell in the model).

5.3. Computational Complexity

Learning the mapping operator for Step (ii) of Equation (5.7) can be computationally demanding for large template sizes. During the $k$-NN classification step, the feature vector in the solution, i.e., $\mathbf{x}$, is compared with the $N_p$ feature vectors in $\mathcal{D}_p = \{(\mathbf{x}_i, \mathbf{y}_i): i = 1,\ldots,N_p\}$, where the comparison is performed by applying a normed measure, such as the $\ell_2$-norm. Therefore, the computational cost of mapping $\boldsymbol{\Phi}\mathbf{v}^{k}$ onto $\boldsymbol{\Omega}_{\mathbf{p}}$ is proportional to the number of grid blocks, the value of $N_p$, and the complexity of calculating $\|\mathbf{x} - \mathbf{x}_i\|$ for an arbitrary index $i$. The complexity of calculating $\|\mathbf{x} - \mathbf{x}_i\|$ is directly related to the type of norm used and the dimension of the feature vector $\mathbf{x}$. Increasing the dimension of the feature vectors linearly increases the cost of calculating $\|\mathbf{x} - \mathbf{x}_i\|$; thus, larger neighborhood templates require more demanding calculations. To construct the feature vectors, a template structure and a reasonable value for $N_p$ are assigned initially. The feature vectors are then generated by scanning the data samples in $\mathcal{D} = \{(\mathbf{r}_i, \mathbf{u}_i): i = 1,\ldots,N\}$. For practical implementation, the scanning does not need to be exhaustive (which would involve redundancy) and can be performed by considering a random subset of regions within each sample, which also reduces the memory demand.
The value of $N_p$, i.e., the number of feature vectors in the learning dataset, linearly increases the computational complexity of the $k$-NN classifier; therefore, a reasonable value for $N_p$ has to be chosen, e.g., on the order of 1000.

Figure 5.8. Spatially uniform and full sampling of the templates (a), and random subsampling (b) to speed up the implementation of the learning approach. The mapping results (classification + aggregation) for different subsampling rates are shown in (c). In this case, 1000 feature vectors are used in the learning dataset, and the neighbourhood templates are squares of size 50×50. The value of $k$ in the $k$-NN classifier is set to 1.

A major contributor to the computational complexity is the number of $k$-NN classifications performed before the aggregation step. One way to reduce this computation is to use a small subset of the overlapping templates. Figures 5.8(a) and 5.8(b) depict the schematics of the uniform (i.e., using all the grid blocks) and random $k$-NN classifications, respectively. The reconstructed discrete solutions for different subsampling ratios are shown in Figure 5.8(c). It can be seen that even using only 10% of the cells does not significantly deteriorate the reconstruction results. This is due to the role of aggregation in increasing the number of samples for each grid block. The computational cost, however, is significantly reduced by introducing subsampling.

An additional computational consideration arises in solving the optimization in Step (i) of Equation (5.7), which requires several numerical flow simulation runs. In general, finding the exact minimum of Step (i) in Equation (5.7) is not necessary, and a few iterations are sufficient before moving to Step (ii). It is important to note that the term $\|\boldsymbol{\Phi}\mathbf{v} - \mathbf{u}\|_2^2$ is used in both steps of the algorithm, which discourages significant departure from the solution obtained in the previous step and results in a more gradual convergence. In summary, the computational complexity of a single mapping is proportional to the number of cells in the random path for classification, the size of the feature vectors, and the number of feature vectors in $\mathcal{D}_p = \{(\mathbf{x}_i, \mathbf{y}_i): i = 1,\ldots,N_p\}$ (usually, $N_p$ does not change significantly with the dimension of the model). In addition, minimization of Equation (5.7) requires several forward simulations and adjoint computations, which can be computationally very significant for field-scale problems.

An important point to discuss before presenting the results is that, while other methods may be applied to perform the mapping in Step (ii), one needs to consider the properties and effectiveness of the implementation. For instance, one may consider using a scaled version of the parameterized solution as a facies probability map (soft data) to generate a conditional discrete facies realization, as in the PCM approach [Jafarpour and Khodabakhshi, 2011].
However, in PCM., using the scaled version of the ensemble mean as soft data implies that departure from the local connectivity patterns in the probability map is allowed. This is acceptable in PCM because the ensemble realizations did not follow the same exact connectivity that was in the ensemble mean. As a result, in the PCM, only the mean of the generated ensemble is consistent with the facies probability map (due to the stochastic nature of the SNESIM simulation). In the current formulation, the goal is to obtain a feasible map that is closest to parametrized solution that is obtained in Step (i). Therefore, the method should not allow departure from the parameterized solution. There are also other important differences between these methods: the method in the current chapter is based on pattern learning and can be applied to continuous and discrete patterns, whereas PCM was developed for discrete facies; additionally, the proposed method does not rely on the SNESIM (or any other) algorithm for facies simulation. The use of a prior training set allows the method to be applicable to training datasets that are obtained from other geostatistical simulation techniques. This property generalizes the method and makes it independent of the geostatistical method used to generate the prior training set. 5.4. Numerical Results We present three numerical experiments to examine the performance of our developed approach for inference of rock facies distribution developed in this chapter. The first example is a straight- ray travel time tomographic inversion, which leads to a linear inverse problem. The second Chapter 5: Pattern-Based Feature Learning for Honoring Feasibility Constraints 147 experiment involves a ground water pumping test in which the facies represent the hydraulic conductivity of the field. In this case, steady-state pressure head data are used to reconstruct the spatial distribution of lithofacies that represent the hydraulic conductivity maps. In the last experiment, we consider a two-phase flow problem, where a 3-dimensional model of facies distribution is estimated from transient pressure and flowrate data. In all of these examples, in implementing the 𝑘-NN classification step, we use a random subsampling approach by scanning only 20% of the grid-blocks. For the feature/label data pairs, i.e., Ɗ Í ={(𝐱 - ,𝐲 - ):𝑖 =1,…,𝑁 Í }, we use a total of 𝑁 Í = 2000 samples. Figure 5.9. Straight-ray travel time tomographic inversion example: (a) source-receiver configuration; (b) reference slowness map; (c) best parameterized approximation of the reference model with 40 leading PCA basis functions; (d) the initial guess for model calibration, which is the mean of the prior realizations. Tomographic Inversion: In tomographic inversion, measurements of travel (arrival) times of transmitted acoustic waves are used to estimate the slowness (1/velocity) of the medium between the source and receiver pairs (subsurface environment). The configuration of the travel time tomographic inversion is shown in Figure 5.9(a), where the sources (left) transmit the acoustic waves to the receivers (right). The reference slowness map contains a complex meandering structure, as depicted in Figure 5.9(b). The model domain has a dimension of 1000×1000m % , which is discretized uniformly into 100×100 cells. In this experiment, an array of 6 sources and 6 receivers with equal interval are used, resulting in 36 arrival time measurements. 
The prior training data consist of $N=500$ model realizations, which are drawn from a training image (shown in Figure 5.10(a)) consisting of meandering connectivity patterns. These samples, i.e., the previously defined $\mathbf{u}_i$'s in $\mathcal{D}=\{(\mathbf{r}_i,\mathbf{u}_i): i=1,\ldots,N\}$, are used to approximately represent the patterns in the feasible set $\mathbf{\Omega}_\mathbf{p}$. We note that for small values of $N$ (e.g., $N=50$) the existing patterns in $\{\mathbf{u}_i: i=1,\ldots,N\}$ may not be representative of the entire connectivity features of the feasible set; in addition, small $N$ may result in a poor PCA parameterization. On the other hand, large values of $N$ (e.g., $N=20000$) may overestimate the complexity of the patterns in the feasible set and result in redundant patterns and drastic computational overhead. Therefore, a reasonable value needs to be assigned for $N$. Figure 5.10(b) displays four sample realizations from this set. The corresponding parameterized models, i.e., the $\mathbf{r}_i$'s in $\mathcal{D}=\{(\mathbf{r}_i,\mathbf{u}_i): i=1,\ldots,N\}$, are obtained using truncated PCA parameterization of $\{\mathbf{u}_i: i=1,\ldots,N\}$. The corresponding PCA basis functions, $\mathbf{\Phi}$, of the parameterization subspace are shown in Figure 5.10(c). Figure 5.9(c) depicts the projection of the reference model on the truncated PCA basis.

Figure 5.10. (a) Prior training image for a meandering channel; (b) four corresponding sample realizations from the resulting feasible set $\mathbf{\Omega}_\mathbf{p}$; and (c) the first 49 leading PCA basis functions generated from the prior realizations in $\mathbf{\Omega}_\mathbf{p}$.

For inversion, the initial map, which is the mean of the samples in $\mathbf{\Omega}_\mathbf{p}$, is shown in Figure 5.9(d). Figure 5.11(a) shows 10 iterations of the inversion results, i.e., $\mathbf{\Phi}\mathbf{v}$ (top) and $\mathbf{u}$ (bottom). The discrete mapping step uses a $50\times50$ square template, and $k$ (in the $k$-NN algorithm) is set to 5. We initially start with 20 leading PCA basis functions for parameterization and gradually increase the dimension of the parameterization to 50 throughout 10 consecutive iterations. The objective is to initially capture the large-scale connectivity using fewer global connectivity patterns and gradually increase the parameter size to improve the resolution of the reconstructed model. As shown in Figure 5.11(a), the inversion process is able to detect the meandering connectivity structures with good accuracy within the first 10 iterations. During the initial iterations the global connectivity structures are inferred, while at later iterations the estimation quality is enhanced gradually.

Figure 5.11. Tomographic inversion iterations, showing the parameterized solution $\mathbf{\Phi}\mathbf{v}$ in Step (i) (top) and the feasible solution $\mathbf{u}$ obtained in Step (ii) (bottom).

Effect of Within-Facies Heterogeneity: In this case, the reference map, which is shown in Figure 5.12(a), consists of non-discrete heterogeneity inside and outside the meandering channel structure. The heterogeneity within the fluvial channels is modeled independently of the heterogeneity within the background facies. The heterogeneity inside and outside the channel (in the reference map) is modeled by Gaussian processes using two different variogram models. The direction of maximum continuity for the heterogeneity (i.e., the azimuth) is $\theta=-45°$. Figure 5.12(d) depicts the histogram of the parameter values in the reference model, which indicates a bimodal distribution.
We note that, due to the contrast in the facies values, the variability within the channel facies is not as visible as that in the background facies. Figure 5.12(e) shows four (out of $N=500$) of the model realizations that were used as training data to represent the feasible set. These prior models were also used to construct the PCA basis $\mathbf{\Phi}$ that forms the parameterization subspace. In this case, we use 100 basis functions to represent the solution in the parameterization space. However, we start the inversion with 20 basis functions and uniformly increase the number of basis functions to 100 throughout 15 iterations until convergence. The best achievable solution and the initial map are shown in Figures 5.12(b) and 5.12(c), respectively. All other parameters are the same as in the previous case (discrete facies without within-facies variability).

Figure 5.12. Tomographic inversion with within-facies variability: (a) reference slowness map; (b) best parameterized approximation of the reference model with 100 leading PCA basis functions; (c) the initial guess for model calibration, which is the mean of the prior realizations; (d) histogram of cell values in the reference slowness map; and (e) 4 sample prior model realizations.

The results of parameter reconstruction, for $\mathbf{\Phi}\mathbf{v}$ and $\mathbf{u}$, are summarized in Figure 5.13, showing the evolution of the parameterized and feasible solutions throughout the iterations. The results show that the method can reconstruct models that include facies and within-facies variability. Interestingly, the algorithm can detect the channel and non-channel facies and identify the general trend in the background heterogeneity. We note also that if only the continuous solution is used to reconstruct the model, the resulting solution provides a lumped image, making it hard to separate the facies distribution from the background heterogeneity, whereas the feasibility mapping step identifies the heterogeneity in addition to the facies configuration.

Figure 5.13. Tomographic inversion with within-facies variability: inversion iterations showing the parameterized solution $\mathbf{\Phi}\mathbf{v}$ in Step (i) (left) and the feasible solution $\mathbf{u}$ obtained in Step (ii) (right).

2D Pumping Test: In the second example, we consider integration of data from a groundwater pumping test. Figure 5.14(a) shows the schematic of the test, which includes a pumping well in the middle of a $1500\times1500\times20\,\mathrm{m}^3$ domain, which is divided into $100\times100\times1$ cells. The well extracts water at a rate of $60\,\mathrm{m^3/h}$. The top and bottom boundaries of the model have no-flow boundary conditions, while the boundary conditions on the left and the right sides of the domain are constant pressure heads of 20 m and 10 m, respectively. A uniformly placed network of 25 monitoring wells (Figure 5.14(e)) is used to measure the hydraulic conductivity values and the steady-state pressure heads. The logarithms of the hydraulic conductivity in the reference model, depicted in Figure 5.14(b), for the two facies types are -2.3 and 0.5 log(m/s), respectively. The reference and initial log-conductivity maps are shown in Figures 5.14(b) and 5.14(d), respectively, while their corresponding pressure distributions are depicted in Figures 5.14(e) and 5.14(f), respectively. Figure 5.14(c) shows the best continuous approximation of the reference map based on a rank-40 PCA approximation.
Figure 5.14. Experimental setup for the constant rate pumping test: (a) pumping and monitoring well configuration; (b) reference discrete hydraulic conductivity map; (c) best rank-40 PCA approximation of the reference model; (d) the initial model of hydraulic conductivity in the inversion; (e)-(f) the pressure maps corresponding to the reference and initial hydraulic conductivity models.

Figure 5.15. (a) Prior training image for intersecting fluvial channels; (b) four sample realizations of the facies model in the training data; and (c) 49 leading PCA basis functions generated from the training samples in $\mathbf{\Omega}_\mathbf{p}$.

The training image in Figure 5.15(a), which consists of intersecting channel features, is used to generate $N=500$ facies realizations that serve as prior training data, i.e., the $\mathbf{u}_i$'s in $\mathcal{D}=\{(\mathbf{r}_i,\mathbf{u}_i): i=1,\ldots,N\}$. Figure 5.15(b) shows four samples from the resulting prior training data. The prior samples are used to develop the PCA basis functions, $\mathbf{\Phi}$ (Figure 5.15(c)), for parameterization of the discrete facies models. The inversion process starts by including 20 leading PCA basis functions and gradually increases the number of PCA basis functions to 40, by adding 2 basis elements per iteration. To examine the significance of projecting the continuous solution onto the correct feasible set, we compare the results from our $k$-NN machine learning approach with an alternative method in which discrete thresholding of the parameterized solution iterations $\mathbf{\Phi}\mathbf{v}^k$ is applied in Step (ii) of Equation (5.7).

Figure 5.16. Model calibration results for the pumping test using the proposed machine-learning-based solution approach. The parameterized solution $\mathbf{\Phi}\mathbf{v}$ (top) and feasible solution $\mathbf{u}$ (bottom) are shown for 10 iterations. The feasible solution is obtained by mapping the parameterized solution onto the feasible set at each iteration.

Figures 5.16 and 5.17 summarize the solution iterations for our method and the simple thresholding approach, respectively. For the $k$-NN method, the value of $k$ is set to 3, and the neighborhood template is a circle with a radius of 25 cells. We note that small values of $k$, e.g., $k=3$-$10$, result in reasonable classification/aggregation outputs. When large values of $k$ are used, e.g., $k=100$, the $k$-NN algorithm becomes a loose classifier. Comparison between Figures 5.16 and 5.17 shows that mapping onto the feasible set provides superior results to simple thresholding. The main difference between the two methods stems from the lack of correct spatial statistics in the grid-based thresholding approach. From Figure 5.16, it is evident that mapping the parameterized solution onto the correct feasible set provides patterns and spatial statistics that are consistent with the prior model, whereas discrete thresholding does not help with spatial connectivity and only replaces continuous values with their closest discrete values. It is particularly noteworthy that mapping onto the feasible set tends to correct errors in channel connectivity and eliminate inconsistent lack of connectivity.

Figure 5.17. Model calibration results for the pumping test using thresholding to obtain feasible solutions. The parameterized solution $\mathbf{\Phi}\mathbf{v}$ (top) and feasible solution $\mathbf{u}$ (bottom) are shown for 10 iterations.
The feasible solution is obtained by using a thresholding scheme to replace continuous values with their closest discrete level values.

Figure 5.18. Experimental setup for waterflooding: (a) well configuration; (b) reference permeability field (shown for Layers 1-4 and 5-8); (c) evolution of the non-wetting (e.g., hydrocarbon) phase saturation distribution; and (d) evolution of pressure distribution.

Two-Phase Flow in a 3D Formation: For our last example, we consider two-phase flow in a 3D formation to integrate transient and highly nonlinear observations of pressure and flow rate. The model configuration and the reference map for this example are shown in Figures 5.18(a) and 5.18(b). The model consists of a $1000\times1000\times160\,\mathrm{m}^3$ domain, which is divided into a uniform grid system with $100\times100\times8$ cells of size $10\times10\times20\,\mathrm{m}^3$. The reference permeability map, with 10 mD and 600 mD values for the low and high permeability regions, includes two distinct spatial patterns, each extending over four vertical layers of the model (Figure 5.18(b), bottom). There are four fluid injection and five fluid extraction wells in the domain. Constant injection rates and extraction pressures are used to control the wells, while the pressure at the injection wells and the phase fluid flow rates at the extraction wells are observed (at one-month intervals) and used for model calibration. In the first 4 years (the model calibration stage), 0.8 pore volume of the wetting phase (e.g., water) is uniformly injected into the formation, resulting in 48 observation time intervals. Another 0.8 pore volume is injected during the next four years (the forecast period). The main uncertainty in this example is related to the spatial distribution of rock facies types, which is constrained by assimilating the observed dynamic responses at the well locations. Figures 5.18(c) and 5.18(d) show the dynamic evolution of the non-wetting phase (i.e., hydrocarbon) saturation and fluid pressure in the formation.

For model calibration, $N=1000$ prior realizations of facies are generated using the same training image as in Figure 5.15(a). The spatial patterns used for the two zones (Layers 1-4 and Layers 5-8) are assumed to belong to the same training image. Figure 5.19(a) shows four samples from the prior training data. The 12 leading (out of 50) PCA basis functions for parameterization are shown in Figure 5.19(b), while the best rank-50 PCA approximations for the two zones are depicted in Figure 5.19(c).

Figure 5.19. (a) Prior model realizations; (b) 12 leading PCA basis images; and (c) rank-50 PCA approximation of the reference model.

Figure 5.20 summarizes the inversion results, which are shown separately for Layers 1-4 and 5-8.

Figure 5.20. Model calibration results for Example 3 involving two-phase flow in a 3D formation. The two columns on the left show the results for Layers 1-4, and the two columns on the right display the results for Layers 5-8.

For the first step of the algorithm (i.e., the parameterized solution), the PCA parameterization is initialized using 30 leading basis functions, and two new basis functions are added in each of the 10 subsequent iterations to arrive at a rank-50 approximation.
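As a concrete illustration of this progressive parameterization, the sketch below builds a PCA basis from an ensemble of prior realizations and grows the retained rank by two basis elements per outer iteration; the ensemble shape and all names are illustrative assumptions.

```python
import numpy as np

def pca_basis(U):
    """PCA basis from an ensemble U of shape (n_cells, N_realizations)."""
    U_c = U - U.mean(axis=1, keepdims=True)   # center the ensemble
    Phi, svals, _ = np.linalg.svd(U_c, full_matrices=False)
    return Phi, svals

def rank_schedule(r0=30, r_final=50, step=2):
    """Rank used at each outer iteration: 30, 32, ..., 50."""
    return list(range(r0, r_final + 1, step))

# usage sketch: project the current solution on a growing basis
# U = np.random.rand(80000, 1000)           # hypothetical prior ensemble
# Phi, _ = pca_basis(U)
# for r in rank_schedule():
#     Phi_k = Phi[:, :r]                    # rank-r basis at this iteration
#     v = Phi_k.T @ u_feasible              # Step (i) is solved in this subspace
```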
For the second stage of the inversion (mapping the parameterized solution $\mathbf{\Phi}\mathbf{v}^k$ onto the feasible set), Layers 1-4 and 5-8 are mapped onto $\mathbf{\Omega}_\mathbf{p}$ independently. The neighborhood template for the $k$-NN algorithm has a size of $40\times40$, and $k$ is 3. The mapping results for Layers 1-4 and 5-8 are depicted in the second and fourth columns of Figure 5.20. Figure 5.21 shows the model response during the model calibration (Years 1-4) and prediction (Years 5-8) phases, which is consistent with the reference model.

Figure 5.21. Data match and model predictions for the waterflooding example.

5.5. Summary and Discussion

An important aspect of applying inverse modeling to physical systems is imposing feasibility constraints. In subsurface flow model calibration, ensuring that the calibrated models are geologically plausible can be a difficult constraint to enforce when the expected connectivity patterns are complex and hard to preserve during model updating. For instance, updating meandering fluvial channels to match the observed data, while preserving their shape and connectivity pattern, is very challenging and difficult to achieve using classical model calibration techniques. In this chapter, we employed machine learning techniques to develop a framework for automatic calibration of complex facies models while maintaining their complex connectivity patterns. Specifically, we defined a feasible set for model parameters that must be observed during model calibration. Using examples from fluvial systems, we described the feasible set with a large number of prior model realizations that summarize the expected spatial statistics of the solution. To implement the feasibility constraint, we formulated a regularized least-squares objective function in which the regularization imposes a complex feasibility constraint. Using the alternating directions method of optimization, we split the objective function into two sequential optimization sub-problems: in the first problem, a parameterized model calibration is solved to use dynamic flow data to infer the approximate connectivity patterns while partially honoring the feasibility constraint; in the second step, this solution is mapped onto the feasible set by employing the $k$-NN algorithm as a supervised machine learning technique.

The $k$-NN algorithm is used to learn the mapping operator using multi-dimensional feature and label vectors. Initially, the realizations of the feasible set, e.g., discrete fluvial channel maps, are projected onto a PCA parameterization space to obtain the corresponding continuous maps. A local neighborhood template is then used to scan and store the discrete and corresponding continuous maps in the training data. The resulting learning dataset is then used during the second stage of the inversion to implement the mapping onto the feasible set. Given the continuous solution from the first step of model calibration, the $k$-NN algorithm is used to scan the parameterized solution with a predefined template size to identify the $k$ best representative features in the training dataset. The corresponding discrete labels for those $k$ maps are stored as instances or samples belonging to the scanned locations.
After scanning is completed, the stored maps are processed using an aggregation method, where all the relevant samples for a given cell (including those obtained from overlapping templates) are used to characterize the conditional probability for every cell in the domain. Using the resulting conditional probability, a maximum likelihood approach is used to assign facies to each cell of the model. The aggregation step is very important in increasing the consistency of the results and in reducing the computational complexity of the method. Using a series of increasingly complex numerical experiments, we examined the performance of the proposed approach to highlight the importance of honoring complex prior models as a feasibility constraint and to demonstrate the performance of the machine learning approach in implementing this constraint.

Several aspects of the introduced method can be extended. For instance, instead of supervised learning, which was the core of the method introduced in this chapter, an unsupervised approach may be applicable. In this case, the learning process would depend only on the feature vectors, i.e., no output (label) vectors would be available for the learning. We used a simple definition of the feature vector, which was the group of cells covered by the template. However, a more sophisticated and efficient description of the feature vector may be considered, e.g., by including kernel functions. In this chapter, we used the proposed approach within a deterministic inverse modeling context, without focusing on probabilistic (Bayesian) model calibration techniques to quantify solution uncertainty. Probabilistic treatment of model calibration problems with complex connectivity patterns is not trivial, as the underlying spatial patterns are difficult to represent using simple probabilistic models. Machine learning techniques, such as the one presented in this chapter, can prove effective in preserving complex spatial connectivity patterns. The results presented in this chapter were based on synthetic models, albeit with complex connectivity. Application of the presented approach to laboratory and field data is needed to further explore the promise of this approach.

CHAPTER 6

PATTERN-BASED FEATURE LEARNING UNDER UNCERTAINTY IN GEOLOGIC SCENARIOS

Under uncertainty in the geologic scenario, subsurface inverse modeling problems can be formulated in two steps: (i) selection of consistent geologic scenario(s), and (ii) estimation of unknown rock properties and their spatial distribution from dynamic data. A major implication of this approach is the ability to use the available data to provide geologic insight that can be used to refine, limit or potentially correct initially uncertain geologic scenarios. New inversion frameworks are needed to exploit dynamic data to help geologists improve their geologic scenarios. In Chapter 3, we proposed a group-sparse formulation as an effective method to constrain geologic scenarios with flow-related data. As a result, uncertainty in the prior geologic scenario was considered during subsurface flow model calibration.
In that chapter, the $\ell_1/\ell_2$-norm was employed to search over the groups of parameterization spaces (corresponding to the geologic scenarios) and eliminate inconsistent groups that are not supported by the flow/transport data. In Chapter 5, we introduced a pattern-based model calibration formulation to reconstruct spatially complex facies distributions from dynamic flow response data. In our proposed method in Chapter 5, we imposed a feasibility constraint to ensure that the solution follows the expected higher-order spatial statistics proposed by a "single" geologic scenario. In that chapter, the feasibility constraint was implemented using a supervised machine learning algorithm.

In Chapter 5, we defined the feasibility constraint based on the assumption of a single geologic scenario. In other words, we did not consider uncertainty in geologic scenarios while developing the proposed pattern-based model calibration formulation. As a result, the obtained feasible solution was constrained to a single geologic scenario, and the flow/transport data were not employed to eliminate inconsistent geologic scenarios. As discussed in Chapter 3, the assumption of a single geologic scenario can be risky since it may bias the obtained solution or result in unwanted instabilities.

In this chapter, we extend the pattern-based model calibration formulation developed in Chapter 5 by considering uncertainty in prior geologic scenarios. To this end, several geologic scenarios, denoted by $\mathbf{\Omega}_\mathbf{p}^1,\ldots,\mathbf{\Omega}_\mathbf{p}^S$ (with $S$ the number of proposed scenarios), are proposed prior to model calibration. The SVD algorithm is applied to the model realizations corresponding to each geologic scenario, and a TSVD parameterization space is constructed to compactly represent the model realizations within each geologic scenario. Similar to Chapter 3, we adopt the $\ell_1/\ell_2$-norm to activate those parameterization spaces that are best supported by the flow/transport measurements. However, a feasibility constraint is adopted to impose the characteristics (e.g., discreteness and connectivity patterns) of the "refined" geologic scenarios on the inferred parameters. Here, the refined geologic scenarios are constructed by limiting the primary highly-uncertain geologic models to those active and relevant ones that are supported by the inversion data during the iterations of convergence.

Similar to Chapter 5, the optimization problem consists of parameterized and feasible versions of the solution, and it is divided into two sub-problems that iteratively converge to the parameterized and feasible solutions. The first subproblem searches for the best solution in the parameterization domain and surrounding the current feasible solution. In addition, in the first subproblem the $\ell_1/\ell_2$-norm is imposed to annihilate the contribution from inconsistent geologic scenarios. In the second step, a supervised pattern-based mapping (similar to Chapter 5) is applied to map the approximate parameterized solution from Step (i) onto the feasible set defined by the proposed geologic scenarios. The supervised learning is based on the $k$-NN algorithm and is implemented for local pattern learning within each geologic scenario.
6.1. Problem Statement and Formulation

We start with a least-squares model calibration formulation under uncertainty in geologic scenarios (the feasible set), i.e., $\mathbf{\Omega}_\mathbf{p} = \{\mathbf{\Omega}_\mathbf{p}^1,\ldots,\mathbf{\Omega}_\mathbf{p}^S\}$, as:

\[
\min_{\mathbf{u}} \left\| \mathbf{C}_{\boldsymbol{\epsilon}}^{-1/2} \left( \mathbf{d}_{\mathrm{obs}} - \mathbf{g}(\mathbf{u}) \right) \right\|_2^2 \quad \mathrm{s.t.} \quad \mathbf{u} \in \mathbf{\Omega}_\mathbf{p} = \{\mathbf{\Omega}_\mathbf{p}^1,\ldots,\mathbf{\Omega}_\mathbf{p}^S\} \tag{6.1}
\]

where $\mathbf{u} \in \mathbf{\Omega}_\mathbf{p} = \{\mathbf{\Omega}_\mathbf{p}^1,\ldots,\mathbf{\Omega}_\mathbf{p}^S\}$ ensures that the least-squares solution is restricted to the predefined (uncertain) feasible set. Here, each $\mathbf{\Omega}_\mathbf{p}^i$ represents a single geologic scenario that is proposed for conditioning the calibration process. Similar to Chapter 5, the constraint $\mathbf{u} \in \mathbf{\Omega}_\mathbf{p} = \{\mathbf{\Omega}_\mathbf{p}^1,\ldots,\mathbf{\Omega}_\mathbf{p}^S\}$ can be equivalently rewritten as:

\[
I(\mathbf{u},\mathbf{\Omega}_\mathbf{p}) =
\begin{cases}
0 & \mathrm{if}\ \mathbf{u} \in \mathbf{\Omega}_\mathbf{p} = \{\mathbf{\Omega}_\mathbf{p}^1,\ldots,\mathbf{\Omega}_\mathbf{p}^S\} \\
\infty & \mathrm{if}\ \mathbf{u} \notin \mathbf{\Omega}_\mathbf{p} = \{\mathbf{\Omega}_\mathbf{p}^1,\ldots,\mathbf{\Omega}_\mathbf{p}^S\}
\end{cases} \tag{6.2}
\]

which can be added to the objective function in Equation (6.1):

\[
\min_{\mathbf{u}} \left\| \mathbf{C}_{\boldsymbol{\epsilon}}^{-1/2} \left( \mathbf{d}_{\mathrm{obs}} - \mathbf{g}(\mathbf{u}) \right) \right\|_2^2 + I(\mathbf{u},\mathbf{\Omega}_\mathbf{p}) \tag{6.3}
\]

Similar to Chapter 3, we assume that each prior geologic scenario (denoted by $\mathbf{\Omega}_\mathbf{p}^{i=1,\ldots,S}$) is represented by an ensemble of its model realizations, i.e., $\mathbf{U}_1 = [\mathbf{u}_{11}\ \mathbf{u}_{12} \ldots \mathbf{u}_{1N}]$, $\mathbf{U}_2 = [\mathbf{u}_{21}\ \mathbf{u}_{22} \ldots \mathbf{u}_{2N}]$, ..., $\mathbf{U}_S = [\mathbf{u}_{S1}\ \mathbf{u}_{S2} \ldots \mathbf{u}_{SN}]$. Prior to model calibration, TSVD parameterization is applied to each $\mathbf{U}_i$ to form the parameterization basis functions $\mathbf{\Phi}_i$ (of dimension $n \times k_i$) that are used to compactly represent the model realizations in $\mathbf{\Omega}_\mathbf{p}^i$. Note that the truncation level $k_i$ can be different for these groups, depending on the complexity of the geologic scenarios and the desired approximation quality. After combining the SVD bases of the groups, the approximate representation of a model can be written in the hybrid basis $\mathbf{\Phi} = [\mathbf{\Phi}_1, \mathbf{\Phi}_2, \ldots, \mathbf{\Phi}_S] = [\mathbf{\Phi}_{n \times k_1}, \mathbf{\Phi}_{n \times k_2}, \ldots, \mathbf{\Phi}_{n \times k_S}]$ as:

\[
\mathbf{u} = \mathbf{\Phi}\mathbf{v} = [\mathbf{\Phi}_1\ \mathbf{\Phi}_2\ \ldots\ \mathbf{\Phi}_S][\mathbf{v}_1^T\ \mathbf{v}_2^T \ldots \mathbf{v}_S^T]^T = \sum_{i=1}^{S} \mathbf{\Phi}_i \mathbf{v}_i \tag{6.4}
\]

where $\mathbf{\Phi}$ denotes the hybrid parameterization basis, and $[\mathbf{v}_1^T\ \mathbf{v}_2^T \ldots \mathbf{v}_S^T]^T$ is a $(\sum_{i=1}^{S} k_i) \times 1$ vector that contains the expansion coefficients for representing $\mathbf{u}$. The vector $\mathbf{v}_{i,\,1 \le i \le S} = [v_{i1}, v_{i2}, \ldots, v_{ik_i}]^T$ contains the approximation coefficients for the basis elements within each group, i.e., $\mathbf{\Phi}_{i,\,1 \le i \le S}$. Clearly, the vector of coefficients, i.e., $[\mathbf{v}_1^T\ \mathbf{v}_2^T \ldots \mathbf{v}_S^T]$, has a group-sparse structure if a model $\mathbf{u}$ is representable by only a few of the geological scenarios. Hence, only the elements within a few of the groups, i.e., the $\mathbf{v}_i$'s, assume non-zero values in approximating a given model. Using $\hat{\mathbf{u}} = \sum_{i=1}^{S} \mathbf{\Phi}_i \mathbf{v}_i$ as the parameterized solution and the mixed $\ell_1/\ell_2$-norm for promoting group-sparsity in $[\mathbf{v}_1^T\ \mathbf{v}_2^T \ldots \mathbf{v}_S^T]^T$, the regularized objective function in Equation (6.3) can be reformulated as:

\[
\min_{\mathbf{u},\,\mathbf{v}=[\mathbf{v}_1,\ldots,\mathbf{v}_S]} \left\| \mathbf{C}_{\boldsymbol{\epsilon}}^{-1/2} \left( \mathbf{d}_{\mathrm{obs}} - \mathbf{g}\Big(\sum_{i=1}^{S}\mathbf{\Phi}_i\mathbf{v}_i\Big) \right) \right\|_2^2 + I(\mathbf{u},\mathbf{\Omega}_\mathbf{p}) + \lambda_1^2 \sum_{i=1}^{S} \|\mathbf{K}_i\mathbf{v}_i\|_2 \quad \mathrm{s.t.} \quad \Big\|\Big(\sum_{i=1}^{S}\mathbf{\Phi}_i\mathbf{v}_i\Big) - \mathbf{u}\Big\|_2^2 \le \epsilon^2 \tag{6.5a}
\]

or equivalently,

\[
\min_{\mathbf{u},\,\mathbf{v}=[\mathbf{v}_1,\ldots,\mathbf{v}_S]} \left\| \mathbf{C}_{\boldsymbol{\epsilon}}^{-1/2} \left( \mathbf{d}_{\mathrm{obs}} - \mathbf{g}\Big(\sum_{i=1}^{S}\mathbf{\Phi}_i\mathbf{v}_i\Big) \right) \right\|_2^2 + I(\mathbf{u},\mathbf{\Omega}_\mathbf{p}) + \lambda_1^2 \sum_{i=1}^{S} \|\mathbf{K}_i\mathbf{v}_i\|_2 + \lambda_2 \Big\|\Big(\sum_{i=1}^{S}\mathbf{\Phi}_i\mathbf{v}_i\Big) - \mathbf{u}\Big\|_2^2 \tag{6.5b}
\]

where the constraint $\|(\sum_{i=1}^{S}\mathbf{\Phi}_i\mathbf{v}_i) - \mathbf{u}\|_2^2 \le \epsilon^2$ is adopted to ensure that the parameterized and feasible solutions, i.e., $\sum_{i=1}^{S}\mathbf{\Phi}_i\mathbf{v}_i$ and $\mathbf{u}$, respectively, remain close to each other. Furthermore, the group-sparsity regularization term, i.e., $\lambda_1^2 \sum_{i=1}^{S} \|\mathbf{K}_i\mathbf{v}_i\|_2$, is incorporated to force the model calibration process to select the relevant geological scenarios (represented through the $\mathbf{\Phi}_i$'s) that are best supported by the available measurements, i.e., $\mathbf{d}_{\mathrm{obs}}$; see [Golmohammadi and Jafarpour, 2016].
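The construction of the hybrid basis in Equation (6.4) can be sketched in a few lines. The 80% energy truncation rule below matches the criterion used in the numerical examples of this chapter, while the variable names and ensemble shapes are illustrative assumptions.

```python
import numpy as np

def tsvd_basis(U, energy=0.8):
    """TSVD basis for one scenario ensemble U of shape (n_cells, N)."""
    Phi, s, _ = np.linalg.svd(U, full_matrices=False)
    k = np.searchsorted(np.cumsum(s**2) / np.sum(s**2), energy) + 1
    return Phi[:, :k], s[:k]

def hybrid_basis(ensembles, energy=0.8):
    """Concatenate per-scenario TSVD bases into Phi = [Phi_1 ... Phi_S]."""
    bases = [tsvd_basis(U, energy) for U in ensembles]
    Phi = np.hstack([b[0] for b in bases])
    groups = np.repeat(np.arange(len(bases)), [b[0].shape[1] for b in bases])
    return Phi, groups   # groups[j] tells which scenario column j belongs to

def group_norms(v, groups):
    """l2-norm of the coefficients within each group, i.e., ||v_i||_2."""
    return np.array([np.linalg.norm(v[groups == g]) for g in np.unique(groups)])
```

With Phi and groups in hand, the group-sparsity term in Equation (6.5b) is simply the (weighted) sum of the per-group norms returned by group_norms.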
In the SVD representation, the singular vectors within each group are ordered based on the magnitude of the singular values. The inverses of the singular values, i.e., the diagonal elements of $(\mathbf{\Lambda}\mathbf{\Lambda}^T)^{-1}$, provide a reasonable choice for the weight matrix $\mathbf{K}_i^T\mathbf{K}_i$. With this choice, during inversion the leading elements of the SVD basis (those with larger singular values) are penalized less harshly (see Chapter 3 for a detailed discussion). In Equation (6.5), $\mathbf{u}$ in $\mathbf{g}(\mathbf{u})$ (see Equation (6.3)) is replaced by $\sum_{i=1}^{S}\mathbf{\Phi}_i\mathbf{v}_i$ to (i) decrease the number of unknowns, and (ii) provide flexibility, in the presence of $I(\mathbf{u},\mathbf{\Omega}_\mathbf{p})$, to reduce the optimization problem (Equation (6.5b)) into practical sub-problems (see the next section for details). In fact, without this replacement, $\mathbf{g}(\mathbf{u})$ and $I(\mathbf{u},\mathbf{\Omega}_\mathbf{p})$ would have to be evaluated concurrently, which is cumbersome and impractical due to the mathematically inconvenient definition of $I(\mathbf{u},\mathbf{\Omega}_\mathbf{p})$. The next section describes a two-step algorithm, based on the method of alternating directions (similar to Chapters 4 and 5), for solving the optimization problem in Equation (6.5b).

6.2. Optimization Approach

To arrive at a practical approach for solving the optimization problem in Equation (6.5b), we adopt the alternating directions algorithm (similar to Chapters 4 and 5) to break the original optimization problem into two iterative sub-problems. The first optimization sub-problem updates the parameterized solution $\mathbf{v}$ using a conventional model calibration approach, while the second step maps the parameterized solution onto the closest feasible solution $\mathbf{u} \in \mathbf{\Omega}_\mathbf{p}$. These two sub-problems are summarized as:

Step (i):
\[
\mathbf{v}_1^k,\ldots,\mathbf{v}_S^k = \arg\min_{\mathbf{v}_1,\ldots,\mathbf{v}_S} \left\| \mathbf{C}_{\boldsymbol{\epsilon}}^{-1/2} \left( \mathbf{d}_{\mathrm{obs}} - \mathbf{g}\Big(\sum_{i=1}^{S}\mathbf{\Phi}_i\mathbf{v}_i\Big) \right) \right\|_2^2 + I(\mathbf{u}^{k-1},\mathbf{\Omega}_\mathbf{p}) + \lambda_1^2 \sum_{i=1}^{S}\|\mathbf{K}_i\mathbf{v}_i\|_2 + \lambda_2 \Big\|\Big(\sum_{i=1}^{S}\mathbf{\Phi}_i\mathbf{v}_i\Big) - \mathbf{u}^{k-1}\Big\|_2^2 \tag{6.6a}
\]

Step (ii):
\[
\mathbf{u}^k = \arg\min_{\mathbf{u}} \left\| \mathbf{C}_{\boldsymbol{\epsilon}}^{-1/2} \left( \mathbf{d}_{\mathrm{obs}} - \mathbf{g}\Big(\sum_{i=1}^{S}\mathbf{\Phi}_i\mathbf{v}_i^k\Big) \right) \right\|_2^2 + I(\mathbf{u},\mathbf{\Omega}_\mathbf{p}) + \lambda_1^2 \sum_{i=1}^{S}\|\mathbf{K}_i\mathbf{v}_i^k\|_2 + \lambda_2 \Big\|\Big(\sum_{i=1}^{S}\mathbf{\Phi}_i\mathbf{v}_i^k\Big) - \mathbf{u}\Big\|_2^2 \tag{6.6b}
\]

which can be simplified to:

Step (i):
\[
\mathbf{v}_1^k,\ldots,\mathbf{v}_S^k = \arg\min_{\mathbf{v}_1,\ldots,\mathbf{v}_S} \left\| \mathbf{C}_{\boldsymbol{\epsilon}}^{-1/2} \left( \mathbf{d}_{\mathrm{obs}} - \mathbf{g}\Big(\sum_{i=1}^{S}\mathbf{\Phi}_i\mathbf{v}_i\Big) \right) \right\|_2^2 + \lambda_1^2 \sum_{i=1}^{S}\|\mathbf{K}_i\mathbf{v}_i\|_2 + \lambda_2 \Big\|\Big(\sum_{i=1}^{S}\mathbf{\Phi}_i\mathbf{v}_i\Big) - \mathbf{u}^{k-1}\Big\|_2^2 \tag{6.7a}
\]

Step (ii):
\[
\mathbf{u}^k = \arg\min_{\mathbf{u}}\ I(\mathbf{u},\mathbf{\Omega}_\mathbf{p}) + \lambda_2 \Big\|\Big(\sum_{i=1}^{S}\mathbf{\Phi}_i\mathbf{v}_i^k\Big) - \mathbf{u}\Big\|_2^2 \tag{6.7b}
\]

Similar to Chapter 5, standard gradient-based optimization techniques can be used to minimize the objective function in Equation (6.7a). For this purpose, $\mathbf{g}(\sum_{i=1}^{S}\mathbf{\Phi}_i\mathbf{v}_i)$ needs to be approximated by a linear function using its first-order Taylor expansion, which may be costly depending on the dimension and complexity of the forward problem, i.e., $\mathbf{g}(\cdot)$. In addition, as discussed in Chapter 3 and Appendix B, the gradient of $\sum_{i=1}^{S}\|\mathbf{K}_i\mathbf{v}_i\|_2$ is highly nonlinear, and several iterations, each approximating this gradient by a linear function, are needed to converge to the group-sparse solution at iteration $k$ of Equation (6.7a). Therefore, calculating $\mathbf{v}_1^k,\ldots,\mathbf{v}_S^k$ at iteration $k$ may require several linear approximations of $\mathbf{g}(\sum_{i=1}^{S}\mathbf{\Phi}_i\mathbf{v}_i)$. In the next section, we discuss an approximate method that can be applied to reduce the computational cost of solving this sub-problem. The second subproblem, i.e., Equation (6.7b), is solved by mapping the parameterized solution $\sum_{i=1}^{S}\mathbf{\Phi}_i\mathbf{v}_i^k$ onto the feasible set, i.e., $\mathbf{\Omega}_\mathbf{p} = \{\mathbf{\Omega}_\mathbf{p}^1,\ldots,\mathbf{\Omega}_\mathbf{p}^S\}$.
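Before detailing the mapping step, the overall alternating structure of Equation (6.7) can be summarized as follows; solve_step_i and map_step_ii are hypothetical stand-ins for the two sub-problem solvers described in this chapter, passed in as callables.

```python
import numpy as np

def alternating_directions(u0, Phi, solve_step_i, map_step_ii, n_iter=25, tol=1e-6):
    """Outer loop of Equation (6.7): alternate Step (i) and Step (ii).

    solve_step_i(u) -> v : Step (i) solver (data misfit + group-sparsity
                           + lambda2-coupling to the current feasible u)
    map_step_ii(up) -> u : Step (ii) mapping of up = Phi @ v onto the
                           feasible set (pattern-learning mapping)
    """
    u, v = u0.copy(), None
    for k in range(n_iter):
        v = solve_step_i(u)                       # Step (i)
        u_new = map_step_ii(Phi @ v)              # Step (ii)
        if np.linalg.norm(u_new - u) ** 2 < tol:  # simple convergence check
            return u_new, v
        u = u_new
    return u, v
```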
Similar to the previous chapter, we use our pattern-based learning algorithm (see Chapter 5) to first replace the local patterns in $\sum_{i=1}^{S}\mathbf{\Phi}_i\mathbf{v}_i^k$ with feasible ones from the learning dataset. Then, an aggregation over the overlapping local patterns is applied to assign the feasible values to the entries of $\mathbf{u}$. One approach for mapping $\sum_{i=1}^{S}\mathbf{\Phi}_i\mathbf{v}_i^k$ onto $\mathbf{\Omega}_\mathbf{p} = \{\mathbf{\Omega}_\mathbf{p}^1,\ldots,\mathbf{\Omega}_\mathbf{p}^S\}$ is to include all the prior geologic scenarios, i.e., $\mathbf{\Omega}_\mathbf{p}^1,\ldots,\mathbf{\Omega}_\mathbf{p}^S$, in generating the local learning dataset, i.e., $\mathcal{D}_L=\{(\mathbf{x}_i,\mathbf{y}_i): i=1,\ldots,N_L\}$ (see Section 5.2 for further discussion). However, this approach does not take advantage of the first step of the solution, in which the important groups are identified. Therefore, in this case the learning dataset may become highly diverse, as it is constructed by considering all the geologic scenarios. Consequently, irrelevant feature vectors can be returned by the $k$-NN classifier. We implement an alternative approach in which the feasible set at each iteration is defined by limiting the geologic scenarios to those that are selected by the group-sparsity formulation in the first step. In this approach, irrelevant geologic scenarios are gradually eliminated as the iterations proceed. In the next section, we discuss the effect of refining the geologic scenarios on the mapping step in detail. Similar to Chapter 5, simple convergence criteria, such as a threshold on $\|\mathbf{u}^k - \mathbf{u}^{k-1}\|_2^2$, $\|\mathbf{v}^k - \mathbf{v}^{k-1}\|_2^2$ or the objective function value, can be used to stop the iterations in Equation (6.7).

In the two-step alternating directions approach of Equation (6.7), Step (i) updates $\mathbf{v}_1,\ldots,\mathbf{v}_S$ by minimizing the data mismatch while keeping the parameterized approximation error, i.e., $\lambda_2 \|(\sum_{i=1}^{S}\mathbf{\Phi}_i\mathbf{v}_i) - \mathbf{u}^{k-1}\|_2^2$, small. This regularization term defines a closed neighborhood surrounding $\mathbf{u}^{k-1}$, which is the estimate obtained from the feasible set in the most recent iteration. Consequently, the updated parameter in Step (i), i.e., $\hat{\mathbf{u}}^k = \sum_{i=1}^{S}\mathbf{\Phi}_i\mathbf{v}_i^k$, remains close to $\mathbf{u}^{k-1}$ while seeking a new solution to improve the data match. In fact, the regularization term $\lambda_2 \|(\sum_{i=1}^{S}\mathbf{\Phi}_i\mathbf{v}_i) - \mathbf{u}\|_2^2$ provides the gradual transition from the feasible set to the parameterization space and vice versa, and it is the major difference between the formulation developed here and methods that find the solution in the parameterization domain and then use a mapping, e.g., thresholding, to filter the obtained solution.

6.3. Exploring the Proposed Algorithm

In this section, we discuss the properties of the parameterized and feasible solutions, derived from Equations (6.7a) and (6.7b), based on a 2D pumping test example. The configuration of the pumping test and the reference hydraulic conductivity map are depicted in Figures 6.1(a) and 6.1(b), respectively. Figure 6.1(e) shows the reference pressure field and the locations of the pumping and monitoring wells. As depicted in this figure, 25 uniformly distributed monitoring wells measure the steady-state pressure values, and these pressure values, as well as the hydraulic conductivity samples at the locations of the monitoring wells, are the sources of data for inversion. In this example, the dimension of the domain is $1500\times1500\times20\,\mathrm{m}^3$, and it is divided into $100\times100\times1$ cells.
The pumping well, located in the middle of the domain, extracts water at a rate of $60\,\mathrm{m^3/h}$. The top and bottom boundaries of the model have no-flow boundary conditions, while the boundary conditions on the left and the right sides of the domain are constant pressure heads of 20 m and 10 m, respectively. The logarithms of the hydraulic conductivity in the reference model, depicted in Figure 6.1(b), for the two facies types are -2.3 and 0.5 log(m/s), respectively.

Figure 6.1. Experimental setup for the constant rate pumping test: (a) pumping and monitoring well configuration; (b) reference discrete hydraulic conductivity map; (c) best PCA approximation of the reference model using the basis functions of the true geologic scenario; (d) the initial model of hydraulic conductivity in the inversion; (e)-(f) the pressure maps corresponding to the reference and initial hydraulic conductivity models.

Three different training images, as illustrated in Figure 6.2(a), are provided to model the distinct connectivity patterns in the geologic scenarios. In addition, the direction of continuity is another source of uncertainty in proposing the geologic scenarios. As a result, 10 different geologic scenarios that differ in the type and direction of connectivity are proposed as prior knowledge. For each geologic scenario, i.e., $\mathbf{\Omega}_\mathbf{p}^{i=1:10}$, 500 unconditional model realizations are sampled from the corresponding training image. In Figure 6.2(b), four samples (out of 500) are illustrated for each geologic scenario. Furthermore, Figure 6.2(c) shows the PCA parameterization basis functions, i.e., $\mathbf{\Phi}_{i=1:10}$, that are generated to compactly represent the realizations within each geologic scenario. Also, the best achievable solution through these parameterization spaces is depicted in Figure 6.1(c).

Figure 6.2. (a) Three TIs with meandering (left), intersecting (middle), and straight (right) channel features; (b) four sample realizations simulated from each TI. The TI with the meandering channel is considered with two rotation angles ($\theta = 0°$ and $\theta = 90°$), while the other two TIs each lead to four alternative scenarios with rotation angles $\theta = 0°, 45°, 90°, 135°$; (c) sample TSVD basis elements corresponding to the different geologic scenarios, i.e., $\mathbf{\Phi} = [\mathbf{\Phi}_1\ \mathbf{\Phi}_2 \ldots \mathbf{\Phi}_{10}]$; and (d) initial coefficients, $\mathbf{v}^0$.

The initial hydraulic conductivity map ($\mathbf{u}^0$), which is taken to be the mean of all the samples in the 10 geologic scenarios, as well as the corresponding initial pressure map, are depicted in Figures 6.1(d) and 6.1(f), respectively. Furthermore, $\mathbf{v}^0$, which is shown in Figure 6.2(d), is obtained by equally projecting $\mathbf{u}^0$ on all 10 parameterization spaces. At iteration $k$ of Equation (6.7a), the regularization term $\lambda_1^2 \sum_{i=1}^{S}\|\mathbf{K}_i\mathbf{v}_i\|_2$ encourages the coefficients of the parameterization, i.e., $\mathbf{v}^k = [\mathbf{v}_1^k,\ldots,\mathbf{v}_S^k]$, towards a block-sparse solution. However, as discussed in Chapters 2-3 and Appendix B, several iterative linear approximations of the derivative of $\lambda_1^2 \sum_{i=1}^{S}\|\mathbf{K}_i\mathbf{v}_i\|_2$, and consequently of $\mathbf{g}(\sum_{i=1}^{S}\mathbf{\Phi}_i\mathbf{v}_i)$, are needed to obtain the exact block-sparse solution of Equation (6.7a) (see Appendix B). This can be computationally cumbersome and impractical in high-dimensional and nonlinear problems.
To overcome this issue in solving the optimization sub-problem in Equation (6.7a), we start from $\mathbf{v}^{k-1} = [\mathbf{v}_1^{k-1},\ldots,\mathbf{v}_S^{k-1}]$, linearize $\mathbf{g}(\sum_{i=1}^{S}\mathbf{\Phi}_i\mathbf{v}_i)$ and the derivative of $\lambda_1^2 \sum_{i=1}^{S}\|\mathbf{K}_i\mathbf{v}_i\|_2$ around $\mathbf{v}^{k-1}$, and obtain $\mathbf{v}^k = [\mathbf{v}_1^k,\ldots,\mathbf{v}_S^k]$ in a single iteration. As a result, a group-sparse solution may not be achieved at each iteration, especially in the early iterations. However, the inversion results, which are presented next, suggest that the optimal solution has the group-sparse property and smoothly converges to a block-sparse solution. Under this assumption, $\mathbf{v}^k$ is obtained by solving:

Step (i):
\[
\mathbf{v}^k = \arg\min_{\mathbf{v}} \left\| \mathbf{C}_{\boldsymbol{\epsilon}}^{-1/2} \Big( \mathbf{d}_{\mathrm{obs}} - \big( \mathbf{g}(\mathbf{\Phi}\mathbf{v}^{k-1}) + \mathbf{G}_\mathbf{v}(\mathbf{v}-\mathbf{v}^{k-1}) \big) \Big) \right\|_2^2 + \frac{\lambda_1^2}{2}\,\mathbf{v}^T\mathbf{W}^k\mathbf{v} + \lambda_2 \Big\|\Big(\sum_{i=1}^{S}\mathbf{\Phi}_i\mathbf{v}_i\Big) - \mathbf{u}^{k-1}\Big\|_2^2 \tag{6.8}
\]

As discussed in the previous chapters, $\mathbf{G}_\mathbf{v}$ contains the derivatives of $\mathbf{g}(\cdot)$ with respect to $\mathbf{v}$, calculated at $\mathbf{v}^{k-1}$. Furthermore, $\mathbf{W}^k$ is a diagonal matrix that approximates the derivative of $\lambda_1^2 \sum_{i=1}^{S}\|\mathbf{K}_i\mathbf{v}_i\|_2$ at iteration $k$ with a linear function, i.e., $\nabla\big(\sum_{i=1}^{S}\|\mathbf{K}_i\mathbf{v}_i\|_2\big) \approx \mathbf{W}^k\mathbf{v}$ (see Appendix B for the derivation).

The inversion results for Equations (6.7a) and (6.7b), i.e., the parameterized ($\mathbf{\Phi}\mathbf{v}$) and feasible ($\mathbf{u}$) solutions, are shown for 25 iterations of convergence in Figures 6.3(a) and 6.3(b), respectively.

Figure 6.3. Model calibration results for the pumping test using the proposed algorithm: the parameterized solution $\mathbf{\Phi}\mathbf{v}$ (a) and feasible solution $\mathbf{u}$ (b) are shown for 25 iterations (the results are shown for iterations 0-9, 12, 15, 18, 22 and 25); (c) final coefficients $\mathbf{v}^{25}$; and (d) the pressure obtained from the final feasible solution $\mathbf{u}^{25}$.

In this example, we use the samples of all the geological scenarios, i.e., $\mathbf{\Omega}_\mathbf{p} = \{\mathbf{\Omega}_\mathbf{p}^1,\ldots,\mathbf{\Omega}_\mathbf{p}^{10}\}$, to generate the learning dataset for the mapping step (see Chapter 5 and the discussion of the mapping stage). Overall, 2000 local feature/label pairs with a template size of $40\times40$ are generated to construct the learning dataset, and $k$ is set to 5 in the $k$-NN classification. The inversion results (Figures 6.3(a) and 6.3(b)) indicate that the algorithm tends to capture the global connectivity patterns in the initial iterations and refines the local connectivity structures in later iterations. This observation is in agreement with the results obtained in Chapter 5. As discussed in Chapter 5, the algorithm refines the local connectivity patterns by capturing the multiple-point statistics that are learned from the geological scenarios in the mapping step, i.e., Equation (6.7b). Figure 6.3(c) depicts the coefficients of the parameterization in the last iteration, i.e., $\mathbf{v}^{25}$, and the pressure map obtained from the optimal feasible solution, i.e., $\mathbf{u}^{25}$, is shown in Figure 6.3(d). Figure 6.4 illustrates the $\ell_2$-norm of the coefficients of each group during the 25 iterations of convergence. Initially, the parameterization coefficients corresponding to all geologic scenarios, i.e., $\mathbf{v}_1,\ldots,\mathbf{v}_{10}$, have the same $\ell_2$-norm. As the iterations proceed, the first geologic scenario, which consists of connectivity patterns similar to the reference hydraulic conductivity map, becomes the dominant group, and the contribution of the other groups diminishes smoothly.
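For a linearized forward operator, the update in Equation (6.8) reduces to a single regularized normal-equation solve. The following minimal sketch shows one such update with unit weights $\mathbf{K}_i$ and data assumed pre-whitened by $\mathbf{C}_{\boldsymbol{\epsilon}}^{-1/2}$; all names are illustrative assumptions.

```python
import numpy as np

def step_i_update(v_prev, u_prev, d_obs, g_prev, Gv, Phi, groups,
                  lam1=1.0, lam2=1.0, eps=1e-8):
    """One linearized Step (i) update (Equation (6.8)).

    Assumes unit K_i weights and data pre-whitened by C_eps^{-1/2};
    g_prev = g(Phi @ v_prev), Gv = Jacobian of g(.) w.r.t. v at v_prev.
    """
    # diagonal IRLS weights W^k: coefficients of group i get 1/(||v_i|| + eps)
    gnorm = np.array([np.linalg.norm(v_prev[groups == g]) + eps
                      for g in np.unique(groups)])
    W = np.diag(1.0 / gnorm[groups])
    # residual of the linearized misfit: d_obs - g_prev + Gv @ v_prev
    r = d_obs - g_prev + Gv @ v_prev
    # normal equations of the three quadratic terms in Equation (6.8)
    A = Gv.T @ Gv + 0.5 * lam1**2 * W + lam2 * Phi.T @ Phi
    b = Gv.T @ r + lam2 * Phi.T @ u_prev
    return np.linalg.solve(A, b)
```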
Figure 6.4. Selected geologic scenarios based on their contribution to representing the parameterized solution: here, similar to Chapter 3, the $\ell_2$-norm of the coefficients within each group, i.e., $\|\mathbf{v}_i\|_2$, is calculated as the measure of activeness of each geologic scenario. The top and middle rows correspond to iterations 0-9, and the bottom row corresponds to iterations 12, 15, 18, 22 and 25.

Mapping onto the Refined Feasible Set: In the example discussed in Section 6.3, we used the entire set of geological scenarios, i.e., $\mathbf{\Omega}_\mathbf{p} = \{\mathbf{\Omega}_\mathbf{p}^1,\ldots,\mathbf{\Omega}_\mathbf{p}^S\}$, to generate the learning dataset for the mapping stage, i.e., $\mathcal{D}_L=\{(\mathbf{x}_i,\mathbf{y}_i): i=1,\ldots,N_L\}$ (see Chapter 5). This can reduce the efficiency and accuracy of the feasible solution obtained in the mapping stage, i.e., Equation (6.7b), since the learning dataset consists of highly diverse learning samples. Next, we discuss how the selection property of the regularization term $\lambda_1^2 \sum_{i=1}^{S}\|\mathbf{K}_i\mathbf{v}_i\|_2$ can be used to select the relevant geological scenarios and refine the learning dataset at each iteration of Equation (6.7). As demonstrated in Figure 6.4, the group-sparsity regularization $\lambda_1^2 \sum_{i=1}^{S}\|\mathbf{K}_i\mathbf{v}_i\|_2$ smoothly promotes a block-sparse structure in $\mathbf{v} = [\mathbf{v}_1,\ldots,\mathbf{v}_S]$ over the iterations of Equation (6.7). As discussed in Chapter 3, the active groups of coefficients in $\mathbf{v} = [\mathbf{v}_1,\ldots,\mathbf{v}_S]$ contribute more to representing the parameterized solution, i.e., $\mathbf{\Phi}\mathbf{v}$, and have larger $\ell_2$-norms. Therefore, the selected geologic scenarios, i.e., those with larger $\ell_2$-norms in $\mathbf{v} = [\mathbf{v}_1,\ldots,\mathbf{v}_S]$, can be used to refine the initially diverse learning dataset generated from $\mathbf{\Omega}_\mathbf{p} = \{\mathbf{\Omega}_\mathbf{p}^1,\ldots,\mathbf{\Omega}_\mathbf{p}^S\}$.

Figure 6.5(a) shows the $\ell_2$-norms of the coefficients, i.e., $\|\mathbf{v}_1\|_2,\ldots,\|\mathbf{v}_{10}\|_2$, for iterations 1, 15 and 25 of the example discussed in Section 6.3. As shown in this figure, a threshold on the $\|\mathbf{v}_1\|_2,\ldots,\|\mathbf{v}_{10}\|_2$ values can be applied to determine the active groups that have more significant contributions to representing the parameterized solution, i.e., $\mathbf{\Phi}\mathbf{v}$, at each iteration. Therefore, at iteration $k$ of Equation (6.7b), the relevant geologic scenarios can be limited to those with $\|\mathbf{v}_i^k\|_2$ values that exceed a specified threshold. After refining the geologic scenarios to the selected ones, i.e., $\mathbf{\Omega}_\mathbf{p}^{\mathrm{refined}}$, a learning dataset can be constructed from these refined geologic scenarios for the mapping step. Figures 6.5(b) and 6.5(c) demonstrate this with an example. In Figure 6.5(b), a $40\times40$ local template is located inside the $\mathbf{\Phi}\mathbf{v}$ domain (the parameterized solution at iteration 3 of the example in Section 6.3), and the pixel values inside the template are extracted to construct the feature vector, i.e., $\mathbf{x}$. In Figure 6.5(c), the selected features and their corresponding label vectors ($k=10$ in the $k$-NN classifier) are shown for two cases in which the learning dataset is constructed based on (i) $\mathbf{\Omega}_\mathbf{p} = \{\mathbf{\Omega}_\mathbf{p}^1,\ldots,\mathbf{\Omega}_\mathbf{p}^{10}\}$ (left), and (ii) the refined geologic scenarios, i.e., $\mathbf{\Omega}_\mathbf{p}^{\mathrm{refined}}$ (right). As is clear in this figure, although the feature vectors in the two cases are similarly close to $\mathbf{x}$, i.e., the feature vector shown in Figure 6.5(b), the corresponding label vectors on the right are more representative and accurate compared to those on the left.
Figure 6.5. Effect of refining the learning dataset on the mapping step: (a) selected geologic scenarios based on their contribution to the representation of the parameterized solution; (b) the parameterized solution at iteration 3 and a sample feature vector constructed from the depicted spatial template; and (c) selected features and their corresponding label vectors based on the (left) entire ($\mathbf{\Omega}_\mathbf{p}$) and (right) refined ($\mathbf{\Omega}_\mathbf{p}^{\mathrm{refined}}$) learning dataset.

This is important in the aggregation step, since more representative label vectors increase the accuracy of the empirical conditional probabilities and, hence, the mapping results. Figure 6.6 shows the empirical probability maps and mapping results for these two cases. As is clear in this figure, the empirical probability map and the obtained feasible solution ($\mathbf{u}^3$) are more accurate when the geologic scenarios, and consequently the learning dataset, are limited to the selected ones, i.e., $\mathbf{\Omega}_\mathbf{p}^{\mathrm{refined}}$. Next, we briefly discuss the effect of the regularization terms on solving the optimization sub-problem in Equation (6.7a).

Figure 6.6. The result of mapping and the obtained empirical probability maps for the two cases in which the learning dataset is constructed based on (i) $\mathbf{\Omega}_\mathbf{p} = \{\mathbf{\Omega}_\mathbf{p}^1,\ldots,\mathbf{\Omega}_\mathbf{p}^{10}\}$ (top), and (ii) the refined geologic scenarios $\mathbf{\Omega}_\mathbf{p}^{\mathrm{refined}}$ (bottom).

Effect of Regularization Terms: The two regularization coefficients in Equation (6.7), i.e., $\lambda_1^2$ and $\lambda_2$, control the characteristics of the parameterized and feasible solutions. We first note that a sensitivity analysis needs to be performed to properly adjust the values of these two regularization coefficients. Our observations suggest that the parameterized solution in Equation (6.7a) does not show significant sensitivity to the values of these two coefficients within a proper range. However, when either of these two parameters is grossly over- or under-estimated, the updated solution may become unacceptable. We first discuss the effect of $\lambda_1^2$ in Equation (6.7a), which controls the impact of the regularization term $\sum_{i=1}^{S}\|\mathbf{K}_i\mathbf{v}_i\|_2$, i.e., promoting the block-sparse property in $\mathbf{v} = [\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_S]$. When $\lambda_1^2$ is underestimated, the regularization term $\sum_{i=1}^{S}\|\mathbf{K}_i\mathbf{v}_i\|_2$ becomes less important in Equation (6.7a). The first consequence is that the block-sparse property may not be promoted in $\mathbf{v} = [\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_S]$, and geologic scenario selection may fail. Furthermore, in this case the ill-posedness may increase, since the parameterization space $\mathbf{\Phi}$ consists of highly diverse subspaces, i.e., $\mathbf{\Phi}_1,\ldots,\mathbf{\Phi}_S$. Therefore, without a proper regularization term, the updates of $\mathbf{v}$ may become unacceptable. On the other hand, when $\lambda_1^2$ is overestimated, the regularization term $\sum_{i=1}^{S}\|\mathbf{K}_i\mathbf{v}_i\|_2$ dominates the solution and may result in values close to zero. The coefficient $\lambda_2$ regularizes the constraint $\|(\sum_{i=1}^{S}\mathbf{\Phi}_i\mathbf{v}_i) - \mathbf{u}\|_2^2 \le \epsilon^2$, used in Equation (6.5a) to ensure that the parameterized and feasible solutions remain close to each other. Underestimating this regularization coefficient results in ignoring the feasibility constraint and reduces the problem statement to the group-sparse formulation discussed in Chapter 3. Hence, in this case the feasible solution does not constrain the parameterized solution, and the two stages of Equation (6.7) act independently from each other.
On the other hand, overestimating $\lambda_2$ results in ignoring the data mismatch and group-sparsity terms. In this case, the updated parameterized solution at iteration $k$ of Equation (6.7a) will be very close to the projection of $\mathbf{u}^{k-1}$ onto the parameterization basis functions. Therefore, by overestimating $\lambda_2$, the two stages of the optimization problem in Equation (6.7) may not help each other, and the algorithm may get stuck at a non-convergent pair $(\mathbf{v},\mathbf{u})$ and fail to reach the optimal point.

6.4. Numerical Results

Here, we present two sets of numerical examples to explore the performance of the inversion algorithm discussed in the previous section. The first example is based on a straight-ray travel time tomographic inversion, which leads to a linear inverse problem. The second experiment involves a large-scale groundwater pumping test in which the facies represent the hydraulic conductivity of the field. In this case, steady-state pressure head data are used to reconstruct the spatial distribution of lithofacies that represent the hydraulic conductivity maps.

Figure 6.7. Tomographic inversion setup: (a) configuration of the tomographic inversion; (b) reference slowness map; (c) best achievable solution through the parameterization; and (d) initial starting model $\mathbf{\Phi}\mathbf{v}^0$.

Travel-Time Tomographic Inversion: In travel-time tomographic inversion, the slowness (1/velocity) map of a subsurface environment is inferred from observations of acoustic wave travel times on straight paths (see Section 1.1 for the forward equations). Figure 6.7(a) shows the configuration of this example with a set of transmitter (left) and receiver (right) arrays. A $1000\times1000\,\mathrm{m}^2$ domain is discretized into $100\times100$ cells of dimensions $10\times10\,\mathrm{m}^2$. In this example, 6 transmitters and 6 receivers are used, resulting in 36 arrival time observations. The reference slowness map in this case is shown in Figure 6.7(b). The low (blue regions) and high (red regions) slowness values are log(10) and log(600) ms/m, respectively. The prior TIs and 4 realizations (out of 500) of each geologic scenario are depicted in Figures 6.2(a) and 6.2(b), respectively. As is clear in Figure 6.2(b), besides the connectivity patterns, the direction of continuity is considered uncertain, resulting in 10 different geologic scenarios, i.e., $\mathbf{\Omega}_\mathbf{p} = \{\mathbf{\Omega}_\mathbf{p}^1,\ldots,\mathbf{\Omega}_\mathbf{p}^{10}\}$. Similar to Chapter 3, the TSVD parameterization is designed to preserve 80% of the energy within each geologic scenario. As a result, $\mathbf{\Phi}$ consists of 466 basis functions, i.e., $\mathbf{\Phi} = [\mathbf{\Phi}_1\ \mathbf{\Phi}_2 \ldots \mathbf{\Phi}_{10}] \in \mathbf{R}^{10000\times466}$. The best achievable solution using the parameterizing basis functions and the initial model ($\mathbf{\Phi}\mathbf{v}^0$) are depicted in Figures 6.7(c) and 6.7(d), respectively. In the mapping step (Equation (6.7b)), we use a square template of size $40\times40$ to generate the local feature vectors, and $k$ is set to 5 in the $k$-NN classification. At each iteration, the selected geologic scenarios, and consequently the learning dataset, are limited to those whose coefficient $\ell_2$-norms, i.e., $\|\mathbf{v}_{i=1:10}\|_2$, fall above a threshold of 10% of $\max(\|\mathbf{v}_{i=1:10}\|_2)$.

Figure 6.8. Tomographic inversion example: (a) parameterized ($\mathbf{\Phi}\mathbf{v}^k$) and (b) feasible ($\mathbf{u}^k$) solutions at iterations $k = 0$-$9$.
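The 10%-of-max selection rule used above amounts to a one-line comparison of the group norms; a minimal sketch with illustrative names follows.

```python
import numpy as np

def select_scenarios(v, groups, frac=0.1):
    """Return indices of active geologic scenarios at the current iteration.

    A scenario i is kept when ||v_i||_2 exceeds `frac` times the largest
    group norm; the learning dataset for the mapping step is then rebuilt
    from the realizations of the selected scenarios only.
    """
    norms = np.array([np.linalg.norm(v[groups == g]) for g in np.unique(groups)])
    return np.flatnonzero(norms > frac * norms.max())

# example: with 10 scenarios, only the supported ones survive the threshold
# active = select_scenarios(v_k, groups)   # e.g., the indices of two scenarios
```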
The reference model in this experiment has connectivity patterns similar to those in the fifth and ninth geologic scenarios, i.e., $\mathbf{\Omega}_\mathbf{p}^5$ and $\mathbf{\Omega}_\mathbf{p}^9$ in Figure 6.2(b). The inversion results for the parameterized ($\mathbf{\Phi}\mathbf{v}$) and feasible ($\mathbf{u}$) solutions are shown in Figures 6.8(a) and 6.8(b), respectively, for iterations 0-9. In addition, Figures 6.9(a) and 6.9(b) depict the coefficients of the parameterization ($\mathbf{v} = [\mathbf{v}_1\ \mathbf{v}_2 \ldots \mathbf{v}_{10}]$) and the $\|\mathbf{v}_{i=1:10}\|_2$ values, respectively, for iterations 0, 1, 2, 7 and 9. Figure 6.8 illustrates that the inversion algorithm is able to retrieve the complex meandering structure of the reference slowness model during the iterations of convergence. Similar to the examples in Chapter 5 and Section 6.3, we observe that the method captures the global connectivity patterns in the initial iterations and enhances the local connectivity in the later iterations. In addition, Figure 6.9 demonstrates the efficiency of the algorithm in selecting the relevant geologic scenarios and eliminating the irrelevant ones. As is clear in this figure, initially all the geologic scenarios have the same contribution to representing the parameterized solution. However, during the iterations, the geologic scenarios $\mathbf{\Omega}_\mathbf{p}^5$ and $\mathbf{\Omega}_\mathbf{p}^9$ become dominant, and the other groups lose their contribution to representing the parameterized solution.

Figure 6.9. (a) Coefficients of the parameterization, i.e., $\mathbf{v}$, at iterations 0, 1, 2, 7 and 9 (from top to bottom); (b) $\ell_2$-norms of the coefficients within the 10 geologic scenarios during the same iterations.

Large-Scale 2D Pumping Test: We now focus on single-phase flow in a groundwater aquifer. A pumping-test experiment, using a high-dimensional 2D model, is considered to explore the inversion method discussed in this chapter. The experimental setup for this case is depicted in Figure 6.10(a). The aquifer has dimensions $3500\,\mathrm{m}\times3500\,\mathrm{m}\times10\,\mathrm{m}$ and is discretized into a domain with $200\times200\times1$ uniform grid cells. The reference hydraulic conductivity field, as well as the locations of the pumping and monitoring wells, are shown in Figure 6.10(b). In the reference hydraulic conductivity map, the low and high log-HC values are -2.4 (background) and 0.6 (channel), respectively. The initial hydraulic conductivity ($\mathbf{\Phi}\mathbf{v}^0$), and the reference and initial pressure head fields, are shown in Figures 6.10(e), 6.10(c) and 6.10(f), respectively.

Figure 6.10. Experimental setup for the constant rate pumping test: (a) pumping and monitoring well configuration; (b) reference hydraulic conductivity map as well as locations of the pumping (green stars) and monitoring (black dots) wells; (c) pressure map corresponding to the reference hydraulic conductivity map; (d) best achievable solution using the parameterizing basis functions; (e) the initial model of hydraulic conductivity in the inversion; and (f) the pressure map corresponding to the initial hydraulic conductivity model.

Figure 6.11. (a) The proposed training image to model the connectivity patterns; and (b) 4 geologic scenarios initiated based on uncertainty in the direction of continuity.
In this example, 4 pumping wells (see Figure 6.10(b)) are located at the cells with coordinates (50,50), (50,150), (150,50) and (150,150) of the model domain and extract water at a constant flowrate of $0.0578\,\mathrm{m^3/s}$ under steady-state aquifer conditions. The boundary conditions on the left and right of the domain are assigned constant pressure heads of $p_1 = 20$ m and $p_2 = 10$ m, respectively, resulting in a background flow from left to right. The governing equations of the forward model are derived from Darcy's law and the mass balance principle for single-phase flow in a heterogeneous and saturated porous environment (see Section 1.1). The training image depicted in Figure 6.11(a) models the connectivity patterns in this example. At the same time, the direction of continuity is considered to be the uncertain variable ($\theta = 0°, 45°, 90°, 135°$) in developing the geologic scenarios. With these assumptions, a total of 4 geologic scenarios are constructed; 4 realizations of each (out of 1000) are shown in Figure 6.11(b). The TSVD parameterization is used to retain 80% of the energy for each group, resulting in $\mathbf{\Phi} = [\mathbf{\Phi}_1,\ldots,\mathbf{\Phi}_4] \in \mathbf{R}^{40000\times400}$. The best hydraulic conductivity map that can be achieved in the subspace defined by these basis functions is displayed in Figure 6.10(d). Pressure and hydraulic conductivity samples from the 49 monitoring wells, which are uniformly located around the pumping wells (see Figure 6.10(b)), are used for inversion.

For this experiment, as is clear in Figure 6.11(b), the geologic scenario $\mathbf{\Omega}_\mathbf{p}^3$ best represents the connectivity patterns and direction of continuity in the reference map (Figure 6.10(b)). Figures 6.12(a) and 6.12(b) show the parameterized ($\mathbf{\Phi}\mathbf{v}$) and feasible ($\mathbf{u}$) solutions obtained at iterations 1-5, 10, 15, 25, 30 and 40 of convergence. Similar to the previous examples in this chapter, we observe that the global connectivity structures are reconstructed in the initial iterations of convergence, and the later iterations promote the local connectivity patterns by learning the multiple-point statistics provided by the refined geologic scenarios, i.e., $\mathbf{\Omega}_\mathbf{p}^{\mathrm{refined}}$. In this example, the learning dataset contains 4000 local feature/label pairs of size $80\times80$, $k$ is set to 5, and 10% of the cells are randomly picked for local template matching (the $k$-NN step).

Figure 6.12. Inversion results at iterations 1-5, 10, 15, 25, 30 and 40 of convergence: (a) parameterized solution ($\mathbf{\Phi}\mathbf{v}^k$); and (b) feasible solution ($\mathbf{u}^k$).

Figure 6.13. Coefficients of the parameterization, i.e., $\mathbf{v} = [\mathbf{v}_1\ \mathbf{v}_2\ \mathbf{v}_3\ \mathbf{v}_4]$: (a) $\mathbf{v}^0$ and (b) $\mathbf{v}^{40}$; (c) $\ell_2$-norms of the coefficients within the 4 geologic scenarios during iterations 1-5, 10, 15, 25, 30 and 40.

Figures 6.13(a) and 6.13(b) display the initial and final coefficients, i.e., $\mathbf{v}^0$ and $\mathbf{v}^{40}$, respectively. In addition, Figure 6.13(c) depicts the $\ell_2$-norms of the coefficients corresponding to the geologic scenarios at iterations 1-5, 10, 15, 25, 30 and 40. As is clear in this figure, initially all 4 geologic scenarios have the same contribution to representing the parameterized solution.
However, during the iterations the geologic scenario 𝛀 𝐩 @ , which is more representative of the reference hydraulic conductivity map, becomes dominant, and other groups lose their contribution in repressing the parameterized solution. 6.5. Summary and Discussion Accurate prediction of the flow and transport processes in subsurface formations hinges on reliable knowledge about the distribution of rock flow properties such as permeability, hydraulic conductivity and porosity. While extensive data acquisitions are conducted to generate predictive reservoir models, due to inconvenient and costly measurements, significant uncertainty remains not only about the local distribution of these properties, but also about the global geologic continuity model, such as a variogram or training image. These uncertainties complicate subsurface model development and raise valid concerns about dynamic performance predictions. Monitoring and production data are typically used to calibrate initial models to reproduce field observations, a process known as inverse problem. Even after incorporating dynamic response data, a considerable portion of the initial uncertainty is expected to persist, especially at early stages of field development when available data is limited. Traditionally, the focus of the subsurface inverse problems have been on updating initial models while honouring a single model of geologic continuity (e.g., variogram or training image). However, geologic continuity models are derived from limited data and subjective Chapter 6: Pattern-Based Feature Learning under Uncertainty in Geologic Scenarios 187 interpretations/assumptions and, hence, carry significant uncertainty. Neglecting this source of uncertainty during inversion can lead to significant bias in the calibrated models and their corresponding performance predictions. Furthermore, fixing the geologic continuity models during inversion eliminates the opportunity to use available data to update such models and to remove potentially large errors in specifying the global continuity patterns in the property distribution. In this chapter, we developed a novel formulation for identifying geologic continuity model(s) that are consistent with inversion data from multiple candidate scenarios. The developed approach is inspired by group-sparsity regularization (Chapter 3) and the concept of pattern-based mapping onto feasible sets (Chapter 5). Similar to Chapter 3, the presented formulation combines compressive transforms, here TSVD, for block-sparse parameterization of spatially correlated random fields. In addition, the group-sparsity regularization is applied to promote geologic scenarios that are better able to reproduce the observed data. The parameterization is used to compactly represent the dominant features of each candidate geologic model while the group- sparse regularization serves as a selection operator to pick consistent geologic scenarios. In addition, similar to Chapter 5 a feasibility constraint is considered to promote the geologic connectivity patterns that are suggested by the geologic scenarios. The inversion approach, i.e., the optimization problem, developed in this chapter consists of parameterized and feasible versions of solution. Similar to Chapter 5, the parameterized solution is obtained on the parameterizing basis functions, which decompose the geologic scenarios into their compact representations, and the feasible solution is obtained from the feasible set. 
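To make this alternating scheme concrete, the sketch below implements a toy, self-contained instance of it: a random linear forward model stands in for the flow simulator, a simple two-facies thresholding projection stands in for the learned k-NN pattern mapping, and the group-sparsity term is handled with IRLS weights (cf. Appendix B). All names, sizes and simplifications are illustrative assumptions, not the actual implementation:

```python
import numpy as np

rng = np.random.default_rng(0)
n, m, p, kb = 400, 120, 4, 20            # cells, data, groups, bases per group
Phi = rng.standard_normal((n, p * kb))   # stand-in block basis [Phi_1..Phi_p]
G = rng.standard_normal((m, n))          # toy linear forward model
u_true = (rng.random(n) < 0.3).astype(float)
d_obs = G @ u_true + 0.01 * rng.standard_normal(m)

def map_to_feasible(x):
    """Toy feasibility mapping: project onto two facies values {0, 1};
    the dissertation uses a learned k-NN pattern mapping instead."""
    return (x > 0.5).astype(float)

def update_v(u, v, lam2=0.1, gamma=1.0, eps=1e-6):
    """Sub-problem 1: data misfit + group-sparsity (IRLS weights)
    + least-squares closeness to the current feasible solution u."""
    w = np.repeat(1.0 / (np.linalg.norm(v.reshape(p, kb), axis=1) + eps), kb)
    A = (G @ Phi).T @ (G @ Phi) + gamma * Phi.T @ Phi + lam2 * np.diag(w)
    b = (G @ Phi).T @ d_obs + gamma * Phi.T @ u
    return np.linalg.solve(A, b)

v = np.zeros(p * kb)
u = map_to_feasible(Phi @ v)
for it in range(15):
    v = update_v(u, v)                   # parameterized solution
    u = map_to_feasible(Phi @ v)         # feasible solution (sub-problem 2)
group_norms = np.linalg.norm(v.reshape(p, kb), axis=1)  # scenario relevance
```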
CHAPTER 7

SUMMARY, CONCLUSIONS AND FUTURE WORK

In this chapter, we first briefly summarize the main topics and research elements covered in this dissertation. Then, we outline the main conclusions obtained in this work. Finally, we discuss future work and potential lines of research that follow from these conclusions or from tasks left incomplete in this dissertation.

7.1. Summary

In this dissertation, we focused on subsurface parameter estimation problems, specifically the inference of 2D or 3D rock properties, under (i) uncertainty in geologic scenarios and (ii) feasibility constraints. In our terminology, "geologic scenarios" are conceptual models designed to represent rock properties, developed under specific modeling assumptions, e.g., connectivity patterns or direction of continuity. In addition, "feasibility constraints" refer to the specific modeling assumptions, e.g., connectivity patterns, that are embedded within each geologic scenario.

Chapter 1 of the dissertation was an introduction to subsurface inverse problems; in that chapter we also introduced the general scope of the research in this work. In Chapter 2, we introduced group-sparsity regularization and discussed its model selection property. Furthermore, we explored (i) the tree structure of wavelet basis functions and (ii) the sparse-SVD algorithm for the purpose of learning group-sparse parameterization spaces. In Chapter 3, we introduced another application of group-sparsity regularization, for geologic scenario selection. In Chapter 4 of the thesis, we applied discrete regularization functions, i.e., potential functions, combined with sparsity-promoting dictionaries (developed with the k-SVD algorithm) to promote discreteness in the inference of discrete geologic facies. In Chapter 5, we extended the objective of facies reconstruction from preserving discreteness (the objective in Chapter 4) to promoting "feasibility constraints", e.g., connectivity patterns, provided by a single geologic scenario. In that chapter, feasibility constraints were promoted by a mapping operator from the parameterization domain onto the feasible set, and a supervised pattern-learning algorithm was introduced to learn and employ the mapping operator.
Chapter 6 was devoted to enhancing the objective of Chapter 3, i.e., geologic scenario selection, by imposing the additional constraints (feasibility constraints) that are provided by geologic scenarios. The current chapter contains the summary, conclusions and future work for this dissertation.

7.2. Conclusions

The key conclusions obtained as results of this work can be summarized as follows:

Chapter 2:
• The l1/l2-norm is a stronger regularization term for reconstructing block-sparse signals than regular sparsity-promoting penalties, e.g., the l1-norm. This property of the l1/l2-norm can be exploited to reduce the level of ill-posedness in reconstructing group-sparse signals. However, the grouping structure of block-sparse signals needs to be known prior to inversion.
• The concept of block-sparsity can be leveraged to learn or design proper parameterization domains that promote group-sparse properties. As a result, parameterization domains designed by incorporating group-sparse features can further decrease the ill-posedness of inverse problems.
• We explored the tree structure of wavelet basis functions to design a generic group-sparse parameterization domain. We applied group-sparse wavelet basis functions regularized with the l1/l2-norm in subsurface inverse problems, and we observed that the tree structure of wavelet basis functions can help to better preserve spatial connectivity trends through the parent-child relationship that is intrinsically embedded in their structure.
• We applied the sparse-SVD algorithm to a set of predefined basis functions to learn block-sparsity-promoting parameterization spaces. As a result, we observed that prior geologic model realizations can be used to restructure the basis functions (of a predefined parameterization domain) into another group-sparse domain that carries the information provided by a geologic scenario. As with the tree structure of wavelet basis functions, during inversion we applied a weighted version of the l1/l2-norm to promote group-sparsity on the learned groups of basis functions.

Chapter 3:
• We developed a systematic inversion framework for simultaneous subsurface parameter estimation and prior model selection under uncertainty in geologic scenarios.
• We defined a group-sparse parameterization space, in which each group consists of PCA basis functions that compactly represent the realizations of one geologic scenario. Further, we applied the l1/l2-norm to regularize the inverse problem during inversion. We observed that the l1/l2-norm is capable of selecting the parameterization spaces that best agree with the information available through the measured data used in inversion. Finally, the geologic scenarios corresponding to the active parameterization groups were considered relevant, and this criterion was applied to refine the initially uncertain geologic scenarios.
• Our results indicated that group-sparsity regularization is an effective method for eliminating the geologic scenarios that are not supported by the inversion data.

Chapter 4:
• We introduced a novel regularization term, based on the concept of potential functions, to promote pixel-level discreteness in subsurface inverse problems.
• We noticed that discrete regularization terms, e.g., potential functions, can help promote discreteness in the inference of discrete geologic facies models.
However, these types of regularization terms need to be applied in the presence of a proper parameterization space; otherwise, discreteness may be promoted locally while connectivity is undermined globally.
• Our results indicated that discrete regularization terms can further enhance the performance of sparsity-promoting dictionaries (learned through the k-SVD algorithm) in subsurface discrete image reconstruction problems.
• Here, the optimization problem consisted of "parameterized" and "feasible" solutions. To solve the optimization problem, we divided it into two conditional sub-problems based on the idea of the alternating directions algorithm. In the first step of the algorithm, a solution was obtained on the parameterization domain in the neighborhood of the most current discrete solution. Then, the discrete regularization was applied in the second sub-problem to assign discrete values to each cell and update the feasible solution.

Chapter 5:
• We developed a novel inversion framework to restrict the solution of subsurface inverse problems to a predefined "feasible set" corresponding to a single geologic scenario.
• The feasibility constraint was defined based on an indicator function, which takes the values 0 and ∞ for feasible and non-feasible solutions, respectively. Similar to Chapter 4, the optimization problem consisted of both "parameterized" and "feasible" solutions, and to solve it we divided it into two sub-problems based on the alternating directions algorithm. However, contrary to Chapter 4, the feasible solution was obtained through a mapping from the parameterization space onto the feasible set.
• We introduced a supervised pattern-learning algorithm to learn the mapping operator from the parameterization space onto the feasible set (a sketch of this mapping is given at the end of this section). Prior to learning the mapping operator, we generated a feature/label dataset, in which the features contained local non-feasible patterns while the label vectors carried the corresponding feasible values. In the mapping step, we first searched over the parameterized solution and used a k-NN classifier to find features from the learning dataset similar to those existing in the parameterized solution. Then, we stored the corresponding feasible local patterns as replacements for the existing ones in the parameterized solution. In the second step of the algorithm, we applied an aggregation to the overlapping label vectors and assigned the most probable feasible values to each grid block of the feasible solution.
In the two sub-problems of the original optimization problem, this least-squares constraint plays the role of a feedback operator from feasible set to parameterization space and vice versa. • The obtained results show that our supervised pattern-learning algorithm is capable of learning multiple point statists provided by a geologic scenario and projecting it onto the feasible solution in the mapping step. Our algorithm goes beyond reconstruction of discrete facies models (which are typically represented by training images), and it is can be applied to learn complex non-discrete patterns too, e.g., channelized models with variability and heterogeneity within each facies. v Chapter 6: • We used the group-sparsity regularization combined with the feasibility constraints for the purpose of geologic scenario selection. Chapter 7: Summary, Conclusions and Future Work 196 • We observed that performance of the formulation developed in Chapter 3 (for geologic scenario selection) enhances by incorporating the feasibility constraints that are imposed by geologic scenarios. • Similar to Chapter 3, we noticed that group-sparsity regularization promotes block- sparsity in the groups of a predefined block-sparse parameterization space. In addition, the feasibility constraints force the solution of our optimization problem towards additional properties that are embedded in geologic scenarios; those that are not captured by the parameterizing basis functions. Therefore, compared to Chapter 3 the formulation developed in Chapter 6 has a stronger performance in geologic scenario selection since, in addition to group-sparsity regularization, it incorporates feasibility constraints in this process. 7.3. Future Work There are many directions that could be pursued in areas relevant to the elements and objective of this dissertation. Our suggestions for future research are as follows: v Although we introduced systematic frameworks for geologic scenario selection in Chapters 3 and 6, our methodologies were based on deterministic approaches. A probabilistic method can further make the selection process of geologic scenarios interesting and more practical. v Our methodologies for geologic scenario selection provided candidate scenarios that could be combined with each other to globally represent the parameter. However, when more than one geologic scenario was selected our methods did not provide an insight about how each geologic scenario contributes locally in the parameter space. Inference of nonstationary fields may be a good example of this situation, in which the parameter may be relevant to dissimilar geologic Chapter 7: Summary, Conclusions and Future Work 197 scenarios in different spatial locations. An extension to our methodologies that can locally relate the parameter domain to geologic scenarios would a point of interest. v In this work, we developed inversion frameworks to incorporate measured data for reducing uncertainty in geologic scenarios. However, several types of uncertainty, e.g., boundary conditions, are typically uncertain in this process. A valuable and interesting line of research is to introduce systematic methodologies to reduce various types of uncertainty, other than geologic scenarios, that may be encountered in these problems. v In chapters 5 and 6, we introduced two inversion frameworks that relied on the idea of “feasible set”. However, our formulations were based on deterministic least-squares solution. 
7.3. Future Work

There are many directions that could be pursued in areas relevant to the elements and objectives of this dissertation. Our suggestions for future research are as follows:
• Although we introduced systematic frameworks for geologic scenario selection in Chapters 3 and 6, our methodologies were based on deterministic approaches. A probabilistic method could make the selection of geologic scenarios more interesting and more practical.
• Our methodologies for geologic scenario selection provided candidate scenarios that could be combined with each other to globally represent the parameter. However, when more than one geologic scenario was selected, our methods did not provide insight into how each geologic scenario contributes locally in the parameter space. Inference of nonstationary fields is a good example of this situation, in which the parameter may be related to dissimilar geologic scenarios at different spatial locations. An extension of our methodologies that can locally relate the parameter domain to geologic scenarios would be a point of interest.
• In this work, we developed inversion frameworks to incorporate measured data for reducing uncertainty in geologic scenarios. However, several other sources of uncertainty, e.g., boundary conditions, are typically present in this process. A valuable and interesting line of research would be to introduce systematic methodologies to reduce the various types of uncertainty, other than geologic scenarios, that may be encountered in these problems.
• In Chapters 5 and 6, we introduced two inversion frameworks that relied on the idea of a "feasible set". However, our formulations were based on a deterministic least-squares solution. A probabilistic approach that extends the formulation to account for uncertainty quantification under the feasibility constraints would be an interesting potential line of research.

NOMENCLATURE

d_obs : vector of observations (measurements)
g(.) : nonlinear forward model
u : parameter of interest
v : coefficients of parameterization
v^(k) : v at iteration k
u^(k) : u at iteration k
C_ε : covariance of the noise vector
G : linear forward model
G_u : Jacobian matrix with respect to u
G_v : Jacobian matrix with respect to v
K : permeability
K_h : hydraulic conductivity
U : matrix containing prior realizations in its columns
D_p(.) : discrete-level potential function
J(.) : regularization term
P : pressure
P_n : pressure of the non-wetting phase
P_w : pressure of the wetting phase
S_n : saturation of the non-wetting phase
S_w : saturation of the wetting phase
k : number of basis functions in the parameterization space
m : dimension of the measurements
n : dimension of the parameter of interest
prob(.) : probability
q : rate of production or injection
s(x) : slowness (inverse of velocity)
t : time
v(x) : velocity
v_i : coefficient of the i-th basis function (i-th entry in v)
u_i : i-th entry in u
x : spatial coordinates
∇ : gradient operator
ε : vector of observation noise
Φ : parameterization space
φ_i : i-th basis function
Ω_p : feasible set
Ω_p^refined : refined feasible set
φ(.) : kernel operator
λ² : regularization parameter
‖.‖_0 : number of non-zero entries
‖.‖_{1,2} : mixed l1/l2-norm

BIBLIOGRAPHY

• Aanonsen, S.I., Nævdal, G., Oliver, D.S., Reynolds, A.C. and Vallès, B., 2009. The ensemble Kalman filter in reservoir engineering - a review. SPE Journal, 14(03), pp.393-412.
• Aharon, M., Elad, M. and Bruckstein, A., 2006. K-SVD: An algorithm for designing overcomplete dictionaries for sparse representation. IEEE Transactions on Signal Processing, 54(11), pp.4311-4322.
• Ahmed, N., Natarajan, T. and Rao, K.R., 1974. Discrete cosine transform. IEEE Transactions on Computers, 100(1), pp.90-93.
• Akaike, H., 1974. A new look at the statistical model identification. IEEE Transactions on Automatic Control, 19(6), pp.716-723.
• Arpat, G.B., 2005. Sequential simulation with patterns. Stanford University.
• Arpat, G.B. and Caers, J., 2007. Conditional simulation with patterns. Mathematical Geology, 39(2), pp.177-203.
• Bach, F., Jenatton, R., Mairal, J. and Obozinski, G., 2012. Structured sparsity through convex optimization. Statistical Science, pp.450-468.
• Baraniuk, R.G., 2007. Compressive sensing [lecture notes]. IEEE Signal Processing Magazine, 24(4), pp.118-121.
• Batenburg, K.J. and Sijbers, J., 2011. DART: a practical reconstruction algorithm for discrete tomography. IEEE Transactions on Image Processing, 20(9), pp.2542-2553.
• Bhark, E.W., Jafarpour, B. and Datta-Gupta, A., 2011. A generalized grid connectivity-based parameterization for subsurface flow model calibration. Water Resources Research, 47(6).
• Blumensath, T. and Davies, M.E., 2009. Iterative hard thresholding for compressed sensing. Applied and Computational Harmonic Analysis, 27(3), pp.265-274.
• Boyd, S., Parikh, N., Chu, E., Peleato, B. and Eckstein, J., 2011. Distributed optimization and statistical learning via the alternating direction method of multipliers. Foundations and Trends in Machine Learning, 3(1), pp.1-122.
• Bracewell, R.N. and Bracewell, R.N., 1986. The Fourier transform and its applications (Vol. 31999). New York: McGraw-Hill.
• Bregman, N.D., Bailey, R.C.
and Chapman, C.H., 1989. Crosshole seismic tomography. Geophysics, 54(2), pp.200-215.
• Burgers, G., Jan van Leeuwen, P. and Evensen, G., 1998. Analysis scheme in the ensemble Kalman filter. Monthly Weather Review, 126(6), pp.1719-1724.
• Caers, J., 2003. History matching under training-image-based geological model constraints. SPE Journal, 8(03), pp.218-226.
• Caers, J. and Zhang, T., 2004. Multiple-point geostatistics: a quantitative vehicle for integrating geologic analogs into multiple reservoir models.
• Caers, J. and Hoffman, T., 2006. The probability perturbation method: a new look at Bayesian inverse modeling. Mathematical Geology, 38(1), pp.81-100.
• Candès, E.J., 2008. The restricted isometry property and its implications for compressed sensing. Comptes Rendus Mathematique, 346(9-10), pp.589-592.
• Candès, E.J. and Wakin, M.B., 2008. An introduction to compressive sampling. IEEE Signal Processing Magazine, 25(2), pp.21-30.
• Cardiff, M. and Kitanidis, P.K., 2009. Bayesian inversion for facies detection: An extensible level set framework. Water Resources Research, 45(10).
• Carrera, J. and Neuman, S.P., 1986. Estimation of aquifer parameters under transient and steady state conditions: 1. Maximum likelihood method incorporating prior information. Water Resources Research, 22(2), pp.199-210.
• Chapelle, O., Scholkopf, B. and Zien, A., 2009. Semi-supervised learning (Chapelle, O. et al., eds.; 2006) [book review]. IEEE Transactions on Neural Networks, 20(3), pp.542-542.
• Chen, S. and Doolen, G.D., 1998. Lattice Boltzmann method for fluid flows. Annual Review of Fluid Mechanics, 30(1), pp.329-364.
• Chen, S.S., Donoho, D.L. and Saunders, M.A., 2001. Atomic decomposition by basis pursuit. SIAM Review, 43(1), pp.129-159.
• Chen, Y. and Oliver, D.S., 2012. Multiscale parameterization with adaptive regularization for improved assimilation of nonlocal observation. Water Resources Research, 48(4).
• Chorin, A.J., 1968. Numerical solution of the Navier-Stokes equations. Mathematics of Computation, 22(104), pp.745-762.
• Comunian, A., Renard, P. and Straubhaar, J., 2012. 3D multiple-point statistics simulation using 2D training images. Computers & Geosciences, 40, pp.49-65.
• Constantin, P. and Foias, C., 1988. Navier-Stokes equations. University of Chicago Press.
• Cressie, N. and Hawkins, D.M., 1980. Robust estimation of the variogram: I. Journal of the International Association for Mathematical Geology, 12(2), pp.115-125.
• Deutsch, C.V. and Journel, A.G., 1992. Geostatistical software library and user's guide. New York, 119, p.147.
• Deutsch, C.V. and Wang, L., 1996. Hierarchical object-based stochastic modeling of fluvial reservoirs. Mathematical Geology, 28(7), pp.857-880.
• Donoho, D.L., 2006. Compressed sensing. IEEE Transactions on Information Theory, 52(4), pp.1289-1306.
• Efendiev, Y., Durlofsky, L.J. and Lee, S.H., 2000. Modeling of subgrid effects in coarse-scale simulations of transport in heterogeneous porous media. Water Resources Research, 36(8), pp.2031-2041.
• Elahi, S.H. and Jafarpour, B., 2017, April. A Distance Transform Method for History Matching of Discrete Geologic Facies Models. In SPE Western Regional Meeting. Society of Petroleum Engineers.
• Eldar, Y.C., Kuppinger, P. and Bolcskei, H., 2010. Block-sparse signals: Uncertainty relations and efficient recovery. IEEE Transactions on Signal Processing, 58(6), pp.3042-3054.
• Elsheikh, A.H., Wheeler, M.F. and Hoteit, I., 2013.
Sparse calibration of subsurface flow models using nonlinear orthogonal matching pursuit and an iterative stochastic ensemble method. Advances in Water Resources, 56, pp.14-26.
• Elsheikh, A.H., Hoteit, I. and Wheeler, M.F., 2014. Efficient Bayesian inference of subsurface flow models using nested sampling and sparse polynomial chaos surrogates. Computer Methods in Applied Mechanics and Engineering, 269, pp.515-537.
• Engl, H.W., Hanke, M. and Neubauer, A., 1996. Regularization of inverse problems (Vol. 375). Springer Science & Business Media.
• Evensen, G., 1994. Sequential data assimilation with a nonlinear quasi-geostrophic model using Monte Carlo methods to forecast error statistics. Journal of Geophysical Research: Oceans, 99(C5), pp.10143-10162.
• Evensen, G., 2003. The ensemble Kalman filter: Theoretical formulation and practical implementation. Ocean Dynamics, 53(4), pp.343-367.
• Fang, J., Shen, Y., Li, H. and Wang, P., 2015. Pattern-coupled sparse Bayesian learning for recovery of block-sparse signals. IEEE Transactions on Signal Processing, 63(2), pp.360-372.
• Feyen, L. and Caers, J., 2006. Quantifying geological uncertainty for flow and transport modeling in multi-modal heterogeneous formations. Advances in Water Resources, 29(6), pp.912-929.
• Foglia, L., Mehl, S.W., Hill, M.C. and Burlando, P., 2013. Evaluating model structure adequacy: The case of the Maggia Valley groundwater system, southern Switzerland. Water Resources Research, 49(1), pp.260-282.
• Gavalas, G.R., Shah, P.C. and Seinfeld, J.H., 1976. Reservoir history matching by Bayesian estimation. Society of Petroleum Engineers Journal, 16(06), pp.337-350.
• Gholami, A., 2015. Nonlinear multichannel impedance inversion by total-variation regularization. Geophysics, 80(5), pp.R217-R224.
• Golmohammadi, A., Khaninezhad, M.R.M. and Jafarpour, B., 2015. Group-sparsity regularization for ill-posed subsurface flow inverse problems. Water Resources Research, 51(10), pp.8607-8626.
• Golmohammadi, A. and Jafarpour, B., 2016. Simultaneous geologic scenario identification and flow model calibration with group-sparsity formulations. Advances in Water Resources, 92, pp.208-227.
• Golub, G.H., Heath, M. and Wahba, G., 1979. Generalized cross-validation as a method for choosing a good ridge parameter. Technometrics, 21(2), pp.215-223.
• Gómez-Hernández, J.J. and Journel, A.G., 1993. Joint sequential simulation of multigaussian fields. In Geostatistics Troia '92 (pp. 85-94). Springer, Dordrecht.
• Gómez-Hernández, J.J., Sahuquillo, A. and Capilla, J., 1997. Stochastic simulation of transmissivity fields conditional to both transmissivity and piezometric data - I. Theory. Journal of Hydrology, 203(1-4), pp.162-174.
• Grimstad, A.A., Mannseth, T., Nævdal, G. and Urkedal, H., 2003. Adaptive multiscale permeability estimation. Computational Geosciences, 7(1), pp.1-25.
• Guardiano, F.B. and Srivastava, R.M., 1993. Multivariate geostatistics: beyond bivariate moments. In Geostatistics Troia '92 (pp. 133-144). Springer, Dordrecht.
• Hansen, P.C., 1992. Analysis of discrete ill-posed problems by means of the L-curve. SIAM Review, 34(4), pp.561-580.
• Hemmecke, R., Köppe, M., Lee, J. and Weismantel, R., 2010. Nonlinear integer programming. In 50 Years of Integer Programming 1958-2008 (pp. 561-618). Springer, Berlin, Heidelberg.
• Herman, G.T. and Kuba, A. eds., 2012. Discrete tomography: Foundations, algorithms, and applications. Springer Science & Business Media.
• Hu, L.Y. and Chugunova, T., 2008.
Multiple-point geostatistics for modeling subsurface heterogeneity: A comprehensive review. Water Resources Research, 44(11).
• Hurvich, C.M. and Tsai, C.L., 1989. Regression and time series model selection in small samples. Biometrika, 76(2), pp.297-307.
• Jacquard, P., 1965. Permeability distribution from field pressure data. Society of Petroleum Engineers Journal, 5(04), pp.281-294.
• Jafarpour, B. and McLaughlin, D.B., 2008. History matching with an ensemble Kalman filter and discrete cosine parameterization. Computational Geosciences, 12(2), pp.227-244.
• Jafarpour, B. and McLaughlin, D.B., 2009. Reservoir characterization with the discrete cosine transform. SPE Journal, 14(01), pp.182-201.
• Jafarpour, B., 2011. Wavelet reconstruction of geologic facies from nonlinear dynamic flow measurements. IEEE Transactions on Geoscience and Remote Sensing, 49(5), pp.1520-1535.
• Jafarpour, B. and Tarrahi, M., 2011. Assessing the performance of the ensemble Kalman filter for subsurface flow data integration under variogram uncertainty. Water Resources Research, 47(5).
• Jafarpour, B. and Khodabakhshi, M., 2011. A probability conditioning method (PCM) for nonlinear flow data integration into multipoint statistical facies simulation. Mathematical Geosciences, 43(2), pp.133-164.
• Jain, A.K. and Dubes, R.C., 1988. Algorithms for clustering data.
• Jenatton, R., Obozinski, G. and Bach, F., 2010, March. Structured sparse principal component analysis. In Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics (pp. 366-373).
• Jolliffe, I.T., 1986. Principal component analysis and factor analysis. In Principal Component Analysis (pp. 115-128). Springer, New York, NY.
• Kandel, E.R., Schwartz, J.H. and Jessell, T.M. eds., 2000. Principles of neural science (Vol. 4, pp. 1227-1246). New York: McGraw-Hill.
• Kashyap, R.L., 1982. Optimal choice of AR and MA parts in autoregressive moving average models. IEEE Transactions on Pattern Analysis and Machine Intelligence, (2), pp.99-104.
• Khaninezhad, M.M., Jafarpour, B. and Li, L., 2012. Sparse geologic dictionaries for subsurface flow model calibration: Part I. Inversion formulation. Advances in Water Resources, 39, pp.106-121.
• Khaninezhad, M. and Jafarpour, B., 2013, February. Bayesian history matching and uncertainty quantification under sparse priors: a randomized maximum likelihood approach. In SPE Reservoir Simulation Symposium. Society of Petroleum Engineers.
• Khaninezhad, M.M. and Jafarpour, B., 2014. Prior model identification during subsurface flow data integration with adaptive sparse representation techniques. Computational Geosciences, 18(1), pp.3-16.
• Khaninezhad, M.R. and Jafarpour, B., 2017, February. A Discrete Imaging Formulation for History Matching Complex Geologic Facies. In SPE Reservoir Simulation Conference. Society of Petroleum Engineers.
• Khodabakhshi, M. and Jafarpour, B., 2011, January. Multipoint statistical characterization of geologic facies from dynamic data and uncertain training images. In SPE Reservoir Characterisation and Simulation Conference and Exhibition. Society of Petroleum Engineers.
• Khodabakhshi, M. and Jafarpour, B., 2013. A Bayesian mixture-modeling approach for flow-conditioned multiple-point statistical facies simulation from uncertain training images. Water Resources Research, 49(1), pp.328-342.
• Kitanidis, P.K., 1995. Quasi-linear geostatistical theory for inversing. Water Resources Research, 31(10), pp.2411-2419.
• Kitanidis, P.K., 1997.
Introduction to geostatistics: applications in hydrogeology. Cambridge University Press.
• Koltermann, C.E. and Gorelick, S.M., 1996. Heterogeneity in sedimentary deposits: A review of structure-imitating, process-imitating, and descriptive approaches. Water Resources Research, 32(9), pp.2617-2658.
• Kotsiantis, S.B., Zaharakis, I. and Pintelas, P., 2007. Supervised machine learning: A review of classification techniques. Emerging Artificial Intelligence Applications in Computer Engineering, 160, pp.3-24.
• Kotz, S., Kozubowski, T. and Podgorski, K., 2012. The Laplace distribution and generalizations: a revisit with applications to communications, economics, engineering, and finance. Springer Science & Business Media.
• Lafferty, J., McCallum, A. and Pereira, F.C., 2001. Conditional random fields: Probabilistic models for segmenting and labeling sequence data.
• Larose, D.T., 2005. k-nearest neighbor algorithm. Discovering Knowledge in Data: An Introduction to Data Mining, pp.90-106.
• Lee, J. and Kitanidis, P.K., 2013. Bayesian inversion with total variation prior for discrete geologic structure identification. Water Resources Research, 49(11), pp.7658-7669.
• Li, L. and Jafarpour, B., 2010. Effective solution of nonlinear subsurface flow inverse problems in sparse bases. Inverse Problems, 26(10), p.105016.
• Li, L. and Jafarpour, B., 2010. A sparse Bayesian framework for conditioning uncertain geologic models to nonlinear flow measurements. Advances in Water Resources, 33(9), pp.1024-1042.
• Liu, Y., 2006. Using the Snesim program for multiple-point statistical simulation. Computers & Geosciences, 32(10), pp.1544-1563.
• Liu, X. and Kitanidis, P.K., 2011. Large-scale inverse modeling with an application in hydraulic tomography. Water Resources Research, 47(2).
• Liu, E. and Jafarpour, B., 2013. Learning sparse geologic dictionaries from low-rank representations of facies connectivity for flow model calibration. Water Resources Research, 49(10), pp.7088-7101.
• Lochbühler, T., Vrugt, J.A., Sadegh, M. and Linde, N., 2015. Summary statistics from training images as prior information in probabilistic inversion. Geophysical Journal International, 201(1), pp.157-171.
• Lopez, S., 2003. Modélisation de réservoirs chenalisés méandriformes: une approche génétique et stochastique (Doctoral dissertation, École Nationale Supérieure des Mines de Paris).
• Lukić, T., 2011, May. Discrete tomography reconstruction based on the multi-well potential. In International Workshop on Combinatorial Image Analysis (pp. 335-345). Springer, Berlin, Heidelberg.
• Mallat, S.G., 1989. A theory for multiresolution signal decomposition: the wavelet representation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 11(7), pp.674-693.
• Mariethoz, G. and Caers, J., 2014. Multiple-point geostatistics: stochastic modeling with training images. John Wiley & Sons.
• Marvasti, F., Azghani, M., Imani, P., Pakrouh, P., Heydari, S.J., Golmohammadi, A., Kazerouni, A. and Khalili, M.M., 2012, April. Sparse signal processing using iterative method with adaptive thresholding (IMAT). In Telecommunications (ICT), 2012 19th International Conference on (pp. 1-6). IEEE.
• McLaughlin, D. and Townley, L.R., 1996. A reassessment of the groundwater inverse problem. Water Resources Research, 32(5), pp.1131-1161.
• Michael, H.A., Li, H., Boucher, A., Sun, T., Caers, J. and Gorelick, S.M., 2010. Combining geologic-process models and geostatistics for conditional simulation of 3-D subsurface heterogeneity.
Water Resources Research, 46(5).
• Mohimani, H., Babaie-Zadeh, M. and Jutten, C., 2009. A fast approach for overcomplete sparse decomposition based on smoothed ℓ0 norm. IEEE Transactions on Signal Processing, 57(1), pp.289-301.
• Mueller, J.L. and Siltanen, S., 2012. Linear and nonlinear inverse problems with practical applications (Vol. 10). SIAM.
• Murray, C.D. and Dermott, S.F., 1999. Solar system dynamics. Cambridge University Press.
• Nasrabadi, N.M., 2007. Pattern recognition and machine learning. Journal of Electronic Imaging, 16(4), p.049901.
• Needell, D. and Tropp, J.A., 2009. CoSaMP: Iterative signal recovery from incomplete and inaccurate samples. Applied and Computational Harmonic Analysis, 26(3), pp.301-321.
• Oliver, D.S., He, N. and Reynolds, A.C., 1996, September. Conditioning permeability fields to pressure data. In ECMOR V - 5th European Conference on the Mathematics of Oil Recovery.
• Oliver, D.S., Cunha, L.B. and Reynolds, A.C., 1997. Markov chain Monte Carlo methods for conditioning a permeability field to pressure data. Mathematical Geology, 29(1), pp.61-91.
• Oliver, D.S., Reynolds, A.C. and Liu, N., 2008. Inverse theory for petroleum reservoir characterization and history matching. Cambridge University Press.
• Oliver, D.S. and Chen, Y., 2011. Recent progress on reservoir history matching: a review. Computational Geosciences, 15(1), pp.185-221.
• Park, H., Scheidt, C., Fenwick, D., Boucher, A. and Caers, J., 2013. History matching and uncertainty quantification of facies models with multiple geological interpretations. Computational Geosciences, 17(4), pp.609-621.
• Patankar, S., 1980. Numerical heat transfer and fluid flow. CRC Press.
• Peterson, A.F., Ray, S.L., Mittra, R. and Institute of Electrical and Electronics Engineers, 1998. Computational methods for electromagnetics (Vol. 2). New York: IEEE Press.
• Pyrcz, M.J. and Deutsch, C.V., 2014. Geostatistical reservoir modeling. Oxford University Press.
• Resmerita, E., 2005. Regularization of ill-posed problems in Banach spaces: convergence rates. Inverse Problems, 21(4), p.1303.
• Renard, P. and Allard, D., 2013. Connectivity metrics for subsurface flow and transport. Advances in Water Resources, 51, pp.168-196.
• Ripley, B.D., 2007. Pattern recognition and neural networks. Cambridge University Press.
• Riva, M., Panzeri, M., Guadagnini, A. and Neuman, S.P., 2011. Role of model selection criteria in geostatistical inverse estimation of statistical data- and model-parameters. Water Resources Research, 47(7).
• Rousset, M.A.H. and Durlofsky, L.J., 2014, September. Optimization-based framework for geological scenario determination using parameterized training images. In ECMOR XIV - 14th European Conference on the Mathematics of Oil Recovery.
• Rousset, M.A.H., 2015. Geological Scenario Determination Using Parameterized Training Images in a Bayesian Framework (Doctoral dissertation, Stanford University).
• Rudin, L.I., Osher, S. and Fatemi, E., 1992. Nonlinear total variation based noise removal algorithms. Physica D: Nonlinear Phenomena, 60(1-4), pp.259-268.
• Sarma, P., Durlofsky, L.J., Aziz, K. and Chen, W.H., 2007, January. A new approach to automatic history matching using kernel PCA. In SPE Reservoir Simulation Symposium. Society of Petroleum Engineers.
• Sarma, P., Durlofsky, L.J. and Aziz, K., 2008. Kernel principal component analysis for efficient, differentiable parameterization of multipoint geostatistics. Mathematical Geosciences, 40(1), pp.3-32.
• Schölkopf, B. and Smola, A.J., 2002.
Learning with kernels: support vector machines, regularization, optimization, and beyond. MIT Press.
• Schüle, T., Schnörr, C., Weber, S. and Hornegger, J., 2005. Discrete tomography by convex-concave regularization and DC programming. Discrete Applied Mathematics, 151(1-3), pp.229-243.
• Schwarz, G., 1978. Estimating the dimension of a model. The Annals of Statistics, 6(2), pp.461-464.
• Shen, H. and Huang, J.Z., 2008. Sparse principal component analysis via regularized low rank matrix approximation. Journal of Multivariate Analysis, 99(6), pp.1015-1034.
• Shirangi, M.G. and Durlofsky, L.J., 2015. Closed-loop field development under uncertainty by use of optimization with sample validation. SPE Journal, 20(05), pp.908-922.
• Snieder, R., 1998. The role of nonlinearity in inverse problems. Inverse Problems, 14(3), p.387.
• Sutton, R.S. and Barto, A.G., 1998. Reinforcement learning: An introduction (Vol. 1, No. 1). Cambridge: MIT Press.
• Stojnic, M., Parvaresh, F. and Hassibi, B., 2009. On the reconstruction of block-sparse signals with an optimal number of measurements. IEEE Transactions on Signal Processing, 57(8), pp.3075-3085.
• Strebelle, S., 2002. Conditional simulation of complex geological structures using multiple-point statistics. Mathematical Geology, 34(1), pp.1-21.
• Suzuki, S. and Caers, J., 2008. A distance-based prior model parameterization for constraining solutions of spatial inverse problems. Mathematical Geosciences, 40(4), pp.445-469.
• Talukder, K.H. and Harada, K., 2010. Haar wavelet based approach for image compression and quality assessment of compressed image. arXiv preprint arXiv:1010.4084.
• Tarantola, A. and Valette, B., 1982. Generalized nonlinear inverse problems solved using the least squares criterion. Reviews of Geophysics, 20(2), pp.219-232.
• Tarantola, A., 2005. Inverse problem theory and methods for model parameter estimation (Vol. 89). SIAM.
• Tikhonov, A.N. and Arsenin, V.Y., 1974. Methods of solving incorrect problems.
• Tropp, J.A. and Gilbert, A.C., 2007. Signal recovery from random measurements via orthogonal matching pursuit. IEEE Transactions on Information Theory, 53(12), pp.4655-4666.
• Vo, H.X. and Durlofsky, L.J., 2014. A new differentiable parameterization based on principal component analysis for the low-dimensional representation of complex geological models. Mathematical Geosciences, 46(7), pp.775-813.
• Vo, H.X. and Durlofsky, L.J., 2015. Data assimilation and uncertainty assessment for complex geological models using a new PCA-based parameterization. Computational Geosciences, 19(4), pp.747-767.
• Vo, H.X. and Durlofsky, L.J., 2016. Regularized kernel PCA for the efficient parameterization of complex geological models. Journal of Computational Physics, 322, pp.859-881.
• Vogel, C.R., 2002. Computational methods for inverse problems (Vol. 23). SIAM.
• Vrugt, J.A., Stauffer, P.H., Wöhling, T., Robinson, B.A. and Vesselinov, V.V., 2008. Inverse Modeling of Subsurface Flow and Transport Properties: A Review with New Developments. Vadose Zone Journal, 7(2), pp.843-864.
• Witten, I.H., Frank, E., Hall, M.A. and Pal, C.J., 2016. Data Mining: Practical machine learning tools and techniques. Morgan Kaufmann.
• Xie, J., 2012. Applications of level set and fast marching methods in reservoir characterization. Texas A&M University.
• Ye, M., Meyer, P.D. and Neuman, S.P., 2008. On model selection criteria in multimodel analysis. Water Resources Research, 44(3).
• Yeh, W.W.G., 1986.
Review of parameter identification procedures in groundwater hydrology: The inverse problem. Water Resources Research, 22(2), pp.95-108.
• Zhang, Y., Brady, M. and Smith, S., 2001. Segmentation of brain MR images through a hidden Markov random field model and the expectation-maximization algorithm. IEEE Transactions on Medical Imaging, 20(1), pp.45-57.
• Zhang, T., Switzer, P. and Journel, A., 2006. Filter-based classification of training image patterns for spatial simulation. Mathematical Geology, 38(1), pp.63-80.
• Zhang, Z. and Rao, B.D., 2013. Extension of SBL algorithms for the recovery of block sparse signals with intra-block correlation. IEEE Transactions on Signal Processing, 61(8), pp.2009-2015.
• Zhou, H., Gómez-Hernández, J.J. and Li, L., 2012. A pattern-search-based inverse method. Water Resources Research, 48(3).
• Zhou, H., Gómez-Hernández, J.J. and Li, L., 2014. Inverse methods in hydrogeology: Evolution and recent trends. Advances in Water Resources, 63, pp.22-37.
• Zimmerman, D.A., Marsily, G.D., Gotway, C.A., Marietta, M.G., Axness, C.L., Beauheim, R.L., Bras, R.L., Carrera, J., Dagan, G., Davies, P.B. and Gallegos, D.P., 1998. A comparison of seven geostatistically based inverse approaches to estimate transmissivities for modeling advective transport by groundwater flow. Water Resources Research, 34(6), pp.1373-1413.

APPENDICES

Appendix A. IRLS Algorithm for l1-Regularized Problems

We use the IRLS algorithm to solve the l1-norm regularized least-squares minimization problem:

$$\min_{\mathbf{v}} \;\|\mathbf{d}_{\mathrm{obs}} - \mathbf{g}(\boldsymbol{\Phi}\mathbf{v})\|_2^2 + \lambda^2 \|\mathbf{v}\|_1 \tag{A1}$$

At iteration $n$ of the IRLS algorithm, the l1-norm is approximated using a weighted l2-norm as follows:

$$\min_{\mathbf{v}^{(n)}} \;\big\|\mathbf{d}_{\mathrm{obs}} - \mathbf{g}\big(\boldsymbol{\Phi}\mathbf{v}^{(n)}\big)\big\|_2^2 + \lambda^2 \sum_i w_i^{(n)} \big(v_i^{(n)}\big)^2 \tag{A2}$$

where $w_i^{(n)} = \big((v_i^{(n-1)})^2 + \epsilon_n\big)^{-1/2}$, the superscript $(n)$ stands for iteration $n$, and $\epsilon_n$ is a sequence of small numbers that converges to zero with increasing $n$. Using this approximation of the objective function, and a first-order Taylor expansion for $\mathbf{g}(\boldsymbol{\Phi}\mathbf{v}^{(n)})$, the objective function in (A2) takes the form:

$$\min_{\mathbf{v}^{(n)}} \;\big\|\mathbf{d}_{\mathrm{obs}} - \mathbf{g}\big(\boldsymbol{\Phi}\mathbf{v}^{(n-1)}\big) - \mathbf{G}_{\mathbf{v}}^{(n)}\big(\mathbf{v}^{(n)} - \mathbf{v}^{(n-1)}\big)\big\|_2^2 + \lambda^2 \sum_i w_i^{(n)} \big(v_i^{(n)}\big)^2 \tag{A3}$$

Here, $\mathbf{G}_{\mathbf{v}}^{(n)}$ is the Jacobian matrix of $\mathbf{g}(.)$ with respect to $\mathbf{v}$ at $\mathbf{v} = \mathbf{v}^{(n-1)}$. The updated solution at iteration $n$ can easily be found by taking the derivative of the above convex function with respect to $\mathbf{v}^{(n)}$ and setting it to zero.
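A minimal NumPy sketch of this IRLS scheme is given below for a linear forward model, so that G plays the role of the constant Jacobian G_v and no re-linearization is needed; the function name and toy data are assumptions made for illustration:

```python
import numpy as np

def irls_l1(G, d_obs, lam2, n_iter=50, eps0=1.0):
    """Minimize ||d_obs - G v||_2^2 + lam2 * ||v||_1 by IRLS (Eqs. A1-A3).
    A linear forward model is assumed, so G plays the role of the
    Jacobian G_v and needs no re-linearization between iterations."""
    v = np.zeros(G.shape[1])
    for n in range(n_iter):
        eps_n = eps0 / (n + 1)                  # small numbers decaying to 0
        w = 1.0 / np.sqrt(v**2 + eps_n)         # IRLS weights of Eq. (A2)
        # Normal equations of the weighted least-squares subproblem
        v = np.linalg.solve(G.T @ G + lam2 * np.diag(w), G.T @ d_obs)
    return v

# Toy usage: recover a sparse v from noisy linear measurements.
rng = np.random.default_rng(0)
G = rng.standard_normal((60, 100))
v_true = np.zeros(100); v_true[[5, 40, 77]] = [2.0, -1.5, 1.0]
d_obs = G @ v_true + 0.01 * rng.standard_normal(60)
v_hat = irls_l1(G, d_obs, lam2=0.5)
```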
Appendix B. An Iterative Approach for l1/l2-Regularized Problems

The objective function for group-sparsity regularization can be expressed as:

$$\min_{\mathbf{v}} \;\|\mathbf{d}_{\mathrm{obs}} - \mathbf{g}(\boldsymbol{\Phi}\mathbf{v})\|_2^2 + \lambda^2 \sum_{i=1}^{p} \|\mathbf{v}_i\|_2 \tag{B1}$$

where the notation is discussed (mainly) in Chapters 2 and 3. At iteration $n$, using the Gauss-Newton method and a first-order Taylor series for $\mathbf{g}(\boldsymbol{\Phi}\mathbf{v})$, the linearized version of the above function takes the form:

$$\min_{\mathbf{v}^{(n)}} \;\Big\|\mathbf{d}_{\mathrm{obs}} - \Big(\mathbf{g}\big(\boldsymbol{\Phi}\mathbf{v}^{(n-1)}\big) + \mathbf{G}_{\mathbf{v}}^{(n)}\big(\mathbf{v}^{(n)} - \mathbf{v}^{(n-1)}\big)\Big)\Big\|_2^2 + \lambda^2 \sum_{i=1}^{p}\Big(\sum_{j=1}^{k_i} \big(v_{i_j}^{(n)}\big)^2\Big)^{1/2} \tag{B2}$$

where $\mathbf{G}_{\mathbf{v}}^{(n)}$ is the Jacobian matrix of $\mathbf{g}(\boldsymbol{\Phi}\mathbf{v})$, and $v_{i_j}$ is the coefficient of the $j$-th basis in the $i$-th group. Denoting $\Delta\mathbf{d}^{(n)} = \mathbf{d}_{\mathrm{obs}} - \mathbf{g}\big(\boldsymbol{\Phi}\mathbf{v}^{(n-1)}\big) + \mathbf{G}_{\mathbf{v}}^{(n)}\mathbf{v}^{(n-1)}$, (B2) can be simplified to:

$$\min_{\mathbf{v}^{(n)}} \;\big\|\Delta\mathbf{d}^{(n)} - \mathbf{G}_{\mathbf{v}}^{(n)}\mathbf{v}^{(n)}\big\|_2^2 + \lambda^2 \sum_{i=1}^{p}\Big(\sum_{j=1}^{k_i} \big(v_{i_j}^{(n)}\big)^2\Big)^{1/2} \tag{B3}$$

The derivative of the regularization term with respect to $v_{i_j}^{(n)}$ can be approximated as:

$$\frac{v_{i_j}^{(n)}}{\Big(\sum_{k=1}^{k_i}\big(v_{i_k}^{(n)}\big)^2\Big)^{1/2}} \;\approx\; \frac{v_{i_j}^{(n)}}{\Big(\sum_{k=1}^{k_i}\big(v_{i_k}^{(n-1)}\big)^2 + \epsilon_i^{(n)}\Big)^{1/2}} \tag{B4}$$

where $\epsilon_i^{(n)}$ is a small positive number used to avoid a zero denominator. Note that $v_{i_k}^{(n)}$ in the denominator is approximated by $v_{i_k}^{(n-1)}$. Choosing $\epsilon$ such that $0 < \epsilon_i^{(n)} < \epsilon_i^{(n-1)}$ and $\lim_{n\to\infty}\epsilon_i^{(n)} = 0$, it can be shown that this approximation does not change the solution of the original minimization problem. The iterative solution for (B3) can now be derived as:

$$\Big(\alpha\boldsymbol{\Lambda}^{(n)} + \mathbf{G}_{\mathbf{v}}^{(n)\,T}\mathbf{G}_{\mathbf{v}}^{(n)}\Big)\mathbf{v}^{(n)} = \mathbf{G}_{\mathbf{v}}^{(n)\,T}\Delta\mathbf{d}^{(n)} \tag{B5}$$

where $\alpha = 2\lambda^2$, and $\boldsymbol{\Lambda}^{(n)}$ is a diagonal matrix whose diagonal entries are $\Big(\sum_{k=1}^{k_i}\big(v_{i_k}^{(n-1)}\big)^2 + \epsilon_i^{(n)}\Big)^{-1/2}$.

Appendix C. Sparse PCA Algorithm

To solve the optimization problem in Equation (2.12), with $\tilde{\mathbf{w}}_j = \rho_j\mathbf{w}_j$, the optimization problem can be expressed as:

$$\min_{\mathbf{z}_j,\tilde{\mathbf{w}}_j} \;J(\mathbf{z}_j,\tilde{\mathbf{w}}_j) = \big\|\mathbf{R}_j - \mathbf{z}_j\tilde{\mathbf{w}}_j^{T}\big\|_F^2 + \alpha\big\|\tilde{\mathbf{w}}_j\big\|_1 \tag{C1}$$

After solving (C1), $\mathbf{w}_j$ and $\rho_j$ can be calculated as $\mathbf{w}_j = \tilde{\mathbf{w}}_j/\|\tilde{\mathbf{w}}_j\|_2$ and $\rho_j = \|\tilde{\mathbf{w}}_j\|_2$. Using an iterative solution approach, the optimization problem in (C1) can be divided into two subproblems, expressed (at iteration $n+1$) as:

$$\min_{\mathbf{z}_j^{(n+1)}} \;\big\|\mathbf{R}_j - \mathbf{z}_j^{(n+1)}\tilde{\mathbf{w}}_j^{(n)\,T}\big\|_F^2 \quad \text{subject to} \quad \big\|\mathbf{z}_j^{(n+1)}\big\|_2 = 1$$

$$\min_{\tilde{\mathbf{w}}_j^{(n+1)}} \;\big\|\mathbf{R}_j - \mathbf{z}_j^{(n+1)}\tilde{\mathbf{w}}_j^{(n+1)\,T}\big\|_F^2 + \alpha\big\|\tilde{\mathbf{w}}_j^{(n+1)}\big\|_1 \tag{C2}$$

The final solutions to these two optimization subproblems can be found from:

$$\mathbf{z}_j^{(n+1)} = \frac{\mathbf{R}_j\tilde{\mathbf{w}}_j^{(n)}}{\big\|\mathbf{R}_j\tilde{\mathbf{w}}_j^{(n)}\big\|_2}, \qquad \tilde{\mathbf{w}}_j^{(n+1)} = \Big(\mathbf{I} + \alpha\boldsymbol{\Lambda}_j^{(n+1)}\Big)^{-1}\mathbf{R}_j^{T}\mathbf{z}_j^{(n+1)} \tag{C3}$$

where $\boldsymbol{\Lambda}_j^{(n+1)}$ is a diagonal matrix obtained from the IRLS weights $w_i^{(n)}$ in Equation (A3), and $\mathbf{I}$ is the identity matrix of the same dimension as $\boldsymbol{\Lambda}_j^{(n+1)}$.
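A compact sketch of the alternating updates in (C3) for a single sparse factor is given below; the initialization from the leading right singular vector and the IRLS-style handling of the l1 term are assumptions made for illustration:

```python
import numpy as np

def sparse_pca_component(R, alpha, n_iter=100, eps=1e-8):
    """One sparse PCA factor via the alternating updates of Eqs. (C2)-(C3):
    R ~ z w~^T with unit-norm score z and l1-penalized loading w~;
    the l1 term is handled with IRLS-style diagonal weights."""
    # assumed initialization: leading right singular vector of R
    _, _, Vt = np.linalg.svd(R, full_matrices=False)
    w = Vt[0].copy()
    for _ in range(n_iter):
        z = R @ w
        z /= np.linalg.norm(z) + eps              # z-update, ||z||_2 = 1
        lam = np.diag(1.0 / (np.abs(w) + eps))    # IRLS weights for ||w~||_1
        w = np.linalg.solve(np.eye(len(w)) + alpha * lam, R.T @ z)
    rho = np.linalg.norm(w)
    return z, w / (rho + eps), rho                # z_j, w_j, rho_j
```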
Appendix D: Pseudo Code for Model Selection Using Group-Sparsity Formulation

The overall procedure for generating prior realizations, performing inversion and carrying out model selection is presented below:

1. Use the $p$ proposed prior conceptual models and a geostatistical simulation technique to generate $L$ sample realizations for each prior, and collect the realizations from each prior scenario in a separate matrix, i.e., $\mathbf{U}_1 = [\mathbf{u}_{11}\;\mathbf{u}_{12}\;\ldots\;\mathbf{u}_{1L}], \ldots, \mathbf{U}_p = [\mathbf{u}_{p1}\;\mathbf{u}_{p2}\;\ldots\;\mathbf{u}_{pL}]$.
2. Apply the SVD to $\mathbf{U}_1, \mathbf{U}_2, \ldots, \mathbf{U}_p$ to construct the TSVD bases $\boldsymbol{\Phi}_1, \ldots, \boldsymbol{\Phi}_p$ for the different scenarios, and form $\boldsymbol{\Phi} = [\boldsymbol{\Phi}_1, \ldots, \boldsymbol{\Phi}_p]$.
3. Use $\mathbf{U}_1, \ldots, \mathbf{U}_p$ to train the weights $\mathbf{K}_1, \ldots, \mathbf{K}_p$ (see Equation (3.2)).
4. Perform group-sparse model calibration following the formulation in Equation (3.2).
5. Identify the groups $\mathbf{v}_1, \ldots, \mathbf{v}_S$ ($S \ll p$) that are active in representing the solution based on the ranking of their l2-norms (these correspond to the dominant geologic scenarios).
6. After analysis of the selected groups, if necessary, propose revised/refined prior scenarios and repeat steps 1-5.
7. If a single geologic scenario is identified, standard model calibration and uncertainty quantification with the selected prior scenario can be performed.

Appendix E: k-SVD Dictionary Learning

To make the content of the main text accessible without reading the original paper, in this appendix we briefly review this algorithm. The k-SVD is used to construct learned sparse dictionaries (expansion functions) from a training dataset. In this context, sparsity refers to the support of a vector (the number of its non-zero elements), and a sparse dictionary consists of $k$ linear expansion functions (or elements) such that only $S \ll k$ of them are needed to represent or approximate a given vector of parameters. The k-SVD algorithm is similar to the k-Means clustering method and is designed to find a dictionary $\boldsymbol{\Phi} \in \mathbf{R}^{n\times k}$ containing $k$ elements that sparsely represent each of the training samples in a given dataset $\mathbf{U} = [\mathbf{u}_1, \ldots, \mathbf{u}_L]$. The k-SVD dictionary learning method seeks to construct a dictionary $\boldsymbol{\Phi}$ and the corresponding coefficients $\mathbf{V} = [\mathbf{v}_1, \ldots, \mathbf{v}_L]$ by solving one of the following optimization problems:

$$\min_{[\mathbf{v}_1\,\mathbf{v}_2\,\ldots\,\mathbf{v}_L],\,\boldsymbol{\Phi}} \;\|\mathbf{v}_i\|_0 \quad \text{s.t.} \quad \sum_{i=1}^{L}\|\mathbf{u}_i - \boldsymbol{\Phi}\mathbf{v}_i\|_2^2 \le \epsilon, \;\text{for } i \in 1{:}L \tag{E1}$$

$$\min_{[\mathbf{v}_1\,\mathbf{v}_2\,\ldots\,\mathbf{v}_L],\,\boldsymbol{\Phi}} \;\sum_{i=1}^{L}\|\mathbf{u}_i - \boldsymbol{\Phi}\mathbf{v}_i\|_2^2 \quad \text{s.t.} \quad \|\mathbf{v}_i\|_0 \le S, \;\text{for } i \in 1{:}L \tag{E2}$$

Here, $\|\mathbf{v}_i\|_0$ denotes the number of non-zero entries in $\mathbf{v}_i$, $S$ is the sparsity level, and $\mathbf{V}_{k\times L} = [\mathbf{v}_1 \ldots \mathbf{v}_i \ldots \mathbf{v}_L]$ are the expansion coefficients corresponding to the training data $\mathbf{U} = [\mathbf{u}_1, \ldots, \mathbf{u}_L]$. Equations (E1) and (E2) are alternative formulations for dictionary learning. In Equation (E1), a maximum allowable representation error is used as a constraint while the sparsity level for each realization of the training data is minimized. In Equation (E2), the sparsity level is constrained while the approximation error in representing each realization is minimized. Table E1 summarizes the dictionary learning steps of the k-SVD algorithm. The application of the resulting k-SVD dictionary to model calibration is discussed in great detail in [Khaninezhad et al., 2012]. When the k-SVD dictionary is used to parameterize the spatial distribution of rock hydraulic properties, i.e., $\mathbf{u} = \boldsymbol{\Phi}\mathbf{v}$, the resulting coefficients $\mathbf{v}$ become sparse. To promote this sparsity in finding $\mathbf{v}$, various forms of sparsity regularization functions can be used. A particularly effective approach for promoting sparsity in $\mathbf{v}$ is minimizing its l1-norm, i.e., $\|\mathbf{v}\|_1$.

Table E1. k-SVD Algorithm

Initialization: Initialize the dictionary with $\boldsymbol{\Phi}^{(0)} \in \mathbf{R}^{n\times k}$. Set $j = 1$.
REPEAT until the stopping criterion is met:
  Sparse Coding Step:
  - Using a pursuit algorithm (e.g., OMP), compute $\mathbf{V}^{(j)}_{k\times L} = [\mathbf{v}_1 \ldots \mathbf{v}_i \ldots \mathbf{v}_L]$ as the solution of $\mathbf{V}^{(j)} = \arg\min_{\mathbf{v}_i} \|\mathbf{u}_i - \boldsymbol{\Phi}^{(j-1)}\mathbf{v}_i\|_2^2$ s.t. $\|\mathbf{v}_i\|_0 \le S$, for $i = 1, \ldots, L$
  Dictionary Update Step:
  - For each column $c = 1, 2, \ldots, k$ in $\boldsymbol{\Phi}^{(j-1)}$:
    - Define the group of prior model instances that use the element: $\omega_c = \{i \,|\, 1 \le i \le L,\; \mathbf{V}^{(j)}(c,i) \ne 0\}$
    - Compute the residual matrix $\mathbf{E}_c = \mathbf{U} - \sum_{i\ne c}\boldsymbol{\phi}_i\mathbf{v}_i^{T}$, where $\mathbf{v}_c^{T}$ is the $c$-th row of $\mathbf{V}^{(j)}$
    - Restrict $\mathbf{E}_c$ by choosing the columns corresponding to $\omega_c$, i.e., form the restricted matrix $\mathbf{E}_c^{R}$
    - Apply a rank-1 SVD decomposition $\mathbf{E}_c^{R} = \mathbf{A}\Delta\mathbf{B}^{T}$
    - Update the dictionary element $\boldsymbol{\phi}_c = \mathbf{a}_1$, and the sparse representation by $\mathbf{v}_c^{R} = \Delta(1,1)\,\mathbf{b}_1$
  END
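A compact NumPy sketch of the algorithm in Table E1 is given below, using a simple home-grown OMP for the sparse coding step; names and default values are illustrative assumptions:

```python
import numpy as np

def omp(Phi, u, S):
    """Greedy orthogonal matching pursuit: approximate
    argmin_v ||u - Phi v||_2 subject to ||v||_0 <= S."""
    r, idx = u.copy(), []
    for _ in range(S):
        idx.append(int(np.argmax(np.abs(Phi.T @ r))))
        coef, *_ = np.linalg.lstsq(Phi[:, idx], u, rcond=None)
        r = u - Phi[:, idx] @ coef
    v = np.zeros(Phi.shape[1])
    v[idx] = coef
    return v

def ksvd(U, k, S, n_iter=10, seed=0):
    """Minimal k-SVD (Table E1): alternate OMP sparse coding with
    rank-1 SVD updates of each dictionary column."""
    rng = np.random.default_rng(seed)
    n, L = U.shape
    Phi = rng.standard_normal((n, k))
    Phi /= np.linalg.norm(Phi, axis=0)                # unit-norm atoms
    for _ in range(n_iter):
        V = np.column_stack([omp(Phi, U[:, i], S) for i in range(L)])
        for c in range(k):
            omega = np.nonzero(V[c, :])[0]            # samples using atom c
            if omega.size == 0:
                continue
            E = U - Phi @ V + np.outer(Phi[:, c], V[c, :])  # residual w/o atom c
            A, d, Bt = np.linalg.svd(E[:, omega], full_matrices=False)
            Phi[:, c] = A[:, 0]                       # phi_c = a_1
            V[c, omega] = d[0] * Bt[0, :]             # v_c^R = Delta(1,1) b_1
    return Phi, V
```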