Efficient Control Optimization in Subsurface Flow Systems with Machine Learning Surrogate Models
by
Junjie Yu
A Dissertation Presented to the
FACULTY OF THE USC GRADUATE SCHOOL
UNIVERSITY OF SOUTHERN CALIFORNIA
In Partial Fulfillment of the
Requirements for the Degree
DOCTOR OF PHILOSOPHY
(Chemical Engineering)
August 2024
Copyright 2024 Junjie Yu
Dedication
Pursuing a Ph.D. is indeed a challenging endeavor, and I would like to express my sincere gratitude to
everyone who has provided immense support throughout this journey.
First and foremost, I want to extend my deepest gratitude to my academic and research advisor, Prof.
Behnam Jafarpour, for his exceptional and professional supervision of my work. Without his support, I
would not have been able to transition from having minimal knowledge of subsurface systems and machine
learning to developing significant expertise in the integration of these fields. I feel very privileged to have
been one of his students and eagerly look forward to maintaining a strong relationship with him in the
future.
I would also like to thank my defense and qualifying exam committee members—Prof. Iraj Ershaghi,
Prof. Birendra Jha, Prof. Felipe De Barros, and Prof. Kristian Jessen—for their constructive comments on
this dissertation. I am also grateful to Atefeh Jahandideh for her significant assistance, especially during
the early stages when I joined the lab and faced challenges in making progress.
Importantly, I also want to express my deepest gratitude and love to my long-time girlfriend and now
my wife, Kun Zhao. Her support was crucial in helping me face and go through all the difficulties of
my Ph.D. journey. As an international student, the disruptions and restrictions caused by the COVID-19
pandemic kept us apart for nearly a year and a half between China and the USA. Despite the difficulties of
this separation, I appreciate how it strengthened our relationship. My life has become much more colorful
and enjoyable since she came to the USA in 2021, even though we are still in different states.
In addition, I want to thank all my family members for their unwavering support and for being my undeniable source of trust, regardless of my decisions. Their emotional support has been invaluable throughout this journey.
Finally, I want to thank all my colleagues in our lab and the staff in the Mork Family Department of
Chemical Engineering & Materials Science for their insightful support and contributions.
Table of Contents
Dedication . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . ii
List of Tables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . vi
List of Figures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . vii
Abstract . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xiii
Chapter 1: Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
1.1 Literature Review . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
1.1.1 Closed-loop Reservoir Management . . . . . . . . . . . . . . . . . . . . . . . . . 1
1.1.2 Optimization for Improved Reservoir Operation . . . . . . . . . . . . . . . . . . . . 4
1.1.3 Well Control Optimization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
1.1.4 Geological Uncertainty and Robust Optimization . . . . . . . . . . . . . . . . . . . 9
1.1.5 Surrogate Modeling . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13
1.2 Scope of Work and Dissertation Outline . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16
Chapter 2: Surrogate with Domain Insight . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19
2.1 Introduction of Long Beach Unit . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21
2.2 Surrogate with Sparse Neural Network . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27
2.2.1 Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27
2.2.2 Methodology . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29
2.2.2.1 Regularization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30
2.2.2.2 Problem Formulation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33
2.2.3 Numerical Experiments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 38
2.2.3.1 Synthetic Example . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 38
2.2.3.2 Surrogate Modeling of LBU . . . . . . . . . . . . . . . . . . . . . . . . . . 42
2.2.3.3 Nonlinear Effect . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47
2.3 Feature Engineering . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 50
2.3.1 Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 50
2.3.2 Methodology . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 52
2.3.2.1 Support Region for Production Wells . . . . . . . . . . . . . . . . . . . . 52
2.3.2.2 Topology-Preserving Neural Network . . . . . . . . . . . . . . . . . . . . 57
2.3.2.3 Interpretation of Interwell Connectivity . . . . . . . . . . . . . . . . . . 59
2.3.3 Numerical Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 60
2.3.3.1 Prediction Performance . . . . . . . . . . . . . . . . . . . . . . . . . . . . 60
2.3.3.2 Effect of Nonlinearity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 63
2.3.3.3 Interwell Connectivity . . . . . . . . . . . . . . . . . . . . . . . . . . . . 66
2.3.3.4 Sensitivity Analysis for the Selected Window Size . . . . . . . . . . . . . 67
2.4 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 72
Chapter 3: Active Learning . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 75
3.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 75
3.1.1 Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 75
3.1.2 Adaptive Sampling with Response Surface Model . . . . . . . . . . . . . . . . . . . 76
3.1.3 Bayesian Optimization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 77
3.1.4 Trust Region Approach . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 79
3.1.5 Active learning framework . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 81
3.2 Methodology . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 81
3.2.1 Global/Passive/Offline Learning Strategy . . . . . . . . . . . . . . . . . . . . . . . . 82
3.2.2 Local/Active/Online Learning Strategy . . . . . . . . . . . . . . . . . . . . . . . . . 84
3.3 Numerical Examples . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 88
3.3.1 Synthetic Example . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 88
3.3.2 Well Control Optimization in Waterflooding Operation . . . . . . . . . . . . . . . . 92
3.3.2.1 2D Example: SPE10 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 92
3.3.2.2 3D Field-Scale Example: Brugge Model . . . . . . . . . . . . . . . . . . . 95
3.3.3 Geological Carbon Sequestration . . . . . . . . . . . . . . . . . . . . . . . . . . . . 101
3.3.3.1 Introduction of Geological Carbon Sequestration . . . . . . . . . . . . . 101
3.3.3.2 GCS Reservoir Simulation Model . . . . . . . . . . . . . . . . . . . . . . 103
3.3.3.3 Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 104
3.4 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 105
Chapter 4: Robust Optimization under Geological Uncertainty . . . . . . . . . . . . . . . . . . . . . 108
4.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 108
4.2 Methodology . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 112
4.2.1 Robust Production Optimization . . . . . . . . . . . . . . . . . . . . . . . . . 112
4.2.2 Gradient Approximation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 114
4.2.3 Reduced Sampling for Efficient Objective Function and Gradient Approximation . 115
4.3 Numerical Experiments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 117
4.3.1 Rosenbrock Functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 117
4.3.2 SPE10 Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 122
4.3.3 Brugge Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 128
4.4 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 134
Chapter 5: Conclusions and Future Works . . . . . . . . . . . . . . . . . . . . . . . . . . . . 136
Bibliography . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 141
List of Tables
3.1 Hyperparameter used for Branin function . . . . . . . . . . . . . . . . . . . . . . . . . . . . 91
3.2 Reservoir model parameters for the SPE10 model. . . . . . . . . . . . . . . . . . . . . . . . 94
3.3 Parameters in algorithms used for the SPE10 example . . . . . . . . . . . . . . . . . . . . . 94
3.4 Reservoir model parameters for Brugge model . . . . . . . . . . . . . . . . . . . . . . . . . 98
3.5 Parameters in algorithms used for the SPE10 example . . . . . . . . . . . . . . . . . . . . . 100
4.1 Brief summary of achieved NPV and required simulation in the Brugge example . . . . . . 134
List of Figures
1.1 Sketch of Closed Loop Management . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2
2.1 Geographic location of the Long Beach Unit . . . . . . . . . . . . . . . . . . . . . . . . . . 23
2.2 Simplified illustration of well configuration in the Ranger West part of the field (based on
wellhead locations). . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24
2.3 Examples of production history for four injectors (Inj) showing the range of variability. . . 25
2.4 Statistical summary of injection rate among 281 injectors . . . . . . . . . . . . . . . . . . . 25
2.5 Statistical summary of BHP among 356 producers . . . . . . . . . . . . . . . . . . . . . . . 26
2.6 Statistical summary of well oil production total (WOPT) among 356 producers. FOPT:
Field Oil Production Total. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26
2.7 Example of the two-layer input and output neural network model with four injectors, five
producers, and two timesteps for the control trajectories. In this case, the dimension of
the input layer is Nx = (4 + 5) × 2 = 18, whereas the output layer represents the total
oil production with the dimension equal to the number of producers Ny = 5. . . . . . . . . 29
2.8 The two-stage workflow for building the neural network proxy model: the first stage
involves feature learning by promoting sparsity of the weights in a two-layer neural
network model, while the second stage uses the identified connectivity patterns in the first
stage to develop multi-layer neural networks models with nonlinear activation function
to predict oil production at each producer. . . . . . . . . . . . . . . . . . . . . . . . . . . . 37
2.9 Performance of the ℓ1-norm minimization as a sparsity-promoting regularization for
feature selection: (a) the objective function during the training process; (b) the reference
and estimated parameters. The reference parameters have a dimension of 520 with only
20 elements being non-zero. The use of sparsity regularization leads to the elimination
of inactive connections and the selection of relevant connections in the reference
parameters. While the significant parameters are correctly identified, their magnitudes are
underestimated. The underestimation effect is alleviated by the second stage of training
(re-estimating the significant coefficients through a least-squares minimization). . . . . . . 40
2.10 Top: Training process. Bottom: the performance of the connectivity detection with
sparsity-promoting regularization: (a) the objective function during the training process;
(b) comparison of the reference and recovered connectivity maps. The reference map
shows a synthetic connectivity example with a total of 281 triangles and 356 circles. Each
circle is connected to 5-10 of its neighboring points. The estimated map on the right
shows the red lines in the background representing the reference connectivity map and
the recovered connectivity map in green, showing the exact recovery of the connectivity
map. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 43
2.11 The connectivity patterns detected with regularization using the ℓ1-norm (left), ℓ2-norm
(middle), and covariance matrix (right). The recovery with the ℓ1-norm minimization
leads to perfect reconstruction of the reference map. . . . . . . . . . . . . . . . . . . . . . . 44
2.12 The results of two-stage workflow (linear proxy model): (a) prediction accuracy after
the first stage of training; (b) prediction accuracy after the second stage of training; (c)
recovered connectivity map after the first-stage of training; and (d) recovered connectivity
map after the second stage of training. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 45
2.13 The results of two-stage workflow (linear proxy model): (a) prediction accuracy after
the first stage of training; (b) prediction accuracy after the second stage of training; (c)
recovered connectivity map after the first-stage of training; and (d) recovered connectivity
map after the second stage of training. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 46
2.14 Comparison between results from different regularization techniques: (a),(d),(g) results
with no regularization; (b),(e),(h) results with ℓ2 regularization; and (c),(f),(i) results with
ℓ1 regularization. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 48
2.15 Performance of the linear and nonlinear predictive models for different training data
sizes. FS_L: Linear proxy model. FS_NL: Nonlinear proxy model. Generally, the nonlinear
proxy model outperforms the linear model. As the amount of training data increases, the
performance of the linear and nonlinear models tends to converge. The spatial distribution
of R2 scores is also shown for 75, 200, and 500 training samples. The results indicate that
when data is limited, the nonlinear model tends to perform better. . . . . . . . . . . . . . . 49
2.16 Average RMSE of the linear (FS_L) and nonlinear (FS_NL) predictive models. The nonlinear
proxy model shows superior performance compared to the linear model, particularly
when fewer training data are available. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 50
2.17 Distribution of wells in a complex oil field showing an example of the region of influence
for a given (center) producer as well as the included and excluded support wells. . . . . . . 54
2.18 Schematic of the random window selection and neural network structure. The figure
shows multiple choices of local models with a random window size that identifies the
region of influence for the center producer. To predict the performance of Producer P1,
three random windows with different sizes are generated. In the neural network model,
the windows are represented by a feature selection layer that extracts the response of the
wells within each window. Multiple local neural networks are constructed and trained
with the corresponding data for each case to represent the production performance of
the center producer. A final decision about the size of the region is made based on the
performance of each window size. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 56
2.19 Multiple single-target regression models with random window sizes to represent the
uncertainty in the region of influence of each producer. . . . . . . . . . . . . . . . . . . . 57
2.20 General structure and linear simplification for one of the local networks. The general
neural network structure consists of multiple layers and nonlinear activation functions
to capture the complex input-output relationship. With a linear assumption, the model
complexity is reduced while its interpretability is increased because the linear operator
can be viewed as the inter-well communication strength or connectivity. . . . . . . . . . . 60
2.21 R2 score for each individual producer in the test data set. . . . . . . . . . . . . . . . . . . . 61
2.22 R2 score for each individual producer in the test data set. . . . . . . . . . . . . . . . . . 62
2.23 Histogram and statistical analysis for the selected window sizes, as well as the number of
adjacent wells and their density . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 64
2.24 Histogram and statistical analysis for the selected window sizes, as well as the number of
adjacent wells and their density . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 65
2.25 Average R2 score of the linear and nonlinear predictive models for different numbers
of training data (L = linear; NL = nonlinear). In general, the nonlinear proxy model
outperforms the linear model. However, the difference is very small (less than 0.01). . . . . 65
2.26 Average RMSE of the linear and nonlinear predictive models for different sizes of training
data (L = linear; NL = nonlinear). . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 66
2.27 Recovered interwell connectivity pattern . . . . . . . . . . . . . . . . . . . . . . . . . . . . 67
2.28 Geological evidence showing consistency between the recovered connectivity map with
the two existing faults in the geological structure. . . . . . . . . . . . . . . . . . . . . . . . 68
2.29 Comparison of random window vs. fixed window sizes during training. For the fixed
window size method, the x-axis represents the value of window size that has been
manually defined. For the random window size method, the x-axis represents the
maximum window size from which different sizes are randomly sampled while the
minimum size is always fixed at a value of 10; for example, a value of 30 means window
size can change in the range of [10,30]. In the fixed window size method, the size of the
windows is kept constant during the training. The fixed window size method is very
sensitive to the choice of size value; on the other hand, the random window size method
is much more stable because of the selectability of window size—as long as a large enough
window size range is assigned, the prediction accuracy is robust. . . . . . . . . . . . . . . 69
2.30 Selected window sizes with different maximal window size value . . . . . . . . . . . . . . 71
2.31 Histogram and distribution of R2 score, selected window sizes, number of adjacent wells,
and well density for each window after changing the maximal window size from 20 to 50. 72
2.32 Compare different initializations of inter-well connectivity . . . . . . . . . . . . . . . . . . 73
2.33 Sensitivity analysis regarding the number of training samples for the two approaches . . . 74
3.1 Workflow for passive/offline training approach: Optimization is performed and completed
with the proxy model; the proxy model is retrained at the end of each optimization until
it reproduces similar results to the simulation model . . . . . . . . . . . . . . . . . . . . . . 83
3.2 Workflow of active/local learning strategy: The proxy model is updated at each iteration
of the optimization algorithm to ensure proximity of the proxy model to the simulation
models throughout the optimization. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 86
3.3 Contour plot for the Branin function with three global minima . . . . . . . . . . . . . . . . 88
3.4 Comparison of surrogate accuracy for active and passive learning . . . . . . . . . . . . . . 90
3.5 Results of applying active and passive learning to the Branin function . . . . . . . . . . . . 92
3.6 Well location and log-permeability map of one layer of the SPE10 model. . . . . . . . . . . 93
3.7 Evolution of the NPV with the number of simulations for different approaches (SPE10
example) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 94
3.8 Comparison of optimal control trajectories in each approach for SPE10 example . . . . . . 96
3.9 Comparison of the final saturation map after optimization with each approach . . . . . . . 97
3.10 Initial saturation and well configuration for Brugge model . . . . . . . . . . . . . . . . . . 99
3.11 Evolution of the NPV with the number of simulations for different approaches (Brugge
example) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 99
3.12 Comparison of optimal controls and achieved NPVs for Brugge example. Unit of NPV: 10 M 100
3.13 Carbon Storage Mechanisms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 101
3.14 Considered reservoir model in this thesis . . . . . . . . . . . . . . . . . . . . . . . . . . . . 103
3.15 Comparison of optimization performance for the GCS example . . . . . . . . . . . . . . 104
3.16 Optimized injection strategies of the three different approaches . . . . . . . . . . . . . . . 105
3.17 Evolution of the CO2 plume based on optimized injection strategies . . . . . . . . . . . . . 106
4.1 Increased complexity of robust optimization . . . . . . . . . . . . . . . . . . . . . . . . . . 109
4.2 (a) The evolution of the objective function during optimization with respect to the number of
iterations. (b) The evolution of the objective function during optimization with respect to
the number of function evaluations. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 120
4.3 Investigating the effect of different sample sizes ( Ns = 1, 5, 10). (a) The evolution of the
objective function during optimization with respect to the number of iterations. (b) The
evolution of the objective function during optimization with respect to the number of
function evaluations. Each experiment is repeated 10 times. . . . . . . . . . . . . . . . . . 121
4.4 Four samples of (a) log-permeability and (b) porosity maps for the SPE10 model. . . . . . . 123
4.5 Evolution of the objective function for reference case (N = 100) and reduced sampling
approach with three different selected sample sizes (Ns = 1, 5, 10), from top to bottom. Two
different initial points are explored. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 124
4.6 Comparison of true objective function evolution during reduced sampling optimization
for three different numbers of sample sizes and two initializations (top and bottom). Black
dashed lines show the evolution of exact optimization, where all realizations are used
during optimization. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 125
4.7 Statistical measure of the optimization procedure (SPE10 model, based on Initialization
2); each numerical experiment is repeated 10 times (with different random seeds that lead
to different paths when visiting the realizations). Top: At each iteration, we visualize the
true objective value within one standard deviation among 10 data points. Bottom: At each
iteration, we display the standard deviation (std) of the true objective value. . . . . . . . . 126
4.8 Number of reservoir simulation runs during reduced sampling optimization for three
different sample sizes with two initializations (top and bottom). Black dashed lines
show the number of reservoir simulations for the exact optimization where the entire
realizations are used during optimization. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 127
4.9 Optimal control trajectories for two injectors (injection rate) and four producers (total
liquid production rate) obtained from (a) reference case (robust optimization); (b) reduced
sampling approach with Ns = 1.0 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 128
4.10 Log horizontal permeability distribution for 16 realizations . . . . . . . . . . . . . . . . . . 130
4.11 The evolution of the objective function with respect to the number of iterations for the
Brugge example. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 132
4.12 The evolution of the objective function with respect to the number of reservoir simulations
for the Brugge example. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 133
Abstract
Well control optimization plays a crucial role in closed-loop reservoir management, aiding in decision-making to enhance both the profitability and safety of reservoir operations. Achieving effective optimization typically requires the use of computationally expensive reservoir simulators to provide reliable
predictions for specific control strategies. However, running a single forward simulation for a large-scale
field can take days, or even weeks, making the process highly resource-intensive.
A popular approach to mitigate the prohibitive computational demands is to develop surrogate models,
particularly those based on machine learning, which serve as fast predictive alternatives and can significantly accelerate the optimization process. However, several practical concerns must be addressed to ensure the reliability of surrogate-based optimization. These include ensuring that the model is interpretable
and consistent with domain knowledge, maintaining the model’s reliability throughout the optimization
path, and efficiently managing geological uncertainties.
This dissertation introduces innovative and tailored optimization algorithms and workflows specifically designed for subsurface systems, with a particular emphasis on addressing the aforementioned challenges. In brief, a collaborative project with California Resources Corporation illustrates the development of
a surrogate model that incorporates fundamental engineering insights. We also present an active learning
framework aimed at significantly reducing computational load while improving the control optimization
performance in applications such as waterflooding and carbon sequestration. Additionally, we design a
novel reduced sampling method to effectively account for geological uncertainties.
Chapter 1
Introduction
1.1 Literature Review
1.1.1 Closed-loop Reservoir Management
Closed-loop reservoir management (CLRM) [1, 84, 124] is an advanced strategy in the field of petroleum
engineering aimed at optimizing the production of hydrocarbons from subsurface reservoirs. This approach integrates real-time data acquisition, continuous model updating, and automated decision-making
processes to maximize recovery and minimize environmental impact.
As shown in Figure 1.1 [84], a CLRM workflow consists of multiple components:
1. Data Acquisition: Utilizing a variety of sensors and monitoring technologies, such as downhole
sensors, seismic imaging, and surface data collection, to gather real-time data about the reservoir.
2. Model Updating: Applying data assimilation techniques, such as the Ensemble Kalman Filter (EnKF)
[48], to integrate new data into reservoir simulation models, thereby reducing uncertainty and improving model accuracy.
3. Decision Support Systems: Leveraging advanced optimization algorithms to analyze simulation outputs and recommend optimal management strategies.
4. Implementation: Adjusting operational parameters such as well pressures, flow rates, and injection volumes based on the updated strategies.
Figure 1.1: Sketch of Closed-Loop Management
Numerical optimization [43] constitutes a fundamental component of the Closed-Loop Reservoir Management (CLRM) workflow, serving as a critical tool for enhancing profitability and operational efficiency
in the development and management of hydrocarbon reservoirs. This advanced methodology supports
two primary functions within the CLRM framework: optimizing operational decisions to maximize economic returns and oil recovery, and calibrating reservoir models through history matching to improve
their predictive accuracy.
In reservoir management, optimization of operational decisions aims to systematically refine a range
of decision variables that influence the reservoir’s performance. These variables include, but are not limited to, injection rates, reservoir pressure, the spatial placement of wells, and the configuration of well
networks. The objective here is twofold: to maximize the recovery of oil and gas and to optimize economic returns from reservoir operations. This is achieved through the application of various optimization
algorithms that can handle the complex, often non-linear relationships between multiple operational parameters and the reservoir’s response.
The process involves setting up an optimization problem where the goal is to find the set of operational
parameters that lead to the highest net present value (NPV) or similar economic metrics. Techniques such
as gradient-based methods, genetic algorithms [89], or particle swarm optimization [90] are employed to
explore the feasible operational space and identify optimal solutions. Each option is evaluated based on a
simulation model that predicts how the reservoir will respond to changes in operational strategies. This
aspect of numerical optimization in CLRM is vital for making informed decisions that balance immediate
production targets with the long-term sustainability of reservoir resources.
The second crucial role of numerical optimization within CLRM is model calibration which involves
updating reservoir simulation models based on actual production data, a process known as history matching [124]. This method adjusts geophysical and flow-related properties of the reservoir model—such as
permeability and porosity—to ensure that the model accurately reflects observed production behaviors.
History matching is a complex inverse problem [166] where the objective is to minimize the difference
between the simulated results and actual data from the field.
Optimization techniques in history matching systematically alter model parameters to improve the
fit between the model outputs and historical data. This adjustment process is essential for maintaining
the accuracy and reliability of reservoir models over time, as it accounts for new data and insights obtained during the production phase. Enhanced model fidelity leads to better predictions of future reservoir
behavior, which in turn supports more effective decision-making in reservoir management.
1.1.2 Optimization for Improved Reservoir Operation
Well control optimization and well development optimization represent two principal categories for systematically enhancing performance in reservoir operations [80]. Well control optimization [189, 19, 4]
involves the strategic management of time-variant operational parameters such as flow rate or bottomhole pressure (BHP). These parameters are crucial for influencing the reservoir’s flow behavior, which
can significantly enhance processes such as sweep efficiency. Furthermore, economic metrics like the Net
Present Value (NPV) are crucial for quantifying the long-term profitability of a particular control strategy. Optimizing these parameters ensures that the reservoir management strategies not only maximize
hydrocarbon recovery but also align with economic objectives, thereby improving the overall efficiency
and sustainability of reservoir operations.
Well development optimization [125, 10, 128, 78] focuses on strategic decisions regarding well placement, drilling, and completion designs. These decisions often involve discrete variables, such as the coordinates of well locations, the scheduling of drilling activities (e.g., the year a new well will be drilled), and
the grid indices describing the well trajectories. Such optimization is critical as it determines the physical
configuration of wells in the reservoir, which can significantly affect the accessibility and recoverability of
hydrocarbon resources.
Significant scholarly efforts [15, 53, 77, 79, 103] have also explored the formulation of a joint optimization problem that integrates well placement and control strategies. This integration is motivated by the
recognition that the optimal locations of wells are inherently dependent on how the wells are operated.
By simultaneously considering well placement and control strategies, operators can achieve a more holistic optimization of reservoir performance, leading to enhanced recovery rates and improved economic
outcomes.
This dissertation primarily focuses on well control optimization and explores how machine learning
models can facilitate this aspect of reservoir management. For further details on well development optimization and the integration of well control and development strategies, readers are encouraged to consult
the extensive body of work available in this field. In the subsequent sections of this chapter, we delve into
the detailed formulation of well control optimization and explore a range of optimization methodologies
that are applied in this domain.
Two major approaches that have been used in well control optimization problems are the gradient-based approach and the derivative-free approach.
Gradient-based methods have been commonly used in optimizing well control, given their fast convergence rates when the gradient of the objective function with respect to the control variables is accessible.
These methods utilize gradient information to iteratively refine control settings, aiming to optimize operational efficiencies directly. Among the most effective techniques for obtaining gradients analytically
is the adjoint technique, which is rooted in optimal control theory [19, 83]. The adjoint method is particularly renowned for its computational efficiency—it typically requires computational efforts only about
double that of a single forward simulation run. This efficiency is vital in scenarios demanding real-time
decision-making. Commercial reservoir simulators such as CMG [33] and Eclipse [42], as well as open-source tools like MATLAB's MRST [105], often provide interfaces to access these adjoint gradients, easing
the implementation process.
However, when adjoint gradients are inaccessible or require significant effort to obtain, alternative methods such as finite difference techniques [179, 108, 195] become viable options. These methods
approximate the gradient by closely following its definition but can become computationally expensive
when the dimensionality of decision variables is high. To manage this, some approaches compromise on
the exact accuracy of the gradient to obtain an approximate version more efficiently. Techniques like Simultaneous Perturbation Stochastic Approximation (SPSA) [157, 10, 128] or other ensemble-based methods
[161] have been explored in the literature for these cases, providing viable alternatives that balance computational demand with optimization precision.
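To make the idea concrete, the sketch below implements a basic SPSA gradient estimator in Python. The toy objective, step sizes, and function names are illustrative assumptions rather than the settings used in the cited studies; the point is only that the estimate requires two objective evaluations per iteration, independent of the number of control variables.

```python
import numpy as np

def spsa_gradient(objective, u, c=0.1, rng=None):
    """Basic SPSA gradient estimate at control vector u.

    Only two objective evaluations (e.g., two reservoir simulations) are
    needed per estimate, regardless of the dimension of u.
    """
    rng = np.random.default_rng() if rng is None else rng
    # Rademacher (+/-1) perturbation applied to all components simultaneously
    delta = rng.choice([-1.0, 1.0], size=u.shape)
    j_plus = objective(u + c * delta)
    j_minus = objective(u - c * delta)
    # Same scalar difference divided by each component's perturbation
    return (j_plus - j_minus) / (2.0 * c * delta)

# Hypothetical usage: a toy quadratic stands in for an expensive NPV evaluation
if __name__ == "__main__":
    rng = np.random.default_rng(0)
    toy_objective = lambda u: np.sum((u - 1.0) ** 2)
    u = np.zeros(10)                      # ten control variables, e.g., BHP settings
    for _ in range(300):
        g = spsa_gradient(toy_objective, u, rng=rng)
        u -= 0.05 * g                     # plain gradient-descent update
    print(np.round(u, 2))                 # components approach the optimum at 1.0
```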
One significant challenge with gradient-based approaches in reservoir optimization is the complexity
induced by reservoir heterogeneity, which often manifests as notably challenging optimization landscapes
characterized by their roughness and the prevalence of multiple local optima. To address these issues,
global optimization strategies [74] have been deployed, offering a way to mitigate the risk of converging
to local optima.
Genetic Algorithms (GAs) [72] are a prominent example of such global optimization techniques, widely
recognized for their effectiveness in determining the optimal locations for both vertical and unconventional
wells [45, 142, 10]. These algorithms utilize mechanisms inspired by natural evolution, such as selection,
crossover, and mutation, to explore the solution space extensively.
Another sophisticated global optimization method is the Covariance Matrix Adaptation Evolution
Strategy (CMA-ES) [65]. This strategy is particularly noted for its ability to adaptively tune its strategy parameters, which has been shown to provide superior performance compared to GAs in various studies
focused on well placement optimization [18, 53]. By adjusting its own parameters during the optimization
process, CMA-ES enhances its efficiency and effectiveness in navigating complex optimization landscapes.
Particle Swarm Optimization (PSO) [90] also plays a significant role in this domain, drawing inspiration
from the social behavior patterns of animals such as flocks of birds or swarms of fish. This method has been
successfully applied to optimize the placement of both vertical and non-conventional wells [39], leveraging
the collective intelligence of the swarm to find optimal solutions.
Notably, these global optimization techniques are predominantly utilized in well placement optimization problems, primarily because well placement often involves discrete variables where gradients are not
naturally well-defined. While there is substantial literature supporting the use of these methods in well
placement, they have also been applied to well control optimization challenges [180, 178]. In such applications, although global optimization methods typically require more simulation runs to converge, they are
invaluable in scenarios where traditional gradient-based methods struggle to navigate the optimization
landscape effectively.
1.1.3 Well Control Optimization
As previously discussed, this thesis primarily focuses on the control optimization problem. Initially, we
introduce the general formulation of the optimization problem, which can be expressed as follows:
$$
\begin{aligned}
\hat{u} = \underset{u}{\arg\min}\ & J(u(t), m) \\
\text{s.t.}\quad & u^{\text{low}} \leq u(t) \leq u^{\text{up}} \\
& g_j(u(t), m) = 0, \quad j = 1, 2, 3, \ldots, n_e \\
& c_k(u(t), m) \geq 0, \quad k = 1, 2, 3, \ldots, n_i \\
& t = 1, 2, 3, \ldots, N_T
\end{aligned}
\tag{1.1}
$$
In the above formulation, J(·) represents a user-specified objective function. This function may correspond to various operational goals such as total oil production, net present value, or stored carbon dioxide. It serves as the quantifiable target that we aim to optimize. m represents properties that are used to
describe the reservoir model, such as permeability, porosity, and initial saturation. To ensure the practicality of control settings and mitigate operational risks, the control variable u(t) is constrained by lower
and upper bounds, $u^{\text{low}}$ and $u^{\text{up}}$. For instance, Zheng et al. [197] constrain the maximal BHP to prevent
geomechanical failures. Additionally, more complex constraints can be applied to restrict the feasible solution space. The function $g_j(\cdot)$ represents general equality constraints. A commonly used example is the total injection over a specified time period. For example, Shamshiri et al. [150] fix the total carbon injection when performing control optimization to improve carbon trapping. The total injection amount can also be predefined as a fraction of the total pore volume, as used in [84, 173]. The function $c_k(\cdot)$ represents an
inequality constraint, which can significantly increase the computational burden, particularly when such a
constraint is nonlinear. This complexity is highlighted in [144], where the authors propose an approximate
feasible direction Nonlinear Programming Problem (NLP) algorithm. This algorithm utilizes the objective
function’s gradient and the combined gradient of the active constraints to optimize the solution efficiently.
In this dissertation, one of the major objectives we are considering for our well control optimization
problem is the Net Present Value (NPV), which has the following format when considering a waterflooding
scenario [152].
$$
J_u = -\text{NPV} = -\sum_{k=1}^{K}
\frac{\sum_{j=1}^{N_P}\left(r_o\, q^{k}_{o,j} - c_{w,p}\, q^{k}_{w,j}\right)\Delta t_k
\;-\; \sum_{i=1}^{N_I} c_{w,i}\, q^{k}_{w,i}\,\Delta t_k}
{(1+b)^{t_k/365}}
\tag{1.2}
$$
• $q^{k}_{o,j}$ and $q^{k}_{w,j}$ are the oil and water production rates from producer $j$ at time step $k$, respectively.
• $q^{k}_{w,i}$ is the water injection rate from injector $i$ at time step $k$.
• $c_{w,i}$ and $c_{w,p}$ denote the costs associated with water injection and water production, respectively.
• $r_o$ is the price of oil per unit volume.
• $b$ is the annual interest rate.
• $\Delta t_k$ is the duration of the $k$-th time step, and $t_k$ is the cumulative time (in days) at the end of step $k$.
• $K$ is the total number of time steps, $N_P$ is the number of producers, and $N_I$ is the number of injectors.
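For concreteness, the short sketch below evaluates the negative NPV of Equation 1.2 for given rate arrays. The function name, array shapes, toy numbers, and the convention of discounting by the cumulative time $t_k$ are assumptions made for illustration, not an interface defined in this dissertation.

```python
import numpy as np

def negative_npv(q_oil, q_water_prod, q_water_inj, dt, r_o, c_wp, c_wi, b):
    """Evaluate J_u = -NPV following Equation 1.2.

    q_oil, q_water_prod : arrays of shape (K, N_P), production rates per step and producer
    q_water_inj         : array of shape (K, N_I), injection rates per step and injector
    dt                  : array of shape (K,), time-step durations in days
    r_o, c_wp, c_wi, b  : oil price, water production/injection costs, annual discount rate
    """
    t = np.cumsum(dt)                        # cumulative time (days) at the end of each step
    revenue = (r_o * q_oil - c_wp * q_water_prod).sum(axis=1) * dt
    inj_cost = (c_wi * q_water_inj).sum(axis=1) * dt
    discount = (1.0 + b) ** (t / 365.0)      # annual discounting
    return -np.sum((revenue - inj_cost) / discount)

# Toy example: 2 time steps, 2 producers, 1 injector (hypothetical numbers)
J = negative_npv(
    q_oil=np.array([[100.0, 80.0], [90.0, 70.0]]),
    q_water_prod=np.array([[20.0, 10.0], [30.0, 20.0]]),
    q_water_inj=np.array([[150.0], [150.0]]),
    dt=np.array([30.0, 30.0]),
    r_o=80.0, c_wp=5.0, c_wi=3.0, b=0.1,
)
print(round(float(-J), 1))  # NPV in consistent monetary units
```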
1.1.4 Geological Uncertainty and Robust Optimization
Closed-loop Reservoir Management (CLRM) systems critically depend on accurate and reliable reservoir
descriptions to achieve trustworthy predictions from simulations. However, the subsurface environment
poses a formidable challenge to reservoir management due to its inherent uncertainty. There are multiple sources that can lead to large geological uncertainty [110, 100, 160], as briefly summarized in the
following:
1. Limited Direct Observations: Measurements are obtained through indirect methods like drilling and
seismic surveys [168], which only provide data at specific locations, necessitating extensive reliance
on geostatistical tools for interpolation and extrapolation [31].
2. Large Heterogeneity: Geological formations are inherently heterogeneous, with properties that can change significantly over short distances. However, conventional reservoir modeling typically employs relatively coarse grid blocks that smooth out this heterogeneity by treating properties as constant within each block.
3. Complex Geological Processes: The history of geological formations involves complex processes
such as sediment deposition, tectonic activity, and diagenetic changes that can vary spatially and
temporally. These processes impact the distribution and quality of reservoirs and are often not fully
understood.
4. Technological Limitations: The resolution of tools and techniques used to measure and model geological features (e.g., seismic imaging, well logging) can also introduce uncertainty. These technologies have limitations in accuracy and resolution that can lead to incomplete or ambiguous data
interpretation.
Recent advances in reservoir characterization techniques, such as integrating multiple types of data
(cores [167], borehole logs [32], seismic volumes [171, 131], and production performances [76, 123]), have
shown the potential to enhance the precision of reservoir predictions and reduce subsurface uncertainty.
The use of three-dimensional seismic data, coupled with advanced geological genesis analysis, allows for
a more detailed understanding of the depositional environments and subsequent diagenetic processes affecting reservoir quality.
Even though more advanced characterization techniques have been developed over the years, quantifying a subsurface system with high resolution still remains challenging. Probabilistic methods are widely
embraced to effectively manage the uncertainties inherent in reservoir modeling and to quantify and propagate uncertainty across various parameters. In practical application, this involves the utilization of sampling techniques and ensemble representations. Rather than relying on a single model, this approach
incorporates a multitude (typically several hundred) of realizations representing the range of uncertain
properties. These ensembles enable more robust predictions and optimization of production performance
by accounting for the diverse range of possible scenarios.
Many studies in the literature have discussed the effects of geological uncertainty and methodologies for dealing with it. Salas et al. [141] examine the complex problem of predicting sediment accumulation in reservoirs, which is influenced by various uncertain factors such as streamflow quantity and
sediment load. They utilized Monte Carlo simulation [119] and Latin hypercube sampling to quantify the
uncertainty of annual and accumulated reservoir sedimentation. A sensitivity analysis is also performed
to identify the most influential factors that influence sedimentation uncertainty. Refsgaard et al. [138]
discussed the challenges and methodologies for managing geological uncertainty in groundwater modeling, emphasizing the impact of geological structures and hydraulic parameters on modeling accuracy.
Various strategies such as multiple modeling, Monte Carlo analysis, and Bayesian Model Averaging are
explored to assess prediction uncertainties caused by geological uncertainty. The study highlights the importance of integrating different types of data, such as core samples, borehole logs, and seismic data, to
improve model precision and reduce subsurface uncertainties. Juang et al. [86] discussed the integration of
probabilistic tools to handle uncertainties in geotechnical analysis, showing that geological, ground, and
geotechnical models each have inherent uncertainties that can significantly affect project outcomes. By
utilizing probabilistic methods, engineers can better characterize, assess, and reduce these uncertainties,
improving decision-making and system reliability in geotechnical projects.
When considering an optimization problem under geological uncertainty, we refer to such a problem
as robust optimization. Robust optimization originated in the field of operations research [24]. It developed in response to the need for optimization models that can handle uncertainty in parameters and data, a common issue in many operational environments, and it makes decisions more effective under the varying and unpredictable conditions typical of such settings.
Designing the objective function of a robust optimization problem is of paramount importance and typically hinges on the user’s risk tolerance. Commonly employed methodologies [56] include the min-max
formulation, min-max regret, and stochastic programming. The min-max formulation aims to minimize
the worst-case scenario of the objective function, ensuring performance remains resilient even under the
most adverse conditions. Meanwhile, the min-max regret approach is tailored to minimize the maximum
regret, quantified as the disparity between the outcome of the selected decision and the optimal decision
in hindsight. Lastly, stochastic programming entails optimizing the expected value of the objective function or other statistical metrics, where uncertain parameters are depicted as random variables governed
by known probability distributions. In this thesis, we mainly employ stochastic programming approach to
define our robust optimization objective.
In the context of applying robust optimization to the well control problem, the optimization is typically performed over many realizations of the model to maximize a statistical performance measure. For example, as shown below, instead of defining an objective function that relies on a single and fixed realization $m$, multiple realizations $m_i$, $i = 1, 2, 3, \ldots, N_e$, are considered, and the objective of robust optimization is defined as the expectation of the original objective function over $m_i$.
$$
\begin{aligned}
\hat{u} = \underset{u}{\arg\min}\ & \frac{1}{N_e}\sum_{i=1}^{N_e} J(u(t), m_i) \\
\text{s.t.}\quad & u^{\text{low}} \leq u(t) \leq u^{\text{up}} \\
& g_j(u(t), m) = 0, \quad j = 1, 2, 3, \ldots, n_e \\
& c_k(u(t), m) \geq 0, \quad k = 1, 2, 3, \ldots, n_i \\
& t = 1, 2, 3, \ldots, N_T \\
& i = 1, 2, 3, \ldots, N_e
\end{aligned}
\tag{1.3}
$$
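As a minimal sketch of how the expected-value objective in Equation 1.3 is evaluated, the code below averages a single-realization objective over an ensemble of realizations. Here `simulate_npv` and the toy ensemble are hypothetical stand-ins for a reservoir simulator and a set of geological models; the sketch only illustrates why each objective evaluation costs $N_e$ simulations.

```python
import numpy as np

def robust_objective(u, realizations, simulate_npv):
    """Stochastic-programming objective of Eq. 1.3: the mean of the
    single-realization objective J(u, m_i) over the ensemble.

    u            : control trajectory (array)
    realizations : iterable of N_e geological realizations m_i
    simulate_npv : callable (u, m_i) -> J(u, m_i); each call is one full simulation
    """
    values = [simulate_npv(u, m_i) for m_i in realizations]  # N_e simulations per evaluation
    return float(np.mean(values))

# Hypothetical usage with a toy function standing in for the simulator
rng = np.random.default_rng(0)
ensemble = rng.normal(size=(100, 4))          # 100 realizations of 4 uncertain parameters
toy_sim = lambda u, m: np.sum((u - m) ** 2)   # placeholder for J(u, m)
u0 = np.zeros(4)
print(round(robust_objective(u0, ensemble, toy_sim), 2))
```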
A significant challenge in robust control optimization lies in the substantial increase in computational
load. This is primarily due to the fact that each evaluation of the objective function necessitates simulations
among multiple realizations. Furthermore, the iterative nature of optimization requires multiple calls to
the objective function, making it even more resource-intensive. To address this challenge, approximate
gradient approaches [29, 50, 172] have been employed to alleviate the computational burden associated
with robust control optimization. Additionally, researchers have extensively explored machine learning-based surrogates [62, 122] as rapid-response alternatives to simulators. These approaches will be further
discussed in detail in subsequent chapters, and we will introduce another simple but efficient strategy in
Chapter 4.
1.1.5 Surrogate Modeling
Reservoir simulation [129, 49] plays a critical role in forecasting the intricate spatiotemporal evolution of
reservoir conditions, encompassing parameters like pressure and saturation distribution, flow rates at well
locations, and geomechanical stress distribution. At its core, reservoir simulation involves solving a set of
coupled partial differential equations (PDEs) [47] that describe the complex multiphase flow and transport
phenomena occurring within the reservoir. These PDEs typically incorporate terms representing fluid flow,
phase behavior, heat transfer, and sometimes even chemical reactions. Given the nonlinear and coupled nature of these equations, analytical solutions are often infeasible, necessitating numerical methods for their
solution. The numerical solution process begins with the discretization of the reservoir domain into a grid
system. This involves subdividing the reservoir into a finite number of control volumes or grid cells, each
representing a small portion of the reservoir volume. The governing PDEs are then approximated within
these grid cells, resulting in a system of algebraic equations. This discretization transforms the original
nonlinear PDE system into a series of large-scale linear or nonlinear algebraic equations, depending on the
solution method employed. Common approaches include finite difference [162], finite volume [120], and
finite element methods[156], each with its own advantages and limitations. Once the discretized equations
are formulated, iterative solution techniques are employed to solve the resulting system. One widely used
method is the Newton-Raphson method [190], which iteratively updates the solution until convergence
criteria are met. Achieving convergence in reservoir simulation can be challenging due to the highly nonlinear nature of the governing equations, the presence of discontinuities such as fluid interfaces, and the
complex geometry and heterogeneity of reservoir formations. As a result, small time step sizes and a high-resolution grid system are often preferred and sometimes inevitable [37], both of which largely increase the simulation
time, making reservoir simulation computationally very expensive, especially for large-scale and complex
reservoir models.
While a full-scale reservoir simulation offers a comprehensive analysis of reservoir status, it is not suitable for tasks requiring frequent simulations. For instance, optimizing well control involves systematically
seeking solutions with improved objective values at each iteration. In such cases, a detailed estimation of
pressure and saturation in every grid may not be necessary for quick decision-making.
To tackle this issue, fit-for-purpose proxy or surrogate models [137] can be developed. These models
are customized to provide fast, approximate forecasts for critical reservoir outputs, like well production
responses to changes in controls. By focusing on key parameters and simplifying underlying physical
processes, these proxies offer a computationally efficient alternative to full-scale simulations.
In subsurface flow modeling, a large number of techniques can be used to construct proxy models.
These proxy models serve as simplified representations of the underlying physics, enabling faster computations while retaining essential aspects of the system’s behavior. Three primary categories of proxy
models are commonly employed: reduced physical models, reduced-order models, and data-driven models.
For the reduced-physics model, a certain level of simplification or approximation is typically required.
One commonly used approach is the streamline method [169, 14]. In this technique, the flow dynamics
within a reservoir are depicted by a network of streamlines, which delineate the trajectories followed by
fluid particles under the prevailing pressure and velocity conditions. Streamline simulation is often much
faster than conventional finite-difference methods, making it particularly effective for simulating large,
heterogeneous reservoirs. Another famous approach, the capacitance-resistance model [146, 71, 145],
treats the reservoir as a dynamic system, likening its behavior to an electrical circuit. It draws analogies
between fluid flow and electrical current, where injection rates into the reservoir correspond to currents,
and reservoir pressures or saturations act as voltage potentials. The reservoir’s ability to store fluid is
represented as a capacitance, while its resistance to flow is modeled as an electrical resistance. This model
captures how fluid injection rates affect reservoir pressures and production rates, providing insights into
reservoir behavior and facilitating the evaluation of recovery strategies.
Reduced-order models (ROMs) are powerful techniques designed to alleviate the computational burden associated with solving the original system of nonlinear fluid flow equations. This is achieved by
projecting these equations onto a lower-dimensional space where the computations can be performed
more efficiently. One of the widely used ROM techniques for nonlinear systems is the proper orthogonal decomposition (POD) [111, 73]. POD aims to identify coherent structures within dynamical systems
by decomposing the system’s states into orthogonal basis functions that capture the dominant patterns
in the data, allowing the state space of the simulation model to be effectively represented by a limited number of leading (most significant) basis functions. This can drastically reduce the number of unknowns
that need to be resolved, thereby enhancing computational efficiency. Numerous studies [68, 22, 57] have
demonstrated the practical applications of ROMs. Another variant of ROM is the trajectory piecewise
linearization (TPWL) method [181]. This approach involves linearizing the governing equations and subsequently projecting the linearized form onto a low-dimensional space through orthogonal decomposition. TPWL approaches applied in reservoir simulation have been studied in multiple works; see [67, 66,
21, 186] for more details.
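To illustrate the projection idea behind POD, the snippet below builds a reduced basis from a synthetic snapshot matrix via the singular value decomposition and reconstructs one state from a handful of coefficients. The snapshot data and the energy threshold are arbitrary assumptions, so this is a sketch of the mechanics rather than a reservoir ROM.

```python
import numpy as np

rng = np.random.default_rng(1)
n_cells, n_snapshots, rank = 5000, 60, 5
# Synthetic snapshot matrix: each column is one simulated state (e.g., pressure on the grid)
snapshots = rng.normal(size=(n_cells, rank)) @ rng.normal(size=(rank, n_snapshots))
snapshots += 1e-3 * rng.normal(size=snapshots.shape)

# POD: left singular vectors of the mean-centered snapshot matrix are the basis functions
mean_state = snapshots.mean(axis=1, keepdims=True)
U, s, _ = np.linalg.svd(snapshots - mean_state, full_matrices=False)
energy = np.cumsum(s**2) / np.sum(s**2)
r = int(np.searchsorted(energy, 0.999)) + 1   # modes needed to capture 99.9% of the energy
basis = U[:, :r]                              # n_cells x r reduced basis

# Project a full state onto r coefficients and reconstruct it
x = snapshots[:, [0]]
coeffs = basis.T @ (x - mean_state)           # r unknowns instead of n_cells
x_rec = mean_state + basis @ coeffs
print(r, float(np.linalg.norm(x - x_rec) / np.linalg.norm(x)))  # small relative error
```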
Data-driven models, in contrast to traditional physics-based models, often bypass the detailed physical
processes and instead employ various mathematical and statistical techniques to establish the input-output
relationships observed in training data. These models can achieve fast predictions without explicitly modeling the underlying physics. For example, the response surface methodology (RSM) [38, 121] has been
studied as a simple but powerful method. It involves designing experiments to vary input parameters
systematically and then fitting a statistical model (usually polynomial regression) to the results. In reservoir simulation, the RSM can be used to model complex relationships between reservoir properties (such
as porosity, permeability, and fluid saturations) and production performance, providing a fast prediction
tool that can be used for control optimization [43], history matching [154], and uncertainty quantification
[101]. Recently, machine learning-based surrogates have garnered significant attention for their ability
to efficiently model complex systems [164]. Various approaches have been explored, including Artificial
Neural Networks (ANNs) [82], Support Vector Regression (SVR) [62], Random Forests [187], and Gaussian
Processes [9].
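As a simple, hedged example of the data-driven category, the sketch below fits a second-order response surface (polynomial regression) that maps a few control variables to a simulated output using scikit-learn. The toy "simulator," the sample size, and the regularization are assumptions made for illustration and are not the surrogate models developed later in this dissertation.

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import Ridge

rng = np.random.default_rng(2)

# Hypothetical "simulator": maps a few control variables (e.g., injection rates) to an NPV-like output
def expensive_simulation(u):
    return 3.0 * u[:, 0] - 0.5 * u[:, 0] * u[:, 1] + np.sin(u[:, 2])

# Small training set, mimicking the limited number of affordable simulation runs
U_train = rng.uniform(0.0, 1.0, size=(50, 3))
y_train = expensive_simulation(U_train)

# Second-order response surface (polynomial regression) acting as a fast surrogate
surrogate = make_pipeline(PolynomialFeatures(degree=2), Ridge(alpha=1e-3))
surrogate.fit(U_train, y_train)

# The surrogate now gives near-instant predictions for candidate control settings
U_test = rng.uniform(0.0, 1.0, size=(5, 3))
print(np.round(surrogate.predict(U_test), 3))    # surrogate predictions
print(np.round(expensive_simulation(U_test), 3)) # "true" responses for comparison
```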
1.2 Scope of Work and Dissertation Outline
Since around 2012, when the ImageNet paper [99] was published and sparked the current wave of the AI
revolution, the incorporation of machine learning and deep learning techniques has become a major trend
in both academia and various industries. These techniques have had a profound impact far beyond the
field of computer science, influencing numerous other disciplines and sectors, including geoscience and
subsurface systems.
One major application of machine learning techniques in the field of subsurface systems is surrogate
modeling. Surrogate models use a certain amount of training samples to develop a proxy that can serve as
a quick predictive alternative to traditional reservoir simulators. This approach can potentially accelerate
many workflows that involve computationally heavy tasks such as control optimization, history matching,
and uncertainty quantification.
However, several specific aspects need to be considered for the successful implementation of these
surrogates, which are summarized below:
• Many powerful ML/DL approaches heavily rely on a large number of training samples. Unlike other
applications such as computer vision and natural language processing, obtaining training samples
for subsurface system applications usually requires running reservoir simulations, which is resource-intensive. This often limits the number of available training samples.
• The underlying physics and governing equations are implicitly embedded in the pattern of training
samples for subsurface system applications. Ideally, these should not be violated in the surrogate
model, but this is challenging because most ML/DL approaches operate as black-box systems. A
certain degree of interpretability in the surrogate model is desired to improve trustworthiness.
• Although the complexity of ML/DL models has significantly increased, it is not always true that more
complex models lead to better performance in subsurface systems. In some cases, simpler models
may actually perform better.
• The practical application of surrogate models often involves numerical optimization. Ensuring sufficient predictive accuracy of the surrogate throughout the optimization procedure is an important
consideration.
• Subsurface systems typically involve a significant amount of uncertainty, necessitating the consideration of multiple realizations to account for this variability. However, this can greatly increase the
required computational resources, potentially making the process prohibitive. Developing efficient
optimization approaches under geological uncertainty is critical for improving the practicality of
these models.
The key research objectives of this dissertation focus on developing efficient optimization algorithms
and workflows tailored for subsurface system applications, particularly addressing the aforementioned
considerations.
Chapter 2 presents a collaborative project with California Resources Corporation (CRC), which centers
on developing a surrogate model for rapid predictions in a large-scale, complex field located in Long Beach,
CA. This project highlights the importance of integrating even basic engineering insights to enhance the
reliability of the surrogate model. We explore how sparsity-promoting techniques and feature engineering
contribute to improving the model’s performance, particularly in estimating inter-well connectivity. Our
approach was applied in the field, yielding promising results that not only demonstrate sufficient prediction
accuracy but also align well with geological evidence.
Chapter 3 addresses practical challenges encountered during surrogate-based optimization. We propose an active learning framework that dynamically updates the surrogate model and the training sample
pool throughout the optimization process. This approach leads to improved optimization performance
and a significant reduction in computational load. We validate our framework through multiple applications, including field-scale waterflooding and Geological Carbon Sequestration (GCS), both consistently
demonstrating enhanced performance.
Chapter 4 tackles another critical and challenging aspect of subsurface system applications: geological
uncertainty. We introduce a reduced sampling approach inspired by stochastic gradient methods commonly used in machine learning. This chapter presents multiple case studies where the proposed approach
was applied, followed by a detailed discussion of its performance and effectiveness in addressing geological
uncertainty.
Chapter 5 provides a comprehensive summary of the key findings and conclusions of this dissertation.
Discussions on potential future research are also included, highlighting areas where further exploration would be valuable.
Chapter 2
Surrogate with Domain Insight
Reservoir simulation is essential for predicting and managing production performance, achieved by numerically solving coupled partial differential equations that adhere to established physical laws such as
mass balance and Darcy’s equation. These simulations are utilized in various stages of reservoir management, including production forecasting, history matching, uncertainty quantification, and production
optimization. Typically, executing these workflows requires running hundreds or even thousands of reservoir simulations, which are computationally intensive. The substantial computational load of reservoir
simulations is often the primary bottleneck for quick decision-making in reservoir management.
Although full-scale reservoir simulation offers a detailed and comprehensive approach for predictions,
many practical questions in daily field operations and management do not necessitate such extensive
simulations. Fit-for-purpose simulation models cater to the need for rapid decision-making by providing targeted answers to specific questions, thus reducing computational demands and streamlining the
decision-making process. In recent years, advances in machine learning and data science have garnered
significant interest across numerous scientific and engineering domains, including the petroleum industry
[153, 127, 117]. These data-driven techniques are often employed to develop predictive models, particularly in cases where traditional models are non-existent or too complex to construct. In many scientific
disciplines, system behavior is governed by well-established physical laws [23]. Key motivations for incorporating data science approaches include the incomplete understanding of physics and
physical properties, the technical complexity, cost, and expertise needed to develop physics-based models, and the substantial computational demands of large-scale simulations. Machine learning also offers
exciting potential for uncovering unknown phenomena within scientific fields [135, 140]. Data from physical systems inherently contain information about the underlying physics, and large datasets can expose
previously unidentified effects or relationships. These motivations are equally relevant for applying data
science methodologies to various aspects of reservoir engineering.
The use of data science techniques in developing non-physics-based surrogate models for reservoir
forecasting has shown promise due to their ability to extract and represent complex, nonlinear patterns
from data. A critical aspect of adapting data science and machine learning approaches to scientific and
engineering fields, including reservoir engineering, is the integration of domain knowledge and physical insights [11]. Algorithms that are purely data-driven and not customized for the specific application
may fail to realize the full potential of these methods, resulting in misleading or implausible predictions.
Another crucial factor in employing data-driven techniques for scientific applications is the availability,
quality, and content of data. Machine learning algorithms often depend heavily on data to predict system
behavior or outcomes, with the assumption that the data is adequate for capturing the necessary relationships and patterns. However, in some scientific and engineering applications, the cost and effort associated
with data acquisition often limit the available data. Conversely, incorporating domain insights and physical system constraints can significantly reduce the data requirements for machine learning algorithms,
enhancing their efficiency and effectiveness.
Another significant limitation of many machine learning models is their black-box nature and lack of
interpretability. Most machine learning techniques operate as black boxes, providing minimal insight into
the underlying behavior or properties of the system. This limitation can pose serious challenges when
deploying these methods in high-risk, real-world physical systems. The issue of interpretability remains
a significant hurdle in the adoption of machine learning models [26, 106, 58]. Incorporating physical constraints and domain knowledge can improve the formulation, design, and application of machine-learning
techniques in scientific domains, thereby enhancing their interpretability. Integrating prior knowledge
into the development of machine learning models can reduce the amount of data required and increase the
reliability of these methods. One of the pioneering works that incorporates physics (governing equations)
into neural networks is by Raissi et al. [136]. Inspired by their work, Physics-Informed Neural Networks (PINNs) [36, 98] and, more broadly, Physics-Informed Machine Learning (PIML) [88] have been extensively studied
and have shown promise in many fields. This approach not only makes the models more transparent but also
ensures they align more closely with established scientific principles and domain-specific insights.
In this chapter, we study a large-scale oil field in Long Beach, California. This field contains thousands
of active wells, presenting a significant challenge in building a surrogate model to assist with production
prediction. Drawing inspiration from the field’s local inter-well connectivity, we explored two approaches
to incorporate this domain expertise. The first approach employs sparse regularization as a soft constraint
to promote inter-well connectivity behavior. The second approach uses feature engineering to directly
impose the local connectivity characteristics of the field. In the following sections, we will detail how even
simple domain insights can substantially enhance the surrogate model’s predictive accuracy and interpretability,
while also reducing data requirements.
2.1 Introduction of Long Beach Unit
The Wilmington Oil Field (shown in Figure 2.1) [6, 69], discovered in 1936, is one of the most productive oil fields in California and is situated along the Newport-Inglewood Fault zone. Spanning depths of
approximately 1,600 to 10,000 feet, the field primarily consists of Miocene/Pliocene formations. It originally had oil gravities ranging from 13 to 35 °API, indicative of varying oil quality within the field. Over its
extensive operational history, more than 7,000 wells have been drilled in the Wilmington Oil Field, making
it a significant contributor to California’s oil production.
The Wilmington Oil Field is divided into several units for better management, one of which is the
Long Beach Unit. This unit represents the eastern extension of the field and is a key area of focus due
to its prolific production capacity. Currently, the Long Beach Unit is operated as a mature waterflood
field, a common enhanced oil recovery method that involves injecting water into the reservoir to maintain
pressure and stimulate production.
In this case study, a proxy model is developed to predict the oil production performance in large-scale mature fields with many wells. Specifically, the proxy model will provide a mapping from input
well controls to total oil production at each producer. The data required to train the model are obtained
from an existing numerical reservoir simulator for the field. The main objective is to replace the complex
reservoir simulator for such fields with fast proxy models that provide real-time predictions under new
production scenarios. The developed model can be deployed and implemented by field reservoir engineers
and operators to support day-to-day operation and management decisions. We focus on building a proxy
model that can achieve fast prediction of short-term (1-year) oil production in this field. In particular, we focus on the
Ranger West zone of the Long Beach Unit (Figure 2.1). We consider 637 wells in this zone,
including 281 water injectors and 356 producers. The simplified spatial distribution of all the wells is
shown in Figure 2.2.
Figure 2.1: Geographic location of the Long Beach Unit.
The training data in this example come from an Eclipse E300 simulation model and consist of well
production quantities pertaining to injectors [rate control (STB/D)] and producers [BHP control (psi)].
The target outputs are the oil production at each of the 356 producers.
The following rules are applied in generating the training data:
1. All the generated training data honor the production history of each well, and the annual changes of
well controls are used to generate well control values for future timesteps. For example, Figure 2.3
shows the injection history of four of the injectors (in blue) and the perturbation range to generate
new training data (red).
Figure 2.2: Simplified illustration of well configuration in the Ranger West part of the field (based on
wellhead locations).
2. For producers, an upper limit is established for the BHP values to ensure they remain below reservoir
pressure. Shut-in and reactivation periods are excluded, and it is assumed that no new wells will be
drilled during the 1-year simulation period.
3. The injection rates and production BHPs are used as the input (Figure 2.4 and Figure 2.5), whereas
the well oil production total (Figure 2.6) is the output. The injection rates exhibit different mean
and variance values in different wells, which is consistent with the historical data. In general, large
annual changes in the historical data (injection rates or production BHPs) lead to larger variance.
Figure 2.3: Examples of production history for four injectors (Inj) showing the range of variability.
Figure 2.4: Statistical summary of injection rate among 281 injectors.
Figure 2.5: Statistical summary of BHP among 356 producers.
Figure 2.6: Statistical summary of well oil production total (WOPT) among 356 producers. FOPT: Field Oil
Production Total.
2.2 Surrogate with Sparse Neural Network
2.2.1 Overview
In this section, we present a workflow for developing a neural network-based proxy model to predict
the production response of complex well configurations in mature reservoir models, incorporating reservoir engineering insights. Although the physical laws governing fluid flow in porous media are well-established, explicitly incorporating these laws into a neural network model is challenging. Instead, we
demonstrate that even general insights into subsurface flow systems can significantly enhance the training and prediction performance of neural network proxy models. Our approach consists of two stages: (1)
characterization of interwell connectivity using sparsity-promoting regularization and (2) development of
a multilayer neural network proxy model using the identified interwell connectivity information.
In the first stage of the workflow, a two-layer fully connected neural network is trained using data from
a simulation model. The trained weights of the resulting network represent the sensitivity of production
well responses to changes in injection controls and can be used to interpret the connectivity strength between injection and production wells. Training is performed by minimizing a conventional least-squares
data mismatch penalty function, augmented with a sparsity-promoting ℓ1-norm term applied to the network weights. The effectiveness of ℓ1-norm minimization for promoting sparsity has been established
through both theoretical and practical advances [20, 41, 44]. Applications of this technique in reservoir
engineering have been discussed in several history-matching papers [81, 102, 112]. In our problem, the
sparsity term, motivated by reservoir engineering insight, represents the fact that many available connections between wells are expected to be inactive (have zero weights) since any given production well is
supported by only a small number of nearby injection wells. The regularization term facilitates the training
process by constraining the well connection weights to be sparse. While inter-well connectivity represents
communication between injection and production wells, the connectivity inferred from our network model
describes the sensitivity of oil production to injection well controls. Therefore, in our examples, inter-well
connectivity is interpreted with reference to oil production.
Once the first stage of training is completed, the nonzero weights of the resulting fully connected
neural network define the network topology, which identifies crucial communication and connectivity
information within the field. In the second stage, we use the network topology identified in the first
stage to select the well control parameters that exhibit strong connectivity to each producer. The selected
supporting wells of each producer are then used to define the input feature space to predict the production
from the related producer. We then build a multilayer neural network model with nonlinear activation
functions. The main features of the proposed workflow are:
1. Gaining insight into interwell connectivity and using this information to predict well production
responses.
2. Offline training computation (using simulated data from a reservoir simulation model, which constitutes the main computational overhead of the framework).
3. Fast well production predictions for field management, without the need for the technical skills and
computational demand required to run full reservoir simulations.
For a better explanation, we will first introduce a brief history of compressed sensing and sparsity
regularization. Next, we will use a synthetic example to demonstrate the effectiveness of the developed
workflow. Finally, we will apply the proposed approach to the Long Beach Unit Field, performing a detailed
analysis to showcase the developed model’s prediction performance and its capability to characterize interwell connectivity.
Figure 2.7: Example of the two-layer input and output neural network model with four injectors, five
producers, and two timesteps for the control trajectories. In this case, the dimension of the input layer is
Nx = (4 + 5) × 2 = 18, whereas the output layer represents the total oil production with the dimension
equal to the number of producers Ny = 5.
2.2.2 Methodology
In this work, without any loss of generality, the input data correspond to flow rates at the injection wells
and bottomhole pressure (BHP) at production wells from different timesteps. The corresponding outputs
at each timestep include BHP at the injectors and oil and water flow rates at the producers. The main
prediction target in this work is the total oil production after 1 year of production at each producer.
Figure 2.7 illustrates a two-layer fully connected neural network representing the input and output
variables of the system. For instance, when four injectors and five producers are considered for two
timesteps, eight input nodes for the injectors and ten input nodes for the producers are used to represent
the injection/production controls in the input layer. The output layer will have five nodes corresponding
to the total production (over two timesteps) from each producer. The weights assigned to each connection
between an input node and an output node are the parameters of the network that must be tuned during
training (optimization).
2.2.2.1 Regularization
A central challenge in developing a neural network model that can generalize beyond the training data is
overfitting. Overfitting occurs when the trained model performs well during training but fails to generalize
to new, unseen data. A common approach to mitigate overfitting is regularization, where the unknown
parameters (weights) are constrained to conform to a prescribed prior model.
In regression (inverse) problems related to physical systems, regularization functions are typically inspired by prior knowledge about the properties of the solution and are used to improve solution feasibility.
For example, in a history-matching problem, where model parameters such as permeability are adjusted
to reproduce the past production response of a reservoir, the permeability values have physical meaning.
Consequently, regularization functions that constrain the estimated permeability map are inspired by its
expected attributes (often specified by geologists). For instance, when the permeability map is expected
to be smooth (exhibiting gradual changes in space), a spatial derivative function can be minimized along
with the data mismatch to promote solution smoothness.
However, many parametric machine learning models, including neural networks, are typically viewed
as black boxes. Therefore, the learnable parameters do not have any physical meaning or any expected
property to inform a corresponding regularization function. Instead, the primary function of regularization
in training neural networks is to alleviate overfitting issues, as the parameters of the network often tend to
be overly redundant. Some popular regularization methods in machine learning include data augmentation
[114], dropout [159], and early stopping [25], in addition to conventional regularization penalty functions
such as ridge regression [70], Tikhonov [59], and total variation regularization [126] that are used in inverse
problems.
In this work, the regularization approach is inspired by domain knowledge, and its primary function is
to eliminate implausible well-connectivity information. For this purpose, in the first stage of the workflow,
we formulate a two-layer fully connected neural network to extract the relevant input-output connections
or topology as shown in Figure 2.7. To this end, we augment the training data mismatch objective function
with a sparsity-promoting regularization function (ℓ1-norm of the solution) that minimizes the support
(nonzero elements) of the solution. To illustrate the efficacy of this regularization, we compare it with
the ℓ2-norm regularization, which minimizes the solution length. Below, we briefly review the ℓ2- and
ℓ1-norm regularization forms. It is important to note that the objective of this first stage is not to describe
the exact production dynamics but to use a linear approximation that enables the detection of important
input-output connections. The identified connections will then be used in the second stage to develop a
more complex proxy model to capture the production dynamics.
One of the simplest and most common forms of regularization is the ℓ2-norm minimization, which minimizes the length of the solution and is known as ridge regression or zero-order Tikhonov regularization.
The objective function of a general ℓ2-norm regularized formulation takes the form:
J˜(w; X, y) = J(w; X, y) + α∥w∥2 (2.1)
where X and y are the inputs and outputs in the training data, respectively; w represents the weights
of the network and J(w; X, y) stands for the data mismatch term of the objective function. The notation α
refers to the regularization parameter, which is used to control the importance of each term in the objective
function.
Changing the penalty function from ℓ2 to ℓ1 norm, that is,
J˜(w; X, y) = J(w; X, y) + α∥w∥1 (2.2)
this results in a significant change in the behavior of the solution. Minimizing the ℓ1-norm of the solution
promotes solution sparsity, by minimizing the support of the solution. The ℓ1-norm regularization is also
known as LASSO Regression (Least Absolute Shrinkage and Selection Operator) [170]. Theoretically, to
promote sparsity, one needs to minimize the ℓ0-norm (a loose use of the term, as ℓ0 is not a legitimate
norm) of the solution, that is,
J˜(w; X, y) = J(w; X, y) + α∥w∥0 (2.3)
where ∥w∥0 computes the number of non-zero elements in w. The use of ℓ0-norm leads to a very complex
and intractable NP-hard optimization problem. An extensive literature exists that provides theoretical
results as to why and under what conditions ℓ1-norm minimization (known to have sparsity-promoting
properties) provides identical solutions to those obtained by ℓ0-norm minimization [20, 41, 44]. The use of
ℓ1-norm instead of ℓ0-norm results in a convex approximation and transforms the original problem from
an intractable optimization to a straightforward linear programming problem that can be solved efficiently.
In the event that the data mismatch function is also a convex function (e.g., linear inverse problems), the
resulting function becomes convex and has a unique solution.
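To make the practical difference between the two penalties concrete, the following minimal sketch (written in Python with scikit-learn, using purely illustrative dimensions and regularization strengths that are not the values used in this work) fits both an ℓ2-regularized (ridge) and an ℓ1-regularized (LASSO) linear model to data generated from a sparse ground truth; the ℓ1 penalty drives most coefficients exactly to zero, whereas the ℓ2 penalty only shrinks them.

```python
import numpy as np
from sklearn.linear_model import Lasso, Ridge

rng = np.random.default_rng(0)

# Illustrative sparse ground truth: 200 coefficients, only 10 of them non-zero
n_features, n_nonzero, n_samples = 200, 10, 80
w_true = np.zeros(n_features)
w_true[rng.choice(n_features, n_nonzero, replace=False)] = rng.normal(size=n_nonzero)

# Underdetermined training data (fewer samples than unknowns)
X = rng.normal(size=(n_samples, n_features))
y = X @ w_true

# l1-regularized (LASSO) versus l2-regularized (ridge) least squares
lasso = Lasso(alpha=0.05, fit_intercept=False, max_iter=50_000).fit(X, y)
ridge = Ridge(alpha=1.0, fit_intercept=False).fit(X, y)

print("non-zero coefficients, LASSO:", int(np.sum(np.abs(lasso.coef_) > 1e-6)))
print("non-zero coefficients, ridge:", int(np.sum(np.abs(ridge.coef_) > 1e-6)))
```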
The motivation for promoting a sparse solution in this work stems from the reservoir engineering
insight that, in a complex field configuration with many wells, it is unlikely that injection wells located
far from a given producer will support its production or exhibit meaningful connectivity. Conversely, it
is far more likely for neighboring wells to communicate with a production well. Therefore, it is clear
that many of the injection wells distant from a given producer should not have connectivity with it. By
promoting a sparse connectivity pattern in the network while matching the training data, the objective
is to identify which subset of wells actively contributes to a particular producer. It is important to note
that if the network is not provided with information about the sparsity of the weights, it will have to
learn the sparse nature of the weights solely from the training data. This process is not trivial and would
require a substantial increase in the amount of training data. Promoting sparsity in the connectivity pattern
helps to overcome this challenge, making the network more efficient in learning the relevant inter-well
connections.
In waterflood operations of mature reservoirs, producers are likely to show strong connections with a
few surrounding wells and weak connections with distant wells. If the well configuration is represented
as a graph, where each vertex (V ) represents a well, and each edge (E) stands for one possible connection,
the number of connections will increase quadratically as the number of wells grows, that is, E = O(V²).
From domain expertise, it is more likely that a given producer is supported by its neighboring wells. At the
same time, the existence of large-scale channels and other geologic features may violate this assumption.
However, in a large oilfield with hundreds of wells, even when large-scale geologic features are present,
the spatial density of the wells limits the communication mainly to nearby wells. This implies that the
vector of weights assigned to the connections is sparse and only a small fraction of the connections have
non-zero weights.
In the next section, it is shown that promoting a sparse solution improves the performance of the network in inferring the inter-well connectivity patterns and prediction accuracy. The recovered connectivity
patterns from the network weights tend to identify the transmissibility trends and possible flow barriers in
the reservoir. The reconstructed inter-well connectivity patterns can be interpreted based on the distance
and fault barriers, and the method is extendable to other applications such as infill drilling.
2.2.2.2 Problem Formulation
Considering a data set D of N instances:

D = {(u^(1), y^(1)), (u^(2), y^(2)), . . . , (u^(N), y^(N))}    (2.4)

with

u^(k) = [u_inj^(k), u_prod^(k)],   u_inj^(k) ∈ R^{(N_I T)×1},   u_prod^(k) ∈ R^{(N_P T)×1},   y^(k) ∈ R^{N_P×1}

where u_inj^(k) represents the vectorized injection rates in all injectors, u_prod^(k) stands for the vectorized BHPs for all producers, and N_I, N_P are the total numbers of injectors and producers, respectively. The notation T refers to the number of time steps during reservoir simulation and y^(k) represents the vectorized total oil production of the N_P producers.
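For illustration only, the sketch below shows one way a single control vector u^(k) could be assembled from the injection-rate and producer-BHP trajectories; the well counts follow the LBU case described above, while the number of timesteps and the sampled values are assumptions.

```python
import numpy as np

# Well counts from the LBU case; T = 12 monthly timesteps is an assumption
N_I, N_P, T = 281, 356, 12
rng = np.random.default_rng(1)

inj_rates = rng.uniform(0, 5_000, size=(N_I, T))    # injection rate controls (STB/D)
prod_bhps = rng.uniform(500, 2_000, size=(N_P, T))  # producer BHP controls (psi)

# u^(k): vectorized injection controls stacked on vectorized producer controls
u_k = np.concatenate([inj_rates.ravel(), prod_bhps.ravel()])
assert u_k.shape == ((N_I + N_P) * T,)
```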
Each instance (u^(k), y^(k)) represents a (feature, label) pair of training data samples generated by a reservoir simulator that captures the relationship between input well controls and the corresponding total oil production of each well. The objective of the neural network is to establish a mapping f that approximates this relationship. To find the mapping function f, an optimization problem is solved to minimize a least-squares error term, defined as the difference between the output data (labels) y^(k) and the neural network model predictions f(u^(k)), that is

min Σ_{k=1}^N ∥f(u^(k)) − y^(k)∥_2^2    (2.5)
In order to evaluate the accuracy of the generated model, the data set D is divided into three parts: D =
{Dtrain, Dval, Dtest}, where Dtrain is used to fit the model, Dval is the validation dataset used for unbiased
evaluation of the model while tuning the model hyperparameters, and Dtest is the test dataset
to evaluate the performance of the trained model when applied to new unseen data. A successful model
should have a small error in predicting both seen and unseen data.
In the first stage of training, using a linear relationship between u and y, we can write the mapping function f as a linear operator G, where G ∈ R^{(N_I T + N_P T)×N_P}, and the minimization problem can be written as

min Σ_{k=1}^N ∥G u^(k) + b − y^(k)∥_2^2    (2.6)
A linear assumption about the mapping may not be sufficient for capturing the input-output relationship
exactly, especially when such relations are known to be nonlinear (based on the physics of the problem). However, a linear assumption is still effective as a first-order approximation to detect important
features/connections, which is the objective of the first stage, and can be helpful for interpretation purposes. The entry Gij within the linear operator G could potentially explain the communication strength
between well pairs. Therefore, we can visualize inter-well connectivity by interpreting the entries of G
based on reservoir engineering insights.
To promote sparse solutions, the ℓ1-norm of the solution is added to the optimization objective function
as a regularization term, resulting in
min_G Σ_{k=1}^N ∥G u^(k) + b − y^(k)∥_2^2 + α∥G(:)∥_1    (2.7)
where G(:) denotes a vectorized version of the matrix G and is expected to have very few non-zero (significant) entries. During training, the above optimization problem is minimized by tuning the weights of the
neural network (elements of G). The resulting solution G is informative and can be used to characterize
the important inter-well connections that control the total oil production. By setting a threshold value ε,
we can convert G to an adjacency matrix A, as follows:

A_ij = 1 if |G_ij| > ε, and A_ij = 0 otherwise    (2.8)
The adjacency matrix A is a sparse matrix with binary elements that can be viewed as a mask that defines
the topology of the neural network layer and, therefore, removes redundant connections between input
and output nodes. The adjacency matrix A can be used to describe the set of support wells (SW) for each
individual producer.
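As a small illustration of Eq. 2.8, the sketch below thresholds a (hypothetical) trained weight matrix into a binary adjacency mask and lists the support wells of each producer; the matrix, threshold, and dimensions are placeholders rather than values from the field study.

```python
import numpy as np

def support_wells(G, eps):
    """Threshold trained weights G (n_inputs x n_producers) into a binary
    adjacency mask A (Eq. 2.8) and list the supporting inputs per producer."""
    A = (np.abs(G) > eps).astype(int)
    supports = {j: np.flatnonzero(A[:, j]) for j in range(G.shape[1])}
    return A, supports

# Hypothetical first-stage weights for 18 control inputs and 5 producers
rng = np.random.default_rng(2)
G_hat = rng.normal(scale=0.01, size=(18, 5))
G_hat[[0, 3, 7], 2] = [0.8, -0.5, 0.3]   # producer 2 strongly connected to inputs 0, 3, 7

A, supports = support_wells(G_hat, eps=0.1)
print("support wells of producer 2:", supports[2])
```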
After identifying the significant connections in the first stage, the second stage of the workflow focuses on developing a proxy model based on these connections. The simplest form of this proxy model
involves retraining the same model after fixing the identified topology by removing inactive connections.
This retraining step is essential to mitigate the well-known shrinkage issue associated with ℓ1-norm regularization. A key difference between the ℓ0 and ℓ1 norms is that the ℓ0 norm does not depend on the
magnitude of its argument’s entries, whereas the ℓ1 norm does. This dependence can lead to underestimated coefficients when minimizing the ℓ1 norm. The extent of this underestimation is influenced by the
regularization weight, which must be carefully adjusted. To address this problem, the significant weights
are re-estimated by solving the following least-squares problem:
min_M Σ_{k=1}^N ∥(M ◦ A) x^(k) + b − y^(k)∥_2^2    (2.9)

Here, (M ◦ A) represents the Hadamard product of M and A, where A is the fixed (topology) mask identified in the first stage. After estimating the weights in M, the resulting predictive model is ŷ = (M ◦ A)x + b.
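A minimal sketch of this re-estimation step, under the simplifying assumption that each producer is refit independently by ordinary least squares on its surviving (masked) inputs, is given below; the array names and shapes are illustrative.

```python
import numpy as np

def refit_masked_linear(U, Y, A):
    """Second-stage re-estimation of unshrunk weights, producer by producer.
    U: (N, n_inputs) training controls, Y: (N, n_producers) targets,
    A: (n_inputs, n_producers) binary topology mask from the first stage."""
    M = np.zeros(A.shape, dtype=float)
    b = np.zeros(A.shape[1])
    for j in range(A.shape[1]):
        idx = np.flatnonzero(A[:, j])                         # support wells of producer j
        X = np.hstack([U[:, idx], np.ones((U.shape[0], 1))])  # append intercept column
        coef, *_ = np.linalg.lstsq(X, Y[:, j], rcond=None)
        M[idx, j], b[j] = coef[:-1], coef[-1]
    return M, b   # predictions: y_hat = (M * A).T @ u + b for a new control vector u
```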
However, the linear assumption in describing the well dynamics in the second stage can be improved
by adopting a more general nonlinear neural network model. Neural network models with multiple hidden
layers and nonlinear activation functions can capture the non-linearity in injection-production dynamics.
The adjacency matrix A identifies the set of support wells for each producer, which can be used to develop a
nonlinear proxy model to describe production behavior for that well. In this approach, the proxy model for
predicting oil production for producer Pj uses a vector (uj ) that includes a subset of control trajectories
supporting that producer, instead of using the original control trajectories u. The resulting regression
problem for producer Pj is then expressed as:
P̂_j = f(u_j; θ)    (2.10)
Figure 2.8: The two-stage workflow for building the neural network proxy model: the first stage involves
feature learning by promoting sparsity of the weights in a two-layer neural network model, while the second stage uses the connectivity patterns identified in the first stage to develop multi-layer neural network models with nonlinear activation functions to predict oil production at each producer.
where P̂_j is the predicted oil production for producer P_j, u_j is the vector of control trajectories supporting P_j, and θ represents the parameters of the nonlinear neural network model.
The corresponding training problem for producer P_j is

min_{θ_j} Σ_{k=1}^N ∥f_j(u_j^(k); θ_j) − y_j^(k)∥_2^2    (2.11)
The subscript j represents the set of support wells for producer Pj , and fj represents the nonlinear ANN
proxy model for producer Pj , with learnable parameters θj . The resulting workflow is shown in Figure
2.8, where a separate ANN model is constructed for each producer. These ANNs are conveniently trained
in a parallel architecture.
Before presenting the results from our numerical experiments, we provide a brief account of our training process, including hyperparameter tuning for the neural network. The neural network model in this
work was developed using the Python package Keras [61], and the Adam algorithm [93] was used as the
optimization approach for training. The main hyperparameters include the ℓ1-norm regularization parameter, the activation function (with possible candidates being “relu,” “tanh,” and “sigmoid”), the number
of hidden layers, the number of neurons in each layer, the learning rate defined in Adam, the batch size,
and the number of epochs. To tune these parameters, the GridSearchCV package [130] was used with
the K-fold (K = 5) cross-validation approach [55] to measure the average performance of each candidate
model.
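For reference, a minimal sketch of how the first-stage linear model with an ℓ1 weight penalty (Eq. 2.7) could be assembled in Keras and trained with Adam is shown below; the input dimension, regularization strength, learning rate, and training settings are placeholder assumptions rather than the tuned values used in this study.

```python
import tensorflow as tf

# Hypothetical input size: 637 well controls x 12 timesteps; 356 producer targets
n_inputs, n_producers = 637 * 12, 356

# Single linear layer whose kernel plays the role of G in Eq. 2.7;
# the l1 kernel regularizer promotes sparse inter-well connectivity.
model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(n_inputs,)),
    tf.keras.layers.Dense(
        n_producers,
        activation="linear",
        kernel_regularizer=tf.keras.regularizers.l1(1e-4),
    ),
])
model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=1e-3), loss="mse")

# Training would then use the simulated control/production samples, e.g.:
# model.fit(U_train, Y_train, validation_data=(U_val, Y_val), epochs=200, batch_size=32)
```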
2.2.3 Numerical Experiments
2.2.3.1 Synthetic Example
In this section, we present an illustrative example to explain the selection capability of sparsity-promoting
ℓ1-norm regularization. This example is further extended to demonstrate its application in identifying
inter-well connectivity patterns.
Consider the linear relationship
y = m^T x    (2.12)

A sparse vector m of dimension Nx = 512 with sparsity level K = 20 (indicating it has only 20 non-zero elements) is depicted in Figure 2.9 on the left. The task here is to reconstruct m using T = 120 observational data points, which is fewer than Nx = 512. These data points make up the training set D = {(x^(1), y^(1)), (x^(2), y^(2)), . . . , (x^(T=120), y^(T=120))}, where each x_i is generated from a normal distribution and each y_i is calculated from y = m^T x, using the true values of m. The problem can be modeled using a straightforward two-layer neural network, where the input layer has 512 dimensions and the output layer has 1 dimension, utilizing a linear activation function f(x) = x without any bias terms. In this model, m serves as the weights for the connections between nodes and is estimated during the training process with the objective function specified below.

min_m Σ_{i=1}^n ∥m^T x_i − y_i∥_2^2 + ∥m∥_1    (2.13)
One important observation from the solution is the shrinkage in the magnitude of estimated parameters compared to their correct values. While the correct indices of non-zero entries are identified, the
contributions of these non-zero entries are underestimated. This can be attributed, in part, to the choice of
regularization parameter and/or to the small T value (number of measurements). As stated in the previous
section, the main difference between ℓ0 and ℓ1 functions is that the latter depends on the magnitude of the
entries of the vector it operates on, whereas the former only depends on the support of the vector.
To address the above issue, a two-stage workflow is proposed. In the first stage, the previous formulation is applied to identify the sparse solution with possibly underestimated entries. In the second stage,
the identified sparsity is fixed by eliminating insignificant elements/weights. This step is equivalent to
retaining only important connections in the neural network, which define a final network topology. The
refined network is then trained without any regularization. A simple way to describe the two steps is to
consider the first step as selecting the parameters (network connections) and the second stage as estimating
or tuning the selected parameters.
The result of the two-stage workflow is shown in Figure 2.9: the correct non-zero entries are identified and the contributions of each element (connection weights) are accurately estimated.
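A compact sketch of this two-stage treatment on the synthetic problem, assuming scikit-learn's Lasso for the selection stage and an unregularized least-squares refit on the identified support for the second stage, is given below; the regularization strength and selection threshold are illustrative choices.

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(3)
n_features, n_nonzero, n_samples = 512, 20, 120

# Synthetic sparse reference vector m and training data, mirroring the example above
m_true = np.zeros(n_features)
m_true[rng.choice(n_features, n_nonzero, replace=False)] = rng.normal(size=n_nonzero)
X = rng.normal(size=(n_samples, n_features))
y = X @ m_true

# Stage 1: l1-regularized fit identifies the support but shrinks the magnitudes
stage1 = Lasso(alpha=0.02, fit_intercept=False, max_iter=100_000).fit(X, y)
support = np.flatnonzero(np.abs(stage1.coef_) > 1e-6)

# Stage 2: unregularized least-squares refit restricted to the selected support
m_refit = np.zeros(n_features)
coef, *_ = np.linalg.lstsq(X[:, support], y, rcond=None)
m_refit[support] = coef

print("selected entries:", support.size,
      "refit error:", float(np.linalg.norm(m_refit - m_true)))
```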
Figure 2.9: Performance of the ℓ1-norm minimization as a sparsity-promoting regularization for feature
selection: (a) the objective function during the training process; (b) the reference and estimated parameters. The reference parameters have a dimension of 512 with only 20 elements being non-zero. The use of sparsity regularization leads to the elimination of inactive connections and the selection of relevant connections in the reference parameters. While the significant parameters are correctly identified, their magnitudes are underestimated. The underestimation effect is alleviated by the second stage of training (re-estimating the significant coefficients through a least-squares minimization).
Now we consider a 2D example, in which the dimension of y is increased from 1 to Ny. The feature
selection mechanism of ℓ1-norm minimization is exploited to infer the connectivity pattern in a many-to-many regression problem. The new model has the form:

y = M^T x    (2.14)
In this instance, the true M is derived by creating a distributed set of circles (symbolizing production
wells) and triangles (symbolizing injection wells). Random connections are then generated from each
circle (producer) to H neighboring triangles, where H is sampled from a uniform distribution between 5
and 10. The resulting map resembles an inter-well connectivity map and is depicted in Figure 2.11 (left).
This map can be conceptualized as a graph with a vertex set V , and an edge set E, where eij = 1 if an edge
exists between vi and vj , otherwise, eij = 0. Additionally, we define eii = 1 and assign the true values of
the weights as follows:
m_ij = w_ij if e_ij = 1, and m_ij = 0 if e_ij = 0    (2.15)
For this example, T = 140 training data points are generated, represented as
D = {(x^(1), y^(1)), (x^(2), y^(2)), . . . , (x^(T=140), y^(T=140))}

where each x_i is generated from a normal distribution and each y_i is calculated using y = M^T x. The dimensions are Nx = 637 and Ny = 356.
To model this system using a neural network, a simple two-layer neural network is employed, with
input and output layer dimensions of 637 and 356, respectively. As with the previous setup, the activation
function is linear, f(x) = x, and no bias term is included. The optimization problem to tune the weights
of the neural network is formulated as follows:
min_M Σ_{i=1}^n ∥M x_i − y_i∥_2^2 + ∥M∥_1    (2.16)
The training process and the recovered connectivity map are shown in Figure 2.10. The connectivity pattern is recovered accurately even though the problem is heavily underdetermined (the number of
parameters is around 0.2 million while only 140 training samples are used).
In the training process, the neural network assigns connectivity strength between input neurons and
output neurons. To underscore the effectiveness of ℓ1 regularization, the solution is compared with those
obtained from ℓ2 regularization and from the input-output cross-covariance information (Figure 2.11). The
solution from ℓ1 regularization reconstructs the connectivity in the true model by eliminating insignificant connections. On the other hand, while capturing some of the important connections, the results
from ℓ2 regularization cannot eliminate irrelevant connections and lead to an inaccurate solution. The
solution obtained from the cross-covariance method is also non-sparse. The poor performance of the
cross-covariance approach can be attributed to the small number of samples (140), which is not sufficient
for calculating the exact cross-covariance matrix between inputs and outputs.
2.2.3.2 Surrogate Modeling of LBU
In this section, we demonstrate the performance of the proposed approach using 500 training data points
and a linear proxy model, utilizing the R² score to measure the prediction performance. Since we have multiple targets (producers), the average R² score of all producers is used to gauge the overall prediction accuracy. The colors in Figure 2.12 (a) and Figure 2.12 (b) represent different producers, totaling 356. The R² score after the first stage of training is 0.937, which increases to 0.959 after the second stage. The recovered connectivity patterns after the first and second stages of training are shown in Figure 2.12 (c) and
Figure 2.10: Top: training process. Bottom: the performance of the connectivity detection with sparsity-promoting regularization: (a) the objective function during the training process; (b) comparison of the reference and recovered connectivity maps. The reference map shows a synthetic connectivity example with a total of 281 triangles and 356 circles. Each circle is connected to 5-10 of its neighboring points. The estimated map on the right shows the red lines in the background representing the reference connectivity map and the recovered connectivity map in green, showing the exact recovery of the connectivity map.
Figure 2.11: The connectivity patterns detected with regularization using the ℓ1-norm (left), ℓ2-norm (middle), and covariance matrix (right). The recovery with the ℓ1-norm minimization leads to perfect reconstruction of the reference map.
Figure 2.12 (d), respectively. The connectivity information reveals that most producers are primarily supported by their neighboring injectors and exhibit decreased connectivity with increasing distance. During
the training process, no information regarding the well coordinates or their spatial configuration was provided. Nevertheless, after training, the proxy model reveals important inter-well connectivity information
extracted from the data.
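As a brief note on how the summary metric reported here could be computed, the sketch below averages per-producer R² scores with scikit-learn; the label and prediction arrays are hypothetical stand-ins, not results from the field model.

```python
import numpy as np
from sklearn.metrics import r2_score

# Hypothetical test labels and proxy predictions: (n_samples, n_producers)
rng = np.random.default_rng(4)
y_true = rng.normal(size=(100, 356))
y_pred = y_true + rng.normal(scale=0.2, size=y_true.shape)

# Per-producer R2 scores and their average (the overall accuracy measure used here)
r2_per_producer = r2_score(y_true, y_pred, multioutput="raw_values")
print("average R2 over producers:", float(r2_per_producer.mean()))
```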
Since the exact inter-well connectivity map is unknown, it is challenging to validate the results. However, geological evidence, including two prominent faults within the Ranger West region (Figure 2.13), can
be used to assess the results. Figure 2.13 shows the connectivity map detected by the proxy model. The
two highlighted areas in the inter-well connectivity map correspond to the two faults identified in the
reservoir. The model successfully captures the lack of connectivity based on the input-output training
data, without any prior information about the faults or the spatial configuration of the reservoir.
To underscore the effectiveness of ℓ1-norm regularization, the results are compared against those obtained from applying no regularization and ℓ2 regularization. Figure 2.14 presents a summary of this
comparison. When no regularization is applied, the model exhibits strong overfitting, and the recovered
Figure 2.12: The results of two-stage workflow (linear proxy model): (a) prediction accuracy after the first
stage of training; (b) prediction accuracy after the second stage of training; (c) recovered connectivity map
after the first stage of training; and (d) recovered connectivity map after the second stage of training.
Figure 2.13: Inter-well connectivity map detected by the proxy model, shown against geological evidence; the two highlighted areas correspond to the two prominent faults identified in the Ranger West region.
connectivity map is overly dense and non-physical. With ℓ2 regularization, overfitting is slightly improved,
but the recovered connectivity map still includes distant connections that do not appear to be realistic, leading to very low prediction accuracy (R² score = 0.459). In contrast, the result from the proposed approach
using ℓ1 regularization shows the best performance among the three methods.
2.2.3.3 Nonlinear Effect
In this section, we compare the performance of linear and nonlinear proxy models across various training
data sizes. Figure 2.15 displays the R² score results for different training data sizes, up to 500 samples. Overall, the nonlinear proxy model outperforms the linear model, with the difference becoming more pronounced when the training data is limited (fewer than 200 samples in this example). The average R² score provides a summary measure but does not reveal the details of prediction accuracy for individual producers. The accompanying maps in the figure illustrate the spatial distribution of individual R² scores for cases with 75, 200, and 500 training samples. In general, the majority of producers achieve an R² score higher than 0.90. Additionally, the maps highlight the differences between the linear and nonlinear proxy models, showing the locations of producers with larger discrepancies. It is evident that as the number of training samples increases, the advantage of the nonlinear model diminishes. However, even with larger training data sizes, a few producers still exhibit improved R² scores in the nonlinear case.
In addition to the R² score, we also evaluated the prediction performance using the root mean square error (RMSE) metric. Figure 2.16 shows the average RMSE comparison for all producers. A similar trend to the R² score curve is observed, with the nonlinear proxy model outperforming the linear proxy model. Figure 2.16 also plots the individual RMSE values for each producer.
Our results suggest that incorporating nonlinearity can enhance the prediction performance of the
proxy model, particularly when training data is limited. However, with sufficiently large datasets, the performance of the linear model appears to converge with that of the nonlinear model. This could be related to
Figure 2.14: Comparison between results from different regularization techniques: (a),(d),(g) results with
no regularization; (b),(e),(h) results with ℓ2 regularization; and (c),(f),(i) results with ℓ1 regularization.
Figure 2.15: Performance of the linear and nonlinear predictive models for different training data sizes. FS_L: linear proxy model. FS_NL: nonlinear proxy model. Generally, the nonlinear proxy model outperforms the linear model. As the amount of training data increases, the performance of the linear and nonlinear models tends to converge. The spatial distribution of R² scores is also shown for 75, 200, and 500 training samples. The results indicate that when data is limited, the nonlinear model tends to perform better.
the specific examples considered, such as a mature field with a large number of producers currently experiencing high watercut, or the short-term nature of the prediction, where the pressure and saturation map
of the reservoir may not change significantly during the prediction period. Nonetheless, existing literature
suggests that linear input-output models can explain a significant portion of the injection-production dynamics in a reservoir model with a fixed well configuration, particularly in production control optimization
scenarios.
Figure 2.16: Average RMSE of the linear (FS_L) and nonlinear (FS_NL) predictive models. The nonlinear proxy model shows superior performance compared to the linear model, particularly when fewer training data are available.
2.3 Feature Engineering
2.3.1 Overview
In the last section, we explored applying sparse regularization as a soft constraint to encourage solutions in which each producer is supported by only a limited number of wells. Another common approach to incorporating specific domain
knowledge is through feature engineering [196]. Feature engineering is a comprehensive process for preparing the inputs of ML models in a way that achieves improved prediction performance. Feature selection is
one major component of feature engineering, in which features are first selected/filtered from the raw input
instead of preserving all of them. In many practical situations, especially when building a fit-for-purpose
model, numerous features may exist, but only a small subset may be relevant [64]. Feature selection focuses
on constructing and distilling subsets of features that are useful for building a good predictor. For example,
to predict the production performance of a particular producer in a large-scale reservoir with hundreds
of wells, providing information on all other wells may not be necessary. This is because inter-well connectivity strength between distant wells is usually weak and negligible. Therefore, including a subset of
wells with strong communication with a given producer may suffice for predicting its production behavior.
Selecting this subset of wells for predicting production performance is a feature selection problem that can
significantly enhance the efficacy of the training process and predictive model development.
This process automatically eliminates implausible solutions that cannot be supported by engineering
knowledge about the system, tailoring the solution to the application of interest. The three main objectives
in feature selection are improving prediction performance, increasing computational efficiency, and improving the interpretability of the predictive model [63]. Feature selection can be done either automatically
or manually. Automatic feature selection relies on algorithms designed to extract relevant features from
data, a process known as feature learning. An example is learning the dominant geological patterns from
a set of prior model realizations as training data and using the learned features in inference and model
calibration problems [91]. When available data is limited and insufficient to support automatic feature
learning (which is also the situation in this study), domain knowledge and expertise can be used to perform feature selection manually. In this case, expertise and prior knowledge are used to identify the most
relevant features. In the following section, we will introduce the idea of a ’window’ to perform feature
selection in our LBU case; it is designed to include only wells within a certain distance of each producer;
details will be presented later.
Although incorporating domain knowledge is key to maximizing the utility of ML/DL models, it is
also important to acknowledge, quantify, and incorporate the uncertainty in the domain knowledge itself,
which can result from imperfect understanding and characterization of a system. For instance, knowing
that a particular producer in an oil field is more likely to have strong communication only with its neighboring wells can help eliminate many distant wells when predicting its production. Yet, it may not be clear
which nearby wells exhibit strong communication. Moreover, due to geological complexity, a distant well
may, at times, exhibit stronger communication with a producer than nearby wells. This phenomenon is
particularly evident in geothermal applications, where fluid flow can be fracture-dominated [13]. Therefore, while integrating domain knowledge can significantly enhance the performance of ML/DL models,
it is crucial to remain aware of the uncertain nature of such knowledge and the risk of introducing bias
or error if the uncertainty is not properly accounted for. To mitigate the bias introduced by the size of
our predefined window, we refined our approach to include multiple windows for each producer. This
treatment is inspired by and similar to ensemble learning, where multiple predictors are trained and combined to improve prediction accuracy. By using multiple windows, we capture a more comprehensive set
of support wells, enhancing the robustness of our predictions.
2.3.2 Methodology
2.3.2.1 Support Region for Production Wells
We now consider the prediction of total oil production from a particular producer y_i, where i denotes the index of the target producer. From reservoir engineering insight, the production response of the producer is more likely influenced by its neighboring wells than by wells that are far away. Hence, it is not necessary to include all connections and their related controls u in predicting the performance of producer y_i. Therefore, only a small subset of the global connections and their well controls u that fall within a certain distance from the target producer may be sufficient for predicting y_i. This implies that with unlimited training data, if the global feature set U = (u_1, u_2, . . .) is replaced with a small subset U_s, a good approximation of the predictive model can be achieved.
If the subset of features Us is known exactly, the amount of data needed for training is significantly
reduced. However, this optimal subset of connectivity is problem-specific and depends on field/well configuration, the geologic setting, and reservoir properties. Therefore, the prior knowledge about the optimal subset of connectivity features to include in predicting the production from a given well is uncertain.
Moreover, the number of possible subsets grows exponentially with increasing dimensionality of the input
features, making an exhaustive search for such an optimal subset a computationally prohibitive combinatorial problem. Nonetheless, an approximate subset can be sought to significantly reduce the number of
features, and hence the data needed and the computational demand of the method.
In this work, instead of discovering the optimal subset of features (i.e., the network topology or active
well connections), we infer it by combining domain knowledge and the available training data. A suitable
set of candidates for the optimal subset of connections can be obtained based on domain knowledge. In this
case, the guiding principle is that each producer is expected to have stronger communications with nearby
wells than distant wells. More specifically, in large fields with many wells, a large number of connections
to a particular producer can be excluded with a high level of confidence. Exceptions to this principle
include the presence of faults that can result in a lack of communication between nearby wells on two
sides of a fault or large-scale geological features (fluvial channels) that can result in strong communication
between distant wells. In the presence of such geological features, caution should be exercised when using
distance as the only metric to inform potential communication strength. With this assumption, one can
define a region of influence (or support) around each producer and only include connections between the
producers and the wells within that region. An immediate question in this approach is how to select the
distance/region of influence, as this parameter is generally unknown and different for each producer. We
address this issue by including uncertainty in defining the support region for each producer and using the
training data to estimate it for each producer.
Figure 2.17: Distribution of wells in a complex oil field showing an example of the region of influence for
a given (center) producer as well as the included and excluded support wells.
As shown in Figure 2.17, we first transform each well location into a point in a 2D coordinate system
based on Surface Hole Locations (SHLs), then for a particular producer, we draw a square (window) that
is centered at that producer. All wells located inside the window are defined as adjacent wells; conversely, wells outside the window are defined as nonadjacent wells and are not used for the production prediction of that particular producer. Once we generate a window, it extracts part of the features from the original feature space; only control trajectories that come from adjacent wells are kept, and the others are removed. We use an operator W(u) to represent the filtering operation of the window. For example, if N_Iadj injectors and N_Padj producers are inside the window, then W(u) ∈ R^{N_Iadj T + N_Padj T}. Introducing such a window largely decreases the dimension of the input space and, therefore, could reduce the data
needed to train a good predictive model.
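As a concrete illustration, the following is a minimal NumPy sketch of such a windowing operator; the helper name window_filter, the assumed array layout (T control steps per well, stacked well by well), and the square-window membership test are illustrative assumptions rather than the exact implementation used in this work.

```python
import numpy as np

def window_filter(u, well_xy, center_xy, half_width, T):
    """Illustrative W(u): keep only the control trajectories of wells that fall
    inside a square window of half-width `half_width` centered at a producer.

    u         : flat control vector of length n_wells * T (T steps per well)
    well_xy   : (n_wells, 2) array of well surface-hole locations
    center_xy : (2,) location of the target producer
    """
    u = np.asarray(u).reshape(len(well_xy), T)               # one row per well
    inside = np.all(np.abs(well_xy - center_xy) <= half_width, axis=1)
    return u[inside].ravel()                                  # W(u), dimension (NIadj + NPadj) * T

# toy usage: 5 wells with 10 control steps each
xy = np.array([[0, 0], [10, 0], [60, 5], [5, 8], [90, 90]], dtype=float)
u = np.random.rand(5 * 10)
print(window_filter(u, xy, center_xy=xy[0], half_width=15.0, T=10).shape)   # (30,)
```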
By introducing a series of windows to account for the uncertainty in the support region around each
producer, we transform the original multitarget regression problem into multiple single-target regression
problems. In each of these problems, information from adjacent wells is used to predict the producer’s
production performance at the region’s center. We refer to each single-target regression problem as a task.
Since each task focuses on predicting the production of one producer, the total number of tasks is NP .
Each task solves the following minimization problem.
\[ \text{Task}(i): \quad \min \; \sum_{k=1}^{N} \left\| f_i\big(W_i(u^{(k)})\big) - y_i^{(k)} \right\|_2^2 \tag{2.17} \]
where W_i represents the generated window for the i-th task (or producer), and i = 1, 2, ..., N_P.
The size of the generated windows determines the quality of the predictive models. If the window
size is too large, the input space will contain a large amount of redundant data, potentially introducing
unnecessary noise to the learning algorithm. Conversely, a window size that is too small may exclude
important relevant wells, thereby decreasing prediction performance. Additionally, due to the complexity
of the geological structure and the distribution of wells, an optimal window size can vary for different
producers. Thus, fixing the window size for all producers is likely not a good choice. To mitigate the
risk of unsuitable window sizes and provide variability for different producers, we generate windows in a
stochastic manner. Multiple windows are generated for each producer, as shown in Figure 2.18. For each
task, instead of having a single window W_i, we have W_{ij}, where i stands for the producer index and j stands for the window index for that producer. Consequently, for a single producer, an ensemble of models
(or learners) is constructed, expressed as follows:
\[ \text{Task}(i),\ \text{Learner}(j): \quad \min \; \sum_{k=1}^{N} \left\| f_{ij}\big(W_{ij}(u^{(k)})\big) - y_i^{(k)} \right\|_2^2 \tag{2.18} \]
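A hedged sketch of this ensemble-of-learners construction is given below, reusing the illustrative window_filter helper introduced earlier; the scikit-learn MLP, the single train/validation split, and the uniform sampling of window sizes are assumptions made for illustration and do not reproduce the exact training setup of this work.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import r2_score

rng = np.random.default_rng(0)

def train_learners_for_producer(U, y_i, well_xy, center_xy, T,
                                n_windows=5, size_range=(10.0, 50.0)):
    """For one producer (task), train several learners with random window sizes
    and return the (window size, model, validation R2) of the best learner."""
    results = []
    for _ in range(n_windows):
        half_width = rng.uniform(*size_range)          # random region of influence
        X = np.stack([window_filter(u, well_xy, center_xy, half_width, T) for u in U])
        X_tr, X_va, y_tr, y_va = train_test_split(X, y_i, test_size=0.3, random_state=0)
        model = MLPRegressor(hidden_layer_sizes=(32,), max_iter=3000,
                             random_state=0).fit(X_tr, y_tr)
        results.append((half_width, model, r2_score(y_va, model.predict(X_va))))
    return max(results, key=lambda r: r[-1])           # winner-takes-all selection
```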
Figure 2.18: Schematic of the random window selection and neural network structure. The figure shows
multiple choices of local models with a random window size that identifies the region of influence for the
center producer. To predict the performance of Producer P1, three random windows with different sizes
are generated. In the neural network model, the windows are represented by a feature selection layer that
extracts the response of the wells within each window. Multiple local neural networks are constructed and
trained with the corresponding data for each case to represent the production performance of the center
producer. A final decision about the size of the region is made based on the performance of each window
size.
As shown in Figure 2.19, we start with a multitarget regression problem that directly learns a mapping
from u to y. Recognizing that the resulting regression problem involves too many unnecessary/redundant
connections, we introduce a window around each producer to constrain the feature space in predicting the
production from each producer [e.g., production from Producer 1 (y1) should only be affected by the inputs
u1 and u3 as shown in the middle panel of Figure 2.19]. Although introducing a window eliminates a large
number of unnecessary features, if not done properly, it can lead to the exclusion of important supporting
injectors.
To mitigate this risk, we generate multiple windows with randomly selected sizes, as shown in the right
part of Figure 2.19, where the model resulting from each window is called a learner. With several learners
used to predict the production from each well, the final prediction is obtained by selecting the learner with
the best prediction performance. The main goal of this strategy is to incorporate useful domain knowledge
while hedging against the uncertain nature of such knowledge.
Figure 2.19: Multiple single-target regression models with random window sizes to represent the uncertainty in the region of influence of each producer.
Now, for each task, multiple learners are generated, each being a separate predictive model. Therefore,
we obtain several predictions for each producer. To measure the prediction accuracy of each model, we
first split the original dataset into a training set and a validation set. The R² score on the validation set is used to measure the prediction performance of each model.
2.3.2.2 Topology-Preserving Neural Network
In the previous section, we proposed decomposing the original multi-target regression into multiple single-target regression problems. However, training these 'small' regression models sequentially would be very
inefficient. To address this issue, we use a modified neural network with a customized topology to enable
parallel training. We add a feature selection layer after the input layer to represent different models with
random window sizes. Unlike the global fully connected layer, we predefine the way that neurons are
connected between the input layer and the second layer. As a result, a topological structure is provided
to this layer to incorporate all the local neural network connections into a single large neural network for
training purposes.
As shown in Figure 2.18, three wells (I1, I2, P1) have been captured by Window 1, and then we draw
three corresponding connections (three green lines in Figure 2.18) in the feature selection layer. After
implementing the same process for all the windows, the feature selection layer is finalized, and we can
construct multiple single-target regression problems. Note that each neuron in the second layer belongs to exactly one window. For example, the first three neurons belong to
Window 2, and the next seven neurons belong to Window 3, etc. Each window has a central producer,
which is the prediction target for that window. In Figure 2.18, the three windows are centered at the location of P1. Hence, we use P1-1, P1-2, and P1-3 to distinguish them. Three subproblems are therefore
formulated as follows:
\[ \text{Task}(1),\ \text{Learner}(1): \quad \min \; \sum_{k=1}^{N} \left\| f_{11}\big(W_{11}(u^{(k)})\big) - y_1^{(k)} \right\|_2^2 \tag{2.19} \]
\[ \text{Task}(1),\ \text{Learner}(2): \quad \min \; \sum_{k=1}^{N} \left\| f_{12}\big(W_{12}(u^{(k)})\big) - y_1^{(k)} \right\|_2^2 \tag{2.20} \]
\[ \text{Task}(1),\ \text{Learner}(3): \quad \min \; \sum_{k=1}^{N} \left\| f_{13}\big(W_{13}(u^{(k)})\big) - y_1^{(k)} \right\|_2^2 \tag{2.21} \]
Here, f11, f12, and f13 represent three small neural networks whose parameters are learned during training. Each neuron
in the third layer represents the output for a learner, learning a mapping function from its corresponding
neurons in the second layer to its target producer. We call such a learner one branch of the neural network
structure. During the training process, each branch is processed independently, allowing our designed
neural network structure to solve all the sub-problems simultaneously. Finally, each branch is evaluated
using the R² score through K-fold cross-validation [17] after training. Under a winner-takes-all strategy,
only the learner that performs best is preserved, while the others are removed. We reformulate the large
neural network by removing non-selected branches. In the end, we have trained a proxy model that maps
the vectorized control parameters of all wells (u) to the vectorized total oil production of NP producers
(y).
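The winner-takes-all step can be sketched with scikit-learn's K-fold cross-validation as follows; the per-branch feature matrices and the small MLP architecture are placeholders for illustration, not the exact networks used here.

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.neural_network import MLPRegressor

def select_best_branch(branch_features, y_i, k=5):
    """branch_features: list of (N, d_j) feature matrices, one per candidate window.
    Returns the index of the branch with the highest mean K-fold R2 score."""
    scores = []
    for X in branch_features:
        model = MLPRegressor(hidden_layer_sizes=(32,), max_iter=3000, random_state=0)
        scores.append(cross_val_score(model, X, y_i, cv=k, scoring="r2").mean())
    return int(np.argmax(scores)), scores
```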
2.3.2.3 Interpretation of Interwell Connectivity
The general schematic of the neural network model used in this paper is illustrated in Figure 2.20. Each
model’s mapping function fij is designed as a multilayer neural network that takes well controls u as input
and predicts the corresponding oil production at the producers y. If the mapping function fij is anticipated
to exhibit nonlinearity, the network can accommodate this by incorporating nonlinear activation functions
at each layer. During training, the model learns this mapping by using (u, y) data pairs to optimize the
network weights through the minimization of a loss function, which quantifies the prediction error, such
as:
\[ \text{Task}(i),\ \text{Learner}(j): \quad \min \; \sum_{k=1}^{N} \left\| G_{ij}\big(W_{ij}(u^{(k)})\big) + b_{ij} - y_i^{(k)} \right\|_2^2 \tag{2.22} \]
Using a linear model can enhance the interpretability of the developed model by revealing inter-well
communication and connectivity strength. For example, as shown in Figure 2.20, a linear operator F can
be employed to elucidate the communication strength between well pairs. This approach allows the model
to provide insights into well connectivity and production variables in the field, rather than being a black
box that only approximates the input-output relationship.
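As a rough illustration of this interpretation, the sketch below fits a linear model for one producer and aggregates the coefficients associated with each well's control trajectory into a single connectivity strength; the absolute-sum aggregation and the assumed input layout (T contiguous entries per well) are simplifying assumptions made only for this example.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

def connectivity_from_linear_model(X, y_i, n_wells, T):
    """Fit y_i ≈ G u + b and collapse the T coefficients of each well's control
    trajectory into one (crude) inter-well connectivity strength."""
    model = LinearRegression().fit(X, y_i)
    coef = model.coef_.reshape(n_wells, T)      # one row of weights per supporting well
    return np.abs(coef).sum(axis=1)             # per-well strength, e.g., for map line widths
```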
However, in subsurface flow systems, a linear model may not always suffice due to the complex and
nonlinear nature of the problem. Therefore, a nonlinear approximation model is also constructed (with
its general structure shown in Figure 2.20) and explored in this paper. The nonlinearity is captured by
Figure 2.20: General structure and linear simplification for one of the local networks. The general neural
network structure consists of multiple layers and nonlinear activation functions to capture the complex
input-output relationship. With a linear assumption, the model complexity is reduced while its interpretability is increased because the linear operator can be viewed as the inter-well communication strength
or connectivity.
incorporating nonlinear activation functions throughout the network, as detailed in various experiments.
More details about the effect of linear simplification will be discussed later.
2.3.3 Numerical Results
2.3.3.1 Prediction Performance
In this study, we focus on developing a proxy model for rapid prediction of short-term (1-year) oil production in the LBU unit. The dataset for training the proxy model is limited to 100 samples, with an additional
150 samples reserved for testing the model’s performance. Importantly, the test data is not exposed to the
model during training. For each producer, five different window sizes are randomly selected, with each
window uniformly sampled from 10 to 50 cells, where each cell is a cubic block of 50 ft³. The window
size that delivers the best prediction performance, as determined by the K-fold (K = 5) validation method,
is chosen.
Figure 2.21: R² score for each individual producer in the test data set.
The goal of the training process is to predict the total oil production from 356 producers. The overall
model performance is illustrated in Figure 2.21. The prediction results for the test data closely align with
the unit-slope line, and the average R2 score across all 356 producers is 0.941.
An alternative visualization of the prediction performance is presented in Figure 2.22. The top panel of
the figure shows the predicted (in red) and actual (in black) oil production values for each of the 356 wells.
The bottom panel illustrates the prediction errors for these wells. While a few producers exhibit relatively
large errors, approaching 3,000 STB, the majority of errors are within 500 STB for a 1-year prediction,
which is minor compared to the total production from each well.
The histogram and statistical analysis of the R² score, selected window sizes, number of adjacent wells, and density of wells are shown in Figure 2.23. Many of the selected window sizes are below 30 cells (1,500 ft), while some take relatively large values, e.g., 40–50 cells (2,000–2,500 ft). Similarly, Figure 2.24 illustrates the R² scores of the predicted oil production using a color map, with the selected window sizes represented by the diameters of the circles. The map shows the spatial distribution of the wells, where the
Figure 2.22: Predicted (red) and actual (black) total oil production (top) and the corresponding prediction errors (bottom) for each producer in the test data set.
majority of producers achieve R² scores greater than 0.9. It is important to note that the producers are not
uniformly distributed across the field. As a result, the same window size in different areas may encompass
varying numbers of support wells.
2.3.3.2 Effect of Nonlinearity
The previous analysis presented the results obtained from the general (nonlinear) structure of the model.
Additionally, we conducted numerical experiments using a simplified linear version of the model. In this
section, we compare the performance of the linear and nonlinear proxy models across various training data
sizes. Figure 2.25 shows the R² scores for different training data sizes, up to 500 samples. Overall,
the nonlinear proxy model outperforms the linear model, though the difference is minimal (less than 0.01).
In addition to using the R² score as a statistical metric, we evaluated the prediction performance based on the root mean square error (RMSE) measure. The average RMSE comparison for all producers is shown in Figure 2.26. A similar trend to the R² score curve is observed when RMSE is plotted, revealing a slightly
superior result for the nonlinear proxy model compared to the linear model.
For this particular example, the linear approximation appears to provide predictions with accuracy
comparable to those generated by the nonlinear model. This observation is not surprising and can be
attributed to specific field conditions, such as the maturity of the field, with a large number of producers
experiencing high water cut, and the short-term nature of the prediction. These factors suggest that the
pressure and saturation maps of the reservoir may not change significantly during the prediction period.
Furthermore, existing evidence from the production optimization literature also suggests that a linear
input-output approximation model can effectively capture a significant portion of the injection-production
dynamics in a reservoir with a fixed well configuration (Brouwer and Jansen 2002; Albertoni and Lake 2003;
Chen et al. 2009; Wang et al. 2009).
Figure 2.23: Histogram and statistical analysis of the selected window sizes, the number of adjacent wells, and the well density.
Figure 2.24: Map of the field showing the R² score of each producer (color) and its selected window size (circle diameter).
Figure 2.25: Average R² score of the linear and nonlinear predictive models for different numbers of training data (L = linear; NL = nonlinear). In general, the nonlinear proxy model outperforms the linear model. However, the difference is very small (less than 0.01).
Figure 2.26: Average RMSE of the linear and nonlinear predictive models for different sizes of training data
(L = linear; NL = nonlinear).
2.3.3.3 Interwell Connectivity
By examining the details of the trained model with the linear simplification, the parameters (weights) of the neural network can be used to interpret the inter-well connectivity strength by combining
the connectivity matrix with the spatial well locations, similar to what we do with the sparse regularization
approach. The resulting inter-well connectivity map is shown in Figure 2.27. It is evident that all producers
are only connected to their neighboring wells within a certain distance. This is because the windowing
process has filtered out all long-distance wells.
In practice, validating the accuracy of the recovered connectivity map is challenging since the true
inter-well connectivity map is not available and is not even well defined. However, geologic evidence,
such as the presence of two faults within the study area in our example, can be used to evaluate the results.
Figure 2.28 depicts the detected and existing faults in the geological model. The two highlighted areas in
the inter-well connectivity map correspond to these faults in the reservoir. The inferred connectivity map
indicates that the wells on either side of each fault lack connectivity, even though they are geographically
close. Remarkably, the model successfully captures the absence of connectivity across the faults purely
Figure 2.27: Recovered inter-well connectivity pattern.
based on the input-output training data, without any prior information about the faults or the spatial
configuration of the reservoir.
2.3.3.4 Sensitivity Analysis for the Selected Window Size
An important aspect of the proposed method is the use of random window sizes before selecting a final
region of influence for each producer, as opposed to using a fixed window size for all producers. This
approach addresses the uncertainty surrounding this parameter and helps mitigate errors that could arise
from poor parameter choices. As shown in Figure 2.29, the performance of the trained model with a fixed
window size is highly sensitive to the chosen size. If a favorable value (e.g., 30) is selected, the performance
is acceptable. However, if a suboptimal window size (e.g., 50) is chosen, the prediction accuracy can degrade
significantly. In contrast, the proposed approach assigns each producer a specific window size after a few
Figure 2.28: Geological evidence showing consistency between the recovered connectivity map and the two existing faults in the geological structure.
trials, leading to better overall performance. Figure 2.29 illustrates the benefits of selecting producer-specific window sizes as opposed to a fixed window size for all producers. Even when the fixed window
size is optimized, the general prediction accuracy is not as high as that achieved with variable window
sizes tailored to each producer.
In the proposed method, a range of window sizes is used to perform training for each choice. Generally, the appropriate range for the window size may not be immediately obvious and can be treated as a
tuning parameter. However, experience and knowledge about the field—such as well density and geologic
setting—can guide the selection of a reasonable range. In our numerical experiments, we found that window size ranges of (10, 50) cells or (500, 2500) feet are suitable. The minimum window size of 10 is greater
than the minimum well spacing in the field and ensures that the majority of producers have adjacent wells
Figure 2.29: Comparison of random vs. fixed window sizes during training. For the fixed window size method, the x-axis represents the manually defined window size. For the random window size method, the x-axis represents the maximum window size from which different sizes are randomly sampled, while the minimum size is always fixed at 10; for example, a value of 30 means the window size can vary in the range [10, 30]. In the fixed window size method, the size of the windows is kept constant during training. The fixed window size method is very sensitive to the chosen size value; in contrast, the random window size method is much more stable because the window size is selected adaptively: as long as a large enough window size range is assigned, the prediction accuracy is robust.
(this assumption can be adjusted if evidence suggests that some wells lack communication with neighboring wells). Specifying the maximum window size is less straightforward because well communication and
interaction are difficult to ascertain. Due to this uncertainty, it is generally recommended to adopt a relatively large (overestimated) value for the maximum window size to avoid excluding potentially significant
wells.
Figures 2.30 and 2.31 show the performance of the random window size method as the maximum
window size increases in our example. When the maximum window size increases from 20 to 30, many
selected window sizes in different regions also increase, leading to a change in the average window size
from 15.7 to 21.3. The distribution in Figure 2.31 reveals that many producers, previously limited to window sizes less than 20, now select sizes between 20 and 30. This increase in the maximum window size results in an improvement in the average R² score from 0.91 to 0.934, suggesting that a maximum window size of 20 may be too small to capture well communications in several regions. As the maximum window size is further increased to 50, the method selects window sizes between 30 and 50 for some producers, although the majority still maintain window sizes in the range of 10 to 30. Additionally, the average R² score slightly improves from 0.934 to 0.937 as the maximum window size increases from 30 to 50, indicating that there are limited interactions between production wells and wells located more than 30 cells away.
Figure 2.30: Selected window sizes for different values of the maximal window size.
Figure 2.31: Histogram and distribution of the R² score, selected window sizes, number of adjacent wells, and well density after changing the maximal window size from 20 to 50.
2.4 Conclusion
In this chapter, we consider a realistic large-scale reservoir as a case study to demonstrate how domain
knowledge can influence surrogate modeling for reservoir simulators. The reservoir under consideration
includes a large number of wells, resulting in an intrinsically sparse and localized inter-well connectivity
pattern. Leveraging this insight, we design two approaches to incorporate this understanding into our
neural network surrogate model.
In the first approach, we apply LASSO regularization as a soft constraint to encourage sparser interwell connectivity. In the second approach, we manually perform feature selection by using a windowing
technique to include only neighboring wells when developing the surrogate model for a specific producer.
Both approaches can improve the model’s prediction accuracy while preserving interpretability, thereby
enhancing the model’s trustworthiness.
Both methods rely on the same insight: production performance is primarily influenced by neighboring
wells, with distant wells having minimal impact. The main difference between the two approaches is
illustrated in Figure 2.32. In the first approach, we initialize the inter-well connectivity by considering all
possible connections and then apply LASSO regularization to sparsify the initially dense connectivity map.
In contrast, the second approach directly excludes distant wells for each producer, effectively eliminating
a large number of infeasible connections from the start. This exclusion can potentially reduce the number
of training samples needed since fewer samples are required to learn that distant wells are irrelevant based
on the input-output relationship.
Sensitivity analysis of the two approaches, as shown in Figure 2.33, reveals that the second approach
performs better than the LASSO approach, especially in extreme cases where only a very limited number
of training samples are available.
Figure 2.32: Comparison of different initializations of the inter-well connectivity.
Figure 2.33: Sensitivity analysis of the two approaches with respect to the number of training samples.
Chapter 3
Active Learning
3.1 Introduction
3.1.1 Overview
In the previous chapter, we developed surrogate models as fast alternative predictive tools capable of
delivering fit-for-purpose predictions. One of the key advantages of developing such surrogate models
is their applicability in optimization workflows, significantly accelerating the process by replacing the
time-consuming reservoir simulation with a much faster surrogate model.
However, we found that directly replacing the reservoir simulator with a fit-for-purpose surrogate
and expecting it to consistently perform well in an optimization workflow is not practical. This is not
only because the surrogate model may lack the necessary accuracy to effectively support the optimizer
but, more importantly, because the surrogate model is built on a training dataset and is, therefore, highly
biased toward it. In other words, when the instance to be predicted exhibits a pattern that is significantly
different from those in the training dataset, the prediction accuracy tends to decrease. This is because most
machine learning models excel at interpolation but cannot generally extrapolate effectively.
In this chapter, we introduce the concept of active learning, an effective approach to dynamically
update the surrogate model during the optimization process. We will demonstrate that by employing
active learning, we can improve the performance of surrogate-based optimization while also reducing the
number of training samples required.
Before introducing our proposed active learning-inspired surrogate-based optimization workflow, it
is helpful to discuss several relevant topics that will assist readers in understanding important literature
with similar concepts.
3.1.2 Adaptive Sampling with Response Surface Model
Before the rise of complex machine learning and neural network-based surrogate modeling techniques
in the last twenty years, classical approaches like the Response Surface Methodology (RSM) [92] were
extensively studied, particularly for problems involving computationally intensive analysis or simulation
processes. RSM is a well-established technique in the literature of Design of Experiments (DOE) [134, 7],
primarily used to approximate an unknown function when only a limited number of computed values are
available. The goal of RSM is to develop an approximate model, typically a polynomial, that represents the
relationship between input variables and output responses. Despite its widespread use, RSM has inherent
challenges, particularly when applied to complex or high-dimensional problems. One of the main issues is
the large data requirements, as RSM often needs a substantial number of data points to effectively cover the
entire input/output space. This can become prohibitively expensive or time-consuming, especially when
dealing with simulations that require significant computational resources.
Two main directions have been explored to address this challenge. The first approach focuses on
improving the experiment planning scheme. Rather than relying on uniform sampling to generate training
samples, more sophisticated Design of Experiments (DOE) techniques have been investigated. For instance,
factorial design [30] and central composite designs [8] are frequently employed to systematically explore
the input space. Other sampling strategies include Taguchi’s method [87], which is effective in robust
design, D-optimal design [2], which aims to maximize the statistical efficiency of the experiments, and
Latin Hypercube designs [165, 188], which ensure a more comprehensive coverage of the input space.
The second approach, which is the focus of this work, emphasizes the management of the approximation model during the optimization process in an adaptive manner. A pioneering exploration in this
direction is the Adaptive Response Surface Methodology (ARSM) [175]. Unlike traditional Response Surface Methodology (RSM), which typically follows a fixed experimental plan, ARSM dynamically adjusts the
experimental design based on real-time data, enabling the model to more accurately capture the complexities of systems characterized by non-linearities or multiple optima. By sequentially refining the response
surface model as new data is gathered, ARSM minimizes the number of required experiments, thereby
conserving time and resources while enhancing the accuracy of the optimization process.
Inspired by the concept of ARSM, numerous subsequent works have emerged, exploring a variety of
surrogate models, updating mechanisms, and sampling strategies. Researchers have experimented with
different types of surrogates, such as Kriging [194], radial basis functions [115], and neural networks [60],
to better capture the underlying complexities of the systems being studied. Additionally, various updating mechanisms have been proposed to improve the efficiency and accuracy of these surrogate models,
allowing them to adapt more effectively as new data becomes available during the optimization process.
Different sampling strategies have also been explored to enhance the model’s ability to explore and exploit the input space, ensuring that the surrogate model remains relevant and accurate throughout the
optimization [174]. A good summary can be found in [176].
3.1.3 Bayesian Optimization
The ARSM framework adaptively adds new samples and updates the surrogate model as the optimization
process progresses. The selection of new samples is primarily driven by the goal of including those that
are expected to have higher objective values. However, a crucial aspect of any optimization problem is
balancing exploration and exploitation [35]. When determining which new samples to include in the next
iteration, it is essential to consider not only the potential objective value but also the uncertainty associated
with the estimated values for the new samples. For instance, if a data point with a high objective value is
already known, a nearby point is likely to also have a high objective value. However, adding such a point
might be redundant and uninformative, particularly if we assume that the objective function is relatively
smooth and does not change abruptly. Bayesian Optimization (BO) [149, 75, 54, 155] is a classical approach
that offers a flexible framework for balancing exploration and exploitation in optimization tasks.
The core idea of Bayesian optimization is to build a probabilistic model, typically a Gaussian Process
(GP) [113, 182], to approximate the unknown objective function. This model is updated iteratively based
on the outcomes of previous evaluations. By modeling the uncertainty in the objective function, Bayesian
optimization selects the next point to evaluate based on an acquisition function, which balances exploration
(sampling where the model is uncertain) and exploitation (sampling where the model predicts high values).
An acquisition function, such as Expected Improvement (EI), Probability of Improvement (PI), or Upper
Confidence Bound (UCB), guides the search for the optimum by suggesting the next point to evaluate. This
strategy ensures that the search is both efficient and effective, focusing computational resources on the
most promising and informative areas of the search space.
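A compact sketch of one such acquisition step, using a Gaussian-process surrogate and the Expected Improvement criterion under a minimization convention, is given below; the scikit-learn GP, the toy objective, and the candidate grid are illustrative choices only.

```python
import numpy as np
from scipy.stats import norm
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

def expected_improvement(X_cand, gp, y_best, xi=0.01):
    """EI for minimization: larger values indicate more promising candidates."""
    mu, sigma = gp.predict(X_cand, return_std=True)
    sigma = np.maximum(sigma, 1e-12)
    imp = y_best - mu - xi
    z = imp / sigma
    return imp * norm.cdf(z) + sigma * norm.pdf(z)

# one BO iteration on a toy 1D objective
f = lambda x: np.sin(3 * x) + 0.1 * x ** 2
X = np.random.uniform(-3, 3, size=(8, 1)); y = f(X).ravel()
gp = GaussianProcessRegressor(kernel=RBF(length_scale=1.0), normalize_y=True).fit(X, y)
X_cand = np.linspace(-3, 3, 500).reshape(-1, 1)
x_next = X_cand[np.argmax(expected_improvement(X_cand, gp, y.min()))]
```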
Bayesian optimization is a classical, resource-effective framework that has recently gained significant
attention in the machine learning literature, particularly for hyperparameter tuning [183, 155]. Its ability
to efficiently explore and exploit the search space makes it ideal for optimizing complex, expensive-to-evaluate functions, such as those encountered in machine learning models. Recognizing its utility, many
modern machine learning and deep learning frameworks have integrated Bayesian optimization into their
toolsets [107, 95, 12], providing convenient packages that make it easier for practitioners.
3.1.4 Trust Region Approach
Another important topic related to the approach proposed in this thesis is the trust region approach [34,
192]. The primary motivation behind the previously introduced ARSM and Bayesian Optimization (BO)
methods is to selectively acquire new samples, thereby avoiding the need to collect an impractically large
number of samples to construct a satisfactory surrogate model. However, in cases where the input-output
relationship becomes exceedingly complex, constructing a surrogate model with sufficient accuracy over
the entire input space can become prohibitively challenging.
The trust region approach offers a solution to this issue by focusing on local approximations rather
than attempting to model the entire space at once. By defining a trust region around the current point and
constructing a simpler, reliable model within this region, the trust region method allows for more manageable and accurate optimization. This localized approach helps mitigate the difficulties associated with
modeling complex relationships over a broad range, making it a valuable technique for scenarios where
surrogate model construction is otherwise infeasible. Trust-region Optimization (TRO) involves constructing a simpler model (often a quadratic approximation) of the objective function within this region. The
optimization process then focuses on finding the minimum of this model, but only within the boundaries of
the trust region. If the model predicts the objective function well, the region may be expanded; otherwise,
it is contracted. This iterative process continues until convergence criteria are met.
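A minimal sketch of the radius-update rule behind this accept/expand/contract logic is shown below; the thresholds and the contraction/expansion factors are typical textbook values used here only for illustration.

```python
def update_trust_region(delta, rho, delta_min=1e-3, delta_max=1.0,
                        eta_low=0.25, eta_high=0.75):
    """rho = (actual improvement) / (improvement predicted by the local model).
    Contract the region when the model is poor, expand it when it is reliable."""
    if rho < eta_low:
        delta = max(delta_min, 0.25 * delta)   # model untrustworthy: shrink the region
    elif rho > eta_high:
        delta = min(delta_max, 2.0 * delta)    # model reliable: allow larger steps
    return delta
```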
One major difference between the Trust Region Optimization (TRO) approach and Adaptive Response
Surface Methodology (ARSM) or Bayesian Optimization (BO) lies in how they handle previously collected
samples. In classical TRO, the approach can be considered a short-memory strategy. For instance, when
employing gradient-based optimization with a quadratic approximation, only a few samples near the current solution are used at each iteration. All prior samples are discarded, which means that the optimization
focuses solely on the local region around the current point. This can be limiting if the previous samples
contained valuable information that could be useful in refining the model. In contrast, ARSM/BO takes an
accumulative approach by retaining all previously collected samples. This method continually updates and
refines the surrogate model by incorporating new samples while preserving the information from earlier
ones.
Approaches that combine the trust region framework with the preservation of previous samples have
been explored to harness the benefits of both methodologies. These hybrid approaches aim to leverage
the locality-focused nature of Trust Region Optimization (TRO) while retaining valuable historical data,
as seen in Adaptive Response Surface Methodology (ARSM) and Bayesian Optimization (BO).
The Sequential Approximation Optimization (SAO) [94] is a notable method in optimization that addresses complex problems by breaking them down into more manageable subproblems. This approach
involves decomposing the original optimization problem into a sequence of smaller subproblems, each
confined to a specific subregion of the design space. In SAO, the size and location of these subregions
are dynamically adjusted based on the accuracy of the surrogate model. The updates are guided by the
sequential approximate optimization strategy, as detailed in [3]. This iterative refinement allows the optimization process to focus on areas where the surrogate model is most accurate, improving efficiency and effectiveness. Recently, local Bayesian optimization has also been proposed in the deep learning literature [46], adapting the traditional Bayesian optimization framework for local optimization and making it more scalable and effective for specific problems encountered in deep learning.
The trust region approach has been extensively studied and integrated into various adaptive frameworks, demonstrating promising results. Concentrating on developing local surrogate models rather than
global proxies can effectively reduce the number of required training samples while still maintaining strong
performance in surrogate-based optimization.
3.1.5 Active Learning Framework
In this thesis, we introduce a framework inspired by ARSM, Bayesian Optimization (BO), and the Trust
Region Optimization (TRO) approach. Our method is specifically tailored for the well control optimization
problem studied here. We employ the concept of active learning to emphasize how we adaptively and
dynamically update our training sample pool and surrogate model throughout the optimization process.
It’s important to note that, in the machine learning community, "active learning" [96, 148, 133] typically
refers to a framework where the learning algorithm iteratively selects the most uncertain or informative
samples from a large pool of unlabeled data and queries an oracle (often a human annotator) for their
labels. While this traditional definition may not perfectly align with our approach, we find it suitable for
our context. Other related terms like "metamodeling" [176], "incremental learning" [184] and "continual
learning" [193] also share similarities. However, for simplicity, we will continue using the term "active
learning" in this thesis, without distinguishing between the differences in these terminologies.
In the following sections, we will first introduce our active learning framework, with a focus on the differences between global and local proxies, as well as the distinction between gradient-based and derivative-free optimization methods. We will present several examples, including synthetic cases, to demonstrate
the advantages of our proposed approach convincingly.
3.2 Methodology
The main contribution of the proposed approach is to enhance the accuracy and computational efficiency
of machine learning-based proxy models used in gradient-based well control optimization problems. To
achieve this, we employ active learning to develop a novel method for the online training of proxy models
by integrating the training process with local optimization algorithms. In this new approach, training data
are generated during the optimization process, aligned with the evolution of the solution at each iteration.
The proxy model is updated iteratively based on labeled data within the local proximity of the current
iteration, ensuring it remains accurate and relevant to the optimization path. This method reduces the
computational demands typically associated with standard offline learning techniques while improving
the accuracy and consistency of the proxy model throughout the optimization process. To underscore
the significance of this approach, we compare its performance to that of offline training, followed by
retraining after convergence. Applying this method to well control optimization demonstrates the superior
performance of online learning, achieving better optimization results with more accurate predictions and
reduced computational overhead.
3.2.1 Global/Passive/Offline Learning Strategy
The general workflow for the global strategy is illustrated in Figure 3.1. This approach begins by generating
a fixed number of samples that are evenly distributed across the entire search space. A commonly used
technique for this step is Latin Hypercube Sampling (LHS), which serves as a space-filling method to
maintain uniformity within the design space. An Artificial Neural Network (ANN)-based proxy model is
then trained using these generated samples, following hyperparameter tuning to ensure optimal prediction
performance. The trained ANN proxy model is subsequently employed to solve the optimization problem.
After obtaining the solution, it is crucial to assess the validity of the proxy model around the identified
solution. Often, the optimization solution may be located far from the samples used during training. To
address this, the model can be retrained using a new set of training data sampled near the obtained solution.
This process may be repeated several times until the stopping criteria—such as a computation budget or an
improvement threshold in the objective function—are met. The detailed algorithm for the global strategy
is presented in the pseudocode of Algorithm 1.
Figure 3.1: Workflow of the passive/offline training approach: Optimization is performed and completed with the proxy model; the proxy model is retrained at the end of each optimization until it reproduces results similar to those of the simulation model.

[Algorithm 1: Pseudocode of the global/passive/offline learning strategy]
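A hedged Python sketch of this offline workflow is given below; simulate_npv is a placeholder for a full reservoir simulation that returns the objective value of a normalized control vector, and the LHS design size, network architecture, local resampling radius, and optimizer settings are illustrative assumptions rather than the exact settings of Algorithm 1.

```python
import numpy as np
from scipy.stats import qmc
from scipy.optimize import minimize
from sklearn.neural_network import MLPRegressor

def passive_surrogate_optimization(simulate_npv, dim, n_init=200, n_rounds=3, seed=0):
    """Sketch of the global/passive strategy: space-filling design, global proxy,
    proxy-based optimization, then retraining near the incumbent solution."""
    sampler = qmc.LatinHypercube(d=dim, seed=seed)
    U = sampler.random(n=n_init)                       # LHS design over [0, 1]^dim
    J = np.array([simulate_npv(u) for u in U])
    u_best = U[np.argmax(J)]
    for _ in range(n_rounds):
        proxy = MLPRegressor(hidden_layer_sizes=(64, 64), max_iter=5000,
                             random_state=seed).fit(U, J)
        res = minimize(lambda u: -proxy.predict(u.reshape(1, -1))[0],
                       u_best, bounds=[(0, 1)] * dim, method="L-BFGS-B")
        u_best = res.x
        # resample near the incumbent solution, evaluate, and retrain
        U_new = np.clip(u_best + 0.1 * (sampler.random(n=20) - 0.5), 0.0, 1.0)
        J_new = np.array([simulate_npv(u) for u in U_new])
        U, J = np.vstack([U, U_new]), np.concatenate([J, J_new])
    return u_best
```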
3.2.2 Local/Active/Online Learning Strategy
A global strategy may suffer from over-exploration when too many samples are generated in regions with
little potential for improving the solution. A local or active strategy can mitigate this issue by generating
training data and constructing the ANN proxy model around the optimization path. This approach not
only improves the accuracy of the ANN model for local search methods but also reduces the computational
load of optimization, distributing the sampling and simulation computation throughout the optimization
iterations. Unlike the global strategy, active learning adapts to the local nature of optimization by focusing
only on the regions of the feasible space visited during the optimization process.
The primary motivation for adopting an active learning strategy is that developing an accurate global
proxy model is often unnecessary and computationally expensive, especially as the dimensionality of decision variables and the objective function’s complexity increase. Adding training data samples far from
the local optimization path can be wasteful and may degrade the proxy model’s quality in regions traversed by the optimization algorithm. Furthermore, the well control optimization problem often features
multiple local solutions with similar objective function values, making a local search strategy more appropriate, particularly given the computational demands of a global search method for high-dimensional
control problems.
The general workflow of the local strategy is depicted in Figure 3.2. This approach begins by using the
initial optimization solution to define a hypercube, within which a modest set of samples for training is
generated using the Latin Hypercube Sampling (LHS) method. A local surrogate model is then trained on
this initial dataset and used to perform a small number of iterations, denoted by NMaxIter (with a default
value of 1). It is advisable to limit the number of iterations NMaxIter to a relatively small value. A key
distinction from the passive approach is that the optimization problem is not fully solved; instead, only
a few iterations are performed before retraining the ANN model. This restriction ensures that the local
surrogate-based optimization does not proceed for too many iterations without verifying the model’s prediction accuracy, as the local ANN proxy model, trained on a limited dataset, may behave differently from
the true simulation model.
The local strategy can be seen as a stochastic optimization approach, where the main goal of the local
surrogate is not to perfectly model the input-output relationship within a subregion but rather to provide
a reasonable gradient to guide the update of the solution. Once the surrogate model suggests an update
for the current iteration, its prediction of the updated solution is compared to the true response from
the simulation model. This validation requires one function evaluation (reservoir simulation) per NMaxIter
iterations. In addition to monitoring the proxy model’s validity, the resulting simulation provides valuable
labeled data at the current solution, which can be added to the training dataset to improve the model’s
prediction accuracy around the current iteration.
Figure 3.2: Workflow of active/local learning strategy: The proxy model is updated at each iteration of the
optimization algorithm to ensure proximity of the proxy model to the simulation models throughout the
optimization.
When a surrogate-based iteration is accepted, the current solution is added to the training dataset, and
the surrogate model is updated through retraining. It’s important to note that the computational cost of
retraining is negligible compared to a single reservoir simulation, so repeated retraining does not significantly impact the overall computational complexity of the algorithm. Because the active learning approach
continuously adds "relevant" data samples to the training dataset, the surrogate model is consistently refined along the optimization path. Adding points that are close to existing data points in the training set
is unlikely to significantly alter the proxy model, suggesting that optimization can proceed with the current model. This observation introduces a new hyperparameter, NMaxIter, which determines how many
iterations should occur before retraining is required. Setting NMaxIter = 1 eliminates the need to tune this
hyperparameter and ensures the proxy model continuously improves with each optimization iteration.
The prediction accuracy of the resulting local surrogate model is expected to be limited to the immediate neighborhood of the current iteration. However, the model is never used to predict outputs for regions
far from the current iteration, except when an unusually large step size is taken in some optimization iterations. To account for such scenarios, the algorithm includes a mechanism to validate the surrogate model
and reduce the step size if predictions are deemed unreliable. For stopping criteria, either the total number
of simulation runs NMaxSim or the number of unsuccessful improvements NImp is used. The pseudocode
for the local strategy optimization is provided as Algorithm 2.
[Algorithm 2: Pseudocode of the local/active/online learning strategy]
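For comparison, a hedged Python sketch of the active-learning loop with NMaxIter = 1 is given below; simulate_npv is again a placeholder for the reservoir simulator, and the proxy architecture, local design size, bound handling, and trust-region update factors are illustrative assumptions rather than the exact settings of Algorithm 2.

```python
import numpy as np
from scipy.stats import qmc
from scipy.optimize import minimize
from sklearn.neural_network import MLPRegressor

def active_surrogate_optimization(simulate_npv, u0, delta=0.2, n_init=15,
                                  max_sims=120, delta_min=1e-3, delta_max=0.5, seed=0):
    """Sketch of the local/active strategy: local design, local proxy, one
    proxy-guided step, one validating simulation, then retraining."""
    dim = len(u0)
    sampler = qmc.LatinHypercube(d=dim, seed=seed)
    u = np.clip(np.asarray(u0, dtype=float), 0.0, 1.0)
    # initial local design inside the hypercube [u - delta, u + delta]
    U = list(np.clip(u + delta * (2 * sampler.random(n=n_init) - 1), 0.0, 1.0)) + [u.copy()]
    J = [simulate_npv(x) for x in U]
    n_sims = len(J)
    while n_sims < max_sims and delta > delta_min:
        proxy = MLPRegressor(hidden_layer_sizes=(32, 32), max_iter=5000,
                             random_state=seed).fit(np.array(U), np.array(J))
        # one proxy-guided step (NMaxIter = 1), restricted to the local hypercube
        bounds = [(max(0.0, ui - delta), min(1.0, ui + delta)) for ui in u]
        res = minimize(lambda x: -proxy.predict(x.reshape(1, -1))[0], u,
                       bounds=bounds, method="L-BFGS-B")
        j_trial = simulate_npv(res.x); n_sims += 1     # one validating simulation
        U.append(res.x); J.append(j_trial)             # new labeled point for retraining
        if j_trial > max(J[:-1]):
            u, delta = res.x, min(delta_max, 2 * delta)   # accept the step and expand
        else:
            delta = 0.5 * delta                           # reject the step and contract
    return U[int(np.argmax(J))], max(J)
```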
Figure 3.3: Contour plot of the Branin function showing its three global minima.
3.3 Numerical Examples
3.3.1 Synthetic Example
In this section, we use an analytical objective function to explore the behavior of the active learning algorithm and compare it with the traditional passive/offline training approach. The objective function is the
Branin or Branin-Hoo function [118], which is expressed as:
\[ f(x) = a\left(x_2 - b\,x_1^2 + c\,x_1 - r\right)^2 + s\,(1 - t)\cos(x_1) + s, \tag{3.1} \]
with the commonly used parameter values a = 1, b = 5.1/(4π²), c = 5/π, r = 6, s = 10, and t = 1/(8π).
Over the square domain x₁ ∈ [−5, 10], x₂ ∈ [0, 15], the function has three global minima at x* = (−π, 12.275), (π, 2.275), and (9.42478, 2.475), all with the same objective value f(x*) = 0.397887. Figure 3.3 shows the contour plot within the considered region.
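For reference, a direct NumPy implementation of the Branin function with these commonly used parameter values is shown below; it simply verifies that the three global minima evaluate to approximately 0.397887.

```python
import numpy as np

def branin(x1, x2, a=1.0, b=5.1 / (4 * np.pi ** 2), c=5.0 / np.pi,
           r=6.0, s=10.0, t=1.0 / (8 * np.pi)):
    """Branin-Hoo function with its commonly used parameter values."""
    return a * (x2 - b * x1 ** 2 + c * x1 - r) ** 2 + s * (1 - t) * np.cos(x1) + s

# the three global minima all evaluate to about 0.397887
for x1, x2 in [(-np.pi, 12.275), (np.pi, 2.275), (9.42478, 2.475)]:
    print(round(branin(x1, x2), 6))
```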
The advantage of examining this 2D example is to visualize the contour plots of both the true model and
the proxy model, as well as to monitor the iteration of the optimization algorithms. Figure 3.4 illustrates a
total of 40 samples generated within the specified interval, with higher density in the top-left region. The
two ANN-based proxy models are trained as follows:
1. Global/Offline Learning (middle): All 40 samples are used as training data to train a global proxy
ANN.
2. Active/Online Learning (right): Only 12 samples located in the focused region (black dashed line)
are used to train the proxy model.
The top three figures display the contour plots of both the true model and each of the proxy models.
The bottom three plots zoom into the top-left portion of the domain (dashed box) from the first-row plots.
A comparison with the true contour plot reveals that while the global proxy captures the general trend of
the true function, its contour plot deviates significantly from the reference contour. In contrast, the local
proxy model, which uses only the samples from the marked region, aligns more closely with the reference
contour and exhibits better local accuracy (R² score of 0.93 vs. 0.82). Additionally, the optimal point for
the local proxy model (marked with a red star) is closer to the true optimum (marked with a green star).
These observations highlight the advantage of local approximation and the potential errors introduced by
including samples from outside the region of interest.
Next, we implement the active and passive learning algorithms to generate an approximate model for
minimizing the Branin function. The parameters used in both algorithms are summarized in Table 3.1.
Each algorithm is implemented and repeated five times to support the conclusions with higher confidence.
Additionally, the implementation is carried out using two different initial points: one located relatively
close to one of the true optima, and another located relatively far from all three optima.
Figure 3.4: Comparison of surrogate accuracy for the active and passive learning approaches.
As we show in Figure 3.5, the ANN model developed using the local strategy (active learning) converges
to the solution faster, with all numerical experiments finding a solution very close to the true optima
within the specified function call budget. In contrast, the ANN model trained globally shows reasonable
improvements in the early stages of optimization but struggles in the final iterations, as it attempts to
converge to a solution near the true optimum. Consequently, a noticeable gap is observed between the
obtained solution and the true optimum. This observation aligns with previous remarks that a global
model tends to sacrifice local accuracy in approximating the reference function, while a local strategy
progressively improves the local accuracy of the approximation as it converges to the solution.
Another important observation from these numerical experiments is that the local strategy converges
to the closest true local optimum solution, whereas the global strategy sometimes approaches different
Parameter     Algorithm 1   Algorithm 2
NMaxSim       120           120
NImp          5             5
ϵImp (%)      1             1
δ             -             0.2
Ns            15            15
Nr            10            10
γ1            -             2
γ2            -             0.25
NMaxIter      -             1
∆Init         -             0.2
∆Min          -             0.001
∆Max          -             0.2
NDmax         -             40
Table 3.1: Hyperparameters used for the Branin function example.
solutions rather than the one closest to the initial point. This discrepancy is partly due to the larger approximation error in the global proxy model, which can shift the solution to a different region of the
objective function. Conversely, the local proxy model in active learning is adapted to the optimization
iterations and is continuously monitored and updated, making it more likely to behave similarly to the
true objective function. However, it is important to note that, due to its approximate nature, even the
active learning strategy may not converge to the exact solution that the true objective function achieves,
especially when many local solutions exist in the objective function. Generally, the information encoded
in the local surrogate model provides a good direction for local search methods. If the optimization strategy shifts from local to global, the learning strategy to develop a proxy model should also be adjusted
accordingly. For the well control optimization problem considered in this work, local search techniques
are widely applied as an efficient approach to improve production performance.
Figure 3.5: Results of applying active and passive learning to the Branin function.
3.3.2 Well Control Optimization in Waterflooding Operation
3.3.2.1 2D Example: SPE10
We now apply the proposed approach to a well control optimization problem using a waterflooding example based on one layer of the SPE10 reservoir model [139]. The model is discretized into 60 × 220
gridblocks and includes two injectors and four producers, with their locations depicted on the heterogeneous permeability map in Figure 3.6. The simulation spans 10 years and is conducted using the MATLAB
Reservoir Simulation Toolbox (MRST). The control step size is set to 1 year, resulting in a total of 60 control variables (10 per well). The optimization objective is to maximize the Net Present Value (NPV) of the
project by adjusting the injection rates of the two injectors and the Bottom Hole Pressures (BHP) of the
four producers. Further details about the reservoir model and simulation parameters are provided in Table
3.2.
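A simplified sketch of how such an NPV objective can be assembled from simulated rates and the economic parameters of Table 3.2 is shown below; the end-of-step, annually discounted cash-flow form and the placeholder rate profiles are assumptions made only for illustration.

```python
import numpy as np

def npv(oil_rate, water_inj_rate, water_prod_rate, dt_days,
        p_oil=50.0, c_inj=6.0, c_wat=6.0, discount=0.10):
    """Sketch of a waterflood NPV: oil revenue minus water injection and water
    handling costs per control step, discounted annually (rates in STB/D)."""
    t_years = np.cumsum(dt_days) / 365.0
    cash = (p_oil * oil_rate - c_inj * water_inj_rate - c_wat * water_prod_rate) * dt_days
    return float(np.sum(cash / (1.0 + discount) ** t_years))

# toy usage with 10 one-year control steps
dt = np.full(10, 365.0)
print(npv(oil_rate=np.full(10, 800.0), water_inj_rate=np.full(10, 1000.0),
          water_prod_rate=np.full(10, 400.0), dt_days=dt))
```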
Both active and passive learning approaches are implemented using the corresponding ANN models to
solve the well control optimization problem. The parameters for each approach are summarized in Table
3.3. Since the adjoint-based gradient is not available in this example, we use the Ensemble Optimization
Figure 3.6: Well locations and log-permeability map of one layer of the SPE10 model.
(EnOpt) algorithm, which approximates the gradient from an ensemble of control perturbations. For details
on the EnOpt approach, refer to [40] and [50]. In our implementation, the EnOpt algorithm is executed
using full-scale reservoir simulations with an ensemble size of 10. For all experiments, the initial solution
for all variables is set to u0 = 0.5.
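A hedged sketch of this ensemble-based gradient estimate is given below; the Gaussian perturbation size and the simple cross-covariance formula follow the general EnOpt idea rather than the exact formulation of the cited works, and simulate_npv is again a placeholder for the reservoir simulation.

```python
import numpy as np

def enopt_gradient(simulate_npv, u, n_ens=10, sigma=0.05, seed=None):
    """Approximate the NPV gradient at u from an ensemble of Gaussian control
    perturbations (cross-covariance between controls and objective values)."""
    rng = np.random.default_rng(seed)
    U = np.clip(u + sigma * rng.standard_normal((n_ens, len(u))), 0.0, 1.0)
    J = np.array([simulate_npv(x) for x in U])
    dU = U - U.mean(axis=0)
    dJ = J - J.mean()
    return dU.T @ dJ / (n_ens - 1)      # smoothed gradient estimate

# a steepest-ascent update would then be, e.g., u_new = np.clip(u + alpha * g, 0, 1)
```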
The results comparing active learning, passive learning, and the EnOpt algorithm are shown in Figure 3.7. Each approach is tested five times, and the results for each trial are presented along with the
“mean curve” (thick solid lines in Figure 3.7). The findings demonstrate that active learning consistently
outperforms the other methods, achieving higher average NPV values with fewer simulations. It is important to emphasize that the primary comparison in this paper is between the active and passive learning
approaches, with the EnOpt method included as an additional optimization strategy for reference. Notably, the passive learning approach yields a lower NPV compared to the EnOpt method. This supports
the hypothesis that generating a global surrogate model can lead to reduced prediction accuracy and hinder continuous improvement, especially as the optimization progresses towards a solution where local
information is essential for refinement.
Parameter                        Value
Number of grid cells             220 × 60 × 1
Grid cell dimensions             30 × 20 × 50 ft
Fluid phases                     Oil/water
Simulated reservoir life cycle   10 years
Injectors control mode           Total water injection rate
Injector rate bounds             100–700 STB/D
Producers control mode           BHP
Producer BHP bounds              500–4,500 psia
Oil price                        50 $/bbl
Water injection cost             6 $/bbl
Water disposal/recycling cost    6 $/bbl
Discount rate                    10%
Table 3.2: Reservoir model parameters for the SPE10 model.
Parameter     Algorithm 1   Algorithm 2
NMaxSim       300           300
NImp          5             5
ϵImp (%)      1             1
δ             0.2           0.2
Ns            200           15
Nr            10            10
γ1            -             2
γ2            -             0.25
NMaxIter      -             1
∆Init         -             0.5
∆Min          -             0.001
∆Max          -             0.2
NDmax         -             40
Table 3.3: Parameters of the algorithms used for the SPE10 example.
Figure 3.7: Evolution of the NPV with the number of simulations for different approaches (SPE10 example).
Figures 3.8 and 3.9 illustrate the comparison of optimal control trajectories and final saturation maps for
each case after optimization. The results from different optimization approaches indicate that the control
trajectories vary, leading to different NPV values. This aligns with the behavior of the control optimization
problem, which often exhibits multiple local optima with similar objective function values. However, a
common pattern in the optimal trajectories is the tendency for all methods to assign larger injection rates
to Injector 1. This is partly because Injector 1 is situated in a relatively low-permeability area compared
to Injector 2.
Regarding the saturation maps, all methods show significant improvements in sweeping efficiency
compared to the case without optimization. The final saturation maps appear similar primarily due to
the fixed geological setting and well locations. By optimizing the control trajectory, the methods manage
to increase oil production from Producer 2, thereby enhancing sweeping efficiency, particularly around
Producer 2.
It is worth mentioning that our proposed active learning framework includes several hyperparameters that can significantly affect its efficiency and correctness in practice. In this thesis, we omit a detailed
discussion of these parameters and refer readers to our published paper [191] for a comprehensive understanding.
3.3.2.2 3D Field-Scale Example: Brugge Model
In this example, we apply the proposed approach to a field-scale case study using the Brugge reservoir
model, a complex 3D benchmark developed by the Netherlands Organization for Applied Scientific Research (TNO) [132]. The Brugge model represents an undersaturated oil reservoir with a total of 30 wells:
20 producers and 10 injectors. Table 3.4 provides a summary of the reservoir simulation and economic
parameters used for this field. The original dataset comprises 104 model realizations that capture geological uncertainty. For this study, we use the first realization as our reference reservoir model. The model
Figure 3.8: Comparison of the optimal control trajectories obtained with each approach for the SPE10 example.
Figure 3.9: Comparison of the final saturation maps after optimization with each approach.
Parameter                        Value
Number of grid cells             139 × 48 × 9
Fluid phases                     Oil/water
Simulated reservoir life cycle   10 years
Injectors control mode           Total water injection rate
Injector rate bounds             2,000–4,000 STB/D
Producer control mode            BHP
Producer BHP bounds              725–1,500 psia
Simulation time                  10 years
Oil price                        80 $/bbl
Water injection cost             5 $/bbl
Water disposal/recycling cost    5 $/bbl
Discount rate                    10%
Table 3.4: Reservoir model parameters for the Brugge model.
includes four geological zones and nine reservoir simulation layers, with each simulation layer discretized
into a grid mesh of size 139 × 48. Figure 3.10 presents the initial saturation map of a sample geological
realization.
We consider production optimization over the first 10 years with annual updates to the control variables. The initial solution for all variables is specified as u0 = [0.5, 0.5, 0.5, . . . , 0.5].
The comparisons between active learning, passive learning, and EnOpt (with an ensemble size of 10)
are presented in Figure 3.11. The parameters used for each approach are reported in Table 3.5. Both active
and passive learning approaches are executed three times in this example.
As shown in Figure 3.11, the active learning approach demonstrates faster convergence and higher NPV
values compared to passive learning. In the best case, the active learning approach achieves an NPV value
of $1559.3 million within 600 simulations, representing approximately a 45% reduction in computation
compared to the EnOpt approach, which requires more than 1100 simulations to reach a slightly lower
NPV of $1540.2 million. Additionally, the passive learning approach achieves an average NPV of $1468.5
million in this example.
Figure 3.12 compares the optimal control solutions from different approaches. Due to the local nature
of the algorithms and the presence of several local solutions with similar NPV values, the control solutions
Figure 3.10: Initial saturation and well configuration for the Brugge model.
Figure 3.11: Evolution of the NPV with the number of simulations for different approaches (Brugge example).
Parameter      Algorithm 1    Algorithm 2
NMaxSim        1,200          1,200
NImp           5              5
ϵImp (%)       1              1
δ              0.2            0.2
Ns             400            50
Nr             20             20
γ1             -              2
γ2             -              0.25
NMaxIter       -              1
∆Init          -              0.5
∆Min           -              0.001
∆Max           -              0.2
NMaxIter       -              5
NDmax          -              60
Table 3.5: Parameters of the algorithms used for the Brugge example.
from different methods are relatively close to each other, featuring low BHP values on several producers (P14, P15, P18, and P19). The Brugge example reinforces the advantages of using active learning for
optimization problems where local search solutions are employed.
Figure 3.12: Comparison of optimal controls and achieved NPVs for the Brugge example. Unit of NPV: 10 M.
Figure 3.13: Carbon storage mechanisms.
3.3.3 Geological Carbon Sequestration
3.3.3.1 Introduction of Geological Carbon Sequestration
In this section, we also apply our proposed approach to the Geological Carbon Sequestration (GCS) problem. GCS is recognized as a promising approach to mitigate climate change by reducing greenhouse gas
emissions. In this method, supercritical CO2 is injected into deep saline aquifers or depleted oil reservoirs
for long-term storage. Over the past few decades, as shown in Figure 3.13, the process of CO2 trapping has
been extensively studied, identifying four key storage mechanisms [116]: structural and stratigraphic trapping under impermeable caprocks, residual trapping within rock pore spaces, solubility trapping through
CO2 dissolution in brine, and mineral trapping via chemical reactions with rock minerals. These mechanisms are typically ranked by their contribution to storage security.
After injection, CO2 tends to rise in the aquifer due to buoyancy forces. Without the barrier of impermeable caprocks, this upward movement could lead to the risk of CO2 leakage into shallow groundwater
or even the atmosphere. Generally, structural and stratigraphic trapping are considered less secure due to
potential leaks through caprock or other pathways. Residual trapping is deemed safer, as CO2 is immobilized in isolated rock pores during drainage and imbibition. Solubility trapping is also regarded as secure
since dissolved CO2 remains stable unless a significant drop in pressure occurs. Mineral trapping offers
the highest level of security, as CO2 reacts with formation minerals to form solid precipitates, thereby
permanently trapping it. However, this process can take an extended time, often spanning hundreds to
thousands of years, and is difficult to predict due to the complex nature of long-term thermodynamics
and reaction kinetics. Enhancing the contact between the injected CO2 and the resident brine and rock
formations can improve these trapping mechanisms.
Increasing the immobilization of CO2 through residual and solubility trapping enhances the overall
security of the storage. Therefore, reducing the amount of mobile or free CO2 is crucial in lowering the
risk of leakage. This can be accomplished through careful engineering design combined with optimization
strategies. A key factor influencing the distribution of free and immobilized CO2 is the heterogeneity
of the aquifer, which dictates the flow patterns and the evolution of the CO2 plume post-injection [150].
Consequently, the ability to safely trap CO2 depends on this heterogeneity. This study focuses on applying
optimization workflows to reduce the volume of free CO2 by designing strategies that allocate injection
rates among various injectors based on field heterogeneity.
Predicting the performance of GCS accurately necessitates the numerical simulation of complex flow
and transport processes, which can be computationally demanding, especially for large-scale field applications. The complexity increases further when coupled flow and geomechanics models are included [197].
Standard workflows like well control optimization may require hundreds or thousands of simulation runs
to evaluate objective functions while searching the feasible space for optimal solutions. The computational
resources needed for such workflows can be prohibitive in practical scenarios.
Figure 3.14: Considered reservoir model in this thesis.
3.3.3.2 GCS Reservoir Simulation Model
We conduct our numerical experiments using a three-dimensional reservoir model. As shown in Figure
3.14, the reservoir comprises 20 layers, each containing 2,500 grid blocks arranged in a 50 × 50 mesh, with
each block measuring 200 m × 200 m × 5 m. The aquifer features four distinct rock types with heterogeneous facies distribution, generated using the sequential indicator simulation (SISIM) [147] method. The
permeability values for the four lithofacies are 100 mD, 20 mD, 5 mD, and 1 mD, respectively. A large
surrounding aquifer region is included to dissipate the pressure induced by CO2 injection.
The reservoir is equipped with four CO2 injection wells, perforated at the 18th layer, where CO2 is
injected over a period of 20 years. The control step size is set to one year, resulting in a total of 80 decision
variables. The total CO2 injection rate is fixed at 2.7 million tons per year.
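To make the decision-variable layout concrete, the short sketch below shows one possible way to encode the 80 control variables as per-well allocation fractions that are renormalized so the fixed total injection rate of 2.7 million tons per year is honored. The function name and the normalization scheme are illustrative assumptions, not the exact parameterization used in the simulator.

```python
import numpy as np

# Hypothetical encoding of the GCS control variables: 4 injectors x 20 yearly
# control steps = 80 decision variables, parameterized here as allocation
# fractions that are renormalized so the fixed total of 2.7 Mt/yr is honored.
N_INJECTORS, N_STEPS = 4, 20
TOTAL_RATE_MT_PER_YR = 2.7

def controls_to_rates(u):
    """Map 80 raw decision variables in [0, 1] to per-well injection rates."""
    x = np.asarray(u, dtype=float).reshape(N_STEPS, N_INJECTORS)
    x = np.clip(x, 1e-6, None)                  # guard against an all-zero step
    fractions = x / x.sum(axis=1, keepdims=True)
    return fractions * TOTAL_RATE_MT_PER_YR     # shape (20, 4), Mt/yr per well

# Example: the even split used as the initial strategy in the text
u0 = np.full(N_INJECTORS * N_STEPS, 0.5)
print(controls_to_rates(u0)[0])                 # -> [0.675 0.675 0.675 0.675]
```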
3.3.3.3 Results
We evaluate our proposed active learning framework using the described reservoir model and compare
its performance against passive learning and the traditional Ensemble Optimization (EnOpt) approach.
The optimization process begins with an injection strategy that evenly distributes the total CO2 injection
volume across the four wells. The optimization results are illustrated in Figure 3.15. Both the active and
passive learning frameworks demonstrate faster convergence compared to the EnOpt method. Specifically,
active learning achieves a 57.8% (from 1440 to 607) reduction in the number of required simulations, while
passive learning achieves a 36.5% reduction (from 1440 to 914).
Figure 3.15: Comparison of optimization performance for the GCS example.
The well control solutions (allocation of injection rates) for each approach are presented in Figure
3.16, with the corresponding CO2 plumes, visualized as CO2 saturation, shown in Figure 3.17. Despite
achieving similar reductions in free CO2, the solutions exhibit distinct allocation strategies. A consistent
pattern across all three approaches is the assignment of significantly higher injection rates to Injector 3.
This trend can be partially attributed to the lateral spread of the CO2 plume near Injector 3, which is more
pronounced compared to the other injectors.
The CO2 plume around Well 3 exhibits greater interaction with the aquifer and its resident brine,
enhancing residual and solubility trapping mechanisms. In contrast, the CO2 plumes from the initial,
evenly distributed injection strategy show that the plumes near Injectors 2 and 4 predominantly migrate
upwards, resulting in limited lateral spread. This upward migration mainly occurs in the top layers, further
restricting the extent of lateral CO2 distribution.
Figure 3.16: Optimized injection strategies of the three different approaches.
3.4 Conclusion
The development of proxy models to accelerate standard workflows in reservoir engineering and management has been extensively explored in the literature. Typically, the use of data-driven surrogate models
in reservoir engineering involves generating numerous realizations of simulated (labeled) data to cover a
wide range of input/output (feature-label) samples. However, generating such a large dataset for model
training is computationally expensive, posing a significant barrier to the practical application of these
Figure 3.17: Evolution of the CO2 plume based on optimized injection strategies.
techniques in field scenarios. Therefore, it is crucial to develop proxy models that are both reliable and
require minimal data and computational resources.
In this dissertation, we introduced a novel active learning framework aimed at enhancing the computational efficiency of existing approaches that rely on offline training and retraining. By integrating the
development of proxy models with the iterations of local optimization techniques, our proposed model
not only improves optimization results but also reduces the computational burden associated with passive
learning. The application of active learning for online/adaptive training of surrogate models in well control optimization allows for continuous monitoring of the proxy model’s validity, ensuring its fidelity is
maintained throughout the optimization process.
The advantages of this framework are illustrated and discussed through applications to analytical optimization objective functions and well control optimization problems in both waterflooding and geological
carbon sequestration applications. Specifically, we demonstrate that rather than developing an Artificial
Neural Network (ANN)-based proxy model offline—frontloading the computational effort—it is more efficient and consistent to solve the optimization problem using an active learning scheme. In this approach,
the training is adapted to the optimization iterations, yielding a more accurate local proxy model. The
primary advantage of this method is that the proxy model is constructed around the optimizer, providing
a precise approximation only for the points encountered during optimization. As a result, training (simulated) samples are generated online, as needed, to enhance local accuracy and reduce the computational
complexity of the workflow.
Comparative results between active learning and traditional passive learning approaches reveal that
an actively learned, locally accurate proxy model leads to faster convergence and improved final solutions
(i.e., objective function values). The active learning strategy proposed in this thesis is novel and represents
a significant advancement towards algorithmic development that minimizes data requirements while enhancing the performance of proxy models. When employed in gradient-based optimization, the proxy
model can be fine-tuned to generate accurate predictions around the optimization iterations, as only local
information is necessary to progress towards the solution. Leveraging this characteristic, the developed
active learning approach provides a highly accurate and computationally efficient proxy model, tailored
for purpose, that outperforms traditional offline training approaches.
Chapter 4
Robust Optimization under Geological Uncertainty
4.1 Introduction
Reservoir simulators are essential in modern reservoir management for aiding decision-making but are often time-consuming. In previous
chapters, we discussed using surrogate models as a faster alternative to reservoir simulators and introduced
novel approaches to enhance the efficiency and practicality of these models for real-world applications. In
Chapter 2, we explored how incorporating domain knowledge can improve the accuracy and interpretability of surrogates, while in Chapter 3, we presented an active learning approach to make surrogate-based
optimization workflows more reliable. However, another crucial aspect that has not been addressed is
geological uncertainty, which can significantly increase computational demands. In this chapter, we will
discuss geological uncertainty and propose a simple yet effective strategy for mitigating the computational
load it induces: the reduced sampling strategy.
In real-world applications, due to data limitations, reservoir descriptions are inherently uncertain,
making a stochastic approach to the optimization problem necessary (usually represented by an ensemble of realizations). A practical method for representing and quantifying uncertainty in these scenarios is
the Monte Carlo simulation, which generates multiple samples (realizations) of the uncertain parameters.
These samples are then propagated through the fluid flow dynamics (reservoir simulation) to quantify the
Figure 4.1: Increased complexity of robust optimization.
uncertainty in the response measures. Performing optimization under uncertain parameters is usually referred to as robust optimization [16]. However, a significant challenge in implementing this robust optimization approach in large-scale field problems is the computational complexity of Monte Carlo simulations, especially given the time-consuming nature of large-scale reservoir simulations. The computational demand is further exacerbated by the iterative nature of robust optimization, which requires numerous iterations to converge on a solution, as shown in Figure 4.1. Several examples of robust optimization approaches can be found in the literature, including [172], [84], [177], and [132].
In this chapter, we also consider well control optimization for waterflooding, using Net Present Value
(NPV) as our objective function. This problem has been extensively studied in the literature, and adjoint-based gradient methods have been applied to solve it. For instance, Sudaryanto and Yortsos [163]
applied an adjoint formulation to maximize the displacement efficiency of waterflooding operations by
controlling the injection rate policy through a first-order gradient-based method. Brouwer and Jansen [19]
employed the gradient descent method with an adjoint model to maximize NPV in waterflooding, optimizing both bottom-hole pressure (BHP) and rate controls. Additionally, Kraaijevanger et al. [97] combined a
generalized reduced gradient method with the adjoint approach to optimize well control settings.
Although adjoint-based implementations offer accurate and efficient gradient information for optimization, developing an adjoint model is complex, particularly when using commercial flow simulation
software, as it requires access to the reservoir simulation source code. This challenge has led researchers
to explore alternative techniques for gradient approximation that do not necessitate direct access to the
reservoir simulation codes. One such technique is Ensemble Optimization (EnOpt) [161, 109], inspired by
the Ensemble Kalman Filter [48].
Chen [29] proposed a new formulation for implementing the EnOpt algorithm to solve robust optimization problems by using an ensemble of randomly perturbed control vectors to approximate the objective function, where the size of the control ensemble matches that of the geological realizations. In
this approach, each control vector is paired with a corresponding realization from the ensemble of reservoir models. Subsequent studies, such as those in [28, 27] successfully applied a similar idea as EnOpt to
large-scale waterflooding optimization problems.
More recently, the accuracy of the estimated gradients has been enhanced by modifying the covariance
structure and determining the optimal ensemble size for the perturbed vectors, as shown in the work of
[51, 52] and [143]. Fonseca et al. [50] introduced an improved version of EnOpt called Stochastic Simplex
Approximate Gradient (StoSAG), which will be detailed in the methodology part.
Both EnOpt and StoSAG are designed to approximate gradients based on a limited number of samples while using the full ensemble to compute the objective function. However, recent studies have also
explored approximating the objective function itself to reduce the computational burden of robust optimization [177, 103, 149].
Wang et al. [177] approached the well placement problem as a sequence of optimization subproblems with an increasing number of realizations. Initially, the number of realizations is small, gradually
increasing through the optimization sequence, with each subproblem initialized using the solution from
the previous one. Li et al. [103] employed the simultaneous perturbation stochastic approximation (SPSA)
algorithm for field development optimization under geological uncertainty, approximating the cost function at each iteration using a limited number of random realizations. They examined the effect of ensemble
size, reporting similar final NPV values regardless of the number of realizations. Shirangi and Durlofsky
[149] utilized a clustering algorithm to select a representative subset of realizations to approximate the
objective function for field development and production optimization. Jesmani et al. [85] took a different
approach, using a small subset of randomly selected model realizations at each iteration to approximate the
objective function for well placement optimization under geological uncertainty using the SPSA algorithm
[157].
In this work, we introduce a novel approach using random sampling of reservoir model realizations not
only to approximate the objective function but also to provide an approximate estimate of the gradient for
robust optimization. This formulation is inspired by the implementation of stochastic gradient descent [5,
104, 93] in large-scale machine learning applications, where both the objective function and its gradients
are approximated using a subset of realizations, thereby reducing the computational burden of robust optimization. We present and discuss computational results for three examples, with the last two focusing on
field-scale implementation. The computational efficiency and optimization performance of the proposed
algorithm are compared with the traditional robust production optimization approach, where the entire
ensemble of model realizations is used during optimization iterations.
4.2 Methodology
4.2.1 Robust Production Optimization
The rock properties, such as permeability and porosity, which control the flow behavior, are often highly
uncertain. Probabilistic methods are commonly used to quantify and propagate this uncertainty. In practice, this involves sampling techniques and ensemble representations, where instead of relying on a single
model, a set of realizations (typically a few hundred) of these uncertain properties is employed to predict and optimize production performance. Robust optimization refers to incorporating the uncertainty in
reservoir properties into the well control optimization problem. As a result, optimization is carried out over
multiple model realizations to maximize a statistical performance measure, such as the expected value of
the objective function. In ensemble-based reservoir modeling and production forecasting, the corresponding optimization objective function can be defined based on the point statistics of the NPV function. A
common approach is to use the expected value of the NPV across the ensemble of model realizations, which
can be expressed as:
\[
J(u, m) = \mathbb{E}_m\left[f(u)\right] = \frac{1}{N} \sum_{i=1}^{N} f(u, m_i), \tag{4.1}
\]
where E_m[·] denotes the statistical expectation operator and N represents the number of model realizations (ensemble size) used to compute the objective function. While the expected NPV is a widely used
statistical measure for optimization, alternative metrics such as the P10, P50, and P90 of NPV can also be
used to reflect the user’s risk preferences. In this study, we adopt the expected NPV (with a negative sign)
as our minimization objective function.
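As a minimal sketch, the expected-NPV objective in Eq. 4.1 can be evaluated as shown below; simulate_npv is a placeholder for a full reservoir simulation run and does not correspond to any specific simulator interface.

```python
import numpy as np

# Minimal sketch of the robust objective in Eq. 4.1: the expected NPV over an
# ensemble of model realizations, returned with a negative sign so that a
# minimization algorithm can be used.  `simulate_npv(u, m)` stands in for a
# full reservoir simulation run.
def negative_expected_npv(u, realizations, simulate_npv):
    npvs = [simulate_npv(u, m) for m in realizations]   # N simulation runs
    return -float(np.mean(npvs))
```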
Robust well control optimization is computationally very demanding due to the following reasons:
1. Performing well control optimization on a single model is a computationally demanding task since
obtaining the objective function value f(u) requires a reservoir simulation run, which involves the
numerical solution of complex partial differential equations over millions of gridblocks. For large
and complex simulation models used in practical applications, the time required to complete a single
forward reservoir simulation could span days or weeks.
2. When geological uncertainty is considered, the computational load can increase significantly. To obtain the objective function $\frac{1}{N}\sum_{i=1}^{N} f_i(u, m_i)$, N reservoir simulation runs are required, rather than just one.
3. Gradient-based methods have gained popularity in well control optimization, particularly due to
the availability of adjoint models that enable efficient gradient computation. The adjoint method,
derived from optimal control theory, provides efficient and accurate estimates of the gradient of the
objective function (or well responses) with respect to well control variables. The gradient of the
expected Net Present Value (NPV) is calculated as the expected gradient over the realizations:
\[
g_k = \frac{\partial}{\partial u_k} \left( \frac{1}{N} \sum_{i=1}^{N} f_i(u_k, m_i) \right) = \frac{1}{N} \sum_{i=1}^{N} \frac{\partial f_i}{\partial u_k}(u_k, m_i). \tag{4.2}
\]
Here, uk represents the decision variables at iteration k. When the adjoint-based gradients ∂fi/∂uk(uk, mi) are available, Equation (4.2) can be used to obtain the analytical gradient for robust optimization. A simple gradient descent strategy is then applied to update the decision variables as follows:
\[
u_{k+1} = u_k - \alpha_k g_k, \tag{4.3}
\]
where αk is the step size. Obtaining the adjoint gradient information for a single realization requires
additional computation time (usually proportional to a forward simulation run). Consequently, approximately N reservoir simulation runs are necessary to obtain the gradient for robust optimization.
4. Implementing the adjoint in a reservoir simulation code is time-consuming and requires access to
and modifications of the source code, which is not trivial for commercial simulation software. Moreover, the computational load required for this problem can become extremely intensive when the
analytical gradient is not available. For example, with the number of decision variables Nu = 60 and
the number of realizations N = 100, a one-sided finite difference approximation of the gradients in
Eq. 4.2 requires approximately Nu × N + N = 6100 simulation runs. If the number of iterations is
around 100, then approximately 6100×100 = 0.61 million simulation runs are needed, which is impractical. The resulting computational load can hinder the practicality of robust optimization and,
therefore, necessitates efficient techniques to calculate or approximate the gradients. Alternative
approaches to approximate the gradients have been developed in the literature.
4.2.2 Gradient Approximation
In this work, when the adjoint model for gradient calculation is not available (which is the assumption in
our third example), we use the StoSAG approach to approximate the gradients. The StoSAG procedure for
gradient approximation is as follows:
1. At iteration k, with the current decision variables uk, generate Np independent control perturbations for each of the N realizations:
\[
\hat{u}_{k,j,i} \sim \mathcal{N}(u_k, C_U), \quad j = 1, 2, \ldots, N_p, \quad i = 1, 2, \ldots, N. \tag{4.4}
\]
2. For each generated control perturbation ûk,j,i, run the simulator and obtain the corresponding objective value f(ûk,j,i, mi).
3. Then, the gradient can be approximated as:
\[
\tilde{g}_k = \frac{1}{N} \sum_{i=1}^{N} \frac{1}{N_p} \sum_{j=1}^{N_p} (\hat{u}_{k,j,i} - u_k)\,\bigl(f(\hat{u}_{k,j,i}, m_i) - f(u_k, m_i)\bigr). \tag{4.5}
\]
Once the approximate gradient is calculated, a gradient descent algorithm can be applied to update
the decision variables. For example:
\[
u_{k+1} = u_k - \alpha_k \tilde{g}_k. \tag{4.6}
\]
The StoSAG approach serves as the reference case for evaluating the computational efficiency of our
proposed reduced sampling technique.
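A minimal sketch of the StoSAG procedure in Eqs. 4.4–4.6 is given below. It assumes a user-supplied simulate_npv(u, m) function and a perturbation covariance C_U, and is intended only to illustrate the structure of the gradient estimate rather than a production implementation.

```python
import numpy as np

# Sketch of the StoSAG gradient approximation (Eqs. 4.4-4.5); `simulate_npv`
# and the perturbation covariance C_U are assumed to be supplied by the user.
def stosag_gradient(u_k, realizations, simulate_npv, C_U, n_perturb, rng):
    g = np.zeros_like(u_k, dtype=float)
    for m_i in realizations:
        f_base = simulate_npv(u_k, m_i)
        for _ in range(n_perturb):
            u_hat = rng.multivariate_normal(u_k, C_U)               # Eq. 4.4
            g += (u_hat - u_k) * (simulate_npv(u_hat, m_i) - f_base)
    return g / (len(realizations) * n_perturb)                      # Eq. 4.5

# One gradient-descent update (Eq. 4.6):
#   u_next = u_k - step_size * stosag_gradient(u_k, realizations, simulate_npv,
#                                              C_U, n_perturb=5, rng=rng)
```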
4.2.3 Reduced Sampling for Efficient Objective Function and Gradient Approximation
In robust optimization, calculating the objective function and the approximated gradients across all different realizations is computationally intensive, especially when a high ratio of control perturbations per
model realization is employed. In this study, we introduce a reduced sampling strategy. At each optimization iteration, we approximate both the objective function and the gradients using a smaller subset, Ns, of
randomly chosen realizations from the total N realizations. This approach anticipates the repeated use of
each model realization to approximate the objective function and gradients throughout the iterations.
Random sampling for approximating the objective function and gradients is a well-established method
in machine learning. Techniques such as stochastic gradient descent (SGD) and its variant, mini-batch
gradient descent, are commonly utilized for the backpropagation step in training neural networks. SGD
simplifies the computational load by substituting the actual gradient, which should be computed from the
full dataset, with an estimate derived from a smaller, randomly selected subset of data. We apply a similar
concept in our robust production optimization approach, using one or more randomly selected reservoir
models to estimate the objective function and its gradients. Furthermore, we explore how the size of the
selected sample influences the accuracy, stability, and convergence rate of the optimization algorithm.
Each iteration of our proposed method involves estimating the objective function and its gradients over a
subset of realizations (Ns ≪ N):
\[
\tilde{J} = \frac{1}{N_s} \sum_{i=1}^{N_s} f_i\bigl(u, m_i^{R_k}\bigr), \tag{4.7}
\]
\[
\tilde{g} = \frac{1}{N_s} \sum_{i=1}^{N_s} \nabla f_i\bigl(u, m_i^{R_k}\bigr), \tag{4.8}
\]
where Rk represents the subset of randomly selected realizations of uncertain model parameters at iteration k, and m_i^{Rk} denotes a sample realization from that subset. Larger values of Ns will result in increased
accuracy and computational cost of the approximation. The main idea behind the reduced random sampling approach is to balance the sampling error and computational complexity of the robust optimization
algorithm. In particular, we hypothesize that a small subset of samples from a large ensemble of model
realizations is likely to provide reasonable descent directions that can be used to minimize the original
objective function. The optimization iterations in this case take the form:
\[
u_{k+1} = u_k - \alpha_k \tilde{g}_k, \tag{4.9}
\]
where g˜k represents the approximate gradient that is computed from a small subset of randomly selected
model realizations.
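The sketch below illustrates a single reduced-sampling iteration (Eqs. 4.7–4.9), assuming a per-realization gradient routine grad_obj(u, m) (adjoint-based or StoSAG); a fresh random subset is drawn at every call, mirroring the mini-batch logic of stochastic gradient descent.

```python
import numpy as np

# One reduced-sampling iteration (Eqs. 4.7-4.9): a fresh random subset of Ns
# realizations approximates both the expected objective and its gradient.
# `grad_obj(u, m)` is assumed to return the gradient of the objective for a
# single realization (adjoint-based or StoSAG).
def reduced_sampling_step(u_k, realizations, grad_obj, n_s, step_size, rng):
    idx = rng.choice(len(realizations), size=n_s, replace=False)
    g_tilde = np.mean([grad_obj(u_k, realizations[i]) for i in idx], axis=0)
    return u_k - step_size * g_tilde                                # Eq. 4.9
```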
4.3 Numerical Experiments
4.3.1 Rosenbrock Functions
In this example, we evaluate the performance of our approach by applying it to the Rosenbrock function
[151]. The original Rosenbrock function is expressed as:
\[
f(x, y) = (a - x)^2 + b\,(y - x^2)^2, \tag{4.10}
\]
where the parameters are a = 1 and b = 100. Instead of using fixed values for a and b, we model a and b
as uncertainties following normal distributions a ∼ N (1, 1) and b ∼ N (100, 40). We can then define the
objective function as follows:
\[
f(x, y) = \mathbb{E}_{a \sim \mathcal{N}(1,1),\, b \sim \mathcal{N}(100,40)}\left[f(x, y; a, b)\right]. \tag{4.11}
\]
Using Monte Carlo simulation, we generate an ensemble of realizations for these a and b parameters to
compute a corresponding realization of f and use them to approximate the expected value of the function:
\[
f_i(x, y; a_i, b_i) = (a_i - x)^2 + b_i\,(y - x^2)^2, \tag{4.12}
\]
\[
f(x, y) = \frac{1}{N} \sum_{i=1}^{N} f_i(x, y; a_i, b_i). \tag{4.13}
\]
For this function, the analytical gradient is available and can be expressed as:
\[
\nabla f = \left( \frac{2}{N} \sum_{i=1}^{N} \left[ -(a_i - x) - 2 b_i x\,(y - x^2) \right],\;\; \frac{2}{N} \sum_{i=1}^{N} b_i\,(y - x^2) \right). \tag{4.14}
\]
We now minimize f(x, y) using simple gradient descent optimization techniques to update the decision
variables.
The motivation behind our reduced sampling approach is to use fewer samples, Ns < N, to compute a
noisy version of the expected objective function by randomly selecting a different subset at each iteration.
Although the reduced sample size is expected to introduce sampling error, the resulting approximated
objective function and gradients could still prove effective in determining a reasonable descent direction
for minimization. The approximate objective function and its gradient can be expressed as:
\[
f(x, y; a, b) \approx \frac{1}{N_s} \sum_{i=1}^{N_s} \left[ (a_i - x)^2 + b_i\,(y - x^2)^2 \right], \tag{4.15}
\]
\[
\nabla f \approx \left( \frac{2}{N_s} \sum_{i=1}^{N_s} \left[ -(a_i - x) - 2 b_i x\,(y - x^2) \right],\;\; \frac{2}{N_s} \sum_{i=1}^{N_s} b_i\,(y - x^2) \right). \tag{4.16}
\]
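For illustration, the stochastic Rosenbrock experiment can be reproduced with a few lines of code; the step size, iteration count, and starting point below are illustrative choices and are not necessarily those used to generate the reported figures.

```python
import numpy as np

# Stochastic Rosenbrock experiment: a and b are sampled once (the "ensemble"),
# and each descent step uses only Ns randomly chosen (a_i, b_i) pairs, as in
# Eqs. 4.15-4.16.  Step size, iteration count, and start point are illustrative.
rng = np.random.default_rng(0)
N, Ns, lr = 100, 10, 5e-4
a = rng.normal(1.0, 1.0, N)
b = rng.normal(100.0, 40.0, N)

def grad(x, y, ai, bi):
    gx = np.mean(-2.0 * (ai - x) - 4.0 * bi * x * (y - x**2))
    gy = np.mean(2.0 * bi * (y - x**2))
    return np.array([gx, gy])

xy = np.array([-1.0, 1.0])
for _ in range(5000):
    idx = rng.choice(N, size=Ns, replace=False)    # new random subset each step
    xy -= lr * grad(xy[0], xy[1], a[idx], b[idx])
print(xy)   # slowly drifts along the valley toward the minimizer near (1, 1)
```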
Figure 4.2 shows a comparison between the reduced sampling strategy with Ns = 10 (red line) and
the reference case with N = 100 (black line). Figure 4.2 (a) and Figure 4.2 (b) display the evolution of the
objective function during optimization as a function of the number of iterations and the number of function evaluations, respectively. When implementing the reduced sampling approach, we initialize with Ns
randomly selected samples and perform one iteration of minimization using the approximated objective
function (Eq. 4.15). For each subsequent iteration, different randomly selected Ns samples are used to approximate the objective function and its gradient. Figure 4.2 (b) visualizes the change in selected samples at
each iteration, with the thinner red lines in the background showing the evolution of the approximate objective function as implemented in the algorithm. Sudden shifts occur when the random samples from the
previous iteration are replaced with new ones. The evolution of the corresponding true objective function
for each iterate of the reduced sampling approach is also shown, following a smoother decline (note that
this function is not computed during minimization with reduced sampling and is shown to illustrate the
behavior of the algorithm). Although the approximate objective function exhibits local fluctuations, it follows a similar declining trend as the true objective function and can be considered a noisy approximation
of it.
The computational efficiency of the reduced sampling approach is illustrated in Figure 4.2 (b). After
approximately 300 function evaluations with Ns = 10, the reduction in the objective function value is
comparable to that achieved with the reference case (with N = 100) after about 3,000 function evaluations. Figure 4.3 shows the results of a sensitivity analysis with respect to Ns, similar to Figure 4.2 but
for Ns = 1, 5, and 10. As the sample size increases, the reduced sampling algorithm exhibits less variability. However, even with a sample size of Ns = 1, which is often used in stochastic gradient descent
algorithms for training neural networks, the objective function shows a rapid initial decline. This example clearly demonstrates the trade-off between computational efficiency and sampling error: reducing the
sample size improves computational efficiency but leads to noisier behavior of the approximated objective
function. It is important to note that in many practical applications with high-dimensional and complex
parameters, the sample size used for robust optimization is typically constrained by available computational resources. The goal of the reduced sampling strategy in this work is to show that even very small
sample sizes, when randomly selected, can significantly reduce computational costs while still providing
reasonable minimization behavior.
An important characteristic of the reduced order strategy with random sampling is the fluctuation
in the objective function caused by changes in the selected samples at each iteration. This behavior is
less pronounced in the early stages of minimization, where a significant decline in the objective function
is observed. However, these fluctuations become more noticeable in later iterations as the optimization
nears convergence. An effective strategy is required to terminate the optimization process. One simple
approach is to use the moving average of the objective function value and stop the optimization when its improvement falls below a predefined threshold. Given the behavior of the reduced sampling strategy, another approach
Figure 4.2: (a) The evolution of the objective function during optimization with respect to the number of iterations. (b) The evolution of the objective function during optimization with respect to the number of function evaluations.
Figure 4.3: Investigating the effect of different sample sizes (Ns = 1, 5, 10). (a) The evolution of the objective function during optimization with respect to the number of iterations. (b) The evolution of the objective function during optimization with respect to the number of function evaluations. Each experiment is repeated 10 times.
is to use adaptive sample sizes Ns, employing smaller samples in the early stages of optimization when
large changes in the objective function are anticipated, and larger samples in later iterations as convergence
is approached.
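A simple version of the moving-average stopping rule discussed above might look like the following sketch; the window size, patience, and tolerance are illustrative hyperparameters.

```python
import numpy as np

# Illustrative moving-average stopping rule for the noisy, subset-based
# objective: smooth the per-iteration values over a window and stop once the
# smoothed curve shows no further improvement (here, decrease) for `patience`
# iterations.  Window, patience, and tolerance are illustrative choices.
def should_stop(history, window=10, patience=10, tol=1e-3):
    if len(history) < window + patience:
        return False
    ma = np.convolve(history, np.ones(window) / window, mode="valid")
    best_recent = ma[-patience:].min()
    best_before = ma[:-patience].min()
    return best_recent > best_before - tol      # no meaningful improvement
```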
4.3.2 SPE10 Model
We also use the SPE10 reservoir model as a 2D synthetic case study, which was introduced in Chapter
3. The model is discretized into 60 × 220 gridblocks, and N = 100 permeability and porosity maps are
used to represent the uncertainty in the reservoir model. Figures 4.4 (a) and 4.4 (b) display four samples
of log-permeability and porosity maps, respectively, generated using Sequential Gaussian Simulation. The
simulation model is developed for a waterflooding scenario, including two injectors located in the middle
of the domain and four producers positioned at the corners. The simulation and optimization are carried
out over a 10-year period. We then optimize the injection rates of the two injectors and the total liquid
production rates of the four producers. The control step size is set to 1 year, resulting in a total of 60
control variables (10 per well). Each element in the matrix representing the control variables corresponds
to a specific well at a particular time interval.
For robust production optimization, an optimal well-control strategy is obtained by minimizing the
expectation of the negative NPV as represented in Eq. 1.1.3 over N = 100 realizations.
A comparison between the approximated and true objective function values is illustrated in Figure 4.5.
The results indicate that when Ns = 1, in addition to a significant level of variability, the final solution
tends to be underestimated. This effect occurs because when only a single realization is randomly selected
at each iteration and geological uncertainty is significant, an update to the control trajectory based on
a particular realization may result in a lower NPV when applied to another realization. Thus, Ns = 1
appears to be too small given the relatively large geological uncertainty. However, increasing the sample
Figure 4.4: Four samples of (a) log-permeability and (b) porosity maps for the SPE10 model.
sizes to Ns = 5 or Ns = 10, stabilizes the behavior of the objective function, and deviations from the true
values become smaller.
We also observed that the final achieved NPV based on the reduced sampling technique could be
slightly lower compared with the reference case. This suggests that, at later stages of optimization, the
approximated gradients may not be capable of further improving the true objective function even with
Ns = 10. To avoid potential loss in NPV, a full ensemble may be necessary for computing gradient information closer to convergence. This strategy is applied in our subsequent complex 3D example. For this
Figure 4.5: Evolution of the objective function for the reference case (N = 100) and the reduced sampling approach with three different sample sizes (Ns = 1, 5, 10), from top to bottom. Two different initial points are explored.
example, we mainly focus on analyzing the general behavior and testing the robustness of the optimization
procedure when only a subset of realizations is considered at each iteration.
The reduced sampling technique involves randomly selecting a subset of realizations to approximate
the gradients used for updating the control trajectory. However, in theory, this approach does not guarantee that the calculated gradients will consistently improve the true objective function value. It is possible
that a direction of update, which enhances the NPV for a specific realization, could actually cause a significant decrease in the expected NPV across the entire ensemble. An example of this can be seen in Figure 4.6
(bottom plot), where at Iteration 32, there is a noticeable sharp decline in the true objective function. Increasing the sample size can help prevent such occurrences. As with the Rosenbrock example, we repeated
this experiment 10 times. The results, shown in Figure 4.7, clearly demonstrate that increasing the sample
size can stabilize the optimization process, particularly in the later stages. In the case of Ns = 1, significant fluctuations remain in the later stages, indicating that a single realization is insufficient to accurately
represent the behavior of the entire ensemble.
Figure 4.6: Comparison of the true objective function evolution during reduced sampling optimization for three different sample sizes and two initializations (top and bottom). Black dashed lines show the evolution of the exact optimization, where all realizations are used during optimization.
Figure 4.7: Statistical measure of the optimization procedure (SPE10 model, based on Initialization 2); each
numerical experiment is repeated 10 times (with different random seeds that lead to different paths when
visiting the realizations). Top: At each iteration, we visualize the true objective value within one standard
deviation among 10 data points. Bottom: At each iteration, we display the standard deviation (std) of the
true objective value.
The main motivation of the proposed reduced sampling strategy is to trade off a certain degree of accuracy in both the expected gradient and the expected objective value in exchange for a significant reduction
in the computational load associated with running simulations over the entire set of realizations. Figure
4.8 illustrates the trade-off between solution accuracy and computational burden for different implementations. With fewer samples (smaller Ns), the computational demand decreases; however, the objective
function tends to exhibit less stable behavior. Conversely, with sufficiently large sample sizes (larger Ns),
the approximation accuracy improves, resulting in more stable behavior of the objective function. In this
example, with a sample size of Ns = 10, the proposed approach achieves the same level of NPV after approximately 200 simulation runs, compared to 1,000 simulation runs for the reference case, demonstrating
a significant computational saving with the reduced sampling approach.
Figure 4.8: Number of reservoir simulation runs during reduced sampling optimization for three different sample sizes with two initializations (top and bottom). Black dashed lines show the number of reservoir simulations for the exact optimization, where all realizations are used during optimization.
Another observation is that although the reduced sampling approach can achieve a similar level of
NPV compared to the reference case, the corresponding control strategies may differ. Figure 4.9 presents
a comparison between the final control strategies after optimization. Figure 4.9 (a) displays the optimal
Figure 4.9: Optimal control trajectories for two injectors (injection rate) and four producers (total liquid production rate) obtained from (a) the reference case (robust optimization); (b) the reduced sampling approach with Ns = 10.
control trajectory for the reference case, while Figure 4.9 (b) shows the trajectory for the reduced sampling
approach with 10 samples (Ns = 10). Despite the similar final NPV values for both approaches (as shown
by the red and black dashed lines in Figure 3.7), their optimal solutions differ significantly. This difference
arises from a well-known characteristic of control optimization problems: multiple local solutions with
similar NPV values are often present. Additionally, the stochastic nature of the reduced sampling approach
leads to an optimization path that may diverge from that of the reference case.
4.3.3 Brugge Model
In this example, we utilize a field-scale model (Brugge Model) to evaluate the performance of the reduced
sampling strategy in a larger and more complex system. This model was also introduced in Chapter 3, but
here, instead of considering a single realization, we incorporate multiple realizations to account for the
uncertainties in rock properties.
The original data set includes 104 model realizations to capture the geological uncertainty within the
reservoir (we use the first 100 realizations to maintain consistency with our previous two examples). The
models are composed of four geological zones and nine reservoir simulation layers, with each layer discretized into a grid mesh of size 139 × 48. Since the model is designed for closed-loop reservoir management, the first 10 years of production data are provided to calibrate the initial realizations. These calibrated
models are then used for production optimization over the following 20 years. However, in this study, we
focus exclusively on the production optimization for the first 10 years, applying annual adjustments to the
control variables, without addressing the model calibration process. The total number of decision variables
is 300 (= 30 × 10).
The uncertain parameters in this problem include water saturation (sw), horizontal (kx, ky) and vertical
(kz) permeabilities, porosity (ϕ), and net-to-gross ratio (NTG), which represents the fraction of the reservoir volume occupied by hydrocarbons. Consequently, the total number of uncertain model parameters for
each realization is equal to the number of active gridblocks (i.e., 44,550) multiplied by six (corresponding
to the six uncertain parameters: sw, kx, ky, kz, ϕ, and NTG). For example, Figure 4.10 the logarithm of
horizontal permeabilities (log kx) across 16 realizations.
To further increase the complexity of the problem, we assume that an adjoint model is not available,
requiring the gradient information to be computed or approximated numerically. This scenario is not uncommon in practice, as obtaining adjoint-based gradients necessitates access to and modification of the
simulator source code, which is not always available in commercial simulators. The computation of gradients often significantly increases the computational burden of the optimization problem. As discussed in
the previous section, we employ the StoSAG approach for gradient approximation. In StoSAG, the number
of control perturbations assigned to each geological realization (Np) is a critical hyperparameter. For this
Figure 4.10: Log horizontal permeability distribution for 16 realizations.
example, each iteration requires 600 reservoir simulations: 500 simulations (= 100 × 5) for gradient estimation and an additional 100 simulations for validating the objective value based on the updated decision
variables.
The efficiency of our reduced sampling approach comes from the fact that, at each iteration, it is not
necessary to include all the realizations because the objective is to make an acceptable improvement. In
other words, a descent direction is still possible even when approximate objective functions and gradients
are used. Consequently, at each iteration, we randomly select a subset Ns from the 100 realizations and
replace N in Eq. 4.5 with Ns, leading to the following formulation:
\[
\tilde{g}_k = \frac{1}{N_s} \sum_{i=1}^{N_s} \frac{1}{N_p} \sum_{j=1}^{N_p} (\hat{u}_{k,j,i} - u_k)\,\bigl(f(\hat{u}_{k,j,i}, m_i) - f(u_k, m_i)\bigr). \tag{4.17}
\]
With only Ns×5+Ns reservoir simulation runs required to update the decision variables for one iteration, the reduced sampling approach significantly reduces the number of reservoir simulations compared
to the reference case, particularly for small values of Ns.
Figures 4.11 and 4.12 display the evolution of the objective function during optimization with respect
to the number of iterations and reservoir simulation runs, respectively. Each plot includes two vertical
lines that divide the optimization process into three stages:
Stage 1 (left section of the plots in Figure 4.12) employs the reduced sampling strategy to approximate
both the gradient and the objective value, resulting in Ns ×5 + Ns reservoir simulation runs per iteration.
The objective value is approximated (labeled as “Approx”) since only randomly selected Ns realizations are
used to calculate the expected value of NPV. This approach offers the greatest computational reduction but
may introduce fluctuations that make it difficult to monitor the improvement of the true objective function.
To address this, the moving average (labeled as “MA”) with a window size of 10 of the approximated
objective function curve is used to provide a coarse measurement of the true objective value. Stage 1 is
terminated when the moving average curve does not show any increment for NPat iterations (where NPat =
10 in this example). The true objective (labeled as “True”) in Stage 1 is not known and is only visualized for
analysis and discussion purposes. Comparing the moving average curve with the true objective function
curve, the moving average provides a better indication of the trend in the true objective.
Stage 1 uses the moving average to monitor the improvement in the true objective function value.
However, as the optimization approaches convergence and the improvement per iteration diminishes, the
moving average becomes less accurate in reflecting the true objective function. In Stage 2 (middle section),
we continue with the reduced sampling strategy to approximate the gradient, but the objective function is
computed using the full ensemble size. This approach ensures that optimization (with the approximated
gradient) proceeds effectively. In this case, each iteration requires Ns × 5 + 100 reservoir simulation runs.
Stage 2 is concluded when the relative improvement in the true objective value (which is now observable)
falls below ϵ (where ϵ = 0.1%).
Stage 3 (rightmost section) follows the original StoSAG (reference case) approach, where all realizations
are utilized in computing both the gradient and the objective function. Each iteration in Stage 3 requires
Figure 4.11: The evolution of the objective function with respect to the number of iterations for the Brugge example.
600 (100 × 5 + 100) reservoir simulation runs. This stage is initiated near convergence to maximize the
improvement in the objective function as the solution is approached. Essentially, the reduced sampling
approach can be seen as a preconditioner or initialization step before executing the full ensemble-based
optimization.
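The per-iteration simulation budget of the three stages can be summarized with the short sketch below, which only tallies the counts stated in the text (it does not run any simulations); the ensemble size of 100 and Np = 5 perturbations follow the Brugge setup.

```python
# Per-iteration simulation budget of the three-stage schedule, following the
# counts stated in the text (ensemble size 100, N_p = 5 control perturbations).
def simulations_per_iteration(stage, n_s, n_full=100, n_perturb=5):
    if stage == 1:                        # reduced gradient + reduced objective
        return n_s * n_perturb + n_s
    if stage == 2:                        # reduced gradient + full objective
        return n_s * n_perturb + n_full
    return n_full * n_perturb + n_full    # Stage 3: original StoSAG reference

# e.g. with n_s = 5: Stage 1 -> 30, Stage 2 -> 125, Stage 3 -> 600 runs/iteration
print([simulations_per_iteration(s, n_s=5) for s in (1, 2, 3)])
```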
Three different sample sizes Ns = 1, 5, and 10 are utilized in this example. Table 4.1 summarizes
the number of simulations and the resulting NPVs for different stages and cases. Figure 4.12 illustrates
the evolution of the objective function value with respect to the number of simulations. Consistent with
previous examples, a smaller Ns value leads to greater deviation from the reference case approximation,
while also yielding more significant computational savings with the reduced sampling algorithm. For the
case of Ns = 1, the objective function value improved from 1.0371 billion to 1.6198 billion with only 876
simulation runs. However, to further increase the value from 1.6198 billion to the final achieved 1.6744
billion, an additional 14,865 simulation runs were necessary. This occurred because the case with Ns = 1
Figure 4.12: The evolution of the objective function with respect to the number of reservoir simulations for the Brugge example.
could not generate sufficiently accurate gradients in the later stages of optimization, resulting in a need for
a relatively large number of iterations in Stage 3, where gradients are computed using the full ensemble.
In general, once a sufficiently large Ns is employed, a clear reduction in computational effort can be
achieved. In this Brugge example, we observe that the case with Ns = 5 can deliver an overall computational reduction of 80.6% (5,010 simulations compared to 25,800) without sacrificing the NPV compared
to the reference case. Similarly, the case with Ns = 10 also demonstrates promising computational savings (75.4%). In summary, a smaller Ns is generally preferred during the initial stages of optimization due
to its ability to facilitate rapid improvement. However, in the later stages of optimization, increasing the
value of Ns may become necessary to enhance the accuracy of the gradient and the overall progress of the
optimization objective function.
                                   RS1       RS5       RS10      Ref
Stage 1   Achieved NPV             1.6198    1.6781    1.6588    /
          Number of simulations    876       2,895     2,750     /
Stage 2   Achieved NPV             1.6599    1.6802    1.6802    /
          Number of simulations    4,065     915       2,400     /
Stage 3   Achieved NPV             1.6744    1.6830    1.6815    /
          Number of simulations    7,800     1,200     1,200     /
Total     Achieved NPV             1.6744    1.6830    1.6815    1.6832
          Number of simulations    10,800    5,010     6,350     25,800
          Computational reduction  39.0%     80.6%     75.4%     /
Table 4.1: Brief summary of achieved NPV (in $ billion) and required simulations in the Brugge example.
4.4 Conclusion
In this chapter, we present an efficient approach to solving robust well control optimization problems using
a reduced sampling strategy. At each iteration of the robust optimization process, the method randomly
selects a small subset of the full ensemble of model realizations (which typically comprises a relatively large
set) to approximate the expected value of the objective function and its gradient. The randomness in the
sampling strategy is designed to enhance the computational efficiency of the algorithm while ensuring that,
over multiple optimization iterations, all samples contribute to the computation of the objective function
and its gradient. The reduction in sample size directly correlates with the number of full-scale simulations
required, which are often computationally expensive to execute.
Three numerical experiments are performed to evaluate the proposed method: the first example is
based on the widely-used Rosenbrock function minimization, the second example involves a 2D reservoir
model with 100 realizations adapted from the original SPE10 model (where the adjoint gradient is assumed
to be available), and the third example utilizes the Brugge 3D reservoir model with 100 realizations, where
adjoint-based gradients are not available, and a stochastic approach (StoSAG) is employed to numerically
approximate the required gradient.
Sensitivity analysis is conducted to assess the performance of the reduced sampling approach with
respect to the sampling size. Generally, the choice of sample size is problem-dependent and is typically a function of the dimensionality, level of uncertainty, and complexity of the objective function. At each iteration,
the reduced sampling approach utilizes a small number of randomly selected samples to approximate the
objective function and its gradient, effectively serving as a proxy model for both gradient estimation and
objective function evaluation.
Several examples, including a 3D case study based on the Brugge reservoir model, are provided to
demonstrate the effectiveness of the reduced sampling strategy in alleviating the computational burden of
robust optimization. Overall, reduced sampling proves to be an effective strategy for balancing accuracy
and computational efficiency, particularly in large-scale problems where the computational demands of
robust optimization can become prohibitive for field applications.
Chapter 5
Conclusions and Future Work
This dissertation focuses on developing efficient optimization approaches for control optimization in subsurface systems, with a particular emphasis on constructing machine learning-based surrogate models. It
discusses several practical considerations that enhance the performance and reliability of the optimization
results while significantly reducing the computational load.
We first explore a project involving a realistic large-scale reservoir with thousands of wells in Long
Beach, CA. Given the extensive number of wells, we focus on the localized and sparse inter-well connectivity pattern, which serves as a straightforward yet crucial engineering insight. To incorporate this
insight, we employ a sparsity-promoted strategy and feature engineering techniques to transform it into a
mathematical framework that can be seamlessly integrated into the training process. We emphasize that
embedding such insights is critical for introducing a level of interpretability that significantly enhances the
reliability of the original ’black-box’ model while also reducing the number of required training samples.
Constructing a reasonable and accurate surrogate model is only the beginning. When the surrogate
model is used for optimization, additional care must be taken to make surrogate-based modeling successful. One key concern is ensuring that the surrogate model maintains reasonable accuracy as
the optimizer iterates. Another is avoiding the unnecessary use of computational resources to obtain redundant or less valuable samples. To address these challenges, we proposed an active learning framework
designed to selectively and dynamically acquire new training samples. This approach allows the surrogate model to be updated adaptively, ensuring that it evolves in response to the optimization process and
remains accurate and efficient throughout.
Another major challenge in any subsurface system application is geological uncertainty. Due to the
reliance on limited and sparsely observed data for subsurface characterization, there is often a significant
degree of uncertainty. In control optimization scenarios, accounting for this uncertainty casts the problem as robust optimization. However, this added complexity dramatically increases the computational load, sometimes making
it impractical for real-world applications. In this dissertation, we propose a reduced sampling strategy
inspired by stochastic gradient methods from the computer science literature. This approach has proven
effective in reducing the computational burden without compromising the quality of the optimization
results. By efficiently managing uncertainty, our method enables more practical and scalable solutions for
subsurface system optimization.
In this dissertation, we explored various strategies aimed at developing efficient and practical optimization approaches for control optimization in subsurface systems, using machine learning surrogates.
Below, we outline some of our key findings:
• In our Long Beach Unit project, we were motivated by the need for a surrogate model that aligns with the principle that a producer's performance is primarily influenced by nearby wells. To achieve this, we designed frameworks that construct surrogate models while ensuring they provide an estimate of inter-well connectivity that can be validated against geological evidence.
• We also found that, despite the inherent complexity of flow behavior in porous media, a simple linear
model can effectively capture the key input-output relationships relevant to control optimization.
This finding offers a practical lesson: when the number of required training samples becomes too
demanding due to the complexity of the flow behavior, considering a simpler model that reduces
the need for extensive training data can still be an efficient and effective approach for supporting
decision-making in real-world applications.
• We observed that, as increasingly powerful and complex neural network and machine learning frameworks are developed in the computer science literature, there is a growing trend in the subsurface flow field to "borrow" these advanced models to build surrogate models with improved prediction accuracy. However, this approach often overlooks a critical aspect: the primary purpose of surrogate models in this context is their use within optimization frameworks, not prediction accuracy in isolation.
• In practice, we found that relying on an open-loop workflow—where proxies are developed first and
then applied to downstream tasks—may not always yield reliable results. A more robust approach
involves dynamically updating the surrogate models, ensuring they are tailored to the specific optimization target. This adaptive framework is versatile and can be directly applied to a wide range
of applications that involve computationally expensive simulators, enhancing both reliability and
efficiency in optimization processes.
• Geological uncertainty is crucial to consider because accounting for it enables probabilistic rather than deterministic predictions, which is vital in real-world applications. Decisions in operating subsurface systems are highly expensive, and a poor decision can result in significant profit loss. However, accounting for this uncertainty requires including multiple realizations in the optimization process, which adds complexity to the procedure.
• To mitigate the increased computational complexity caused by including multiple realizations, we
developed a reduced sampling strategy that approximates both the objective and gradient of robust optimization. This approach is simple to implement, requiring no additional expertise, yet it
significantly reduces computational demands while maintaining effectiveness.
An efficient surrogate-based optimization workflow can play a crucial role in real applications within subsurface flow systems. However, several aspects have not been thoroughly explored in this dissertation and
remain highly valuable for further investigation. Below, we highlight several areas we consider important
for future research:
• Our proposed active learning strategy involves adding new training samples along the optimization path to maintain adequate local surrogate accuracy. This approach can be seen as a modified version of the trust region method, utilizing gradient information to guide updates. However, as with any gradient-based optimization approach, a potential issue is getting stuck in local optima, which can lead to underestimating the achievable objective. Another important aspect to consider is incorporating new samples that carry higher predictive uncertainty, as these tend to be more informative; a minimal sketch of such an uncertainty-guided acquisition step is given after this list. Bayesian optimization and Bayesian neural networks [158] are suitable approaches for this purpose, as they provide a framework for attaching uncertainty estimates to predicted values.
• It is straightforward to integrate a robust optimization objective into our active learning workflow
and apply our proposed reduced sampling strategy to minimize computational load. We have implemented this framework in a geological carbon sequestration example and obtained preliminary
results. However, further sensitivity analysis and detailed investigation are required to fully evaluate
the framework’s performance. Consequently, these results are not included in this dissertation.
• In our Long Beach Unit (LBU) project, we used a mathematical graph to represent the inter-well connectivity pattern. A linear model with specific regularization constraints was employed to estimate
this connectivity. However, our approach focused primarily on static connectivity without seriously
accounting for temporal changes. Recently, graph neural networks (GNNs) [185] have emerged as
a promising method for handling spatial-temporal data. Given their potential, applying GNNs to
represent very large-scale fields, such as the one used in our study with a vast number of wells,
presents an intriguing opportunity for further research.
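As a pointer for the uncertainty-guided sampling idea raised in the first item above, the sketch below uses a Gaussian-process surrogate and an expected-improvement acquisition to favor candidates with a high predicted value or high predictive uncertainty. The objective function, candidate sampling scheme, and hyperparameters are assumptions made for illustration; they do not reflect results from this dissertation.

```python
import numpy as np
from scipy.stats import norm
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern

def expensive_objective(x):
    # Placeholder for a simulator-based objective (to be maximized).
    return float(np.exp(-10.0 * np.sum((x - 0.3) ** 2)))

rng = np.random.default_rng(3)
dim, n_init = 2, 5
X = rng.uniform(0, 1, size=(n_init, dim))
y = np.array([expensive_objective(x) for x in X])

gp = GaussianProcessRegressor(kernel=Matern(nu=2.5), normalize_y=True)

for it in range(20):
    gp.fit(X, y)
    # Score random candidates with expected improvement: a mean term plus an uncertainty bonus.
    candidates = rng.uniform(0, 1, size=(512, dim))
    mu, sigma = gp.predict(candidates, return_std=True)
    best = y.max()
    z = (mu - best) / np.maximum(sigma, 1e-9)
    ei = (mu - best) * norm.cdf(z) + sigma * norm.pdf(z)
    x_next = candidates[np.argmax(ei)]            # high-mean or high-uncertainty points win
    X = np.vstack([X, x_next])
    y = np.append(y, expensive_objective(x_next))

print("Best objective found:", round(y.max(), 4))
```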
Bibliography
[1] Sigurd I Aanonsen, Geir Nævdal, Dean S Oliver, Albert C Reynolds, and Brice Vallès. “The
ensemble Kalman filter in reservoir engineering–a review”. In: SPE Journal 14.03 (2009),
pp. 393–412.
[2] P Fernandes de Aguiar, B Bourguignon, MS Khots, DL Massart, and R Phan-Than-Luu.
“D-optimal designs”. In: Chemometrics and intelligent laboratory systems 30.2 (1995), pp. 199–210.
[3] Natalia M Alexandrov, John E Dennis Jr, Robert Michael Lewis, and Virginia Torczon. “A
trust-region framework for managing the use of approximation models in optimization”. In:
Structural optimization 15.1 (1998), pp. 16–23.
[4] Ahmed H Alhuthali, Akhil Datta-Gupta, Bevan Yuen, and Jerry P Fontanilla. “Field applications of
waterflood optimization via optimal rate control with smart wells”. In: SPE Reservoir Evaluation &
Engineering 13.03 (2010), pp. 406–422.
[5] Shun-ichi Amari. “Backpropagation and stochastic gradient descent method”. In: Neurocomputing
5.4-5 (1993), pp. 185–196.
[6] Linda C Ames. “Long Beach oil operations-a history”. In: (1987).
[7] Virgil L Anderson and Robert A McLean. Design of experiments: a realistic approach. CRC Press,
2018.
[8] Nooshin Asadi and Hamid Zilouei. “Optimization of organosolv pretreatment of rice straw for
enhanced biohydrogen production using Enterobacter aerogenes”. In: Bioresource technology 227
(2017), pp. 335–344.
[9] Solomon Asante-Okyere, Chuanbo Shen, Yao Yevenyo Ziggah, Mercy Moses Rulegeya, and
Xiangfeng Zhu. “Investigating the predictive performance of Gaussian process regression in
evaluating reservoir porosity and permeability”. In: Energies 11.12 (2018), p. 3261.
[10] O Badru and CS Kabir. “Well placement optimization in field development”. In: SPE Annual
Technical Conference and Exhibition? SPE. 2003, SPE–84191.
[11] Nathan Baker, Frank Alexander, Timo Bremer, Aric Hagberg, Yannis Kevrekidis, Habib Najm,
Manish Parashar, Abani Patra, James Sethian, Stefan Wild, et al. Workshop report on basic research
needs for scientific machine learning: Core technologies for artificial intelligence. Tech. rep. USDOE
Office of Science (SC), Washington, DC (United States), 2019.
[12] Maximilian Balandat, Brian Karrer, Daniel Jiang, Samuel Daulton, Ben Letham,
Andrew G Wilson, and Eytan Bakshy. “BoTorch: A framework for efficient Monte-Carlo Bayesian
optimization”. In: Advances in neural information processing systems 33 (2020), pp. 21524–21538.
[13] Enrico Barbier. “Geothermal energy technology and current status: an overview”. In: Renewable
and sustainable energy reviews 6.1-2 (2002), pp. 3–65.
[14] RP Batycky, Martin J Blunt, and Marco R Thiele. “A 3D field-scale streamline-based reservoir
simulator”. In: SPE Reservoir Engineering 12.04 (1997), pp. 246–254.
[15] Mohammed C Bellout, David E Ciaurri, Louis J Durlofsky, Bjarne Foss, and Jon Kleppe. “Joint
optimization of oil well placement and controls”. In: Computational Geosciences 16.4 (2012),
pp. 1061–1079.
[16] Aharon Ben-Tal, Laurent El Ghaoui, and Arkadi Nemirovski. Robust optimization. Vol. 28.
Princeton university press, 2009.
[17] Yoshua Bengio and Yves Grandvalet. “No unbiased estimator of the variance of k-fold
cross-validation”. In: Advances in Neural Information Processing Systems 16 (2003).
[18] Zyed Bouzarkouna, Didier Yu Ding, and Anne Auger. “Well placement optimization with the
covariance matrix adaptation evolution strategy and meta-models”. In: Computational Geosciences
16 (2012), pp. 75–92.
[19] Dirk Roelof Brouwer and J-D Jansen. “Dynamic optimization of waterflooding with smart wells
using optimal control theory”. In: SPE journal 9.04 (2004), pp. 391–402.
[20] Emmanuel J Candès, Justin Romberg, and Terence Tao. “Robust uncertainty principles: Exact
signal reconstruction from highly incomplete frequency information”. In: IEEE Transactions on
information theory 52.2 (2006), pp. 489–509.
[21] Marco A Cardoso and Louis J Durlofsky. “Linearized reduced-order models for subsurface flow
simulation”. In: Journal of Computational Physics 229.3 (2010), pp. 681–700.
[22] Marco A Cardoso, Louis J Durlofsky, and Pallav Sarma. “Development and application of
reduced-order modeling procedures for subsurface flow simulation”. In: International journal for
numerical methods in engineering 77.9 (2009), pp. 1322–1350.
[23] Giuseppe Carleo, Ignacio Cirac, Kyle Cranmer, Laurent Daudet, Maria Schuld, Naftali Tishby,
Leslie Vogt-Maranto, and Lenka Zdeborová. “Machine learning and the physical sciences”. In:
Reviews of Modern Physics 91.4 (2019), p. 045002.
[24] Michael Carter, Camille C Price, and Ghaith Rabadi. Operations research: a practical introduction.
Chapman and Hall/CRC, 2018.
[25] Rich Caruana, Steve Lawrence, and C Giles. “Overfitting in neural nets: Backpropagation,
conjugate gradient, and early stopping”. In: Advances in neural information processing systems 13
(2000).
[26] Diogo V Carvalho, Eduardo M Pereira, and Jaime S Cardoso. “Machine learning interpretability:
A survey on methods and metrics”. In: Electronics 8.8 (2019), p. 832.
[27] Bailian Chen and Albert C Reynolds. “Ensemble-based optimization of the
water-alternating-gas-injection process”. In: Spe Journal 21.03 (2016), pp. 0786–0798.
[28] Yan Chen and Dean S Oliver. “Ensemble-based closed-loop optimization applied to Brugge field”.
In: SPE Reservoir Evaluation & Engineering 13.01 (2010), pp. 56–71.
[29] Yan Chen, Dean S Oliver, and Dongxiao Zhang. “Efficient ensemble-based closed-loop production
optimization”. In: Spe Journal 14.04 (2009), pp. 634–645.
[30] Ching-Shui Cheng. Theory of factorial design. Chapman and Hall/CRC Boca Raton, FL, USA, 2016.
[31] Jean-Paul Chiles and Pierre Delfiner. Geostatistics: modeling spatial uncertainty. Vol. 713. John
Wiley & Sons, 2012.
[32] Vivek Chitale. “Borehole imaging in reservoir characterization: implementation of a standard
interpretation workflow for the clastic-and carbonate reservoirs”. In: SPWLA Annual Logging
Symposium. SPWLA. 2005, SPWLA–2005.
[33] CMG Reservoir Simulation Software. https://www.cmgl.ca/. Accessed: 2024-04-15.
[34] Andrew R Conn, Nicholas IM Gould, and Philippe L Toint. Trust region methods. SIAM, 2000.
[35] Matej Črepinšek, Shih-Hsi Liu, and Marjan Mernik. “Exploration and exploitation in evolutionary
algorithms: A survey”. In: ACM computing surveys (CSUR) 45.3 (2013), pp. 1–33.
[36] Salvatore Cuomo, Vincenzo Schiano Di Cola, Fabio Giampaolo, Gianluigi Rozza, Maziar Raissi,
and Francesco Piccialli. “Scientific machine learning through physics–informed neural networks:
Where we are and what’s next”. In: Journal of Scientific Computing 92.3 (2022), p. 88.
[37] Carlos A De Moura and Carlos S Kubrusly. “The courant–friedrichs–lewy (cfl) condition”. In:
AMC 10.12 (2013), pp. 45–90.
[38] Angela Dean and Daniel Voss. Design and analysis of experiments. Springer, 1999.
[39] Shuaiwei Ding, Hanqiao Jiang, Junjian Li, and Guoping Tang. “Optimization of well placement by
combination of a modified particle swarm optimization algorithm and quality map method”. In:
Computational Geosciences 18 (2014), pp. 747–762.
[40] Sy T Do and Albert C Reynolds. “Theoretical connections between optimization algorithms based
on an approximate gradient”. In: Computational Geosciences 17 (2013), pp. 959–973.
[41] David L Donoho. “Compressed sensing”. In: IEEE Transactions on information theory 52.4 (2006),
pp. 1289–1306.
[42] ECLIPSE Industry-Reference Reservoir Simulator. https://www.software.slb.com/products/eclipse.
Accessed: 2024-04-15.
[43] IM Edwards and Arthur Jutan. “Optimization and control using response surface methods”. In:
Computers & chemical engineering 21.4 (1997), pp. 441–453.
[44] Michael Elad. Sparse and redundant representations: from theory to applications in signal and image
processing. Springer Science & Business Media, 2010.
[45] Alexandre A Emerick, Eugênio Silva, Bruno Messer, Luciana F Almeida, Dilza Szwarcman,
Marco Aurélio C Pacheco, and Marley MBR Vellasco. “Well placement optimization using a
genetic algorithm with nonlinear constraints”. In: SPE Reservoir Simulation Conference? SPE. 2009,
SPE–118808.
[46] David Eriksson, Michael Pearce, Jacob Gardner, Ryan D Turner, and Matthias Poloczek. “Scalable
global optimization via local Bayesian optimization”. In: Advances in neural information
processing systems 32 (2019).
[47] Lawrence C Evans. Partial differential equations. Vol. 19. American Mathematical Society, 2022.
[48] Geir Evensen. “The ensemble Kalman filter: Theoretical formulation and practical
implementation”. In: Ocean dynamics 53 (2003), pp. 343–367.
[49] John R Fanchi. Principles of applied reservoir simulation. Elsevier, 2005.
[50] Rahul Rahul-Mark Fonseca, Bailian Chen, Jan Dirk Jansen, and Albert Reynolds. “A stochastic
simplex approximate gradient (StoSAG) for optimization under uncertainty”. In: International
Journal for Numerical Methods in Engineering 109.13 (2017), pp. 1756–1776.
[51] RM M Fonseca, SS S Kahrobaei, LJT JT van Gastel, Olwijn Leeuwenburgh, and JD D Jansen.
“Quantification of the impact of ensemble size on the quality of an ensemble gradient using
principles of hypothesis testing”. In: SPE Reservoir Simulation Conference? SPE. 2015,
D031S011R003.
[52] RM M Fonseca, Olwijn Leeuwenburgh, PMJ MJ Van den Hof, and JD D Jansen. “Improving the
ensemble-optimization method through covariance-matrix adaptation”. In: Spe Journal 20.01
(2015), pp. 155–168.
[53] Fahim Forouzanfar and AC Reynolds. “Joint optimization of number of wells, well locations and
controls using a gradient-based algorithm”. In: Chemical Engineering Research and Design 92.7
(2014), pp. 1315–1328.
[54] Peter I Frazier. “A tutorial on Bayesian optimization”. In: arXiv preprint arXiv:1807.02811 (2018).
[55] Tadayoshi Fushiki. “Estimation of prediction error by using K-fold cross-validation”. In: Statistics
and Computing 21 (2011), pp. 137–146.
[56] Virginie Gabrel, Cécile Murat, and Aurélie Thiele. “Recent advances in robust optimization: An
overview”. In: European journal of operational research 235.3 (2014), pp. 471–483.
[57] Mohammadreza Ghasemi, Yanfang Yang, Eduardo Gildin, Yalchin Efendiev, and Victor Calo. “Fast
multiscale reservoir simulations using pod-deim model reduction”. In: SPE Reservoir Simulation
Conference? SPE. 2015, D031S010R005.
[58] Leilani H Gilpin, David Bau, Ben Z Yuan, Ayesha Bajwa, Michael Specter, and Lalana Kagal.
“Explaining explanations: An overview of interpretability of machine learning”. In: 2018 IEEE 5th
International Conference on data science and advanced analytics (DSAA). IEEE. 2018, pp. 80–89.
[59] Gene H Golub, Per Christian Hansen, and Dianne P O’Leary. “Tikhonov regularization and total
least squares”. In: SIAM journal on matrix analysis and applications 21.1 (1999), pp. 185–194.
[60] Herbert Martins Gomes and Armando Miguel Awruch. “Comparison of response surface and
neural network with other methods for structural reliability analysis”. In: Structural safety 26.1
(2004), pp. 49–67.
[61] Antonio Gulli and Sujit Pal. Deep learning with Keras. Packt Publishing Ltd, 2017.
[62] Zhenyu Guo and Albert C Reynolds. “Robust life-cycle production optimization with a
support-vector-regression proxy”. In: Spe Journal 23.06 (2018), pp. 2409–2427.
[63] Isabelle Guyon and André Elisseeff. “An introduction to variable and feature selection”. In:
Journal of machine learning research 3.Mar (2003), pp. 1157–1182.
[64] David J Hand. “Principles of data mining”. In: Drug safety 30 (2007), pp. 621–622.
[65] Nikolaus Hansen and Andreas Ostermeier. “Completely derandomized self-adaptation in
evolution strategies”. In: Evolutionary Computation, 2001. Proceedings of the 2001 Congress on.
Vol. 1. IEEE. 2001, pp. 142–149.
[66] Jincong He and Louis J Durlofsky. “Constraint reduction procedures for reduced-order subsurface
flow models based on POD–TPWL”. In: International Journal for Numerical Methods in
Engineering 103.1 (2015), pp. 1–30.
[67] Jincong He, Jonsat Sætrom, and Louis J Durlofsky. “Enhanced linearized reduced-order models
for subsurface flow simulation”. In: Journal of Computational Physics 230.23 (2011), pp. 8313–8341.
[68] T Heijn, Renato Markovinovic, and J-D Jansen. “Generation of low-order reservoir models using
system-theoretical concepts”. In: SPE Journal 9.02 (2004), pp. 202–218.
[69] Curtis P Henderson. “The stratigraphy of the Wilmington oil field”. In: (1987).
[70] Arthur E Hoerl and Robert W Kennard. “Ridge regression: applications to nonorthogonal
problems”. In: Technometrics 12.1 (1970), pp. 69–82.
[71] Rafael Wanderley de Holanda, Eduardo Gildin, Jerry L Jensen, Larry W Lake, and C Shah Kabir.
“A state-of-the-art literature review on capacitance resistance models for reservoir
characterization and performance forecasting”. In: Energies 11.12 (2018), p. 3368.
[72] John H Holland. “Genetic algorithms”. In: Scientific american 267.1 (1992), pp. 66–73.
[73] Philip Holmes. Turbulence, coherent structures, dynamical systems and symmetry. Cambridge
university press, 2012.
[74] Reiner Horst, Panos M Pardalos, and Nguyen Van Thoai. Introduction to global optimization.
Springer Science & Business Media, 2000.
[75] Timothy Hospedales, Antreas Antoniou, Paul Micaelli, and Amos Storkey. “Meta-learning in
neural networks: A survey”. In: IEEE transactions on pattern analysis and machine intelligence 44.9
(2021), pp. 5149–5169.
[76] Xuri Huang, Laurent Meister, and Rick Workman. “Reservoir characterization by integration of
time-lapse seismic and production data”. In: SPE Annual Technical Conference and Exhibition?
SPE. 1997, SPE–38695.
[77] Thomas D Humphries, Ronald D Haynes, and Lesley A James. “Simultaneous and sequential
approaches to joint optimization of well placement and control”. In: Computational Geosciences 18
(2014), pp. 433–448.
[78] Obiajulu J Isebor, David Echeverría Ciaurri, and Louis J Durlofsky. “Generalized
field-development optimization with derivative-free procedures”. In: Spe Journal 19.05 (2014),
pp. 891–908.
[79] Obiajulu J Isebor, Louis J Durlofsky, and David Echeverría Ciaurri. “A derivative-free
methodology with local and global search for the constrained joint optimization of well locations
and controls”. In: Computational Geosciences 18 (2014), pp. 463–482.
[80] Obiajulu Joseph Isebor. Derivative-free optimization for generalized oil field development. Stanford
University, 2013.
[81] Behnam Jafarpour, Vivek K Goyal, Dennis B McLaughlin, and William T Freeman. “Compressed
history matching: exploiting transform-domain sparsity for regularization of nonlinear dynamic
data integration problems”. In: Mathematical Geosciences 42 (2010), pp. 1–27.
[82] Sharad Kumar Jain, A Das, and DK Srivastava. “Application of ANN for reservoir inflow
prediction and operation”. In: Journal of water resources planning and management 125.5 (1999),
pp. 263–271.
[83] Jan Dirk Jansen. “Adjoint-based optimization of multi-phase flow through porous media–a
review”. In: Computers & Fluids 46.1 (2011), pp. 40–51.
[84] Jan Dirk Jansen, Sander G Douma, Diederik R Brouwer, Paul M J Van den Hof, and
Arnold W Heemink. “Closed-loop reservoir management”. In: SPE Reservoir Simulation
Symposium. Society of Petroleum Engineers. 2009.
[85] Mansoureh Jesmani, Behnam Jafarpour, Mathias C Bellout, and Bjarne Foss. “A reduced random
sampling strategy for fast robust well placement optimization”. In: Journal of Petroleum Science
and Engineering 184 (2020), p. 106414.
[86] C Hsein Juang, Jie Zhang, Mengfen Shen, and Jinzheng Hu. “Probabilistic methods for unified
treatment of geotechnical and geological uncertainties in a geotechnical analysis”. In: Engineering
geology 249 (2019), pp. 148–161.
[87] Shyam Kumar Karna, Rajeshwar Sahai, et al. “An overview on Taguchi method”. In: International
journal of engineering and mathematical sciences 1.1 (2012), pp. 1–7.
[88] George Em Karniadakis, Ioannis G Kevrekidis, Lu Lu, Paris Perdikaris, Sifan Wang, and Liu Yang.
“Physics-informed machine learning”. In: Nature Reviews Physics 3.6 (2021), pp. 422–440.
[89] Sourabh Katoch, Sumit Singh Chauhan, and Vijay Kumar. “A review on genetic algorithm: past,
present, and future”. In: Multimedia tools and applications 80 (2021), pp. 8091–8126.
[90] James Kennedy and Russell Eberhart. “Particle swarm optimization”. In: Proceedings of ICNN’95 -
International Conference on Neural Networks. Vol. 4. IEEE. 1995, pp. 1942–1948.
[91] Mohammadreza Mohammad Khaninezhad, Behnam Jafarpour, and Lianlin Li. “Sparse geologic
dictionaries for subsurface flow model calibration: Part I. Inversion formulation”. In: Advances in
Water Resources 39 (2012), pp. 106–121.
[92] André I Khuri and Siuli Mukhopadhyay. “Response surface methodology”. In: Wiley
interdisciplinary reviews: Computational statistics 2.2 (2010), pp. 128–149.
[93] Diederik P Kingma and Jimmy Ba. “Adam: A method for stochastic optimization”. In: arXiv
preprint arXiv:1412.6980 (2014).
[94] Satoshi Kitayama, Masao Arakawa, and Koetsu Yamazaki. “Sequential approximate optimization
using radial basis function network for engineering optimization”. In: Optimization and
Engineering 12 (2011), pp. 535–557.
[95] Aaron Klein, Stefan Falkner, Simon Bartels, Philipp Hennig, and Frank Hutter. “Fast bayesian
optimization of machine learning hyperparameters on large datasets”. In: Artificial intelligence
and statistics. PMLR. 2017, pp. 528–536.
[96] Ksenia Konyushkova, Raphael Sznitman, and Pascal Fua. “Learning active learning from data”. In:
Advances in neural information processing systems 30 (2017).
[97] JFBM Kraaijevanger, PJP Egberts, and JRVHW Buurman. Optimal waterflood design using the
adjoint method. SPE Reservoir Simulation Symposium, 26-28 February, Houston, Texas, USA.
Tech. rep. SPE-105764-MS, 2007.
[98] Aditi Krishnapriyan, Amir Gholami, Shandian Zhe, Robert Kirby, and Michael W Mahoney.
“Characterizing possible failure modes in physics-informed neural networks”. In: Advances in
neural information processing systems 34 (2021), pp. 26548–26560.
[99] Alex Krizhevsky, Ilya Sutskever, and Geoffrey E Hinton. “Imagenet classification with deep
convolutional neural networks”. In: Advances in neural information processing systems 25 (2012).
[100] Larry Lake. Reservoir characterization. Elsevier, 2012.
[101] B Li and F Friedmann. “Novel multiple resolutions design of experiment/response surface
methodology for uncertainty analysis of reservoir simulation forecasts”. In: SPE Reservoir
Simulation Conference? SPE. 2005, SPE–92853.
[102] Lianlin Li and Behnam Jafarpour. “A sparse Bayesian framework for conditioning uncertain
geologic models to nonlinear flow measurements”. In: Advances in Water Resources 33.9 (2010),
pp. 1024–1042.
[103] Lianlin Li, Behnam Jafarpour, and M Reza Mohammad-Khaninezhad. “A simultaneous
perturbation stochastic approximation algorithm for coupled well placement and control
optimization under geologic uncertainty”. In: Computational Geosciences 17 (2013), pp. 167–188.
[104] Mu Li, Tong Zhang, Yuqiang Chen, and Alexander J Smola. “Efficient mini-batch training for
stochastic optimization”. In: Proceedings of the 20th ACM SIGKDD international conference on
Knowledge discovery and data mining. 2014, pp. 661–670.
[105] Knut-Andreas Lie et al. Matlab Reservoir Simulation Toolbox (MRST). http://www.mrst.no/.
Accessed: 2024-04-15.
[106] Pantelis Linardatos, Vasilis Papastefanopoulos, and Sotiris Kotsiantis. “Explainable ai: A review of
machine learning interpretability methods”. In: Entropy 23.1 (2020), p. 18.
[107] Marius Lindauer, Katharina Eggensperger, Matthias Feurer, André Biedenkapp, Difan Deng,
Carolin Benjamins, Tim Ruhkopf, René Sass, and Frank Hutter. “SMAC3: A versatile Bayesian
optimization package for hyperparameter optimization”. In: Journal of Machine Learning Research
23.54 (2022), pp. 1–9.
[108] Michael L Litvak, Lynda A Hutchins, Roger C Skinner, Bruce L Darlow, Robert C Wood, and
L John Kuest. “Prudhoe Bay E-field production optimization system based on integrated reservoir
and facility simulation”. In: SPE Annual Technical Conference and Exhibition? SPE. 2002,
SPE–77643.
[109] Rolf Johan Lorentzen, Aina M Berg, Geir Naevdal, and Erlend H Vefring. “A new approach for
dynamic optimization of waterflooding problems”. In: SPE Intelligent Energy International
Conference and Exhibition. SPE. 2006, SPE–99690.
[110] F Jerry Lucia, Charles Kerans, and James W Jennings Jr. “Carbonate reservoir characterization”.
In: Journal of petroleum technology 55.06 (2003), pp. 70–72.
[111] John L Lumley. “Coherent structures in turbulence”. In: Transition and turbulence. Elsevier, 1981,
pp. 215–242.
[112] Mohammadreza M. Khaninezhad and Behnam Jafarpour. “Prior model identification during
subsurface flow data integration with adaptive sparse representation techniques”. In:
Computational Geosciences 18 (2014), pp. 3–16.
[113] David JC MacKay et al. “Introduction to Gaussian processes”. In: NATO ASI series F computer and
systems sciences 168 (1998), pp. 133–166.
[114] Kiran Maharana, Surajit Mondal, and Bhushankumar Nemade. “A review: Data pre-processing
and data augmentation techniques”. In: Global Transitions Proceedings 3.1 (2022), pp. 91–99.
[115] Dale B McDonald, Walter J Grantham, Wayne L Tabor, and Michael J Murphy. “Global and local
optimization using radial basis function response surface models”. In: Applied Mathematical
Modelling 31.10 (2007), pp. 2095–2110.
[116] Bert Metz, Ogunlade Davidson, HC De Coninck, Manuela Loos, and Leo Meyer. IPCC special
report on carbon dioxide capture and storage. Cambridge: Cambridge University Press, 2005.
[117] Shahab Mohaghegh. “Virtual-intelligence applications in petroleum engineering: Part
1—Artificial neural networks”. In: Journal of Petroleum Technology 52.09 (2000), pp. 64–73.
[118] Marcin Molga and Czesław Smutnicki. “Test functions for optimization needs”. In: Test functions
for optimization needs 101 (2005), p. 48.
[119] Christopher Z Mooney. Monte carlo simulation. 116. Sage, 1997.
[120] Fadl Moukalled, Luca Mangani, Marwan Darwish, F Moukalled, L Mangani, and M Darwish. The
finite volume method. Springer, 2016.
[121] Raymond H Myers, Douglas C Montgomery, and Christine M Anderson-Cook. Response surface
methodology: process and product optimization using designed experiments. John Wiley & Sons,
2016.
[122] Cuthbert Shang Wui Ng, Ashkan Jahanbani Ghahfarokhi, and Menad Nait Amar. “Application of
nature-inspired algorithms and artificial neural network in waterflooding well control
optimization”. In: Journal of Petroleum Exploration and Production Technology 11.7 (2021),
pp. 3103–3127.
[123] Masoud Nikravesh. “Soft computing-based computational intelligent for reservoir
characterization”. In: Expert Systems with Applications 26.1 (2004), pp. 19–38.
[124] Dean S Oliver and Yuxin Chen. “Recent progress on reservoir history matching: a review”. In:
Computational Geosciences 15.1 (2011), pp. 185–221.
[125] Jerome E Onwunalu and LJ Durlofsky. “A new well-pattern-optimization procedure for
large-scale field development”. In: SPE journal 16.03 (2011), pp. 594–607.
[126] Stanley Osher, Martin Burger, Donald Goldfarb, Jinjun Xu, and Wotao Yin. “An iterative
regularization method for total variation-based image restoration”. In: Multiscale Modeling &
Simulation 4.2 (2005), pp. 460–489.
[127] Daniel Asante Otchere, Tarek Omar Arbi Ganat, Raoof Gholami, and Syahrir Ridha. “Application
of supervised machine learning paradigms in the prediction of petroleum reservoir properties:
Comparative analysis of ANN and SVM models”. In: Journal of Petroleum Science and Engineering
200 (2021), p. 108182.
[128] Yan Pan and Roland N Horne. “Improved methods for multivariate optimization of field
development scheduling and well placement design”. In: SPE Annual Technical Conference and
Exhibition? SPE. 1998, SPE–49055.
[129] Donald W Peaceman. Fundamentals of numerical reservoir simulation. Elsevier, 2000.
[130] Fabian Pedregosa, Gaël Varoquaux, Alexandre Gramfort, Vincent Michel, Bertrand Thirion,
Olivier Grisel, Mathieu Blondel, Peter Prettenhofer, Ron Weiss, Vincent Dubourg, et al.
“Scikit-learn: Machine learning in Python”. In: the Journal of machine Learning research 12 (2011),
pp. 2825–2830.
[131] John Pendrel. “Seismic inversion—The best tool for reservoir characterization”. In: CSEG Recorder
26.1 (2001), pp. 18–24.
[132] E Peters, RJ Arts, GK Brouwer, CR Geel, Stan Cullick, Rolf J Lorentzen, Yan Chen, KNB Dunlop,
Femke C Vossepoel, Rong Xu, et al. “Results of the Brugge benchmark study for flooding
optimization and history matching”. In: SPE Reservoir Evaluation & Engineering 13.03 (2010),
pp. 391–405.
[133] Michael Prince. “Does active learning work? A review of the research”. In: Journal of engineering
education 93.3 (2004), pp. 223–231.
[134] Friedrich Pukelsheim. Optimal design of experiments. SIAM, 2006.
[135] Maithra Raghu and Eric Schmidt. “A survey of deep learning for scientific discovery”. In: arXiv
preprint arXiv:2003.11755 (2020).
[136] Maziar Raissi, Paris Perdikaris, and George E Karniadakis. “Physics-informed neural networks: A
deep learning framework for solving forward and inverse problems involving nonlinear partial
differential equations”. In: Journal of Computational physics 378 (2019), pp. 686–707.
[137] Saman Razavi, Bryan A Tolson, and Donald H Burn. “Review of surrogate modeling in water
resources”. In: Water Resources Research 48.7 (2012).
[138] Jens Christian Refsgaard, Steen Christensen, Torben O Sonnenborg, Dorte Seifert,
Anker Lajer Højberg, and Lars Troldborg. “Review of strategies for handling geological
uncertainty in groundwater flow and transport modeling”. In: Advances in Water Resources 36
(2012), pp. 36–50.
[139] Mohammad Reza Rasaei and Muhammad Sahimi. “Upscaling and simulation of waterflooding in
heterogeneous reservoirs using wavelet transformations: application to the SPE-10 model”. In:
Transport in Porous Media 72 (2008), pp. 311–338.
[140] Ribana Roscher, Bastian Bohn, Marco F Duarte, and Jochen Garcke. “Explainable machine
learning for scientific insights and discoveries”. In: Ieee Access 8 (2020), pp. 42200–42216.
[141] Jose D Salas and Hyun-Suk Shin. “Uncertainty analysis of reservoir sedimentation”. In: Journal of
Hydraulic Engineering 125.4 (1999), pp. 339–350.
[142] Alireza Salmachi, Mohammad Sayyafzadeh, and Manouchehr Haghighi. “Infill well placement
optimization in coal bed methane reservoirs using genetic algorithm”. In: Fuel 111 (2013),
pp. 248–258.
[143] P Sarma and WH Chen. “Improved estimation of the stochastic gradient with quasi-Monte Carlo
methods”. In: ECMOR XIV-14th European Conference on the Mathematics of Oil Recovery. Vol. 2014.
1. European Association of Geoscientists & Engineers. 2014, pp. 1–25.
[144] Pallav Sarma, Wen H Chen, Louis J Durlofsky, and Khalid Aziz. “Production optimization with
adjoint models under nonlinear control-state path inequality constraints”. In: SPE Reservoir
Evaluation & Engineering 11.02 (2008), pp. 326–339.
[145] Morteza Sayarpour. Development and application of capacitance-resistive models to water/carbon
dioxide floods. The University of Texas at Austin, 2008.
[146] Morteza Sayarpour, C Shah Kabir, and Larry Wayne Lake. “Field applications of
capacitance-resistance models in waterfloods”. In: SPE reservoir evaluation & engineering 12.06
(2009), pp. 853–864.
[147] D Seifert and JL Jensen. “Using sequential indicator simulation as a tool in reservoir description:
Issues and uncertainties”. In: Mathematical Geology 31 (1999), pp. 527–550.
[148] Burr Settles. “Active learning literature survey”. In: (2009).
[149] Bobak Shahriari, Kevin Swersky, Ziyu Wang, Ryan P Adams, and Nando De Freitas. “Taking the
human out of the loop: A review of Bayesian optimization”. In: Proceedings of the IEEE 104.1
(2015), pp. 148–175.
[150] Hossein Shamshiri and Behnam Jafarpour. “Controlled CO2 injection into heterogeneous
geologic formations for improved solubility and residual trapping”. In: Water Resources Research
48.2 (2012).
[151] Yun-Wei Shang and Yu-Huang Qiu. “A note on the extended Rosenbrock function”. In:
Evolutionary Computation 14.1 (2006), pp. 119–126.
[152] James J Sheng. “Critical review of low-salinity waterflooding”. In: Journal of Petroleum Science
and Engineering 120 (2014), pp. 216–224.
[153] Anirbid Sircar, Kriti Yadav, Kamakshi Rayavarapu, Namrata Bist, and Hemangi Oza. “Application
of machine learning and artificial intelligence in oil and gas industry”. In: Petroleum Research 6.4
(2021), pp. 379–391.
[154] Per Arne Slotte and E Smørgrav. “Response surface methodology approach for history matching
and uncertainty assessment of reservoir simulation models”. In: SPE Europec featured at EAGE
Conference and Exhibition? SPE. 2008, SPE–113390.
[155] Jasper Snoek, Hugo Larochelle, and Ryan P Adams. “Practical bayesian optimization of machine
learning algorithms”. In: Advances in neural information processing systems 25 (2012).
[156] Pavel Šolín. Partial differential equations and the finite element method. John Wiley & Sons, 2005.
[157] James C Spall. “Implementation of the simultaneous perturbation algorithm for stochastic
optimization”. In: IEEE Transactions on aerospace and electronic systems 34.3 (1998), pp. 817–823.
[158] Jost Tobias Springenberg, Aaron Klein, Stefan Falkner, and Frank Hutter. “Bayesian optimization
with robust Bayesian neural networks”. In: Advances in neural information processing systems 29
(2016).
[159] Nitish Srivastava, Geoffrey Hinton, Alex Krizhevsky, Ilya Sutskever, and Ruslan Salakhutdinov.
“Dropout: a simple way to prevent neural networks from overfitting”. In: The journal of machine
learning research 15.1 (2014), pp. 1929–1958.
[160] R Mohan Srivastava. “An overview of stochastic methods for reservoir characterization”. In:
(1994).
[161] Andreas S Stordal, Slawomir P Szklarz, and Olwijn Leeuwenburgh. “A theoretical look at
ensemble-based optimization in reservoir management”. In: Mathematical Geosciences 48 (2016),
pp. 399–417.
[162] John C Strikwerda. Finite difference schemes and partial differential equations. SIAM, 2004.
[163] Bagus Sudaryanto and Yannis C Yortsos. “Optimization of fluid front dynamics in porous media
using rate control. I. Equal mobility fluids”. In: Physics of Fluids 12.7 (2000), pp. 1656–1670.
[164] Pejman Tahmasebi, Serveh Kamrava, Tao Bai, and Muhammad Sahimi. “Machine learning in
geo-and environmental sciences: From small to large scale”. In: Advances in Water Resources 142
(2020), p. 103619.
[165] Boxin Tang. “Orthogonal array-based Latin hypercubes”. In: Journal of the American statistical
association 88.424 (1993), pp. 1392–1397.
[166] Albert Tarantola. Inverse problem theory and methods for model parameter estimation. SIAM, 2005.
[167] Vahid Tavakoli. Geological core analysis: Application to reservoir characterization. Vol. 99. Springer,
2018.
[168] William Murray Telford, Lloyd P Geldart, and Robert E Sheriff. Applied geophysics. Cambridge
university press, 1990.
[169] Marco R Thiele, RP Batycky, and DH Fenwick. “Streamline simulation for modern
reservoir-engineering workflows”. In: Journal of Petroleum Technology 62.01 (2010), pp. 64–70.
[170] Robert Tibshirani. “Regression shrinkage and selection via the lasso”. In: Journal of the Royal
Statistical Society Series B: Statistical Methodology 58.1 (1996), pp. 267–288.
[171] Ilya Tsvankin, James Gaiser, Vladimir Grechka, Mirko Van Der Baan, and Leon Thomsen.
“Seismic anisotropy in exploration and reservoir characterization: An overview”. In: Geophysics
75.5 (2010), 75A15–75A29.
[172] GM Van Essen, MJ Zandvliet, PMJ Van den Hof, OH Bosgra, and Jan-Dirk Jansen. “Robust
waterflooding optimization of multiple geological scenarios”. In: Spe Journal 14.01 (2009),
pp. 202–210.
[173] Chunhong Wang, Gaoming Li, and Albert C Reynolds. “Production optimization in closed-loop
reservoir management”. In: SPE journal 14.03 (2009), pp. 506–523.
[174] G Gary Wang. “Adaptive response surface method using inherited Latin hypercube design
points”. In: J. Mech. Des. 125.2 (2003), pp. 210–220.
[175] G Gary Wang, Zuomin Dong, Peter Aitchison, et al. “Adaptive response surface method-a global
optimization scheme for approximation-based design problems”. In: Engineering Optimization
33.6 (2001), pp. 707–734.
[176] G Gary Wang and Songqing Shan. “Review of metamodeling techniques in support of
engineering design optimization”. In: International Design Engineering Technical Conferences and
Computers and Information in Engineering Conference. Vol. 4255. 2006, pp. 415–426.
[177] Honggang Wang, David Echeverría Ciaurri, Louis J Durlofsky, and Alberto Cominelli. “Optimal
well placement under uncertainty using a retrospective optimization framework”. In: Spe Journal
17.01 (2012), pp. 112–121.
[178] Lian Wang, ZhiPing Li, Caspar Daniel Adenutsi, Liang Zhang, FengPeng Lai, and KongJie Wang.
“A novel multi-objective optimization method for well control parameters based on PSO-LSSVR
proxy model and NSGA-II algorithm”. In: Journal of Petroleum Science and Engineering 196 (2021),
p. 107694.
[179] Pengju Wang, Michael Litvak, and Khalid Aziz. “Optimization of production operations in
petroleum fields”. In: SPE Annual Technical Conference and Exhibition? SPE. 2002, SPE–77658.
[180] Xiang Wang, Ronald D Haynes, Yanfeng He, and Qihong Feng. “Well control optimization using
derivative-free algorithms and a multiscale approach”. In: Computers & Chemical Engineering 123
(2019), pp. 12–33.
[181] Jacob K White et al. “A trajectory piecewise-linear approach to model order reduction of
nonlinear dynamical systems”. PhD thesis. Massachusetts Institute of Technology, 2003.
[182] Christopher Williams and Carl Rasmussen. “Gaussian processes for regression”. In: Advances in
neural information processing systems 8 (1995).
[183] Jia Wu, Xiu-Yun Chen, Hao Zhang, Li-Dong Xiong, Hang Lei, and Si-Hao Deng. “Hyperparameter
optimization for machine learning models based on Bayesian optimization”. In: Journal of
Electronic Science and Technology 17.1 (2019), pp. 26–40.
[184] Yue Wu, Yinpeng Chen, Lijuan Wang, Yuancheng Ye, Zicheng Liu, Yandong Guo, and Yun Fu.
“Large scale incremental learning”. In: Proceedings of the IEEE/CVF conference on computer vision
and pattern recognition. 2019, pp. 374–382.
[185] Zonghan Wu, Shirui Pan, Fengwen Chen, Guodong Long, Chengqi Zhang, and S Yu Philip. “A
comprehensive survey on graph neural networks”. In: IEEE transactions on neural networks and
learning systems 32.1 (2020), pp. 4–24.
[186] Cong Xiao, Olwijn Leeuwenburgh, Hai Xiang Lin, and Arnold Heemink. “Non-intrusive
subdomain POD-TPWL for reservoir history matching”. In: Computational Geosciences 23 (2019),
pp. 537–565.
[187] Liang Xue, Yuetian Liu, Yifei Xiong, Yanli Liu, Xuehui Cui, and Gang Lei. “A data-driven shale gas
production forecasting method based on the multi-objective random forest regression”. In:
Journal of Petroleum Science and Engineering 196 (2021), p. 107801.
[188] Kenny Q Ye. “Orthogonal column Latin hypercubes and their application in computer
experiments”. In: Journal of the American Statistical Association 93.444 (1998), pp. 1430–1439.
[189] Berkay Yeten, Louis J. Durlofsky, and Khalid Aziz. “Optimization of well placement under
time-dependent uncertainty”. In: SPE Journal 8.03 (2003), pp. 200–208.
[190] Tjalling J Ypma. “Historical development of the Newton–Raphson method”. In: SIAM review 37.4
(1995), pp. 531–551.
[191] Junjie Yu and Behnam Jafarpour. “Active learning for well control optimization with surrogate
models”. In: SPE Journal 27.05 (2022), pp. 2668–2688.
[192] Ya-xiang Yuan. “A review of trust region algorithms for optimization”. In: Iciam. Vol. 99. 1. 2000,
pp. 271–282.
[193] Friedemann Zenke, Ben Poole, and Surya Ganguli. “Continual learning through synaptic
intelligence”. In: International conference on machine learning. PMLR. 2017, pp. 3987–3995.
[194] JHHW Zhang, HW Huang, and KK Phoon. “Application of the Kriging-based response surface
method to the system reliability of soil slopes”. In: Journal of Geotechnical and Geoenvironmental
Engineering 139.4 (2013), pp. 651–655.
[195] Kai Zhang, Gaoming Li, Albert C Reynolds, Jun Yao, and Liming Zhang. “Optimal well placement
using an adjoint gradient”. In: Journal of Petroleum Science and Engineering 73.3-4 (2010),
pp. 220–226.
[196] Alice Zheng and Amanda Casari. Feature engineering for machine learning: principles and
techniques for data scientists. O'Reilly Media, Inc., 2018.
[197] Fangning Zheng, Atefeh Jahandideh, Birendra Jha, and Behnam Jafarpour. “Geologic CO2 storage
optimization under geomechanical risk using coupled-physics models”. In: International Journal
of Greenhouse Gas Control 110 (2021), p. 103385.
Abstract
Well control optimization plays a crucial role in closed-loop reservoir management, aiding in decision-making to enhance both the profitability and safety of reservoir operations. Achieving effective optimization typically requires the use of computationally expensive reservoir simulators to provide reliable predictions for specific control strategies. However, running a single forward simulation for a large-scale field can take days, or even weeks, making the process highly resource-intensive.
A popular approach to mitigate the prohibitive computational demands is to develop surrogate models, particularly those based on machine learning, which serve as fast predictive alternatives and can significantly accelerate the optimization process. However, several practical concerns must be addressed to ensure the reliability of surrogate-based optimization. These include ensuring that the model is interpretable and consistent with domain knowledge, maintaining the model's reliability throughout the optimization path, and efficiently managing geological uncertainties.
This dissertation introduces innovative and tailored optimization algorithms and workflows specifically designed for subsurface systems, with a particular emphasis on addressing the aforementioned challenges. In brief, a collaborative project with California Resources Corporation illustrates the development of a surrogate model that incorporates fundamental engineering insights. We also present an active learning framework aimed at significantly reducing computational load while improving control optimization performance in applications such as waterflooding and carbon sequestration. Additionally, we design a novel reduced sampling method to effectively account for geological uncertainties.