MACHINE-LEARNING APPROACHES FOR MODELING OF COMPLEX MATERIALS AND MEDIA

by

Serveh Kamrava

A Dissertation Presented to the
FACULTY OF THE USC GRADUATE SCHOOL
UNIVERSITY OF SOUTHERN CALIFORNIA
In Partial Fulfillment of the
Requirements for the Degree
DOCTOR OF PHILOSOPHY
(CHEMICAL ENGINEERING)

May 2021

Copyright 2021 Serveh Kamrava

Acknowledgments

I am eternally grateful to have had Professor Muhammad Sahimi as my Ph.D. advisor, who always supported and encouraged my research independence. His endless support, guidance, immense knowledge, logical way of thinking, positivity, and insight have been of great value to me. I am truly grateful for the support, guidance, and inspiration I have received from Professor Rajiv Kalia as my Ph.D. committee member. I would also like to thank my other Ph.D. committee members, Professor Aiichiro Nakano and Professor Felipe De Barros, who provided useful guidance, feedback, and support, which I truly appreciate.

I am grateful to my parents for their love and support. Finally, I would like to thank my husband for his tremendous, unconditional support, the one person who has always been by my side in this journey. You have been not only my husband but also my best friend. You have always reminded me what is important in life.

This work was supported by the American Chemical Society (ACS) and the National Science Foundation (NSF), and I am grateful to both agencies.

Table of Contents

Acknowledgments
List of Tables
List of Figures
Abstract
1 A review on applications of machine learning in fine-scale porous media
   1.1 Introduction
      1.1.1 Artificial intelligence, machine learning, and data analytics
      1.1.2 Boosting algorithms
   1.2 Convolutional Neural Networks
   1.3 Autoencoder
      1.3.1 Long Short-Term Memory
   1.4 Literature Review
   1.5 Summary
2 Enhancing images of nano-materials by a hybrid stochastic and deep learning algorithm
   2.1 Introduction
   2.2 Methodology
      2.2.1 Deep learning
      2.2.2 Stochastic modeling
   2.3 Hybrid stochastic deep-learning algorithm
   2.4 Results and discussion
      2.4.1 Comparison of the images' features
      2.4.2 Comparison of the images' morphology
   2.5 Summary
3 Linking morphology of porous media to their macroscopic permeability by deep learning
   3.1 Introduction
   3.2 Background on deep learning
   3.3 The methodology
      3.3.1 The Training Images and Generation of Their Realizations
      3.3.2 Layers of the Convolutional Neural Networks
         3.3.2.1 The convolutional and activation layers
         3.3.2.2 The pooling layer
         3.3.2.3 The fully connected layer
   3.4 Results
   3.5 Summary
4 Quantifying accuracy of stochastic methods of reconstructing complex materials by deep learning
   4.1 Introduction
   4.2 Microstructural descriptors
   4.3 The deep-learning methodology
   4.4 The reconstruction methods
      4.4.1 The CCSIM algorithm
      4.4.2 The SNESIM algorithm
      4.4.3 The SISIM algorithm
      4.4.4 The FILTERSIM algorithm
   4.5 Results and discussion
      4.5.1 A two-phase microstructure
      4.5.2 A multiphase material
      4.5.3 A continuous microstructure
   4.6 Summary
5 Phase transitions, percolation, fracture of materials, and deep learning
   5.1 Introduction
   5.2 Methodology
   5.3 Results
   5.4 Summary
6 Machine Learning Algorithm for Identifying and Predicting Mutations in Amyotrophic Lateral Sclerosis
   6.1 Introduction
   6.2 The data
   6.3 Methodology
      6.3.1 Performance Metrics
   6.4 Results
7 Physics- and image-based prediction of fluid flow and transport in complex porous membranes and materials by deep learning
   7.1 Introduction
   7.2 Materials and Methods
      7.2.1 The porous membrane
      7.2.2 Recurrent RU-Net structure
   7.3 Training Datasets and the Network
      7.3.1 The training data
      7.3.2 The recurrent RU-Net
   7.4 Results
   7.5 Summary
8 Simulating fluid flow in complex porous materials: Integrating the governing equations with deep-layered machines
   8.1 Introduction
   8.2 Methodology
   8.3 Results
   8.4 Summary
Bibliography

List of Tables

2.1 Summary of the parameters used in the HSDL algorithm.
2.2 Computed PSNR, SSIM, and NIQE for the bicubic, regular deep-learning, and HSDL algorithms.
2.3 Comparison between estimates of the effective porosity of the various images.
4.1 Comparison of the internal similarities I of the three algorithms.
4.2 Comparison between the external similarities E of the three algorithms and the I.
4.3 Comparison between the final scores s of the three algorithms.
4.4 Comparison between the final scores s for two algorithms.
4.5 The final scores for the two algorithms.
6.1 The confusion matrix.
6.2 Classifier evaluation on the training and test data.
7.1 The membrane's and fluid's properties.
7.2 Details of the recurrent RU-Net structure.

List of Figures

1.1 (a) The basic unit of an ANN. (b) A multilayer feedforward neural network.
1.2 (a) A typical architecture for convolutional neural networks, where a and b are the number of channels. (b) A zoomed-in view of the operations for a simple convolution. (c) The overall workflow for producing the feature maps.
1.3 Illustration of the architecture of an autoencoder with fully connected layers.
1.4 A simple RNN with one input, one hidden, and one output unit, and illustration of an unfolded RNN across time steps.
1.5 Illustration of one LSTM memory cell, where $\sigma$ represents a sigmoid function.
1.6 Permeability predictions for (a) the regular CNN, and (b) the physics-informed CNN (Wu et al. (2018)).
1.7 Permeability predictions for (a) the regular CNN, and (b) the physics-informed CNN in a dilated porous medium (Wu et al. (2018)).
1.8 Permeability predictions by the physics-informed CNN, against (a) the Kozeny-Carman equation, and (b) the numerical results obtained by the lattice-Boltzmann method (Wu et al. (2018)).
1.9 Comparison of the effective diffusivity $D_e$ predicted by (a) the CNN and the LBM, and (b) the CNN and the Bruggeman equation (Wu et al. (2019)).
1.10 (a) An image of a porous material, and (b) the same image after pre-processing.
(c) The effective diffusivity $D_e$ predicted by the CNN with the pre-processed input image, and (d) by the physics-informed CNN (Wu et al. (2019)).
1.11 Predicted capillary pressure $P_{cow}$ by the neural network versus the calculated values using approach 1 (see the text) and (a) the training dataset 1, and (b) the training dataset 2 (Liu (2017)).
1.12 Comparison of the error for the predicted effective thermal conductivity using analytical models and (a) the CNN, and (b) the SVR and GPR methods (Wei et al. (2018)).
1.13 Comparison between real samples and their realizations generated by the GANs (Liu et al. (2019b)).
1.14 The input training images and their realizations generated by the GAN (Mosser et al. (2017)).
1.15 (a) The discriminator architecture utilized in the CGAN. (b) Two realizations generated by the CGAN (Feng et al. (2019)).
1.16 Flow visualization for the predicted concentration, velocity field, and pressure, and their comparison with the data (Raissi et al. (2019c)).
1.17 Predicted versus exact values for (a) the lift, and (b) the drag force (Raissi et al. (2019c)).
2.1 Schematic illustration of the CCSIM algorithm.
2.2 Illustration of the performance of the proposed stochastic method for generating realizations with different structures. The panel in (a) represents the input image, while the three panels in (b) show three realizations.
2.3 Schematic illustration of the HSDL algorithm.
2.4 Flipping (a) the original image by (b) left-right; (c) up-down, and (d) up-down and left-right rotation. Resizing (e) the original image to (f), (g), and (h) larger scales. Noise in (i) the original image filtered by (j), (k), and (l) various Gaussian noises. Rotating (m) the original image at (n), (o), and (p) 45, 90, and 225 degrees. Cropping of (q) the original image at (r), (s), and (t) various locations.
2.5 The loss function (MSE) for the training (upper curve) and validation (lower curve) phases.
2.6 Comparison of (a) the reference image and (b) the low-resolution input with the image obtained by (c) regular deep learning; (d) the bicubic interpolation, and (e) the proposed HSDL algorithm. For a better comparison, a zoomed-in portion is also shown.
2.7 Frequency distributions (histograms) for the low-resolution (LR) input, high-resolution (HR) reference, bicubic interpolation, and HSDL images.
2.8 Comparison of the computed multi-point connectivity function $p(r)$. Solid black and gray, dashed gray, and dotted gray represent, respectively, the results for the reference high-resolution, HSDL, bicubic interpolation, and original low-resolution images. The results for the original high-resolution image and the enhanced images produced by the HSDL algorithm are practically identical and difficult to separate.
2.9 The computed auto-correlation functions. The notation is the same as in Fig. 2.8.
3.1 Illustration of some of the 3-D images created by the Boolean method.
3.2 Schematic illustration of the CCSIM method, along with various overlap regions OL and a candidate pattern that is selected based on the similarity of the OLs and the digitized image DI.
3.3 (a) The original large 3D image; (b-d) three stochastic models generated using the image in (a).
3.4 Schematic of a convolutional neural network.
3.5 Illustration of some of the extracted feature maps. (a) The original input image as the training data; (b) 96 activated feature maps after the first convolutional layer, and (c) four zoomed-in random feature maps from (b).
3.6 Illustration of some extracted feature maps in (a) the convolutional layer deep within the network (conv5 layer) and (b) four zoomed-in random maps from (a).
3.7 Schematic representation of computing the permeability and generating its dataset for the 3D models.
3.8 Transformed distribution of the permeabilities for all the porous media.
3.9 Loss function for the training data and validation, shown in red and black, respectively.
3.10 Estimated permeabilities versus actual (target) values for the training (left) and testing of the network (right).
3.11 Estimated permeabilities versus actual (target) values for a sample of Fontainebleau sandstone.
4.1 Schematic representation of an autoencoder deep-learning algorithm.
4.2 Comparison between the original digital image I of a material with a binary microstructure and the realizations generated by three stochastic reconstruction algorithms.
4.3 Comparison between (top) the computed chord-length density function and (bottom) the multiple-point connectivity function $p(h)$ for the digital image shown in Fig. 4.2 and its realizations generated by two reconstruction methods: [(a) and (c)] the CCSIM and [(b) and (d)] the SISIM. The black curves are the computed results for the I, while the colored areas indicate the uncertainty space for the realizations generated by the CCSIM and SISIM.
4.4 Uncertainty space representation of the realizations of the digital image I of Fig. 4.2, after they were processed by the deep-learning algorithm. The I is shown as a black circle at the center. The others are for the realizations generated by the CCSIM (blue, those located around I), SNESIM (red, those located at the bottom of the right image), and SISIM (green, those located on the left side of the right image) algorithms.
4.5 Comparison between the original digital image I of a multiphase material and the realizations generated by the two stochastic reconstruction algorithms.
4.6 Comparison between (top) the computed chord-length density function, and (bottom) the multiple-point connectivity function $p(h)$ for the image shown in Fig.
4.5 and its realizations generated by two reconstruction methods: [(a) and (c)] the CCSIM and [(b) and (d)] the SNESIM. The black curves are the computed results for the I, while the colored areas indicate the uncertainty space for the realizations generated by the CCSIM and SNESIM.
4.7 Uncertainty space representation of the realizations of the digital image of Fig. 4.5 after they were processed by the deep-learning algorithm. The I is shown as a black circle at the center. The others are for the realizations generated by the CCSIM (blue, those located to the right of the middle image) and SNESIM (red, those located to the left of the middle image) algorithms.
4.8 Comparison between the original digital image I of a material with a continuous microstructure and the realizations generated by two stochastic reconstruction algorithms.
4.9 Uncertainty space representation of the realizations of the digital image I of Fig. 4.8 after they were processed by the deep-learning algorithm. The I is shown as a black circle at the center. The others are for the realizations generated by the CCSIM (blue, those located on the lower-left side of the right image) and FILTERSIM (red, those located on the upper-right side of the right image) algorithms.
5.1 Comparison of the computed percolation probability $P(p)$ in the square lattice with the DNN predictions.
5.2 Comparison of the computed percolation probability $P(p)$ in the simple cubic lattice with the DNN predictions.
5.3 Comparison of the computed bulk modulus $K$ of a simple cubic lattice with the DNN predictions.
5.4 Comparison of the computed ratio of the bulk $K$ and shear moduli of a simple-cubic lattice with the DNN predictions. The force constants are also indicated.
6.1 The effect of the non-ALS data size on the percent significant.
6.2 Parallel coordinates plot.
6.3 Saliency maps (heat maps) for the 24 variables in the five datasets, used for testing the trained algorithm.
7.1 (a) Schematic of a convolutional neural network, and (b) a recurrent neural network.
7.2 Schematic of an autoencoder network. The convolution block contains the convolution layer, batch normalization, the ReLU activation function, and the pooling layer. The same settings are used for the decoder (right). The encoder and decoder blocks are represented by E and D, respectively. The subscripts indicate the corresponding layers.
7.3 Schematic of (a) U-Net and (b) RU-Net. The residual blocks (lower gray blocks) represent the convolution layer, batch normalization, and the ReLU activation function, such that the input of the first convolution layer is added to the final results produced by the batch normalization.
7.4 Schematic of the proposed deep recurrent residual U-Net (recurrent RU-Net). Here, $x$ refers to the morphologies of the membrane provided by its 2D images, and $y$ denotes the output (the pressure $p$ and the fluid velocities $v_x$ and $v_y$) for that morphology at four distinct times.
7.5 (a) Computational grid based on which (b) the fluid velocity field/streamlines, and (c) the pressure field are computed.
7.6 Comparison between the actual numerical values of the pressure $p$ and those of the proposed ML method, $\hat{p}$, at four distinct times. The error produced by the ML relative to the ground-truth data is also shown as $(\hat{p} - p)$. For better illustration and clarity, the data are all normalized for the pressure and velocity profiles.
7.7 Same as in Figure 7.6, but for the fluid velocity.
7.8 Comparison of the accuracy of the ML algorithm, as measured by $R^2$ for the training and testing data, at many epochs for predicting (a) and (c) the pressure, and (b) and (d) the fluid velocity.
7.9 Comparison between the actual distribution of the pressure and those predicted by the proposed ML method for a random point at four distinct times: (a) $t_1$; (b) $t_2$; (c) $t_3$, and (d) $t_4$.
7.10 Same as in Figure 7.9, but for the fluid velocity.
7.11 Comparison between the actual ensemble-averaged profiles of the pressure at four distinct times and the predictions.
7.12 Same as in Figure 7.11, but for the fluid velocity.
8.1 Schematic of the proposed PIRED network. $E_i$ and $D_i$ indicate the encoder and decoder blocks; $\ell_2$ is the cost function, $x_i$ is the input, and the pressure $P_j$ and fluid velocity $|v|_j$ are the output.
8.2 Comparison of the RMSE for the training (a) and test data (b) for the PIRED network.
8.3 Comparison of the RMSE for the training (a) and test data (b) for the DDML network.
8.4 Effect of the data size on (a) the $R^2$ score and (b) the RMSE for the two developed networks, namely, PIRED and DDML.
8.5 Comparison of the predicted pressure $\hat{P}$ with the numerically calculated $P$ at four (dimensionless) times.
8.6 Comparison of the predicted fluid velocity $|\hat{v}|$ with the numerically calculated values at four (dimensionless) times.
8.7 Comparison of the predicted pressures and fluid velocities with the numerical simulations in a randomly selected 2D image along a line perpendicular to the macroscopic direction of flow.
8.8 Comparison of the actual and predicted permeabilities $K$ of 300 2D images of the membrane, and 100 images of the sandstone. $K$ is normalized according to $(K - K_{\min})/(K_{\max} - K_{\min})$.
8.9 Comparison of the actual ensemble-averaged fluid velocities $v$ (top) and those predicted by the PIRED (bottom), $\hat{v}$, in the polymeric network at four (dimensionless) times.
8.10 Comparison of the actual ensemble-averaged pressures $P$ (top) and those predicted by the PIRED (bottom), $\hat{P}$, in the polymeric network at four (dimensionless) times.
8.11 (a) and (b) A 2D cut from the original 3D image of the polymeric membrane and sandstone, respectively.
Black and white represent, respectively, the solid matrix and the pores.
8.12 Comparison of the predicted pressure $\hat{P}$ with the numerically calculated $P$ at four (dimensionless) times in a randomly selected 2D cut of the sandstone.
8.13 Comparison of the predicted fluid velocities $\hat{v}$ with the numerically calculated values $v$ at four times.

Abstract

In recent years, significant breakthroughs have been made in exploring big data, recognizing complex patterns, and predicting intricate variables. One efficient way of analyzing big data, recognizing complex patterns, and extracting trends is through machine-learning (ML) algorithms. The field of porous media has also witnessed much progress, and recent advances in developing ML techniques have benefited various problems in porous media across disparate scales. Thus, it is becoming increasingly clear that it is imperative to adopt advanced ML methods for problems in porous media, because they enable researchers to solve many difficult problems. At the same time, one can use the already existing extensive knowledge of porous media to endow ML algorithms with physical insight and develop novel physics-guided methods.

First, a comprehensive review of the basic concepts of ML, and of the advanced methods known as deep-learning algorithms, is provided in Chapter 1. Then, the applications of such methods to various problems in porous media are reviewed and critiqued. In this chapter, a variety of problems related to porous media, from fine- to large-scale systems, are reviewed carefully, with the emphasis placed on how ML can help solve, or facilitate the solution of, long-standing problems.

Accounting for the morphology of nanoscale materials, which represent highly heterogeneous porous media, is a difficult problem. Although two- or three-dimensional images of such materials may be obtained and analyzed, they either do not capture the nanoscale features of the porous media, or they are too small to be an accurate representative of the media, or both. Increasing the resolution of such images is also costly. While high-resolution images may be used to train a deep-learning network in order to increase the quality of low-resolution images, an important obstacle is the lack of a large number of images for the training, as the accuracy of the network's predictions depends on the extent of the training data. Generating a large number of high-resolution images by experimental means is, however, very time-consuming and costly, hence limiting the application of deep-learning algorithms to such an important class of problems. To address the issue, we propose a novel hybrid algorithm in Chapter 2, by which a stochastic reconstruction method is used to generate a large number of plausible images of a nanoscale material, using very few input images and at very low cost, and a deep-learning convolutional network is then trained by the stochastic realizations. We refer to the method as the hybrid stochastic deep-learning (HSDL) algorithm. The results indicate promising improvement in the quality of the images, the accuracy of which is confirmed by visual, as well as quantitative, comparison of several of their statistical properties. The results are also compared with those obtained by the regular deep-learning algorithm without an enriched and large training dataset, as well as with those generated by bicubic interpolation.
Flow, transport, mechanical, and fracture properties of porous media depend on their morphology, and are usually estimated by experimental and/or computational methods. The precision of the computational approaches depends on the accuracy of the model that represents the morphology. If high accuracy is required, the computations, and even the experiments, can be quite time-consuming. At the same time, linking the morphology directly to the permeability, as well as to other important flow and transport properties, has been a long-standing problem. In Chapter 3, we develop a new network that utilizes a deep-learning (DL) algorithm to link the morphology of porous media to their permeability. The input data include three-dimensional images of the porous material, hundreds of their stochastic realizations generated by a reconstruction method, and synthetic unconsolidated porous media produced by a Boolean method. To develop the network, we first extract the important features of the images using a DL algorithm, and then feed them to an ANN to estimate the permeabilities. We demonstrate that the network is successfully trained, such that it can develop accurate correlations between the morphology of porous media and their effective permeability. The high accuracy of the network is demonstrated by its predictions for the permeability of a variety of porous media.

Time and cost are two main hurdles to acquiring a large number of digital images I of the microstructure of materials. Thus, the use of stochastic methods for producing plausible realizations of materials' morphology, based on one or very few images, has become an increasingly common practice in their modeling. The accuracy of the realizations is often evaluated using two-point microstructural descriptors, or physics-based modeling of certain phenomena in the materials, such as transport processes or fluid flow. In many cases, however, two-point correlation functions do not provide an accurate evaluation of the realizations, as they are usually unable to distinguish between high- and low-quality reconstructed models. Calculating the flow and transport properties of the realizations is an accurate way of checking their quality, but it is computationally expensive. In Chapter 4, a method based on machine learning is proposed for evaluating stochastic approaches for the reconstruction of materials, which is applicable to any such method. The method reduces the dimensionality of the realizations using an unsupervised deep-learning algorithm, by compressing the images and realizations of materials. Two criteria for evaluating the accuracy of a reconstruction algorithm are then introduced. One, referred to as the internal uncertainty space, is based on the recognition that, for a reconstruction method to be effective, the differences between the realizations that it produces must be reasonably wide, so that they faithfully represent all the possible spatial variations in the materials' microstructure. The second criterion recognizes that the realizations must be close to the original image I and, thus, it quantifies the similarity based on an external uncertainty space. Finally, the ratio of the two uncertainty indices associated with the two criteria is considered as the final score of the accuracy of a stochastic algorithm, which provides a quantitative basis for comparing various realizations and the approaches that produce them.
The proposed method is tested with images of three types of heterogeneous materials in order to evaluate four stochastic reconstruction algorithms.

Percolation and fracture propagation in disordered solids represent two important problems in science and engineering that are characterized by phase transitions: loss of macroscopic connectivity at the percolation threshold pc, and formation of a macroscopic fracture network at the incipient fracture point (IFP). Percolation also represents the fracture problem in the limit of very strong disorder. An important unsolved problem is the accurate prediction of the physical properties of systems undergoing such transitions, given limited data far from the transition point. There is currently no theoretical method that can use limited data for a region far from a transition point pc or the IFP and predict the physical properties all the way to that point, including its location. In Chapter 5, a deep neural network (DNN) is used for predicting such properties of two- and three-dimensional systems, in particular their percolation probability, the threshold pc, the elastic moduli, and the universal Poisson ratio at pc. All the predictions are in excellent agreement with the data. In particular, the DNN predicts pc correctly, even though the training data were for the state of the systems far from pc. This opens up the possibility of using the DNN for predicting physical properties of many types of disordered materials that undergo phase transformation, for which limited data are available only far from the transition point.

Next, a machine-learning (ML) method is proposed to classify amyotrophic lateral sclerosis (ALS) and non-ALS variants based on 24 variables in five different datasets. This represents a highly imbalanced classification problem, as a large majority of the data represent non-ALS variants. As such, classifying the data is a difficult problem. In Chapter 6, the proposed ML method classifies the five datasets with very high accuracy. In particular, it predicts the ALS variants with 100 percent accuracy, while its accuracy for the non-ALS variants ranges from 92.8 to 98 percent. The trained classifier also identifies the nine most influential mutation assessors that help distinguish the two classes from each other. They are the FATHMM score, PROVEAN score, Vest3 score, CADD phred, DANN score, meta-SVM score, phyloP7way vertebrate, metaLR, and REVEL. Thus, they may be used in future studies in order to reduce the time and cost of collecting data and carrying out experimental tests, as well as in studies with more focus on the recognized assessors.

Although the morphology of porous membranes is the key factor in determining their flow, transport, and separation properties, a general relation between the morphology and the physical properties has been difficult to identify. One promising approach to developing such a relation is through the application of a machine-learning (ML) algorithm to the problem. Over the last decade, significant progress in the development of ML approaches has led to many breakthroughs in various fields of science and engineering, but their application to porous media has been very limited. In Chapter 7, a deep network is developed for predicting the flow properties of porous membranes based on their morphology.
The predicted properties include the spatial distributions of the fluid pressure and velocity throughout the entire membrane, provided that the deep network is properly trained by using high-resolution images of the membranes and the pressure and velocity distributions in their pore space at certain points in time. The network includes a residual U-Net for developing a mapping between the input and output images, as well as a recurrent network for identifying physical correlations between the output data at various times. The results demonstrate that the deep network provides highly accurate predictions for the properties of interest. Thus, such a network may be used for predicting flow and transport properties of many other types of porous materials, as well as for designing membranes for a specific application.

Finally, a novel physics-informed ML algorithm is introduced in Chapter 8 for studying fluid flow in fine-scale porous media. The network embeds the Navier-Stokes equations in its learning process. Because of the complexity of such systems, high-resolution images showing the morphology of the systems are used as the input to the network, while the outputs are the velocity and pressure data obtained by solving the Navier-Stokes equations over a time interval. ML algorithms are often criticized for being a black box; here, the governing equations embedded in the proposed network decrease the chaos and blindness within the network. Since the output data form a sequence, and both the input and output data are in the form of images, the proposed network is a physics-informed recurrent encoder-decoder (PIRED). The developed PIRED network allows using less data for the training process compared to a data-driven network, while providing highly accurate predictions.

Chapter 1

A review on applications of machine learning in fine-scale porous media

Portions of this chapter were published in the Review Article: Tahmasebi, P., Kamrava, S., Bai, T., Sahimi, M.: Machine learning in geo- and environmental sciences: From small to large scale. Advances in Water Resources 142, 103619 (2020).

1.1 Introduction

The world is facing momentous challenges regarding climate change, natural hazards, water resources, and energy consumption. Facing such challenges and addressing them requires the development of solutions to some very difficult problems. Among such problems are those related to the characterization and modeling of porous media, and to understanding fluid flow, transport, reaction, adsorption, and deformation in them at small scales. Although tremendous progress has been made, as the so-called big data needed for the task have been available historically and are being produced more than ever (Faghmous and Kumar (2014); Gupta and Nearing (2014); Mohaghegh (2018, 2017); Monteleoni et al. (2013); Sahimi (2011b); Tahmasebi et al. (2018); Vandal et al. (2017)), much remains to be done. Technological advances in data gathering and computational power, the demand for high-resolution and accurate models, and cloud systems have made it possible to leverage big data and complex computations. Some such information (e.g., remote-sensing images) is, for example, collected daily, which results in the production of a large volume of data. Unlike in other fields, however, most of the data in porous media are often available publicly and, thus, by accessing them one can develop more efficient techniques.
Traditionally, computational approaches and experiments have been used to characterize and model small-scale porous media, ranging from membranes to adsorbents, catalysts, and core-scale porous materials. Such methods can, however, be quite expensive and time-demanding, and their use at larger scales pose signicant diculties, or when the number of samples to be studied is very large. For example, extensive experiments are conducted in very long work ows in order to characterize and describe the heterogeneity of porous materials and compute their important properties, such as permeability, electrical conductivity, and sorption capacity. Furthermore, carrying out such experiments requires extensive and often expensive equipment and instrumentation and expertise, as well as an environment that yields reliable results. On the other hand, computational methods have also made substantial advances and they now can produce, with acceptable error, data that are comparable with the experimen- tal results. Moreover, compared to the experiments, computational methods provide a more controlled environment and produce accurate results over a smaller time scale, and often more economically. But these methods can still be expensive and, more importantly, the acquired experience from the previous calculations cannot easily be used for future computations, unless the problems in the past and future are closely linked. Therefore, ideally, a combination of experimental and computational modeling may be the most ecient way of characterization of porous materials at a small scale. Both experimental studies, particularly for large-scale problems, and computational methods require a deep understanding of the physics of the sys- tem, and the eect of various variables on the process of interest, if one is to make accurate predictions. Such methods may not be able to address accurate predictions for multiphysics problems if the necessary work ows have not been developed. For example, it is currently very dicult to study by experiments the eect of morphology, uid, temperature, and mineralogy on the deformation of porous media and materials, due to the necessity of calibrating vari- ous parameters. In a similar manner, coupling the physics ow, reaction, and deformation in a consistent and seamless computational framework remains very challenging. One important ap- proach to analyzing big data and using them in the modeling of porous media-related problems is based on the use of articial intelligence (AI). Large datasets are almost always necessary for proper evaluation, comparing ML algorithms, and assessing their performance. The need for big datasets in the ML contexts stems from the fact that the analytical examination (and comparison) of the most exible ML algorithms can range from very dicult to nearly impos- 2 sible. Thus, big data is not only necessary for developing a successful application of ML, but is also critical in verifying which ML performs better. Unlike the standard statistical methods that often do not perform well in dealing with big data, the ML methods are more eective when they use big data. Signicant progress has recently been made in the use of AI, machine learning (ML), and more recently the so-called deep learning (DL) for modeling of small-scale porous media. Brie y, DL is a subset of ML, and ML itself is a subclass of AI. AI is dened for most of those algorithms that try to solve a problem by integrating an algorithm with in- telligence. 
Thus, they can either perform work better than humans or they can intelligently solve a problem, while ML refers to those methods that can use the computer power to learn a specic job and make decisions. The DL algorithms, on the other hand, are developed to mimic the pattern recognition of the human brain. Moreover, due to the aforementioned is- sues, the ML techniques have emerged as promising alternatives for dealing with multi-physics processes. In particular, discovering latent patterns, extracting important features, and iden- tifying the connections between various variables when one deals with a large amount of data is not straightforward, if one relies on the more traditional approaches, whereas the use of the ML approaches exhibits much potential for helping one to make more informed and accurate decisions. Porous media problems, from those at the smallest scale { the nanoscale { to the largest at eld scales, are often endowed with big data that are spatially distributed, and can dynamically vary as well, which add another level of complexity (Mohaghegh (2018); Sahimi (2011b); Tahmasebi et al. (2018)). The growing availability of big data oers fertile grounds for the use of various ML and data-mining techniques (Kim (2016); Mohaghegh (2018)). Several key attributes distinguish the data in porous media from the common data in other elds. For example, unlike the regular image-recognition eld in computer science, the data in porous media are based on physical laws. As such, the ML techniques cannot sometimes be applied to such data directly, implying that such techniques should be reformulated and tailored to the existing data. All such issues, along with the explosion of data-sensing tools, as well as the continuing development of advanced computational algorithms, have provided the impetus for developing ML and automated methods that can address the aforementioned issues. The results obtained so far by the application of the ML in engineering and science have indicated the discovery of new physics and patterns that could not be discerned before or required such a signicant amount of time and resources as to render them practically impossible to use. Such 3 abilities of the ML techniques have aided the researchers to uncover the potential of big data and enhance our understanding of complex phenomena related to porous media, from small to the largest scale. In this chapter we have provided a review of the state-of-the-art methods in the ML methods. 1.1.1 Articial intelligence, machine learning, and data analytics Articial neural networks (ANNs) are computing systems that imitate the working mecha- nism of human brains. They can complete their tasks without being explicitly programmed to follow some specic rules (Chen et al. (2019b)). Generally speaking, the ANNs consist of parallelly operated elements named neurons that resemble the basic unit of the nervous sys- tem. Mathematically speaking, a neuron in an ANN serves as a nonlinear, parameterized, and bounded function, which receives in- put data from the outside world and produces or outputs its predictions, such as the value of a target variable. In this context, two dierent methods usually are applied to parameterize the function. The rst method parameterizes the input variable and forms a global input as a linear combination of the input variablex i , weighted by a parameterw i , named the weights, that signies the relative signicance of the input. 
The output $y$ of the neuron is obtained as a nonlinear function of the global input, such that

$y = f\left(b + \sum_{i=1}^{n} w_i x_i\right)$   (1.1)

where $b$ is the bias, which is initially a random number. Eq. (1.1) is shown graphically in Fig. 1.1(a) for a simple model, and in Fig. 1.1(b) for a more complex, multilayer network. A nonlinear transform function can also be parameterized. Given $f$ as a Gaussian radial basis function, the output is calculated by

$y = \exp\left(-\sum_{i=1}^{n} \frac{(x_i - w_i)^2}{2 b_0^2}\right)$   (1.2)

where the $w_i$ are the coordinates of the multivariate normal distribution, and a bias parameter $b_0$ is assumed along all directions. A neural network is represented mathematically as a nonlinear function of two or more neurons that can be connected in various ways, and the structure of the network describes exactly the procedure for generating the output by combining and weighting the inputs.

The most common and widely used ANN is the multilayer feedforward neural network. The feedforward architecture allows data to be transferred only from the inputs to the outputs, with no loop appearing in the network. As shown in Figure 1.1(b), a multilayer feedforward neural network contains three types of layers, namely, the input, the hidden, and the output layers. The number of neurons in the input and output layers depends on the number of independent and dependent variables, respectively. For the hidden layers, the number of neurons is a hyperparameter and can be optimized in order to achieve better performance. This type of network is a so-called supervised learning algorithm (SLA), and is especially suitable for classification and regression. The task of the SLA is "learning" the relation that maps the input data to the output. Such input-output pairs are usually referred to as the training data, which consist of a set of training examples that are pairs of an input object and the desired output value, based on which a function is inferred. After analyzing the training data, the SLA produces a function that is used for mapping new examples. Generally, a loss function $L(\hat{y}, y)$ is first defined to measure the difference between the predicted value $\hat{y}$ and the true value $y$. Then, an optimization algorithm is used to minimize the prediction error, the difference between the two, by iteratively updating the weight and bias parameters.

Figure 1.1: (a) The basic unit of an ANN. (b) A multilayer feedforward neural network.

The most commonly used and effective algorithm to train an ANN is backpropagation, introduced by Rumelhart et al. (1985). Based on the chain rule of differentiation, the backpropagation algorithm computes the derivative of the loss function $L$ with respect to all the variables and parameters in the network. Then, the derivatives are utilized to update the weight and bias parameters according to a gradient-descent algorithm (Ruder (2016)). Generally speaking, the backpropagation process involves three stages: the feedforward pass to calculate the output based on the input data; backpropagation of the associated error; and, finally, updating of the weight and bias parameters. Mathematically, a training example $x$ is selected at random from the dataset and is fed into the network. The value of each node in the hidden layers and the output layer is denoted by $v_j$ and $\hat{y}_k$, respectively. To this end, the loss function $L(\hat{y}_k, y_k)$ is calculated in order to measure the difference between the network output and the real value for each output node $k$, for which one calculates

$\delta_k = \frac{\partial L(\hat{y}_k, y_k)}{\partial \hat{y}_k}\, l_k'(a_k)$   (1.3)

where $l_k'$ is the derivative of the activation function for the $k$th output node, and $a_k$ is the weighted sum, i.e., the input to the activation function. For each node $j$ in the prior intermediate layer, one then calculates

$\delta_j = l_j'(a_j) \sum_k \delta_k w_{kj}$   (1.4)

where $w_{kj}$ is the weight from node $j$ to node $k$. This procedure is repeated to produce $\delta_j$ for every node $j$, back to the input layer. Each $\delta_j$ is the derivative of the loss function with respect to the $j$th node's activation-function input. Given the values $v_j$ calculated in the forward pass, and the values $\delta_j$ calculated in the backward pass, the derivative of the loss function $L$ with respect to $w_{jj'}$ is computed by

$\frac{\partial L}{\partial w_{jj'}} = \delta_j v_{j'}$   (1.5)

The parameter is then updated based on gradient descent:

$w^{t}_{jj'} = w^{t-1}_{jj'} - \eta\, \frac{\partial L}{\partial w^{t-1}_{jj'}}$   (1.6)

where $\eta$ is the learning rate that controls how much the weights are adjusted with respect to the calculated loss gradient. Currently, the most widely used optimization algorithm is mini-batch gradient descent, which attempts to combine the strengths of stochastic gradient descent (SGD) and batch gradient descent (BGD). The SGD updates the parameters after the computation on each training example, for robustness, whereas the BGD updates the parameters after the evaluation of all the training examples, for efficiency. There are also many variants of the SGD, such as AdaGrad (Duchi et al. (2011)), AdaDelta (Zeiler (2012)), RMSprop (Tieleman and Hinton (2012)), and Adam (Kingma and Ba (2015)).
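To make the above concrete, the following minimal NumPy sketch implements one mini-batch training step for a one-hidden-layer feedforward network with sigmoid activations, following Eqs. (1.1) and (1.3)-(1.6). It is an illustration written for this review, not code from any cited work; the network sizes, the squared-error loss, and all names (such as train_step and eta) are chosen only for demonstration.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

# Illustrative shapes: 4 inputs, 8 hidden neurons, 1 output.
W1, b1 = rng.normal(size=(8, 4)), np.zeros(8)   # hidden-layer weights/bias
W2, b2 = rng.normal(size=(1, 8)), np.zeros(1)   # output-layer weights/bias
eta = 0.1                                       # learning rate, Eq. (1.6)

def train_step(X, y):
    """One mini-batch step: forward pass, backpropagation, update."""
    global W1, b1, W2, b2
    # Forward pass, Eq. (1.1): each node applies f(b + sum_i w_i x_i).
    a1 = X @ W1.T + b1          # weighted sums a_j of the hidden layer
    v = sigmoid(a1)             # hidden-node values v_j
    a2 = v @ W2.T + b2
    y_hat = sigmoid(a2)         # network output

    # Backward pass with squared-error loss L = 0.5 (y_hat - y)^2.
    # Eq. (1.3): delta_k = dL/dy_hat * f'(a_k); for the sigmoid, f' = f(1 - f).
    delta_k = (y_hat - y) * y_hat * (1.0 - y_hat)
    # Eq. (1.4): delta_j = f'(a_j) * sum_k delta_k w_kj
    delta_j = v * (1.0 - v) * (delta_k @ W2)

    n = X.shape[0]              # average the gradients over the mini-batch
    # Eqs. (1.5)-(1.6): dL/dw_jj' = delta_j v_j', then a gradient-descent step.
    W2 -= eta * delta_k.T @ v / n
    b2 -= eta * delta_k.mean(axis=0)
    W1 -= eta * delta_j.T @ X / n
    b1 -= eta * delta_j.mean(axis=0)

X = rng.normal(size=(32, 4))            # a toy mini-batch of 32 examples
y = (X.sum(axis=1, keepdims=True) > 0)  # toy binary targets
train_step(X, y.astype(float))
```

Repeating train_step over randomly drawn mini-batches yields the mini-batch gradient descent discussed above.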
To this end, the loss function L(^ y k ,y k ) is calculated in order to measure the dierence between the network output and real value for each output node k, for which one calculates k = @L (^ y k ;y k ) @^ y k l 0 k (a k ) (1.3) where l 0 k is the activation function for kth output node, and a k is the weighted sum or the input for the activation function. Given the value for each node in the intermediate prior layer, one calculates, j =l 0 j (a j ) X k k w kj (1.4) wherew k j is the weight from node j to k. Such a procedure is repeated to produce j for the jth node until the input layer. Each j computes the derivative of the loss function with respect to the jth node's activation function input. Given the values v j calculated in the forward pass, and the values j calculated in the backward pass, the derivative of the loss function L with respect to w jj 0 is computed by, @L @w jj 0 = j v j 0 (1.5) The parameter is then updated based on gradient descent: w t jj 0 =w t1 jj 0 @L @w t1 jj 0 (1.6) where is the learning rate that controls how much we adjust the weight with respect to the calculated loss gradient. Currently, the most widely used optimization algorithm is the 6 mini-batch gradient descent, which attempts to combine the strength of stochastic gradient descent (SGD) and batch gradient descent (BGD). The SGD updates the parameters after the computation on each training example for robustness, whereas the BGD updates the parameters after the evaluation of all the training examples for eciency. There are also many variants of the SGD, such as AdaGrad (Duchi et al. (2011)), AdaDelta (Zeiler (2012)), RMSprop (Tieleman and Hinton (2012)), and Adam (Kingma and Ba (2015)). The ANNs are reliable techniques when dealing with problems with an incomplete dataset, or with highly complex and ill-posed problems, where decisions are usually made based on intuition. They have been applied successfully to many problems in a number of elds (Bishop (1995)), and can also be used for functional approximation (Haque and Sudhakar (2002); Prasad et al. (2009)). The ANNs are actually nonlinear parametric function approximators that estab- lish a mapping between multiple inputs and a single output. They can also be used eectively to solve problems in pattern recognition, such as, for instance, in sound (George et al. (2013); Olmez and Dokur (2003)), image (Sharma et al. (2008); Zhou et al. (2002)) or video recognition (Bhandarkar and Chen (2005); Karkanis et al. (2001)). Such tasks can be completed without a priori denition of a pattern. In such cases, the ANNs learn to identify new patterns and have the ability to associate memories (Borders et al. (2017); Michel and Farrell (1990)), which means they can recall a pattern when given only a subset clue. The network structure for such applications is usually complex, with many interacting dynamic neurons. 1.1.2 Boosting algorithms Boosting derives from the computational learning theory (Alpaydin (2020); Freund (1995); Hastie et al. (2009); James et al. (2013); Schapire (1990)) and usually performs a classication task. It is one of the ensemble methods that aims to improve the prediction accuracy of a classier by converting multiple weak learners into a strong one, hence \boosting" it. The main principle is that a single weak learner has limited capabilities and is dicult to improve, but it can be combined with others to build a powerful and stable learner. 
AdaBoost (Freund and Schapire (1997)) is the rst successful boosting algorithm developed for binary classication. The weak learner (or the base learner) used in AdaBoost is a decision tree with one level, referred to as the decision stump. Generally speaking, AdaBoost algorithm 7 contains the following steps: Initially, a weak learner is trained with an equally weighted training dataset. Then, the distribution of the training data is adjusted based on the predicted results of the trained weak learner. Specically, the misclassied training data are associated with larger weights and, then, are used to train the next weak learner. Such steps are repeated until the number of weak classiers recaches a predened value. Finally, the weak learners are weighed and combined to form a strong learner. Gradient boosting (Friedman et al. (2000)) extended the method of regression. It works similar to a numerical optimization algorithm and iteratively searches for a new additive model that reduces the loss function at each step. Specically, an initial guess is made with a decision tree to maximally reduce the loss function. Then, at each step, a new decision tree is tted to the residual between the data and the predicted results and added to the previous model to update the residual. This step is repeated until a predened maximum number of iterations is reached. Note that the decision trees added to the model at prior stages are not modied. The added decision tree at each step is usually rescaled with a shrinkage parameter between 0 and 1. It is believed that a higher number of small steps provide higher accuracy than a lower number of large steps (Touzani et al. (2018)). In each iteration, instead of using the complete training dataset, a randomly selected subsample is used to t the decision tree, which can eciently reduce the computational cost. XGBoost (Chen and Guestrin (2016)), which stands for extreme gradient boosting, is a version of the gradient boosting machine but focuses on improving the computational speed and model performance. XGBoost uses a second-order Taylor's expansion for the objective function, rather than a full objective function for optimization, which allows faster computation. Moreover, it introduces some regularization terms to prevent overtting. Random under-sampling(RUS) boosting is used to classify imbalanced class data. RUS- Boost is a hybrid method that is based on under-sampling in AdaBoost algorithm. RUS- Boost approaches a new data set by randomly under-sampling the major class until a desired new training data set is achieved. RUSBoost has smaller new training dataset and is simpler compared to SMOTEBoost algorithms which both these characteristics makes it faster than SMOTEBoost (Seiert et al. (2010)). RUSBoost algorithm can be shown as what is presented in Algorithm 1. In algorithm 1,x i 2fXg, withfXg being the input, andy i 2fYg withY representing the 8 output class (which are either 1 or 0). The purpose of the algorithm is to evaluate the \class" or estimate the \probability"h m (x i ;y i )! [0; 1]. Initially, the weight of each sample is set to be 1=N, where N is the number of the training cases. One has M weak hypotheses, i.e., batches of input and output data with their own associated weights and, therefore, the algorithm is iterated M times. During each iteration various weak hypotheses are selected and tested and the error in the classication prediction associated with the hypothesis is calculated. 
Then, the weight of each example is adjusted such that misclassied examples have their weights increased, whereas the correctly-classied examples have their weights decreased. Therefore, subsequent iterations of boosting will generate hypotheses that are more likely to correctly classify the previously mislabeled examples. In practice, one rst applies the RUSBoost algorithm to remove samples from the majority class until a certain pre-set percentage P of the new temporary training dataset, S 0 m , with new weight distribution w 0 m belongs to the minority class. Next, S 0 m and w 0 m are passed to the next weak learner, the weak hypothesis h m is generated, and the pseudo-loss m , as well as the weight update parameter m , are computed. Then, the weight distribution for the next iteration,w m+1 , is updated and normalized. AfterM iterations, a weighted \vote" of the weak hypotheses forms the nal hypothesis, H(x), i.e. , the selection of data with the weights that generates the minimum error. 1.2 Convolutional Neural Networks Convolutional neural networks (CNNs), developed by Lecun (LeCun (1989); Lecun et al. (1998)), are feedforward neural networks since the data are passed from the input to the output only along the forward direction. The input data for the CNN usually have a grid-like topol- ogy, including time-series data (1D data), grayscale images (one channel with 2D inputs), color images (three-channel data with 2D inputs), and multidimensional time-varying data (3D; e.g. seismic data). Various architectures of CNN have been proposed in the literature, with most of them containing convolutional and pooling layers; see Figure 1.2. Generally speaking, one or more fully connected layers are connected after the convolutional and pooling layers. Convolutional layers extract the features representing the input data. Specically, the convolutional operation moves a lter (i.e., kernel) across the input layer in a certain order and 9 input : Training set S =fs 1 ;s 2 ;:::;s N g; minority class: y r 2 Y;s i = (x i ;y i ); where x i 2X, y i 2f1; 1g output: Final hypothesis H(x) Weak learners: weak learners h 1 ,. . . , h M : weak hypothesis by weak learner M: number of iterations P: percentage of accuracy in the minority class 1 Initialize w 1 (i) = 1 n for i2f1;:::;ng; 2 for m = 1 to M do 3 Create temporary training dataset S 0 m with new weight distribution w 0 m by random sampling; 4 if P i:y i =y r 1 N <P then remove samples from the major class; 5 Call weak learner; 6 Train weak learner h m with data in the new training set S 0 m and their weights w 0 m ; 7 Test h m on original dataset, h m (x i ;y i )2 [0; 1]; 8 Calculate pseudo-loss with w m(i) : m = P (i;y):y i 6=y w m(i) (1h m (x i ;y i ) +h m (x i ;y)); 9 Calculate updated weight parameter m = m 1m ; 10 Update weights: w m+1 (i) =w m(i) 1 2 [1+hm(x i ;y i )hm(x i ;y)] m ; 11 Normalize: w m+1 (i) :w m+1 (i) = w m+1 (i) P j w m+1 (j) ; 12 end 13 Output the nal hypothesis H(x)= arg max y P M m=1 h m (x;y) log 1 m ; Algorithm 1: RUSBoost (RUS embedded in AdaBoost) 10 generates a feature map. Each movement produces one unit in the feature map by calculating a dot product between the lter and a small local region of the input. Typically, there is more than one lter, and each of them is assigned to detect a specic feature. The dimension of the lter is usually much smaller than that of the input. The mathematical formulation for the output of a convolutional layer expressed by z i;j = k=M X k=1 l=N X l=1 w k;l x i+k1;j+l1 +b ! 
where $w$ is the filter of size $M \times N$, $x$ is the input data or the output from previous layers, $b$ is the bias for the current convolutional filter, and $z$ is the output of the convolution operation. In each convolutional layer, a stride parameter $S$ is defined to control the skip distance between consecutive filter moves. The input data are sometimes padded with zeros around the border in order to control the spatial size of the output. The size of the zero-padding, $P$, is a hyperparameter, meaning that it should be optimized. For a 2D input with dimensions $W \times H$ and a filter of size $F \times F$, the size of the output of the convolution operation, i.e., the size of the feature map, is $[(W - F + 2P)/S + 1] \times [(H - F + 2P)/S + 1]$.

A nonlinear activation function, $\phi$, is used to conduct an element-wise transformation after the convolution operation. Traditionally, the sigmoid and hyperbolic tangent functions were used to add nonlinearity, but they bring about a vanishing-gradient problem that retards the training process (Kolen and Kremer (2010)). Glorot et al. (2010) found that a piecewise-linear activation function, namely, the rectified linear unit (ReLU), can relieve that problem. The ReLU, defined as $\max\{0, x\}$, where $x$ is the input, is faster to compute and outputs a sparse representation, i.e., only about half of the hidden units are active and have nonzero outputs.

The pooling layers aim at reducing the spatial resolution of the feature maps and canceling the influence caused by distorting and translating the input data (Ranzato et al. (2007)). Average pooling is one of the popular functions; it computes the mean value of a small local region in the feature maps and passes it to the next operation. Recently, max-pooling has also been used, which sends the maximum value of a small region to the next layer. Scherer et al. (2010) found that the max-pooling technique is skilled at extracting the invariant features from image-like data, improving the generalization performance and converging much faster, hence outperforming the traditional subsampling operation.

Figure 1.2: (a) A typical architecture for convolutional neural networks, where a and b are the numbers of channels. (b) A zoomed-in view of the operations for a simple convolution. (c) The overall workflow for producing the feature maps.

To extract the latent features with a high degree of abstraction, multiple convolutional and pooling layers are connected, where the simple features extracted by the lower convolutional layers are further processed by the following layers to produce more complex feature representations. The fully-connected layers at the end of the network aim to understand and utilize the extracted high-level features and complete the reasoning tasks, such as regression or classification (Zeiler and Fergus (2014)).
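As an illustration of Eq. (1.7) and of the output-size formula above, the following sketch computes one single-channel feature map with a given stride, zero-padding, and a ReLU activation. It is a didactic loop-based version, not an efficient implementation, and the example input and filter are arbitrary.

```python
import numpy as np

def conv2d(x, w, b=0.0, stride=1, pad=0):
    """Single-channel version of Eq. (1.7): slide the filter w over the
    zero-padded input x, take dot products, and apply a ReLU activation."""
    x = np.pad(x, pad)                         # zero-padding of width P = pad
    F = w.shape[0]
    H_out = (x.shape[0] - F) // stride + 1     # = (W - F + 2P)/S + 1
    W_out = (x.shape[1] - F) // stride + 1     # = (H - F + 2P)/S + 1
    z = np.empty((H_out, W_out))
    for i in range(H_out):
        for j in range(W_out):
            patch = x[i*stride:i*stride + F, j*stride:j*stride + F]
            z[i, j] = np.sum(patch * w) + b    # one unit of the feature map
    return np.maximum(z, 0.0)                  # ReLU

x = np.arange(25.0).reshape(5, 5)
w = np.ones((3, 3)) / 9.0
print(conv2d(x, w, stride=1, pad=1).shape)     # (5, 5): (5 - 3 + 2*1)/1 + 1
```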
The key ideas of the convolution operation, namely, sparse interactions, parameter sharing, and equivariant representations, aim to reform the world of machine learning and bring it closer to human abilities (Bengio et al. (2016)). For example, sparse interactions are achieved by making the filters smaller than the input data, which results in a smaller memory requirement and higher computational efficiency. Parameter sharing indicates that, instead of learning a different set of parameters for each location, only one set needs to be learned, which further relieves the storage demand for the model parameters. Equivariant representation implies that the output changes in the same way as the input data.

The capability of the CNN to learn hierarchical representations of context-invariant features enables its application to image classification and recognition. There are many CNN-based deep architectures in the literature. LeNet (LeCun (1989); LeCun et al. (1998)), a pioneering work by LeCun, was used for handwritten digit recognition and later for reading zip codes and digits. AlexNet (Krizhevsky et al. (2017)) is similar to LeNet but much bigger and deeper, with all the convolutional layers stacked together; it won the ILSVRC-2012 competition. GoogLeNet (Szegedy et al. (2015)) introduced a new architecture, called Inception, in which the depth of the network is increased, but with fewer parameters than AlexNet. VGGNet (Wang et al. (2015)) provides a thorough analysis of the depth factor in a ConvNet, and efficiently controls the number of parameters by using very small 3 × 3 convolution filters in all the layers. ResNet (He et al. (2016)) is a learning framework in which the layers learn residual functions with respect to the inputs, instead of learning unreferenced functions. It was demonstrated that residual networks are easier to optimize and attain much better accuracy. The aforementioned CNNs typically utilize one or more fully connected layers between the last convolutional layer and the classifier, in order to enable classification into a small number of classes. Currently, end-to-end fully convolutional networks, such as U-Net (Ronneberger et al. (2015)) and SegNet (Badrinarayanan et al. (2017)), are preferred, both of which have been used for image segmentation.

1.3 Autoencoder

The autoencoder is a widely used unsupervised learning algorithm, which attempts to accurately reconstruct or reproduce the input data. The autoencoder was first introduced by Hinton and the PDP group to address the problem of 'backpropagation without a teacher' by using the input data as the teacher (Rumelhart and McClelland (1987)). A typical autoencoder consists of an encoder, a decoder, and a latent layer; see Figure 1.3.

Figure 1.3: Illustration of the architecture of an autoencoder with fully connected layers.

Given the input data $x \in \mathbb{R}^N$, where $N$ is the data dimension, the encoder function first projects it onto a latent space $h \in \mathbb{R}^L$ with a weight matrix $W_n$, bias $b_n$, and an activation function $\sigma(\cdot): \mathbb{R} \to [0, 1]$:
\[
h = \sigma(W_n x + b_n) \qquad (1.8)
\]
The decoder function then transforms the latent representation into the output $z \in \mathbb{R}^N$ through an inverse mode with a weight matrix $W_h$, bias $b_h$, and an activation function $\sigma(\cdot): \mathbb{R} \to [0, 1]$:
\[
z = \sigma(W_h h + b_h) \qquad (1.9)
\]
The tied-weight strategy, $W_n = W_h = W$, is usually applied to simplify the network structure. Therefore, the parameters to be solved for are $W$, $b_n$, and $b_h$.
Given the training samples, 14 the objective function to train an autoencoder is to minimize the cost function: arg min w;bn;b h L(x;z) (1.10) which measures the dierence between the input and reconstructed data by, for example, the mean-square error. The autoencoder is mostly used for reducing the dimensionality of input data and generating a latent representation of them while neglecting the eect of noise that may appear in the dataset. Specically, the implicit probability distribution of the data is estimated based on the training dataset, with new samples drawn from the inferred data distribution. The training data with high dimensionality is compressed to a low-dimensional latent representation, based on the assumption that the real data could be reconstructed from them with the help of the decoder. This assumption is based on the relative characteristics of the patterns of the output that are stored in the latent layer and interpreted by the model. There are dierent types of variants for autoencoder. Sparse autoencoder (Makhzani and Frey (2013)) adds regularization, the sparsity constraint, on the hidden neurons, or the latent representation. A neuron is dened as active when its output is close to one, and inactive when it is close to zero. A sparse autoencoder reduces the number of neurons being used by disabling some of them. The constraint is realized by a penalty named Kullback-Leibler (KL) divergence (Kullback and Leibler (1951)): D KL (j ^ j ) = log ^ j + (1) log 1 1 ^ j (1.11) where ^ j is the average activation of the jth hidden unit, with the average taken over the training set, and is the desired activation value (close to zero). A denoising autoencoder (Vincent et al. (2008)) was proposed for dealing with the small variations of the input data, which can generate the same result even if the input data are partially destroyed. In the training process, we rst corrupt the input data x with one of the two main kinds of corruption methods, binary noise, and Gaussian noise, in order to obtain a destroyed version ~ x. For binary noise, several data points are randomly picked, and their values are set to zero. Gaussian noise generates a set of random values following the Gaussian distribution and adds them to the original data. The destroyed input is sent to the vanilla autoencoder. Dierent from 15 the archetypal autoencoder, the outputs from encoder and decoder of variational autoencoder represent a sample drawn from a parameterized probability density function (Doersch (2016)). Given an example x, the probabilistic encoder produces a distributionq (hj x) (e.g., Gaussian) over the possible values of latent representation h, rather than a single point, where the model parameters and latent states sample this statistical distribution hq (hj x). The probabilistic decoderp (hj x) is a conditional generative model that computes the probability of generating x, given the latent variable h. A variational autoencoder then tries to approximate p (hj x) by the given distribution q (hj x). It is also of high signicance to keep the balance between the reconstruction accuracy and matching the Gaussian distribution. Therefore, the loss function consists of these items. The reconstruction accuracy is measured by the mean-square error and the dissimilarity between the distribution of the latent representation, while the Gaussian distribution is represented by the KL divergence. Variation autoencoder could produce samples that do not appear in the real training dataset. 
While the fully-connected autoencoder commonly ignores the spatial structure within an image, the convolutional autoencoder (Masci et al. (2011)) was proposed to handle this problem. The convolutional autoencoder is a combination of the vanilla autoencoder and convolutional layers. Some redundancy of the model parameters is introduced to force the learned representation to be global, i.e., covering the entire input data. Moreover, since the weights and biases are shared as the filters are moved over the input data, the spatial locality is conserved. On the other hand, basic autoencoders can be concatenated on top of each other in order to construct a deep stacked autoencoder (Qi et al. (2014)), which consists of an input layer, several hidden layers, and an output layer. A stacked autoencoder inherits the benefits of deep neural networks and can still learn a deep representation of the input. The first hidden layer tends to learn the low-level features of the input data. The hidden layers that follow typically produce the high-level features extracted from the patterns that appeared in the previous levels. Therefore, a hierarchical representation of the input data is extracted.

1.3.1 Long Short-Term Memory

Long short-term memory (LSTM) is a recurrent neural network (RNN) used mainly for processing time-series data. Originally discovered by Hopfield (1982) and extended by Williams et al. (1986; see also Nasrabadi and Choo (1992)), RNNs are especially suited for modeling time-sequence data and data with variable lengths. They have the ability to selectively deliver information through successive steps while dealing with sequential data one element at a time. Unlike feedforward neural networks, the RNN has so-called recurrent edges that form cycles, which are self-connections from a node to itself across time. Specifically, at time $t$ the node with recurrent edges receives the input data point $x^{(t)}$ and the hidden-node values $h^{(t-1)}$ from the previous state. The output $\hat{y}^{(t)}$ at each time $t$ is calculated based on the hidden-node values at the same time. Thus, the input $x^{(t-1)}$ at time $(t-1)$ influences the output $\hat{y}^{(t)}$ at time $t$ through the recurrent connections. Mathematically, the forward pass in a simple RNN is governed by the following equations:
\[
h^{(t)} = \sigma_h\left(W_{hx}\, x^{(t)} + W_{hh}\, h^{(t-1)} + b_h\right) \qquad (1.12)
\]
\[
\hat{y}^{(t)} = \sigma_y\left(W_{yh}\, h^{(t)} + b_y\right) \qquad (1.13)
\]
where $W_{hx}$ is the weight between the input and the hidden node, $W_{hh}$ is the weight for the hidden node between successive time steps, and $W_{yh}$ is the weight between the hidden node and the output. $b_h$ and $b_y$ are bias parameters representing an offset, and $\sigma_h$ and $\sigma_y$ are activation functions, such as tanh and ReLU, that introduce nonlinearity. As shown in Figure 1.4, the network can be interpreted as noncyclic by unfolding it as a deep neural network with one layer per time step and shared weights across the time steps. Using backpropagation, the unfolded network is trained over many time steps (Werbos (1990)).

Figure 1.4: A simple RNN with one input, one hidden, and one output unit, and an illustration of the RNN unfolded across time steps.

However, backpropagating errors through many time steps may give rise to vanishing gradients. Pascanu et al. (2013) presented a comprehensive analysis of the vanishing-gradient problem and, in order to address it, suggested adding a regularization term that imposes weights on the values, so as to prevent the gradient from vanishing.
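A compact sketch of the unrolled forward pass of Eqs. (1.12)-(1.13) follows; tanh for $\sigma_h$ and a linear output for $\sigma_y$ are example choices, and the input is one sequence with one time step per row.

```python
import numpy as np

def rnn_forward(X, W_hx, W_hh, W_yh, b_h, b_y):
    """Unrolled forward pass of the simple RNN of Eqs. (1.12)-(1.13)."""
    h = np.zeros(W_hh.shape[0])                     # initial hidden state
    outputs = []
    for x_t in X:                                   # one pass per time step t
        h = np.tanh(W_hx @ x_t + W_hh @ h + b_h)    # Eq. (1.12), sigma_h = tanh
        outputs.append(W_yh @ h + b_y)              # Eq. (1.13), linear sigma_y
    return np.array(outputs), h
```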
In particular, the LSTM (Hochreiter and Schmidhuber (1997)) was introduced to overcome this problem; it replaces the hidden unit with a memory cell that contains a node with a self-connected recurrent edge of fixed weight one, which guarantees the flow of the gradient across many time steps. All the elements of the LSTM are illustrated in Figure 1.5. The input node $g^{(t)}$ calculates a weighted sum of the current input data $x^{(t)}$ and the previous hidden unit $h^{(t-1)}$, and then runs it through a tanh activation function:
\[
g^{(t)} = \tanh\left(W_{gx}\, x^{(t)} + W_{gh}\, h^{(t-1)} + b_g\right) \qquad (1.14)
\]
The input gate $i^{(t)}$ receives the same inputs as the input node, but with a sigmoid activation function for controlling the information flow:
\[
i^{(t)} = \mathrm{sigmoid}\left(W_{ix}\, x^{(t)} + W_{ih}\, h^{(t-1)} + b_i\right) \qquad (1.15)
\]
Specifically, if its value is zero, the flow from the input node is cut off; if its value is one, all the flow is passed through. The forget gate $f^{(t)}$, introduced by Gers (2001), also receives the same inputs as the input node and allows the network to flush the contents of the internal state:
\[
f^{(t)} = \mathrm{sigmoid}\left(W_{fx}\, x^{(t)} + W_{fh}\, h^{(t-1)} + b_f\right) \qquad (1.16)
\]
The internal state $s^{(t)}$ is the heart of each memory cell, with a linear activation function. It has the aforementioned recurrent edge with fixed unit weight to avoid the vanishing gradient. The internal state receives information from the input node $g^{(t)}$ and the previous internal state $s^{(t-1)}$, under the control of the input gate $i^{(t)}$ and the forget gate $f^{(t)}$:
\[
s^{(t)} = g^{(t)} \odot i^{(t)} + s^{(t-1)} \odot f^{(t)} \qquad (1.17)
\]
where $\odot$ denotes pointwise multiplication. The output gate $o^{(t)}$ and the internal state $s^{(t)}$ yield the value $h^{(t)}$ of the hidden unit at the current time step:
\[
o^{(t)} = \mathrm{sigmoid}\left(W_{ox}\, x^{(t)} + W_{oh}\, h^{(t-1)} + b_o\right) \qquad (1.18)
\]
\[
h^{(t)} = \tanh\left(s^{(t)}\right) \odot o^{(t)} \qquad (1.19)
\]
Usually, the internal state runs through a tanh function to endow each cell's output with the same dynamic range as an ordinary tanh hidden unit. Currently, the ReLU activation function is more common due to its greater dynamic range.

Generally, in the forward pass, the LSTM learns when to let activation into the internal state through the input and output gates. If the two gates are closed (their values are zero), the activation is dormant in the memory cell and does not affect the output at intermediate time steps. In the backward pass, the constant recurrent edge of the internal state governs the flow of the gradient over many time steps, implying that the gates learn when to let error in and when to let it out. Therefore, the LSTM has a much better ability than the RNN to learn long-range dependencies.

Various LSTM networks have been developed for processing real-world data. Generally speaking, they can be divided into two categories. The first consists of LSTM-dominated networks that are built mainly from LSTM cells, with the connections of the inner LSTM cells optimized in order to enhance the network properties; examples are the stacked LSTM network (Saleh et al. (2018)), the bidirectional LSTM network (Graves and Schmidhuber (2005)), the multidimensional LSTM network (Graves et al. (2007)), the graph LSTM network (Liang et al. (2016)), the grid LSTM network (Kalchbrenner et al. (2015)), and the convolutional LSTM network (Shi et al. (2015)). The second consists of integrated LSTM networks, which combine LSTM layers with other components, such as the CNN, in order to integrate the advantages of the different components; examples are the DBN-LSTM network (Vohra et al. (2015)), the CFCC-LSTM network (Yang et al. (2018a)), the C-LSTM network (Zhou et al. (2016)), and the LSTM-in-LSTM network (Song et al. (2016)).
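The following sketch implements one memory-cell update, Eqs. (1.14)-(1.19), with the weight matrices and biases collected in a dictionary named as in the text; it is a single-cell, single-step illustration rather than a full LSTM layer.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def lstm_step(x_t, h_prev, s_prev, P):
    """One LSTM memory-cell update; P holds the weights and biases."""
    g = np.tanh(P["W_gx"] @ x_t + P["W_gh"] @ h_prev + P["b_g"])   # input node, Eq. (1.14)
    i = sigmoid(P["W_ix"] @ x_t + P["W_ih"] @ h_prev + P["b_i"])   # input gate, Eq. (1.15)
    f = sigmoid(P["W_fx"] @ x_t + P["W_fh"] @ h_prev + P["b_f"])   # forget gate, Eq. (1.16)
    o = sigmoid(P["W_ox"] @ x_t + P["W_oh"] @ h_prev + P["b_o"])   # output gate, Eq. (1.18)
    s = g * i + s_prev * f          # internal state, Eq. (1.17)
    h = np.tanh(s) * o              # hidden unit, Eq. (1.19)
    return h, s
```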
Figure 1.5: Illustration of one LSTM memory cell, where $\sigma$ represents a sigmoid function.

1.4 Literature Review

Given the large amount of data collected and stored over the past years, ML has been widely used in several massive and complex data-intensive fields, such as geoscience, hydrology, energy resources, biology, medicine, and economics. Based on the applications of ML in other disciplines, we can get an initial impression of the kinds of problems for which ML is well suited, and of the state in which ML is applied in various disciplines. In what follows we briefly describe the application of ML to fine-scale systems involving porous media, and describe how the ML methods are applied to solve specific problems in the field.

Fine-scale porous media have only recently experienced significant progress in the application of ML methods. One reason is the availability of high-quality images, thanks to the progress in imaging and non-destructive scanning. Furthermore, owing to leaping advances in the characterization of complex materials (Sahimi (2003a); Torquato (2002)), as well as emerging stochastic simulation (Tahmasebi (2018b); Tahmasebi and Sahimi (2015b, 2013, 2015a)), using advanced ML and deep-learning (DL) methods is becoming more practical. Such methods can produce the vast amount of data that is necessary for training a neural network in a reasonable time. In this section we put the emphasis on geomaterials, and in particular on recent studies based on DL.

Important properties of porous media, such as the permeability and porosity, as well as those describing the state of a porous medium in which two fluids are flowing, such as the saturation, are among the properties that determine the efficiency of many operations involving porous media. Estimation of such properties has always been a challenge to scientists, both experimentally and computationally. Clearly, the experimental methods can be costly and often require a considerable amount of time, which makes them difficult to use on a large number of samples; hence, the uncertainty and, in turn, an accurate estimation of the intrinsic properties of porous media are difficult to address. Similarly, computational modeling with numerous samples is also time demanding. Through recent advances in AI techniques, as described earlier, the prediction of such important properties without carrying out time-consuming experiments is becoming possible.

A review of the literature indicates that, as far as the application of AI to various problems involving small-scale porous media is concerned, a few specific types of AI methods have been utilized. Such studies have focused on, for example, image enhancement (Kamrava et al. (2019)), estimation of such properties as the permeability, resistivity, and diffusivity (Kamrava et al. (2020b); Tembely and AlSumaiti (2019); Wu et al. (2018)), porous-media reconstruction (Adams et al. (2018); Feng et al. (2019); Liu et al. (2019b); Mosser et al. (2017, 2018b); Shams et al. (2019); Tran and Tran (2019)), estimation of the P- and S-wave velocities (Karimpouli and Tahmasebi (2019)), segmentation (Karimpouli et al. (2019)), conditional simulation of three-dimensional pore models (Mosser et al. (2018a)), mapping between design variables and microstructures (Li et al. (2018); Yang et al. (2018b)), classification of surface wettability (Yun (2017)), microstructure synthesis (Fokina et al. (2019)), estimation of the mechanical and thermal properties of porous materials (Avalos-Gauna and Palafox-Novack (2019); Pires de Lima (2019); Wei et al. (2018); Wu et al. (2019); Zhang et al. (2019b)), and estimation of other physical properties (Bélisle et al. (2015)).
As mentioned earlier, another application of DL to fine-scale porous media has been estimating important flow and transport properties of porous media, such as the permeability, resistivity, and diffusivity. The idea is to use the DL's ability to unravel important and latent features in complex porous media, link them to the properties, and reduce the computational time. In other words, although the training phase of the DL may require a large amount of time, once the training phase is concluded the network can perform the estimation for new samples in a matter of seconds.

Wu et al. (2018) used a CNN to generate models of porous media and computed their permeability. They noted that, since the permeability is a sole function of the pore geometry, it should be possible to estimate it based on the geometry and its images using the CNN, avoiding direct simulations or pore-network calculations. Their proposed algorithm was as follows: (1) generate a large number of 2D images using the Voronoi tessellation algorithm and calculate the permeabilities of the generated images using the lattice-Boltzmann method, which are then used as the training dataset; (2) train the CNN; and (3) test the trained network with new images in order to predict their permeability. They also supplied the CNN with some physical properties that affect the permeability, in order to enhance the accuracy of the predicted permeability, but reported that not all physics-informed CNNs perform better than the regular CNN. They suggested that, rather than training the network only with the target variable (the permeability), adding the porosity ($\phi$) and the specific surface area ($s$) to the network increases the accuracy of the results. Some of the permeability predictions generated by the proposed method are presented in Figure 1.6. The physics-informed method has higher accuracy than the regular CNN, while both predictions have about 10% error if the lattice-Boltzmann-calculated permeabilities are taken as the true values. They concluded that for some cases in which the pores are dilated, the physics-informed CNN does not provide more accurate predictions than the regular CNN, as shown in Figure 1.7. They also studied the effect of increasing the number of seeds $N$ on the permeability predicted by the physics-informed CNN for materials with dilated pores; by seeds we mean the cells generated on a plane for partitioning it by the Voronoi tessellation method. Figure 1.8 compares the permeabilities predicted by the physics-informed CNN, the empirical Kozeny-Carman equation, and the numerical results obtained by lattice-Boltzmann simulations. The CNN predictions agree with the lattice-Boltzmann simulations to within 10 percent, and are also much more accurate than the predictions provided by the empirical Kozeny-Carman equation.

Figure 1.6: Permeability predictions by (a) the regular CNN, and (b) the physics-informed CNN (Wu et al. (2018)).

Figure 1.7: Permeability predictions by (a) the regular CNN, and (b) the physics-informed CNN in a dilated porous medium (Wu et al. (2018)).
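The general idea of such a physics-informed CNN, appending scalar physical properties such as the porosity $\phi$ and the specific surface area $s$ to the flattened feature map ahead of the fully-connected layers, can be sketched as follows. The layer sizes and the input resolution are illustrative assumptions and do not reproduce the network of Wu et al.

```python
import torch
import torch.nn as nn

class PhysicsInformedCNN(nn.Module):
    """Sketch: a CNN whose flattened features are concatenated with scalar
    physical properties (phi, s) before the fully connected head."""
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        )
        # a 64x64 input image yields 32 channels of 16x16 after two poolings
        self.head = nn.Sequential(
            nn.Linear(32 * 16 * 16 + 2, 128), nn.ReLU(),
            nn.Linear(128, 1),                 # e.g., the predicted permeability
        )

    def forward(self, img, props):
        z = self.features(img).flatten(start_dim=1)
        z = torch.cat([z, props], dim=1)       # append (phi, s) to the features
        return self.head(z)

model = PhysicsInformedCNN()
img = torch.randn(4, 1, 64, 64)                # a batch of grayscale images
props = torch.rand(4, 2)                       # porosity and specific surface
print(model(img, props).shape)                 # torch.Size([4, 1])
```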
Wu et al. (2019) used a CNN to predict the effective diffusivity $D_e$ of a porous material based on 2D images. They generated a large dataset using reconstruction methods and calculated the effective diffusivities using the lattice-Boltzmann method, which were considered the true values. The generated images and the corresponding effective diffusivities were used for training the network, which was then used to predict the effective diffusivity of new images. The results are shown in Figure 1.9. When the true value of $D_e$ is less than 0.2 and the porosity is between 0.28 and 0.98, 95% of the results predicted by the proposed method are within 10% of those of the lattice-Boltzmann method, and are more accurate than the predictions of the effective-medium approximation of Bruggeman (1935). They also reported that when $D_e < 0.1$, the CNN predictions have high errors, approximately more than 30% (Wu et al. (2019)); since the porosity is correlated with the effective diffusivity, it can be used to enhance the predictions of the CNN. Thus, they developed a physics-informed CNN by adding the porosity of each porous material to the flattened feature map of the last pooling layer, in order to form the first fully-connected layer of the CNN.

Figure 1.8: Permeability predictions by the physics-informed CNN, against (a) the Kozeny-Carman equation, and (b) the numerical results obtained by the lattice-Boltzmann method (Wu et al. (2018)).

Wu et al. (2019) also studied another type of CNN, namely, a CNN with pre-processed input, in which they removed the trapped and/or dead-end pores in the images of the porous structures, in order to further enhance the CNN predictions. Figure 1.10 compares the $D_e$ predicted by the CNN and the lattice-Boltzmann method, as well as the results of the CNN and Bruggeman's equation, $D_e = \varepsilon^{\alpha+1}$ (Wu et al. (2019)), where $\alpha$ depends on the structure of the porous medium. They reported that the CNN can predict $D_e$ for porous materials with complex structure better than the Bruggeman equation. They further expanded the CNN's ability to predict the effective diffusivity over various ranges, and reported that the absolute error of the $D_e$ predicted by the CNN is smaller than 0.1 for all the ranges. The relative error of the $D_e$ predicted by the physics-informed CNN is approximately 12% less than that of the regular CNN (Wu et al. (2019)).

Figure 1.9: Comparison of the effective diffusivity $D_e$ predicted by (a) the CNN and the LBM, and (b) the CNN and the Bruggeman equation (Wu et al. (2019)).

In another set of studies (Liu (2017); Liu et al. (2019b)), experiments with capillary tubes were carried out in order to measure the saturations, phase conductances, two-phase capillary pressures, and relative permeabilities. Drainage and imbibition experiments were carried out to generate the relative permeability curves. The data were then used as the training datasets of an ANN to predict the relative permeability and capillary pressure. The input parameters that were perceived as unnecessary were identified using sensitivity analysis. Two approaches were used in the study. In the first, a neural network was used to predict only the capillary pressure curve, $P_{cow}$, using input data related to the geometrical properties of the tube's cross section. In the second approach, the threshold capillary pressures and the water and oil relative permeabilities were predicted simultaneously using the neural net (Liu (2017)).

Figure 1.10: (a) An image of a porous material, and (b) the same image after pre-processing. (c) The effective diffusivity $D_e$ predicted by the CNN with the pre-processed input image, and (d) by the physics-informed CNN (Wu et al. (2019)).
A comparison of the results for $P_{cow}$ obtained with the neural network, trained with both datasets, and the values calculated from the experiments using the first approach is presented in Figures 1.11(a) and 1.11(b), respectively. The training datasets consisted of 3,000 random polygons with shape factors from 0 to 0.04, and 3,000 random polygons with shape factors from 0.04 to 0.07958. The two studies indicated that the results are predicted better when the second training dataset is used, and that the predictions of the NN depend on the elongation factor of the system. It should be noted that the input data and models were designed for pores with circular cross sections; for pores with more realistic geometries the same accuracy may not be obtained (Liu (2017)).

Wei et al. (2018) used a CNN for predicting the effective thermal conductivity of porous materials. They used a quartet structure generation set for generating model porous materials, and calculated their effective thermal conductivity using the lattice-Boltzmann method. They then compared various ML algorithms, such as a CNN, the SVR, and Gaussian process regression (GPR), for predicting the effective thermal conductivity. The results are shown in Figure 1.12. All the ML algorithms predicted the effective thermal conductivity more accurately than the Maxwell-Eucken (Maxwell, 1904; Eucken, 1932) and Bruggeman models.

Figure 1.11: Predicted capillary pressure $P_{cow}$ by the neural network versus the calculated values using approach 1 (see the text) and (a) training dataset 1, and (b) training dataset 2 (Liu (2017)).

Figure 1.12: Comparison of the errors of the effective thermal conductivity predicted by analytical models and (a) the CNN, and (b) the SVR and GPR methods (Wei et al. (2018)).

A vast number of the papers published in the field of fine-scale porous media have been devoted to reconstruction, which is often achieved using GAN networks. Such studies are usually based on methods developed in computer science and have not undergone considerable change; therefore, we mention only some of the key results here. Cang et al. (2017) used a convolutional deep belief network to produce stochastic models of porous media. Although they used the method to model complex materials, it can be used for geomaterials as well. Several methods have also been used for reconstruction of the microstructure of porous materials. For example, GANs were used for reconstruction of an image of a relatively homogeneous sample of Berea sandstone and a heterogeneous sample of Estaillades carbonate (Liu et al. (2019b)). Subsamples of 3D images of porous rocks were used for training the network, and nonlinear statistical information gained from 3D micro-CT images of actual rock was added to the training. The authors noted that the GAN-generated reconstructed images have better quality for homogeneous rocks than for heterogeneous ones. The images generated by the GANs, along with the real samples, are shown in Figure 1.13. GANs were also used for 2D and 3D image reconstruction using segmented volumetric images (Mosser et al. (2018b, 2017)). The accuracy of the model was verified using some of the statistical, morphological, and transport properties of the samples, such as the Euler characteristic, two-point statistics, and the directional single-phase permeability. Some of the results are shown in Figure 1.14. It was noted that the GANs require graphics processing units (GPUs) with large memory, due to the use of 3D datasets in the training process.
Figure 1.13: Comparison between real samples and their realizations generated by the GANs (Liu et al. (2019b)).

Figure 1.14: The input training images and their realizations generated by the GAN (Mosser et al. (2017)).

In the same group of applications of GAN methods to porous-media problems, Feng et al. used the conditional GAN (CGAN), which is claimed to be an improved version of the GAN, for generating realistic images (Feng et al. (2019)). They noted that the input data of the GANs are only a noise distribution, such as Gaussian or uniform noise from specific samples, whereas the input data of the CGANs are Gaussian noise plus conditional data. The added noise is meant to increase the diversity of the generated output. Feng et al. mentioned that the CGAN is a deep CNN in which every neuron in a layer is connected to all the neurons in another layer, whereas the neurons in a CNN are only connected locally between two adjacent layers, with all the weights being similar in one layer. The loss function is used to minimize the error in both the generator and the discriminator (Feng et al. (2019)). The CGAN is shown schematically in Figure 1.15.

Figure 1.15: (a) The discriminator architecture utilized in the CGAN. (b) Two realizations generated by the CGAN (Feng et al. (2019)).

The DL methods have also been used widely to address important problems in fluid mechanics (Raissi et al. (2018, 2019a,c)). For example, DL has been used to encode the incompressible Navier-Stokes equations, integrated with the structure's dynamic-motion equation, in order to study vortex-induced vibrations (Raissi et al. (2019c)). The proposed method enables quantification of the velocity and pressure from flow snapshots in small subdomains. Raissi et al. used DL to predict the fluid's lift and drag forces on the structure based on very limited information on the velocity field, or on snapshots of dye visualization. They applied the Navier-Stokes-informed deep neural networks to the equation for the structure's dynamic motion, using their method for three scenarios. First, with the forces acting on the body given, they predicted the structure's motion. Second, with the velocity field and the structural motion given at a limited number of locations in space-time, they determined the lift and drag forces, the pressure field, the entire velocity field, and the structure's dynamic motion. Third, with concentration data given in space-time, they obtained all the components of the flow field, the structure's motion, and the lift and drag forces. Some of their results for predicting the concentration of a passive scalar, the velocity field, the pressure, and the lift and drag forces by their suggested algorithm are shown in Figures 1.16 and 1.17, respectively. The authors noted that their algorithm can accurately (to within errors of order $10^{-3}$) reconstruct the velocity field, the concentration of the passive scalar, and the pressure, without sufficient data on these quantities (Raissi et al. (2019c)).

Figure 1.16: Flow visualization of the predicted concentration, velocity field, and pressure, and their comparison with the data (Raissi et al. (2019c)).

Figure 1.17: Predicted versus exact values of (a) the lift, and (b) the drag force (Raissi et al. (2019c)).

1.5 Summary

The recent tremendous progress in the field of AI and ML is undeniable. The advancement is indebted to the significant growth of computational power and the availability of advanced statistical methods, which enable the ML methods to deal with more realistic problems.
The ML methods can be applied to large and complex datasets, such as images, signals, multivariate data, and so on. Due to the judicious application of ML, our understanding of complex phenomena involving big data has shifted and improved significantly. For example, there are numerous examples in porous media in which the available data are very complex, so much so that their interpretation requires a lot of time, without any guarantee that the hidden information can be identified. The field of porous media is now in the era of big data, perhaps ahead of other fields, due to the necessity of collecting several datasets from various sources. Thus, several spatial and spatio-temporal methods have already been developed in this field that are being used in other major research fields. The recent AI techniques, however, have brought a commonality to all the fields, through which they can all be connected and helped to make further progress. In this chapter we attempted to review most of the recently developed methods.

Chapter 2

Enhancing images of shale formations by a hybrid stochastic and deep learning algorithm

1 This chapter was published as a paper in Neural Networks 118, 310-320 (2019).

2.1 Introduction

Natural porous media and materials, as well as many synthetic ones, are heterogeneous. In particular, large-scale (LS) porous media are highly disordered over many distinct length scales, from the pore to the core to much larger scales. Thus, robust characterization of the morphology of porous media, i.e., the spatial distribution of their porosity and the connectivity of their permeable zones, has been a long-standing problem of great interest that has been studied for a long time (Sahimi (2011a); Blunt (2017b)). Accurate characterization of the morphology of porous media will not only shed light on the structure of their complex pore space, but will also lead to accurate estimates of their effective flow, transport, reaction, and elastic properties.

With the considerable advancements in instrumentation and measurement techniques, characterization of porous media and materials has made significant progress over the past decade or so. In particular, progress in non-destructive two- and three-dimensional (3D) imaging has made it possible to extract more information on the complexity and heterogeneity of various types of porous media (Brandon and Kaplan (2013); Herman (2009); Kak et al. (2002)). High-resolution focused ion-beam scanning electron microscopy (FIB-SEM) has also become an essential part of the characterization of the microstructure of porous materials, as it reveals important information regarding their morphological properties (Javadpour et al. (2007); Park et al. (2003); Tahmasebi (2018c); Tahmasebi et al. (2017, 2015a,b)).

High-resolution 3D images are not, however, always accessible and, moreover, the scanning process is typically time consuming. It may also not be economical, because one typically requires at least several images, which are costly to obtain. Furthermore, high-resolution FIB-SEM can only be used with small-scale samples, which may not capture the essentials of a representative pore-size distribution and other properties of a porous sample. Scanning electron microscopy (SEM) provides accurate 2D images of porous media, as it is endowed with flexibility of field-of-view and resolution.
Among other methods for obtaining high-resolution 2D or 3D images, computed tomography (CT), a valuable technique that provides an accurate representation of the internal structure of porous materials, is increasingly being used (Agar and Geiger (2015); Jiang et al. (2013)). Thus, the use of 2D and 3D images for modeling various types of porous media has been increasing (Arns et al. (2018); Tahmasebi et al. (2017); Tahmasebi (2018c); Wang et al. (2018b,a)).

Shales constitute one of the most complex types of porous media, with a multiscale morphology. Due to their ubiquity, they have attracted much attention as the main unconventional source of fossil energy. In addition to their significance as a new source of energy, the complexity of shales' pore space has also made them the target of a wide range of research, as well as the motivation for the development of new methods of characterizing their morphology, giving rise to a highly active research field. For example, depending on their location, the morphology of shales can vary widely. They typically present a wide and skewed pore-size distribution, with the sizes of the pores varying from the nano- to the micro- and mesoscales, which implies that their accurate measurement is difficult. Shales also contain fractures, either natural or hydraulically induced, and the nanopores affect hydrocarbon storage and fluid flow to the micropores and the fractures. At the same time, accurate estimation and/or measurement of shales' properties is essential to evaluating their gas-storage capacity, and to flow and production optimization.

High-resolution images of shales contain many details of their morphology, due to their high density of pixels (voxels in 3D). Therefore, it is important to have high-resolution 2D or 3D images of shales, in order to be able to characterize their pore space and extract important information about the porosity, permeability, mineralogy, total organic carbon (TOC), and other properties. This is more easily said than done, however. One way of addressing this problem is through image enhancement. But, in order for the enhancement to be accurate, one needs an effective approach. In this chapter we propose that deep learning can be a powerful tool for this purpose.

In recent years artificial intelligence and deep-learning algorithms have found a wide array of applications. Among such applications, enhancing the quality of images by learning algorithms has had much success. A key reason for the success is the existence of automatic feature extraction, which is done by training the algorithms with raw data. Deep-learning methods are also considered as algorithms by which nonlinear modules are used in order to form multiple levels of representation. The levels begin with the input data and gradually reach more abstract features. They make the learning process easier for a large number of complex functions that were previously difficult for the earlier artificial-intelligence methods to understand. More importantly, and related to what we study in this chapter, is the fact that the power of deep-learning methods has been demonstrated for applications in which one is faced with analyzing large datasets (LeCun, Yann, Yoshua Bengio et al. (2015)). In such situations, as when, for example, one has multiple arrays of data and complex images, convolutional neural networks (CNNs) are the preferred choice.
The main characteristics that make the CNNs distinct are their use of shared weights and biases, pooling, local connections, and the existence of many layers in the NN. A convolutional layer identifies local continuity of features from the previous layers and combines them together. The reason for using such an architecture is the assumption that some of the features are repeated throughout the image and that their local statistics are the same, meaning that the same patterns share the same weights and biases (LeCun, Yann, Yoshua Bengio et al. (2015)). In other words, the system that one analyzes is spatially stationary. It is this feature that makes deep learning an attractive method for analyzing problems involving porous media.

Use of deep-learning methods can, however, be hampered by the lack of large datasets for network training. One way to address the problem is through enhancement of the images that are available for a given porous medium, in order to expand the dataset. The enhancement enables one to better estimate the physical properties of a porous medium, such as its permeability, the key property for characterizing fluid flow in its pore space. Therefore, using more robust and accurate image enhancement is crucial. Enhancing images of porous media has been pursued in a few studies in the past, which were, however, based on either statistical methods (Gerke et al. (2015); Julleh et al. (2011); Tahmasebi (2018c); Tahmasebi et al. (2015a); Arns et al. (2018)), or a poor training database (Arns et al. (2018)).

In this chapter, we address the problem of lack of an adequate amount of data by proposing a hybrid method. In the proposed approach a stochastic reconstruction method is used to generate numerous images of plausible realizations of a porous medium as the input data, based on a few initial images. It then uses various filters in order to generate further variations in the stochastically generated images. The diverse dataset is then used to train the deep-learning network by linking the high- and low-resolution segments of the images.

Practically speaking, preparing a single high-resolution image of a shale sample takes a considerable amount of time, at least a few days, which includes cutting, cleaning, and drying the sample, assembling the core and core holder, and imaging and processing the raw data. Aside from being very costly, the process is also very time consuming. Furthermore, the imaging tools typically have limitations in terms of capturing the fine-scale information that is vital for evaluating the physical properties. In this chapter we combine a physics-based approach for reconstructing models of porous media with deep learning to develop an application of the latter in rock physics. The discussion of the structure of convolutional neural networks can be found in Chapter 1.

The rest of this chapter is organized as follows. In the next section we briefly discuss the background on deep learning and stochastic modeling. In Section 2.3 the hybrid method that consists of the stochastic modeling of porous media and deep-learning methods is described and utilized to generate various images. Next, the high- and low-resolution images are compared based on several statistics and morphological characteristics of the porous media. The chapter is summarized in the last section.

2.2 Methodology

Enhancing low-resolution and noisy images has been addressed in the literature (Glasner et al. (2009); Kim et al. (2015); Park et al. (2003)).
The goal has been to increase the image quality by minimizing the mean-square error (MSE) between the generated high-resolution image and the original low-quality version (Ledig et al. (2016)). Given the available data, the problem can be addressed by using either a single image or several of them in order to produce enhanced images. When only a single image is used, the result is usually not very accurate, because it does not contain the high-frequency content; this is clearly due to the fact that a low-quality image does not contain the information necessary for enriching the image further. In addition, such methods produce images whose quality is limited to small improvements relative to the original low-resolution image. Moreover, the problem as described is ill-posed, in the sense that a single low-resolution image can generate many high-resolution ones. The problem is more complex when the high-quality image must be generated based on a low-resolution image that has little to no detail of the high-frequency features.

An important related problem is the upscaling of models of porous media, in which a high-resolution model, involving several millions of blocks in the computational grid that represents the porous medium, is coarsened to produce another model that requires less intensive computations, but is just as accurate as the original high-resolution model (Ebrahimi and Sahimi (2004); Mehrabi and Sahimi (1997); Reza Rasaei and Sahimi (2008)). Clearly, a high-resolution image must be represented by a fine-resolution computational grid, which requires, however, a considerable amount of computational time. An accurate upscaling method (see, for example, Rasaei and Sahimi (2009b,a)) preserves the important information in the high-resolution computational grid and coarsens the rest. On the other hand, at larger length scales the heterogeneity of the pore space is much more severe, but information about it is missing in the low-resolution image.

2.2.1 Deep learning

Although machine-learning methods have had many successful applications to various problems, their application to the modeling of porous media has not been as fruitful, particularly in cases in which the data are represented by images, due to the limited ability of the methods to process natural data. For example, pixel values in an image must first be transformed into a feature vector that can be detected by the method. Then, a new class of machine-learning algorithms, the deep-learning methods, was developed in early 2006 that are trained by multiple levels of representation in order to model complex correlations among the various input data. Raw data can then be used as the input to accomplish such important goals as detection or classification of certain features in the datasets. Deep feedforward networks, which are also known as the CNNs, represent typical deep-learning methods (Dahl et al. (2013); Deng and Dong (2014); Kim (2016); LeCun, Yann, Yoshua Bengio et al. (2015); Schmidhuber (2015); Wu et al. (2014)).

In the context of improving the quality of images, deep-learning algorithms learn, through the training process, the mapping between low- and high-resolution images that differ only in the high-frequency details. Indeed, the performance of every method in this field depends on how the network is trained, which is strongly controlled by the size and diversity of the dataset provided for the training (Dong et al. (2016)).
Therefore, one prominent issue in using such a method is its requirement of many high-resolution images for training the network (Dong et al. (2016); Johnson et al. (2016); Kim et al. (2015); Mao et al. (2016)). On the other hand, having large sets of high-resolution images of complex porous media is not feasible, because obtaining them is time consuming and costly. To alleviate the issue, we propose a method that uses a limited number of high-resolution images, as few as one image, to generate a large and diverse dataset for the training phase of the deep learning. Previous studies that utilized CT images with deep-learning methods had focused on either optimizing the parameters used in the network, or evaluating the quality of the output image (Hagita et al. (2018); Wang et al. (2018b)). To our knowledge, however, deep-learning algorithms using a limited number of actual images have not been proposed or implemented.

2.2.2 Stochastic modeling

The main purpose of using stochastic modeling is to increase the number of training images and diversify the patterns. To this end, the cross correlation-based simulation (CCSIM) algorithm is used, which has been shown to successfully model various complex 2D and 3D porous media based on their images (Tahmasebi and Sahimi (2012, 2013, 2016a,b)). The CCSIM represents the digital image DI, i.e., the porous medium to be modeled, by a computational grid G, partitioned into overlapping blocks of size $T_x \times T_y$, where G and DI have the same size. The neighboring blocks share overlap regions OL of size $\ell_x \times \ell_y$; see Figure 2.1. One then begins from a corner block of G (or any other block) and visits each grid block along a one-dimensional raster path. For each grid block a pattern of heterogeneity from the DI is selected randomly and inserted in the visited block. We refer to the inserted pattern as the data event $D_T$, with the word "event" implying that the inserted pattern of heterogeneity in the block may change again (see below). Then, the next pattern is selected based on the similarity between the neighboring points and the DI, meaning that, instead of considering all the previously constructed blocks (filled with patterns of heterogeneity from the DI), only those in the neighborhood of the current block are used in the calculations. Next, the similarity between, or closeness of, the neighboring blocks and the DI is quantified based on a cross-correlation function that represents a convolution between the DI and $D_T(x, y)$:
\[
\chi(i, j; x, y) = \sum_{x=0}^{\ell_x - 1} \sum_{y=0}^{\ell_y - 1} DI(x + i,\, y + j)\, D_T(x, y) \qquad (2.1)
\]
with $i \in [0,\, T_x + \ell_x - 1)$ and $j \in [0,\, T_y + \ell_y - 1)$. Thus, an overlap region of size $\ell_x \times \ell_y$ between two neighboring blocks and a data event $D_T$ are used to match the patterns in the DI. The overlap region contains a set of pixels (or voxels) that are picked from the previously constructed blocks and utilized in Eq. (2.1) for identifying the next pattern of heterogeneity. For the Euclidean distance (difference) between the constructed block and the data to be minimal, $\chi(i, j; x, y)$ must be maximal or, in practice, exceed a preset threshold. After calculating $\chi(i, j; x, y)$ and selecting those patterns for which $\chi(i, j; x, y)$ exceeds the preset threshold, one of the acceptable patterns is selected at random and inserted in the block currently being visited in G. The process is repeated until all the blocks of the grid G have been reconstructed. As a rule of thumb, the neighboring regions might have an overlap of about 1/5 to 1/6 of the size of the blocks.
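A minimal sketch of the similarity measure of Eq. (2.1) follows: it evaluates the raw (unnormalized) cross-correlation of a data event with every same-size patch of the DI at once, computed in the frequency domain (the acceleration discussed below). The array sizes and the random test image are illustrative.

```python
import numpy as np
from numpy.fft import fft2, ifft2

def cross_correlation_map(DI, DT):
    """Evaluate Eq. (2.1) for all shifts (i, j) simultaneously: correlation
    equals convolution with the flipped kernel, so one FFT pair suffices."""
    H, W = DI.shape
    h, w = DT.shape
    kernel = np.zeros_like(DI, dtype=float)
    kernel[:h, :w] = DT[::-1, ::-1]          # flip the data event
    chi = np.real(ifft2(fft2(DI) * fft2(kernel)))
    return chi[h - 1:H, w - 1:W]             # keep only the valid shifts

rng = np.random.default_rng(0)
DI = rng.random((128, 128))                  # stand-in for the digital image
DT = DI[40:56, 60:76]                        # a 16x16 data event cut from DI
chi = cross_correlation_map(DI, DT)
print(np.unravel_index(chi.argmax(), chi.shape))  # best-scoring shift, here (40, 60)
```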
Large grid blocks increase the computations, as computing $\chi(i, j; x, y)$ would require considering many points, whereas small regions may cause discontinuity in the patterns. To demonstrate the accuracy of the CCSIM method, we generated three realizations of an image. The results are shown in Figure 2.2, indicating that the proposed method can utilize the given 2D image and produce realizations that are not only different from each other, but also from the input image, while their structures are statistically similar. Thus, using this strategy, one can produce a large number of realizations for use as the training input.

Figure 2.1: Schematic illustration of the CCSIM algorithm.

Figure 2.2 also indicates an advantage of using the CCSIM algorithm, namely, the ease of extending it to 3D images. The computations associated with the CCSIM method may also be carried out in the frequency domain, which results in significant acceleration of the calculations (Tahmasebi et al. (2014)). Thus, the algorithm may be used straightforwardly for generating thousands of images in a reasonable time.

Figure 2.2: Illustration of the performance of the proposed stochastic method for generating realizations with different structures. The panel in (a) represents the input image, while the three panels in (b) show three realizations.

2.3 Hybrid stochastic deep-learning algorithm

A schematic representation of the proposed hybrid stochastic deep-learning (HSDL) algorithm is presented in Figure 2.3. As already pointed out, large datasets are essential to the accuracy of deep learning (Angermueller et al. (2016)), but it is not always possible to have such sets. Thus, to remedy this, we used a limited number of images, namely 30 2D images of size 500 × 500, with the stochastic CCSIM algorithm in order to generate 2,000 plausible realizations of the same, which were used as the training images in the deep-learning algorithm. Five hundred of the output realizations had a size of 1000 × 1000, with the rest having the same size as the original images. Although the success of any learning algorithm depends heavily on the number and diversity of the images used for its training, using similar images may in fact cause overfitting. To diversify, the 2,000 training images were transformed using various filters, such as Gaussian noise, scaling, cropping, rotation, and flipping, in order to increase the diversity of the training datasets; a sketch of such an augmentation pipeline is given below.

The details are as follows. The images were randomly divided into 20 sets. Gaussian noise with three different variances was applied to the images of three sets. Two sets with images of size 500 × 500 were enlarged and then cropped back to the original size of the DI, 500 × 500. Similarly, images from two other sets that contained the larger stochastic images of size 1000 × 1000, generated by the CCSIM method, were rescaled to smaller sizes and cropped to the same size as the rest of the DI. We used three scale factors for enlarging and shrinking the training images, because we assumed that we have no information about the scale discrepancy between the input and training images. As such, it is reasonable to include a wide range of scales in the training data. Furthermore, such diverse scaling helps the network to better learn and capture the multiscale structure of the images. Flipping was used with the images of three sets: they were flipped left-right, up-down, and left-right plus up-down. Then, cropping was applied to the resulting images of the three sets.
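A sketch of the augmentation filters just described follows, producing noisy, flipped, rotated, and rescaled-and-cropped variants of one realization. The noise standard deviations, the 90-degree rotations, and the nearest-neighbor upscaling are illustrative simplifications (arbitrary-angle rotation, as in Figure 2.4, would need an image-processing library), and the image is assumed to be a grayscale float array.

```python
import numpy as np

def augment(img, rng):
    """Diversified variants of one stochastic realization."""
    out = [img + rng.normal(0.0, s, img.shape) for s in (0.01, 0.05, 0.1)]  # noise
    out += [img[:, ::-1], img[::-1, :], img[::-1, ::-1]]   # left-right, up-down, both
    out += [np.rot90(img, k) for k in (1, 2, 3)]           # rotations
    big = np.kron(img, np.ones((2, 2)))                    # crude 2x upscale
    i = rng.integers(0, big.shape[0] - img.shape[0])
    j = rng.integers(0, big.shape[1] - img.shape[1])
    out.append(big[i:i + img.shape[0], j:j + img.shape[1]])  # enlarge, then crop back
    return out

rng = np.random.default_rng(0)
variants = augment(rng.random((500, 500)), rng)
print(len(variants))   # 10 augmented images from one realization
```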
As mentioned earlier, a portion of the stochastic training images was generated with larger dimensions, in order to retain the same quality when some of the filters, such as rotation or resizing to smaller sizes, were applied to them. Therefore, rotation and resizing were applied to such images as well. Figure 2.4 presents the input images and examples of the results after applying the filters. As can be seen, the filters changed the initial input. Such variations in the input training images help a deep-learning algorithm to perform better in predicting the features of a high-resolution image, because the low-resolution input image misses many features when it is upscaled to the target size. Many such patterns might, however, be repeated in the training phase and, thus, a trained network can retrieve the missing features using the high-quality images.

Figure 2.3: Schematic illustration of the HSDL algorithm.

The deep-learning algorithm uses the high-quality training data and extracts their differences with the upscaled images of the same set, using a residual-learning strategy. In fact, the training images all have high resolution, whereas the image at hand that requires enhancement is a low-resolution one, i.e., one with a smaller size. Therefore, the target image is enlarged, i.e., upscaled to the size of the training images, by using an interpolation method with bicubic functions. Then, the network learns iteratively how to estimate the residuals. After the training is complete, i.e., after the network has learned how to estimate the residuals, the high-quality image is reconstructed by adding the original enlarged image to the estimated residuals. Therefore, the upscaled images are used as the input, while the estimated residuals represent the output.

The deep-learning neural network that we used had 22 individual layers, whose architecture is shown in Figure 2.3. It represents a convolutional deep-learning network that learns the mapping of the high-resolution training images onto the low-resolution input image. The utilized images are similar in their content, but differ in their details. To reduce the computational time, 64 random patches of size 41 × 41 were selected from each training image and used in the neural network, rather than working with the full-size images. One may argue that such patches can jeopardize capturing the large-scale structures in the images. Using the aforementioned filters, however, and in particular the scaling operators, the large-scale structures were still accounted for. The constructed small patches were then fed to the neural network over several iterations. If the computational power can be afforded, one can, of course, use the entire images in the computations.

The low-resolution image represents the first layer. The next layer is the convolution layer, which in this study contained 64 filters of size 3 × 3, hence assigning one filter to each patch. The performance of deep-learning networks improves with an increasing number of filters, but the training time increases as well. Choosing a smaller filter size is usually preferred, in the sense that it reduces the computational time, but it may also result in missing the large-scale structures of the image. On the other hand, larger filter sizes complicate the training process (Wang et al. (2018b)).

Figure 2.4: Flipping: (a) the original image flipped (b) left-right; (c) up-down, and (d) up-down and left-right. Resizing: (e) the original image resized to (f), (g), and (h) larger scales. Noise: (i) the original image filtered by (j), (k), and (l) various Gaussian noises. Rotation: (m) the original image rotated by (n), (o), and (p) 45, 90, and 225 degrees. Cropping: (q) the original image cropped at (r), (s), and (t) various locations.
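A sketch of the residual-learning network just described follows: a stack of 3 × 3 convolution + ReLU layers with zero padding that predicts the residual, which is added back to the bicubically upscaled input. The depth and width follow the text, but the exact layer composition (e.g., the final 3 × 3 × 50 filter and the dropout layers mentioned later) is not reproduced.

```python
import torch
import torch.nn as nn

class ResidualSRNet(nn.Module):
    """Sketch of a residual super-resolution network: input + predicted residual."""
    def __init__(self, depth=20, width=64):
        super().__init__()
        layers = [nn.Conv2d(1, width, 3, padding=1), nn.ReLU()]
        for _ in range(depth - 2):
            layers += [nn.Conv2d(width, width, 3, padding=1), nn.ReLU()]
        layers.append(nn.Conv2d(width, 1, 3, padding=1))   # estimated residual
        self.body = nn.Sequential(*layers)

    def forward(self, upscaled):
        return upscaled + self.body(upscaled)   # reconstruct the enhanced image

net = ResidualSRNet()
patch = torch.randn(64, 1, 41, 41)   # a minibatch of 41x41 patches, as in the text
print(net(patch).shape)              # torch.Size([64, 1, 41, 41])
```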
The deep-learning neural network that we used had 22 individual layers, whose architecture is shown in Figure 2.3. It represents a convolutional deep-learning network that learns the mapping of high-resolution training images onto the low-resolution input image. The utilized images are similar in their content, but differ in their details. To reduce the computational time, 64 random patches of size 41 × 41 were selected from each training image and used in the neural network, rather than working with the full-size images. One may argue that such patches can jeopardize capturing the large-scale structures in the images. Using the aforementioned filters, however, and in particular the scaling operators, the large-scale structures were still accounted for. Then, the constructed small patches were fed to the neural network over several iterations. If the computational power can be afforded, one can, of course, use the entire images in the computations.

The low-resolution image represents the first layer. The next layer is the convolution layer, which in this study contained 64 filters of size 3 × 3, hence assigning one filter to each patch. The performance of deep-learning networks improves with increasing number of filters, but the training time increases as well. Choosing a smaller filter size is usually preferred in the sense that it reduces the computational time, but it may also result in missing the large-scale structures of the image. On the other hand, larger filter sizes complicate the training process (Wang et al. (2018b)). Thus, the optimal filter size should be decided a priori, or selected by trial and error. Zero padding was also used in our study in order to generate feature maps with the same size as the input layer. Every convolution layer, except the last one, had 64 filters of size 3 × 3. The remaining 19 layers contained similar convolutional rectified linear unit (ReLU) elements. The initial weights were assigned randomly, but were optimized during the learning. For the hidden layers we used the ReLU elements, which represent the simplest form of nonlinear activation function, instead of the usual sigmoid. The layer applied the following equation - the rectifier in its smooth form - to the input values without changing the depth information:

f(x) = \log[1 + \exp(x)],   (2.2)

which is called the Softplus or SmoothReLU function. In effect, deep learning takes the extracted features at each pixel x and yields f(x). The ability of the ReLU to speed up the computations with large training networks, making them much faster than with the common activation functions, has already been demonstrated (Nair and Hinton (2010)).

The last layer of the network has a single filter of size 3 × 3 × 50, followed by a regression layer that calculates the MSE \sigma^2 between the estimated residual image and the actual one available in the training data:

\sigma^2 = \frac{\sum_{i,j,k} \left[ y_{\mathrm{Resid}}(i,j,k) - y_{\mathrm{HSDL}}(i,j,k) \right]^2}{\sum_{i,j,k} y(i,j,k)}.   (2.3)

\sigma^2 represents the loss function of the NN. Here, y(i,j,k) represents the value of each pixel (voxel) at (i,j,k), y_{\mathrm{Resid}} is the residual image, and y_{\mathrm{HSDL}} is the image predicted by the proposed HSDL network.

The deep-learning part of the proposed HSDL method, namely, the CNN, was trained with stochastic gradient descent with momentum optimization (see Ruder (2016) for a review), where the gradient refers to the errors. Since we deal with complex images and very large data sets, the gradient descent method does not by itself solve the problem of accurately estimating the optimal values of the weights w. Thus, the algorithm was used with momentum after each iteration, with the momentum defined as the moving average of the gradients: it remembers the update \Delta w, the difference between the weights in two consecutive iterations, and uses it in the next update. The momentum is used to prevent the optimization computation from converging to a local minimum or a saddle point. A high momentum parameter can accelerate the speed of convergence to the true minimum of \sigma^2. We should, however, keep in mind that setting the momentum parameter too high may give rise to overshooting the true global minimum, hence making the computations unstable. On the other hand, a momentum coefficient that is too low cannot reliably help the minimization avoid local minima, hence slowing down the training. We also used a moving average because the data are represented by images. The stochastic gradient descent method performs each update using only a few training examples, which yields more stable convergence and allows highly optimized matrix operations in the cost and gradient calculations.
The momentum update is given by:

v_{t+1} = \gamma v_t + \eta \, \nabla Q_t(w_t),   (2.4)

w_{t+1} = w_t - v_{t+1},   (2.5)

where Q_t(w) is the loss function evaluated at the observation used at iteration t, \eta is the learning rate or step size, \gamma \in [0, 1] is the momentum hyperparameter, and v_t is the corrected bias. Hyperparameters are variables that determine the NN's structure, such as its number of hidden units, and the variables that determine how the NN is trained; therefore, they control how many iterations are used with the current updates. The learning rate is another hyperparameter, one that controls how much the weights of the network are adjusted with respect to the loss gradient. The lower its value, the slower the iteration travels along the downward slope toward the true minimized state. While using low initial values of the learning rate may seem appealing in terms of making sure that no local minimum is taken as the true one, the optimization computation will also take a long time to converge, especially if it is trapped on a plateau region.
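In code, one momentum step of Eqs. (2.4)-(2.5) takes only a few lines; the sketch below uses the symbols of the text (`lr` for the learning rate \eta, `gamma` for the momentum coefficient \gamma) and assumes the gradient has already been computed:

```python
def sgd_momentum_step(w, v, grad, lr=0.1, gamma=0.9):
    """One stochastic-gradient-descent-with-momentum update: the
    velocity v is a moving average of past gradients (Eq. 2.4), and
    the weights move against it (Eq. 2.5)."""
    v = gamma * v + lr * grad    # Eq. (2.4)
    w = w - v                    # Eq. (2.5)
    return w, v
```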
Table 2.1 summarizes the values of all the parameters used in the HSDL computations, most of which were estimated by starting with an initial guess for each parameter and iterating and refining the estimates by straightforward computations. We used an initial learning rate of 0.1 and a gradient threshold of 0.01. The learning rate was decreased to 10^{-10} over 100 epochs, using the indicated learning rate factor. The L_2 norm of the gradients was used as the gradient threshold method. The training was carried out on a GTX-1030 graphics card (NVIDIA) and took about 41 GPU hours and 193,000 iterations.

The loss functions for the training and validation phases are compared in Figure 2.5. They both decrease drastically, reaching their final values after around 20 epochs. Note, however, that due to the dropout layers in the network, the noise in the loss function for both training and validation is omitted. Figure 2.5 also indicates that the number of epochs used, 100, is sufficient for training the network.

Table 2.1: Summary of the parameters used in the HSDL algorithm.

    Parameter                   Value
    Number of training images   2000
    Minibatch size              64
    Initial learning rate       0.1
    Learning rate factor        0.1
    Maximum epoch               100
    Gradient threshold          0.01
    Momentum coefficient        0.9

Figure 2.5: The loss function (MSE) for the training (upper curve) and validation (lower curve) phases

2.4 Results and discussion

The HSDL algorithm was used to analyze and model a complex shale formation with irregular pores. To check the accuracy of the results, they are compared visually and computationally based on rigorous statistical tests, which measure the similarity between the enhanced images and the original one. To better demonstrate the capability of the proposed method, the results are compared with one of the common image-resizing methods, namely, the bicubic interpolation method, as well as with a regular deep-learning algorithm. Next, the accuracy of the generated images is compared in terms of the connectivity and correlation functions of the images.

2.4.1 Comparison of the images' features

Figure 2.6 presents a visual comparison of the results, indicating the excellent accuracy of the proposed HSDL algorithm. The initial low-resolution image represents a very smooth and opaque view of the pores in the shale sample, whereas the enhanced image generated by the HSDL manifests features as seen in the reference image. Similarly, the image generated by the bicubic interpolation method is a smooth reconstructed image. As expected, the regular deep-learning algorithm does not perform well when only a limited number of images is used. Note, however, that in modeling of porous media one usually has only a few high-resolution images, which are not sufficient for taking advantage of the recent artificial intelligence methods. A better comparison between the results generated by the various methods is provided by the zoomed-in portions of the images; these are also shown in Figure 2.6. Note that no similar image, in addition to the low-resolution input, was used in the training of the HSDL network.

We also compared the results quantitatively using various statistical measures. The first measure we used was the peak signal-to-noise ratio (PSNR) R, which was computed for the images generated by both the bicubic interpolation and the HSDL algorithm, and compared with that of the reference high-resolution image. R is defined by

R = 10 \log_{10}\!\left( \frac{M_I^2}{\sigma^2} \right),   (2.6)

where M_I denotes the maximum possible pixel value in an image I and \sigma^2 is the mean-squared error. The second measure is the structural similarity index I_s (sometimes referred to as the SSIM) that evaluates the visual impact of the characteristics of an image, such as the luminance l, the contrast c, and the structure s, against the high-resolution image. It is computed by (Wang et al., 2004)

I_s(I_1, I_2) = l(I_1, I_2)^\alpha \, c(I_1, I_2)^\beta \, s(I_1, I_2)^\gamma,   (2.7)

Figure 2.6: Comparison of (a) the reference image and (b) the low-resolution input with the image obtained by (c) regular deep learning; (d) the bicubic interpolation, and (e) the proposed HSDL algorithm. For a better comparison, a zoomed-in portion is also shown.

where I_1 represents the enhanced image while I_2 is the high-resolution image. Consider two windows x and y that may be two complete images. Then, the three aforementioned quantities are given by

l(I_1, I_2) = \frac{2\mu_x \mu_y + C_1}{\mu_x^2 + \mu_y^2 + C_1},   (2.8)

c(I_1, I_2) = \frac{2\sigma_x \sigma_y + C_2}{\sigma_x^2 + \sigma_y^2 + C_2},   (2.9)

s(I_1, I_2) = \frac{\sigma_{xy} + C_3}{\sigma_x \sigma_y + C_3}.   (2.10)

Here, \mu and \sigma represent, respectively, the mean and standard deviation of the pixel values in the indicated windows, \sigma_{xy} is the covariance of x and y, C_1 = (k_1 L)^2 and C_2 = (k_2 L)^2 are two variables that stabilize the above ratios when the denominators are weak, and C_3 = C_2/2. L is the dynamic range of the pixel values, which is 1, and typically k_1 = 0.01 and k_2 = 0.03. In the limit \alpha = \beta = \gamma = 1 one obtains

I_s(I_1, I_2) = \frac{(2\mu_x \mu_y + C_1)(2\sigma_{xy} + C_2)}{(\mu_x^2 + \mu_y^2 + C_1)(\sigma_x^2 + \sigma_y^2 + C_2)}.   (2.11)

Another quantitative comparative measure is the natural image quality evaluator (NIQE) that estimates the perceptual image quality, measuring the distance between the natural-scene statistics of the input image and the features of an image in the data set used to train the HSDL network (Mittal et al. (2013)). Multidimensional Gaussian distributions were used to model the features in our study.

The computed results for the numerical measures are listed in Table 2.2. Larger values of the PSNR R and SSIM I_s indicate better image quality. But, while higher values of R represent closer similarity between the enhanced image and the reference one, they are based on pixel errors and do not consider how one may interpret the quality of an image by making a visual comparison. To overcome this shortcoming, I_s should be considered as well, in order to include the contrast, brightness, and structure of the image. Lower values of the NIQE represent better perceptual quality, and are interpreted as implying a smaller distance - closer similarity - between the natural scene of the generated image and that of the reference image.
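Both pixel-based measures are straightforward to compute. The sketch below gives the PSNR of Eq. (2.6) and the single-window form of the SSIM, Eq. (2.11); library implementations typically average the SSIM over local windows, so this global version is only illustrative:

```python
import numpy as np

def psnr(ref, img, max_val=1.0):
    """Peak signal-to-noise ratio, Eq. (2.6); sigma2 is the MSE."""
    sigma2 = np.mean((ref - img) ** 2)
    return 10.0 * np.log10(max_val ** 2 / sigma2)

def ssim_global(x, y, L=1.0, k1=0.01, k2=0.03):
    """Structural similarity index of Eq. (2.11), with the two
    windows taken as the complete images."""
    C1, C2 = (k1 * L) ** 2, (k2 * L) ** 2
    mx, my = x.mean(), y.mean()
    vx, vy = x.var(), y.var()
    cov = np.mean((x - mx) * (y - my))
    return ((2 * mx * my + C1) * (2 * cov + C2)) / \
           ((mx ** 2 + my ** 2 + C1) * (vx + vy + C2))
```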
Table 2.2 indicates that the highest values of the PSNR and SSIM, along with the lowest value of the NIQE, are produced by the images generated by the HSDL algorithm. Moreover, the quantitative comparison in Table 2.2 indicates that the regular deep-learning algorithm, without the enriched training images, produces less accurate results than those produced by the HSDL algorithm.

Table 2.2: Computed PSNR, SSIM and NIQE for the bicubic interpolation, regular deep learning, and HSDL algorithms

    Measure   Bicubic interpolation   Regular deep learning   HSDL
    PSNR      25.775                  25.546                  26.0586
    SSIM      0.7017                  0.7031                  0.7094
    NIQE      6.133                   5.821                   5.454

We also evaluated the accuracy of the HSDL algorithm by comparing the frequency distributions of the pixels. Figure 2.7 presents the comparison between the results for the low- and high-resolution images, as well as those generated by the bicubic interpolation and the HSDL algorithm. The results indicate that the frequency distribution of the image produced by the HSDL algorithm is the closest to that of the reference image.

Figure 2.7: Frequency distributions (histograms) of the pixel values for the low-resolution (LR) input, high-resolution (HR) reference, bicubic interpolation, and HSDL images

2.4.2 Comparison of the images' morphology

As with any type of porous media and materials, accurate models of shale formations are crucial to the study of flow of water, oil, and gas in them. Developing such models entails the ability to correctly represent the morphology of the pore space, including its porosity - the volume fraction of the pores - the long-range connectivity of the pores that provide the paths for fluid flow through the formation, and the correlations between them. Table 2.3 summarizes the results for the porosity, which confirm that the estimate for the porosity of the image generated by the HSDL algorithm is very close to that of the original high-resolution reference image.

Table 2.3: Comparison between estimates of the effective porosity of the various images.

    Image type              Porosity (%)
    Reference image         3.78
    Input image             2.33
    Bicubic interpolation   2.44
    HSDL                    3.67

Next, we characterized the spatial long-range connectivity of the images using the multiple-point connectivity (MPC) function (Krishnan and Journel (2003)). The MPC function is the probability p(r, s) of having a sequence of s connected points in the pore space in a given direction r. If an indicator function I(u) is defined for the pores of a porous formation by

I(\mathbf{u}) = \begin{cases} 1, & \mathbf{u} \in \text{pore space} \\ 0, & \text{otherwise} \end{cases}   (2.12)

then p(r, s) is defined by

p(\mathbf{r}, s) = \mathrm{Prob}\{ I(\mathbf{u}) = 1,\ I(\mathbf{u} + \mathbf{r}) = 1,\ \ldots,\ I(\mathbf{u} + s\mathbf{r}) = 1 \}.   (2.13)

Thus, we computed p(r, s) for the reference high-resolution image, for those obtained by the HSDL and bicubic methods, and for the original low-resolution image. The results are shown in Figure 2.8. Clearly, the results of the original high-resolution image and the enhanced one produced by the HSDL algorithm are practically identical.

The third morphological property that we used to test the accuracy of the method is the autocorrelation function g(r), defined by

g(r) = \frac{\langle [I(\mathbf{u}) - \phi]\,[I(\mathbf{u} + \mathbf{r}) - \phi] \rangle}{\phi - \phi^2},   (2.14)

where r = |\mathbf{r}|, \phi is the porosity of the porous formation, and \langle \cdot \rangle denotes a spatial average over the locations \mathbf{u}; therefore, \phi = \langle I(\mathbf{u}) \rangle.
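Both descriptors are easy to estimate from a segmented (binary) image. The sketch below computes p(r, s) and g(r) along the horizontal direction only, assuming the pore phase is coded as 1; the normalization of g(r) follows Eq. (2.14):

```python
import numpy as np

def mpc(img, s_max, step=1):
    """Multiple-point connectivity p(r, s) of Eq. (2.13) along the
    horizontal direction: the probability that s + 1 consecutive
    points, spaced `step` pixels apart, all lie in the pore phase."""
    I, W = img > 0, img.shape[1]
    probs = []
    for s in range(1, s_max + 1):
        seg = np.ones_like(I[:, : W - s * step])
        for m in range(s + 1):                    # every point of the sequence
            seg &= I[:, m * step : W - (s - m) * step]
        probs.append(seg.mean())
    return np.array(probs)

def autocorr(img, r_max):
    """Autocorrelation g(r) of Eq. (2.14) along the horizontal axis."""
    I = (img > 0).astype(float)
    phi = I.mean()
    return np.array([np.mean((I[:, :-r] - phi) * (I[:, r:] - phi))
                     / (phi - phi ** 2) for r in range(1, r_max + 1)])
```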
Figure 2.9 compares the computed autocorrelation functions for the four images. Once again, the agreement between the autocorrelation functions of the original high-resolution image and the one generated by the HSDL algorithm is excellent.

Figure 2.8: Comparison of the computed multiple-point connectivity function p(r) in the horizontal, vertical, and diagonal directions. Solid black, solid gray, dashed gray, and dotted gray represent, respectively, the results for the reference high-resolution, HSDL, bicubic interpolation, and original low-resolution images. The results for the original high-resolution image and the enhanced image produced by the HSDL algorithm are practically identical and difficult to separate.

Figure 2.9: The computed autocorrelation functions g(r) in the horizontal, vertical, and diagonal directions. The notation is the same as in Fig. 2.8

2.5 Summary

Producing high-quality images of porous media, and in particular shale formations, by conventional experimental methods is a difficult task that requires considerable investment of time and resources. Most of the currently available images are of low quality and require significant enhancement if they are to be used in practice for modeling of shale formations. One may train a deep-learning network with the available data or images and then utilize it to enhance the resolution of the available low-resolution images. The training faces a major difficulty, however, as it requires a large number of images that are usually unavailable.

To address the problem of enhancing the resolution and accuracy of images of porous media with a limited number of images, we proposed a novel hybrid method in this chapter. First, a reconstruction method is used to generate a large number of plausible realizations of a shale formation based on a limited number of high-resolution images, which is accomplished at very low computational cost. They are then used to train a deep-learning network. The results were compared with the enhanced images generated by the bicubic interpolation and the reference image, and with those produced by the network without the enriched data set. The comparison confirmed the superior quality of the HSDL-generated images, when compared with those produced by bicubic interpolation and regular deep learning. Therefore, the images produced by the HSDL algorithm will enable one to better estimate the physical properties of complex porous media, including shale formations, and to simulate and forecast flow of multiphase fluids in them.

Chapter 4 [sic: Chapter 3]

Linking morphology of porous media to their macroscopic permeability by deep learning

1 This chapter was published as a paper in Transport in Porous Media 131, 427-448 (2020).

3.1 Introduction

Efficient and cost-effective utilization of natural resources that are stored in geological formations depends on deep understanding of the formations' morphology and its relations with their flow, transport, and mechanical properties (Blunt (2017a); Sahimi (2011a)). The same is true of any other type of porous media, ranging from membranes to catalysts, adsorbents, and biological tissues.
Thus, characterization of the morphology of porous media and materials, and the link between their structure and various macroscopic properties, have always been subjects of great interest and have been studied for decades (Sahimi (2003a)). In particular, the link between the effective permeability K_e and the morphology of porous media is a much-studied problem, for which various theoretical and computational approaches have been developed.

Recent advances in developing images of complex materials by nondestructive methods have provided highly useful tools for achieving better understanding of the morphology of porous media. Examples of such tools include focused ion beam scanning electron microscopy (FIB-SEM) and X-ray imaging that provide high-resolution images of porous media (Baruchel et al. (2008); Brandon and Kaplan (2013); Kinney and Nichols (1992); Kak et al. (2002); Tahmasebi et al. (2017)). Such techniques can, in turn, help establish a link between the permeability and the morphology.

In general, the effective permeability of a porous medium is either measured or, provided that a model of the medium is available or can be constructed, computed. Although they are routine, the experimental procedures are time-consuming and costly. Therefore, there have been many attempts to estimate K_e by such analytical techniques as the critical path method (Arns et al. (2005); Daigle (2016); Ghanbarian et al. (2016); Katz and Thompson (1986, 1987); Skaggs (2011); Thompson (1991)), a model that connects a characteristic pore size to the effective permeability (Banavar and Johnson (1987); Johnson et al. (1986); Revil and Cathles (1999)), the effective-medium approximation (David et al. (1990); Doyen (1988); Ghanbarian and Javadpour (2017); Koplik et al. (1984); Mukhopadhyay and Sahimi (2000); Richesson and Sahimi (2019)), and various fractal models (Xu and Yu (2008)). Detailed models of porous media (see, for example, Arns et al. (2001); Mostaghimi et al. (2013); Okabe and Blunt (2004); Tahmasebi and Kamrava (2018)), together with the lattice Boltzmann method and pore network models (Blunt (2017a); Sahimi (2011a)), including multiscale pore networks (Jiang et al. (2013); Prodanović et al. (2015); Tahmasebi and Kamrava (2018)), have also been used to compute the effective permeability of various types of porous media.

Recent advances in machine learning methods offer a distinct approach to modeling of many phenomena involving porous media (see chapter 2, as well as Karimpouli and Tahmasebi (2019); Karimpouli et al. (2019); Mosser et al. (2017); van der Linden et al. (2016)). In particular, deep learning (DL), an advanced machine learning technique, is based on training several layers of a convolutional neural network (CNN) on the properties of a complex system in order to develop relationships among various types of data. The approach has proven to be far more effective than the traditional machine learning algorithms (Deng and Dong (2014); Bengio et al. (2016); LeCun et al. (2015)). The reason for their success is that the DL methods are capable of developing nonlinear mappings between various characteristics of a given system in order to predict its important properties.

It is, however, critical to have a large amount of data to be able to train the DL networks. If extensive data are not available, one may take advantage of various available stochastic reconstruction methods
(Adler et al. (1990); Chen et al. (2016); Hamzehpour et al. (2007); Jiao et al. (2009, 2013); Mourzenko et al. (2011); Tahmasebi and Sahimi (2012, 2013, 2016a,b); Thovert et al. (2001); Yeong and Torquato (1998b,a); Zachary and Torquato (2011)) for generating plausible realizations of porous media that can be used for training the DL.

In this chapter, we propose a novel approach based on the DL that combines Boolean methods of generating models of porous media, stochastic image-based techniques of reconstructing porous media, and raw data in order to develop a link between the morphology of porous media and their effective permeability. More specifically, the DL utilizes all the available data on the morphology and permeability of some porous media and predicts the effective permeability of other porous materials, based solely on their morphology. Thus, in effect, the DL establishes a link between the morphology and the effective permeability.

The rest of chapter 3 is organized as follows. In the next section, we briefly describe the DL and the workflow that we utilize for generating the training data. Next, the DL method for estimating the permeability based on the morphology of porous media is explained, after which the procedure for training the DL algorithm using synthetic and actual porous media along with their corresponding permeabilities is described. The method is then tested by estimating the effective permeability of other porous media that were not used in the training. The chapter is summarized in the last section.

3.2 Background on deep learning

The DL networks are classified into supervised learning (SL), unsupervised learning (UL), and hybrid or semi-supervised learning (SSL). In the SL, the inputs are provided with their labels, and the goal is to identify labels for an unseen sequence of inputs. In the UL networks, reusable feature representations are learned from unlabeled input data. In the SSL, the network is trained with both labeled and unlabeled datasets in order to modify the results from the labeled inputs. The SL and SSL have the same type of DL network structure. In this chapter we have used a SL method.

Further details on the structure of artificial neural networks, as well as convolutional neural networks, can be found in chapter 1. Therefore, we have only explained the details specific to the study in this chapter. In the context of the study in this chapter, the input data - 3D images of porous media - are accompanied by their label, i.e., the permeability. Thus, the DL must identify an ensemble of structures/features in the input data that lead to correct estimation of the permeabilities. It should be noted that in some problems the input data do not have any labels, in which case the DL approach learns intrinsically the structure of the data without linking them to the output. The study in the present chapter focuses on training the network using 3D images with their permeability as the labels. Once the network is trained, it can estimate the target property, the permeability, of other porous media. Training of the proposed network is explained further in Sect. 3.4.

Convolutional neural networks take advantage of the hierarchical patterns in a given set of data and assemble more complex patterns using smaller and simpler ones. Thus, in terms of the connectedness and complexity, the CNNs are less extreme. They are often used with data with grid-like topology and have been shown to possess great potential for analyzing problems that involve images.
They have four common layers that are called the convolutional, activation, pooling, and fully connected layers (Alpaydin (2020); Bengio (2009); Chapelle et al. (2009); Deng and Dong (2014); Kamrava et al. (2019); Kim (2016); LeCun et al. (2015); Radford et al. (2015); Schmidhuber (2015); Wu et al. (2014)). The performance of all neural networks is optimized by varying and adjusting such parameters as the weights and biases. In the SL methods, the output O_j estimated by the network is compared with the real or expected output D_j, through which the error e_j is calculated (Schmidhuber (2015)):

e_j = \frac{1}{2}(O_j - D_j)^2,   (3.1)

with the total error being

E = \sum_j e_j.   (3.2)

Thus, the goal is to minimize the total error E, the aforementioned loss function, by varying the weights. In practice, the average error, also called the cost function, is minimized. The minimization is done by the stochastic gradient descent (SGD) momentum optimization method (Ruder (2016)), where the gradient refers to the gradient of the errors (see chapter 2).

3.3 The methodology

In this section, we describe the method that we have developed for linking the effective permeability to the morphology of a porous medium, using the proposed network.

3.3.1 The Training Images and Generation of Their Realizations

We utilized ten 3D X-ray images of actual sandstones. Their statistical properties are described in the next section, where we present the results. Andrä et al. (2013) presented the image of a very large core, from which we segmented and selected the ten images. It suffices for now to mention that the porosity of the porous media is about 20%, while their permeability is between 200 and 500 mD with a mean of 180 mD. By computing the porosity of samples of various sizes, we selected images of size 200³ as the representative elementary volume (REV) of the core.

A preprocessing step was carried out to prepare the input data for the CNN, which involved transforming the initial log-normal distribution of the permeabilities to a Gaussian distribution. The reason for doing so is that a CNN can better connect the identified features to a wide distribution, such as a Gaussian, than to a skewed one (Chen and Lin (2014); Remy et al. (2009); Sola and Sevilla (1997)). The procedure for transforming the data was as follows. First, the probability density function (PDF) of the permeability data was constructed, based on which the cumulative distribution function (CDF) was computed. The target PDF, namely the Gaussian distribution, was also constructed using the mean and variance of the data, after which the corresponding CDF was determined. Having the two CDFs, the new PDF and, consequently, the new permeability values were calculated. To do so, we read, for each selected permeability on the original CDF graph, its equivalent permeability from the target CDF, i.e., the Gaussian distribution. To make the discussion rigorous, let us assume that we have a dataset Z that we wish to transform to the target distribution of a dataset Y, such that their CDFs are F_Z(z) and F_Y(y), respectively. The distribution transformation determines a mapping Y = T(Z) and the p-quantiles such that (Devroye (1986); Johnson (1987)):

F_Y(y_p) = F_Z(z_p) = p, \quad p \in [0, 1],

with y = F_Y^{-1}[F_Z(z)], where F_Y^{-1}(\cdot) is the inverse CDF.
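A minimal sketch of this quantile-matching transform, using the empirical CDF of the data for F_Z and a Gaussian with the same mean and variance for F_Y (the function name and plotting-position convention are our own choices, not the authors'):

```python
import numpy as np
from scipy import stats

def to_gaussian(z):
    """Map data z (e.g., log-normal permeabilities) onto a Gaussian
    with the same mean and variance via y = F_Y^{-1}[F_Z(z)]."""
    p = stats.rankdata(z) / (len(z) + 1.0)   # empirical CDF, kept inside (0, 1)
    return stats.norm.ppf(p, loc=np.mean(z), scale=np.std(z))
```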
We originally had ten 3D images, but ten samples cannot provide the variability in the types of morphology that one may encounter in sandstones that is required for reliable training of the CNN. Therefore, we increased the number of 3D images using two methods. One, the faster approach, was the Boolean method that, given the statistical distribution of the sizes of solid objects, produces as many realizations of porous media as needed by generating their packings. Some of the realizations are shown in Fig. 3.1. We generated 100 realizations of each type of packing shown in Fig. 3.1. They cover a wide range of possibilities for the pore distribution and, thus, contribute to the training by adding more variations to the plausible realizations of the morphology. In this way, the DL algorithm used an enriched database that contained realizations of the pore space with diverse morphologies, even though their morphology is not close to those of the actual images. In the second approach, the original ten 3D X-ray images were used to generate 500 stochastic realizations of the ten sandstone images, 50 for each of them. To do so, we utilized the cross-correlation-based simulation (CCSIM) that was developed recently (Tahmasebi et al. (2014); Tahmasebi and Sahimi (2012, 2013, 2016b,a)). All the realizations generated by both methods comply with the porosity distribution of the original 3D images. The CCSIM algorithm has been explained in chapter 2 (section 2.3.2). Here, we only present a schematic of the CCSIM algorithm for generating a realization in Fig. 3.2.

Figure 3.1: Illustration of some of the 3D images created by the Boolean method: (a) cubes, (b) spheres, (c) ellipsoids, and (d) bars

Figure 3.2: Schematic illustration of the CCSIM method along with various overlap regions OL and a candidate pattern that is selected based on the similarity of the OLs and the digitized image DI

Figure 3.3 shows the original large 3D image and three of its stochastic realizations generated by the CCSIM. As an alternative, one can produce several large 3D images and divide them into smaller patches to make the training phase more feasible computationally. We then computed the effective permeability of all the realizations and the original images. A no-slip condition at the pore-fluid interface was assumed; a pressure difference of 3 × 10⁴ Pa (with the upstream pressure being 13 × 10⁴ Pa and the downstream 10⁵ Pa) was imposed across the realizations and images in a given direction, and a fluid viscosity of 0.001 Pa·s was used. The Stokes equation was then solved numerically; we used the commercial package Avizo to do so. Detailed discussions are given by Tahmasebi et al. (2015a).

Figure 3.3: (a) The original large 3D image; (b-d) three stochastic models generated using the image in (a)

3.3.2 Layers of the Convolutional Neural Networks

The algorithm that we use is supervised learning (SL), implying that all the training inputs have their own labels, by which we mean a property in the input data or related to them. In this chapter, label refers to the permeability of the input data, i.e., 3D images of sandstones. As discussed earlier, the original ten sandstone samples, the realizations of the pore space generated by the Boolean method, and the stochastic realizations of the actual X-ray images of sandstones generated by the CCSIM are labeled with their permeability, calculated as described in the previous section.
Our proposed network is composed of two machine learning structures, namely the CNN for feature extraction and a regular fully connected ANN for the permeability estimation. The CNN's task is, thus, generating a mapping between the images and the permeabilities that must be developed using regression. After its training is complete, the proposed network is tested using the high-resolution 3D images of the actual sandstone. Figure 3.4 illustrates a schematic representation of the proposed machine learning architecture. In the next section, we only describe details of the network that are specific to the study in this chapter. More details on the structure of convolutional neural networks can be found in chapter 1 (section 1.3).

Figure 3.4: Schematic of a convolutional neural network: the input layer, alternating convolution + ReLU and pooling layers with M filters each, a fully connected layer, and a regression output that produces the predicted responses for the inputs

3.3.2.1 The convolutional and activation layers

We first point out that, although both the traditional ANNs and the DL algorithms minimize the error defined by Eqs. (3.1) and (3.2), they do have one fundamental difference, which is in the type of the input data that they utilize. The input to the traditional ANNs is typically in the form of point data, whereas the input for the DL algorithms is usually a very large dataset or a large image. In both cases, one has weights and biases that are updated by gradient descent (see Eqs. 2.4 and 2.5). For the study in this chapter we set the learning rate η = 0.1 and the momentum coefficient γ = 0.5. In the present work, we use convolution with the DL algorithm when the input data are represented by images, in which case the matrix w of the weights is referred to as filters. In other words, what are referred to as the filters, also called kernels, in the DL algorithms are essentially the same as the weights in the traditional ANNs. The network that we propose in the present chapter is neither a purely traditional ANN, nor a purely DL algorithm, but, rather, a combination of both. To develop the network, we first extract the important features of the images using a DL algorithm and then feed them to a regular ANN to estimate the permeabilities. Furthermore, we use the filter terminology, rather than the weight matrix, when we convolve two matrices. Viewed this way, the overlap regions in the CCSIM algorithm described earlier may also be viewed as filters.

The convolutional layers are the main layers in any CNN, in which filters are used to extract the important features from the input data. The filters slide along the input data, with each producing a specific feature map. The input data, 3D images labeled with their permeabilities, are then convolved using 3D filters, each of size k × k × k. For simplicity of describing the procedure, the following equation, which represents a linear mapping, was used (Yang et al., 2019):

g(x, y, z) = \sum_{i=-m}^{m} \sum_{j=-m}^{m} \sum_{k=-m}^{m} f(x - i,\, y - j,\, z - k)\, w(i, j, k) + b,   (3.3)

where f and g represent the input and output, respectively, (x, y, z) are the coordinates of the voxels in the input, and w and b are, respectively, the weights and bias. Each set of the 3D filters produces a 3D output image, which is the input for the next convolutional layer. Note that in Eq. (3.3), only the bias does not depend on the location; otherwise, every pixel value f(x, y, z) is multiplied by a weight and, therefore, the resulting feature map depends on the location. If, however, two pixels have the same value in the input data and the weights are also the same, then the output will be the same as well. This problem is addressed by the pooling layer.

The convolutional layer is responsible for convolving the input images with the filters, in order to extract their key features. It should be noted that one may require several filters, necessitated by the fact that image recognition requires various feature maps, and multiple filters capture their essential properties. Thus, in the present work we used fifty 5 × 5 × 5 filters. As the activation function for all the layers up to the regression step, we used the ReLU, the simplest nonlinear activation function. More information regarding the convolutional and activation layers can be found in chapter 1.
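As a concrete illustration, a direct (unoptimized) implementation of the operation in Eq. (3.3) might look as follows; real CNN libraries use fast vectorized kernels, so this sketch is only meant to make the index bookkeeping explicit. It is written as a cross-correlation, as is customary in CNNs, which is equivalent to Eq. (3.3) up to a flip of w:

```python
import numpy as np

def conv3d(f, w, b=0.0):
    """Valid-mode 3D convolution in the spirit of Eq. (3.3): each
    output voxel is a weighted sum of a k x k x k input neighborhood
    plus a bias b, followed by the ReLU activation used in the text."""
    k = w.shape[0]
    out = np.zeros(tuple(n - k + 1 for n in f.shape))
    for x in range(out.shape[0]):
        for y in range(out.shape[1]):
            for z in range(out.shape[2]):
                out[x, y, z] = np.sum(f[x:x + k, y:y + k, z:z + k] * w) + b
    return np.maximum(0.0, out)   # ReLU
```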
3.3.2.2 The pooling layer

The pooling layer acts on each feature map separately in order to generate a new set of the same number of pooled feature maps. The computation involves selecting a pooling operation, much like a filter, to be applied to the feature maps. The size of the pooling operation or filter is smaller than that of the feature map. Two common functions used in the pooling operation are average pooling, which calculates the average value for each patch on the feature map, and maximum pooling, which computes the maximum value for each patch of the feature map. We used the former in this chapter, as it is one of the most common pooling operations. Application of the pooling layer to all the feature maps produces the same number of maps. The average pooling layer is expressed by

g(x, y, z) = \frac{1}{s^3} \sum_{i=0}^{s-1} \sum_{j=0}^{s-1} \sum_{k=0}^{s-1} f(sx + i,\, sy + j,\, sz + k),   (3.4)

where s is the coarsening length scale and (x, y, z) is the location of a voxel in the output feature map after applying the average pooling layer. We emphasize that the pooling layer is not designed to change the output, except for coarsening the dimensions of the feature maps. Since pooling applies coarsening, the new values cannot be related to their previous locations, as pooling eliminates the spatial relationship between the original locations and the new values. More information on the pooling layer can be found in chapter 1.
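Because the window positions in Eq. (3.4) do not overlap, average pooling reduces to a block mean, which can be written compactly (a sketch assuming the feature-map dimensions are divisible by s):

```python
import numpy as np

def avg_pool3d(f, s=2):
    """Average pooling of Eq. (3.4): replace each s x s x s block of
    the feature map f by its mean value."""
    nx, ny, nz = (n // s for n in f.shape)
    blocks = f[:nx * s, :ny * s, :nz * s].reshape(nx, s, ny, s, nz, s)
    return blocks.mean(axis=(1, 3, 5))
```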
3.3.2.3 The fully connected layer

The output of the pooling layers is flattened to a 1D vector to provide the input to the fully connected layer, the final layer in the proposed network before the output layer. As already pointed out, every convolutional layer has several filters, each of which presents a local feature. The fully connected layer, on the other hand, keeps a collection of the most important outputs from all the convolutional layers. Depending on the purpose of the network, such as regression or classification, the output layer may have a different number of neurons. In a classification problem, the number of classes determines the number of neurons in the output layer, with values ranging from 0 to 1 that represent the probability of occurrence of each class, whereas in a regression problem the number of neurons in the output layer depends on the number of the targets, i.e., the permeabilities that the proposed network aims to estimate. Each neuron in the output layer has a continuous value that represents the prediction for each target (Maturana and Scherer (2015); Nielsen (2015); Yang et al. (2018c, 2019)). The functionality of a fully connected layer, which is the same as in the conventional ANN, consists of several neurons expressed by

v = A_R\!\left( \sum_{i=1}^{n} x_i w_i + b \right),   (3.5)

where v denotes the neuron's output, A_R(x) is the activation function for the regression layer, x_i is the ith input, and n is the number of inputs in the fully connected layer. While the ReLU activation function is used as the operator in the convolutional layers, a linear activation function is utilized in the fully connected layer. Therefore, all the feature maps in the last convolutional layer are connected to a unit in a fully connected layer. Then, the output layer produces the result y_i - the permeability for image i - calculated by a logistic regression layer:

y_i = A_R(w_i v_i + b),   (3.6)

with

A_R(x) = \frac{1}{1 + \exp(-x)},   (3.7)

where the number of outputs is also n. Thus, the overall training of the proposed network is summarized as follows (a minimal sketch of this loop is given after the list):

(i) Set the number of filters and their sizes, the architecture of the network, and the other parameters.

(ii) Initialize the weights, biases, and the filter matrices. The weights and filters are initialized by selecting their values from a Gaussian distribution with zero mean and unit variance.

(iii) Supply the training input - images in this chapter - to the proposed network. The input is processed by the various layers of the CNN as described, after which the proposed network produces its first estimate of the output, the permeabilities.

(iv) Compute the total error of the estimates from (iii) by Eqs. (3.1) and (3.2).

(v) If the error is larger than a pre-set threshold, use back-propagation to calculate the gradients of the error with respect to all the variables, and stochastic gradient descent through Eqs. (2.4) and (2.5) to update all the weights, filter values, and other parameters to minimize the error.

(vi) Repeat steps (iii) to (v) for all the training input data until the error reaches a plateau and does not change any more, or a maximum number of iterations is reached.
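The sketch below mirrors steps (iii)-(vi). The `net` object, with its `forward`, `backward`, and `weights` members, is a hypothetical stand-in for whatever framework implements the layers, not the actual code of this chapter:

```python
import numpy as np

def train(net, images, k_true, lr=0.1, gamma=0.5, max_iter=10000, tol=1e-4):
    """Training loop of steps (iii)-(vi): forward pass, total squared
    error of Eqs. (3.1)-(3.2), back-propagation, and momentum updates
    of the weights as in Eqs. (2.4)-(2.5)."""
    v = {name: np.zeros_like(w) for name, w in net.weights.items()}
    for _ in range(max_iter):
        k_est = net.forward(images)                  # step (iii)
        E = 0.5 * np.sum((k_est - k_true) ** 2)      # step (iv)
        if E < tol:                                  # error below threshold
            break
        grads = net.backward(k_est - k_true)         # step (v): back-propagation
        for name in net.weights:
            v[name] = gamma * v[name] + lr * grads[name]   # Eq. (2.4)
            net.weights[name] -= v[name]                   # Eq. (2.5)
    return net
```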
Some of the main features extracted from a given input image used in the study of this chapter are shown in Fig. 3.5. Figure 3.5a shows one 2D training image that is used to extract features for visualization, together with the activations of the first convolutional layer, where we indicate which areas are extracted after applying the activation function to the convolved feature maps. Here, for the sake of demonstration and showing various other features, 96 filters of size 5 × 11 × 3 and the ReLU as the activation function were used. As can be seen in Fig. 3.5b, strong positive activation is indicated by the white pixels, whereas strong negative activation is indicated by the black pixels. When a feature map is mostly gray, it is not activated strongly. To have a closer view, some of these feature maps are shown in Fig. 3.5c. In a similar fashion, Fig. 3.6a shows a deeper convolutional layer (the fifth layer of the CNN). As can be seen, the features look smoother and smaller, which is due to the abstraction of the features as the network progresses deeper. Figure 3.6b provides a zoomed-in view of four random feature maps shown in Fig. 3.6a. It should be noted that in the early layers of convolution only simple features, such as edges or color, are detected, whereas in the layers deeper within the network more complex features are learned. In the last layer, for example, the features are a combination of various patterns extracted from all the previous layers.

Figure 3.5: Illustration of some of the extracted feature maps. (a) The original input image as the training data; (b) 96 activated feature maps after the first convolutional layer, and (c) four zoomed-in random feature maps from (b)

Figure 3.6: Illustration of some extracted feature maps in (a) a convolutional layer deep within the network (conv5 layer) and (b) four zoomed-in random maps from (a)

3.4 Results

As discussed in the Introduction of this chapter, our main purpose for using the proposed DL method is to develop a mapping between the complex morphology of porous media and their permeability, i.e., to establish a direct link between the two. To do so, various 3D realizations were generated, and their effective permeability was computed. Then, the segmented images along with the computed permeabilities were used as the input and output, respectively. A schematic representation of the initial steps for producing the output is shown in Fig. 3.7.

Once the input and output data were produced, 70 percent of them were used in the training phase, the step in which the CNN learns how to develop a relationship between the morphological information and the permeability, keeping in mind that, due to the use of various filters, most of the important features are extracted and linked to the output. To make the training phase as useful to the CNN as possible, we selected a set of diverse data based on the permeability distribution. The distribution of the permeability for the entire available data is shown in Fig. 3.8, indicating a Gaussian distribution that makes the selection easier. As such, the lower, median, and upper quantiles were carefully sampled, a strategy that helps the CNN to be more predictive for the unseen or unsampled data. The rest of the data were assigned equally to the validation and testing of the accuracy.

Figure 3.7: Schematic representation of computing the permeability and generating its dataset for the 3D models: the segmented image is used to compute the fluid distribution, from which the permeability follows

Figure 3.8: Transformed distribution of the permeabilities for all the porous media
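One simple way to realize such a quantile-balanced split is sketched below; the three bins and the 70/15/15 proportions follow the text, while everything else (function name, random seed) is our own choice:

```python
import numpy as np

def stratified_split(k, f_train=0.7, n_bins=3, seed=0):
    """Split sample indices so that the training set samples the lower,
    median, and upper quantiles of the permeability distribution evenly;
    the remainder is divided equally between validation and testing."""
    rng = np.random.default_rng(seed)
    edges = np.quantile(k, np.linspace(0.0, 1.0, n_bins + 1))
    edges[-1] += 1e-12                     # make the last bin inclusive
    train, rest = [], []
    for b in range(n_bins):
        idx = np.where((k >= edges[b]) & (k < edges[b + 1]))[0]
        idx = rng.permutation(idx)
        cut = int(f_train * len(idx))
        train.extend(idx[:cut])
        rest.extend(idx[cut:])
    half = len(rest) // 2
    return np.array(train), np.array(rest[:half]), np.array(rest[half:])
```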
Based on the architecture of the network, the training data were fed into the CNN and, at the same time, the validation step was carried out. The loss functions for both phases are shown in Fig. 3.9. Note that the initial errors are higher in the training, whereas they have lower values for the validation dataset, which is due to using a trained network for the validation. For both phases, however, the error function levels off after the initial iterations. Furthermore, the results of the validation phase indicate that the network is ready to be used with the test data. After the training and validation phases, the test data were provided to the final network. The results are shown in Fig. 3.10, which indicates that there are accurate correlations between the estimated and target (actual) permeabilities. The correlation coefficients for the training and test were 0.94 and 0.91, respectively. The test results indicate that the CNN successfully links the complex morphology of the porous media to their permeability. The efficiency of the computations is such that the permeability of a new sample can be estimated in less than 2 CPU seconds after the training is completed. Therefore, the trained CNN may be readily utilized for future images and samples of porous media and, thus, allows avoiding the intensive computations for the effective permeability.

Figure 3.9: Loss functions for the training and validation data, shown in red and black, respectively

Figure 3.10: Estimated permeabilities versus actual (target) values for the training (left) and testing (right) of the network

To further validate the performance of the trained CNN, the network was used with a distinct sample porous medium, namely a Fontainebleau sandstone that represents a different morphology and a wider range of the permeability. Its porosity is about 14%, representing a porous medium with relatively low porosity. More information about the Fontainebleau sandstone is given by Andrä et al. (2013). The results are shown in Fig. 3.11. The accuracy of the correlation between the estimated and target permeabilities is comparable with that for the Berea sandstone. The correlation coefficient for the Fontainebleau sandstone was 0.9, slightly smaller than for the Berea sandstone. We conclude that training the proposed network with adequate data enables it to predict the permeability of other types of porous media by computations that require only a few CPU seconds.

Figure 3.11: Estimated permeabilities versus actual (target) values for a sample of Fontainebleau sandstone

3.5 Summary

Permeability, a most important property of porous media, is typically either computed or measured. Depending on the complexity of the pore-space morphology, as well as the model that is used to represent it, computing the permeability requires intensive simulation. Experimental methods of measuring the permeability are routine and standard, but can be time-consuming. Machine learning algorithms, on the other hand, learn and map the morphology of porous media onto their permeability. Although the training phase may require extensive computations, high-performance computing and graphics cards can be used to accelerate them.

An important obstacle to using machine learning techniques and, in particular, deep learning algorithms is the lack of data. Providing large and diverse experimental data for porous media to machine learning algorithms is not practical. As such, the applications of such algorithms to porous media problems have remained limited. In this chapter, however, we combined machine learning with accurate reconstruction of porous media based on the CCSIM and Boolean methods in order to produce a large number of stochastic realizations of the media. This provides the proposed network with a rich dataset for its training, enabling it to produce accurate predictions for the permeability.

We should, however, point out an important caveat. The sandstones that we worked with in this chapter have relatively high porosity and pore connectivity. Other types of porous media, such as carbonates, have much lower connectivities and porosities. Thus, the task of the DL-ANN hybrid algorithm for identifying the flow paths in low-connectivity porous media is more difficult, as there are fewer, and even rarer, flow paths available.
Though it remains to be studied, we believe that with enough data, a significant portion of which can be generated by an accurate reconstruction method based on the images of such porous media, as well as synthetic models of low-connectivity porous media, the algorithm that we presented should still produce accurate results. We believe that the main ideas and the specific application discussed in this chapter are not limited to what we have studied, but can be used for other applications wherein data that are otherwise unavailable in sufficient quantity can be produced by combining reconstruction methods of porous media, such as the CCSIM, and machine learning.

Chapter 4

Quantifying accuracy of stochastic methods of reconstructing complex materials by deep learning

1 This chapter was published as a paper in Physical Review E 101, 043301 (2020)

4.1 Introduction

Materials with complex morphology, both human-made and natural, are ubiquitous (Sahimi (2003a); Torquato (2002)). Characterizing their morphology - the shape and size of their microscopic elements and the way they are connected together - is a critical first step toward understanding the properties of complex materials and modeling the phenomena that occur in them. Thus, over the past several decades a large set of microstructural descriptors has been developed theoretically (Sahimi (2003a); Torquato (2002)) and applied to the characterization of a wide variety of complex materials. At the same time, many models have also been proposed for describing the morphology of materials whose complexity is such that their development has entailed making various simplifications and approximations.

With the tremendous advances in instrumentation, image-based characterization of complex materials, as well as the direct use of images in computing material properties, is gradually becoming the preferred approach. For example, recent advances in imaging techniques have played a fundamental role in gaining deeper understanding of porous materials and their properties
In other words, having access to a dataset is an essential aspect of all such methods. In the object-based methods, the morphological statistics are extracted from the available images in the form of deterministic values or probability distributions. Then, some of the properties are selected from their distributions and, using an optimization technique such as simulated annealing, the initial \objects"|patterns of the morphology are generated and inserted in the simulation grid until all the predened constraints on the morphology are satised. Object-based methods provide satisfactory results if the morphology is relatively simple. In the statistical methods various spatial correlation functions describing the relationship between the microstructural properties are used to reconstruct realizations of a material (Adler et al. (1990); Yeong and Torquato (1998b); Bekri et al. (2000); Hamzehpour et al. (2007); Jiao et al. (2008, 2009); Tahmasebi and Sahimi (2012, 2013); Chen et al. (2015); Biswal and Hilfer (1999); Biswal et al. (1999, 2007); Chen and Torquato (2018); Kainourgiakis et al. (2005); Ma and Torquato (2018); Okabe and Blunt (2004); Zachary and Torquato (2011))using an optimization technique. In order to capture specic features, various correlations functions, 77 including two- (Malmir et al. (2017); Torquato (2002); Sahimi (2003a))and three-point (Malmir et al. (2018)) correlation functions, have been developed and computed. Provided that the correlation functions include a measure of the connectivity of the clusters of the various phases of a multiphase material, the statistical methods often provide accurate reconstruction (Jiao et al. (2008, 2013)). The most recent reconstruction algorithms are based on direct use of two- or three-dimensional (3D) images of materials. In these algorithms (Tahmasebi and Sahimi (2012, 2013); Tahmasebi (2018a); Tahmasebi and Sahimi (2016a,b); Hajizadeh et al. (2011)) digitized images I are di- rectly sampled without extracting any particular statistics or correlations functions. Such methods are able to infer rich information from the images and, hence, are capable of produc- ing high-quality realizations of materials and can be used with both pixel- and pattern-based reconstruction algorithms. The former generates each block in the computational grid sepa- rately, whereas the latter reconstructs a group of blocks together, which is computationally more ecient and mimics better extended correlations and connectivity in the microstructures. Given the variety of the stochastic reconstruction methods, an important issue is a critical evaluation of their accuracy and eciency. The evaluation may be in terms of the microstruc- tural statistics that are not used in generating the realizations, or based on computing the physical properties of materials, such as their permeability and elastic strength. The latter comparison is, of course, precise but requires intensive computations. Furthermore, compar- ing the realizations generated by various reconstruction algorithms based on relatively simple statistics, such as the porosity, which do not often shed much light on the complexity of the microstructure, might also not be a valid way of evaluating them. In this chapter we describe an ecient methodology based on machine learning that allows evaluating and ranking of various stochastic algorithms for reconstruction of 2D or 3D discrete or multiphase and continuous images. 
The methodology evaluates the realizations generated by the reconstruction algorithms in order to identify the optimal method that produces models of the images with maximum similarity with the original I, and a reasonable range of variability between the realizations, so that they are not more or less identical. To this end, we develop an autoencoder deep-learning (DL) method in order to reduce the dimensionality of microstruc- tural images. Then a quantitative measure is introduced for quantifying the performance of stochastic algorithms for materials reconstruction. 78 The reason for using the DL is its ability for capturing the latent complex features in the realizations. Such features cannot be captured or analyzed by the regular statistical methods (Tan et al. (2014)), whereas the DL can represent them highly accurately. Furthermore, as we show below, the standard two-point descriptors, or even a multiple-point connectivity function, cannot accurately dierentiate the dierences between the realizations generated by various stochastic reconstruction methods, as they produce a slim uncertainty space around the I. The rest of this chapter is organized as follows. We rst explain two microstructural descrip- tors that we will use to evaluate the realizations of a complex material. Section 4.3 explains the methodology that we propose based on machine learning. We then describe four stochastic reconstruction algorithms that are evaluated in this chapter. The methodology is tested in Sec. 4.5 with three distinct types of materials and their images. Section 4.6 provides a summary of the paper. 4.2 Microstructural descriptors We rst describe two microstructural descriptors (Torquato (2002); Sahimi (2003a)) that have been used in some of the stochastic reconstruction of materials. To begin with, we dene a phase-indicator function of materials consisting of phases 1 and 2 with volumes 1 and 2 and volume fractions' 1 and' 2 , where 1 [ 2 = and 1 \ 2 = 0. Then, the indicator function of phase i is dened by I (i) (x) = 8 > < > : 1; x2 i 0; x2 i (4.1) with I (1) (x) +I (2) (x) = 1. The interface between the two phases is dened by the indicator function M(x) = rI (1) (x) = rI (2) (x) ; (4.2) which is nonzero when x is on the interface. An important microstructural descriptor is the lineal-path functionL (i) 2 (x 1 ; x 2 ) that provides information on the phase connectedness for short-range connectivities, and has been used in 79 many of the past reconstruction works. If we dene a function, (i) (x 1 ; x 2 ;) = 8 > < > : 1; x 1 ; x 2 2 i () 0; x 1 ; x 2 2 i (); (4.3) for a sample , then, the lineal-path function for phase i is given by L (i) 2 (x 1 ; x 2 ) = D (i) (x 1 ; x 2 ;) E ; (4.4) where the averaging is over the samples. The chord-length probability density functionL (i) c (z) for phasei is very similar toL (i) 2 (z) in the sense that it represents a line segment with its interior points in one of the two phases, and the end points on the interface between the two phases. More precisely, it represents the probability of nding a chord of length ` c between two points in phase i, which is related to L i 2 (z) via, L (i) 2 (z) = ' i R 1 0 (yz)L (i) c (y)H(yz)dy R 1 0 yL (i) c (y)dy ; (4.5) where H(x) is the Heaviside step function. 
We also define a multiple-point connectivity function p^(i)(h; m) that quantifies the long-range connectivity of a material, as it represents the probability of having a sequence of m connected points in phase i of a material in a specific direction h. It is defined by

$$p^{(i)}(\mathbf{h}; m) = \mathrm{Prob}\{I(\mathbf{x}) = 1,\ I(\mathbf{x} + \mathbf{h}) = 1,\ \ldots,\ I(\mathbf{x} + m\mathbf{h}) = 1\}, \tag{4.6}$$

where I(x) is the indicator function defined earlier. p^(i)(h; m) accounts for curvilinearity and complexity in a microstructure, as it calculates the probability of finding multiple connected points by considering a tolerance core around a target direction.

4.3 The deep-learning methodology

Unsupervised learning is a type of learning that helps identify previously unknown patterns in datasets without pre-existing labels; it is also known as self-organization, and allows modeling of the probability densities of the given input data. In the study of this chapter we use an autoencoder DL neural network (NN) to reduce the dimensionality of the realizations of materials. The autoencoder is an unsupervised DL algorithm that utilizes backpropagation in order to set the target values to be equal to the inputs. The algorithm maps the inputs X = {x^(1), x^(2), ..., x^(i)}, x^(i) ∈ ℝ^n, onto the output y^(i) by determining a function h_{w,b}(x) ≈ x, where n is the number of pixels or voxels in the image, and w and b denote the weights and biases. Generally speaking, such NNs consist of three parts, namely, the encoder Φ, the code or latent layer F, and the decoder Ψ, such that

$$\Phi: X \to F, \qquad \Psi: F \to X, \qquad (\Phi, \Psi) = \arg\min_{\Phi, \Psi} \|X - (\Psi \circ \Phi)X\|^2, \tag{4.7}$$

where ∘ denotes the composition of the two operators. Recall that the encoder operator compresses each realization to just three numbers (x, y, z) that, in the present work, constitute F. The decoder starts from the three "coordinates" (x, y, z) and reconstructs the initial images in the second phase. As described in Ref. ("Unsupervised Feature Learning and Deep Learning Tutorial," n.d.), the autoencoder attempts to learn the function h_{w,b}(x), an approximation to the identity function, in order to produce an output similar to x^(i). Although the identity function may seem trivial to learn, imposing constraints on the NN by, for example, limiting its number of hidden units, enables one to gain useful insights into the image and discover features in the input data. Before describing further the idea behind the autoencoder DL, we point out that the network consists of a few layers, with the first one representing the input image. Next is the convolutional layer that consists of a set of internal layers that extract important features of the image and reduce the dimensionality of the data through a convolution function, after which an activation function A(x) is applied. In this study we have used the rectified linear unit (ReLU), which adds a small degree of nonlinearity to the convolved feature map, or the model. ReLU, A(x) = max(0, x), eliminates all the negative values, since A(x) = 0 if x < 0, and is only mildly nonlinear due to the difference between the shapes of A(x) for x < 0 and x > 0. An important problem with the output feature maps is that they are sensitive to the features' locations in the input image. To address the problem, one may down-sample (coarsen) the feature maps to reduce their dimensionality, hence making them more robust to changes in the features' positions. The pooling layer (PL), the next layer in the hierarchy of the layers in a typical autoencoder network, reduces the dimensions of the image further as it passes through it.
It condenses the feature maps by simplifying the information and, hence, reducing the computation, especially when one has a large number of features in the data. In particular, the computation time in the regression stage, where overfitting is a major concern, is reduced. The PL acts on each feature map separately in order to generate a new set of the same number of pooled feature maps, for which a pooling operation, much like a filter, must be selected and applied to the maps. The size of the pooling operation is smaller than that of the feature map. Note that the pooling operation is specified by the user, rather than being learned. Two common functions used in the pooling operation are average pooling, which calculates the average value for each patch on the feature map, and maximum pooling, or max-pooling, which computes the maximum value for each patch of the feature map. In the study of this chapter we use the latter for down-sampling. Afterwards, the extracted and summarized features are connected to a multilayer perceptron network to produce an output image. If the activation of hidden unit j in the autoencoder, using an input x, is represented by A_j^(ℓ_h), then

$$\hat{\rho}_j = \max_i \left\{ A_j^{(\ell_h)}\!\left[x^{(i)}\right] \right\}, \quad i = 1, 2, \ldots, m, \tag{4.8}$$

represents the maximum activation of neuron j, where m is the number of training examples, and ℓ_h denotes the hidden layer in which the neuron is located. Then, the goal is to satisfy the following constraint,

$$\hat{\rho}_j = \rho, \tag{4.9}$$

where ρ is a parameter to be reproduced as the result of the activation of neuron j. To do so, one adds an extra penalty term to the cost function (the function to be minimized) for the difference between ρ̂_j and ρ, in order to control the difference between the two. One possible penalty term is the Kullback-Leibler (KL) divergence (Kullback and Leibler (1951)), sometimes called the relative entropy, which is a measure of how one probability distribution differs from a second, reference probability distribution, and is given by

$$D_{KL}(\rho \,\|\, \hat{\rho}_j) = \sum_{j=1}^{s_2} \left[ \rho \log \frac{\rho}{\hat{\rho}_j} + (1 - \rho) \log \frac{1 - \rho}{1 - \hat{\rho}_j} \right], \tag{4.10}$$

where s_2 represents the number of neurons in the hidden layer. Note that D_KL(ρ||ρ̂_j) increases monotonically as ρ̂_j deviates from ρ and, therefore, D_KL(ρ||ρ) = 0. With D_KL as the penalty term, the overall cost function to be minimized is then given by

$$C_O(\mathbf{W}, \mathbf{b}) = C(\mathbf{W}, \mathbf{b}) + \beta_p \sum_{j=1}^{s_2} D_{KL}(\rho \,\|\, \hat{\rho}_j), \tag{4.11}$$

where C(W, b) is an average sum of the squared errors for activation of the hidden layer with respect to the input data, with the error being the difference between the output of the NN and the actual values, calculated by

$$C(\mathbf{W}, \mathbf{b}) = \frac{1}{2m} \sum_{i=1}^{m} \left\| O_{w,b}\big[x^{(i)}\big] - y^{(i)} \right\|^2. \tag{4.12}$$

Here, O_{w,b} is the output of the NN for the given input data x^(i), y^(i) are the actual values (or labels), and W and b are the weights and biases that are adjusted by the network in order to reduce the error that it produces. The weight β_p of the penalty term, which was taken to be β_p = 0.5, controls how strongly the term influences C_O(W, b). Using the autoencoder DL, one reduces the dimensionality of the images to some manageable scale. In fact, this is the main role of the DL in this study, i.e., transforming, for example, a 500 × 500 matrix to a 1 × 3 one, which, computationally, is very significant. Although, as described below, we have implemented the algorithm with 2D images, the DL can accomplish the same for 3D realizations just as efficiently.
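To make Eqs. (4.7)-(4.12) concrete, the following is a minimal sketch of an autoencoder with a three-number latent code and a KL sparsity penalty, written in PyTorch as one possible framework. The architecture, layer sizes, and the application of the penalty to the latent code (rather than to a full hidden layer of s_2 neurons) are illustrative assumptions, not the exact network of this chapter.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SparseAutoencoder(nn.Module):
    """Minimal convolutional autoencoder compressing an image to 3 latent numbers."""
    def __init__(self, size=64):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(1, 8, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(8, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Flatten(),
            nn.Linear(16 * (size // 4) ** 2, 3),  # the (x, y, z) "coordinates"
        )
        self.decoder = nn.Sequential(
            nn.Linear(3, 16 * (size // 4) ** 2),
            nn.Unflatten(1, (16, size // 4, size // 4)),
            nn.Upsample(scale_factor=2), nn.Conv2d(16, 8, 3, padding=1), nn.ReLU(),
            nn.Upsample(scale_factor=2), nn.Conv2d(8, 1, 3, padding=1), nn.Sigmoid(),
        )

    def forward(self, x):
        code = self.encoder(x)
        return self.decoder(code), code

def kl_sparsity(code, rho=0.05):
    # Eq. (4.10): KL divergence between the target activation rho and the
    # observed activations (here, of the latent units, for brevity)
    rho_hat = torch.sigmoid(code).mean(dim=0).clamp(1e-6, 1 - 1e-6)
    return (rho * torch.log(rho / rho_hat)
            + (1 - rho) * torch.log((1 - rho) / (1 - rho_hat))).sum()

def total_cost(model, x, beta_p=0.5):
    # Eq. (4.11): reconstruction error plus the weighted KL penalty; for an
    # autoencoder the "labels" y of Eq. (4.12) are the inputs themselves
    x_hat, code = model(x)
    return 0.5 * F.mse_loss(x_hat, x) + beta_p * kl_sparsity(code)

# Hypothetical usage on a batch of four 64 x 64 single-channel images:
print(total_cost(SparseAutoencoder(), torch.rand(4, 1, 64, 64)))
```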
None of the current methods for dimensionality reduction can do the same for large matrices or images I, as their output for such severe dimensionality reduction is completely distorted. But, due to its use of an iterative scheme and its ability to discover latent patterns, the DL is capable of doing so and of producing very accurate results. As a simple example, consider ("Unsupervised Feature Learning and Deep Learning Tutorial," n.d.) a 10 × 10 image with the input data being the pixel values. Thus, n = 100, and we assume that there are s_2 = 50 neurons in the hidden layer of the NN. With only 50 neurons in the hidden layer, the NN must learn a "compressed" representation of the input image with n = 100. That is, given only the vector of the hidden-unit activations in ℝ^50, it is forced to reconstruct the input image with 100 pixel values. If the pixel values were completely random, with no correlations between them, the compression and reconstruction would be very difficult, if not impossible. In practice, however, any real material, and thus its image, contains correlations in its morphology and, therefore, the algorithm can discover at least some of them. Thus, the encoder compresses the input image into a latent-space representation, while the decoder reconstructs the initial image from that representation, which is why the two NNs with suitable constraints are excellent tools for dimensionality reduction (compression), as well as for learning data projections. A schematic representation of the network is shown in Figure 4.1.

[Figure 4.1: Schematic representation of an autoencoder deep-learning algorithm, with the layers consisting of normalization, convolution + ReLU, max-pooling, up-sampling, and soft-max.]

The outcome is a set of points that represents an image with a lower dimension. Thus, after acquiring the point data, similar to the analysis of variance in statistics, we propose to quantify two types of variabilities, or uncertainty spaces: (i) internal, which represents the variability or the uncertainty space between the realizations Re, and (ii) external, which is the uncertainty space for the differences between Re and the input, the I. The most accurate reconstruction algorithm is then one that produces very different realizations that share the basic features with the I. In other words, producing a set of diverse realizations that do not have any common features with the I is not a useful exercise; rather, the aim is to maximize the internal variability between the realizations, so as to generate as many plausible realizations of a complex material as possible and, at the same time, minimize their differences with the I. The reconstruction algorithms are then ranked based on such concepts. We define the internal variability as the average distance between all pairs of the realizations,

$$\delta_I = \frac{\sum_{i=1}^{N_r} \sum_{j=1}^{N_r} D(\mathrm{Re}_i, \mathrm{Re}_j)}{N_r (N_r - 1)}, \tag{4.13}$$

where D is the Euclidean distance between two points in a pair of realizations, and N_r is the number of realizations. Similarly, the external variability is quantified by

$$\delta_E = \frac{\sum_{i=1}^{N_r} D(\mathrm{Re}_i, I)}{N_r}. \tag{4.14}$$

The final score s for each reconstruction algorithm is then defined by

$$s = \frac{\delta_I}{\delta_E}. \tag{4.15}$$

An accurate reconstruction method generates realizations that have high internal variability δ_I and small external variability δ_E. Thus, in principle, the higher s, the more accurate are the reconstruction method and the realizations that it produces. Note that the computations of δ_I and δ_E are done after the DL algorithm compresses the realizations and extracts their most important features.
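To illustrate Eqs. (4.13)-(4.15), the sketch below computes δ_I, δ_E, and the final score s directly from the three-component codes produced by the encoder; the array names and the random usage example are hypothetical.

```python
import numpy as np

def final_score(latent_realizations, latent_original):
    """Score a reconstruction algorithm from its encoder coordinates.

    latent_realizations: (N_r, 3) array, one (x, y, z) code per realization.
    latent_original:     (3,) array, code of the original image I.
    """
    n_r = len(latent_realizations)
    # Eq. (4.13): mean Euclidean distance over all ordered pairs of realizations
    diff = latent_realizations[:, None, :] - latent_realizations[None, :, :]
    delta_i = np.sqrt((diff ** 2).sum(-1)).sum() / (n_r * (n_r - 1))
    # Eq. (4.14): mean distance between each realization and the original I
    delta_e = np.sqrt(((latent_realizations - latent_original) ** 2).sum(-1)).mean()
    # Eq. (4.15): high internal and low external variability give a high score
    return delta_i / delta_e

# Hypothetical usage with random codes for, e.g., 20 realizations:
codes = np.random.rand(20, 3)
print(final_score(codes, np.zeros(3)))
```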
4.4 The reconstruction methods

Using the algorithm described in the previous section, we carried out computations with four distinct reconstruction methods. They are the cross correlation-based simulation (CCSIM) method, the single-normal equation simulation (SNESIM) algorithm, the sequential indicator simulation (SISIM) approach, and what is referred to as the filter simulation (FILTERSIM). The four methods were originally developed for reconstructing geomaterials but, similarly to all the reconstruction methods, they can be used for generating plausible realizations of any type of complex material and media, given a limited amount of data, or one or a few of their images. The same type of computations can, obviously, be carried out with the reconstruction methods that use, for example, simulated annealing (see, e.g., Yeong and Torquato (1998b,a); Hamzehpour et al. (2007); Jiao et al. (2008, 2009, 2013)). One motivation for carrying out the computations for the aforementioned algorithms is that they were originally developed for reconstructing realizations of large-scale porous media that are typically highly heterogeneous and, thus, complex. The aim of this chapter is not to evaluate and rank any specific reconstruction algorithm, but only to compare various stochastic algorithms based on the proposed method, in order to demonstrate how the proposed concepts numerically rank reconstruction methods when they produce very different or very similar realizations. In what follows, we describe briefly each of the four methods used in this chapter.

4.4.1 The CCSIM algorithm

We described the CCSIM algorithm in Chapter 2, and its extension to 3D in Chapter 3.

4.4.2 The SNESIM algorithm

The algorithm was proposed by Strebelle (Strebelle (2002)). Briefly, one scans the I once, computes all the conditional probabilities for a given pixels' (voxels') configuration (the template), and stores them in a dynamic search tree. In this way, there is no need to rescan the entire I in order to reconstruct a block, as was done in the methods that had been developed prior to Strebelle's, because one can directly use the tree structure to retrieve the conditional probabilities. At each step of the reconstruction the method utilizes the original data and the previously reconstructed grid blocks in order to advance. The algorithm can be used only with discrete multiphase I (Strebelle and Zhang (2005)), and was improved significantly by Straubhaar et al. (Straubhaar et al. (2011)), who used a structure list for storing the data, instead of a search tree. Since, compared with a tree, a list is much less computationally demanding, the method improved on the SNESIM.

4.4.3 The SISIM algorithm

This method classifies each grid block into a category of phases of a disordered multiphase material with specific characteristics. The category is built up based on a variety of the available data. It is assumed that no two identical phases coexist in the same grid block. After each grid block is assigned to its phase category, its property value is attributed to it from the probability distribution function (PDF) of the corresponding phase. Thus, the method is also called sequential indicator simulation-probability distribution function. The overall PDF of the phases represents the pattern of their occurrence over the length scale of the material to be reconstructed. The overall procedure for the algorithm is as follows.
(i) A random walk is taken through the computational grid that represents the material, such that the unconditioned blocks, i.e., those that do not contain any hard data that must be honored, are visited once and only once.

(ii) For each visited unconditioned grid block, the prespecified number of conditioning phase data from the already-reconstructed blocks, as well as any other available data, are identified.

(iii) A process called indicator kriging is carried out in order to estimate the conditional probability for each phase category. Originally developed for geostatistical applications, kriging (Straubhaar et al. (2011)) is a method for interpolating properties for which the interpolated values are modeled by a Gaussian process governed by the prior covariances. With suitable assumptions on the prior covariances, kriging provides the best linear unbiased prediction of the interpolated values. Indicator kriging uses indicator functions in order to estimate the transition probabilities from one block to the next. It proceeds by first constructing an indicator semivariogram γ_I. Suppose that one has N_h pairs of data points x(y_i) and x(y_i + h), with i = 1, 2, ..., N_h, that are separated by a distance h. Then, the semivariogram γ(h), which is a measure of the spatial correlations between the data, is defined by

$$\gamma(h) = \frac{1}{N_h} \sum_{i=1}^{N_h} \left[x(\mathbf{y}_i) - x(\mathbf{y}_i + \mathbf{h})\right]^2. \tag{4.16}$$

The indicator semivariogram is constructed the same way by first introducing a critical threshold x_c that varies between a minimum and a maximum value. Then, an indicator function I(x_i), a generalization of the indicator function defined earlier, is defined by

$$I(x_i) = \begin{cases} 1, & x(\mathbf{y}_i) \leq x_c, \\ 0, & x(\mathbf{y}_i) > x_c, \end{cases} \tag{4.17}$$

where x_i = x(y_i). The cumulative probability distribution function (CPDF) F̂(x_c) is then constructed by

$$\hat{F}(x_c) = \frac{1}{M} \sum_{i=1}^{M} I(x_i), \tag{4.18}$$

where M < N_h. The indicator semivariogram γ_I is then constructed based on I(x_i), and the CPDF is used for estimating the conditional probabilities; a minimal numerical sketch of these indicator statistics is given after this list.

(iv) Each phase's probability is then normalized by the sum of the probabilities of all the phases. The result is then used to construct a local CPDF.

(v) A random number r, distributed uniformly in [0,1], is generated and used together with the local CPDF in order to determine the reconstructed phase in the visited unconditioned grid block.

(vi) For each unconditioned block, in the sense defined earlier, steps (ii)-(v) of the random path are repeated. The final result is a computational grid, as the model of the material, with the distribution of the phases.
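The following is a minimal sketch, under the definitions of Eqs. (4.16)-(4.18), of how the semivariogram and the indicator CPDF can be estimated for a 1D sequence of data; the function names and the use of a single fixed lag are illustrative assumptions, not the SISIM implementation itself.

```python
import numpy as np

def semivariogram(x, lag):
    """Eq. (4.16): average squared difference between data a fixed lag apart."""
    pairs = x[:-lag] - x[lag:]
    return (pairs ** 2).mean()

def indicator_cpdf(x, x_c):
    """Eqs. (4.17)-(4.18): fraction of data at or below the threshold x_c."""
    return (x <= x_c).mean()

# Hypothetical usage on a synthetic 1D property profile:
x = np.random.rand(1000)
print(semivariogram(x, lag=5), indicator_cpdf(x, x_c=0.5))
```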
The patterns are first filtered using linear filters and, according to some similarity criteria, are grouped in distinct clusters. Then, for each cluster, a prototype pattern is computed that represents some sort of average of all the patterns in that cluster. The rest of the algorithm proceeds, in each grid block, by selecting the most similar prototype and randomly drawing a pattern from that cluster, and is repeated for all the blocks. Clearly, the use of a limited number of filters reduces the computational cost of the algorithm. The method does, however, have its shortcomings, the most important of which is that it uses a limited set of linear filters that may not always convey all the important information and variability of the heterogeneity of the I.

4.5 Results and discussion

The proposed DL method was implemented with three types of materials with different numbers of phases. They include a wide range of microstructures, from a relatively simple binary material to a continuous image of the microstructure of a highly heterogeneous one. All the computations were carried out on a GPU, an NVIDIA GeForce GTX 1660 Ti with a total memory of 38,661 MB and shared memory of 32,670 MB, with 1,536 CUDA cores. The total computation time was about 2 hrs. Since the main idea of the proposed method is to capture the features at small and large scales, one can speed up the computations further by upscaling the realizations.

4.5.1 A two-phase microstructure

As the first example we consider a two-phase functionally-graded material, with one phase being macroscopically connected. Each phase is uniform, but its clusters are spatially distributed. The I, taken from Srividhya et al. (Srividhya et al. (2018)) and shown in Figure 4.2, contains a combination of long- and short-range features. To demonstrate the performance of the proposed DL algorithm, three realizations of the I were generated by the CCSIM, SNESIM, and SISIM algorithms, and are shown in Figure 4.2. Visual inspection indicates that the realizations generated by the CCSIM and SNESIM are far more similar to the I than those produced by the SISIM. Looks can, however, be deceiving and, thus, we need to quantify the differences. Let 1 and 2 denote, respectively, the white and black phases of the material whose image is shown in Figure 4.2. To make a quantitative comparison between the realizations and the I, we computed the two microstructural descriptors described earlier, namely, the chord-length density function p^(1)(z) and the multiple-point connectivity function p^(1)(h; m) (for m = 150), both for phase 1. The results, shown in Figure 4.3, indicate that the chord-length density function does not differentiate between the realizations generated by the CCSIM, the SISIM, and the I. Although the multiple-point connectivity function produces more realistic variations, it is still inadequate for distinguishing the various patterns generated by the three reconstruction algorithms. Furthermore, the range of uncertainty indicated by the chord-length density function is very narrow, which is not the case in reality.

[Figure 4.2: Comparison between the original digital image I of a material with a binary microstructure and the realizations generated by three stochastic reconstruction algorithms (CCSIM, SNESIM, and SISIM).]

Using the proposed DL method, the "distances" between the various realizations were compared.
Recall that each realization is compressed by the DL to just three numbers, which we take them to be its "coordinates" (x;y;z). Figure 4.4 presents the results, in which the original I has been given the coordinates (0; 0; 0) at the center of the plot. The results indicate a large uncertainty space for the SISIM method (shown by green) far from the I, hence showing that the apparent similarity between the realizations generated by the SISIM algorithm and the I is not signicant. Likewise, the results for the SNESIM algorithm indicate reasonable similarity with the I, though they are not well distributed around the I. On the other hand, the results for the CCSIM algorithm indicate an acceptable uncertainty space consistent with Figures 4.2 and 4.3, as they are well scattered all around the I. The computed results for the internal similarity between the realizations, without consid- ering the I, are shown in Table 4.1. They are normalized based on the results for the SNESIM approach. The computed results for the external similarity - between the I and the realizations - are shown in Table 4.2. They indicate, for example, that the realizations generated by the SISIM algorithm are very dissimilar to the I, which is consistent with Figures 4.2 and 4.4. The algorithms are then ranked based on the nal score s, dened by Eq. (4.15), for which the results are presented in Table 4.3. Thus, accuracy of the algorithms is ranked as follow: CCSIM > SNESIM > SISIM. 91 Figure 4.3: Comparison between (top) the computed chord-length density function and (bot- tom) multiple-point connectivity function p(h) for the digital image shown in Fig. 4.2 and its realizations generated by two reconstruction methods: [(a) and (c)] the CCSIM and [(b) and (d)] the SISIM. The black curves are the computed results for the I, while the colored areas indicate the uncertainty space for the realizations generated by the CCSIM and SISIM. 92 0 -0.2 -0.4 -0.6 -0.8 0.2 0.4 0 -0.4 -0.8 0 -0.4 -0.8 0.4 0.8 0 0 -0.2 -0.4 -0.6 -0.8 0.2 0.4 0.2 0.4 0.6 -0.2 -0.4 CCSIM SNESIM SISIM Digital Image Figure 4.4: Uncertainty space representation of the realizations of the digital image I of Fig.4.2, after they were processed by the deep-learning algorithm. The I is shown as a black circle at the center. The others are for the realizations generated by the CCSIM (blue, those located around I), SNESIM (red, those located at the bottom of the right image), and SISIM (green, those located on the left side of the right image) algorithms. 93 Table 4.1: Comparison of the internal similarities I of the three algorithms. Algorithm SNESIM CCSIM SISIM SNESIM 1 1.24 0.41 CCSIM - 1 0.85 SISIM - - 1 Table 4.2: Comparison between the external similarities E of the three algorithms and the I. Algorithm SNESM CCSIM SISIM SNESIM 1 1.52 0.34 CCSIM - 1 0.21 SISIM - - 1 Table 4.3: Comparison between the nal scores s of the three algorithms. Algorithm SNESIM CCSIM SISIM SNESIM 1 0.47 1.24 CCSIM - 1 4.05 SISIM - - 1 4.5.2 A multiphase material As the second example we used the image of a yttria- dispersed ferritic stainless steel sample (Shashanka and Chaira (2016)). The material represents a multiphase medium with stationary properties. We used the CCSIM and SISIM algorithms to generate multiple realizations for the I, examples of which are shown in Fig. 4.5. Visual inspection of the results indicates that the CCSIM method produces higher-quality realizations. 
To make the comparison quantitative, we computed the two microstructural descriptors, the chord-length density and the multiple-point connectivity functions, for the I and its realizations. The results are presented in Fig. 4.6 and indicate, for example, that the chord-length density function generates a very small uncertainty space, whereas the realizations produced by the SISIM algorithm exhibit wide variations and appear to be very different from the I. Similarly, the computed multiple-point connectivity function does not manifest the actual properties of the realizations. For example, the long-range connectivity that we computed for the realizations generated by the SISIM method is not exhibited by Fig. 4.6(d).

[Figure 4.5: Comparison between the original digital image I of a multiphase material and the realizations generated by the two stochastic reconstruction algorithms (CCSIM and SISIM).]

Furthermore, the short-range connectivity of the realizations produced by the CCSIM algorithm is underestimated. As such, both functions are unable to differentiate the variability among the realizations, as well as between them and the I. Using the proposed DL method, this conclusion is quantified and confirmed. Figure 4.7 presents the relative distances between the realizations and the I. Although the realizations generated by the SISIM algorithm manifest significant variability, they are all very far from the I. On the other hand, the CCSIM algorithm produces realizations that not only exhibit reasonable variability, but are also close to the I, as the spatial distribution of the patterns indicates. This is also confirmed by the computed final scores s, compiled in Table 4.4, where the results were normalized by that of the CCSIM.

[Figure 4.6: Comparison between (top) the computed chord-length density function and (bottom) the multiple-point connectivity function p(h) for the image shown in Fig. 4.5 and its realizations generated by two reconstruction methods: [(a) and (c)] the CCSIM and [(b) and (d)] the SISIM. The black curves are the computed results for the I, while the colored areas indicate the uncertainty space for the realizations generated by the CCSIM and SISIM.]

[Figure 4.7: Uncertainty-space representation of the realizations of the digital image of Fig. 4.5 after they were processed by the deep-learning algorithm. The I is shown as a black circle at the center. The other points are for the realizations generated by the CCSIM (blue, located to the right of the middle image) and SISIM (red, located to the left of the middle image) algorithms.]

Table 4.4: Comparison between the final scores s for the two algorithms.

    Algorithm   CCSIM   SISIM
    CCSIM       1       2.68
    SISIM       -       1

4.5.3 A continuous microstructure

The third image, shown in Fig. 4.8, represents the image of a low-carbon steel whose microstructure consists mostly of ferrite with darker pearlite regions around the ferrite grains ("DoITPoMS - Micrograph and record," n.d.; R. F. Cochrane (2002)). Because the image is on a gray scale, its reconstruction in its current form is difficult. Despite this, we used the CCSIM and FILTERSIM algorithms to generate realizations of the image. The SNESIM and SISIM approaches are not applicable to such images. The results are also shown in Fig. 4.8.
The realizations generated by the FILTERSIM algorithm are not similar to the I, as they contain a significant number of artifacts. For example, the spatial connectivity of the black phase is not reproduced, and the spatial distribution of the gray phase does not match the given features in the I. In addition to the distinct visual differences between the generated realizations, we also quantified the similarity and the uncertainty space using the proposed DL method. The final scores s for the two algorithms were computed and are listed in Table 4.5, in which the results were normalized by that of the CCSIM, while the results for the uncertainty space are shown in Fig. 4.9. Consistent with Fig. 4.8, both sets of results indicate that drastic differences exist between the realizations generated by the FILTERSIM algorithm and the I. Both the CCSIM and FILTERSIM algorithms reproduce a similar uncertainty space between the realizations, but their external uncertainty spaces are not comparable.

[Figure 4.8: Comparison between the original digital image I of a material with a continuous microstructure and the realizations generated by two stochastic reconstruction algorithms (CCSIM and FILTERSIM).]

[Figure 4.9: Uncertainty-space representation of the realizations of the digital image I of Fig. 4.8 after they were processed by the deep-learning algorithm. The I is shown as a black circle at the center. The other points are for the realizations generated by the CCSIM (blue, located on the lower-left side of the right image) and FILTERSIM (red, located on the upper-right side of the right image) algorithms.]

Table 4.5: The final scores for the two algorithms.

    Algorithm   CCSIM   FILTERSIM
    CCSIM       1       2.54
    FILTERSIM   -       1

4.6 Summary

Despite the development of several reconstruction methods for modeling of heterogeneous materials over the past two decades, the question of their validity for accurate representation of materials has remained open. At the same time, with advances in instrumentation, accurate 2D and even 3D images of heterogeneous materials are becoming increasingly available. This chapter presented a method based on machine learning for evaluating stochastic approaches for reconstruction of materials and making a comprehensive quantitative comparison between the realizations generated by them. The method reduces the dimensionality of the realizations using a deep-learning algorithm by coarsening the given image(s). Two criteria for evaluating the similarities between the realizations, as well as between them and the original digitized image, were introduced. First, for a reconstruction method to be effective, the differences between the realizations that it produces must be reasonably broad, so that they faithfully represent the possible spatial variations in the microstructure of the materials and their plausible representations. We refer to this as the internal uncertainty space. At the same time, the realizations must remain closely similar to the I of the materials. Thus, in a similar fashion, a second criterion, the external uncertainty space, is defined, by which the similarity between each realization and the I is quantified. Finally, the ratio of the two uncertainty indices associated with them is considered as the final score of a stochastic algorithm, which provides a quantitative basis for comparing various approaches.
The proposed method in this chapter was tested with images of various heterogeneous materials in order to evaluate four stochastic reconstruction algorithms.

Chapter 5

Phase transitions, percolation, fracture of materials, and deep learning¹

¹This chapter was published as a paper in Physical Review E 102, 011001 (2020).

5.1 Introduction

Two important problems in physics, applied physics, and materials science, as well as engineering, are fracture and failure of materials, and the percolation phase transition. Nucleation and propagation of fractures (Lawn (1993); Fineberg and Marder (1999); Sahimi (2003b)) play a fundamental role in many systems of industrial importance, ranging from the safety of nuclear reactors (Andresen et al. (1990)) and aircraft wings (Paricharak et al. (2019)) to increasing production of oil reservoirs by hydraulic fracturing (Kwok et al. (2020)), cracking of disordered solids such as alloys (Cheung et al. (1994)), ceramics (Hasselman (1969)), superconductors (Chen et al. (2019a)), and glasses (Fan et al. (2014)), as well as earthquakes. The percolation problem (Stauffer and Aharony (1994); Sahimi (1994); Saberi (2015)) is conceptually simple. A randomly selected fraction p of bonds or sites in a lattice are intact, while the rest are removed or blocked. Percolation represents the simplest fundamental model in statistical physics that exhibits a phase transition, manifested by the formation of a sample-spanning cluster (SSC) of intact bonds or sites in a lattice at the percolation threshold p_c, i.e., the smallest value of p at which the SSC appears for the first time. The most recent applications of percolation theory include mobile ad hoc networks (Mohammadi et al. (2009)), disruption of microbial communications (Silva et al. (2019)), cooperative mutational effects in colorectal tumorigenesis (Shin et al. (2017)), molecular motors (Alvarado et al. (2013)), protein sequence space (Buchholz et al. (2017)), and many more.

The two problems, characterized by special points, i.e., p_c and the incipient fracture point (IFP) that signals the formation of an SSC of microcracks, are not unrelated. The Poisson ratio of elastic percolation networks takes on a universal value at p_c (Bergman and Kantor (1984); Schwartz et al. (1985); Arbabi and Sahimi (1988)), just as it does (Sahimi and Arbabi (1992)) at the IFP. The early stages of brittle fracture resemble percolation (Sahimi and Arbabi (1993); Malakhovsky and Michels (2006)), and the distribution of clusters of microcracks is qualitatively similar to that in percolation (Malakhovsky and Michels (2006)). The limit of infinite disorder in models of fracture propagation represents a percolation process (Roux et al. (1988)). The approach to the IFP may represent a first- (Boulbitch and Korzhenevskii (2020)) or second-order phase transition (Moreno et al. (2000)), just as the percolation transition is typically second order, although certain variations of it, such as bootstrap percolation, can be of the first-order type (Chalupa et al. (1979); Aizenman and Lebowitz (1988)). In fact, it was suggested long ago that bootstrap percolation may be thought of as a model of quasi-static fracture propagation (Sahimi and Ray (1991)) in certain limits. The question that we address in this chapter is as follows.
Given a limited amount of data for a physical property of a disordered solid that is undergoing fracturing, but is far from the IFP, such as its elastic moduli as a function of the extent of microcracking, can one predict the IFP and the elastic moduli as the IFP is approached? Likewise, given, for example, a limited amount of data for a flow or transport property of a porous medium far from its critical porosity φ_c or the percolation threshold, can one predict φ_c and the porosity dependence of the property? In the language of lattice models of percolation and fracture propagation (Sahimi and Goddard (1986); De Arcangelis et al. (1989)), if q is the fraction of bonds or sites removed from a percolating lattice, or the fraction of microcracks, with q being far from the IFP or (1 - p_c), can the percolation and physical properties be predicted all the way to p_c = 1 - q_c and the IFP? Although one can write down a Hamiltonian for site (and bond) percolation,

$$\mathcal{Z} = \sum_{\{c\}} p^{\,n_s^c} (1 - p)^{N - n_s^c},$$

where n_s^c is the number of occupied sites in a cluster labeled by c, and N is the total number of sites, the Hamiltonian is typically used to study the behavior of the system close to the transition point and to estimate the scaling exponents. To our knowledge, there is currently no theoretical method that can use limited data for a region far from the transition point, p_c or the IFP, and predict the physical properties of percolation and fracturing systems all the way from p = 1, or a perfectly unfractured medium, to p_c or the IFP, including the location of the transition point. Thus, we aim to predict the physical properties near that point, as well as the location of the transition point itself. We may refer to this as machine learning of phases of matter, focusing on predicting phase transitions in materials with supervised or unsupervised learning (Zhang et al. (2019a)).

5.2 Methodology

We present in this chapter an efficient deep neural network (DNN) that provides highly accurate predictions for such problems. Deep neural networks have proven to be powerful tools for extracting important information and patterns in high-dimensional data. The multilayer structure of the nonlinear elements in the DNNs allows regularizing a problem adaptively and developing complex relationships between the input data and the output, without extracting the latter in an analytical form. Deep neural networks have numerous applications, from enhancing images of porous materials (see Chapter 2) and linking their flow and transport properties to their morphology (see Chapter 3 and Wu et al. (2019)), to image classification (Nogueira et al. (2017); Zuo et al. (2015)), object (Uijlings et al. (2013)) and text detection (Xu and Su (2015)), and many other applications (Levine et al. (2019); Goy et al. (2018); Iten et al. (2020); Rogal et al. (2019); Snyder et al. (2012); Li et al. (2019); Liu et al. (2019c); Chua et al. (2019); Zhang et al. (2019a)). Zhang et al. (Zhang et al. (2019a)) studied the percolation transition and the XY model on two-dimensional lattices. First, they generated data for various values of p above, below, and near p_c. The dimension of the data was then reduced, and an unsupervised machine-learning (ML) algorithm, t-distributed stochastic neighbor embedding, was used to cluster the data into subsets corresponding to p < p_c, p > p_c, and p ≈ p_c, from which they identified p_c. Next, they used supervised ML methods, namely, convolutional and regular neural networks, to study the same problem.
This is, however, completely different from the problem that we study in this chapter, since we use only limited data far from and above p_c, which is what we may encounter in practice, as in porous media or composite solids. To provide data for the training, as well as for testing the accuracy of the DNN, we used Monte Carlo simulations to compute the percolation probability P(p), the fraction of intact sites in the SSC, from p = 1 to p = p_c. We divided the interval [p_c, 1] into n segments with n = (1 - p_c)/Δp and Δp = 0.01, so that n = 100(1 - p_c). For each p we computed P(p) and, therefore, obtained a sequence of P(p) values. We also calculated the bulk and shear moduli, and hence the Poisson ratio, of an elastic percolation network in which both central and bond-bending forces are present. The elastic energy E of the model is given by (Keating (1966); Kantor and Webman (1984); Feng et al. (1984); Feng and Sahimi (1985))

$$E = \frac{\alpha}{2} \sum_{\{ij\}} g_{ij} \left[(\mathbf{u}_i - \mathbf{u}_j) \cdot \mathbf{R}_{ij}\right]^2 + \frac{\gamma}{2} \sum_{\{jik\}} g_{ij} g_{ik} \left[(\mathbf{u}_i - \mathbf{u}_j) \cdot \mathbf{R}_{ij} - (\mathbf{u}_i - \mathbf{u}_k) \cdot \mathbf{R}_{ik}\right]^2. \tag{5.1}$$

Here, u_i is the displacement of site i, R_ij a unit vector from i to j, g_ij a random variable that is equal to either 1 or 0 with probabilities p and 1 - p, respectively, and α and γ are the two force constants. In addition, {jik} indicates that the sum is over all triplets in which bonds ji and ik form an angle whose vertex is at i. The elastic moduli were computed at p_i = 1 - iΔp, with i = 1, ..., n. We then used only a small portion of the computed properties near p = 1 to train the DNN, and used the rest to check the accuracy of the predictions by the DNN. The internal parameters of the DNN, the weights and biases, are optimized in order to minimize the objective (loss) function, defined as the mean square error between the predictions and the actual data. The main category of deep-learning models consists of deep feedforward networks. Since we use a sequence of data, either P(p_i) or the elastic moduli, corresponding to various values of p = p_i, we utilize a recurrent neural network (RNN), one in which the connections between the nodes form a directed graph along a temporal sequence, which allows it to exhibit temporal dynamic behavior by using its internal state (memory) to process variable-length sequences of input, where p = p_i plays the role of time. Convolutional neural networks (CNNs) are not efficient for our purpose, since we deal with a sequence of data, whereas the CNNs are most efficient when the data are in the form of an image. For the study in this chapter we have used an RNN. A recurrent neural network, unlike a regular neural network, has feedback connections and can process not only single data points but also entire sequences of data by replacing the regular neurons with memory blocks (Hochreiter and Schmidhuber (1997); Pascanu et al. (2013)), and is known to be most efficient when the data are in the form of a series. The memory blocks contain an operator, a sigmoid activation function, which controls the state and the blocks' output and encompasses a memory of the recent data sequences. Due to the complexity of the system, however, using an RNN leads to vanishing gradients for the back-propagating errors for multiple values of p_i. Thus, we use a particular type of RNN, called a long short-term memory (LSTM) network. The LSTM network is described in detail in Chapter 1 (Sec. 1.3.1).
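As an illustration of the sequence-learning setup, the following is a minimal sketch of an LSTM regressor of the kind described above, written in PyTorch as one possible framework. The hidden size of 500 and the use of the Adam optimizer match the settings reported below; the window length and the batch of random training pairs are placeholders.

```python
import torch
import torch.nn as nn

class SequencePredictor(nn.Module):
    """LSTM that reads a sequence of property values, e.g. P(p_1), ..., P(p_k),
    and predicts the value at the next occupation fraction p_{k+1}."""
    def __init__(self, hidden=500):
        super().__init__()
        self.lstm = nn.LSTM(input_size=1, hidden_size=hidden, batch_first=True)
        self.head = nn.Linear(hidden, 1)

    def forward(self, seq):              # seq: (batch, steps, 1)
        out, _ = self.lstm(seq)
        return self.head(out[:, -1, :])  # prediction for the next step

model = SequencePredictor()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

# Hypothetical training pair: windows of P(p) values near p = 1 and their targets
window = torch.rand(8, 20, 1)   # 8 windows, 20 steps each
target = torch.rand(8, 1)
loss = loss_fn(model(window), target)
optimizer.zero_grad()
loss.backward()
optimizer.step()
```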
The details of the computations are as follows. We carried out the computation of P(p) for site percolation in the square lattice (p_c ≈ 0.59) with a size of 1000², and in the simple-cubic lattice (p_c ≈ 0.31) with a size of 100³. For each p = p_i the results were averaged over 250 realizations. As for the calculation of the bulk and shear moduli, the elastic energy E was minimized with respect to u_i, and the resulting set of linear equations for the nodal displacements was solved by the adaptive accelerated Jacobi-conjugate gradient method. Networks of size 20³ were utilized, and the elastic moduli were computed for numerous values of p_i, with the results averaged over 50 realizations. An LSTM neural network with 500 hidden cells was used. The Adam method (Kingma and Ba (2015)) was used for minimizing the objective function and optimizing the weights and biases. All the computations were carried out with a Dell desktop with a speed of 3 GHz. The entire computations with the DNN for each case took only a few CPU minutes.

5.3 Results

Figure 5.1 presents all the data for the percolation probability P(p) in two dimensions, the portion that was used for the training, and the predictions of the DNN, while Fig. 5.2 depicts the same in three dimensions. The most remarkable aspect of these results is not the high accuracy of the predictions, but rather the fact that the DNN correctly predicts the sharp downward decline of P(p) near p_c, where P(p) ~ (p - p_c)^β, with β = 5/36 in 2D and β ≈ 0.41 in 3D, so that the slope dP/dp is infinite as p → p_c. In other words, although there is no evidence for the sharp turn in the portion of the data that was used for the training, the DNN still predicts it correctly.

[Figure 5.1: Comparison of the computed percolation probability P(p) in the square lattice with the DNN predictions.]

Figure 5.3 presents the computed data for the bulk modulus in three dimensions, the portion that was used for the training, and the predictions of the DNN. Once again, only a small portion of the data near p = 1 was used for the training. The agreement between the predictions and the data is excellent. Figure 5.4 presents the computed ratio K/μ, where μ is the shear modulus, for two values of γ/α, the parameters of the elastic Hamiltonian E. As p_c is approached, the ratio K/μ flows to the same universal fixed point for both fracture propagation (Sahimi and Arbabi (1992)) and percolation (Bergman and Kantor (1984); Schwartz et al. (1985); Arbabi and Sahimi (1988)). Thus, the Poisson ratio ν = [3(K/μ) - 2]/[6(K/μ) + 2] also flows to a universal fixed point. The implication is that the LSTM neural network predicts the physical properties of disordered and fracturing materials, given a small set of data far from p_c or the IFP. We note that there are two types of possible errors in this type of calculation. One is the error in the data for training, which has to do with the measurement or computation that generated the training data. The second type is, as with all ML algorithms, due to the optimization of the weights and biases. Both are very small. The training data were obtained using large lattices and a large number of realizations. The errors in the weights and biases are also very small, as in any ML algorithm, because the optimization is repeated multiple times in order to ensure that the true minimum of the MSE has been reached.

[Figure 5.2: Comparison of the computed percolation probability P(p) in the simple-cubic lattice with the DNN predictions.]

[Figure 5.3: Comparison of the computed bulk modulus K of a simple-cubic lattice with the DNN predictions.]
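The percolation data of the kind shown in Figs. 5.1 and 5.2 can be generated with a short Monte Carlo routine; the following sketch, for site percolation on the square lattice, uses scipy.ndimage.label to identify clusters and flags a spanning cluster as one touching both the top and bottom rows. The lattice size and number of realizations are deliberately small, and normalizing P(p) by the total number of sites is one possible convention.

```python
import numpy as np
from scipy.ndimage import label

def percolation_probability(p, size=200, n_real=50):
    """Fraction of sites belonging to a sample-spanning cluster, averaged
    over n_real random realizations of site percolation on a square lattice."""
    frac = []
    for _ in range(n_real):
        occupied = np.random.rand(size, size) < p
        labels, _ = label(occupied)            # 4-connected clusters
        spanning = np.intersect1d(labels[0, :], labels[-1, :])
        spanning = spanning[spanning > 0]      # ignore the empty background
        mask = np.isin(labels, spanning)
        frac.append(mask.sum() / occupied.size)
    return np.mean(frac)

# A P(p) sequence from p = 1 down toward p_c ~ 0.59, as in the text:
for p in np.arange(1.0, 0.58, -0.05):
    print(round(p, 2), percolation_probability(p))
```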
[Figure 5.4: Comparison of the computed ratio of the bulk modulus K and shear modulus μ of a simple-cubic lattice with the DNN predictions. α and γ are the force constants.]

5.4 Summary

In this chapter we presented a DNN for predicting the percolation and physical properties of two- and three-dimensional systems. All the predictions are in excellent agreement with the data, even though only a small portion of the data was used in the training of the DNN. In particular, the DNN correctly predicts the phase transition at p_c, even though the training data were for the state of the system far from p_c. This opens up the possibility of using the DNN for predicting the physical properties of many types of materials that may undergo a phase transformation, but for which the available data are far from the transition point.

Chapter 6

Machine Learning Algorithm for Identifying and Predicting Mutations in Amyotrophic Lateral Sclerosis

6.1 Introduction

Amyotrophic lateral sclerosis (ALS) is a devastating adult-onset neurodegenerative disease, characterized by a progressive loss of motor neurons and denervation of muscle fibers, which results in muscle weakness and paralysis. With an incidence of 2.7 cases per 100,000 individuals (Logroscino et al. (2010)), it is a common neurodegenerative disorder and is more prevalent in the later years of life. In the absence of a medical cure, average life expectancy is between 2 and 3 years after diagnosis. Only two drugs are currently available, namely, riluzole (Miller et al. (2012)) and radicava (Rothstein (2017)), which slow the progression of the disease moderately. ALS pathology manifests a specific degeneration of the upper and lower motor neurons, leaving many other cell types unimpaired. The disease is multifactorial, as multiple processes play a role, including protein aggregation, excitotoxicity, and RNA-processing impairments (Robberecht and Philips (2013)). Based on the knowledge of such processes, a broad range of therapeutic strategies have been attempted in rodent models of ALS but, unfortunately, positive results have not been frequently reproduced in human trials (Goyal and Mozaffar (2014)). Approximately 90 percent of ALS cases are sporadic (sALS), with the remaining 10 percent of the cases having a family member with the disease (fALS) and, thus, likely having inherited a disease-causing mutation. ALS is mostly viewed as a monogenic disease, with SOD1 mutations accounting for approximately 10-20 percent of all familial cases. More than 100 different mutations in the SOD1 gene cause ALS with varying severity and penetrance (Andersen and Al-Chalabi (2011)). The genetic variation that explains the largest number of ALS cases is the GGGGCC repeat expansion in gene chromosome 9 open reading frame 72 (C9ORF72), also known as the C9ORF72 hexanucleotide repeat expansion, as the repeat expansion causes ALS in about 23.5 percent of fALS cases. The intronic hexanucleotide repeat expansion in C9ORF72 is also the most common cause of ALS and frontotemporal dementia (FTD), as it accounts for over 50 percent of ALS in northern Europe and 10 percent of cases worldwide (Renton et al. (2014)). About 70 percent of fALS and 15 percent of sALS cases can be explained by known ALS mutations (Renton et al. (2014)). In addition, the heritability of sALS due to common variants is predicted to be about 18 percent (Van Rheenen et al. (2016)). This implies a strong basis for further genetic investigation into ALS, both in fALS and in sALS.
Interestingly, approximately 30 genes have been associated with ALS, and the mutations can cause the disease by gain-of-function (GOF), as well as by loss-of-function (LOF), mechanisms, and contribute to multiple pathways. Moreover, as fALS and sALS are clinically indistinguishable, common disease mechanisms may be expected. The clinical presentations are, however, quite variable between patients, including large differences in the age of onset, from the early 20s to the 70s, the site of onset, limbs vs. bulbar, and the progression rate, with survival ranging from less than 6 months to the 10 percent of patients surviving longer than 10 years. Some mutations are correlated with particular ALS phenotypes; e.g., mutations in SETX frequently cause young-onset ALS (Avemaria et al. (2011); Chia et al. (2018); DeJesus-Hernandez et al. (2011); Elden et al. (2010); Renton et al. (2011); Shi et al. (2018)). As ALS is a multifactorial disease in which many biological mechanisms contribute to motor neuron death, and because it is not yet clear which mechanisms are crucial to effectively treat (specific) ALS patients, it is important to be able to study the pathophysiology induced by each new ALS mutation. In this chapter we put forth a hypothesis and describe a machine-learning (ML) algorithm for testing it. We hypothesize that many unexplained cases of ALS are caused by genetic mutations, and that identification of the mutations is possible and predictable. Clearly, if the hypothesis is true, its implications would be important for the development of novel therapeutic strategies, especially as gene-correction or gene-editing strategies are now in advanced stages for patients with motor neuron disease. The ML algorithm that we propose has not been developed, or even suggested, previously for ALS or any other disease with a similar genetic architecture. Algorithms that predict the "deleteriousness" of a coding genetic variant, such as SiFT and PolyPhen scores, which are referred to as Variant Effect Predictors (VEPs), may be excellent predictors of the effect of a coding variant on the encoded protein, but are poor predictors of whether a variant causes ALS.

6.2 The data

Variants were curated from the collection held by the Amyotrophic Lateral Sclerosis Online database. Each variant entry included the original article in which the method for its discovery is described, which was utilized to filter the entries and keep those whose described variants were confirmed to have been inherited with full segregation, i.e., those for which both the diagnosis of the disease and the sequencing of the variant were confirmed in at least two consecutive generations in the proband's family. Additional papers were identified and obtained from PubMed and other sources. The results from both searches were then examined thoroughly. All the collected variants were non-synonymous single-nucleotide polymorphisms (non-SNPs), located in the exonic regions. Overall, twenty-two variants were collected that matched the criteria for high likelihood of causality that we set. Control variants that were designated as "non-ALS" were curated from supercentenarian patients in previous sequencing data. Variants were filtered out in multiple increments, and the filtered and non-filtered groups were statistically compared to each other to see whether there are significant differences between their mean allele frequencies and quality-control scores.
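The statistical comparison mentioned above can be done, for instance, with a two-sample t-test on the mean allele frequencies of the filtered and non-filtered groups; the sketch below is an illustration with hypothetical arrays, not the chapter's actual analysis pipeline.

```python
import numpy as np
from scipy.stats import ttest_ind

# Hypothetical allele-frequency samples for the filtered and non-filtered groups
filtered_af = np.random.beta(1, 50, size=500)      # rarer variants
nonfiltered_af = np.random.beta(2, 20, size=5000)  # more common variants

t_stat, p_value = ttest_ind(filtered_af, nonfiltered_af, equal_var=False)
print(f"Welch t = {t_stat:.2f}, p = {p_value:.3g}")
```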
Initially, the data were annotated using the ANNOVAR software, available on the USC high-performance computing network, which provides each variant with pathogenicity scores from multiple algorithmic software packages, such as FATHMM, PolyPhen, MetaSVM, and others, when available. Additionally, variants were also annotated using an R script with REVEL, whose data file was downloaded from an open source (Ioannidis et al. (2016)). The initial dataset of 359,292 variants from supercentenarian individuals was filtered into a lower-frequency group (with frequency less than 0.05), which comprised 18,639 cases, and a higher-frequency one with the remaining 340,550 cases, in order to remove the high-frequency variants that are common in the population. The lower-frequency group filtered from the overall set still contained variants whose population frequency was higher than what is expected for ALS variants. The primary reason that such variants were still included is to have an adequate number of data points for the training and testing sets. Next, the lower-frequency variants were removed if they were synonymous, intronic, or insertion-deletions (8,369 variants), leaving only entries with the SNPs (10,271 variants), to match the characteristics of the ALS variants and to be properly annotated. The remaining non-ALS variants served as the basis for deriving five differing control datasets. One dataset, referred to as A, contained variants with the top 5000th score and above, including repeated scores (4,985 variants). Another one, dataset B, consisted of the 90th percentile and above (1,473 variants), while dataset C included 958 variants that were present within at least two different individuals. Two datasets, D and E, contained variants from multiple individuals that overlapped with the top-5000th and top-90th-percentile sets (622 and 237 variants, respectively). Before utilizing them in the training of the machine-learning algorithm and testing its accuracy, all variants with incomplete pathogenicity scores were removed, as the data could not be used with the algorithm if they did not contain scores for all the variables. We then added all the entries from the ALS set to each of the five non-ALS datasets, in order to generate five different test and training sets, each of which contained both types of data entries. The resulting five datasets, ALS and non-ALS combined, were annotated using the ANNOVAR software, which provides each variant with pathogenicity scores from multiple algorithmic software packages, including FATHMM, PolyPhen, MetaSVM, and others. Additionally, variants were also annotated using an R script with REVEL and MCAP v.1.3, whose data files were obtained from their respective home websites. All variants with incomplete pathogenicity scores were removed, as they could not be used within the machine-learning system if they did not contain scores for all the variables.
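A sketch of the kind of filtering described above is shown below, using pandas with entirely hypothetical column names (the actual ANNOVAR output fields differ):

```python
import numpy as np
import pandas as pd

# Hypothetical annotated variant table; column names are illustrative only
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "allele_freq": rng.random(1000),
    "func": rng.choice(["exonic", "intronic"], 1000),
    "exonic_func": rng.choice(["nonsynonymous SNV", "synonymous SNV"], 1000),
    "FATHMM_score": rng.random(1000),
    "REVEL_score": np.where(rng.random(1000) < 0.9, rng.random(1000), np.nan),
})

# Keep the lower-frequency variants (population frequency below 0.05)
low_freq = df[df["allele_freq"] < 0.05]

# Keep only exonic, non-synonymous SNPs, mirroring the filters in the text
snps = low_freq[(low_freq["func"] == "exonic")
                & (low_freq["exonic_func"] == "nonsynonymous SNV")]

# Drop variants missing any pathogenicity score
score_cols = [c for c in snps.columns if c.endswith("_score")]
complete = snps.dropna(subset=score_cols)
print(len(df), len(low_freq), len(snps), len(complete))
```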
6.3 Methodology

We use a highly accurate classification technique that, based on their genetic information, distinguishes the ALS cases from the healthy patients. The classifier provides a deep understanding of the changes in the variables that may cause the familial form of ALS. We should, however, point out that while there are extensive data for non-ALS cases, the ALS data are very limited and, therefore, the majority class, the non-ALS cases, is large, whereas the minority one, the ALS class, is very small. When this happens in a classification problem, one has class imbalance, a well-known issue in machine learning. In such a case effective classification becomes a highly complex problem, because the traditional algorithms ignore or underweight the critical minority class, which is why most of the statistical learning methods have a deceptively high accuracy only when the majority class has high accuracy (Krawczyk et al. (2016)); this occurs because of the way some of such methods are designed. In imbalanced classification problems, however, the aim is to have high accuracy for the minority class. This implies that while having an overall high classification accuracy is important, it is even more important to achieve the highest precision for classifying the minority class. The various techniques that have been used to address imbalanced class problems are based on either data sampling or boosting algorithms. In the first group, the class distribution is balanced by either randomly removing data from the majority class, which is referred to as random under-sampling (RUS), or by adding data to the minority class, by duplicating the already-available data or by generating synthetic data. Removing data means, however, losing valuable information, whereas adding data to the minority class may cause overfitting and increase the training time. The simplest classification scheme is one in which a data point is picked and classified at random. A classifier is called weak if it performs only slightly better than a completely random classifier. Boosting algorithms improve the classification predictions by training a sequence of weak models, each of which compensates for the weaknesses of its predecessors. They are based on reweighting of the data, or resampling, or both. Those that are based on reweighting of the data assign higher weights to misclassified samples, which in most cases are from the minority class. Next, the data with modified weights are fed to the base learner, and classification is again carried out. This is repeated iteratively, during which new weights are assigned, with the ultimate goal being to classify the input data correctly (Chen (2017); Kozlovskaia and Zaytsev (2018); Seiffert et al. (2010)). Among the most common boosting methods that have been developed based on the RUS technique is the RUSBoost algorithm (Galar et al. (2013); Krawczyk et al. (2016); Seiffert et al. (2008, 2010)), which randomly adjusts the class distribution in the dataset until a proper class distribution is generated. The algorithm is faster and simpler than the boosting methods that are based on over-sampling and, relative to other boosting algorithms, it requires a smaller new training dataset (Seiffert et al. (2010)). Therefore, we utilize the RUSBoost algorithm, which was explained in Chapter 1 (Section 1.2.2), in the study of this chapter. In the present problem we have only two possible classes of data: 1, the ALS cases, and 0, the non-ALS cases.

6.3.1 Performance Metrics

To evaluate the accuracy of the RUSBoost algorithm we utilize what is called the confusion, or error, matrix, one of the most widely used performance metrics for the boosting methods, which determines the accuracy of the training of the weak learners. The confusion matrix evaluates separately the training accuracy of the two classes, as well as a combination of the two (Chen (2017)). The classifier is applied to a set of examples for testing, after which the confusion matrix is constructed. The ALS data that are classified correctly are placed into the true positive (TP) cell of the matrix, while the incorrectly classified portion falls into the false negative (FN) cell.
6.3.1 Performance Metrics

To evaluate the accuracy of the RUSBoost algorithm we utilize what is called the confusion, or error, matrix, one of the most widely used performance metrics for the boosting methods, which determines the accuracy of training of the weak learners. The confusion matrix evaluates separately the training accuracy of the two classes, as well as a combination of the two (Chen (2017)). The classifier is applied to a set of examples for the testing, after which the confusion matrix is constructed. The ALS data that are classified correctly are placed into the true positive (TP) cell of the matrix, while the incorrectly classified portion falls into the false negative (FN) cell. Similarly, the correctly classified non-ALS data are labeled true negative (TN), while the incorrectly classified part of this class of data constitutes the false positive (FP) group. Table 6.1 presents the layout for input data that have two classes of output. Each column represents a predicted class, while each row indicates instances in an actual class, referred to as the ground truth.

Table 6.1: The confusion matrix

                              Positive prediction (ALS)   Negative prediction (non-ALS)
  Positive class (ALS)        True Positive (TP)          False Negative (FN)
  Negative class (non-ALS)    False Positive (FP)         True Negative (TN)

Based on the predictions for the test data contained in the confusion matrix, the accuracy of the classifier is evaluated. The overall accuracy $A$ of the classifier is defined by

$$A = \frac{TP + TN}{TP + FP + TN + FN}. \qquad (6.1)$$

Instead of the overall accuracy, one may use another quantity called the recall, or true positive rate (TPR), $R^+$, which is a measure of the ALS cases that are classified correctly. It is defined by

$$R^+ = \frac{TP}{TP + FN}. \qquad (6.2)$$

Another quantity that is used is the precision $P$, which is indicative of the fraction of the true ALS cases among those classified as ALS, and is defined by

$$P = \frac{TP}{TP + FP}. \qquad (6.3)$$

As pointed out earlier, when analyzing imbalanced datasets, it is important to have not only a high overall accuracy $A$ for the classifier, but also a high TPR $R^+$ for the minority class. Another measure of an accurate classifier is the so-called F-measure $F_m$,

$$F_m = \frac{(1 + \beta^2)\, R^+ P}{\beta^2 R^+ + P}, \qquad (6.4)$$

where $\beta$ is a tunable parameter that measures the relative importance of $R^+$ and $P$. Thus, $F_m$ indicates that a good classifier should have both high $R^+$ and high $P$. In our study we assumed that $\beta = 0.2$, which implies that

$$F_m = \frac{1.04\, R^+ P}{0.04\, R^+ + P}. \qquad (6.5)$$

Varying $\beta$ enables one to place more emphasis on either the TPR or the precision. It is, of course, crucial to classify the ALS data correctly. Thus, the choice of $\beta = 0.2$ was intended to attribute more weight to the ALS data, and to ensure that their accurate classification is more important.
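Given the predictions of a trained classifier, the metrics above follow directly from the confusion-matrix counts. A minimal sketch, assuming the y_test and y_pred arrays of the previous sketch:

```python
# Computing Eqs. (6.1)-(6.5) from the confusion matrix; assumes the
# y_test/y_pred arrays from the RUSBoost sketch above.
from sklearn.metrics import confusion_matrix

tn, fp, fn, tp = confusion_matrix(y_test, y_pred).ravel()

A  = (tp + tn) / (tp + fp + tn + fn)    # overall accuracy, Eq. (6.1)
Rp = tp / (tp + fn)                     # recall / true positive rate, Eq. (6.2)
P  = tp / (tp + fp)                     # precision, Eq. (6.3)

beta = 0.2                              # weights R+ more heavily than P
Fm = (1 + beta**2) * Rp * P / (beta**2 * Rp + P)   # F-measure, Eqs. (6.4)-(6.5)
print(A, Rp, P, Fm)
```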
6.4 Results

All the datasets were compared with each other based on the same 24 variables. The training data for the RUSBoost algorithm contained nearly 80 percent of the datasets, while the rest were used for testing the trained ML algorithm. The three most important parameters based on the confusion matrix for training and testing the five datasets are presented in Table 6.2. Since it is crucial to correctly classify the ALS data - the minority class - the true positive rate (TPR) $R^+$ is the most important parameter in the study, because it indicates that the trained ML algorithm better classifies the minority class, whereas $A$ is indicative of the overall accuracy of the classifier.

Table 6.2: Classifier evaluation on training and test data

               A       R+    P       F-measure
  Dataset A    0.98    1     0.3     0.917
  Dataset B    0.972   1     0.5     0.962
  Dataset C    0.965   1     0.545   0.968
  Dataset D    1       1     1       1
  Dataset E    0.928   1     0.666   0.981

To better evaluate the performance of the RUSBoost, an analysis of the effect of the size of the datasets A - E on the accuracy of the trained ML algorithm for classifying the majority class (non-ALS) was carried out for all five datasets. Thus, we also resized the test dataset to be around 20 percent of the main dataset. Figure 6.1 presents the effect of the size of the non-ALS dataset on the three measures of the accuracy of the predictions listed in Table 6.2 and defined in section 6.3.1. We first note that the trends for dataset E are slightly different from those of the other four datasets. This is, however, due to the size of the dataset, which is smaller than that of the other four.

Figure 6.1: The effect of the non-ALS data size on percent significant ($A$, $R^+$, and $F_m$) for datasets A - E.

The accuracy $R^+$ of the predictions for the ALS variants is 100 percent, which is remarkable. The overall accuracy of the algorithm, $A$, is between 92.8 and 98 percent. Note that since the algorithm classifies the ALS variants with 100 percent accuracy, the overall accuracy is essentially the accuracy of the predictions for the non-ALS variants, which is also excellent. The general trends in Figure 6.1 suggest that at the beginning the rate of increase in the accuracy is very steep, but it slows down when the size of the datasets becomes relatively large. By increasing the size of the non-ALS datasets, the classifier's overall accuracy $A$ and the F-measure $F_m$ both increase up to a certain data size. With further increase in the dataset size, however, $F_m$ and $A$ for the trained ML algorithm level off, and may even decrease slightly. Thus, it is crucial to select the size of the datasets carefully, in order to have the highest overall accuracy $A$, since it represents the accuracy of classifying the majority class, the non-ALS variants.

To identify which of the 24 variables have the most effect on the classification of the two groups, the ALS and non-ALS variants, we used two methods. First, we used Parallel Coordinates Plots (PCPs), which are an ideal tool for plotting multivariate data, as they allow comparing the many variables together and seeing the possible relationships between them. In a PCP each variable has its own axis, and all the axes are parallel to each other. Each axis may have a distinct scale, because each variable may have a different unit of measurement. Alternatively, all the axes can be normalized to keep the scales on all the axes uniform. Values of the variables are plotted as a series of lines that are connected across all the axes, implying that each line is a collection of points, one placed on each axis, that have all been connected together. The plot is made of the median values for each variable. Figure 6.2 shows how the RUSBoost classifier has distinguished the ALS from the non-ALS variants during the training process. We interpret Figure 6.2 as implying that the mutation assessors, FATHMM score, PROVEAN score, Vest3 score, CADD phred, DANN score, metaSVM score, metaLR, and REVEL are the most important variables for separating the two classes.

Figure 6.2: Parallel coordinates plot
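For readers who wish to reproduce such a plot, pandas provides a parallel-coordinates routine directly. The sketch below plots the per-class median of each normalized variable, assuming the X and y arrays of the earlier sketches; the column names are hypothetical.

```python
# Parallel-coordinates plot of per-class medians of the normalized
# variables; X and y are assumed from the earlier RUSBoost sketch.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from pandas.plotting import parallel_coordinates

df = pd.DataFrame(X, columns=[f"var{i}" for i in range(1, 25)])
df = (df - df.min()) / (df.max() - df.min())     # normalize every axis to [0, 1]
df["class"] = np.where(y == 1, "ALS", "non-ALS")

medians = df.groupby("class").median().reset_index()  # one line per class
parallel_coordinates(medians, class_column="class")
plt.xticks(rotation=90)
plt.show()
```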
We also used saliency maps, usually rendered as heat maps, for the training data, where being "hot" implies being in the regions that have a very strong influence on the algorithm's final classification. They are particularly helpful when the algorithm incorrectly classifies a certain data point or dataset, because one can look at the input features that led to that classification. Figure 6.3 presents the resulting maps, showing the increase or decrease in the magnitude of the value of the various variables for a patient. Larger values of the variables indicate that they have more influence on the results of the final classification. Consistent with Figure 6.2, Figure 6.3 indicates that variables 7, 8, 9, 11, 12, 19, and 24, which correspond to the FATHMM score, PROVEAN score, VEST3 score, CADD phred, DANN score, phyloP7way vertebrate, and REVEL, are the most influential mutation assessors for distinguishing the two classes of data. Thus, overall, the algorithm has identified nine variables of mutation assessors as being the most influential.

Figure 6.3: Saliency maps (heat maps) for the 24 variables in the five datasets, used for testing the trained algorithm.

Chapter 7

Physics- and image-based prediction of fluid flow and transport in complex porous membranes and materials by deep learning

(This chapter was published as a paper in Journal of Membrane Science, 119050 (2021).)

7.1 Introduction

Porous membranes with controlled morphology have many applications in the chemical industry, such as liquid and gas separation, battery separators, and food packaging. Flow and transport processes through porous membranes are controlled by their morphology, as well as by the dynamics of the processes. Thus, the ability to predict the velocity and pressure fields, as well as the concentration fields of the components in a fluid mixture passing through a membrane, is critical for designing new synthetic membranes, as well as for utilizing efficient natural membranes for targeted applications. For example, knowledge of the flow and transport properties of membranes, such as their effective permeability and diffusivity, is very useful for their synthesis with distinct morphologies and a range of porosity distributions, from highly dense to relatively highly porous membranes. Estimating the flow and transport properties is particularly important due to their influence on the transport of specific biological and chemical agents through membranes that prevent other components in a mixture from penetrating their pore space, leading to an efficient separation process (Quartarone et al. (2002); Gibson et al. (2001); Kong et al. (2016); Suwanmethanond et al. (2000); Sedigh et al. (2000); Elyassi et al. (2007); Dudchenko and Mauter (2020); Farahbakhsh et al. (2019); Tjaden et al. (2016); Guo et al. (2019); Tahmasebi et al. (2020); Su et al. (2020)).

Well-designed experiments do provide accurate information about the morphology of membranes and insights into their properties. But they can also be very time consuming and costly. The difficulty is compounded by the fact that experiments can examine only a limited number of samples. In addition, it is difficult and costly, though not impossible, to determine by experiments the velocity and pressure fields throughout membranes, which provide very useful insights into their flow and separation characteristics. Another way of characterizing a porous membrane's microstructure is by utilizing its two- or three-dimensional (3D) images, which may be obtained by focused ion beam scanning electron microscopy, X-ray computed tomography, or transmission electron microscopy. High-resolution images obtained by such techniques reveal detailed information about a membrane's morphology, which enables one to relate it to its important macroscopic properties (Kong et al. (2016); Dudchenko and Mauter (2020); Rall et al. (2020b); Zhou et al. (2020); Hu et al. (2020)).
On the other hand, if developed based on rigorous theoretical foundations, computational methods can not only provide accurate estimates of the important macroscopic properties of membranes, as well as the details of the velocity and pressure fields in their pore space, but they can also be used as guidelines for synthesizing optimal membranes for specific applications. The bottleneck is, of course, the computation time, which can be prohibitive. Thus, the most accurate and efficient way of membrane characterization may combine computational modeling and suitable experiments. In this chapter we focus on the computational approaches to characterization of membranes and estimation of their flow properties.

Over the past decade advances in machine-learning (ML) techniques have given rise to a fertile ground for their applications to numerous difficult problems in science and engineering (Park et al. (2019); Kamrava et al. (2020b); Zhang et al. (2020); Rall et al. (2020a,a); Barnett et al. (2020); Roehl et al. (2018); Bagheri et al. (2019); Ahmad et al. (2015); Libotean et al. (2009)). Generally speaking, the ML methods are classified into three groups, namely, supervised, unsupervised, and semi-supervised learning methods (SL, USL, and SSL, respectively). In SL the data, which are the pressure and fluid velocity fields in the present chapter, are accompanied by labels, and the goal is to predict labels for new data that were not supplied to the network for the learning purpose, where a label refers to the property of interest that one seeks to predict. Unlike SL, in USL the data do not have labels and the network identifies patterns in the dataset. In SSL the network takes advantage of the data with or without labels as well, and the trained network predicts labels for the unseen data. The network structures in the SL and SSL methods are similar. The former methods are usually used for regression or classification, whereas the latter approaches are commonly utilized for clustering or association. It should, however, be pointed out that there are USL methods that are used for classification. Therefore, it has become feasible to use an SL, USL, or SSL algorithm for various purposes (Kamrava et al. (2020b)).

While big data offer the opportunity for extracting valuable information on the relation between the input and output data, analyzing them for complex systems, such as porous media, is very difficult if done based on the traditional ML or other approaches. In particular, analyzing large data is typically very time consuming and complex, hence preventing one from taking advantage of their information and insight. Deep learning (DL) approaches, on the other hand, provide a concrete method for analyzing big data in complex systems through their intrinsic features, such as pattern recognition. Various networks, such as convolutional neural networks (CNNs), shown in Figure 7.1(a), recurrent neural networks, shown in Figure 7.1(b), and autoencoders, depicted in Figure 7.2, have been developed in the past few years for this and other purposes. Comprehensive discussions of such networks and their applications to porous media and materials are given by Tahmasebi et al. (Tahmasebi et al. (2020)).

Several ML approaches have been used in studies related to many types of membranes, both natural and synthetic, in various engineering and science disciplines. In particular, the ML approach has been employed for optimizing nanocomposite membranes (Su et al. (2020)), transport processes (Rall et al. (2020b); Zhang et al.
(2020)) through them, their reconstruction and design (Zhang et al. (2020)), predicting their performance (Hu et al. (2020); Rall et al. (2020a, 2019); Barnett et al. (2020)) and physical properties (Dudchenko and Mauter (2020)), modeling fouling growth and flux (Hu et al. (2020); Roehl et al. (2018); Bagheri et al. (2019)), forecasting the plasticization pressure (Ahmad et al. (2015)), modeling and characterization of reverse osmosis membranes (Libotean et al. (2009); Farahbakhsh et al. (2019)), estimating their lifetime (Liu et al. (2019a)) and potential in living cells, and their activity, as well as their permeation, and discriminating membrane proteins (Leonard et al. (2015); Lee et al. (2016); Gromiha and Yabuki (2008); Lee et al. (2018); Brocke et al. (2019)). None of such studies described, however, the membranes in terms of their fluid flow and transport characteristics.

Figure 7.1: (a) Schematic of a convolutional neural network, and (b) a recurrent neural network.

Figure 7.2: Schematic of an autoencoder network. The convolution block contains the convolution layer, batch normalization, the ReLU activation function, and the pooling layer. The same settings are used for the decoder (right). The encoder and decoder blocks are represented by E and D, respectively. The subscripts indicate the corresponding layers.

In this chapter we propose a ML method for developing a mapping between the morphology of a porous membrane as the input, and the pressure and velocity fields throughout the membrane as the output. We study a particular case in which both the input and output data are represented by images, which allows us to take advantage of advanced DL methods for image-to-image mapping. Among the various DL networks, autoencoders have been shown to have excellent performance in image-to-image mapping between the input and output data. Such methods with the conventional structure have been used, however, in cases in which the input and output images were the same and the goal was to extract their most important features. If the input and output images are distinct, then DL networks with similar structures that consist of an encoder and a decoder have been developed, known as the U-Net and the residual U-Net (RU-Net). Such networks are more efficient and accurate for image-to-image mapping than an autoencoder with a conventional structure. The RU-Net was developed based on the U-Net architecture (Ronneberger et al. (2015); Wen et al. (2019)), and the two of them together enable one to efficiently carry out encoding and decoding for cases in which the input and output data are not similar. U-Net and RU-Net have already been used in various applications with promising results (Wen et al. (2019); Ronneberger et al. (2015); Mo et al. (2019); Alom et al. (2019); Falk et al. (2019); Tang et al. (2020)).

As described below, since in the present study the output data are the pressure and fluid velocity fields in a membrane, both of which at every point of a membrane are functions of the previous time step and of the neighboring locations, a network with a recurrent structure will be utilized. Therefore, the network that we develop is a deep recurrent RU-Net, which is based on the RU-Net, for developing a mapping between the input and output data.
This type of network allows the system to receive feedback from the locations throughout, for example, a membrane. Thus, the network in the present study provides a mapping between the input and output data, while taking into account the relation between the output data from the same category in various sequences of data. A recurrent autoencoder (RAE) network can have very different structures, some of which include the sequence-to-sequence RAE, encoder and decoder RAEs, the separate encoder and decoder RAE, and the latent feedback and output feedback RAEs, with each one of them having its own benefits and drawbacks (Yang et al. (2020); Sutskever et al. (2014); Kieu et al. (2019); Wang et al. (2016); Cheng et al. (2019)). The data that we use in this chapter are images of a porous membrane as the input, with the output data being the fluid pressure and the velocity distributions at four distinct times. The recurrent RU-Net discovers the correlation between the input and output data through encoding and decoding, as well as the relation between the sequence of the output data. Consequently, we can compute the velocity and pressure distributions in new and distinct membrane images in a matter of a few CPU seconds, which is particularly useful when one has to model and simulate a large number of membranes.

The rest of this chapter is organized as follows. In section 7.2 we describe the data used in our study, which are in the form of images. We then describe the structure of the network that we develop and use, and the methodology based on which the image-to-image mapping is carried out. In section 7.3 we present an evaluation of the performance of the proposed network for characterization of fluid flow in porous membranes. Section 7.4 summarizes this chapter.

7.2 Materials and Methods

We first describe the porous membrane, after which the structure of the recurrent RU-Net will be explained.

7.2.1 The porous membrane

Since the morphology of porous membranes plays the most essential role in flow and transport in their pore space, we targeted such materials and aim to show that the computation time for predicting their properties is reduced significantly with the proposed method. The porous material selected for this study is a UniSart® nitrocellulose membrane. From a high-resolution 3D image of the membrane we extracted a large number of 2D slices, ensuring that they were very diverse in order to represent the membrane's morphology faithfully. The sample size was 40 × 40 μm², divided into 175 × 175 cells. Furthermore, for the computational fluid dynamics (CFD) part of the process, the fluid was injected into the membrane on one side and a fixed pressure was imposed on the opposite side; the other two boundaries of the system were considered impermeable, and a no-slip condition on the solid surface was also assumed. The time step during the CFD was 7.5 × 10⁻³ seconds, and data were collected after every two steps. The membrane's morphological and fluid properties are listed in Table 7.1.

Table 7.1: The membrane's and fluid's properties

  Porosity             0.77
  Thickness            60 μm
  Permeability         10⁻¹² m²
  Average pore size    8 μm
  Injection rate       3.6 × 10⁻¹² m³/s
  Viscosity            0.001 Pa·s
  Density              1000 kg/m³

7.2.2 Recurrent RU-Net structure

To start the ML computation, a large number of training datasets is required. The 2D images in our study were processed and segmented in order to develop binary images that represent the pore space and the solid matrix.
Then, the output, i.e., the fluid velocity and pressure fields throughout the pore space of all the images, was computed by carrying out simulations of fluid flow (see below). Using the results, the ML network discovers and learns the relationship between the morphology of the membranes and the velocity and pressure fields in their pore space. The trained network is then tested against datasets that were not used for its training. If the network "passes" the test, one can use it for analyzing other membranes without carrying out the time-consuming computational modeling. Let us now describe the details of the computational approach.

Recent advances in deep neural networks have led to successful paths for image processing, leading to significant progress in studies in which the input data are in the form of images. The CNNs, for example, are a group of deep neural networks for which the input can be gray-scale or color images. They represent supervised feed-forward neural networks, and consist of several main layers, including convolutional, activation, pooling, and fully connected layers. In the convolutional layer important features of the input data are extracted by utilizing various filters. The equation that describes the application of the filters to 2D input data in the convolutional layer is given by

$$\zeta_{i,j} = \mathcal{A}\left(\sum_{k=1}^{m} \sum_{l=1}^{n} W_{k,l}\, x_{i+k-1,\,j+l-1} + b\right), \qquad (7.1)$$

where $\zeta_{i,j}$ is the output of the layer after applying a filter of size $m \times n$ to the input $x$, $W_{k,l}$ is the weight between the $k$th neuron in the previous layer and the $j$th neuron in the current layer, and $b$ is the bias for the layer. In the activation layer, the extracted features are transformed by a nonlinear activation function $\mathcal{A}$ that allows the network to learn more about the data. The extracted features are then down-sampled - reduced to a smaller number of important features - in the pooling layers. The final layer, a fully connected one, completes the task, which might be regression or classification, using the extracted features. The number of layers depends on the complexity of the input data and the target.

Autoencoders are another group of deep neural networks that can use images as the input data, and represent USL-type networks. They extract the essential features of the input images, based on which the images are reconstructed as the output. In other words, the input and output images are the same in autoencoder networks. Similar to the CNNs, an autoencoder network consists of convolutional, activation, and pooling layers, as well as a layer for up-sampling, i.e., reconstructing to a larger scale. More details on the structure and applications of various types of such networks are given elsewhere (Tahmasebi et al. (2020)), including a discussion of the generative adversarial networks, another type of deep neural networks that have recently gained popularity.

To better understand the structure of the network in the present study, one must first understand the structure of autoencoders. An autoencoder compresses the input to a minimum number of points/features in such a way that the data can still be reconstructed back to their original structure based on the extracted features. Autoencoders consist of three main parts, namely, the encoder $\phi$, the code $\mathcal{F}$, and the decoder $\psi$, as shown in Figure 7.2, and are expressed by

$$\phi : \mathcal{X} \rightarrow \mathcal{F}, \qquad (7.2)$$

$$\psi : \mathcal{F} \rightarrow \mathcal{X}, \qquad (7.3)$$

so that the input data are mapped onto the output by passing through $\mathcal{F}$ to reproduce the same data. The decoder generates the output based on the extracted points/features in the code layer.
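To make the encoder-decoder structure concrete, the following is a minimal convolutional autoencoder written in PyTorch, illustrating the encoder $\phi$ and decoder $\psi$ of Eqs. (7.2) and (7.3). The layer sizes are illustrative only, not those of the network used in this study; the input size is taken as 176 × 176 so that the strided layers divide evenly.

```python
# Minimal convolutional autoencoder: the encoder compresses the image to
# a code (latent features), and the decoder reconstructs it from the code.
import torch
import torch.nn as nn

class AutoEncoder(nn.Module):
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(               # phi : X -> F
            nn.Conv2d(1, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
        )
        self.decoder = nn.Sequential(               # psi : F -> X
            nn.ConvTranspose2d(32, 16, 3, stride=2, padding=1,
                               output_padding=1), nn.ReLU(),
            nn.ConvTranspose2d(16, 1, 3, stride=2, padding=1,
                               output_padding=1), nn.Sigmoid(),
        )

    def forward(self, x):
        code = self.encoder(x)                      # features in the code layer F
        return self.decoder(code)                   # reconstruction of the input

x = torch.rand(8, 1, 176, 176)                      # a batch of binary-like images
print(AutoEncoder()(x).shape)                       # torch.Size([8, 1, 176, 176])
```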
The nature of the features, and their number, that are extracted in the latent layer, which enable the network to reconstruct the output, depend strongly on the complexity of the input data. In the present study we used the Kullback-Leibler (KL) divergence, which is often used as the penalty term for optimization in autoencoders, and provides great flexibility in dealing with a diverse set of data (Kamrava et al. (2020a)).

For problems that involve large datasets and images, U-Net and RU-Net have produced promising results (Wen et al. (2019); Ronneberger et al. (2015)). The structure of a U-Net is shown in Figure 7.3(a), while the RU-Net has a structure very similar to the U-Net, but with some differences; see Figure 7.3(b). They both consist of encoding and decoding paths similar to an autoencoder network. In an autoencoder, the encoder and decoder may be separated, with each one functioning individually. In a U-Net, however, the output in the decoder depends directly on the input data in the encoder, implying that it is not possible to split the encoder and decoder parts of the network in a manner similar to a regular autoencoder, in which compressing the input and its reconstruction are done separately. To produce the output by the U-Net, both the latent layer and those in the encoder are involved, no separate computation is carried out, and the units are all connected and exchange information. In other words, the U-Net and RU-Net skip connections, shown graphically in Figure 7.3, are used to link the layers in the encoder and decoder.

The encoder and decoder parts of the U-Net and RU-Net consist of some common layers, namely, the convolutional and activation layers whose functions were already described. The pooling layers in the encoder down-sample the images, while up-sampling, or unpooling, is performed in the decoder. In addition, the RU-Net performs normalization of the layers. In particular, batch normalization allows one to use higher learning rates, and acts as a regularizer for reducing overfitting as well (Ioffe and Szegedy (2015)). In this operation the mean $\langle x \rangle$ and variance $\mathrm{Var}(x)$ of batches of the data $x$ are computed, and a new normalized variable $y$ is defined by

$$y = \frac{x - \langle x \rangle}{\sqrt{\mathrm{Var}(x) + \epsilon}}\, \gamma + \beta. \qquad (7.4)$$

Here, $\gamma$ and $\beta$ are learnable parameter vectors that have the same size as the input data, and $\epsilon$ is a small parameter that we set at $10^{-5}$. During training, the layer keeps running estimates of its computed mean and variance, and uses them for normalization during evaluation. The variance is calculated by the biased estimator. Normalization accelerates the training of deep networks because it reduces internal covariate shift; a short sketch of this operation is given after Figure 7.3.

Figure 7.3: Schematic of (a) the U-Net and (b) the RU-Net. The residual blocks (lower gray blocks) represent the convolution layer, batch normalization, and the ReLU activation function, such that the input of the first convolution layer is added to the final result produced by the batch normalization.
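The batch-normalization operation of Eq. (7.4) can be written out explicitly; the sketch below reproduces what PyTorch's nn.BatchNorm2d computes during training, with learnable $\gamma$ and $\beta$.

```python
# Explicit batch normalization over a batch of feature maps, Eq. (7.4).
import torch

def batch_norm(x, gamma, beta, eps=1e-5):
    mean = x.mean(dim=(0, 2, 3), keepdim=True)               # <x> over the batch
    var = x.var(dim=(0, 2, 3), unbiased=False, keepdim=True) # biased estimator
    y = (x - mean) / torch.sqrt(var + eps)                   # normalize
    return gamma * y + beta                                  # scale and shift

x = torch.randn(8, 16, 44, 44)        # (batch, channels, height, width)
gamma = torch.ones(1, 16, 1, 1)       # learnable scale
beta = torch.zeros(1, 16, 1, 1)       # learnable shift
print(batch_norm(x, gamma, beta).shape)
```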
In addition to batch normalization, there are also residual convolutional blocks in the connecting section, known as the code or latent layer in a regular autoencoder. To understand residual learning, assume that the underlying mapping produced by a few layers is $L(x)$, where $x$ is the input fed to the first layer. Then, the layers fit a residual mapping, such as $F(x) = L(x) - x$, rather than estimating $L(x)$; therefore, the mapping changes to $F(x) + x$ (He et al. (2016)). The number of residual blocks depends on the performance of the network, as well as on the complexity of the input/output data. The residual blocks improve the network performance and carry out the same-size convolution. Moreover, adding the residual blocks and optimizing the residual mapping speed up significantly the overall network optimization.

The data in the present work are in the form of time series representing the pressure and velocity distributions throughout the membrane, implying that each data point affects, and is influenced by, its neighbors in the image and in the time series. Therefore, we use a recurrent neural network (RNN), whose main difference with a feed-forward network lies in the feedback loops of the former, which enable the RNN to model the contextual information of the sequential data. When a sequence of data is long, a different form of the RNN, referred to as the long short-term memory (LSTM), is commonly used (Kamrava et al. (2020c); Du et al. (2015)). In a basic RNN structure, shown in Figure 7.1(b), the sequence of input data is denoted by $Z = (Z_1, \dots, Z_T)$, the hidden states of the network by $h = (h_1, \dots, h_T)$, and the output by $O = (O_1, \dots, O_T)$, with the relations between them being expressed as follows,

$$h_i = \mathcal{A}(W_{Zh} Z_i + W_{hh} h_{i-1} + b_h), \qquad (7.5)$$

$$O_i = \mathcal{A}'(W_{hO} h_i + b_O), \qquad (7.6)$$

where $W_{Zh}$ and $W_{hO}$ denote the weights from the input layer $Z$ to the hidden layer $h$, and from the hidden layer $h$ to the output layer $O$, respectively, and $b_h$ and $b_O$ denote two bias vectors. $\mathcal{A}$ and $\mathcal{A}'$ are the activation functions, for which we use $\tanh(x)$. The recurrent part of the network allows it to meaningfully capture the relation between various data points (a minimal sketch of this update is given below, after Figure 7.4). Inspired by such an architecture, the previously discussed networks may be used within such a framework.

The structure of the deep recurrent RU-Net proposed in this chapter is shown in Figure 7.4, which is similar to a RU-Net, but with some notable differences, the most important of which is the recurrent structure used inside the network, as well as the elements utilized in it. The input to the network is images that represent the morphology of a heterogeneous membrane, while the outputs are the fluid velocity along with the pressure at four different times. Since fluid flow is a dynamic process, the pressure and fluid velocity distributions throughout the membrane at each time step depend on the previous time steps. This feature is shown in Figure 7.4, which is why a recurrent structure is needed to capture the correlations in the dataset that are represented as time series in the pressures and velocities.

Figure 7.4: Schematic of the proposed deep recurrent residual U-Net (recurrent RU-Net). Here, x refers to the morphologies of the membrane provided by its 2D images, and y denotes the output (the pressure $p$ and the fluid velocities $v_x$ and $v_y$) for that morphology at four distinct times.
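The recurrent update of Eqs. (7.5)-(7.6) can be sketched in a few lines; the weight shapes below are illustrative, and the four-step loop mirrors the four time epochs used in this study.

```python
# One step of the basic RNN of Eqs. (7.5)-(7.6): the hidden state h_i
# depends on the current input Z_i and the previous hidden state h_{i-1}.
import torch

def rnn_step(Z_i, h_prev, W_Zh, W_hh, W_hO, b_h, b_O):
    h_i = torch.tanh(Z_i @ W_Zh + h_prev @ W_hh + b_h)   # Eq. (7.5)
    O_i = torch.tanh(h_i @ W_hO + b_O)                   # Eq. (7.6)
    return h_i, O_i

d_in, d_h, d_out = 32, 64, 32                  # illustrative dimensions
W_Zh = torch.randn(d_in, d_h)
W_hh = torch.randn(d_h, d_h)
W_hO = torch.randn(d_h, d_out)
b_h, b_O = torch.zeros(d_h), torch.zeros(d_out)

h = torch.zeros(1, d_h)
for Z in torch.randn(4, 1, d_in):              # four time steps, as in the study
    h, O = rnn_step(Z, h, W_Zh, W_hh, W_hO, b_h, b_O)
```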
7.3 Training Datasets and the Network

We first describe the datasets that we used to train and test the network, and then explain how the computations with the network were carried out.

7.3.1 The training data

The data for training the network and testing its accuracy were generated using CFD software. Given the segmented images that represent the input data, i.e., the solid matrix and the pore space of the membrane, the pressure and fluid velocity fields were computed within the images. Assuming that the fluid is incompressible and isothermal, mass conservation is described by the standard divergence-free velocity,

$$\nabla \cdot \mathbf{v} = 0, \qquad (7.7)$$

while, as usual, fluid flow is governed by the momentum equation,

$$\frac{\partial \mathbf{v}}{\partial t} + (\mathbf{v} \cdot \nabla)\mathbf{v} = -\frac{1}{\rho}\nabla p - \frac{1}{\rho}\nabla \cdot \boldsymbol{\tau} + \frac{1}{\rho}\mathbf{f}, \qquad (7.8)$$

where $\mathbf{v}$ is the fluid velocity, $\rho$ is the density, $p$ the pressure, $\boldsymbol{\tau}$ the stress tensor, and $\mathbf{f}$ the body forces. A fluid is injected into the membrane at its inlet on the left boundary side. It invades the pore space until it reaches the outlet on the image's right boundary; see Figure 7.5. The pore space is discretized using square blocks and, assuming that fluid flow is slow, Eqs. (7.7) and (7.8) are solved in the computational grid using the volume-of-fluid method, in order to compute the pressure and velocity profiles. We used the open-source computational package OpenFOAM to compute the two fields, with the computational grid generated by the SALOME package. The computations were carried out with 10³ images, randomly selected from a 3D image of the membrane, in order to generate the pressure and velocity fields throughout each of them, after which 80 percent of the data were used for the training, while the rest were utilized to test the accuracy of the trained network.

Figure 7.5: (a) Computational grid based on which (b) the fluid velocity field/streamlines, and (c) the pressure field are computed.

7.3.2 The recurrent RU-Net

Table 7.2 lists the details of the recurrent RU-Net that we used in this study. Note that the latent layer has 4 blocks, and what is listed is for each of its blocks. Generally speaking, if we consider input and output images of size $l \times w$, then the network develops the mapping between the input and output data, an image-to-image regression in the time series that represents the two distributions. In the sequential structure of the recurrent RU-Net for carrying out image-to-image regression, the input $x$ and output data are, of course, related, so that the input dataset $x_j$ and the corresponding output $\Gamma_j$ are related through the following functional form,

$$\Gamma_j = f(x_j, \Gamma_{j-1}). \qquad (7.9)$$

In the present study each input dataset (images) produces 8 output datasets, representing the velocity and pressure distributions at four different times. Since we use 80 percent of the data for training, we have 800 input images representing the morphology of the membrane. Therefore, the relation between the input and output data should be written more generally as

$$\left\{ x_i, \Gamma_i^j \right\}_{i=1,\,j=1}^{M,\,T} \longrightarrow \left\{ x_i, \Gamma_i^{j-1}, \Gamma_i^j \right\}_{i=1,\,j=1}^{M,\,T}, \qquad (7.10)$$

where $M$ is the number of input datasets used in the training - $M = 800$ images in the present study - $T = 4$ is the number of time epochs at which the velocity and pressure distributions were computed and used, and $\Gamma_i^j \in \mathbb{R}^{l \times w}$ is the output as a result of using the $i$th image in the numerical simulation for solving the mass and momentum conservation equations at the $j$th time step. Equation (7.9) may further be written in matrix form for all the input data and their corresponding outputs:

$$\tilde{x}_k = \left( x_i, \Gamma_i^{j-1} \right), \qquad k = 1, \dots, MT, \qquad (7.11)$$

$$\Gamma_k = \Gamma_i^j, \qquad i = 1, \dots, M, \quad j = 1, \dots, T, \qquad (7.12)$$

$$(X, \Gamma) = \left\{ (\tilde{x}_k, \Gamma_k) \right\}_{k=1}^{MT}, \qquad (7.13)$$

where $\tilde{x}_k$ is a matrix that contains the input (image) number $k$ and all the corresponding outputs, so that for 800 input images we have $800 \times 4 = 3{,}200$ outputs for the velocity profile, and similarly for the pressure distribution.
$X$ and $\Gamma$ are two matrices that contain all the inputs and all the outputs. Typically, a network is simplified and represented by a dataset, a cost function, an algorithm to minimize the cost function, and a model. The network is trained based on the input $x$ to determine the output $\Gamma$, Eq. (7.12), which is defined based on the trainable parameters,

$$\hat{\Gamma}_i = f(x_i; \theta), \qquad (7.14)$$

where $\hat{\Gamma}_i$ is the network prediction for the pressure and velocity profiles, and $\theta$ is a set of trainable parameters, calculated by minimizing the cost function. The trainable parameters include the number of the network's layers, their order and nature, and the learning rate (see below). In our network we used the stochastic gradient descent (SGD) method to minimize the cost function and evaluate the trainable parameters. In the SGD method the data are divided into mini-batches, the local minimum of each of which is determined. Minimizing the negative log-likelihood in the cost function produces the maximum-likelihood estimation (Bengio et al. (2016)). The negative conditional log-likelihood of the training data is given by

$$J(\theta) = \frac{1}{M} \sum_{i=1}^{M} C(x_i, \Gamma_i; \theta), \qquad (7.15)$$

where $C$ is the cost function and $i$ represents the index of the randomly-selected training data. The cost function that we have used is the mean-squared error,

$$C = \frac{1}{M} \sum_{i=1}^{M} \left( \hat{\Gamma}_i - \Gamma_i \right)^2, \qquad (7.16)$$

where $\hat{\Gamma}_i$ is the network's prediction. Since the input data, the images, are large, the SGD reduces the computational cost by randomly selecting mini-batches of the samples in the dataset for updating the parameters, described by the following equation,

$$\theta \leftarrow \theta - \eta\, \frac{\partial C(x_i, \Gamma_i; \theta)}{\partial \theta}, \qquad (7.17)$$

with $\eta$ being the learning rate. We used an Nvidia Tesla V100 GPU for the training, which took about 4 GPU hours.

Table 7.2: Details of the recurrent RU-Net structure

  Block                           Number of layers   Structure
  Encoder layer                   4                  Convolutional layer + batch normalization + activation layer (ReLU) + pooling layer
  Decoder layer                   4                  Transposed convolutional layer + convolutional layer + batch normalization + activation layer (ReLU)
  Latent layer (residual block)   4                  Convolutional layer + batch normalization + activation layer (ReLU) + convolutional layer + batch normalization
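Schematically, the training procedure amounts to the following mini-batch SGD loop minimizing the mean-squared-error cost of Eq. (7.16); the model and the one-batch "loader" are placeholders (the AutoEncoder of the earlier sketch stands in for the recurrent RU-Net, and the data are random stand-ins).

```python
# Mini-batch SGD minimization of the MSE cost, Eqs. (7.15)-(7.17);
# `model` and `loader` are placeholders, not the thesis network/data.
import torch
import torch.nn as nn

model = AutoEncoder()                           # stand-in for the recurrent RU-Net
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)
cost = nn.MSELoss()                             # Eq. (7.16)

loader = [(torch.rand(8, 1, 176, 176),          # x: morphology images (dummy)
           torch.rand(8, 1, 176, 176))]         # Gamma: target fields (dummy)

for epoch in range(155):
    for x_batch, gamma_batch in loader:
        optimizer.zero_grad()
        gamma_hat = model(x_batch)              # network prediction, Eq. (7.14)
        loss = cost(gamma_hat, gamma_batch)
        loss.backward()                         # gradient in Eq. (7.17)
        optimizer.step()                        # theta <- theta - eta * grad
```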
7.4 Results

Extensive computations were carried out in order to generate the velocity and pressure fields for training the recurrent RU-Net using the two fields, and to test the performance of the trained ML network for the complex membrane whose image we utilized. As the first step, the predictions of the ML network are compared with the ground-truth models, i.e., the numerical data computed for both the velocity and pressure fields by solving Eqs. (7.7) and (7.8) within the images.

The results for the pressure field are presented in Figure 7.6, which indicate that the proposed ML network predicts and captures the behavior of the pressure distribution with very high accuracy. For example, initially the pressure field propagates very slowly and mildly, whereas it becomes broadly distributed at the final time epoch at which it was computed, a feature that the ML network reproduces accurately. As the second row of Figure 7.6 indicates, the network has successfully learned the properties of the pressure field without solving Eqs. (7.7) and (7.8). This is quantified by computing the differences between the actual data and those predicted by the ML network. The results are shown in the third row of Figure 7.6, indicating exceedingly small errors. Similarly excellent agreement is obtained between the predicted fluid velocity field and the actual numerical data obtained by solving Eqs. (7.7) and (7.8); see Figure 7.7. Hereafter, by velocity we mean $v_x$, the fluid velocity in the macroscopic $x$ direction of flow; $v_y$, the fluid velocity in the transverse direction, is very small. Note that, unlike the gradual changes in the pressure field in the membrane, as the fluid is injected into the pore space its velocity begins from a relatively high value at the shortest time, $t_1$ in Figures 7.6 and 7.7, since the invading fluid must expel the resident air in the membrane. As the fluid advances into the membrane, however, the fluid velocity begins to vary as well, which is indicated by the dynamic behavior manifested by Figure 7.7, and which the proposed ML algorithm has captured with very negligible errors.

Figure 7.6: Comparison between the actual numerical values of the pressure $p$ and the predictions $\hat{p}$ of the proposed ML method at four distinct times. The error produced by the ML relative to the ground-truth data is also shown as $(\hat{p} - p)$. For better illustration and clarity, the data are all normalized for the pressure and velocity profiles.

To further quantify the accuracy of the results, we also computed the coefficient of determination, $R^2$, and the root mean-squared error, $\sigma$, defined by

$$R^2 = 1 - \frac{\sum_{m=1}^{K} \left\| \Gamma_m - \hat{\Gamma}_m \right\|_2^2}{\sum_{m=1}^{K} \left\| \Gamma_m - (1/K) \sum_{j=1}^{K} \Gamma_j \right\|_2^2}, \qquad (7.18)$$

$$\sigma = \sqrt{\frac{1}{K} \sum_{m=1}^{K} \left\| \Gamma_m - \hat{\Gamma}_m \right\|_2^2}, \qquad (7.19)$$

where $K$ is the number of samples, with $K = M = 800$ for the training and $K = 200$ for the testing, and $\|\cdot\|_2$ denotes the $L_2$ norm. As is well known, $R^2$ is a measure that indicates the difference between the predicted values and the model distribution, and for high accuracy one must have $R^2 \approx 1$. $\sigma$ is a metric that measures the deviation between the predicted and the model values. In a convergent algorithm, as the number of measurements $K$ increases, $\sigma$ should decrease and approach 0, indicating that the predictions are accurate and that the algorithm is robust.

Figure 7.7: Same as in Figure 7.6, but for the fluid velocity.

The results for both measures are shown in Figure 7.8, indicating that both the fluid pressure and velocity have relatively high $R^2$ and low $\sigma$, and both are improved as the training proceeds until they reach a plateau.

Figure 7.8: Comparison of the accuracy of the ML algorithm, as measured by $R^2$ and $\sigma$ for the training and testing data, at many epochs for predicting (a) and (c) the pressure, and (b) and (d) the fluid velocity.
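The two measures of Eqs. (7.18) and (7.19) are straightforward to compute for a set of predicted fields; a minimal sketch with randomly generated stand-in data:

```python
# R^2 and sigma of Eqs. (7.18)-(7.19) for K fields of size l x w.
import numpy as np

def r2_and_sigma(gamma, gamma_hat):
    K = gamma.shape[0]
    res = np.sum((gamma - gamma_hat) ** 2)              # sum of ||.||_2^2 terms
    tot = np.sum((gamma - gamma.mean(axis=0)) ** 2)
    return 1.0 - res / tot, np.sqrt(res / K)            # R^2, sigma

gamma = np.random.rand(200, 175, 175)                   # 200 testing fields
gamma_hat = gamma + 0.01 * np.random.randn(*gamma.shape)
print(r2_and_sigma(gamma, gamma_hat))
```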
Figure 7.9: Comparison between the actual distribution of the pressure and those predicted by the proposed ML method for a random point at four distinct times: (a) $t_1$; (b) $t_2$; (c) $t_3$; and (d) $t_4$.

After evaluating the overall performance, we may consider a more stringent comparison by comparing the pressure and velocity for a randomly-selected point within the computational domain. To do so, a pixel, the one at [44,75], in the pore space of the membrane was selected at random, the pressures and velocities for all the 200 testing images were determined at that point, and their distributions were constructed. The results were then compared with the actual data produced by the numerical simulation of Eqs. (7.7) and (7.8) at the aforementioned four distinct times. The comparisons are shown in Figures 7.9 and 7.10, which allow us to better evaluate the accuracy of the method in a point-wise fashion. As Figures 7.9 and 7.10 indicate, the pressure distributions are captured accurately by the ML network. For example, the long tails of the distributions have been correctly reproduced by the ML network. Similarly accurate distributions were obtained for the flow velocity.

Figure 7.10: Same as in Figure 7.9, but for the fluid velocity.

To further evaluate the accuracy of the proposed algorithm, we compare the overall performance of the network by computing the ensemble-averaged maps of the fluid velocity and pressure at the four different times, based on the actual numerical data and those produced by the proposed ML algorithm. The results are shown in Figures 7.11 and 7.12. The pressure profiles at the four times are very similar to the numerical data. Similarly, the velocity profiles also exhibit trends very similar to those of the actual data. It should be emphasized that the results were all produced using the trained networks without solving the governing equations, Eqs. (7.7) and (7.8), and yet they are in excellent agreement with the data.

Figure 7.11: Comparison between the actual ensemble-averaged profiles of the pressure at four distinct times and the predictions.

Figure 7.12: Same as in Figure 7.11, but for the fluid velocity.

7.5 Summary

Predicting the physical properties of membranes and complex materials is an outstanding problem that has been studied for several decades. Various accurate, but computationally expensive, methods have been used in such calculations and predictions. Depending on the size of the domain, the techniques used require employing high-performance computers. Parallel to such efforts, the ML techniques have also advanced to the point that they have had a profound impact on the predictive power that one wishes to have for problems dealing with complex media.

In this chapter, we developed a ML method that takes the morphology of a complex membrane as its input and predicts its flow properties, such as the pressure and velocity distributions in the pore space of the membrane. To do so, a set of deep-learning algorithms was integrated within a single and interconnected framework that connects the pore space to a completely different output, i.e., the pressure and velocity distributions. Furthermore, the computations that train the network involve a higher level of complexity than the usual computations with the ML networks, in that they use the dynamic evolution of the two distributions as the input.

The proposed network was tested on 200 membrane images, with the results indicating excellent agreement with those produced by the direct computational methods. Significant acceleration of the computations for making the predictions was obtained, so much so that the network predicts the flow properties in a very short time, while it takes much longer for the standard numerical simulators to do the same. In a separate chapter [3] we demonstrated that if a ML algorithm is trained by data for a variety of porous media, then it can also provide accurate predictions for other types of porous materials that may even be dissimilar to those used in the training. Clearly, the same idea may also be utilized here. Thus, the demonstrated capability of the ML network should find many applications in various other problems in chemical science, materials characterization, biological systems, etc.
Chapter 8

Simulating fluid flow in complex porous materials: Integrating the governing equations with deep-layered machines

8.1 Introduction

Fluid flow and transport in heterogeneous porous media are phenomena that arise in many important systems, from biological tissues to composite materials, soil, wood, and paper. With advances in instrumentation, high-resolution images of porous materials can be used directly in the simulation of such phenomena. The computations are, however, highly intensive. On the other hand, although machine-learning (ML) algorithms have been used recently for predicting flow and transport properties of porous media, they lack a rigorous, physics-based foundation and rely mostly on correlations. We introduce a ML approach that incorporates mass conservation and the Navier-Stokes equations in its learning process. By training the algorithm with relatively limited data obtained from the solutions of the two equations over a time interval, we show that the approach provides highly accurate predictions for the properties of porous media at all other times and spatial locations, while reducing the computation time significantly. More importantly, we show that when the algorithm is used for a completely different porous medium, without using any data from it, it again provides very accurate predictions for its properties. Thus, one has a deep network for predicting the flow (and transport) properties of complex porous media.

Fluid flow and transport in heterogeneous porous media are of fundamental importance to the working of a wide variety of systems of scientific interest, as well as applications (Blunt (2017b); Sahimi (2011b)). Examples of such porous media include catalysts, membranes, filters, adsorbents, print paper, wood, nanostructured materials, and biological tissues, as well as soil, pavement, and oil, gas, and geothermal reservoirs. Such porous media are typically heterogeneous, with the heterogeneity manifesting itself in the shape, size, connectivity, and surface structure of the pores at small scales, and in the spatial variations of the porosity, permeability, and the elastic moduli at large length scales. Thus, any attempt to model flow and transport in porous media entails having the ability to handle the big data that are used (Jablonka et al. (2020)) in computing the spatial distribution of the pressure, fluid velocity, etc., throughout the pore space. The heterogeneity and the associated big data, contained in high-resolution two- or three-dimensional (3D) images of porous media used in their modeling, imply that the calculations are highly intensive. Thus, developing efficient predictive algorithms has always been an active area of research.

Significant advancements have been made over the last decade in the development of deep-learning (DL) approaches, which have made considerable contributions to progress in image analysis - also important to studying fluid flow and transport in porous media (see chapters 2 and 3) (Wu et al. (2019)) - and other fields. The traditional neural networks, developed to handle large data, have great potential as approximators. Feed-forward neural networks (FFNNs), representing supervised learning methods, try to identify the relationship between the input and output iteratively by minimizing a cost function. To alleviate the computational burden associated with the minimization, advanced optimization methods have been developed (Bishop (1995)).
There is, however, no systematic approach for increasing the accuracy of fully-connected FFNNs, as they rely on correlations between the data and the properties to be predicted, hence requiring a large amount of training data for acceptable accuracy. Thus, it is most desirable to develop alternatives.

Progress has been made very recently in developing such alternatives, commonly referred to as physics-informed machine learning (PIML), in which the network is trained partly based on the fundamental equations that govern the physics of the process under study. Thus, by incorporating the equations in the cost function, one speeds up the convergence to, and produces, accurate predictions, while the network requires training with much less data. The idea of using the equations that govern the physics of fluid flow and transport in porous media in the cost function and optimization was first proposed by Sahimi and co-workers (Hamzehpour et al. (2007)). Development of such techniques based on the ML methods was first reported for solving differential (Raissi et al. (2017)) and partial differential equations (Bar-Sinai et al. (2019); Meade and Fernandez (1994); He et al. (2000); Aarts and Van Der Veer (2001); Han et al. (2018); Raissi et al. (2019b)), and has recently been proposed for analyzing hydrodynamic systems (Raissi et al. (2020)).

In this chapter, we aim to fill the gap between the machine-learning methods and the numerical physics-based techniques for a porous membrane, with an emphasis on fluid flow, and to combine the advantages of both sides in order to achieve a solution that is not only efficient but also provides accurate results. Machine-learning methods are good estimators when they have been efficiently trained with sufficiently large and varied datasets. Multiple studies have applied physics-informed feed-forward neural networks. However, since we are dealing with images of porous media, it is not practical to use fully connected neural networks, due to the complexity of the images and the lack of efficiency of such networks in discovering complicated patterns. To this end, we have trained a physics-informed recurrent encoder-decoder (PIRED) network in which the cost function is defined based on the continuity and momentum equations borrowed from the numerical techniques, in order to help the network and enrich the training. The network is trained based on images of a porous membrane as the input data and the corresponding velocity and pressure images as the output data, so as to accurately predict such fields for other porous materials based on their morphology.

8.2 Methodology

In this chapter we propose an approach based on a ML method that incorporates in its training the governing equations for fluid flow in porous media, i.e., mass conservation (MC) and the Navier-Stokes (NS) equations. The result is a highly efficient method for predicting the flow properties. Our multiphysics approach integrates the MC and NS equations and digital images of a heterogeneous pore space with the training of the ML algorithm. Since the input data, such as the images of a pore space, are complex and large, one needs a network that decreases the computation time. Despite increasing applications of the ML methods, there has been only a limited number of attempts to address problems associated with porous materials (see chapters 1, 2, 3, and 4) (Wu et al. (2019); van der Linden et al. (2016); Erofeev et al. (2019); Mosser et al. (2017); Zhang et al. (2021); Wei et al. (2018); Alqahtani et al. (2020); Feng et al. (2018); Srinivasan et al. (2019)).
In particular, to our knowledge there has been no attempt to develop an ML method that incorporates the MC and NS equations in its training. If the input and output are both in the form of images, as is the case in this chapter, an autoencoder network produces more accurate predictions. But when the input or output is represented by spatial and/or time series, which is the case when one solves the MC and NS equations, recurrent neural networks (RNNs) connect the data series. In RNNs the last output is used as the input for generating new output at each step. Therefore, we couple a RNN to an encoder-decoder network, resulting in a physics-informed recurrent encoder-decoder (PIRED) network in which the cost function is defined partly based on the solutions of the MC and the NS equations.

We use the reverse Kullback-Leibler (KL) divergence (relative entropy) (Kullback and Leibler (1951)) in the minimization of the cost function. Suppose that $p(x)$ is the true probability distribution of the input/output data, while $q(x)$ is an approximation to it. The reverse KL divergence from $q$ to $p$ is a measure of the difference between $p(x)$ and $q(x)$, and the aim is to ensure that $q(x)$ represents $p(x)$ accurately enough that it minimizes the reverse KL divergence $D_{KL}(q \| p)$, given by

$$D_{KL}[q(x) \| p(x)] = \sum_{x \in X} q(x) \log\left[\frac{q(x)}{p(x)}\right], \qquad (8.1)$$

where $X$ is the space in which $p(x)$ and $q(x)$ are defined. $D_{KL} = 0$ if $q(x)$ matches $p(x)$ perfectly and, in general, it may be rewritten as

$$D_{KL}[q \| p] = \mathbb{E}_{x \sim q}[-\log p(x)] - H[q(x)], \qquad (8.2)$$

where $H[q(x)] = \mathbb{E}_{x \sim q}[-\log q(x)]$ is the entropy of $q(x)$, with $\mathbb{E}$ denoting the expected-value operator and, thus, $\mathbb{E}_{x \sim q}[-\log p(x)]$ is the cross-entropy between $q$ and $p$. Optimization of $D_{KL}$ with respect to $q$ is defined by

$$\arg\min_q D_{KL}[q \| p] = \arg\min_q \mathbb{E}_{x \sim q}[-\log p(x)] - H[q(x)] = \arg\max_q \mathbb{E}_{x \sim q}[\log p(x)] + H[q(x)]. \qquad (8.3)$$

Thus, according to Eq. (8.3), one samples points from $q(x)$ and does so such that they have the maximum probability of belonging to $p(x)$. The entropy term of Eq. (8.3) "encourages" $q(x)$ to be as broad as possible. Thus, the autoencoder tries to identify a distribution $q(x)$ that best approximates $p(x)$.
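For discrete distributions the reverse KL divergence of Eq. (8.1) is a one-line computation; a minimal sketch (the small epsilon guards against taking the log of zero):

```python
# Reverse Kullback-Leibler divergence D_KL(q || p), Eq. (8.1).
import numpy as np

def reverse_kl(q, p, eps=1e-12):
    q = q / q.sum()                                     # ensure normalization
    p = p / p.sum()
    return float(np.sum(q * np.log((q + eps) / (p + eps))))

p = np.array([0.5, 0.3, 0.2])       # "true" distribution
q = np.array([0.4, 0.4, 0.2])       # approximation
print(reverse_kl(q, p))             # > 0; zero only when q matches p
```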
Since in the present problem the input and output images are distinct, the PIRED network, a supervised one, consists of an encoder and a decoder, known as the U-Net and residual U-Net (RU-Net); see Figure 8.1. The encoder has four blocks, with each block containing the standard convolutional (CL) and activation layers (AL), and the pooling and batch normalization layers (PL and BNL). The PL compresses the input to its most important characteristics, eliminating the unnecessary features, and stores them in the latent layer, which itself consists of the AL, CL, and BNL. The BNL not only allows the use of higher learning rates by reducing internal covariate shift, but also acts as a regularizer for reducing overfitting (Ioffe and Szegedy (2015)). The mean $\langle x \rangle$ and variance $\mathrm{Var}[x]$ of batches of data $x$ are computed in the BNL, and a new normalized variable $y$ is defined by

$$y = \frac{x - \langle x \rangle}{\sqrt{\mathrm{Var}[x] + \epsilon}}\, \gamma + \beta. \qquad (8.4)$$

Here, $\gamma$ and $\beta$ are learnable parameter vectors that have the same size as the input data, and $\epsilon$ is set at $10^{-5}$. During training the layer keeps running estimates of its computed mean and variance, and uses them for normalization during evaluation. The variance is calculated by the biased estimator.

Figure 8.1: Schematic of the proposed PIRED network. $E_i$ and $D_i$ indicate the encoder and decoder blocks; $\varepsilon^2$ is the cost function, $x_i$ is the input, and the pressure $P_j$ and fluid velocity $|v|_j$ are the output.

The decoder also consists of 4 blocks, with each block containing the CL, AL, and BNL, as well as a transposed CL (TCL), which is similar to a deconvolutional layer in that if, for example, the first encoder has a size of 128 × 64 × 64 (i.e., 128 features of size 64 × 64), then one has a similar size in the decoder. The TCL uses the features extracted by the PL to reconstruct the output, the pressure $P$ and fluid velocity $v$ fields at each specified time epoch. The latent layer of the RNN that we use improves its performance and speeds up significantly the overall network's computations, because it is in the form of residual blocks, i.e., layers that, instead of having only one connection, are also connected to further previous layers.

We use a high-resolution 3D image of a polymeric membrane of size 500 × 500 × 1000 voxels. Its porosity, thickness, permeability, and mean pore size are, respectively, 0.77, 60 μm, $10^{-12}$ m², and 8 μm. An image of a 2D slice of the membrane is shown in Figure 8.11(a). We selected at random 700 2D slices of the image of size 175 × 175 pixels for the fluid-flow calculations and training the PIRED, and another 300 slices for testing the accuracy. The 700 images were inserted in the PIRED network's first layer, while the last layer contained the output, the distributions of $P$ and $v$. The fluid density and viscosity were set at 0.997 gr/cm³ and $1.89 \times 10^{-3}$ gr/(cm·s), with the fluid injection velocity being 1 cm/s. We computed the $P$ and $v$ fields at four times and represented them by images. Note that the amount of data needed for computing $P$ and $v$ is significantly smaller than what would be needed by the standard ML methods.

If $L$ and $v_0$ represent the characteristic length scale and fluid velocity in the medium, we introduce the dimensionless variables $x^* = x/L$, $y^* = y/L$, $\mathbf{v}^* = \mathbf{v}/v_0$, $t^* = t v_0/L$, and $P^* = PL/(\mu v_0)$. Deleting the superscript $*$ for convenience, the MC equation, $\nabla \cdot \mathbf{v} = \partial v_x/\partial x + \partial v_y/\partial y = 0$, remains unchanged. The NS equation becomes

$$\frac{D\mathbf{v}}{Dt} = \frac{\partial \mathbf{v}}{\partial t} + \mathbf{v} \cdot \nabla \mathbf{v} = \mathrm{Re}^{-1}\left( -\nabla P + \nabla^2 \mathbf{v} \right), \qquad (8.5)$$

where $\mathrm{Re} = \rho v_0 L/\mu$ is the Reynolds number. We define three functions, $\varphi_1 = \nabla \cdot \mathbf{v}$, $\varphi_2 = Dv_x/Dt - \mathrm{Re}^{-1}(-\partial P/\partial x + \nabla^2 v_x)$, and $\varphi_3 = Dv_y/Dt - \mathrm{Re}^{-1}(-\partial P/\partial y + \nabla^2 v_y)$, and incorporate them in the cost function $\varepsilon^2$, minimized by the PIRED network, instead of naively minimizing the squared error between the data and the predicted values of $v$ and $P$. For exact convergence to the actual (numerically calculated) values, we must have $\varphi_i = 0$ with $i = 1 - 3$. Thus, the PIRED network learns that the mapping between the input and output must comply with $\varphi_i = 0$, which not only enriches its training, but also accelerates convergence to the actual values. $\varepsilon^2$ is defined by

$$\varepsilon^2 = \frac{1}{n} \sum_{i=1}^{n} \left[ (P_i - \hat{P}_i)^2 + (|v_i| - |\hat{v}_i|)^2 \right] + \sum_{i=1}^{3} \sum_{j=1}^{n} \left[ \varphi_i(x_j, y_j, t_j) \right]^2, \qquad (8.6)$$

where $n$ is the number of data points used in the training, and $P_i$ and $|v_i|$ are the actual pressure and magnitude of the velocity at point $(x_i, y_i)$ at time $t_i$, with the superscript $\hat{}$ denoting the predictions by the PIRED network.

The derivatives in $\varphi_i$ were estimated using the Sobel operator (Sobel (2014); Zhu et al. (2019)), an inexpensive and effective way of computing the gradients, used commonly in image processing. It may be thought of as a smoothed finite-difference operator consisting of two 3 × 3 convolution kernels for the horizontal (H) and vertical (V) directions, which convolve with the image $I$ in order to estimate the H and V derivatives. The kernels are given by $G_x = M * I$ and $G_y = M^T * I$, where $T$ and $*$ represent the transpose and convolution operations, with

$$M = \begin{pmatrix} -1 & 0 & 1 \\ -2 & 0 & 2 \\ -1 & 0 & 1 \end{pmatrix}. \qquad (8.7)$$
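A minimal sketch of the Sobel-based derivative estimation used inside the residuals $\varphi_i$ of Eq. (8.6); the convolution here uses SciPy for simplicity, whereas the actual network evaluates the same kernels as convolutional layers.

```python
# Sobel estimates of the horizontal and vertical derivatives of a field,
# using the kernels of Eq. (8.7): G_x = M * I and G_y = M^T * I.
import numpy as np
from scipy.signal import convolve2d

M = np.array([[-1, 0, 1],
              [-2, 0, 2],
              [-1, 0, 1]])

def sobel_gradients(field):
    gx = convolve2d(field, M,   mode="same", boundary="symm")
    gy = convolve2d(field, M.T, mode="same", boundary="symm")
    return gx, gy

v_x = np.random.rand(175, 175)          # stand-in for one velocity snapshot
dvx_dx, dvx_dy = sobel_gradients(v_x)
# with dvy_dy computed likewise, phi_1 = dvx_dx + dvy_dy approximates div v
```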
We solved the mass conservation (MC) and the Navier-Stokes (NS) equations using the open-source package OpenFOAM. The fluid was injected at one side, and a fixed pressure was applied to the opposite side. The other two boundaries were assumed to be impermeable. Solving the MC and NS equations in each 2D image took about 6 CPU minutes. The computations for training the PIRED network on an Nvidia Tesla V100 GPU took about 2 GPU hours. The tests then took less than a second.

8.3 Results

By applying the proposed PIRED network, we reconstructed the velocity and pressure fields for new samples using only a very small number of images and without specifying any boundary conditions. Figures 2(a) and (b) show the evolution of the cost function (reported here as the root-mean-square error, RMSE) for the training and test datasets of the network, respectively. The RMSE decreases for all the variables, for both the training and the test data, indicating that the predictions for both the pressure and the velocity are highly accurate. Numerically, the final RMSE values for training and test are, respectively, 0.967 and 0.008 for the pressure dataset, and 0.72 and 0.00176 for the velocity dataset. To make a more direct comparison, the results of a purely data-driven machine-learning (DDML) network are also presented in Figures 3(a) and (b) for the training and test data, respectively. Both statistics indicate the much weaker performance of the DDML. To better illustrate the improvement produced by coupling the governing equations to the ML, the PIRED and DDML networks are compared more comprehensively by examining the effect of the number of training data on the two statistics; see Figure 4. Clearly, there is a significant difference between the two methods, in particular when the number of data is small. The discrepancy, however, becomes smaller when a larger number of data are used. Thus, the importance of including the governing equations is evident, especially when a large amount of data is not available.

Figure 8.2: Comparison of the RMSE for the training (a) and test data (b) for the PIRED network.

Figure 8.3: Comparison of the RMSE for the training (a) and test data (b) for the DDML network.

Figure 5 compares the predicted spatial distribution of the pressure P̂ at four (dimensionless) times t_1 = 15 < t_2 < t_3 < t_4 = 185 with the actual P in one of the randomly-selected 2D images not used in the training, with the results for all other slices being just as accurate (see below). Figure 6 compares the corresponding results for the magnitude |v| of the fluid velocity. As the spatial distributions of the differences P - P̂ and |v| - |v̂| indicate, the predictions agree very closely with the actual data. Therefore, not only are the distributions of P and v reproduced accurately, but the correlations between their values with increasing time are also honored.
Figure 8.4: Effect of the size of the dataset on (a) the R² score and (b) the RMSE for the two networks, PIRED and DDML.

Figure 8.5: Comparison of the predicted pressure P̂ with the numerically calculated P at four (dimensionless) times.

Figure 8.6: Comparison of the predicted fluid velocity |v̂| with the numerically calculated values at four (dimensionless) times.

Another quantitative comparison is based on selecting at random a vertical line in one of the 300 testing images and comparing the PIRED-predicted P and v along that line with their actual values. One example is shown in Figure 7, which indicates very good agreement between the predictions and the actual data. The same accuracy was obtained for all other slices.

Figure 8.7: Comparison of the predicted pressures and fluid velocities with the numerical simulations in a randomly-selected 2D image along a line perpendicular to the macroscopic direction of flow.

We also define an effective permeability K by K = μLq/(SΔP), where q, S, and ΔP are, respectively, the steady-state volume flow rate, the surface area perpendicular to the macroscopic direction of flow, and the macroscopic pressure drop; a short numerical sketch of this estimate is given below, after Figure 11. K was computed for all the 300 testing slices, and was predicted by the PIRED network as well. The comparison is shown in Figure 8(a). We also compared the ensemble-averaged maps of the PIRED-predicted |v| and P over the 300 testing images with the actual averages. The results, presented in Figures 9 and 10, indicate excellent agreement.

Figure 8.8: Comparison of the actual and predicted permeabilities K of 300 2D images of the membrane, and 100 images of the sandstone. K is normalized according to (K - K_min)/(K_max - K_min).

Figure 8.9: Comparison of the actual ensemble-averaged fluid velocities v̄ (top) with those predicted by the PIRED network (bottom), in the polymeric membrane at four (dimensionless) times.

Figure 8.10: Comparison of the actual ensemble-averaged pressures P̄ (top) with those predicted by the PIRED network (bottom), in the polymeric membrane at four (dimensionless) times.

But the most stringent test of the PIRED network is whether it can predict the properties of a completely different porous medium, without using any data associated with it. Thus, we used the image of a Fontainebleau sandstone [33] with a porosity of 0.14. Since the sandstone's morphology is completely different from the membrane's, we used a slightly larger number of 2D slices of the membrane (not the sandstone) to better train the PIRED network. Figure 8(b) compares the effective permeabilities of 100 2D slices of the sandstone with the predictions of the PIRED network. In Figures 11(a) and (b) we show two-dimensional (2D) cuts of the 3D images of the polymeric membrane and the Fontainebleau sandstone, respectively.

Figure 8.11: (a) and (b) A 2D cut from the original 3D image of the polymeric membrane and the sandstone, respectively. Black and white represent, respectively, the solid matrix and the pores.
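As flagged above, the effective-permeability estimate and the min-max normalization used in Figure 8 amount to the following short computation. This is a sketch only; the variable names and the numerical values in the example are made up for illustration, with mu denoting the fluid viscosity.

```python
import numpy as np

def effective_permeability(q, mu, L, S, dP):
    """K = mu * L * q / (S * dP): steady-state volume flow rate q,
    viscosity mu, sample length L, cross-sectional area S, pressure drop dP."""
    return mu * L * q / (S * dP)

def minmax_normalize(K):
    """Normalization used in Figure 8: (K - K_min) / (K_max - K_min)."""
    K = np.asarray(K, dtype=float)
    return (K - K.min()) / (K.max() - K.min())

# Example with illustrative SI values for three slices
K = [effective_permeability(q, 8.9e-4, 6e-5, 1e-8, 1e4)
     for q in (1e-12, 2e-12, 5e-12)]
print(minmax_normalize(K))   # [0.   0.25 1.  ]
```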
Figure 12 compares the predicted spatial distribution of the pressure P̂ at four (dimensionless) times t_1 < t_2 < t_3 < t_4 with the actual P in one of the randomly-selected 2D images of the sandstone, with the results for all other slices being just as accurate. Figure 13 compares the corresponding results for the magnitude |v| of the fluid velocity in the sandstone. As the spatial distributions of the differences P - P̂ and |v| - |v̂| in Figures 12 and 13 indicate, both P̂ and |v̂| agree very closely with the actual distributions.

Figure 8.12: Comparison of the predicted pressure P̂ with the numerically calculated P at four (dimensionless) times in a randomly-selected 2D cut of the sandstone.

Figure 8.13: Comparison of the predicted fluid velocities |v| with the numerically calculated values |v̂| at four (dimensionless) times.

8.4 Summary

Summarizing, we presented a physics-informed encoder-decoder algorithm, the PIRED network, that incorporates the MC and NS equations in its learning process in order to predict fluid flow in a complex porous medium. The network provides highly accurate predictions for the fluid velocity and pressure fields at every point of a medium that was not used in the training, as well as for its effective permeability. Not only does the PIRED network require a significantly smaller amount of data to make accurate predictions and, therefore, much less computation, it also provides accurate predictions for other types of porous media without using their data. As such, the PIRED network can be applied to predicting the flow and transport properties of porous media.

Bibliography

Aarts, L. P. and Van Der Veer, P. (2001). Neural network method for solving partial differential equations. Neural Processing Letters, 14(3):261-271.
Adams, N., Cohen, E., Mosser, L., Blévec, T. L., and Dubrule, O. (2018). Reconstruction of Three-Dimensional Porous Media: Statistical or Deep Learning Approach? In Statistical Data Science, pages 125-139. World Scientific (Europe).
Adler, P. M., Jacquin, C. G., and Quiblier, J. A. (1990). Flow in simulated porous media. International Journal of Multiphase Flow.
Agar, S. M. and Geiger, S. (2015). Fundamental controls on fluid flow in carbonates: current workflows to emerging technologies. Geological Society, London, Special Publications, 406(1):1-59.
Ahmad, A. L., Adewole, J. K., Leo, C. P., Ismail, S., Sultan, A. S., and Olatunji, S. O. (2015). Prediction of plasticization pressure of polymeric membranes for CO2 removal from natural gas. Journal of Membrane Science, 480:39-46.
Aizenman, M. and Lebowitz, J. L. (1988). Metastability effects in bootstrap percolation. Journal of Physics A: Mathematical and General, 21(19):3801.
Alom, M. Z., Yakopcic, C., Hasan, M., Taha, T. M., and Asari, V. K. (2019). Recurrent residual U-Net for medical image segmentation. Journal of Medical Imaging, 6(01):1.
Alpaydin, E. (2020). Introduction to Machine Learning. MIT Press.
Alqahtani, N., Alzubaidi, F., Armstrong, R. T., Swietojanski, P., and Mostaghimi, P. (2020). Machine learning for predicting properties of porous media from 2d X-ray images. Journal of Petroleum Science and Engineering, 184:106514.
Alvarado, J., Sheinman, M., Sharma, A., Mackintosh, F. C., and Koenderink, G. H. (2013). Molecular motors robustly drive active gels to a critically connected state. Nature Physics, 9(9):591-597.
Andersen, P. M. and Al-Chalabi, A. (2011). Clinical genetics of amyotrophic lateral sclerosis: What do we really know?
Andrä, H., Combaret, N., Dvorkin, J., Glatt, E., Han, J., Kabel, M., Keehm, Y., Krzikalla, F., Lee, M., Madonna, C., Marsh, M., Mukerji, T., Saenger, E. H., Sain, R., Saxena, N., Ricker, S., Wiegmann, A., and Zhan, X. (2013).
Digital rock physics benchmarks - Part I: Imaging and segmentation. Computers and Geosciences, 50:25-32.
Andresen, P. L., Ford, F. P., Solomon, H. D., and Taylor, D. F. (1990). Monitoring and modeling stress corrosion and corrosion fatigue damage in nuclear reactors. JOM, 42(12):7-11.
Angermueller, C., Pärnamaa, T., Parts, L., and Stegle, O. (2016). Deep learning for computational biology. Molecular Systems Biology, 12(7):878.
Arbabi, S. and Sahimi, M. (1988). Elastic properties of three-dimensional percolation networks with stretching and bond-bending forces. Physical Review B, 38(10):7173-7176.
Arns, C. H., Arns, J.-Y., Wang, Y., and Rahman, S. S. (2018). Porous Structure Reconstruction Using Convolutional Neural Networks. Mathematical Geosciences, pages 1-19.
Arns, C. H., Knackstedt, M. A., and Martys, N. S. (2005). Cross-property correlations and permeability estimation in sandstone. Physical Review E - Statistical, Nonlinear, and Soft Matter Physics.
Arns, C. H., Knackstedt, M. A., Val Pinczewski, W., and Lindquist, W. B. (2001). Accurate estimation of transport properties from microtomographic images. Geophysical Research Letters, 28(17):3361-3364.
Arpat, G. B. and Caers, J. (2007). Conditional simulation with patterns. Mathematical Geology, 39(2):177-203.
Avalos-Gauna, E. and Palafox-Novack, L. (2019). Heat Transfer Coefficient Prediction of a Porous Material by Implementing a Machine Learning Model on a CFD Data Set. Canada.
Avemaria, F., Lunetta, C., Tarlarini, C., Mosca, L., Maestri, E., Marocchi, A., Melazzini, M., Penco, S., and Corbo, M. (2011). Mutation in the senataxin gene found in a patient affected by familial ALS with juvenile onset and slow progression. Amyotrophic Lateral Sclerosis, 12(3):228-230.
Babout, L., Maire, E., Buffière, J., and Fougères, R. (2001). Characterization by X-ray computed tomography of decohesion, porosity growth and coalescence in model metal matrix composites. Acta Materialia, 49(11):2055-2063.
Badrinarayanan, V., Kendall, A., and Cipolla, R. (2017). SegNet: A Deep Convolutional Encoder-Decoder Architecture for Image Segmentation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 39(12):2481-2495.
Bagheri, M., Akbari, A., and Mirbagheri, S. A. (2019). Advanced control of membrane fouling in filtration systems using artificial intelligence and machine learning techniques: A critical review.
Banavar, J. R. and Johnson, D. L. (1987). Characteristic pore sizes and transport in porous media. Physical Review B, 35(13):7283-7286.
Bar-Sinai, Y., Hoyer, S., Hickey, J., and Brenner, M. P. (2019). Learning data-driven discretizations for partial differential equations. Proceedings of the National Academy of Sciences of the United States of America, 116(31):15344-15349.
Barnett, J. W., Bilchak, C. R., Wang, Y., Benicewicz, B. C., Murdock, L. A., Bereau, T., and Kumar, S. K. (2020). Designing exceptional gas-separation polymer membranes using machine learning. Science Advances, 6(20):eaaz4301.
Baruchel, J., Bleuet, P., Bravin, A., Coan, P., Lima, E., Madsen, A., Ludwig, W., Pernot, P., and Susini, J. (2008). Advances in synchrotron hard X-ray based imaging. Comptes Rendus Physique, 9(5-6):624-641.
Bekri, S., Xu, K., Yousefian, F., Adler, P. M., Thovert, J. F., Muller, J., Iden, K., Psyllos, A., Stubos, A. K., and Ioannidis, M. A. (2000). Pore geometry and transport properties in North Sea chalk. Journal of Petroleum Science and Engineering, 25(3-4):107-134.
Bélisle, E., Huang, Z., Le Digabel, S., and Gheribi, A. E. (2015). Evaluation of machine learning interpolation techniques for prediction of physical properties. Computational Materials Science, 98:170-177.
Bengio, Y. (2009). Learning Deep Architectures for AI. Foundations and Trends in Machine Learning, 2(1):1-127.
Bengio, Y., Goodfellow, I., and Courville, A. (2016). Deep Learning, volume 1. MIT Press, Cambridge, Massachusetts.
Bergman, D. J. and Kantor, Y. (1984). Critical properties of an elastic fractal. Physical Review Letters, 53(6):511-514.
Bhandarkar, S. M. and Chen, F. (2005). Similarity analysis of video sequences using an artificial neural network. Applied Intelligence, 22(3):251-275.
Bishop, C. M. (1995). Neural Networks for Pattern Recognition. Clarendon Press.
Biswal, B. and Hilfer, R. (1999). Microstructure analysis of reconstructed porous media. Physica A: Statistical Mechanics and its Applications, 266(1-4):307-311.
Biswal, B., Manwart, C., Hilfer, R., Bakke, S., and Øren, P. E. (1999). Quantitative analysis of experimental and synthetic microstructures for sedimentary rock. Physica A: Statistical Mechanics and its Applications, 273(3-4):452-475.
Biswal, B., Øren, P. E., Held, R. J., Bakke, S., and Hilfer, R. (2007). Stochastic multiscale model for carbonate rocks. Physical Review E - Statistical, Nonlinear, and Soft Matter Physics, 75(6):061303.
Blunt, M. J. (2017a). Multiphase Flow in Permeable Media. Cambridge University Press, Cambridge.
Blunt, M. J. (2017b). Multiphase Flow in Permeable Media: A Pore-Scale Perspective. Cambridge University Press, Cambridge.
Borbély, A., Csikor, F., Zabler, S., Cloetens, P., and Biermann, H. (2004). Three-dimensional characterization of the microstructure of a metal-matrix composite by holotomography. Materials Science and Engineering: A, 367(1):40-50.
Borders, W. A., Akima, H., Fukami, S., Moriya, S., Kurihara, S., Horio, Y., Sato, S., and Ohno, H. (2017). Analogue spin-orbit torque device for artificial-neural-network-based associative memory operation. Applied Physics Express, 10(1).
Boulbitch, A. and Korzhenevskii, A. L. (2020). Morphological transformation of the process zone at the tip of a propagating crack. I. Simulation. Physical Review E, 101(3):033003.
Brandon, D. and Kaplan, W. D. (2013). Microstructural Characterization of Materials. Wiley.
Brocke, S. A., Degen, A., Mackerell, A. D., Dutagaci, B., and Feig, M. (2019). Prediction of Membrane Permeation of Drug Molecules by Combining an Implicit Membrane Model with Machine Learning. Journal of Chemical Information and Modeling, 59(3):1147-1162.
Buchholz, P. C. F., Fademrecht, S., and Pleiss, J. (2017). Percolation in protein sequence space. PLOS ONE, 12(12):e0189646.
Cang, R., Xu, Y., Chen, S., Liu, Y., Jiao, Y., and Ren, M. Y. (2017). Microstructure Representation and Reconstruction of Heterogeneous Materials Via Deep Belief Network for Computational Material Design. Journal of Mechanical Design, Transactions of the ASME, 139(7).
Chalupa, J., Leath, P. L., and Reich, G. R. (1979). Bootstrap percolation on a Bethe lattice. Journal of Physics C: Solid State Physics, 12:L31.
Chapelle, O., Schölkopf, B., and Zien, A., Eds. (2009). Semi-Supervised Learning (Chapelle, O. et al., Eds.; 2006) [Book reviews]. IEEE Transactions on Neural Networks, 20(3):542-542.
Chen, D. and Torquato, S. (2018). Designing disordered hyperuniform two-phase materials with novel physical properties. Acta Materialia, 142:152-161.
Chen, H. (2017).
Novel machine learning approaches for modeling variations in semiconductor manufacturing. PhD thesis, Massachusetts Institute of Technology.
Chen, H., Yong, H., and Zhou, Y. (2019a). XFEM analysis of the fracture behavior of bulk superconductor in high magnetic field. Journal of Applied Physics, 125(10):103901.
Chen, S., Kirubanandham, A., Chawla, N., and Jiao, Y. (2016). Stochastic Multi-Scale Reconstruction of 3D Microstructure Consisting of Polycrystalline Grains and Second-Phase Particles from 2D Micrographs. Metallurgical and Materials Transactions A, 47(3):1440-1450.
Chen, S., Li, H., and Jiao, Y. (2015). Dynamic reconstruction of heterogeneous materials and microstructure evolution. Physical Review E - Statistical, Nonlinear, and Soft Matter Physics, 92(2):023301.
Chen, T. and Guestrin, C. (2016). XGBoost: A scalable tree boosting system. In Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 785-794. Association for Computing Machinery.
Chen, X. W. and Lin, X. (2014). Big data deep learning: Challenges and perspectives.
Chen, Y.-Y., Lin, Y.-H., Kung, C.-C., Chung, M.-H., and Yen, I.-H. (2019b). Design and Implementation of Cloud Analytics-Assisted Smart Power Meters Considering Advanced Artificial Intelligence as Edge Analytics in Demand-Side Management for Smart Homes. Sensors, 19(9):2047.
Cheng, F., He, Q. P., and Zhao, J. (2019). A novel process monitoring approach based on variational recurrent autoencoder. Computers and Chemical Engineering, 129:106515.
Cheung, C., Erb, U., and Palumbo, G. (1994). Application of grain boundary engineering concepts to alleviate intergranular cracking in Alloys 600 and 690. Materials Science and Engineering A, 185(1-2):39-43.
Chia, R., Chiò, A., and Traynor, B. J. (2018). Novel genes associated with amyotrophic lateral sclerosis: diagnostic and clinical implications.
Chua, A. J., Galley, C. R., and Vallisneri, M. (2019). Reduced-Order Modeling with Artificial Neurons for Gravitational-Wave Inference. Physical Review Letters, 122(21):211101.
Dahl, G. E., Sainath, T. N., and Hinton, G. E. (2013). Improving deep neural networks for LVCSR using rectified linear units and dropout. In 2013 IEEE International Conference on Acoustics, Speech and Signal Processing, pages 8609-8613. IEEE.
Daigle, H. (2016). Application of critical path analysis for permeability prediction in natural porous media. Advances in Water Resources, 96:43-54.
David, C., Gueguen, Y., and Pampoukis, G. (1990). Effective medium theory and network theory applied to the transport properties of rock. Journal of Geophysical Research, 95(B5):6993.
De Arcangelis, L., Hansen, A., Herrmann, H. J., and Roux, S. (1989). Scaling laws in fracture. Physical Review B, 40(1):877-880.
DeJesus-Hernandez, M., Mackenzie, I. R., Boeve, B. F., Boxer, A. L., Baker, M., Rutherford, N. J., Nicholson, A. M., Finch, N. C. A., Flynn, H., Adamson, J., Kouri, N., Wojtas, A., Sengdy, P., Hsiung, G. Y. R., Karydas, A., Seeley, W. W., Josephs, K. A., Coppola, G., Geschwind, D. H., Wszolek, Z. K., Feldman, H., Knopman, D. S., Petersen, R. C., Miller, B. L., Dickson, D. W., Boylan, K. B., Graff-Radford, N. R., and Rademakers, R. (2011). Expanded GGGGCC Hexanucleotide Repeat in Noncoding Region of C9ORF72 Causes Chromosome 9p-Linked FTD and ALS. Neuron, 72(2):245-256.
Deng, L. and Dong, Y. (2014). Deep learning: methods and applications. Foundations and Trends in Signal Processing, 7(3-4):197-387.
Devroye, L. (1986).
Sample-based non-uniform random variate generation. In Winter Simulation Conference Proceedings, pages 260-265, New York, New York, USA. IEEE.
Doersch, C. (2016). Tutorial on Variational Autoencoders. arXiv preprint arXiv:1606.05908.
Dong, C., Loy, C. C., He, K., and Tang, X. (2016). Image Super-Resolution Using Deep Convolutional Networks. IEEE Transactions on Pattern Analysis and Machine Intelligence, 38(2):295-307.
Doyen, P. M. (1988). Permeability, conductivity, and pore geometry of sandstone. Journal of Geophysical Research, 93(B7):7729-7740.
Du, Y., Wang, W., and Wang, L. (2015). Hierarchical Recurrent Neural Network for Skeleton Based Action Recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 1110-1118.
Duchi, J., Hazan, E., and Singer, Y. (2011). Adaptive subgradient methods for online learning and stochastic optimization. Journal of Machine Learning Research, 12(Jul):2121-2159.
Dudchenko, A. V. and Mauter, M. S. (2020). Neural networks for estimating physical parameters in membrane distillation. Journal of Membrane Science, 610:118285.
Ebrahimi, F. and Sahimi, M. (2004). Multiresolution wavelet scale up of unstable miscible displacements in flow through heterogeneous porous media.
Elden, A. C., Kim, H. J., Hart, M. P., Chen-Plotkin, A. S., Johnson, B. S., Fang, X., Armakola, M., Geser, F., Greene, R., Lu, M. M., Padmanabhan, A., Clay-Falcone, D., McCluskey, L., Elman, L., Juhr, D., Gruber, P. J., Rüb, U., Auburger, G., Trojanowski, J. Q., Lee, V. M., Van Deerlin, V. M., Bonini, N. M., and Gitler, A. D. (2010). Ataxin-2 intermediate-length polyglutamine expansions are associated with increased risk for ALS. Nature, 466(7310):1069-1075.
Elyassi, B., Sahimi, M., and Tsotsis, T. T. (2007). Silicon carbide membranes for gas separation applications. Journal of Membrane Science, 288:290-297.
Erofeev, A., Orlov, D., Ryzhov, A., and Koroteev, D. (2019). Prediction of Porosity and Permeability Alteration Based on Machine Learning Algorithms. Transport in Porous Media, 128(2):677-700.
Faghmous, J. H. and Kumar, V. (2014). A Big Data Guide to Understanding Climate Change: The Case for Theory-Guided Data Science. Big Data, 2(3):155-163.
Falk, T., Mai, D., Bensch, R., Çiçek, Ö., Abdulkadir, A., Marrakchi, Y., Böhm, A., Deubner, J., Jäckel, Z., Seiwald, K., Dovzhenko, A., Tietz, O., Dal Bosco, C., Walsh, S., Saltukoglu, D., Tay, T. L., Prinz, M., Palme, K., Simons, M., Diester, I., Brox, T., and Ronneberger, O. (2019). U-Net: deep learning for cell counting, detection, and morphometry. Nature Methods, 16(1):67-70.
Fan, Y., Iwashita, T., and Egami, T. (2014). How thermally activated deformation starts in metallic glass. Nature Communications, 5(1):1-7.
Farahbakhsh, J., Delnavaz, M., and Vatanpour, V. (2019). Simulation and characterization of novel reverse osmosis membrane prepared by blending polypyrrole coated multiwalled carbon nanotubes for brackish water desalination and antifouling properties using artificial neural networks. Journal of Membrane Science, 581:123-138.
Feng, J., He, X., Teng, Q., Ren, C., Chen, H., and Li, Y. (2019). Accurate and Fast Reconstruction of Porous Media from Extremely Limited Information Using Conditional Generative Adversarial Network. Physical Review E, 100(3).
Feng, J., Teng, Q., He, X., and Wu, X. (2018). Accelerating multi-point statistics reconstruction method for porous media via deep learning. Acta Materialia, 159:296-308.
Feng, S. and Sahimi, M. (1985).
Position-space renormalization for elastic percolation networks with bond-bending forces. Physical Review B, 31(3):1671-1673.
Feng, S., Sen, P. N., Halperin, B. I., and Lobb, C. J. (1984). Percolation on two-dimensional elastic networks with rotationally invariant bond-bending forces. Physical Review B, 30(9):5386-5389.
Fineberg, J. and Marder, M. (1999). Instability in dynamic fracture.
Fokina, D., Muravleva, E., Ovchinnikov, G., and Oseledets, I. (2019). Microstructure synthesis using style-based generative adversarial network. Physical Review E, 101(4):043308.
Freund, Y. (1995). Boosting a weak learning algorithm by majority. Information and Computation, 121(2):256-285.
Freund, Y. and Schapire, R. E. (1997). A Decision-Theoretic Generalization of On-Line Learning and an Application to Boosting. Journal of Computer and System Sciences, 55(1):119-139.
Friedman, J., Hastie, T., and Tibshirani, R. (2000). Additive logistic regression: a statistical view of boosting (with discussion and a rejoinder by the authors). Annals of Statistics, 28(2):337-407.
Galar, M., Fernández, A., Barrenechea, E., and Herrera, F. (2013). EUSBoost: Enhancing ensembles for highly imbalanced data-sets by evolutionary undersampling. Pattern Recognition, 46(12):3460-3471.
George, J., Cyril, A., Koshy, B. I., and Mary, L. (2013). Exploring Sound Signature for Vehicle Detection and Classification Using ANN. International Journal on Soft Computing, 4(2):29-36.
Gerke, K. M., Karsanina, M. V., and Mallants, D. (2015). Universal Stochastic Multiscale Image Fusion: An Example Application for Shale Rock. Scientific Reports, 5:15880.
Gers, F. (2001). Long short-term memory in recurrent neural networks. PhD thesis.
Ghanbarian, B. and Javadpour, F. (2017). Upscaling pore pressure-dependent gas permeability in shales. Journal of Geophysical Research: Solid Earth, 122(4):2541-2552.
Ghanbarian, B., Torres-Verdín, C., and Skaggs, T. H. (2016). Quantifying tight-gas sandstone permeability via critical path analysis. Advances in Water Resources, 92:316-322.
Gibson, P., Schreuder-Gibson, H., and Rivin, D. (2001). Transport properties of porous membranes based on electrospun nanofibers. Colloids and Surfaces A: Physicochemical and Engineering Aspects, pages 469-481.
Glasner, D., Bagon, S., and Irani, M. (2009). Super-resolution from a single image. In Proceedings of the IEEE International Conference on Computer Vision, pages 349-356. IEEE.
Glorot, X., Bordes, A., and Bengio, Y. (2010). Deep Sparse Rectifier Neural Networks. In Journal of Machine Learning Research, volume 15.
Goy, A., Arthur, K., Li, S., and Barbastathis, G. (2018). Low Photon Count Phase Retrieval Using Deep Learning. Physical Review Letters, 121(24):243902.
Goyal, N. A. and Mozaffar, T. (2014). Experimental trials in amyotrophic lateral sclerosis: A review of recently completed, ongoing and planned trials using existing and novel drugs.
Graves, A., Fernández, S., and Schmidhuber, J. (2007). Multi-dimensional recurrent neural networks. In International Conference on Artificial Neural Networks, pages 549-558, Berlin. Springer Berlin Heidelberg.
Graves, A. and Schmidhuber, J. (2005). Framewise phoneme classification with bidirectional LSTM networks. In Proceedings of the International Joint Conference on Neural Networks, volume 4, pages 2047-2052.
Gromiha, M. M. and Yabuki, Y. (2008). Functional discrimination of membrane proteins using machine learning techniques. BMC Bioinformatics, 9(1):1-8.
Guo, L., Yang, Y., Xu, F., Lan, Q., Wei, M., and Wang, Y. (2019). Design of gradient nanopores in phenolics for ultrafast water permeation. Chemical Science, 10(7):2093-2100.
Gupta, H. V. and Nearing, G. S. (2014). Debates - the future of hydrological sciences: A (common) path forward? Using models and data to learn: A systems theoretic perspective on the future of hydrological science. Water Resources Research, 50(6):5351-5359.
Hagita, K., Higuchi, T., and Jinnai, H. (2018). Super-resolution for asymmetric resolution of FIB-SEM 3D imaging using AI with deep learning. Scientific Reports, 8(1):5877.
Hajizadeh, A., Safekordi, A., and Farhadpour, F. A. (2011). A multiple-point statistics algorithm for 3D pore space reconstruction from 2D images. Advances in Water Resources, 34(10):1256-1267.
Hamzehpour, H., Rasaei, M. R., and Sahimi, M. (2007). Development of optimal models of porous media by combining static and dynamic data: The permeability and porosity distributions. Physical Review E - Statistical, Nonlinear, and Soft Matter Physics, 75(5):056311.
Han, J., Jentzen, A., and Weinan, E. (2018). Solving high-dimensional partial differential equations using deep learning. Proceedings of the National Academy of Sciences of the United States of America, 115(34):8505-8510.
Haque, M. E. and Sudhakar, K. V. (2002). ANN back-propagation prediction model for fracture toughness in microalloy steel. International Journal of Fatigue, 24(9):1003-1010.
Hasselman, D. P. (1969). Unified Theory of Thermal Shock Fracture Initiation and Crack Propagation in Brittle Ceramics. Journal of the American Ceramic Society, 52(11):600-604.
Hastie, T., Tibshirani, R., and Friedman, J. (2009). The Elements of Statistical Learning: Data Mining, Inference, and Prediction. Springer Science & Business Media.
He, K., Zhang, X., Ren, S., and Sun, J. (2016). Deep Residual Learning for Image Recognition. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 770-778.
He, S., Reif, K., and Unbehauen, R. (2000). Multilayer neural networks for solving a class of partial differential equations. Neural Networks, 13(3):385-396.
Herman, G. T. (2009). Fundamentals of Computerized Tomography. Advances in Pattern Recognition. Springer London, London.
Hochreiter, S. and Schmidhuber, J. (1997). Long Short-Term Memory. Neural Computation, 9(8):1735-1780.
Hu, J., Kim, C., Halasz, P., Kim, J. F., and Szekely, G. (2020). Artificial intelligence for performance prediction of organic solvent nanofiltration membranes. Journal of Membrane Science, 619:118513.
Ioannidis, N. M., Rothstein, J. H., Pejaver, V., Middha, S., McDonnell, S. K., Baheti, S., Musolf, A., Li, Q., Holzinger, E., Karyadi, D., Cannon-Albright, L. A., Teerlink, C. C., Stanford, J. L., Isaacs, W. B., Xu, J., Cooney, K. A., Lange, E. M., Schleutker, J., Carpten, J. D., Powell, I. J., Cussenot, O., Cancel-Tassin, G., Giles, G. G., MacInnis, R. J., Maier, C., Hsieh, C. L., Wiklund, F., Catalona, W. J., Foulkes, W. D., Mandal, D., Eeles, R. A., Kote-Jarai, Z., Bustamante, C. D., Schaid, D. J., Hastie, T., Ostrander, E. A., Bailey-Wilson, J. E., Radivojac, P., Thibodeau, S. N., Whittemore, A. S., and Sieh, W. (2016). REVEL: An Ensemble Method for Predicting the Pathogenicity of Rare Missense Variants. American Journal of Human Genetics, 99(4):877-885.
Ioffe, S. and Szegedy, C. (2015). Batch normalization: Accelerating deep network training by reducing internal covariate shift.
In 32nd International Conference on Machine Learning, ICML 2015, volume 1, pages 448-456. International Machine Learning Society (IMLS).
Iten, R., Metger, T., Wilming, H., Del Rio, L., and Renner, R. (2020). Discovering Physical Concepts with Neural Networks. Physical Review Letters, 124(1):010508.
Jablonka, K. M., Ongari, D., Moosavi, S. M., and Smit, B. (2020). Big-Data Science in Porous Materials: Materials Genomics and Machine Learning.
James, G., Witten, D., Hastie, T., and Tibshirani, R. (2013). An Introduction to Statistical Learning, volume 112. Springer.
Javadpour, F., Fisher, D., and Unsworth, M. (2007). Nanoscale gas flow in shale gas sediments. Journal of Canadian Petroleum Technology, 46(10):55-61.
Jiang, Z., van Dijke, M. I. J., Sorbie, K. S., and Couples, G. D. (2013). Representation of multiscale heterogeneity via multiscale pore networks. Water Resources Research, 49(9):5437-5449.
Jiao, Y., Padilla, E., and Chawla, N. (2013). Modeling and predicting microstructure evolution in lead/tin alloy via correlation functions and stochastic material reconstruction. Acta Materialia, 61(9):3370-3377.
Jiao, Y., Stillinger, F. H., and Torquato, S. (2008). Modeling heterogeneous materials via two-point correlation functions. II. Algorithmic details and applications. Physical Review E, 77(3):031135.
Jiao, Y., Stillinger, F. H., and Torquato, S. (2009). A superior descriptor of random textures and its predictive capacity. Proceedings of the National Academy of Sciences of the United States of America, 106(42):17634-9.
Jiao, Y. and Torquato, S. (2012). Quantitative characterization of the microstructure and transport properties of biopolymer networks. Physical Biology, 9(3):036009.
Johnson, D. L., Koplik, J., and Schwartz, L. M. (1986). New pore-size parameter characterizing transport in porous media. Physical Review Letters, 57(20):2564-2567.
Johnson, J., Alahi, A., and Fei-Fei, L. (2016). Perceptual losses for real-time style transfer and super-resolution. In European Conference on Computer Vision, volume 9906 LNCS, pages 694-711. Springer, Cham.
Johnson, M. E. (1987). Multivariate Statistical Simulation, volume 192. John Wiley & Sons.
Julleh, M., Rahman, J., Mccann, T., Abdullah, R., and Yeasmin, R. (2011). Sandstone diagenesis of the Neogene Surma Group from the Shahbazpur Gas Field, Southern Bengal Basin, Bangladesh. Austrian Journal of Earth Sciences, 104(1):114-126.
Kainourgiakis, M. E., Kikkinides, E. S., Galani, A., Charalambopoulou, G. C., and Stubos, A. K. (2005). Digitally reconstructed porous media: Transport and sorption properties. Transport in Porous Media, 58(1-2):43-62.
Kak, A. C., Slaney, M., and Wang, G. (2002). Principles of Computerized Tomographic Imaging. Medical Physics, 29(1).
Kalchbrenner, N., Danihelka, I., and Graves, A. (2015). Grid Long Short-Term Memory. arXiv:1507.01526.
Kamrava, S., Sahimi, M., and Tahmasebi, P. (2020a). Quantifying accuracy of stochastic methods of reconstructing complex materials by deep learning. Physical Review E, 101(4):043301.
Kamrava, S., Tahmasebi, P., and Sahimi, M. (2019). Enhancing images of shale formations by a hybrid stochastic and deep learning algorithm. Neural Networks, 118:310-320.
Kamrava, S., Tahmasebi, P., and Sahimi, M. (2020b). Linking Morphology of Porous Media to Their Macroscopic Permeability by Deep Learning.
Transport in Porous Media, 131(2):427-448.
Kamrava, S., Tahmasebi, P., Sahimi, M., and Arbabi, S. (2020c). Phase transitions, percolation, fracture of materials, and deep learning. Physical Review E, 102(1):011001.
Kantor, Y. and Webman, I. (1984). Elastic properties of random percolating systems. Physical Review Letters, 52(21):1891-1894.
Karimpouli, S. and Tahmasebi, P. (2019). Image-based velocity estimation of rock using Convolutional Neural Networks. Neural Networks, 111:89-97.
Karimpouli, S., Tahmasebi, P., and Saenger, E. H. (2019). Coal Cleat/Fracture Segmentation Using Convolutional Neural Networks. Natural Resources Research.
Karkanis, S. A., Iakovidis, D. K., Karras, D. A., and Maroulis, D. E. (2001). Detection of lesions in endoscopic video using textural descriptors on wavelet domain supported by artificial neural network architectures. In IEEE International Conference on Image Processing, volume 2, pages 833-836.
Katz, A. J. and Thompson, A. H. (1986). Quantitative prediction of permeability in porous rock. Physical Review B, 34(11):8179-8181.
Katz, A. J. and Thompson, A. H. (1987). Prediction of rock electrical conductivity from mercury injection measurements. Journal of Geophysical Research, 92(B1):599-607.
Keating, P. N. (1966). Relationship between the macroscopic and microscopic theory of crystal elasticity. I. Primitive crystals. Physical Review, 152(2):774-779.
Kieu, T., Yang, B., Guo, C., and Jensen, C. S. (2019). Outlier Detection for Time Series with Recurrent Autoencoder Ensembles. In Joint Conference on Artificial Intelligence, pages 2725-2732.
Kim, J., Lee, J. K., and Lee, K. M. (2015). Deeply-Recursive Convolutional Network for Image Super-Resolution. 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 1637-1645.
Kim, K. G. (2016). Book Review: Deep Learning. Healthcare Informatics Research, 22(4):351.
Kingma, D. P. and Ba, J. L. (2015). Adam: A method for stochastic optimization. In 3rd International Conference on Learning Representations, ICLR 2015 - Conference Track Proceedings.
Kinney, J. H. and Nichols, M. C. (1992). X-Ray Tomographic Microscopy (XTM) Using Synchrotron Radiation. Annual Review of Materials Science, 22(1):121-152.
Kolen, J. F. and Kremer, S. C. (2010). Gradient Flow in Recurrent Nets: The Difficulty of Learning Long-Term Dependencies. In A Field Guide to Dynamical Recurrent Networks. IEEE.
Kong, W. L., Bao, J. B., Wang, J., Hu, G. H., Xu, Y., and Zhao, L. (2016). Preparation of open-cell polymer foams by CO2 assisted foaming of polymer blends. Polymer, 90:331-341.
Koplik, J., Lin, C., and Vermette, M. (1984). Conductivity and permeability from microgeometry. Journal of Applied Physics, 56(11):3127-3131.
Kozlovskaia, N. and Zaytsev, A. (2018). Deep ensembles for imbalanced classification. In Proceedings - 16th IEEE International Conference on Machine Learning and Applications, ICMLA 2017, pages 908-913.
Krawczyk, B., Galar, M., Jeleń, L., and Herrera, F. (2016). Evolutionary undersampling boosting for imbalanced classification of breast cancer malignancy. Applied Soft Computing, 38:714-726.
Krishnan, S. and Journel, A. G. (2003). Spatial connectivity: From variograms to multiple-point measures. Mathematical Geology, 35(8):915-925.
Krizhevsky, A., Sutskever, I., and Hinton, G. E. (2017). ImageNet Classification with Deep Convolutional Neural Networks. Communications of the ACM, 60(6):84-90.
Kullback, S. and Leibler, R. A. (1951).
On Information and Sufficiency. The Annals of Mathematical Statistics, 22(1):79-86.
Kwok, C. Y., Duan, K., and Pierce, M. (2020). Modeling hydraulic fracturing in jointed shale formation with the use of fully coupled discrete element method. Acta Geotechnica, 15(1):245-264.
Lawn, B. (1993). Fracture of Brittle Solids. Cambridge University Press, London, 2nd edition.
LeCun, Y. (1989). Generalization and network design strategies. Connectionism in Perspective, 19:143-155.
LeCun, Y., Bottou, L., Bengio, Y., and Haffner, P. (1998). Gradient-based learning applied to document recognition. Proceedings of the IEEE, 86(11):2278-2324.
LeCun, Y., Bengio, Y., and Hinton, G. (2015). Deep learning. Nature, 521(7553):436-444.
Ledig, C., Theis, L., Huszar, F., Caballero, J., Cunningham, A., Acosta, A., Aitken, A., Tejani, A., Totz, J., Wang, Z., and Shi, W. (2016). Photo-Realistic Single Image Super-Resolution Using a Generative Adversarial Network. CVPR, 2(3):4.
Lee, E. Y., Fulan, B. M., Wong, G. C., and Ferguson, A. L. (2016). Mapping membrane activity in undiscovered peptide sequence space using machine learning. Proceedings of the National Academy of Sciences of the United States of America, 113(48):13588-13593.
Lee, E. Y., Wong, G. C., and Ferguson, A. L. (2018). Machine learning-enabled discovery and design of membrane-active peptides. Bioorganic and Medicinal Chemistry, 26(10):2708-2718.
Leonard, A. P., Cameron, R. B., Speiser, J. L., Wolf, B. J., Peterson, Y. K., Schnellmann, R. G., Beeson, C. C., and Rohrer, B. (2015). Quantitative analysis of mitochondrial morphology and membrane potential in living cells using high-content imaging, machine learning, and morphological binning. Biochimica et Biophysica Acta - Molecular Cell Research, 1853(2):348-360.
Levine, Y., Sharir, O., Cohen, N., and Shashua, A. (2019). Quantum Entanglement in Deep Learning Architectures. Physical Review Letters, 122(6):065301.
Levitz, P. (1998). Off-lattice reconstruction of porous media: Critical evaluation, geometrical confinement and molecular transport. Advances in Colloid and Interface Science, 76-77:71-106.
Li, X., Yang, Z., Catherine Brinson, L., Choudhary, A., Agrawal, A., and Chen, W. (2018). A deep adversarial learning methodology for designing microstructural material systems. In Proceedings of the ASME Design Engineering Technical Conference, volume 2B-2018.
Li, Y., Xu, Y., Jiang, M., Li, B., Han, T., Chi, C., Lin, F., Shen, B., Zhu, X., Lai, L., and Fang, Z. (2019). Self-Learning Perfect Optical Chirality via a Deep Neural Network. Physical Review Letters, 123(21):213902.
Liang, X., Shen, X., Feng, J., Lin, L., and Yan, S. (2016). Semantic object parsing with graph LSTM. In Lecture Notes in Computer Science, volume 9905 LNCS, pages 125-143. Springer Verlag.
Libotean, D., Giralt, J., Giralt, F., Rallo, R., Wolfe, T., and Cohen, Y. (2009). Neural network approach for modeling the performance of reverse osmosis membrane desalting. Journal of Membrane Science, 326(2):408-419.
Liu, H., Chen, J., Hissel, D., and Su, H. (2019a). Remaining useful life estimation for proton exchange membrane fuel cells using a hybrid method. Applied Energy, 237:910-919.
Liu, S. (2017). Prediction of Capillary Pressure and Relative Permeability Curves using Conventional Pore-scale Displacements and Artificial Neural Networks. PhD thesis, University of Kansas.
Liu, S., Zhong, Z., Takbiri-Borujeni, A., Kazemi, M., Fu, Q., and Yang, Y. (2019b). A case study on homogeneous and heterogeneous reservoir porous media reconstruction by using generative adversarial networks. Energy Procedia, 158:6164-6169.
Liu, Z., Yan, S., Liu, H., and Chen, X. (2019c). Superhigh-Resolution Recognition of Optical Vortex Modes Assisted by a Deep-Learning Method. Physical Review Letters, 123(18):183902.
Logroscino, G., Traynor, B. J., Hardiman, O., Chiò, A., Mitchell, D., Swingler, R. J., Millul, A., Benn, E., Beghi, E., and for EURALS (2010). Incidence of amyotrophic lateral sclerosis in Europe. Journal of Neurology, Neurosurgery & Psychiatry, 81(4):385-390.
Ma, Z. and Torquato, S. (2018). Precise algorithms to compute surface correlation functions of two-phase heterogeneous media and their applications. Physical Review E, 98(1):013307.
Makhzani, A. and Frey, B. (2013). k-Sparse Autoencoders. arXiv preprint arXiv:1312.5663.
Malakhovsky, I. and Michels, M. A. (2006). Scaling and localization in fracture of disordered central-force spring lattices: Comparison with random damage percolation. Physical Review B - Condensed Matter and Materials Physics, 74(1):014206.
Malmir, H., Sahimi, M., and Jiao, Y. (2018). Higher-order correlation functions in disordered media: Computational algorithms and application to two-phase heterogeneous materials. Physical Review E, 98(6):063317.
Malmir, H., Sahimi, M., and Rahimi Tabar, M. R. (2017). Statistical characterization of microstructure of packings of polydisperse hard cubes. Physical Review E, 95(5-1):052902.
Mao, X., Shen, C., and Yang, Y.-B. (2016). Image Restoration Using Very Deep Convolutional Encoder-Decoder Networks with Symmetric Skip Connections. Advances in Neural Information Processing Systems 29, pages 2802-2810.
Masci, J., Meier, U., Cireşan, D., and Schmidhuber, J. (2011). Stacked convolutional auto-encoders for hierarchical feature extraction. In International Conference on Artificial Neural Networks, pages 52-59. Springer, Berlin, Heidelberg.
Maturana, D. and Scherer, S. (2015). 3D Convolutional Neural Networks for landing zone detection from LiDAR. In 2015 IEEE International Conference on Robotics and Automation (ICRA), pages 3471-3478. IEEE.
Meade, A. J. and Fernandez, A. A. (1994). The numerical solution of linear ordinary differential equations by feedforward neural networks. Mathematical and Computer Modelling, 19(12):1-25.
Mehrabi, A. R. and Sahimi, M. (1997). Coarsening of heterogeneous media: Application of wavelets. Physical Review Letters, 79(22):4385-4388.
Michel, A. N. and Farrell, J. A. (1990). Associative Memories via Artificial Neural Networks. IEEE Control Systems Magazine, 10(3):6-17.
Miller, R. G., Mitchell, J. D., and Moore, D. H. (2012). Riluzole for amyotrophic lateral sclerosis (ALS)/motor neuron disease (MND). Cochrane Database of Systematic Reviews, 3.
Mittal, A., Soundararajan, R., and Bovik, A. C. (2013). Making a 'completely blind' image quality analyzer. IEEE Signal Processing Letters, 20(3):209-212.
Mo, S., Zabaras, N., Shi, X., and Wu, J. (2019). Deep Autoregressive Neural Networks for High-Dimensional Inverse Problems in Groundwater Contaminant Source Identification. Water Resources Research, 55(5):3856-3881.
Mohaghegh, S. (2018). Data-Driven Analytics for the Geological Storage of CO2. CRC Press, Florida.
Mohaghegh, S. D. (2017). Shale Analytics. Springer International Publishing, Cham.
Mohammadi, H., Oskoee, E. N., Afsharchi, M., Yazdani, N., and Sahimi, M. (2009).
A percolation model of mobile ad-hoc networks. International Journal of Modern Physics C, 20(12):1871-1902.
Monteleoni, C., Schmidt, G. A., and McQuade, S. (2013). Climate Informatics: Accelerating Discovering in Climate Science with Machine Learning. Computing in Science & Engineering, 15(5):32-40.
Moreno, Y., Gómez, J. B., and Pacheco, A. F. (2000). Fracture and second-order phase transitions. Physical Review Letters, 85(14):2865-2868.
Mosser, L., Dubrule, O., and Blunt, M. J. (2017). Reconstruction of three-dimensional porous media using generative adversarial neural networks. Physical Review E, 96(4):043309.
Mosser, L., Dubrule, O., and Blunt, M. J. (2018a). Conditioning of three-dimensional generative adversarial networks for pore and reservoir-scale models. arXiv preprint arXiv:1802.05622.
Mosser, L., Dubrule, O., and Blunt, M. J. (2018b). Stochastic Reconstruction of an Oolitic Limestone by Generative Adversarial Networks. Transport in Porous Media, 125(1):81-103.
Mostaghimi, P., Blunt, M. J., and Bijeljic, B. (2013). Computations of Absolute Permeability on Micro-CT Images. Mathematical Geosciences, 45(1):103-125.
Mourzenko, V. V., Thovert, J. F., and Adler, P. M. (2011). Trace analysis for fracture networks with anisotropic orientations and heterogeneous distributions. Physical Review E - Statistical, Nonlinear, and Soft Matter Physics.
Mukhopadhyay, S. and Sahimi, M. (2000). Calculation of the effective permeabilities of field-scale porous media. Chemical Engineering Science, 55(20):4495-4513.
Nair, V. and Hinton, G. E. (2010). Rectified Linear Units Improve Restricted Boltzmann Machines. International Conference on Machine Learning, pages 807-814.
Nasrabadi, N. M. and Choo, C. Y. (1992). Hopfield Network for Stereo Vision Correspondence. IEEE Transactions on Neural Networks, 3(1):5-13.
Nicolas Remy, Alexandre Boucher, and Jianbing Wu (2009). Applied Geostatistics with SGeMS: A User's Guide. Cambridge University Press.
Nielsen, M. A. (2015). Neural Networks and Deep Learning. Determination Press.
Nogueira, K., Penatti, O. A., and dos Santos, J. A. (2017). Towards better exploiting convolutional neural networks for remote sensing scene classification. Pattern Recognition, 61:539-556.
Okabe, H. and Blunt, M. J. (2004). Prediction of permeability for porous media reconstructed using multiple-point statistics. Physical Review E, 70(6):066135.
Olmez, T. and Dokur, Z. (2003). Classification of heart sounds using an artificial neural network. Pattern Recognition Letters, 24(1-3):617-629.
Padilla, E., Jakkali, V., Jiang, L., and Chawla, N. (2012). Quantifying the effect of porosity on the evolution of deformation and damage in Sn-based solder joints by X-ray microtomography and microstructure-based finite element modeling. Acta Materialia, 60(9):4017-4026.
Paricharak, H. N., Lotake, A. A., Sudhakar V., M., and Gaikwad, D. R. (2019). Analysis of Crack on Aeroplane Wing at Different Positions using ANSYS Software. International Journal of New Technology and Research.
Park, S., Baek, S. S., Pyo, J. C., Pachepsky, Y., Park, J., and Cho, K. H. (2019). Deep neural networks for modeling fouling growth and flux decline during NF/RO membrane filtration. Journal of Membrane Science, 587:117164.
Park, S. C., Park, M. K., and Kang, M. G. (2003). Super-resolution image reconstruction: A technical overview.
Pascanu, R., Mikolov, T., and Bengio, Y. (2013).
On the difficulty of training recurrent neural networks. In International Conference on Machine Learning, pages 1310-1318.
Pires de Lima, R. (2019). Petrographic analysis with deep convolutional neural networks. PhD thesis, University of Oklahoma.
Prasad, B. K., Eskandari, H., and Reddy, B. V. (2009). Prediction of compressive strength of SCC and HPC with high volume fly ash using ANN. Construction and Building Materials, 23(1):117-128.
Prodanović, M., Mehmani, A., and Sheppard, A. P. (2015). Imaged-based multiscale network modelling of microporosity in carbonates. Geological Society, London, Special Publications, 406(1):95-113.
Qi, Y., Wang, Y., Zheng, X., and Wu, Z. (2014). Robust feature learning by stacked autoencoder with maximum correntropy criterion. In ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings, pages 6716-6720.
Quartarone, E., Mustarelli, P., and Magistris, A. (2002). Transport properties of porous PVDF membranes. Journal of Physical Chemistry B, 106(42):10828-10833.
R. F. Cochrane (2002). DoITPoMS - Micrograph and record.
Radford, A., Metz, L., and Chintala, S. (2015). Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks. arXiv preprint arXiv:1511.06434.
Raissi, M., Babaee, H., and Givi, P. (2019a). Deep Learning of Turbulent Scalar Mixing. Physical Review Fluids, 4(12):124501.
Raissi, M., Perdikaris, P., and Karniadakis, G. E. (2017). Machine learning of linear differential equations using Gaussian processes. Journal of Computational Physics, 348:683-693.
Raissi, M., Perdikaris, P., and Karniadakis, G. E. (2019b). Physics-informed neural networks: A deep learning framework for solving forward and inverse problems involving nonlinear partial differential equations. Journal of Computational Physics, 378:686-707.
Raissi, M., Wang, Z., Triantafyllou, M. S., and Karniadakis, G. E. (2019c). Deep Learning of Vortex Induced Vibrations. Journal of Fluid Mechanics, 861:119-137.
Raissi, M., Yazdani, A., and Karniadakis, G. E. (2018). Hidden Fluid Mechanics: A Navier-Stokes Informed Deep Learning Framework for Assimilating Flow Visualization Data. arXiv preprint arXiv:1808.04327.
Raissi, M., Yazdani, A., and Karniadakis, G. E. (2020). Hidden fluid mechanics: Learning velocity and pressure fields from flow visualizations. Science, 367(6481):1026-1030.
Rall, D., Menne, D., Schweidtmann, A. M., Kamp, J., von Kolzenberg, L., Mitsos, A., and Wessling, M. (2019). Rational design of ion separation membranes. Journal of Membrane Science, 569:209-219.
Rall, D., Schweidtmann, A. M., Aumeier, B. M., Kamp, J., Karwe, J., Ostendorf, K., Mitsos, A., and Wessling, M. (2020a). Simultaneous rational design of ion separation membranes and processes. Journal of Membrane Science, 600:117860.
Rall, D., Schweidtmann, A. M., Kruse, M., Evdochenko, E., Mitsos, A., and Wessling, M. (2020b). Multi-scale membrane process optimization with high-fidelity ion transport models through machine learning. Journal of Membrane Science, 608:118208.
Ranzato, M., Huang, F. J., Boureau, Y. L., and LeCun, Y. (2007). Unsupervised learning of invariant feature hierarchies with applications to object recognition. In Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition.
Rasaei, M. R. and Sahimi, M. (2009a). Upscaling of the geological models of large-scale porous media using multiresolution wavelet transformations.
Journal of Heat Transfer, 131(10):1-12.
Rasaei, M. R. and Sahimi, M. (2009b). Upscaling of the permeability by multiscale wavelet transformations and simulation of multiphase flows in heterogeneous porous media. Computational Geosciences, 13(2):187-214.
Renton, A. E., Chiò, A., and Traynor, B. J. (2014). State of play in amyotrophic lateral sclerosis genetics.
Renton, A. E., Majounie, E., Waite, A., Simón-Sánchez, J., Rollinson, S., Gibbs, J. R., Schymick, J. C., Laaksovirta, H., van Swieten, J. C., Myllykangas, L., Kalimo, H., Paetau, A., Abramzon, Y., Remes, A. M., Kaganovich, A., Scholz, S. W., Duckworth, J., Ding, J., Harmer, D. W., Hernandez, D. G., Johnson, J. O., Mok, K., Ryten, M., Trabzuni, D., Guerreiro, R. J., Orrell, R. W., Neal, J., Murray, A., Pearson, J., Jansen, I. E., Sondervan, D., Seelaar, H., Blake, D., Young, K., Halliwell, N., Callister, J. B., Toulson, G., Richardson, A., Gerhard, A., Snowden, J., Mann, D., Neary, D., Nalls, M. A., Peuralinna, T., Jansson, L., Isoviita, V. M., Kaivorinne, A. L., Hölttä-Vuori, M., Ikonen, E., Sulkava, R., Benatar, M., Wuu, J., Chiò, A., Restagno, G., Borghero, G., Sabatelli, M., Heckerman, D., Rogaeva, E., Zinman, L., Rothstein, J. D., Sendtner, M., Drepper, C., Eichler, E. E., Alkan, C., Abdullaev, Z., Pack, S. D., Dutra, A., Pak, E., Hardy, J., Singleton, A., Williams, N. M., Heutink, P., Pickering-Brown, S., Morris, H. R., Tienari, P. J., and Traynor, B. J. (2011). A hexanucleotide repeat expansion in C9ORF72 is the cause of chromosome 9p21-linked ALS-FTD. Neuron, 72(2):257-268.
Revil, A. and Cathles, L. M. (1999). Permeability of shaly sands. Water Resources Research, 35(3):651-662.
Reza Rasaei, M. and Sahimi, M. (2008). Upscaling and simulation of waterflooding in heterogeneous reservoirs using wavelet transformations: Application to the SPE-10 model. Transport in Porous Media, 72(3):311-338.
Richesson, S. and Sahimi, M. (2019). Hertz-Mindlin Theory of Contacting Grains and the Effective-Medium Approximation for the Permeability of Deforming Porous Media. Geophysical Research Letters, 46(14):8039-8045.
Robberecht, W. and Philips, T. (2013). The changing scene of amyotrophic lateral sclerosis.
Roehl, E. A., Ladner, D. A., Daamen, R. C., Cook, J. B., Safarik, J., Phipps, D. W., and Xie, P. (2018). Modeling fouling in a large RO system with artificial neural networks. Journal of Membrane Science, 552:95-106.
Rogal, J., Schneider, E., and Tuckerman, M. E. (2019). Neural-Network-Based Path Collective Variables for Enhanced Sampling of Phase Transformations. Physical Review Letters, 123(24):245701.
Ronneberger, O., Fischer, P., and Brox, T. (2015). U-Net: Convolutional networks for biomedical image segmentation. In Lecture Notes in Computer Science, volume 9351, pages 234-241. Springer, Cham.
Rothstein, J. D. (2017). Bench to Bedside: Edaravone, a New Drug Approved for ALS. Cell, 171.
Roux, S., Hansen, A., Herrmann, H., and Guyon, E. (1988). Rupture of heterogeneous media in the limit of infinite disorder. Journal of Statistical Physics, 52(1-2):237-244.
Ruder, S. (2016). An overview of gradient descent optimization algorithms. arXiv preprint arXiv:1609.04747.
Rumelhart, D. E., Hinton, G. E., and Williams, R. J. (1985). Learning internal representations by error propagation. Technical report, California Univ San Diego La Jolla Inst for Cognitive Science.
Rumelhart, D. E. and McClelland, J. L. (1987).
Learning Internal Representations by Error Propagation. In Parallel Distributed Processing: Explorations in the Microstructure of Cognition: Foundations, pages 318-362. MIT Press.
Saberi, A. A. (2015). Recent advances in percolation theory and its applications.
Sahimi, M. (1994). Applications of Percolation Theory. Taylor & Francis, London, 2nd edition.
Sahimi, M. (2003a). Heterogeneous Materials I, volume 22 of Interdisciplinary Applied Mathematics. Springer-Verlag, New York.
Sahimi, M. (2003b). Heterogeneous Materials II: Nonlinear and Breakdown Properties and Atomistic Modeling. Springer, New York.
Sahimi, M. (2011a). Flow and Transport in Porous Media and Fractured Rock. John Wiley & Sons.
Sahimi, M. (2011b). Flow and Transport in Porous Media and Fractured Rock: From Classical Methods to Modern Approaches. John Wiley & Sons.
Sahimi, M. and Arbabi, S. (1992). Percolation and fracture in disordered solids and granular media: Approach to a fixed point. Physical Review Letters, 68(5):608-611.
Sahimi, M. and Arbabi, S. (1993). Mechanics of disordered solids. III. Fracture properties. Physical Review B, 47(2):713-722.
Sahimi, M. and Goddard, J. D. (1986). Elastic percolation models for cohesive mechanical failure in heterogeneous systems. Physical Review B, 33(11):7848-7851.
Sahimi, M. and Ray, T. S. (1991). Transport through bootstrap percolation clusters. Journal de Physique I, 1(5):685-692.
Saleh, K., Hossny, M., and Nahavandi, S. (2018). Intent prediction of vulnerable road users from motion trajectories using stacked LSTM network. In IEEE Conference on Intelligent Transportation Systems, Proceedings, ITSC, pages 327-332.
Schapire, R. E. (1990). The strength of weak learnability. Machine Learning, 5(2):197-227.
Scherer, D., Müller, A., and Behnke, S. (2010). Evaluation of pooling operations in convolutional architectures for object recognition. In International Conference on Artificial Neural Networks, pages 92-101. Springer, Berlin, Heidelberg.
Schmidhuber, J. (2015). Deep learning in neural networks: An overview.
Schwartz, L. M., Feng, S., Thorpe, M. F., and Sen, P. N. (1985). Behavior of depleted elastic networks: Comparison of effective-medium and numerical calculations. Physical Review B, 32(7):4607-4617.
Sedigh, M. G., Jahangiri, M., Liu, P. K. T., Sahimi, M., and Tsotsis, T. T. (2000). Structural characterization of polyetherimide-based carbon molecular sieve membranes. AIChE Journal, 46(11):2245-2255.
Seiffert, C., Khoshgoftaar, T. M., Van Hulse, J., and Napolitano, A. (2008). RUSBoost: Improving classification performance when training data is skewed. In 2008 19th International Conference on Pattern Recognition, pages 1-4. IEEE.
Seiffert, C., Khoshgoftaar, T. M., Van Hulse, J., and Napolitano, A. (2010). RUSBoost: A Hybrid Approach to Alleviating Class Imbalance. IEEE Transactions on Systems, Man, and Cybernetics - Part A: Systems and Humans, 40(1).
Shams, R., Masihi, M., Boozarjomehry, R. B., and Blunt, M. J. (2019). Coupled generative adversarial and auto-encoder neural networks to reconstruct three-dimensional multi-scale porous media. Journal of Petroleum Science and Engineering, page 106794.
Sharma, N., Ray, A., Sharma, S., Shukla, K., Pradhan, S., and Aggarwal, L. (2008). Segmentation and classification of medical images using texture-primitive features: Application of BAM-type artificial neural network. Journal of Medical Physics, 33(3):119-126.
Shashanka, R. and Chaira, D. (2016).
Eects of Nano-Y2O3 and Sintering Parameters on the Fabrication of PM Duplex and Ferritic Stainless Steels. Acta Metallurgica Sinica (English Letters), 29(1):58{71. Shi, X., Chen, Z., Wang, H., Yeung, D.-Y., Wong, W.-K., Woo, W.-C., and Kong Observatory, H. (2015). Convolutional LSTM Network: A Machine Learning Approach for Precipitation Nowcasting. Advances in neural information processing systems, 28:802{810. Shi, Y., Lin, S., Staats, K. A., Li, Y., Chang, W. H., Hung, S. T., Hendricks, E., Linares, G. R., Wang, Y., Son, E. Y., Wen, X., Kisler, K., Wilkinson, B., Menendez, L., Sugawara, T., Woolwine, P., Huang, M., Cowan, M. J., Ge, B., Koutsodendris, N., Sandor, K. P., Komberg, J., Vangoor, V. R., Senthilkumar, K., Hennes, V., Seah, C., Nelson, A. R., Cheng, T. Y., Lee, S. J. J., August, P. R., Chen, J. A., Wisniewski, N., Hanson-Smith, V., Belgard, T. G., Zhang, A., Coba, M., Grunseich, C., Ward, M. E., Van Den Berg, L. H., Pasterkamp, R. J., Trotti, D., Zlokovic, B. V., and Ichida, J. K. (2018). Haploinsuciency leads to neurodegeneration in C9ORF72 ALS/FTD human induced motor neurons. Nature Medicine, 24(3):313{325. Shin, D., Lee, J., Gong, J. R., and Cho, K. H. (2017). Percolation transition of cooperative mutational eects in colorectal tumorigenesis. Nature Communications, 8(1):1{14. 190 Silva, K. P. T., Yusufaly, T. I., Chellamuthu, P., and Boedicker, J. Q. (2019). Disruption of microbial communication yields a two-dimensional percolation transition. Physical Review E, 99(4):042409. Skaggs, T. H. (2011). Assessment of critical path analyses of the relationship between permeabil- ity and electrical conductivity of pore networks. Advances in Water Resources, 34(10):1335{ 1342. Snyder, J. C., Rupp, M., Hansen, K., M uller, K. R., and Burke, K. (2012). Finding density functionals with machine learning. Physical Review Letters, 108(25):253002. Sobel, I. (2014). History and denition of the sobel operators. Retrieved from the World Wide Web, page 1505. Sola, J. and Sevilla, J. (1997). Importance of input data normalization for the application of neural networks to complex industrial problems. IEEE Transactions on Nuclear Science, 44(3 PART 3):1464{1468. Song, J., Tang, S., Xiao, J., Wu, F., and Zhang, Z. M. (2016). LSTM-in-LSTM for generating long descriptions of images. Computational Visual Media, 2(4):379{388. Srinivasan, S., Karra, S., Hyman, J., Viswanathan, H., and Srinivasan, G. (2019). Model reduction for fractured porous media: a machine learning approach for identifying main ow pathways. Computational Geosciences, 23(3):617{629. Srividhya, S., Basant, K., Gupta, R. K., Rajagopal, A., and Reddy, J. N. (2018). In uence of the homogenization scheme on the bending response of functionally graded plates. Acta Mechanica, 229(10):4071{4089. Stauer, D. and Aharony, A. (1994). Introduction To Percolation Theory. Taylor & Francis, London, 2 edition. Straubhaar, J., Philippe,., Gr egoire Mariethoz, R.., Froidevaux, R., Besson, O., Straubhaar, J., Renard, P., Mariethoz,. G., Froidevaux, R., and Besson, O. (2011). An Improved Parallel Multiple-point Algorithm Using a List Approach. Math Geosci, 43:305{328. 191 Strebelle, S. (2002). Conditional simulation of complex geological structures using multiple- point statistics. Mathematical Geology, 34(1):1{21. Strebelle, S. and Zhang, T. (2005). Non-Stationary Multiple-point Geostatistical Models. Springer, Dordrecht. Su, C., Yeo, H., Xie, Q., Wang, X., Zhang, S., Yeo, C. S. H., Xie, Q., Wang, X., and Zhang, S. (2020). 
Understanding and optimization of thin lm nanocomposite membranes for reverse osmosis with machine learning. Journal of Membrane Science, 606:118135. Sutskever, I., Vinyals, O., and Le, Q. V. (2014). Sequence to Sequence Learning with Neural Networks. In Advances in neural information processing systems, pages 3104{3112. Suwanmethanond, V., Goo, E., Liu, P. K., Johnston, G., Sahimi, M., and Tsotsis, T. T. (2000). Porous silicon carbide sintered substrates for high-temperature membranes. Industrial and Engineering Chemistry Research, 39(9):3264{3271. Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S., Anguelov, D., Erhan, D., Vanhoucke, V., and Rabinovich, A. (2015). Going deeper with convolutions. In Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, volume 07-12- June, pages 1{9. IEEE Computer Society. Tahmasebi, P. (2018a). Accurate modeling and evaluation of microstructures in complex ma- terials. Physical Review E, 97(2):023307. Tahmasebi, P. (2018b). Multiple Point Statistics: A Review. In Handbook of Mathematical Geosciences, pages 613{643. Springer International Publishing, Cham. Tahmasebi, P. (2018c). Nanoscale and multiresolution models for shale samples. Fuel, 217:218{ 225. Tahmasebi, P., Javadpour, F., and Sahimi, M. (2015a). Multiscale and multiresolution modeling of shales and their ow and morphological properties. Scientic Reports, 5(1):16373. Tahmasebi, P., Javadpour, F., and Sahimi, M. (2015b). Three-Dimensional Stochastic Charac- terization of Shale SEM Images. Transport in Porous Media, 110(3):521{531. 192 Tahmasebi, P., Javadpour, F., and Sahimi, M. (2017). Data mining and machine learning for identifying sweet spots in shale reservoirs. Expert Systems with Applications, 88:435{447. Tahmasebi, P. and Kamrava, S. (2018). Rapid multiscale modeling of ow in porous media. Physical Review E, 98(5):052901. Tahmasebi, P., Kamrava, S., Bai, T., and Sahimi, M. (2020). Machine Learning in Geo- and Environmental Sciences: From Small to Large Scale. Advances in Water Resources, 142:103619. Tahmasebi, P. and Sahimi, M. (2012). Reconstruction of three-dimensional porous media using a single thin section. Physical Review E - Statistical, Nonlinear, and Soft Matter Physics, 85(6):066709. Tahmasebi, P. and Sahimi, M. (2013). Cross-correlation function for accurate reconstruction of heterogeneous media. Physical Review Letters, 110(7):078002. Tahmasebi, P. and Sahimi, M. (2015a). Geostatistical Simulation and Reconstruction of Porous Media by a Cross-Correlation Function and Integration of Hard and Soft Data. Transport in Porous Media, 107(3):871{905. Tahmasebi, P. and Sahimi, M. (2015b). Reconstruction of nonstationary disordered mate- rials and media: Watershed transform and cross-correlation function. Physical Review E, 91(3):032401. Tahmasebi, P. and Sahimi, M. (2016a). Enhancing multiple-point geostatistical modeling: 1. Graph theory and pattern adjustment. Water Resources Research, 52(3):2074{2098. Tahmasebi, P. and Sahimi, M. (2016b). Enhancing multiple-point geostatistical modeling: 2. Iterative simulation and multiple distance function. Water Resources Research, 52(3):2099{ 2122. Tahmasebi, P., Sahimi, M., and Caers, J. (2014). MS-CCSIM: Accelerating pattern-based geostatistical simulation of categorical variables using a multi-scale search in fourier space. Computers and Geosciences, 67:75{88. 193 Tahmasebi, P., Sahimi, M., and Shirangi, M. (2018). Rapid Learning-Based and Geologically Consistent History Matching. 
Transport in Porous Media. Tan, X., Tahmasebi, P., Caers, J., Tan, X., Tahmasebi,. P., and Caers,. J. (2014). Comparing Training-Image Based Algorithms Using an Analysis of Distance. Math Geosci, 46:149{169. Tang, M., Liu, Y., and Durlofsky, L. J. (2020). A deep-learning-based surrogate model for data assimilation in dynamic subsurface ow problems. Journal of Computational Physics, 413:109456. Tembely, M. and AlSumaiti, A. (2019). Deep Learning for a Fast and Accurate Prediction of Complex Carbonate Rock Permeability From 3D Micro-CT Images. In Abu Dhabi Interna- tional Petroleum Exhibition & Conference. Society of Petroleum Engineers (SPE). Thompson, A. H. (1991). Fractals in Rock Physics. Annual Review of Earth and Planetary Sciences, 19(1):237{262. Thovert, J. F., Yousean, F., Spanne, P., Jacquin, C. G., and Adler, P. M. (2001). Grain recon- struction of porous media: Application to a low-porosity Fontainebleau sandstone. Physical Review E - Statistical, Nonlinear, and Soft Matter Physics. Tieleman, T. and Hinton, G. (2012). Lecture 6.5-rmsprop: Divide the gradient by a running average of its recent magnitude. COURSERA: Neural networks for machine learning, 4(2):26{ 31. Tjaden, B., Lane, J., Withers, P. J., Bradley, R. S., Brett, D. J., and Shearing, P. R. (2016). The application of 3D imaging techniques, simulation and diusion experiments to explore transport properties in porous oxygen transport membrane support materials. Solid State Ionics, 288:315{321. Toda, H., Yamamoto, S., Kobayashi, M., Uesugi, K., and Zhang, H. (2008). Direct mea- surement procedure for three-dimensional local crack driving force using synchrotron X-ray microtomography. Acta Materialia, 56(20):6027{6039. Torquato, S. (2002). Random Heterogeneous Materials, volume 16 of Interdisciplinary Applied Mathematics. Springer New York, New York. 194 Touzani, S., Granderson, J., and Fernandes, S. (2018). Gradient boosting machine for modeling the energy consumption of commercial buildings. Energy and Buildings, 158:1533{1543. Tran, A. and Tran, H. (2019). Data-driven high-delity 2D microstructure reconstruction via non-local patch-based image inpainting. Acta Materialia, 178:207{218. Uijlings, J. R., Van De Sande, K. E., Gevers, T., and Smeulders, A. W. (2013). Selective search for object recognition. International Journal of Computer Vision, 104(2):154{171. van der Linden, J. H., Narsilio, G. A., and Tordesillas, A. (2016). Machine learning framework for analysis of transport through complex networks in porous, granular media: A focus on permeability. Physical Review E, 94(2):022904. Van Rheenen, W., Shatunov, A., Dekker, A. M., McLaughlin, R. L., Diekstra, F. P., Pulit, S. L., Van Der Spek, R. A., V~ osa, U., De Jong, S., Robinson, M. R., Yang, J., Fogh, I., Van Doormaal, P. T., Tazelaar, G. H., Koppers, M., Blokhuis, A. M., Sproviero, W., Jones, A. R., Kenna, K. P., Van Eijk, K. R., Harschnitz, O., Schellevis, R. D., Brands, W. J., Medic, J., Menelaou, A., Vajda, A., Ticozzi, N., Lin, K., Rogelj, B., Vrabec, K., Ravnik-Glava, M., Koritnik, B., Zidar, J., Leonardis, L., Gro selj, L. D., Millecamps, S., Salachas, F., Meininger, V., De Carvalho, M., Pinto, S., Mora, J. S., Rojas-Garc a, R., Polak, M., Chandran, S., Colville, S., Swingler, R., Morrison, K. E., Shaw, P. J., Hardy, J., Orrell, R. W., Pittman, A., Sidle, K., Fratta, P., Malaspina, A., Topp, S., Petri, S., Abdulla, S., Drepper, C., Sendtner, M., Meyer, T., Opho, R. A., Staats, K. A., Wiedau-Pazos, M., Lomen-Hoerth, C., Van Deerlin, V. M., Trojanowski, J. 
Q., Elman, L., McCluskey, L., Basak, A. N., Tunca, C., Hamzeiy, H., Parman, Y., Meitinger, T., Lichtner, P., Radivojkov-Blagojevic, M., Andres, C. R., Maurel, C., Bensimon, G., Landwehrmeyer, B., Brice, A., Payan, C. A., Saker-Delye, S., D urr, A., Wood, N. W., Tittmann, L., Lieb, W., Franke, A., Rietschel, M., Cichon, S., N othen, M. M., Amouyel, P., Tzourio, C., Dartigues, J. F., Uitterlinden, A. G., Rivadeneira, F., Estrada, K., Hofman, A., Curtis, C., Blauw, H. M., Van Der Kooi, A. J., De Visser, M., Goris, A., Weber, M., Shaw, C. E., Smith, B. N., Pansarasa, O., Cereda, C., Del Bo, R., Comi, G. P., D'Alfonso, S., Bertolin, C., Sorar u, G., Mazzini, L., Pensato, V., Gellera, C., Tiloca, C., Ratti, A., Calvo, A., Moglia, C., Brunetti, M., Arcuti, S., Capozzo, R., Zecca, C., Lunetta, C., Penco, S., Riva, N., Padovani, A., Filosto, M., Muller, B., Stuit, R. J., Blair, 195 I., Zhang, K., McCann, E. P., Fita, J. A., Nicholson, G. A., Rowe, D. B., Pamphlett, R., Kiernan, M. C., Grosskreutz, J., Witte, O. W., Ringer, T., Prell, T., Stubendor, B., Kurth, I., H ubner, C. A., Nigel Leigh, P., Casale, F., Chio, A., Beghi, E., Pupillo, E., Tortelli, R., Logroscino, G., Powell, J., Ludolph, A. C., Weishaupt, J. H., Robberecht, W., Van Damme, P., Franke, L., Pers, T. H., Brown, R. H., Glass, J. D., Landers, J. E., Hardiman, O., Andersen, P. M., Corcia, P., Vourc'H, P., Silani, V., Wray, N. R., Visscher, P. M., De Bakker, P. I., Van Es, M. A., Jeroen Pasterkamp, R., Lewis, C. M., Breen, G., Al-Chalabi, A., Van Den Berg, L. H., and Veldink, J. H. (2016). Genome-wide association analyses identify new risk variants and the genetic architecture of amyotrophic lateral sclerosis. Nature Genetics, 48(9):1043{1048. Vandal, T., Kodra, E., Ganguly, S., Michaelis, A., Nemani, R., and Ganguly, A. R. (2017). DeepSD: Generating High Resolution Climate Change Projections through Single Image Super-Resolution. In Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining - KDD '17, pages 1663{1672, New York. ACM Press. Vincent, P., Larochelle, H., Bengio, Y., and Manzagol, P. A. (2008). Extracting and compos- ing robust features with denoising autoencoders. In Proceedings of the 25th International Conference on Machine Learning, pages 1096{1103. Vohra, R., Goel, K., and Sahoo, J. K. (2015). Modeling temporal dependencies in data using a DBN-LSTM. In Proceedings of the 2015 IEEE International Conference on Data Science and Advanced Analytics, DSAA 2015. Institute of Electrical and Electronics Engineers Inc. Wang, H., Shi, X., Yeung, D.-Y., and Kong, H. (2016). Collaborative Recurrent Autoencoder: Recommend while Learning to Fill in the Blanks. In Advances in Neural Information Pro- cessing Systems, pages 415{423. Wang, L., Guo, S., Huang, W., and Qiao, Y. (2015). Places205-VGGNet Models for Scene Recognition. arXiv preprint arXiv:1508.01667. Wang, M., Williams, J., Jiang, L., De Carlo, F., Jing, T., and Chawla, N. (2011). Dendritic morphology of -Mg during the solidication of Mg-based alloys: 3D experimental charac- 196 terization by X-ray synchrotron tomography and phase-eld simulations. Scripta Materialia, 65(10):855{858. Wang, Y., Rahman, S. S., and Arns, C. H. (2018a). Super resolution reconstruction of -CT image of rock sample using neighbour embedding algorithm. Physica A: Statistical Mechanics and its Applications, 493:177{188. Wang, Y., Teng, Q., He, X., Feng, J., and Zhang, T. (2018b). CT-image Super Resolution Using 3D Convolutional Neural Network. 
arXiv Computer Vision and Pattern Recognition. Weck, A., Wilkinson, D., Maire, E., and Toda, H. (2008). Visualization by X-ray tomography of void growth and coalescence leading to fracture in model materials. Acta Materialia, 56(12):2919{2928. Wei, H., Zhao, S., Rong, Q., and Bao, H. (2018). Predicting the eective thermal conductivities of composite materials and porous media by machine learning methods. International Journal of Heat and Mass Transfer, 127:908{916. Wen, G., Tang, M., and Benson, S. M. (2019). Multiphase ow prediction with deep neural networks. arXiv preprint arXiv:1910.09657. Werbos, P. J. (1990). Backpropagation Through Time: What It Does and How to Do It. Proceedings of the IEEE, 78(10):1550{1560. Wu, H., Fang, W. Z., Kang, Q., Tao, W. Q., and Qiao, R. (2019). Predicting Eective Diusivity of Porous Media from Images by Deep Learning. Scientic Reports, 9(1):1{12. Wu, J., Yin, X., and Xiao, H. (2018). Seeing permeability from images: fast prediction with convolutional neural networks. Science Bulletin, 63(18):1215{1222. Wu, Z., Jiang, Y.-G., Wang, J., Pu, J., and Xue, X. (2014). Exploring Inter-feature and Inter- class Relationships with Deep Neural Networks for Video Classication. In Proceedings of the ACM International Conference on Multimedia - MM '14, pages 167{176. Xu, H. and Su, F. (2015). Robust seed localization and growing with deep convolutional features for scene text detection. In ICMR 2015 - Proceedings of the 2015 ACM International 197 Conference on Multimedia Retrieval, pages 387{394, New York, New York, USA. Association for Computing Machinery, Inc. Xu, P. and Yu, B. (2008). Developing a new form of permeability and Kozeny{Carman constant for homogeneous porous media by means of fractal geometry. Advances in Water Resources, 31(1):74{81. Yang, Y., Dong, J., Sun, X., Lima, E., Mu, Q., and Wang, X. (2018a). A CFCC-LSTM Model for Sea Surface Temperature Prediction. IEEE Geoscience and Remote Sensing Letters, 15(2):207{211. Yang, Y., Sautiere, G., Ryu, J. J., and Cohen, T. S. (2020). Feedback Recurrent Autoencoder. In ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings, volume 2020-May, pages 3347{3351. Institute of Electrical and Electronics En- gineers Inc. Yang, Z., Li, X., Brinson, L. C., Choudhary, A. N., Chen, W., and Agrawal, A. (2018b). Microstructural materials design via deep adversarial learning methodology. Journal of Me- chanical Design, Transactions of the ASME, 140(11). Yang, Z., Yabansu, Y. C., Al-Bahrani, R., Liao, W.-k., Choudhary, A. N., Kalidindi, S. R., and Agrawal, A. (2018c). Deep learning approaches for mining structure-property linkages in high contrast composites from simulation datasets. Computational Materials Science, 151:278{287. Yang, Z., Yabansu, Y. C., Jha, D., Liao, W.-k., Choudhary, A. N., Kalidindi, S. R., and Agrawal, A. (2019). Establishing structure-property localization linkages for elastic defor- mation of three-dimensional high contrast composites using deep learning approaches. Acta Materialia, 166:335{345. Yeong, C. L. and Torquato, S. (1998a). Reconstructing random media. II. Three-dimensional media from two-dimensional cuts. Physical Review E - Statistical Physics, Plasmas, Fluids, and Related Interdisciplinary Topics, 58(1):224{233. Yeong, C. L. Y. and Torquato, S. (1998b). Reconstructing random media. Physical Review E, 57(1):495{506. 198 Yun, W. (2017). Deep Learning: Automated Surface Characterization of Porous Media to Understand Geological Fluid Flow. 
Technical report, Stanford University. Zachary, C. E. and Torquato, S. (2011). Improved reconstructions of random media using dilation and erosion processes. Physical Review E - Statistical, Nonlinear, and Soft Matter Physics, 84(5):056102. Zeiler, M. D. (2012). ADADELTA: An Adaptive Learning Rate Method. arXiv preprint arXiv:1212.5701. Zeiler, M. D. and Fergus, R. (2014). Visualizing and understanding convolutional networks. In European conference on computer vision, pages 818{833. Springer, Cham. Zhang, B., Kotsalis, G., Khan, J., Xiong, Z., Igou, T., Lan, G., and Chen, Y. (2020). Backwash sequence optimization of a pilot-scale ultraltration membrane system using data-driven modeling for parameter forecasting. Journal of Membrane Science, 612:118464. Zhang, F., Teng, Q., Chen, H., He, X., and Dong, X. (2021). Slice-to-voxel stochastic recon- structions on porous media with hybrid deep generative model. Computational Materials Science, 186:110018. Zhang, T., Switzer, P., and Journel, A. (2006). Filter-based classication of training image patterns for spatial simulation. Mathematical Geology, 38(1):63{80. Zhang, W., Liu, J., and Wei, T. C. (2019a). Machine learning of phase transitions in the percolation and XY models. Physical Review E, 99(3):032142. Zhang, Z., Hong, Y., Hou, B., Zhang, Z., Negahban, M., and Zhang, J. (2019b). Accelerated discoveries of mechanical properties of graphene using machine learning and high-throughput computation. Carbon, 148:115{123. Zhou, G. B., Wu, J., Zhang, C. L., and Zhou, Z. H. (2016). Minimal gated unit for recurrent neural networks. International Journal of Automation and Computing, 13(3):226{234. Zhou, M., Vassallo, A., and Wu, J. (2020). Toward the inverse design of MOF membranes for ecient D2/H2 separation by combination of physics-based and data-driven modeling. Journal of Membrane Science, 598:117675. 199 Zhou, Z. H., Jiang, Y., Yang, Y. B., and Chen, S. F. (2002). Lung cancer cell identication based on articial neural network ensembles. Articial Intelligence in Medicine, 24(1):25{36. Zhu, Y., Zabaras, N., Koutsourelakis, P. S., and Perdikaris, P. (2019). Physics-constrained deep learning for high-dimensional surrogate modeling and uncertainty quantication without labeled data. Journal of Computational Physics, 394:56{81. Zuo, Z., Wang, G., Shuai, B., Zhao, L., and Yang, Q. (2015). Exemplar based Deep Discrim- inative and Shareable Feature Learning for scene image classication. Pattern Recognition, 48(10):3004{3015. 200
Abstract
In recent years, significant breakthroughs have been made in exploring big data, recognizing complex patterns, and predicting intricate variables. One efficient way of analyzing big data, recognizing complex patterns, and extracting trends is through machine-learning (ML) algorithms. The field of porous media has also witnessed much progress, and recent advances in ML techniques have benefited a variety of problems in porous media across disparate scales. Thus, it is becoming increasingly clear that adopting advanced ML methods for problems in porous media is imperative, as such methods enable researchers to solve many difficult problems. At the same time, the extensive existing knowledge of porous media can be used to inform ML algorithms and to develop novel physics-guided methods.

First, Chapter 1 provides a comprehensive review of the basic concepts of ML and of its advanced methods, known as deep-learning algorithms. The applications of such methods to various problems in porous media are then reviewed and critiqued. A variety of problems related to porous media, from fine- to large-scale systems, are reviewed carefully, with emphasis on how ML can help solve, or facilitate the solution of, long-standing problems.

Accounting for the morphology of nanoscale materials, which represent highly heterogeneous porous media, is a difficult problem. Although two- or three-dimensional images of such materials may be obtained and analyzed, they either do not capture the nanoscale features of the porous media, or they are too small to be an accurate representative of the media, or both. Increasing the resolution of such images is also costly. While high-resolution images may be used to train a deep-learning network in order to increase the quality of low-resolution images, an important obstacle is the lack of a large number of images for the training, as the accuracy of the network's predictions depends on the extent of the training data. Generating a large number of high-resolution images by experimental means is, however, very time-consuming and costly, which limits the application of deep-learning algorithms to this important class of problems. To address the issue, we propose in Chapter 2 a novel hybrid algorithm, in which a stochastic reconstruction method is used to generate, at very low cost and from very few input images, a large number of plausible images of a nanoscale material, and a deep-learning convolutional network is then trained on the stochastic realizations. We refer to the method as the hybrid stochastic deep-learning (HSDL) algorithm. The results indicate promising improvement in the quality of the images, the accuracy of which is confirmed by visual as well as quantitative comparison of several of their statistical properties. The results are also compared with those obtained by a regular deep-learning algorithm trained without an enriched and large dataset, as well as with those generated by bicubic interpolation. A minimal sketch of the training strategy appears below.
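The following is a minimal sketch, assuming a PyTorch setting, of the core idea: stochastic realizations stand in for scarce experimental images when training a super-resolution convolutional network. The architecture, tensor shapes, and data source are illustrative assumptions, not the exact network of Chapter 2.

```python
# Hypothetical sketch: stochastic realizations substitute for scarce
# experimental images when training a super-resolution CNN.
import torch
import torch.nn as nn

class SRCNN(nn.Module):
    """A small SRCNN-style network mapping low- to high-resolution images."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(1, 64, kernel_size=9, padding=4), nn.ReLU(),
            nn.Conv2d(64, 32, kernel_size=5, padding=2), nn.ReLU(),
            nn.Conv2d(32, 1, kernel_size=5, padding=2),
        )
    def forward(self, x):
        return self.net(x)

# Stand-in for stochastic realizations of the nanomaterial (in practice,
# produced by a reconstruction algorithm); here random tensors.
high_res = torch.rand(16, 1, 64, 64)
# Degrade to mimic low-resolution inputs: downsample, then upsample back.
low_res = nn.functional.interpolate(
    nn.functional.interpolate(high_res, scale_factor=0.5, mode="bicubic"),
    scale_factor=2.0, mode="bicubic")

model, loss_fn = SRCNN(), nn.MSELoss()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
for epoch in range(5):  # a few epochs for illustration
    opt.zero_grad()
    loss = loss_fn(model(low_res), high_res)  # pixel-wise reconstruction loss
    loss.backward()
    opt.step()
```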
Flow, transport, mechanical, and fracture properties of porous media depend on their morphology and are usually estimated by experimental and/or computational methods. The precision of the computational approaches depends on the accuracy of the model that represents the morphology; if high accuracy is required, the computations, and even the experiments, can be quite time-consuming. At the same time, linking the morphology directly to the permeability, as well as to other important flow and transport properties, has been a long-standing problem. In Chapter 3, we develop a new network that utilizes a deep-learning (DL) algorithm to link the morphology of porous media to their permeability. The input data include three-dimensional images of the porous material, hundreds of their stochastic realizations generated by a reconstruction method, and synthetic unconsolidated porous media produced by a Boolean method. To develop the network, we first extract the important features of the images using a DL algorithm and then feed them to an artificial neural network (ANN) to estimate the permeabilities. We demonstrate that the network is trained successfully, in the sense that it develops accurate correlations between the morphology of porous media and their effective permeability. The high accuracy of the network is demonstrated by its predictions for the permeability of a variety of porous media.

Time and cost are two main hurdles to acquiring a large number of digital images I of the microstructure of materials. Thus, the use of stochastic methods for producing plausible realizations of a material's morphology, based on one or very few images, has become an increasingly common practice in modeling such materials. The accuracy of the realizations is often evaluated using two-point microstructural descriptors, or by physics-based modeling of certain phenomena in the materials, such as transport processes or fluid flow. In many cases, however, two-point correlation functions do not provide an accurate evaluation of the realizations, as they are usually unable to distinguish between high- and low-quality reconstructed models. Calculating the flow and transport properties of the realizations is an accurate way of checking their quality, but it is computationally expensive. In Chapter 4, a method based on machine learning is proposed for evaluating stochastic approaches for the reconstruction of materials, one that is applicable to any such approach. The method reduces the dimensionality of the realizations using an unsupervised deep-learning algorithm that compresses the images and realizations of the materials. Two criteria for evaluating the accuracy of a reconstruction algorithm are then introduced. One, referred to as the internal uncertainty space, is based on the recognition that, for a reconstruction method to be effective, the differences between the realizations that it produces must be reasonably wide, so that they faithfully represent all the possible spatial variations in the material's microstructure. The second criterion recognizes that the realizations must be close to the original image I and, thus, quantifies the similarity based on an external uncertainty space. Finally, the ratio of the two uncertainty indices associated with the two criteria is taken as the final score of the accuracy of a stochastic algorithm, which provides a quantitative basis for comparing various realizations and the approaches that produce them. The proposed method is tested with images of three types of heterogeneous materials in order to evaluate four stochastic reconstruction algorithms. A sketch of how such a score might be computed appears below.
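As an illustration of the final score, the following minimal sketch assumes that latent codes produced by a trained autoencoder are already available; it computes a simplified internal index (the spread among the realizations), a simplified external index (the distance to the original image), and their ratio. The exact index definitions used in Chapter 4 may differ; Euclidean distance is an assumption here.

```python
# Illustrative two-criterion score: latent vectors (assumed to come from a
# trained autoencoder) are compared by Euclidean distance, internally among
# realizations and externally against the original image's latent code.
import numpy as np

rng = np.random.default_rng(0)
z_original = rng.normal(size=32)             # latent code of the original image I
z_realizations = rng.normal(size=(100, 32))  # latent codes of 100 realizations

# Internal uncertainty: average pairwise distance among realizations
# (a wide spread means the realizations span the plausible microstructures).
diffs = z_realizations[:, None, :] - z_realizations[None, :, :]
internal = np.linalg.norm(diffs, axis=-1).mean()

# External uncertainty: average distance of realizations from the original
# (a small value means the realizations stay faithful to I).
external = np.linalg.norm(z_realizations - z_original, axis=-1).mean()

score = internal / external  # higher favors diverse yet faithful realizations
print(f"internal={internal:.3f} external={external:.3f} score={score:.3f}")
```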
Percolation and fracture propagation in disordered solids represent two important problems in science and engineering that are characterized by phase transitions: the loss of macroscopic connectivity at the percolation threshold pc, and the formation of a macroscopic fracture network at the incipient fracture point (IFP). Percolation also represents the fracture problem in the limit of very strong disorder. An important unsolved problem is the accurate prediction of the physical properties of systems undergoing such transitions, given limited data far from the transition point. There is currently no theoretical method that can use limited data for a region far from the transition point pc or the IFP and predict the physical properties all the way to that point, including its location. In Chapter 5, a deep neural network (DNN) is used for predicting such properties of two- and three-dimensional systems, in particular their percolation probability, the threshold pc, the elastic moduli, and the universal Poisson ratio at pc. All the predictions are in excellent agreement with the data. In particular, the DNN predicts pc correctly, even though the training data were for states of the systems far from pc. This opens up the possibility of using DNNs for predicting the physical properties of many types of disordered materials that undergo phase transformations, for which limited data are available only far from the transition point.

Next, a machine-learning method is proposed for classifying ALS and non-ALS variants based on 24 variables in five different datasets. This represents a highly imbalanced classification problem, as a large majority of the data represent non-ALS variants; as such, classifying the data is difficult. In Chapter 6, the proposed ML method classifies the five datasets with very high accuracy. In particular, it predicts the ALS variants with 100 percent accuracy, while its accuracy for the non-ALS variants ranges from 92.8 to 98 percent. The trained classifier also identifies the nine most influential mutation assessors that help distinguish the two classes from each other: the FATHMM score, PROVEAN score, VEST3 score, CADD phred, DANN score, meta-SVM score, phyloP7way vertebrate, metaLR, and REVEL. They may thus be used in future studies to reduce the time and cost of collecting data and carrying out experimental tests, as well as in studies that focus on the recognized assessors.

Although the morphology of porous membranes is the key factor in determining their flow, transport, and separation properties, a general relation between the morphology and the physical properties has been difficult to identify. One promising approach to developing such a relation is the application of a machine-learning algorithm to the problem. Over the last decade, significant developments in ML approaches have led to many breakthroughs in various fields of science and engineering, but their application to porous media has been very limited. In Chapter 7, a deep network is developed for predicting the flow properties of porous membranes based on their morphology. The predicted properties include the spatial distributions of the fluid pressure and velocity throughout the entire membrane, provided that the deep network is properly trained using high-resolution images of the membranes and the pressure and velocity distributions in their pore space at certain points in time. The network includes a residual U-net for developing a mapping between the input and output images, as well as a recurrent network for identifying the physical correlations between the output data at various times. The results demonstrate that the deep network provides highly accurate predictions for the properties of interest. Thus, such a network may be used for predicting the flow and transport properties of many other types of porous materials, as well as for designing membranes for specific applications. A sketch of the kind of residual U-net block involved appears below.
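The following is a minimal sketch, assuming a PyTorch setting, of a residual U-net of the general kind described: residual convolutional blocks in an encoder-decoder with a skip connection, mapping an image of the pore space to two output fields. The channel counts, depth, and output variables are illustrative assumptions, not the dissertation's exact network.

```python
# Minimal residual U-net: one downsampling level, residual conv blocks,
# and a concatenation skip connection from encoder to decoder.
import torch
import torch.nn as nn

class ResBlock(nn.Module):
    def __init__(self, ch):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(ch, ch, 3, padding=1), nn.ReLU(),
            nn.Conv2d(ch, ch, 3, padding=1))
    def forward(self, x):
        return torch.relu(x + self.conv(x))  # residual connection

class TinyResUNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.enc = nn.Sequential(nn.Conv2d(1, 16, 3, padding=1), ResBlock(16))
        self.down = nn.MaxPool2d(2)
        self.mid = ResBlock(16)
        self.up = nn.ConvTranspose2d(16, 16, 2, stride=2)
        # decoder sees upsampled features concatenated with the skip
        self.dec = nn.Sequential(nn.Conv2d(32, 16, 3, padding=1), ResBlock(16),
                                 nn.Conv2d(16, 2, 1))  # 2 fields: pressure, speed
    def forward(self, x):
        skip = self.enc(x)
        mid = self.mid(self.down(skip))
        return self.dec(torch.cat([self.up(mid), skip], dim=1))

fields = TinyResUNet()(torch.rand(1, 1, 64, 64))  # -> shape (1, 2, 64, 64)
```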
Finally, a novel physics-informed ML algorithm is introduced in Chapter 8 for studying fluid flow in fine-scale porous media. The network embeds the Navier-Stokes equations in its learning process. Because of the complexity of such systems, high-resolution images showing the morphology of the systems are used as the input to the network, while the outputs are the velocity and pressure data obtained by solving the Navier-Stokes equations over a time interval. ML algorithms are often criticized for being black boxes; the governing equations embedded in the proposed network constrain the learning and reduce that opacity. Since the outputs form a time sequence and both the input and the output data are in the form of images, the proposed network is a physics-informed recurrent encoder-decoder (PIRED). The developed PIRED network requires less data for training than a purely data-driven network, while providing highly accurate predictions. A sketch of how governing equations can be embedded in a network's loss appears below.
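The following minimal sketch, again under assumed PyTorch conventions, shows the generic mechanism by which governing equations enter the loss of a physics-informed network: the incompressibility constraint of the flow, du/dx + dv/dy = 0, is penalized via automatic differentiation alongside a data-mismatch term. The momentum equations of the Navier-Stokes system, and the recurrent encoder-decoder structure of PIRED, would be incorporated analogously; the network, points, and weighting here are illustrative.

```python
# A network predicts the velocity field (u, v) at points (x, y); the
# continuity residual du/dx + dv/dy is computed by autograd and added
# to the data loss, constraining the learning with physics.
import torch
import torch.nn as nn

net = nn.Sequential(nn.Linear(2, 64), nn.Tanh(), nn.Linear(64, 2))

xy = torch.rand(256, 2, requires_grad=True)  # collocation points in the pore space
uv = net(xy)                                 # predicted (u, v) at those points

# Divergence of the velocity field, component by component, via autograd.
du = torch.autograd.grad(uv[:, 0].sum(), xy, create_graph=True)[0][:, 0]
dv = torch.autograd.grad(uv[:, 1].sum(), xy, create_graph=True)[0][:, 1]
physics_loss = ((du + dv) ** 2).mean()       # penalize violations of continuity

u_data = torch.rand(256, 2)                  # stand-in for simulated velocity data
data_loss = ((uv - u_data) ** 2).mean()
loss = data_loss + physics_loss              # weighting coefficient omitted
loss.backward()
```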
Linked assets
University of Southern California Dissertations and Theses
Conceptually similar
Efficient simulation of flow and transport in complex images of porous materials and media using curvelet transformation
Multiscale and multiresolution approach to characterization and modeling of porous media: From pore to field scale
Chemical and mechanical deformation of porous media and materials during adsorption and fluid flow
Effective flow and transport properties of deforming porous media and materials: theoretical modeling and comparison with experimental data
Simulation and machine learning at exascale
Efficient stochastic simulations of hydrogeological systems: from model complexity to data assimilation
Molecular- and continuum-scale simulation of single- and two-phase flow of CO₂ and H₂O in mixed-layer clays and a heterogeneous sandstone
Deep learning architectures for characterization and forecasting of fluid flow in subsurface systems
High throughput computational framework for synthesis and accelerated discovery of dielectric polymer materials using polarizable reactive molecular dynamics and graph neural networks
Exploring properties of silicon-carbide nanotubes and their composites with polymers
Inverse modeling and uncertainty quantification of nonlinear flow in porous media models
Continuum modeling of reservoir permeability enhancement and rock degradation during pressurized injection
Dynamic topology reconfiguration of Boltzmann machines on quantum annealers
Stability and folding rate of proteins and identification of their inhibitors
Efficient connectivity assessment of heterogeneous porous media using graph theory
Dynamics of water in nanotubes: liquid below freezing point and ice-like near boiling point
Machine learning for efficient network management
Molecular dynamics studies of protein aggregation in unbounded and confined media
Deep learning for subsurface characterization and forecasting
Feature learning for imaging and prior model selection
Asset Metadata
Creator: Kamrava, Serveh (author)
Core Title: Machine-learning approaches for modeling of complex materials and media
School: Viterbi School of Engineering
Degree: Doctor of Philosophy
Degree Program: Chemical Engineering
Publication Date: 03/20/2021
Defense Date: 03/05/2021
Publisher: University of Southern California (original), University of Southern California. Libraries (digital)
Tag: fluid flow, machine learning, morphology, OAI-PMH Harvest, porous materials
Language: English
Contributor: Electronically uploaded by the author (provenance)
Advisor: Sahimi, Muhammad (committee chair), De Barros, Felipe (committee member), Kalia, Rajiv (committee member), Nakano, Aiichiro (committee member)
Creator Email: SER.KM85@GMAIL.COM, servehka@usc.edu
Permanent Link (DOI): https://doi.org/10.25549/usctheses-c89-428373
Unique identifier: UC11667568
Identifier: etd-KamravaSer-9336.pdf (filename), usctheses-c89-428373 (legacy record id)
Legacy Identifier: etd-KamravaSer-9336.pdf
Dmrecord: 428373
Document Type: Dissertation
Rights: Kamrava, Serveh
Type: texts
Source: University of Southern California (contributing entity), University of Southern California Dissertations and Theses (collection)
Access Conditions: The author retains rights to his/her dissertation, thesis or other graduate work according to U.S. copyright law. Electronic access is being provided by the USC Libraries in agreement with the a...
Repository Name: University of Southern California Digital Library
Repository Location: USC Digital Library, University of Southern California, University Park Campus MC 2810, 3434 South Grand Avenue, 2nd Floor, Los Angeles, California 90089-2810, USA