Latent Space Dynamics for Interpretation, Monitoring, and Prediction in
Industrial Systems
by
Yingxiang Liu
A Dissertation Presented to the
FACULTY OF THE USC GRADUATE SCHOOL
UNIVERSITY OF SOUTHERN CALIFORNIA
In Partial Fulfillment of the
Requirements for the Degree
DOCTOR OF PHILOSOPHY
(ELECTRICAL ENGINEERING)
August 2024
Copyright 2024 Yingxiang Liu
Dedication
To my family.
Acknowledgements
I would like to express my sincere appreciation to Professor S. Joe Qin, who was an exceptional
mentor during the early stage of my time at USC. I am deeply grateful to my advisor, Professor
Behnam Jafarpour, for his exceptional guidance and support throughout the later stage of my
PhD study.
I would also like to express my gratitude to my PhD qualifying exam committee members:
Professor Antonio Ortega, Professor Mahta Moghaddam, Professor Pierluigi Nuzzo from the Ming
Hsieh Department of Electrical Engineering, and Professor Pin Wang from the Mork Family Department of Chemical Engineering and Materials Science. Special thanks to Professor Ortega and
Professor Nuzzo for attending my dissertation defense and providing insightful comments.
I wish to acknowledge the members of Professor Qin’s research group and Professor Jafarpour’s research group for their valuable discussions and support. Additionally, I am grateful to
my girlfriend, my friends, my dog, and my car for the cherished memories we created, the places
we visited, and the journeys we undertook together.
Finally, I want to express my deepest gratitude to my parents for their endless love, encouragement, and support.
Table of Contents
Dedication . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . ii
Acknowledgements . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . iii
List of Tables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . vii
List of Figures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . viii
Abstract . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xi
Chapter 1: Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
1.1 Background . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
1.1.1 Interpretation of High-dimensional Data and Process Troubleshooting . . 3
1.1.2 Condition Monitoring and Fault Detection . . . . . . . . . . . . . . . . . . 4
1.1.3 Monitoring and Forecasting Techniques in CO2 Geological Storage . . . . 5
1.2 Outline . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6
Chapter 2: Exploring Latent Space Dynamics for Interpretation and Troubleshooting in
Industrial Systems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8
2.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8
2.2 Dynamic Latent Variable Modeling . . . . . . . . . . . . . . . . . . . . . . . . . 10
2.2.1 Dynamic Feature Analysis via DiCCA . . . . . . . . . . . . . . . . . . . . 10
2.2.2 Dynamic Latent Relations . . . . . . . . . . . . . . . . . . . . . . . . . . . 15
2.2.3 DiCCAX: DiCCA with Exogenous Variables . . . . . . . . . . . . . . . . . 16
2.2.4 Latent Feature Contribution Analysis . . . . . . . . . . . . . . . . . . . . . 17
2.3 DELFA Procedure . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19
2.4 DELFA for Troubleshooting Plant-wide Oscillations . . . . . . . . . . . . . . . . . 22
2.4.1 Preliminary Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22
2.4.2 Removing the Effect of the Variable TI8.PV and TC3.OP . . . . . . . . . . 26
2.4.3 Analysis of the Low-frequency Oscillation Feature . . . . . . . . . . . . . 27
2.4.4 Analysis of the High-frequency Oscillation Feature . . . . . . . . . . . . . 30
2.5 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30
Chapter 3: Leveraging Latent Space Dynamics for Multivariate Time Series Prediction
and Condition Monitoring in Industrial Systems . . . . . . . . . . . . . . . . . 32
3.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32
3.2 Preliminaries . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35
3.2.1 Principal Component Analysis for Process Monitoring . . . . . . . . . . . 35
3.2.2 Long-Short Term Memory (LSTM) . . . . . . . . . . . . . . . . . . . . . . 39
3.2.3 Reversible Instance Normalization (RevIN) . . . . . . . . . . . . . . . . . . 40
3.3 LSTM Encoder-decoder with Regularized Hidden Dynamics for Fault Detection . 41
3.3.1 LSTM Encoder-Decoder with RevIN . . . . . . . . . . . . . . . . . . . . . . 41
3.3.2 Regularizing Hidden States of LSTM Encoder-Decoder . . . . . . . . . . . 45
3.3.3 Fault Detection with Regularized LSTM Encoder-Decoder . . . . . . . . . 46
3.3.3.1 Monitoring Dynamic Variations . . . . . . . . . . . . . . . . . . 47
3.3.3.2 Monitoring Static Variations . . . . . . . . . . . . . . . . . . . . 48
3.4 Case Studies . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 50
3.4.1 Case Study 1: Synthetic Data . . . . . . . . . . . . . . . . . . . . . . . . . . 51
3.4.1.1 Fault 1: Change in Static Relationship . . . . . . . . . . . . . . . 53
3.4.1.2 Fault 2: Change in Dynamic Relationship . . . . . . . . . . . . . 55
3.4.2 Case Study 2: Tennessee Eastman Process . . . . . . . . . . . . . . . . . . 56
3.4.2.1 Monitoring Result of Setpoint Change . . . . . . . . . . . . . . . 60
3.4.2.2 Monitoring Result of Disturbance Case 2 (IDV2) . . . . . . . . . 63
3.4.2.3 Monitoring Result of Disturbance Case 8 (IDV8) . . . . . . . . . 66
3.4.3 Case Study 3: Geothermal Power Plant . . . . . . . . . . . . . . . . . . . . 67
3.5 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 73
Chapter 4: Learning Latent Space Dynamics for Estimating and Predicting System
Conditions with Limited Measurements . . . . . . . . . . . . . . . . . . . . . . 74
4.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 74
4.2 Extract Latent Space Dynamics from Spatio-temporal Data . . . . . . . . . . . . . 76
4.2.1 Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 76
4.2.2 CO2-4DNet Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 77
4.2.2.1 Spatial Encoder and Decoder with Depthwise Separable
Convolutional Neural Networks . . . . . . . . . . . . . . . . . . 77
4.2.2.2 Processor for Modeling Spatio-temporal Dynamics . . . . . . . . 79
4.2.3 CO2-4DNet for Reconstruction and Prediction . . . . . . . . . . . . . . . . 81
4.3 Case Study . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 83
4.3.1 Numerical Simulation for Generating Training Dataset . . . . . . . . . . . 83
4.3.2 Data Processing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 85
4.3.3 Model Training . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 86
4.3.4 Comparison with Neural Network-based Proxy Model Requiring
Geological Information as Input . . . . . . . . . . . . . . . . . . . . . . . . 89
4.3.4.1 Comparison of Testing Errors . . . . . . . . . . . . . . . . . . . . 89
4.3.4.2 Comparison of Onset Time Errors . . . . . . . . . . . . . . . . . 92
4.3.5 Error Comparison of CO2-4DNet with Varying Input Sources . . . . . . . 95
4.3.6 Generalization to Unseen Scenarios . . . . . . . . . . . . . . . . . . . . . . 97
4.4 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 102
Chapter 5: Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 104
Bibliography . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 107
List of Tables
4.1 Cases with varying types of measurements . . . . . . . . . . . . . . . . . . . . . . 87
4.2 Model inputs and outputs and their corresponding cases . . . . . . . . . . . . . . 88
List of Figures
2.1 DELFA: a plant-wide troubleshooting procedure using operational data . . . . . . 20
2.2 Schematic diagram of the studied plant: First letter: F(flowrate); P(pressure);
T(temperature). Second letter: C(control); I(indicator). Last number: serial number 23
2.3 Measurements collected from the industrial plant . . . . . . . . . . . . . . . . . . 23
2.4 Preliminary analysis: comparing latent variables extracted by DiCCA and PCA . . 24
2.5 Preliminary analysis: composite loadings and composite weights . . . . . . . . . . 25
2.6 Leading dynamic latent variables: deleting TI8.PV and TC3.OP (left subplot) and
treating TI8.PV and TC3.OP as exogenous variables (right subplot) . . . . . . . . . 26
2.7 Fast Fourier transform analysis of the dynamic latent features . . . . . . . . . . . 28
2.8 Composite loadings and weights for low-frequency oscillation in dynamic latent
variables 1 and 2 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29
2.9 Composite loadings and weights for high-frequency oscillation in dynamic latent
variable 5 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31
3.1 LSTM encoder-decoder structure with RevIN . . . . . . . . . . . . . . . . . . . . . 42
3.2 Proposed fault detection method using regularized LSTM encoder-decoder . . . . 47
3.3 Monitoring results for the synthetic case study with changes in static variation . . 54
3.4 Comparison of the distribution of the fault-affected variable: before and after
instance normalization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 55
3.5 Comparison of reconstruction error distributions before and after fault
occurrence for unaffected and fault-affected variables . . . . . . . . . . . . . . . . 56
3.6 Monitoring results for the synthetic case study with changes in dynamic variation 57
3.7 Comparison of the distribution of hidden states prediction errors from the
regularized and unregularized models on two fault-free datasets . . . . . . . . . . 59
3.8 Visualization of disturbance cases with PCA model . . . . . . . . . . . . . . . . . 60
3.9 Setpoint change: one-step-ahead prediction results for select variables obtained
from the LSTM encoder-decoder models without and with the application of RevIN 60
3.10 Setpoint change: comparison of monitoring results obtained from DeepSVDD,
VAE, LSTM-AE, MTAD-GAT, DiCCA, SFA, and the proposed regularized LSTM
encoder-decoder . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 62
3.11 Setpoint change: comparison of monitoring results between DiCCA, SFA, and
the proposed regularized LSTM encoder-decoder . . . . . . . . . . . . . . . . . . . 63
3.12 Disturbance case 2 (IDV2): comparison of monitoring results obtained from
DeepSVDD, VAE, LSTM-AE, MTAD-GAT, DiCCA, SFA, and the proposed
regularized LSTM encoder-decoder . . . . . . . . . . . . . . . . . . . . . . . . . . 65
3.13 Disturbance case 2 (IDV2): comparison of monitoring results between DiCCA,
SFA, and the proposed regularized LSTM encoder-decoder . . . . . . . . . . . . . 66
3.14 Disturbance case 8 (IDV8): comparison of monitoring results obtained from
DeepSVDD, VAE, LSTM-AE, MTAD-GAT, DiCCA, SFA, and the proposed
regularized LSTM encoder-decoder . . . . . . . . . . . . . . . . . . . . . . . . . . 68
3.15 Disturbance case 8 (IDV8): comparison of monitoring results between DiCCA,
SFA, and the proposed regularized LSTM encoder-decoder . . . . . . . . . . . . . 69
3.16 Visualization of measurements collected from a geothermal power plant . . . . . 69
3.17 One-step ahead prediction results of data collected from a geothermal power plant 70
3.18 12-step ahead prediction results of data collected from a geothermal power plant . 71
3.19 Monitoring results for data collected from a geothermal power plant . . . . . . . 72
4.1 Proposed framework: reconstruction and prediction of CO2 plume saturation
with CO2-4DNet . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 76
4.2 Illustration of depthwise separable convolution. . . . . . . . . . . . . . . . . . . . 78
4.3 Subset of 3D log-permeability maps from generated realizations . . . . . . . . . . 84
4.4 Samples of plume saturation at the end of injection . . . . . . . . . . . . . . . . . 85
4.5 Details of the CO2-4DNet . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 90
4.6 Reconstruction and prediction results: CO2-4DNet with varying inputs vs. proxy
models requiring geological information . . . . . . . . . . . . . . . . . . . . . . . 93
4.7 Evolution of CO2 plume saturation for one test case . . . . . . . . . . . . . . . . . 93
4.8 Comparison between onset time errors for CO2-4DNet with varying inputs
versus proxy models requiring geological information . . . . . . . . . . . . . . . . 94
4.9 Comparative analysis of errors across different model inputs and local measurement types . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 98
4.10 Comparative analysis of errors for three scenarios: two fixed-location injection
wells, two random-location injection wells, and one fixed-location injection well . 99
4.11 Comparing the onset time for the scenario with two randomly located injection
wells (color indicates different onset times) . . . . . . . . . . . . . . . . . . . . . . 100
4.12 Comparing the onset time for the scenario with one fixed-location injection well
(color indicates different onset times) . . . . . . . . . . . . . . . . . . . . . . . . . 101
Abstract
Data collected from modern industrial systems – including sectors such as manufacturing, renewable energy generation, and carbon storage – are inherently high-dimensional and dynamic.
These datasets typically take the form of time series and spatio-temporal data, characterized by
strong correlations among variables and autocorrelation in time. Consequently, despite the complexity and high dimensionality of these datasets, underlying patterns can be compactly represented by a set of latent variables that capture the dynamics of these systems in a reduced-dimensional latent space. This dissertation develops data-driven tools to extract and utilize these
latent space dynamics in industrial systems, focusing on several key aspects: visualizing and interpreting high-dimensional data, monitoring conditions and detecting anomalies, and estimating
and predicting system behavior.
In the first application, a workflow utilizing a linear dynamic latent variable model is proposed to visualize and interpret high-dimensional time-series data from chemical processes. The
extracted dynamic latent features capture the dominant dynamics within the process data, facilitating the visualization of high-dimensional data and enhancing the understanding of complex
process interactions. Furthermore, these dynamic latent variables are employed to identify and
troubleshoot the root causes of abnormal plant-wide oscillations, with interpretations informed
by model structure and domain knowledge.
In the second application, an LSTM encoder-decoder model with regularized latent dynamics
is designed to monitor operating conditions and detect anomalies in industrial systems. Normalization modules are incorporated to address the non-stationary nature of the data. Additionally,
regularizing the latent states during training enables the hidden states of the LSTM encoder-decoder to define a low-dimensional latent space, effectively capturing the primary dynamics of
the original high-dimensional data. Furthermore, monitoring indices are proposed to identify
faults that disrupt normal dynamics and static relationships. As a result, these indices accurately
reflect the actual operating conditions of the system, reducing false alarms by effectively distinguishing between faults and normal operational adjustments.
In the third application, a spatio-temporal neural network model is proposed to learn the latent space dynamics in industrial applications involving high-dimensional 3D data with limited
measurements. The model is specifically designed to estimate the dispersion and predict the migration paths of CO2 plumes in geological CO2 storage applications, which are characterized by
spatially sparse and temporally infrequent data collection. The model continuously integrates
various forms of field measurements into a concise latent space representation, capturing and
learning the spatio-temporal dynamics of CO2 plume migration. As a result, the model is adept
at handling diverse data inputs and managing varying measurement frequencies, which are critical for accurately estimating and predicting the behavior of 3D CO2 plumes in geological CO2
storage.
Chapter 1
Introduction
1.1 Background
Recent advancements in sensor technology and data collection methods have significantly increased the volume and complexity of data generated in modern industrial environments. This
data primarily consists of time series and spatio-temporal datasets, which capture complex interactions and dynamic changes over time. Time series data, in particular, is collected from numerous sensors positioned throughout various parts of industrial machinery and processes. Each
sensor logs data at high frequencies, generating extensive data streams that reflect fluctuations in
operational conditions. On the other hand, spatio-temporal data captures both temporal changes
and spatial relationships, ranging from interconnected measurements at various locations to complex image data or video streams.
Measurements from industrial processes exhibit two key characteristics: their dynamic nature and high dimensionality. The dynamic aspect arises from the nature of industrial systems, which operate continuously and evolve over time. This dynamic
nature is captured through sequential data collection, where measurements are taken consecutively, reflecting the temporal fluctuations and variations within the system. Additionally, industrial operations typically employ extensive networks of sensors to monitor various parameters.
Each sensor generates a continuous stream of data points, producing high-dimensional time series data. Despite the complexity introduced by numerous sensors, the fundamental principles
and laws governing the system’s operation remain relatively simple and low-dimensional, regardless of the increasing sensor count. In the context of spatio-temporal data, each data point
collected at different time steps can often be represented as a two-dimensional image or a three-dimensional volume. These representations are inherently high-dimensional but exhibit significant local correlations. These correlations indicate that, despite the data’s high dimensionality,
there are inherent structured patterns and relationships that can be leveraged to streamline the
analysis and modeling process.
Due to the inherent correlations among variables and autocorrelation across time intervals,
the dynamics of complex systems can be captured within a low-dimensional latent space. Despite
the apparent complexity and vastness of the data, underlying patterns can be described by a concise set of latent variables. These latent variables are essential as they encapsulate the significant
dynamics of the system, filtering out noise and redundant information, thereby simplifying the
multidimensional data into a more manageable form.
With rapid advancements in machine learning and data analysis technologies, data-driven
methods have become essential for establishing latent representations of data. These representations are built by mapping high-dimensional data into a lower-dimensional latent space, thereby
capturing the essential dynamics. Utilizing the dynamics within this latent space allows datadriven models to be applied across various applications, benefiting from the distilled information
these compact representations provide. This dissertation focuses on using data-driven methods
to extract latent space dynamics for various industrial system applications, details of which are
presented in the subsequent sections.
1.1.1 Interpretation of High-dimensional Data and Process Troubleshooting
In the Industrial Internet of Things era, the large-scale time series data collected from industrial systems and manufacturing processes are inherently complex and high-dimensional due to
the numerous sensors installed. As such, using data-driven models to interpret and visualize
this high-dimensional time series data becomes crucial for decision-making processes. Decision-makers must trust and feel confident in the suggestions made by machine learning models, given
their responsibility for any erroneous outcomes. It is thus critical to develop algorithms that can
distill low-dimensional latent variables from complex data to facilitate easier visualization and
interpretation.
Modern industrial processes can be affected by various disturbances, including variations
in ambient conditions, equipment degradation, and malfunctions. To manage these disturbances
and ensure the efficiency and quality of products, plant-wide feedback control systems are widely
employed [6, 16]. However, controllers that are inadequately designed or fail to promptly address
disturbances may lead to abnormal oscillations [12, 36]. Moreover, oscillations within parts of
the process can propagate through mass and energy transport and control system interactions,
affecting other areas of the plant and leading to plant-wide oscillations. Such propagation results
in poor operational performance, reduced throughput, and increased equipment wear [84, 83,
102].
Statistical process monitoring (SPM) has been extensively studied for decades to detect anomalies and abnormal oscillations [46, 13, 59, 111]. Process monitoring techniques require fault-free
data collected during normal operations to build a model representing normal dynamics and to establish normal control regions. These models and control regions are then applied to new
data to check if they are consistent with the fault-free model. However, if anomalies have already
occurred, the effectiveness of SPM techniques is limited. In such cases, troubleshooting can be
directly applied to segments of data containing potential issues, as it does not require fault-free
data. Troubleshooting involves analyzing faulty data to extract anomalous features and diagnose
the root causes of these anomalies. The ability to perform troubleshooting is essential and highly
valued by industrial practitioners [51].
1.1.2 Condition Monitoring and Fault Detection
Condition monitoring and fault detection are critical components in various industries, such as
manufacturing and energy systems. These processes involve continuously monitoring systems
and equipment during operation to detect changes that may indicate malfunctions or failures.
By leveraging sensors and data analytics, condition monitoring systems collect real-time data,
which is analyzed using data-driven algorithms to identify patterns or anomalies that deviate
from normal operating conditions. Early fault detection facilitates timely maintenance actions,
preventing costly downtime and extending the lifespan of machinery. Moreover, this proactive
approach enhances safety by reducing the risk of catastrophic failures and ensuring operational
efficiency [56, 96].
Data-driven condition monitoring methods are typically divided into supervised and unsupervised approaches [1, 50, 82]. In supervised settings, models are trained using normal and faulty
scenario data to detect and diagnose faults through classification. One limitation of using classification for fault detection is the requirement for large amounts of labeled faulty data, which can
be challenging to gather in real-world industrial settings. In contrast, unsupervised approaches
detect anomalies solely using normal data. Unsupervised fault detection methods first build models using fault-free data. Faults are then detected by identifying deviations in model outputs that
reflect discrepancies between normal and faulty data distributions.
1.1.3 Monitoring and Forecasting Techniques in CO2 Geological Storage
Carbon capture, utilization, and storage (CCUS) technologies are increasingly viewed as crucial
for reducing greenhouse gas emissions by capturing CO2 directly from emission sources and
sequestering it underground. The success of large-scale commercial CCUS projects depends critically on accurate monitoring and forecasting of CO2 plume movements [37, 2, 30]. Traditionally,
numerical simulations have served as the main tools for monitoring and predicting CO2 plume
dynamics. However, these simulations require substantial computational resources, largely due
to nonlinearities introduced by heterogeneities in rock properties and the interaction of multiple
physical processes [58, 75, 108].
Recently, machine learning-based methods have gained traction as an effective strategy for reducing the computational demands of numerical simulations. These methods utilize the learning
capabilities of neural networks and their rapid inference times by developing surrogate models
trained with data from numerical simulations. Several techniques have been effectively utilized to
forecast the movement of CO2 plumes [92, 81, 74, 95, 90, 91, 38]. These models rely on the same set
of inputs as numerical simulations, including parameters such as permeability and porosity maps.
This dependence on rock properties as input data poses challenges in real-world applications, primarily due to the uncertainties linked with rock properties. As a result, these uncertainties may
cause deviations between the predicted and actual distributions of CO2 plumes.
In a different line of research, recent developments in geophysical measurement techniques,
including time-lapse seismic monitoring and cross-well seismic tomography [45, 3], have prompted
researchers to investigate new approaches for tracking CO2 plume migration using data collected
directly from the field. These models do not require rock properties as inputs and effectively
map field measurements to the spatial distribution of CO2, thus precisely monitoring the plume’s
spread and location [86, 97, 73, 27, 40, 53, 26].
1.2 Outline
The rest of the dissertation is organized as follows.
Chapter 2 proposes a framework using linear dynamic latent models and dynamic embedded
latent feature analysis (DELFA) for process data interpretation and plant-wide oscillation diagnosis. The proposed framework is tested using a real dataset collected from a chemical plant.
In Chapter 3, a condition monitoring method based on recurrent neural networks (RNN) with
regularized hidden dynamics is proposed. With regularization, the hidden states of the RNN
model represent a low-dimensional dynamic latent space. The proposed fault detection approach
can detect faults that disrupt normal dynamic variations and static relationships within the process data, distinguishing faults from disturbances that can be compensated for by feedback control
systems. The effectiveness of the proposed method is demonstrated using a synthetic dataset, the
Tennessee Eastman Process, and a real dataset collected from a geothermal power plant.
Chapter 4 introduces a spatio-temporal neural network designed to reconstruct and predict
three-dimensional CO2 plume migration with limited measured data in geological CO2 storage
applications. The model integrates a diverse set of measurements as inputs and captures the
spatio-temporal dynamics of CO2 plume evolution in the model’s latent space. The proposed
model is capable of continuously incorporating field measurements to update the estimation and
prediction of the CO2 plume distribution in three-dimensional space. Simulation datasets generated from a three-dimensional CO2 storage site are used to assess the effectiveness of the proposed model.
Chapter 5 provides the general conclusions of the dissertation.
Chapter 2
Exploring Latent Space Dynamics for Interpretation and
Troubleshooting in Industrial Systems
2.1 Introduction
As pointed out in [61, 62], one of the essential purposes of building data-driven models of industrial processes is interpretation. In order to achieve interpretability, simple models are usually
preferred due to their parsimonious structures and mathematical representations. Linear dynamic
latent variable models can play important roles in extracting latent variables to provide insights
into the processes for interpretation. One aspect of interpreting process data is troubleshooting.
Troubleshooting is used to find the root cause of undesirable behaviors and abnormal dynamics
of the processes. Unlike fault detection and fault diagnosis, which require a set of normal data to
build a model for normal operation, troubleshooting analyzes data that possibly contain anomalies. One application of troubleshooting is identifying the root cause of abnormal oscillations in
feedback-controlled manufacturing plants. Many processes often have poor control performance
and exhibit dynamic oscillations [25, 33, 78]. These oscillations are often caused by poor instrument and valve performance. Furthermore, undesirable oscillation patterns are often observed
plant-wide as they propagate through plant recycling loops and control system interactions. Under these circumstances, the data collected from the process are already contaminated by faults.
Therefore, fault detection and diagnosis techniques cannot be applied to the faulty data since
there is no normal data for modeling normal operating conditions. As a result, it is desirable to
develop troubleshooting techniques based on dynamic latent models to extract abnormal oscillation features from faulty data and interpret these features based on model structure and domain
knowledge to identify the root causes of the oscillations.
This chapter proposes a dynamic embedded latent feature analysis (DELFA) procedure for visualizing high-dimensional data and performing oscillation diagnosis using dynamic latent variable models. First, DELFA employs dynamic feature modeling to extract latent variables from
high-dimensional process data, identifying potential abnormal features. Subsequently, DELFA
analyzes anomalous latent features to identify the measured variables most significantly influenced by these features for root cause diagnosis. The remainder of this chapter is organized as
follows: Section 2.2 presents the dynamic feature modeling techniques and derives methods for
identifying the variables best interpreted by features contained in multiple latent variables. Section 2.3 presents the workflow of the DELFA procedure, and Section 2.4 demonstrates its effectiveness on data from an industrial plant. Section 2.5 provides a summary.
2.2 Dynamic Latent Variable Modeling
2.2.1 Dynamic Feature Analysis via DiCCA
Time-series data collected from industrial processes are often high dimensional, cross-correlated,
and auto-correlated. Even though data dimensionality is high, the driving force behind data can
usually be represented in a low-dimensional space. For example, data collected from sensors in
chemical processing plants usually share some common oscillations caused by some common
latent factors. Therefore, it is desirable to extract a low-dimensional latent variable model from
the high-dimensional data to represent the dynamics.
In order to extract cross-correlations and auto-correlations from time series data, dynamic-inner canonical correlation analysis (DiCCA) was proposed in [19, 17]. In this method, an autoregressive (AR) model, known as the inner part of the modeling, is introduced to represent the
time dependence in the latent variables. For the outer part, there is a linear mapping between the
extracted latent variables and the original variables. The objective function for extracting latent
variables is to maximize the predictability of the extracted variables evaluated by the AR model to
ensure the latent variables capture the dynamics. Let $\mathbf{x}_k$ denote a vector of $m$ variables measured at time $k$. A latent variable $t_k$ is defined as follows,

$$t_k = \mathbf{x}_k^\top \mathbf{w} \tag{2.1}$$

To represent the dynamics of the latent variable, $t_k$ is described by the following AR model,

$$t_k = \sum_{i=1}^{s} \beta_i t_{k-i} + e_k \tag{2.2}$$
where $s$ is the order of the AR model, and the residual $e_k$ is i.i.d. random noise. Given time series data $\{\mathbf{x}_k\}_{1}^{N+s}$, data matrices can be formed as $\mathbf{X} = [\mathbf{x}_1\ \mathbf{x}_2\ \cdots\ \mathbf{x}_{s+N}]^\top$ and

$$\mathbf{X}_i = [\mathbf{x}_{i+1}\ \mathbf{x}_{i+2}\ \cdots\ \mathbf{x}_{i+N}]^\top \in \Re^{N \times M} \quad \text{for } i = s, \cdots, 1, 0. \tag{2.3}$$

The corresponding latent score vectors are

$$\mathbf{t}_i = [t_{i+1}\ t_{i+2}\ \cdots\ t_{i+N}]^\top = \mathbf{X}_i \mathbf{w} \quad \text{for } i = s, \cdots, 1, 0. \tag{2.4}$$
The predicted score vector for $\mathbf{t}_s$ is expressed as follows based on (2.2) and (2.4),

$$\hat{\mathbf{t}}_s = \sum_{i=1}^{s} \beta_i \mathbf{t}_{s-i} = \sum_{i=1}^{s} \beta_i \mathbf{X}_{s-i}\mathbf{w} \tag{2.5}$$

which has the following equivalent expression,

$$\hat{\mathbf{t}}_s = \mathbf{T}\boldsymbol{\beta} = \hat{\mathbf{X}}_\beta \mathbf{w} \tag{2.6}$$

where $\mathbf{T} = [\mathbf{t}_{s-1}\ \mathbf{t}_{s-2}\ \cdots\ \mathbf{t}_0]$ and $\hat{\mathbf{X}}_\beta = \sum_{i=1}^{s} \beta_i \mathbf{X}_{s-i}$.
In the DiCCA algorithm [19], the canonical correlation between $\mathbf{t}_s$ and $\hat{\mathbf{t}}_s$ is maximized, which is equivalent to the following objective [17],

$$\min\ J = \|\mathbf{t}_s - \hat{\mathbf{t}}_s\|^2 = \|\mathbf{t}_s - \mathbf{T}\boldsymbol{\beta}\|^2 = \|(\mathbf{X}_s - \hat{\mathbf{X}}_\beta)\mathbf{w}\|^2 \tag{2.7}$$

$$\text{s.t.}\quad \|\mathbf{t}_s\|^2 = \mathbf{w}^\top \mathbf{X}_s^\top \mathbf{X}_s \mathbf{w} = 1, \tag{2.8}$$
where $\mathbf{t}_s = \mathbf{X}_s\mathbf{w}$ is applied. By using a Lagrange multiplier as

$$L = J + \lambda(1 - \mathbf{w}^\top \mathbf{X}_s^\top \mathbf{X}_s \mathbf{w}) \tag{2.9}$$

and setting the derivatives with respect to $\boldsymbol{\beta}$ and $\mathbf{w}$ to zero, we have

$$\frac{\partial L}{\partial \boldsymbol{\beta}} = 2\mathbf{T}^\top(\mathbf{t}_s - \mathbf{T}\boldsymbol{\beta}) = 0, \tag{2.10}$$

$$\frac{\partial L}{\partial \mathbf{w}} = 2(\mathbf{X}_s - \hat{\mathbf{X}}_\beta)^\top(\mathbf{X}_s - \hat{\mathbf{X}}_\beta)\mathbf{w} - 2\lambda \mathbf{X}_s^\top \mathbf{X}_s \mathbf{w} = 0. \tag{2.11}$$

The solution for $\boldsymbol{\beta}$ and $\mathbf{w}$ is obtained as follows,

$$\boldsymbol{\beta} = (\mathbf{T}^\top \mathbf{T})^{-1}\mathbf{T}^\top \mathbf{t}_s \tag{2.12}$$

$$\lambda \mathbf{X}_s^\top \mathbf{X}_s \mathbf{w} = (\mathbf{X}_s - \hat{\mathbf{X}}_\beta)^\top(\mathbf{X}_s - \hat{\mathbf{X}}_\beta)\mathbf{w} \tag{2.13}$$
where $\mathbf{w}$ is the generalized eigenvector corresponding to the smallest eigenvalue in order to minimize $J$. Furthermore, the generalized eigenvector problem can be converted to a regular eigenvector problem. $\mathbf{X}_s$ can be decomposed using the economic form of singular value decomposition (SVD) as

$$\mathbf{X}_s = \mathbf{U}_s \mathbf{D} \mathbf{V}^\top \tag{2.14}$$

As a result, we have

$$\mathbf{U}_s = \mathbf{X}_s \mathbf{V} \mathbf{D}^{-1} \tag{2.15}$$

$$\hat{\mathbf{U}}_\beta = \hat{\mathbf{X}}_\beta \mathbf{V} \mathbf{D}^{-1} \tag{2.16}$$
Define

$$\tilde{\mathbf{U}}_\beta = \mathbf{U}_s - \hat{\mathbf{U}}_\beta \tag{2.17}$$

Equation (2.13) can be rewritten as

$$\tilde{\mathbf{U}}_\beta^\top \tilde{\mathbf{U}}_\beta \mathbf{w}_u = \lambda \mathbf{w}_u \tag{2.18}$$

where $\mathbf{w}_u = \mathbf{D}\mathbf{V}^\top \mathbf{w}$, which is the eigenvector of the smallest eigenvalue of $\tilde{\mathbf{U}}_\beta^\top \tilde{\mathbf{U}}_\beta$. The latent vector $\mathbf{t}_s$ can be calculated as

$$\mathbf{t}_s = \mathbf{X}_s \mathbf{w} = \mathbf{U}_s \mathbf{D} \mathbf{V}^\top \mathbf{V} \mathbf{D}^{-1} \mathbf{w}_u = \mathbf{U}_s \mathbf{w}_u. \tag{2.19}$$

To obtain the values of $\boldsymbol{\beta}$ and $\mathbf{w}$, equations (2.12) and (2.19) are solved iteratively. After convergence, the latent score vector is calculated from $\mathbf{t} = \mathbf{X}\mathbf{w}$ and $\mathbf{X}$ is deflated as

$$\mathbf{X} := \mathbf{X} - \mathbf{t}\mathbf{p}^\top \tag{2.20}$$

where the loading vector

$$\mathbf{p} = \mathbf{X}^\top \mathbf{t} / \mathbf{t}^\top \mathbf{t}. \tag{2.21}$$

Dynamic latent variables are extracted one by one. To derive the next latent variable, the deflated matrix $\mathbf{X}$ is used.
Since the relation between t and p is bilinear and that between t and w is linear, one of
them must have a fixed norm. It has been shown that the DiCCA scores are orthogonal [20].
Therefore, the latent scores t are scaled to the unit norm so they become orthonormal, which can
be interpreted as a set of orthonormal bases. The following are details of the DiCCA algorithm
with SVD.
1. Scale $\mathbf{X}$ to zero mean and optionally to unit variance.

2. Perform SVD,
$$\mathbf{X}_s = \mathbf{U}_s \mathbf{D} \mathbf{V}^\top$$
where $\mathbf{D}$ is diagonal with non-zero singular values only. Calculate $\mathbf{U}_{s-i} = \mathbf{X}_{s-i}\mathbf{V}\mathbf{D}^{-1}$ for $i = 1, \cdots, s$.

3. Initialize $\boldsymbol{\beta}$ with the first column of the identity matrix.

4. Iterate the following calculations until convergence.
Calculate $\tilde{\mathbf{U}}_\beta = \mathbf{U}_s - \sum_{i=1}^{s} \beta_i \mathbf{U}_{s-i}$.
Calculate $\mathbf{w}_u$ as the eigenvector of the smallest eigenvalue of $\tilde{\mathbf{U}}_\beta^\top \tilde{\mathbf{U}}_\beta$.
Calculate $\mathbf{t}_{s-i} = \mathbf{U}_{s-i}\mathbf{w}_u$ for $i = 0, 1, \cdots, s$, and form $\mathbf{T} = [\mathbf{t}_{s-1} \cdots \mathbf{t}_0]$.
Calculate $\boldsymbol{\beta} = (\mathbf{T}^\top \mathbf{T})^{-1}\mathbf{T}^\top \mathbf{t}_s$.

5. Deflation:
$$\mathbf{p} = \mathbf{X}^\top \mathbf{t} / \mathbf{t}^\top \mathbf{t}$$
$$\mathbf{X} := \mathbf{X} - \mathbf{t}\mathbf{p}^\top$$

6. Calculate loadings and scale $\mathbf{t}$ to unit norm,
$$\mathbf{p} := \mathbf{p}\,\|\mathbf{t}\|$$
$$\mathbf{w} := \mathbf{V}\mathbf{D}^{-1}\mathbf{w}_u / \|\mathbf{t}\|$$
$$\mathbf{t} := \mathbf{t} / \|\mathbf{t}\|$$
and go to Step 2 to compute the next factor.
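For concreteness, the following is a minimal numpy sketch of this SVD-based iteration, assuming a zero-mean data matrix. The function name dicca_extract, the convergence test on $\mathbf{w}_u$, and the iteration caps are illustrative choices rather than the reference DiCCA implementation.

```python
import numpy as np

def dicca_extract(X, s, n_latent, max_iter=500, tol=1e-10):
    """Extract dynamic latent variables with the SVD-based DiCCA iteration.

    X        : (s + N, M) data matrix, rows are time samples, zero mean (Step 1).
    s        : order of the inner AR model.
    n_latent : number of dynamic latent variables (factors) to extract.
    Returns unit-norm scores T_all (s + N, l), weights W (M, l),
    loadings P (M, l), and AR coefficients Beta (s, l).
    """
    X = X.copy()
    N = X.shape[0] - s
    T_all, W_all, P_all, B_all = [], [], [], []
    for _ in range(n_latent):
        # Step 2: economic SVD of X_s, then U_{s-i} = X_{s-i} V D^{-1}
        _, d, Vt = np.linalg.svd(X[s:, :], full_matrices=False)
        V = Vt.T
        U = [X[s - i:s - i + N, :] @ V / d for i in range(s + 1)]
        # Step 3: initialize beta with the first column of the identity matrix
        beta = np.zeros(s)
        beta[0] = 1.0
        w_u = np.zeros(V.shape[1])
        for _ in range(max_iter):
            # Step 4: eigenvector of the smallest eigenvalue of U_tilde' U_tilde
            U_tilde = U[0] - sum(beta[i - 1] * U[i] for i in range(1, s + 1))
            _, eigvec = np.linalg.eigh(U_tilde.T @ U_tilde)
            w_new = eigvec[:, 0]  # eigh sorts eigenvalues in ascending order
            # form T = [t_{s-1} ... t_0] and refit the AR coefficients, Eq. (2.12)
            T = np.column_stack([U[i] @ w_new for i in range(1, s + 1)])
            beta = np.linalg.lstsq(T, U[0] @ w_new, rcond=None)[0]
            converged = abs(abs(w_u @ w_new) - 1.0) < tol  # equal up to a sign flip
            w_u = w_new
            if converged:
                break
        # Steps 5-6: score, loading, deflation, and scaling t to unit norm
        w = V @ (w_u / d)            # w = V D^{-1} w_u
        t = X @ w
        p = X.T @ t / (t @ t)
        X = X - np.outer(t, p)
        nt = np.linalg.norm(t)
        T_all.append(t / nt)
        W_all.append(w / nt)
        P_all.append(p * nt)
        B_all.append(beta)
    return (np.column_stack(T_all), np.column_stack(W_all),
            np.column_stack(P_all), np.column_stack(B_all))
```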
2.2.2 Dynamic Latent Relations
The matrix representation of the scores, weights, and loadings for $l$ latent variables can be formulated as

$$\mathbf{T} = [\mathbf{t}^{(1)}\ \mathbf{t}^{(2)}\ \cdots\ \mathbf{t}^{(l)}]$$
$$\mathbf{W} = [\mathbf{w}_1\ \mathbf{w}_2\ \cdots\ \mathbf{w}_l]$$
$$\mathbf{P} = [\mathbf{p}_1\ \mathbf{p}_2\ \cdots\ \mathbf{p}_l]$$

The DiCCA scores are related to the data matrix as follows [20],

$$\mathbf{X} = \mathbf{T}\mathbf{P}^\top + \tilde{\mathbf{X}} = \sum_{j=1}^{l} \mathbf{t}^{(j)}\mathbf{p}_j^\top + \tilde{\mathbf{X}} \tag{2.22}$$

$$\mathbf{T} = \mathbf{X}\mathbf{R} \tag{2.23}$$

where $\tilde{\mathbf{X}}$ contains the residuals and

$$\mathbf{R} = \mathbf{W}(\mathbf{P}^\top\mathbf{W})^{-1}. \tag{2.24}$$
For a given vector $\mathbf{x}_k$, the latent vector with all latent variables at time $k$ can be calculated from (2.23) as

$$\mathbf{x}_k = \mathbf{P}\mathbf{t}_k + \tilde{\mathbf{x}}_k \tag{2.25}$$

$$\mathbf{t}_k = \mathbf{R}^\top \mathbf{x}_k \tag{2.26}$$

Denoting $\mathbf{B}_i = \mathrm{diag}(\beta_i^{(1)}, \ldots, \beta_i^{(l)})$, $\mathbf{x}_k$ and $\hat{\mathbf{x}}_k$ can be written as

$$\mathbf{x}_k = \sum_{i=1}^{s} \mathbf{P}\mathbf{B}_i\mathbf{R}^\top \mathbf{x}_{k-i} + \mathbf{e}_k \tag{2.27}$$

$$\hat{\mathbf{x}}_k = \sum_{i=1}^{s} \mathbf{P}\mathbf{B}_i\mathbf{t}_{k-i} \tag{2.28}$$
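To make these relations concrete, the following is a small numpy sketch of Eqs. (2.23)-(2.28), assuming $\mathbf{W}$, $\mathbf{P}$, and the AR coefficients come from a fitted DiCCA model; one_step_prediction is an illustrative name rather than published code.

```python
import numpy as np

def one_step_prediction(X, W, P, Beta):
    """One-step-ahead prediction of x_k via Eqs. (2.23)-(2.28).

    X    : (n, M) data matrix (rows are time samples, already scaled).
    W, P : (M, l) weight and loading matrices of a fitted DiCCA model.
    Beta : (s, l) AR coefficients; Beta[i-1, j] is beta_i of latent variable j.
    """
    s = Beta.shape[0]
    R = W @ np.linalg.inv(P.T @ W)        # Eq. (2.24)
    T = X @ R                             # latent scores, Eq. (2.23)
    n, M = X.shape
    X_hat = np.zeros((n - s, M))
    # Eq. (2.28): x_hat_k = sum_i P B_i t_{k-i}, with B_i = diag(Beta[i-1])
    for i in range(1, s + 1):
        X_hat += (T[s - i:n - i, :] * Beta[i - 1]) @ P.T
    return X_hat                          # predictions for samples k = s, ..., n-1
```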
2.2.3 DiCCAX: DiCCA with Exogenous Variables
The assumption made in DiCCA about the dynamics is that they are auto-regressive and driven by noise only. If there are exogenous variables that are related to the measurements, the dynamic model can be extended as follows,

$$\mathbf{x}_k = \sum_{i=1}^{s} \mathbf{P}\mathbf{B}_i\mathbf{R}^\top \mathbf{x}_{k-i} + \sum_{j=0}^{s_u} \mathbf{C}_j\mathbf{u}_{k-j} + \mathbf{e}_k \tag{2.29}$$

where $\mathbf{u}_k$ is a vector of exogenous variables at time $k$ and $s_u$ is the number of lags. With the added exogenous term, DiCCA is extended to DiCCA with exogenous variables (DiCCAX) as follows.
Equation (2.29) can be written in matrix form,

$$\mathbf{X}_s^\top = \sum_{i=1}^{s} \mathbf{P}\mathbf{B}_i\mathbf{R}^\top \mathbf{X}_{s-i}^\top + \sum_{j=0}^{s_u} \mathbf{C}_j\mathbf{U}_{s-j}^\top + \mathbf{E}_s^\top \tag{2.30}$$

where $\mathbf{X}_s^\top$, $\mathbf{X}_{s-i}^\top$, $\mathbf{U}_{s-j}^\top$, and $\mathbf{E}_s^\top$ represent the extended matrices with $N$ columns of data for $\mathbf{x}_s$, $\mathbf{x}_{s-i}$, $\mathbf{u}_{k-j}$, and $\mathbf{e}_k$. Let $\mathbf{U} = [\mathbf{U}_s\ \mathbf{U}_{s-1}\ \cdots\ \mathbf{U}_{s-s_u}]$. Post-multiplying (2.30) by the following projection matrix

$$\Pi_U^\perp = \mathbf{I} - \mathbf{U}(\mathbf{U}^\top\mathbf{U})^{-1}\mathbf{U}^\top \tag{2.31}$$

we have the following relationship that eliminates the exogenous variables,

$$\mathbf{X}_s^\top \Pi_U^\perp = \sum_{i=1}^{s} \mathbf{P}\mathbf{B}_i\mathbf{R}^\top \mathbf{X}_{s-i}^\top \Pi_U^\perp + \mathbf{E}_s^\top \Pi_U^\perp \tag{2.32}$$

As a result, the matrix $\Pi_U^\perp \mathbf{X}_s$ in the DiCCAX algorithm, where the effect of the exogenous variables is removed from the rest of the variables, can be treated as the data matrix $\mathbf{X}_s$ in the DiCCA algorithm.
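A minimal sketch of this projection step, assuming numpy and shifted matrices formed as in (2.3); remove_exogenous and its list-based inputs are illustrative conventions, not code from the thesis.

```python
import numpy as np

def remove_exogenous(X_list, U_list):
    """Project the exogenous variables out of the shifted data matrices.

    X_list : shifted data matrices [X_s, X_{s-1}, ..., X_0], each (N, M).
    U_list : shifted exogenous matrices [U_s, ..., U_{s-s_u}], each (N, m_u).
    Returns the projected matrices; DiCCAX then runs plain DiCCA on them.
    """
    U = np.hstack(U_list)
    # Pi = I - U (U'U)^{-1} U' of Eq. (2.31); Pi is symmetric, so
    # post-multiplying X_s^T by Pi is the same as Pi @ X_s here
    Pi = np.eye(U.shape[0]) - U @ np.linalg.solve(U.T @ U, U.T)
    return [Pi @ Xi for Xi in X_list]
```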
2.2.4 Latent Feature Contribution Analysis
Once the data is decomposed into latent scores, the loading vectors link the latent variables with the original measured variables. For the $j$th latent variable, the $i$th element of the loading vector $\mathbf{p}_j$ represents the extent to which the $i$th variable is interpreted by the latent variable. If $p_{ij}$ is large, it indicates a strong relationship between the $i$th variable and the $j$th latent variable. As a result, the loadings can be used to identify the variables that are related to the feature of interest.
If a feature of interest is present in several latent variables, composite loadings can be defined to identify the variables most interpreted by the feature. The data matrix can be written as

$$\mathbf{X} = \sum_{j \in S} \mathbf{t}^{(j)}\mathbf{p}_j^\top + \tilde{\mathbf{X}} \tag{2.33}$$

where $S$ is the subset of latent variables containing the feature of interest. Using the orthogonality properties of the DiCCA algorithm among the latent scores and residuals, we have

$$\mathrm{tr}\{\mathbf{X}^\top\mathbf{X}\} = \mathrm{tr}\Big\{\sum_{j \in S} \mathbf{p}_j(\mathbf{t}^{(j)})^\top\mathbf{t}^{(j)}\mathbf{p}_j^\top\Big\} + \mathrm{tr}\{\tilde{\mathbf{X}}^\top\tilde{\mathbf{X}}\} = \sum_{j \in S} \mathbf{p}_j^\top\mathbf{p}_j + \mathrm{tr}\{\tilde{\mathbf{X}}^\top\tilde{\mathbf{X}}\} = \sum_{i=1}^{M}\sum_{j \in S} p_{ij}^2 + \mathrm{tr}\{\tilde{\mathbf{X}}^\top\tilde{\mathbf{X}}\} \tag{2.34}$$

where $(\mathbf{t}^{(j)})^\top\mathbf{t}^{(j)} = 1$ since the latent scores are scaled to unit norm. The first term on the right-hand side of (2.34) is the total variance in the data interpreted by the feature. Therefore,
$$\bar{p}_i^2 = \frac{\sum_{j \in S} p_{ij}^2}{\sum_{j \in S} \mathbf{p}_j^\top\mathbf{p}_j} \tag{2.35}$$

is defined as the composite loading of the $i$th variable for the feature of interest. Since the latent scores are scaled to unit norm, the composite loading also represents the proportion of the variance of the $i$th variable explained by the latent feature. A larger $\bar{p}_i^2$ implies that variable $i$ can be better represented by the latent feature of interest.
In DiCCA models, the weight matrix $\mathbf{R}$ is different from the loading matrix. For the $j$th latent variable, the $i$th element of the weight vector $\mathbf{r}_j$ represents the extent to which the $i$th variable contributes to the $j$th latent variable. Similar to the composite loadings, the composite weight of the $i$th variable is defined as

$$\bar{r}_i^2 = \frac{\sum_{j \in S} r_{ij}^2}{\sum_{j \in S} \mathbf{r}_j^\top\mathbf{r}_j} \tag{2.36}$$

where $r_{ij}$ is the $ij$th element of $\mathbf{R}$. It represents the composition of the feature of interest from each variable.
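The composite loadings and weights reduce to a few array operations. Below is a minimal numpy sketch, where composite_loadings_weights is an illustrative name.

```python
import numpy as np

def composite_loadings_weights(P, R, S):
    """Composite loadings (2.35) and composite weights (2.36).

    P, R : (M, l) loading and weight matrices of a fitted DiCCA model.
    S    : indices of the latent variables containing the feature of interest.
    """
    Ps, Rs = P[:, S], R[:, S]
    p_bar2 = (Ps ** 2).sum(axis=1) / (Ps ** 2).sum()  # fraction per variable
    r_bar2 = (Rs ** 2).sum(axis=1) / (Rs ** 2).sum()
    return p_bar2, r_bar2
```

Variables with a large composite loading are best explained by the feature of interest, while a large composite weight flags the variables that compose it.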
2.3 DELFA Procedure
The DELFA troubleshooting procedure initially eliminates variables exhibiting minimal or no
variations in multi-dimensional time-series data containing potential anomalies. The second step
involves performing an initial latent variable analysis to remove uninteresting latent features,
such as ambient temperature disturbances. The third step constructs a latent variable model
focusing on the remaining variables of interest, extracting oscillatory features caused by process
anomalies. The final step involves identifying the underlying cause of the oscillation through
latent feature contribution analysis. The DELFA procedure is illustrated in Figure 2.1, with further
details discussed in subsequent sections.
1. Step 1: Initial Variable Selection for Preliminary Analysis.
A plant dataset typically includes process variables (.PV), control outputs (.OP), and controller setpoint variables (.SP). In most control loops, the setpoint variables do not frequently change over time. For cascade and ratio-control loops, where setpoints vary, these
variables become redundant as they mirror primary loop controller outputs (measured as .OP). Thus, including .SP variables in the analysis is unnecessary. Conversely, process variables (.PV), which are direct measurements from the plant, and controller outputs (.OP), which reflect input changes and process disturbances, should be retained for further analysis (see the code sketch following this procedure).

Figure 2.1: DELFA: a plant-wide troubleshooting procedure using operational data
2. Step 2: Removing Uninteresting Latent Features.
Following the exclusion of setpoint variables, a latent variable model is constructed with the
remaining variables. The scores of these latent variables are then plotted as time series, with
their loadings and weights analyzed to identify uninteresting latent variables. There are
mainly two types of latent variables that can be identified as uninteresting. In some cases, the
weights and loadings of a latent variable are concentrated in one or a few variables. These
variables can be referred to as self-interpretative since they are unrelated to other variables.
If the self-interpretative latent feature does not contain the oscillatory features of interest,
the associated variable(s) should be excluded. On the other hand, if the feature is of interest,
the associated variable(s) can be analyzed separately to find the causes of oscillation. In
another case, a latent feature can have weights (elements of R) that are concentrated in
one or a few variables, but the loadings (elements of matrix P) are distributed among other
variables. For example, the latent feature could reflect an ambient disturbance impacting
some measured variables. In this case, the associated variables of the uninterested feature
should be excluded either by directly deleting them or treating them as exogenous variables
using the DiCCAX algorithm.
3. Step 3: DiCCA or DiCCAX Analysis for Features of Interest.
After excluding the variables associated with the uninteresting feature, another round of
latent variable modeling using DiCCA or DiCCAX is performed on the remaining variables
to focus on extracting the latent variables of interest. These latent variables should
contain sustained or intermittent oscillation features with a dominant frequency, resulting
from plant anomalies.
4. Step 4: Identifying Related Causes through Loadings.
The features of interest are analyzed one at a time by examining the associated composite loadings. If the loadings of some variables are high, these variables are best interpreted by the feature of interest. As a result, they can be identified as the causes of abnormal oscillations.
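As a rough illustration of Step 1, the snippet below keeps .PV and .OP tags and drops near-constant variables; the file name, tag conventions, and variance threshold are hypothetical.

```python
import pandas as pd

# Hypothetical plant export: columns are tag names such as "TC3.OP" or "TI8.PV"
df = pd.read_csv("plant_data.csv", index_col=0)

# Keep process variables (.PV) and controller outputs (.OP); drop setpoints (.SP)
keep = [c for c in df.columns if c.endswith((".PV", ".OP"))]
df = df[keep]

# Drop variables with minimal or no variation (e.g., a constant open-loop tag)
df = df.loc[:, df.std() > 1e-8]
```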
2.4 DELFA for Troubleshooting Plant-wide Oscillations
The DELFA troubleshooting procedure is applied to a dataset collected from an industrial plant.
The diagram of the plant is shown in Figure 2.2. The plant operator reported that the plant
occasionally exhibits oscillation behavior, affecting the efficiency and throughput of the operation. The dataset comprises 48 variables collected over 96 hours, with data sampled every 30
seconds. In this section, DELFA is applied to the dataset to demonstrate the effectiveness of the
troubleshooting procedure.
2.4.1 Preliminary Analysis
Following step 1 of the DELFA procedure, 43 variables are identified after excluding four setpoint
variables (.SP) of cascaded control loops and one constant variable (TC2.OP), which is in an open
loop. The 43 variables are plotted in Figure 2.3.

Figure 2.2: Schematic diagram of the studied plant: First letter: F(flowrate); P(pressure); T(temperature). Second letter: C(control); I(indicator). Last number: serial number

Figure 2.3: Measurements collected from the industrial plant

Figure 2.4: Preliminary analysis: comparing latent variables extracted by DiCCA and PCA

DiCCA is performed on the 43 variables, and the left subplot in Figure 2.4 presents the resulting leading latent scores from one day of data
containing 2880 samples. It can be observed that DiCCA extracts distinct dynamic features compared to the PCA model, where the oscillation features are mixed within multiple latent variables,
plotted in the right subplot. In particular, it can be observed that DLV1 and DLV2 extracted by
DiCCA exhibit a slow-changing feature over 24 hours. Figure 2.5 displays the composite loadings and weights of DLV1 and DLV2, indicating that the weights are concentrated on variables
TI8.PV and TC3.OP. Additionally, the composite loadings of DLV1 and DLV2 predominantly explain TI8.PV and TC3.OP, along with several other variables. As a result, DLV1 and DLV2 are
self-interpretative latent variables. Upon examining the process diagram, TI8.PV represents the
temperature measured at the condenser outlet, reflecting external disturbances from the ambient environment, while TC3.OP represents the stream input flow. Since these two variables are
related to the ambient conditions, they can be removed before further analysis.
Figure 2.5: Preliminary analysis: composite loadings and composite weights
Figure 2.6: Leading dynamic latent variables: deleting TI8.PV and TC3.OP (left subplot) and treating TI8.PV and TC3.OP as exogenous variables (right subplot)
2.4.2 Removing the Effect of the Variable TI8.PV and TC3.OP
In this section, two approaches are used to remove the effects of TI8.PV and TC3.OP. The first approach involves deleting these two variables and using the remaining variables to build a DiCCA
model for further analysis. The second approach uses the DiCCAX algorithm with five time lags,
treating TI8.PV and TC3.OP as exogenous variables. Figure 2.6 shows the resulting latent variables. In the left subplot, it can be observed that simply deleting TI8.PV and TC3.OP still leaves
the 24-hour cycle in DLV3. On the other hand, the right subplot in Figure 2.6 shows that treating
TI8.PV and TC3.OP as exogenous variables completely removes the effect of the slow-changing
feature. As a result, the approach that utilizes the DiCCAX algorithm is preferred for removing the undesired features. Both panels clearly show the two highlighted oscillation features,
indicated by blue and yellow colors, which should be further analyzed for root causes. These
important features are summarized below:
1. Both approaches show oscillation features with a period of 140 samples in DLV1 and DLV2.
In the following analysis, these oscillations are referred to as low-frequency features.
2. In DLV5 for the second approach, there is an oscillation feature with high frequency. The
same feature also appears in DLV6 for the first approach. The period of this feature is
around 55 samples.
These features are further analyzed below to identify their root causes.
2.4.3 Analysis of the Low-frequency Oscillation Feature
DLV1 and DLV2 from DiCCAX, shown in the right panel of Figure 2.6, are the first to be analyzed
for root causes. To confirm that these DLVs contain the same feature, a fast Fourier transform
(FFT) is performed. The results, presented in Figure 2.7a, indicate that the peak frequency for
both features is 0.0002431 Hz, corresponding to a period of approximately 68 minutes for the slow oscillation features. Given that the same feature appears in multiple DLVs, an investigation into the
composite loadings and weights of the DLVs is conducted to identify potential causes of the oscillation. Figure 2.8 depicts the composite loadings and weights for the low-frequency feature.
It is observed that LC2.PV and LC2.OP contribute significantly to the low-frequency oscillation
feature, with a minor contribution from FC4.OP. An examination of the process diagram reveals
that the variables LC2.PV and LC2.OP are part of the LC2 level control loop of the decanter, and
FC4.OP relates to the flow at the decanter’s exit, affected by oscillations in the LC2 loop. Consequently, the LC2 loop is identified as the cause of the oscillation.
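The frequency check used here amounts to locating the peak of the score's amplitude spectrum. The following is a minimal numpy sketch, assuming the 30-second sampling interval of this dataset; dominant_frequency is an illustrative helper, not code from the analysis. For the low-frequency feature it would return roughly 0.000243 Hz, i.e., a period of about 68 minutes.

```python
import numpy as np

def dominant_frequency(score, dt=30.0):
    """Peak frequency (Hz) and period (minutes) of a DLV score.

    score : 1-D array of dynamic latent variable scores.
    dt    : sampling interval in seconds (30 s for this plant dataset).
    """
    score = score - score.mean()                 # drop the DC component
    spectrum = np.abs(np.fft.rfft(score))
    freqs = np.fft.rfftfreq(len(score), d=dt)
    f_peak = freqs[1:][np.argmax(spectrum[1:])]  # skip the zero-frequency bin
    return f_peak, 1.0 / f_peak / 60.0
```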
(a) Fast Fourier transform of the low-frequency feature in DLV1 and DLV2.
(b) Fast Fourier transform of the high-frequency feature in DLV5.
Figure 2.7: Fast Fourier transform analysis of the dynamic latent features
Figure 2.8: Composite loadings and weights for low-frequency oscillation in dynamic latent variables 1 and 2
2.4.4 Analysis of the High-frequency Oscillation Feature
This analysis focuses on DLV5 from DiCCAX, as shown in the right panel of Figure 2.6. After applying an FFT to this DLV, the peak frequency, as depicted in Figure 2.7b,
is 0.0006597 Hz, indicating an oscillation period of approximately 25 minutes. To explore the root
cause of this oscillation, the loadings and weights of DLV5 are illustrated in Figure 2.9. The results
reveal that the high-frequency DLV feature is weighted predominantly on FC3.OP. Additionally, the
DLV explains TC1.PV, FC3.PV, and TI4.PV through the loadings. Examining the process diagram
shows that these variables are situated around the FC3 flow control loop at the top of Column
1’s condenser. Therefore, it is concluded that FC3.OP causes the high-frequency oscillation, most
likely due to a valve stiction issue.
2.5 Summary
This chapter demonstrates the efficacy of the proposed DELFA procedure for troubleshooting,
especially in analyzing high-dimensional, plant-wide operational data. Dynamic latent variable modeling through DiCCA successfully extracts dynamic features, capturing both cross-correlation and autocorrelation within the data. Additionally, an extended version of DiCCA removes uninteresting variations caused by external disturbances, focusing on the oscillation features of interest to troubleshoot anomalies. By integrating domain process knowledge, the method utilizes composite loadings and weights to pinpoint abnormal variables associated with features
present across multiple latent variables. The effectiveness of the DELFA troubleshooting procedure is validated through an industrial case study.
Figure 2.9: Composite loadings and weights for high-frequency oscillation in dynamic latent
variable 5
Chapter 3
Leveraging Latent Space Dynamics for Multivariate Time
Series Prediction and Condition Monitoring in Industrial
Systems
3.1 Introduction
Utilizing process data for condition monitoring and fault detection is crucial in industrial enterprises, as it reduces unplanned shutdowns, prevents catastrophic failures, and improves efficiency
[60, 89, 67]. Time series data collected from industrial processes are inherently dynamic, correlated, and high-dimensional due to energy and mass balances, equipment interconnections, and
redundant sensors [62]. Traditionally, fault detection and monitoring have utilized multivariate statistical techniques such as Principal Component Analysis (PCA) and Partial Least Squares
(PLS), known for their effectiveness in modeling high-dimensional data [11, 29]. However, while
these approaches capture cross-correlations among variables, they overlook the inherent dynamics within the time series data. Consequently, they are adept at monitoring static variations in
the data but fail to model its dynamic characteristics.
To address this challenge and effectively capture the dynamics and cross-correlations present
in time series data, several methods have been developed. These include Slow Feature Analysis (SFA), Dynamic PCA (DPCA), Dynamic Inner Principal Component Analysis (DiPCA), and
Dynamic Inner Canonical Correlation Analysis (DiCCA) [76, 70, 18, 19]. Additionally, fault detection and condition monitoring methods based on neural networks (NN) have received increasing
attention due to their effectiveness in learning complex nonlinear features from large volumes
of process data. Various neural network architectures have been employed for fault detection
and condition monitoring, including Autoencoders (AE), Convolutional Neural Networks (CNN),
Variational Autoencoders (VAE), and Recurrent Neural Networks (RNN) [77, 100, 93, 10, 105].
Among these, due to their capabilities to model time series data dynamics [64, 55, 88], various
RNN models have been utilized in fault detection applications [106, 105, 42, 52, 54, 9, 47, 104, 99].
Among RNN-based fault detection methods, many approaches detect faults through classification, which requires both normal and faulty data that are challenging to acquire in real-world
applications. As a result, unsupervised approaches based on RNN models are more useful in
real-world applications. In the unsupervised fault detection setting, models are first trained using fault-free data to generate predictions. Subsequently, faults are detected based on deviations
in the prediction error, which arise from discrepancies between normal and faulty data distributions. However, large prediction errors from changes in data distributions may not always
indicate faults, as setpoint adjustments and external disturbances can also alter data distribution,
leading to abnormal prediction errors. Therefore, such events should not be classified as faults,
as they result from normal operational changes or can be compensated for by feedback control
loops. For instance, numerous studies have demonstrated high detection rates across all disturbances in fault detection within the well-known Tennessee Eastman process (TEP) benchmark
dataset [22]. However, as pointed out in [63, 109, 110], not all disturbances should be identified as
faults because many of them can be rejected by the feedback control systems within the process
[49, 68]. Consequently, developing fault detection methods that accurately reflect underlying
operating conditions and distinguish faults from normal operational changes is crucial.
In addition to their inability to differentiate faults from disturbances and changes in operating
conditions, the aforementioned RNN-based monitoring techniques concentrate solely on analyzing multivariate measurements, neglecting the latent features in the models’ hidden states. Since
process data are often high-dimensional and correlated, properly regularizing the hidden states
during training can enable them to represent a low-dimensional latent space, thereby effectively
capturing the dynamics of the original data. As a result, monitoring can focus on these latent
states rather than the high-dimensional data.
This work introduces an unsupervised fault detection method utilizing an LSTM encoder-decoder structure with regularized hidden dynamics to overcome the previously mentioned limitations. By regularizing the model’s hidden states during training, a low-dimensional latent space
representation of the original multivariate time series data is created. Consequently, fault detection can leverage prediction errors of the hidden states instead of the prediction results of all
variables. Furthermore, alongside regularization, reversible instance normalization is employed
with the LSTM encoder-decoder model to ensure accurate predictions when there are changes in
data distribution due to shifts in operating conditions and disturbance rejections. Additionally,
two monitoring indices are introduced to assess dynamic and static data variations, effectively reflecting operating condition statuses and distinguishing faults from normal operational changes.
The remainder of this chapter is organized as follows. Section 3.2 presents the preliminaries
of principal component analysis-based fault detection and monitoring techniques, the structure
of long short-term memory neural networks, and reversible instance normalization. Section 3.3
presents the general framework for using RNN for fault detection. Section 3.4 presents a case
study of the effectiveness of the proposed method using a synthetic dataset, the Tennessee Eastman process dataset, and a dataset collected from a geothermal power plant. Section 3.5 provides
a summary.
3.2 Preliminaries
3.2.1 Principal Component Analysis for Process Monitoring
Principal Component Analysis (PCA) was first proposed by Pearson in [57]. It was later formulated in its multivariate form by Hotelling [35] and has since been widely used as a dimensionality reduction tool across various fields, including computer science, electrical engineering,
and biomedical engineering [23, 44, 65, 48]. By identifying a direction or subspace of the largest
variance within the original measurement space, PCA generates a low-dimensional representation of a dataset. Let $X \in \mathbb{R}^{N \times M}$ denote a data matrix, with each row representing a sample $x \in \mathbb{R}^{M}$. The data matrix $X$ can be decomposed into a score matrix $T$ and a loading matrix $\bar{P}$ using the Singular Value Decomposition (SVD) algorithm. Assuming $l$ principal components (PCs) are retained in the model for dimension reduction, the loading matrix is partitioned accordingly,
$$\bar{P} = [P \;\; \tilde{P}] \quad (3.1)$$
such that $P$ contains the first $l$ loading vectors representing the majority of the data variance, and $\tilde{P}$ comprises the remaining $M - l$ loading vectors. Using $P$ and $\tilde{P}$, the data matrix $X$ can be expressed as,
$$X = T P^\top + \tilde{T} \tilde{P}^\top \quad (3.2)$$
The subspace spanned by $P$ is referred to as the Principal Component Subspace (PCS), and the one spanned by $\tilde{P}$ is known as the Residual Subspace (RS). As a result, the measurement space is divided into the PCS and RS, with the PCS encompassing normal or significant variations and the RS containing minor variations or noise.
The general approach to unsupervised fault detection begins with constructing models using
data collected during normal operations. Subsequently, control limits are established to define the
regions of normal operation. Finally, these models and control limits are applied to new data for
online fault detection. PCA is a popular option for modeling the normal static variation from data
collected during normal operating conditions. Utilizing a PCA model enables the formulation
of various fault detection indices, such as Hotelling's $T^2$ index, the SPE (or $Q$) index, and the combined index $\varphi$, to monitor different aspects of the process data.

1. Hotelling's $T^2$ index

Hotelling's $T^2$ index measures variations in the PCS,
$$T^2 = x^\top P \Lambda^{-1} P^\top x \quad (3.3)$$
where $\Lambda$ is the covariance matrix of the latent score matrix $T$. It can be proven that the $T^2$ statistic follows an $F$ distribution,
$$\frac{N(N - l)}{l(N^2 - 1)}\, T^2 \sim F_{l,\, N-l} \quad (3.4)$$
where $F_{l,\, N-l}$ is an $F$ distribution with $l$ and $N - l$ degrees of freedom [85]. As a result, for a given confidence level $\alpha$, the control limit can be calculated based on the $F_{l,\, N-l}$ distribution. The index is considered normal if
$$T^2 \leq T^2_\alpha \equiv \frac{l(N^2 - 1)}{N(N - l)}\, F_{l,\, N-l;\, \alpha} \quad (3.5)$$
If the number of data points $N$ is large, the $T^2$ index can be well approximated with a $\chi^2$ distribution with $l$ degrees of freedom [59] and
$$T^2_\alpha = \chi^2_{l;\, \alpha} \quad (3.6)$$
The $T^2$ index measures the distance to the origin in the principal component subspace, which contains normal process variations with large variance. The variation of the projection of a sample vector $x$ on the PCS is considered normal if its $T^2$ index is less than the control limit $T^2_\alpha$.
2. SPE (Squared Prediction Error) index

The SPE index measures the projection of a sample vector $x \in \mathbb{R}^M$ onto the residual subspace. It is defined as the squared norm of the residual vector $\tilde{x}$,
$$\mathrm{SPE}(x) = \|\tilde{x}\|^2 = x^\top \tilde{P} \tilde{P}^\top x \quad (3.7)$$
The control limit of the SPE index can be derived using the result in [7],
$$\delta^2_\alpha = g\, \chi^2_{h;\, \alpha} \quad (3.8)$$
where
$$g = \frac{\sum_{i=l+1}^{M} \lambda_i^2}{\sum_{i=l+1}^{M} \lambda_i}, \qquad h = \frac{\left(\sum_{i=l+1}^{M} \lambda_i\right)^2}{\sum_{i=l+1}^{M} \lambda_i^2} \quad (3.9)$$
$\alpha$ is the confidence level, $l$ is the number of PCs in the principal component subspace, and $\lambda_i$ is the $i$th eigenvalue of the sample covariance matrix $\frac{1}{N-1} X^\top X$.

Since the SPE index focuses on the residual subspace, it measures the variability that breaks the static process relationships. If the SPE index is above the control limit $\delta^2_\alpha$, it indicates a fault that breaks the normal correlation structure.
3. Combined index

If both the $T^2$ index and SPE index are equally important, a global index can be used to combine the two indices, such as the combined index $\varphi$ [103, 21]. This results in monitoring one index instead of two. The combined index is defined as follows,
$$\varphi = T^2(x) + g^{-1}\, \mathrm{SPE}(x) \sim \chi^2_{l+h} \quad (3.10)$$
where $g$ and $h$ come from the calculation of the SPE control limit. With $\alpha$ as the confidence level, the control limit of the combined index is $\chi^2_{l+h;\, \alpha}$. As a result, a fault is detected if the value of $\varphi$ is greater than the control limit.
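To make these definitions concrete, a minimal NumPy/SciPy sketch of the three indices and their control limits is given below. The function and variable names are illustrative, and the large-sample $\chi^2$ approximation of Eq. 3.6 is assumed for the $T^2$ limit.

```python
import numpy as np
from scipy import stats

def fit_pca_monitor(X, n_pc, alpha=0.99):
    """Fit a PCA monitoring model on normal data X (N x M) and return
    the quantities needed for the T^2, SPE, and combined indices."""
    N, M = X.shape
    Xc = X - X.mean(axis=0)
    # Eigenvalues of the sample covariance via SVD of the centered data
    _, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    lam = s**2 / (N - 1)
    P, P_res = Vt[:n_pc].T, Vt[n_pc:].T       # principal / residual loadings
    # SPE control-limit constants, Eq. (3.9)
    g = np.sum(lam[n_pc:]**2) / np.sum(lam[n_pc:])
    h = np.sum(lam[n_pc:])**2 / np.sum(lam[n_pc:]**2)
    limits = {
        "T2": stats.chi2.ppf(alpha, n_pc),       # Eq. (3.6), large-N case
        "SPE": g * stats.chi2.ppf(alpha, h),     # Eq. (3.8)
        "phi": stats.chi2.ppf(alpha, n_pc + h),  # Eq. (3.10)
    }
    return dict(mean=X.mean(axis=0), P=P, P_res=P_res,
                lam=lam[:n_pc], g=g, limits=limits)

def indices(model, x):
    """Compute T^2, SPE, and the combined index phi for one sample x."""
    xc = x - model["mean"]
    t = model["P"].T @ xc
    T2 = np.sum(t**2 / model["lam"])             # Eq. (3.3)
    r = model["P_res"].T @ xc
    SPE = np.sum(r**2)                           # Eq. (3.7)
    phi = T2 + SPE / model["g"]                  # Eq. (3.10)
    return T2, SPE, phi
```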
3.2.2 Long-Short Term Memory (LSTM)
The Long-Short Term Memory, a variant of the recurrent neural network model, is designed to
alleviate the gradient vanishing and explosion problems during the training of traditional RNNs
[34]. As a result, the LSTM can better model long-term dependencies within the data sequence.
An LSTM cell comprises a cell state $c$ for storing long-term memory and a hidden state $h$ for short-term memory. Additionally, it features an input gate $i$, a forget gate $f$, and an output gate $o$ to control how information flows through the cell by modifying the cell state. Given that the number of features in the hidden state is $l$, the LSTM cell is described by the following recursive equations:
$$f_{(t)} = \sigma(W_f x_{(t)} + U_f h_{(t-1)} + b_f) \quad (3.11)$$
$$i_{(t)} = \sigma(W_i x_{(t)} + U_i h_{(t-1)} + b_i) \quad (3.12)$$
$$o_{(t)} = \sigma(W_o x_{(t)} + U_o h_{(t-1)} + b_o) \quad (3.13)$$
$$c_{(t)} = f_{(t)} \odot c_{(t-1)} + i_{(t)} \odot \tanh(W_c x_{(t)} + U_c h_{(t-1)} + b_c) \quad (3.14)$$
$$h_{(t)} = o_{(t)} \odot \tanh(c_{(t)}) \quad (3.15)$$
where $x_{(t)} \in \mathbb{R}^m$ is the input vector to the LSTM cell, $h_{(t)} \in \mathbb{R}^l$ is the hidden state vector, and $c_{(t)} \in \mathbb{R}^l$ is the cell state vector. $W \in \mathbb{R}^{l \times m}$, $U \in \mathbb{R}^{l \times l}$, and $b \in \mathbb{R}^l$ are the weight matrices and bias vectors. $f_{(t)}$, $i_{(t)}$, and $o_{(t)}$ are the forget, input, and output gate activation vectors, respectively. $\sigma$ and $\tanh$ are the sigmoid and hyperbolic tangent functions, respectively, and $\odot$ denotes the element-wise product of two vectors.
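For illustration, the recursions in Eqs. 3.11-3.15 can be transcribed directly in PyTorch. In practice, the built-in torch.nn.LSTM performs the same computation; the sketch below, with an assumed parameter dictionary p, is purely expository.

```python
import torch

def lstm_cell_step(x_t, h_prev, c_prev, p):
    """One LSTM step following Eqs. (3.11)-(3.15).
    p holds weights W_* (l x m), U_* (l x l), and biases b_* (l,)."""
    f = torch.sigmoid(p["Wf"] @ x_t + p["Uf"] @ h_prev + p["bf"])  # forget gate, Eq. (3.11)
    i = torch.sigmoid(p["Wi"] @ x_t + p["Ui"] @ h_prev + p["bi"])  # input gate,  Eq. (3.12)
    o = torch.sigmoid(p["Wo"] @ x_t + p["Uo"] @ h_prev + p["bo"])  # output gate, Eq. (3.13)
    c_new = f * c_prev + i * torch.tanh(
        p["Wc"] @ x_t + p["Uc"] @ h_prev + p["bc"])                # Eq. (3.14)
    h_new = o * torch.tanh(c_new)                                  # Eq. (3.15)
    return h_new, c_new
```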
3.2.3 Reversible Instance Normalization (RevIN)

Discrepancies between training and testing data distributions can degrade the prediction performance of neural network models. To address this problem, reversible instance normalization (RevIN) was introduced to reduce the effects of data distribution discrepancies and improve prediction performance [39]. In a sequence-to-sequence prediction setup, a sequence of future time steps is predicted based on input data from past time steps. Let $x \in \mathbb{R}^m$ represent a vector of $m$ variables and $X \in \mathbb{R}^{K \times m}$ represent a sequence of input data with window size $K$. The prediction model aims to predict a future sequence of values $Y \in \mathbb{R}^{T \times m}$, where $T$ is the prediction horizon. Under this setting, RevIN comprises normalization and denormalization phases. Initially, the variables in the input sequence are normalized before being fed into the neural network model. In the end, the model's output sequence is denormalized, using the statistics computed during the normalization stage, to obtain the prediction results.
In RevIN, the mean and variance of each variable $x^i$, $i = 1, 2, \ldots, m$, in the input sequence are first calculated as
$$\mathbb{E}[x^i] = \frac{1}{K} \sum_{t=1}^{K} x^i_{(t)}, \qquad \mathrm{Var}[x^i] = \frac{1}{K} \sum_{t=1}^{K} \left(x^i_{(t)} - \mathbb{E}[x^i]\right)^2 \quad (3.16)$$
Using the calculated statistics, the input data can be normalized as
$$x^i_{(t)} = \gamma_i \left( \frac{x^i_{(t)} - \mathbb{E}[x^i]}{\sqrt{\mathrm{Var}[x^i] + \epsilon}} \right) + \beta_i, \quad \text{for } t = 1, 2, \ldots, K \quad (3.17)$$
where $\gamma_i$ and $\beta_i$ are learnable parameters for each variable. The normalization step gives the input sequence a more consistent mean and variance, removing non-stationary information from the data. The prediction model then uses the transformed input sequence $X$ to predict future values. After prediction, the output of the model $Y = [y_{(1)}, \ldots, y_{(T)}]^\top$ is denormalized using the same statistics calculated in the normalization step. For the $i$th output variable at step $t$, the denormalization can be written as
$$\hat{y}^i_{(t)} = \sqrt{\mathrm{Var}[x^i] + \epsilon} \times \left( \frac{y^i_{(t)} - \beta_i}{\gamma_i} \right) + \mathbb{E}[x^i], \quad \text{for } t = 1, 2, \ldots, T \quad (3.18)$$
As a result, $\hat{y}^i_{(t)}$ becomes the final prediction at time step $t$.
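A compact PyTorch sketch of the two RevIN phases is given below. The class name and the (batch, time, variables) tensor layout are assumptions of this illustration, while the statistics, the learnable parameters $\gamma_i$ and $\beta_i$, and the inverse transform follow Eqs. 3.16-3.18.

```python
import torch
import torch.nn as nn

class RevIN(nn.Module):
    """Reversible instance normalization (Eqs. 3.16-3.18) for tensors of
    shape (batch, time, m), with learnable gamma/beta per variable."""
    def __init__(self, num_vars, eps=1e-5):
        super().__init__()
        self.gamma = nn.Parameter(torch.ones(num_vars))
        self.beta = nn.Parameter(torch.zeros(num_vars))
        self.eps = eps

    def normalize(self, x):
        # Per-window, per-variable statistics over the time axis (Eq. 3.16)
        self.mean = x.mean(dim=1, keepdim=True)
        self.scale = torch.sqrt(x.var(dim=1, keepdim=True, unbiased=False) + self.eps)
        return self.gamma * (x - self.mean) / self.scale + self.beta  # Eq. (3.17)

    def denormalize(self, y):
        # Invert the transform with the stored statistics (Eq. 3.18)
        return self.scale * (y - self.beta) / self.gamma + self.mean
```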
3.3 LSTM Encoder-decoder with Regularized Hidden Dynamics
for Fault Detection
3.3.1 LSTM Encoder-Decoder with RevIN
The LSTM encoder-decoder architecture was initially introduced for machine translation [79].
Its ability to generate sequences of arbitrary length from input data sequences has led to its
widespread application in time series prediction [64, 55, 88]. The encoder-decoder structure consists of two LSTM networks: an encoder that compresses the input sequence into a fixed-length vector and a decoder that decodes the vector into an output sequence. Due to its predictive capabilities, this study adopts the LSTM encoder-decoder architecture to predict time series data collected from industrial processes. Another reason for employing the LSTM encoder-decoder model is its capability to monitor prediction errors in latent space: the encoder LSTM provides a mapping from measurements to a latent space, and the decoder enables predictions within this space. Further details on monitoring in latent space are provided in Section 3.3.3. In addition to the LSTM encoder-decoder structure, reversible instance normalization is utilized to deal with the nonstationary nature of industrial time series data. The input data sequence is first normalized and then fed to the encoder-decoder model. After the decoder predicts future time steps, these predictions are denormalized to yield the final prediction outcomes. Figure 3.1 shows the LSTM encoder-decoder structure utilized in this study.

Figure 3.1: LSTM encoder-decoder structure with RevIN
Let $x_{(t)} \in \mathbb{R}^m$ denote a vector consisting of a set of (measured/manipulated) variables at time $t$, and $y_{(t)} \in \mathbb{R}^m$ denote a vector of variables that need to be predicted. Assuming the lengths of the input sequence to the encoder and the prediction horizon of the decoder are $K$ and $T$, respectively, the sequences of $x$ and $y$ corresponding to the encoder and decoder can be defined as
$$X_E = [x_{(1)}, x_{(2)}, \ldots, x_{(K)}]^\top \quad (3.19)$$
$$Y_E = [y_{(1)}, y_{(2)}, \ldots, y_{(K)}]^\top \quad (3.20)$$
$$X_D = [x_{(K+1)}, x_{(K+2)}, \ldots, x_{(K+T)}]^\top \quad (3.21)$$
$$Y_D = [y_{(K+1)}, y_{(K+2)}, \ldots, y_{(K+T)}]^\top \quad (3.22)$$
For a given input sequence $X_E$, the input data is first normalized using Eq. 3.17. Then, a mapping from the normalized input sequence to $Y_E$ can be learned by passing the input sequence through the encoder and a dense output layer, followed by denormalization. The mapping from the normalized input $x_{(t)}$ to $h_{(t)}$ can be learned by the encoder, $h_{(t)} = f_E(h_{(t-1)}, x_{(t)})$, resulting in a sequence of hidden states
$$H_E = [h_{(1)}, h_{(2)}, \ldots, h_{(K)}]^\top \quad (3.23)$$
where $h_{(t)} \in \mathbb{R}^l$ is the hidden state of the encoder at time $t$, and $l$ is the dimension of the hidden state. In addition, an output dense layer $f_{out}$ can be learned to represent the mapping from $h_{(t)}$ to $\mathbb{R}^m$, resulting in
$$\bar{Y}_E = [f_{out}(h_{(1)}), f_{out}(h_{(2)}), \ldots, f_{out}(h_{(K)})]^\top. \quad (3.24)$$
Finally, $\bar{Y}_E$ is denormalized using Eq. 3.18, resulting in $\tilde{Y}_E$. Since the process of obtaining $\tilde{Y}_E$ does not involve the decoder part for prediction, $\tilde{Y}_E$ is referred to as the reconstruction of $Y_E$. Similarly, if the normalization step and the encoder are applied to $X_D$, we have
$$H_D = [h_{(K+1)}, h_{(K+2)}, \ldots, h_{(K+T)}]^\top. \quad (3.25)$$
To predict the future sequence $Y_D$, the decoder LSTM first generates predictions of the hidden states $\hat{h}_{(t)}$ as follows:
$$\hat{h}_{(t)} = f_D(h_{(t-1)}, y_{(t-1)}), \quad \text{for } t = K+1 \quad (3.26)$$
$$\hat{h}_{(t)} = f_D(\hat{h}_{(t-1)}, \hat{y}_{(t-1)}), \quad \text{for } t = K+2, \ldots, K+T \quad (3.27)$$
For the first time step in the decoder, the hidden state and cell state passed from the encoder and the vector $y_{(K)}$ are used. For subsequent time steps, the model generates predictions of hidden states using the previously predicted values as input. Subsequently, the dense layer $f_{out}$, identical to that used in the encoder, is applied to the predicted hidden states, resulting in:
$$\hat{H}_D = [\hat{h}_{(K+1)}, \hat{h}_{(K+2)}, \ldots, \hat{h}_{(K+T)}]^\top \quad (3.28)$$
$$\bar{Y}_D = [f_{out}(\hat{h}_{(K+1)}), f_{out}(\hat{h}_{(K+2)}), \ldots, f_{out}(\hat{h}_{(K+T)})]^\top \quad (3.29)$$
Finally, the sequence $\bar{Y}_D$ is denormalized using Eq. 3.18 to obtain the final predicted results:
$$\hat{Y}_D = [\hat{y}_{(K+1)}, \hat{y}_{(K+2)}, \ldots, \hat{y}_{(K+T)}]^\top. \quad (3.30)$$
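A minimal PyTorch sketch of this forward pass is given below, assuming (batch, time, variables) tensors and omitting the RevIN normalization and denormalization steps for brevity; the class and argument names are illustrative. The key design point is the single dense layer $f_{out}$ shared by the encoder (reconstruction) path and the decoder (prediction) path.

```python
import torch
import torch.nn as nn

class LSTMEncoderDecoder(nn.Module):
    """Encoder-decoder with a shared output layer f_out (Section 3.3.1)."""
    def __init__(self, m, l):
        super().__init__()
        self.encoder = nn.LSTM(m, l, batch_first=True)  # f_E
        self.decoder = nn.LSTM(m, l, batch_first=True)  # f_D
        self.f_out = nn.Linear(l, m)                    # shared dense layer

    def forward(self, x_enc, y_last, T):
        # Encode the (RevIN-normalized) window X_E into hidden states H_E
        H_E, (h, c) = self.encoder(x_enc)
        Y_E_bar = self.f_out(H_E)                       # reconstruction path, Eq. (3.24)
        # Decode: the first step uses the last observation y_(K) (Eq. 3.26);
        # later steps feed back the model's own predictions (Eq. 3.27)
        y_prev, H_D_hat, Y_D_bar = y_last, [], []
        for _ in range(T):
            h_hat, (h, c) = self.decoder(y_prev, (h, c))
            y_prev = self.f_out(h_hat)                  # Eq. (3.29)
            H_D_hat.append(h_hat)
            Y_D_bar.append(y_prev)
        return Y_E_bar, torch.cat(H_D_hat, dim=1), torch.cat(Y_D_bar, dim=1)
```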
3.3.2 Regularizing Hidden States of LSTM Encoder-Decoder
The LSTM encoder-decoder structure facilitates predictions within the hidden states by leveraging the dynamics learned by the LSTM decoder. Moreover, as the time series data from industrial
plants are typically high-dimensional and correlated, when the dimension of the hidden states l is
less than the number of measurements m, these hidden states can encapsulate a low-dimensional
latent space, efficiently representing the original data’s dynamics. As a result, when performing
fault detection, the prediction error of the latent states can be used instead of monitoring the prediction results for all variables. However, since the encoder and decoder are two different LSTMs
during training, without regularization, the hidden space mapped by the encoder can be very
different from the hidden space in which the predictions are made. Consequently, a deviation between $h_{(t)}$ and $\hat{h}_{(t)}$ may occur, rendering the prediction errors of the hidden states unsuitable for monitoring purposes. To achieve a consistent latent space representation across both
the encoder and decoder and to facilitate monitoring within the low-dimensional latent space,
regularization terms are introduced to the training loss function. This approach ensures that the
predicted hidden states remain close to the encoded hidden states. For sequences of length $K + T$, the loss function is formulated as follows:
$$\mathrm{Loss} = \frac{1}{T} \|Y_D - \hat{Y}_D\|_F^2 + \frac{1}{K} \|Y_E - \tilde{Y}_E\|_F^2 + \lambda\, \frac{1}{T} \|H_D - \hat{H}_D\|_F^2 \quad (3.31)$$
The first two terms in the loss function correspond to the mean squared errors of the samples $y$ from the decoder and encoder, respectively. Because the identical dense layer $f_{out}$ is employed within both the encoder and decoder, training with these terms allows the hidden states to encapsulate the major dynamic features of the original data, while the dense layer learns a static transformation from the hidden states to the outputs $y$. The third term enforces the dynamics learned by the decoder to stay close to the hidden states learned by the encoder, and $\lambda$ controls the amount of penalty applied to this term. Combining these three terms allows the decoder to learn the predictable latent dynamics in the hidden space of the encoder and the dense layer to learn the mapping from the hidden space to the original output space. Consequently, the prediction discrepancy within the hidden states becomes a viable metric for monitoring purposes.
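Assuming tensors shaped as in the sketch above, the three-term objective can be written in a few lines; note that F.mse_loss averages over all elements, so the weighting matches Eq. 3.31 only up to constant scaling.

```python
import torch.nn.functional as F

def regularized_loss(Y_D, Y_D_hat, Y_E, Y_E_tilde, H_D, H_D_hat, lam=1.0):
    pred = F.mse_loss(Y_D_hat, Y_D)   # decoder prediction error on Y_D
    rec = F.mse_loss(Y_E_tilde, Y_E)  # encoder reconstruction error on Y_E
    reg = F.mse_loss(H_D_hat, H_D)    # latent consistency: keep H_D_hat near H_D
    return pred + rec + lam * reg     # Eq. (3.31)
```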
3.3.3 Fault Detection with Regularized LSTM Encoder-Decoder
In the previously published RNN-based fault detection methods, prediction errors of the RNN
models are used to indicate changes in data distribution caused by anomalies. However, normal
setpoint changes and feedback controller adjustments in response to disturbances can also alter
data distribution. As a result, methods from previous studies fail to differentiate between normal
adjustments and faults, potentially triggering false alarms. To address this issue, two indices are
proposed for monitoring prediction errors of hidden states and reconstruction errors of measured
variables using a regularized LSTM encoder-decoder model. These two indices reflect dynamic
and static variations within the data, which can be used to distinguish faults from other normal
adjustments. Figure 3.2 shows the general framework of the fault detection method. In this figure,
the decoder generates predicted hidden states $\hat{H}_D$ by processing previous measurements through the encoder and decoder. Encoded hidden states $H_D$ are obtained by applying the encoder to $X_D$. The hidden state prediction errors are then utilized to monitor dynamic variations. For monitoring static variations, reconstruction errors are calculated by applying the output layer and denormalization to $H_D$ to obtain $\tilde{Y}_D$, which is then compared to the actual observation $Y_D$.
The following subsections detail the indices for monitoring dynamic and static variations.
Figure 3.2: Proposed fault detection method using regularized LSTM encoder-decoder
3.3.3.1 Monitoring Dynamic Variations
Monitoring dynamic variations relies on the prediction errors of hidden states from an LSTM encoder-decoder model trained on fault-free data. The prediction error for the hidden states at time $t$ is defined as
$$d_{(t)} = h_{(t)} - \hat{h}_{(t)} \quad (3.32)$$
For $N$ fault-free samples, the following is defined:
$$H = [h_{(1)}, h_{(2)}, \ldots, h_{(N)}]^\top \quad (3.33)$$
$$\hat{H} = [\hat{h}_{(1)}, \hat{h}_{(2)}, \ldots, \hat{h}_{(N)}]^\top \quad (3.34)$$
$$D = [d_{(1)}, d_{(2)}, \ldots, d_{(N)}]^\top = H - \hat{H} \quad (3.35)$$
with $H \in \mathbb{R}^{N \times l}$, $\hat{H} \in \mathbb{R}^{N \times l}$, and $D \in \mathbb{R}^{N \times l}$.
Due to reversible instance normalization, input sequences are normalized before being fed into the LSTM encoder-decoder model. Consequently, the prediction performance remains unaffected by data distribution changes due to static variations. Moreover, the regularization applied during training ensures that the predicted hidden states $\hat{H}$ closely align with the encoder-mapped hidden states $H$. Therefore, when applying the trained model to new data, significant hidden state prediction errors $D$ arise solely from abnormal dynamics differing from the normal data used to train the model. As a result, $D$ can be used to monitor abnormal dynamics that deviate from normal dynamical variations within the data.

Monitoring of $D$ is performed using principal component analysis. After applying PCA to $D$, the prediction error matrix of the hidden states can be decomposed as,
$$D = T_d P_d^\top + \tilde{T}_d \tilde{P}_d^\top \quad (3.36)$$
where $T_d$ and $P_d$ correspond to the principal component subspace (PCS) that contains the major variations, and $\tilde{T}_d$ and $\tilde{P}_d$ correspond to the residual subspace (RS). Then the indices $T_d^2$, $Q_d$, and $\varphi_d$ and their corresponding control limits can be calculated using Equations 3.3, 3.6, 3.7, 3.8, and 3.10.
3.3.3.2 Monitoring Static Variations

Since monitoring dynamic variations is performed using prediction errors of the hidden states, it is not sensitive to the data distribution changes and static variations of the original measurements. Static shifts in variables are not reflected in the hidden states' prediction errors due to the use of reversible instance normalization (RevIN). In addition, since the predicted values of the original variables $\hat{y}_{(t)}$ are produced by a linear mapping from the predicted hidden states of the decoder $\hat{h}_{(t)}$ followed by the denormalization step, the original variables' prediction errors $\hat{y}_{(t)} - y_{(t)}$ can be affected by large prediction errors of the hidden states due to abnormal dynamics. As a result, prediction errors of the original variables $\hat{y}_{(t)} - y_{(t)}$ from the decoder are unsuitable for monitoring static variations. In contrast, the reconstruction of the original variables $\tilde{y}_{(t)}$ does not involve prediction and thus remains unaffected by the data's dynamic variations. In addition, as the encoded hidden states capture common static variations among variables, any deviation from this common trend is reflected in the reconstructed value $\tilde{y}_{(t)}$ once the hidden states are mapped onto $\mathbb{R}^m$. As a result, to monitor changes that break the static relationship between the measured variables $y_{(t)}$, the error between the measurements $y_{(t)}$ and their reconstruction $\tilde{y}_{(t)}$ is employed, defined as
$$s_{(t)} = y_{(t)} - \tilde{y}_{(t)} \quad (3.37)$$
For all $N$ normal samples, the static error matrix is expressed as
$$S = [s_{(1)}, s_{(2)}, \ldots, s_{(N)}]^\top. \quad (3.38)$$
The same PCA monitoring technique is applied to $S$ to monitor the static error measure. With $l_s$ principal components, we have
$$S = T_s P_s^\top + \tilde{T}_s \tilde{P}_s^\top \quad (3.39)$$
and the indices $T_s^2$, $Q_s$, and $\varphi_s$ and their corresponding control limits can be calculated using Equations 3.3, 3.6, 3.7, 3.8, and 3.10.
The proposed fault detection framework is summarized in Algorithm 1. The monitoring indices for dynamic and static variations are designed to reflect operating conditions and distinguish faults from normal adjustments. For example, in industrial processes with feedback controllers, disturbances may temporarily disrupt the normal dynamic and static relationships among variables, which return to normal if the controller compensates successfully. However, disturbances that controllers cannot reject impact both the dynamic and static relationships and should be identified as faults.
Algorithm 1 Proposed Fault Detection Framework
Offline training:
1. Train a regularized LSTM encoder-decoder model using fault-free data.
2. Build PCA models on the hidden state prediction errors $D$ and the measurement reconstruction errors $S$.
3. Calculate the control limits for the fault detection indices $\varphi_d$, $\varphi_s$.
Online monitoring:
for $t = K + 1, \ldots$ do
1. Calculate $\hat{h}_{(t)}$, $h_{(t)}$, and $\tilde{y}_{(t)}$
2. Apply the PCA models to $d_{(t)}$ and $s_{(t)}$
3. Calculate the fault detection indices $\varphi_d$, $\varphi_s$
4. If $\varphi_d$ exceeds its control limit, indicate abnormal dynamic variations
5. If $\varphi_s$ exceeds its control limit, indicate abnormal static variations
end for
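One online iteration of Algorithm 1 might be implemented as in the hypothetical helper below, which reuses the indices function from the PCA sketch in Section 3.2.1; model_d and model_s denote PCA monitors fit offline on $D$ and $S$.

```python
def monitor_step(model_d, model_s, h_t, h_hat_t, y_t, y_tilde_t):
    """One online iteration of Algorithm 1 (illustrative helper)."""
    d_t = h_t - h_hat_t                  # hidden-state prediction error, Eq. (3.32)
    s_t = y_t - y_tilde_t                # measurement reconstruction error, Eq. (3.37)
    _, _, phi_d = indices(model_d, d_t)  # combined index on dynamic errors
    _, _, phi_s = indices(model_s, s_t)  # combined index on static errors
    return {"dynamic_alarm": phi_d > model_d["limits"]["phi"],
            "static_alarm": phi_s > model_s["limits"]["phi"]}
```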
3.4 Case Studies
To demonstrate the effectiveness of the proposed fault detection method, it is applied to three
different datasets: a synthetic dataset generated from a state-space model, showcasing its ability
to detect faults affecting both dynamic and static relationships; a simulated dataset of the TEP
benchmark, demonstrating its capability to distinguish faults from normal adjustments; and a
real-world dataset collected from a geothermal power plant, highlighting its effectiveness with
real data.
3.4.1 Case Study 1: Synthetic Data
In this synthetic experiment, multivariate time series data are generated from a linear state space
model, and faults that affect the dynamic and static variations are created by changing the elements in the state transition and measurement matrices. Fault-free measurements are generated
from the following linear state-space model:
$$x_{(t+1)} = A x_{(t)} + B u_{(t)}, \qquad y_{(t)} = C x_{(t)} + w_{(t)} \quad (3.40)$$
where $w_{(t)}$ is i.i.d. $\mathcal{N}(0,\, 0.02^2 I)$. The matrices representing the state transition, control, and measurements are defined as follows:
$$A = \begin{pmatrix} -0.8 & 0 & 0 & 0 \\ 0.55 & -0.8 & -0.3 & 0 \\ 0 & 0.54 & -0.4 & 0 \\ 0.73 & 0 & 0 & -0.82 \end{pmatrix}, \quad B = \begin{pmatrix} 0 & -0.1 \\ 0.58 & 0 \\ 0.47 & 0 \\ 0 & 0.59 \end{pmatrix},$$
$$C = \begin{pmatrix} -3.33 & -2.64 & -1.54 & -1.16 & 4.91 & 2.96 & 0 & 0 & 0 & 0 \\ -3.35 & 3.99 & 3.89 & 4.69 & -2.84 & 3.69 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 & -2.18 & -3.25 & 4.38 & -4.46 \\ 0 & 0 & 0 & 0 & 0 & 0 & -4.92 & 1.09 & 3.18 & -0.78 \end{pmatrix}^\top.$$
The notations $y_{(t)}$ and $u_{(t)}$ represent the vectors of measured and manipulated variables, respectively. $u_{(t)}$ is a vector of two manipulated variables, where one displays step-change patterns and the other exhibits sinusoidal oscillations, as in the simulation sketch below.
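For reference, one possible simulation of Eq. 3.40 is sketched below; the exact input waveforms (a square wave and a sinusoid, with periods chosen purely for illustration) and the random seed are assumptions.

```python
import numpy as np

A = np.array([[-0.80,  0.00,  0.00,  0.00],
              [ 0.55, -0.80, -0.30,  0.00],
              [ 0.00,  0.54, -0.40,  0.00],
              [ 0.73,  0.00,  0.00, -0.82]])
B = np.array([[0.00, -0.10],
              [0.58,  0.00],
              [0.47,  0.00],
              [0.00,  0.59]])
C = np.array([[-3.33, -2.64, -1.54, -1.16,  4.91,  2.96, 0, 0, 0, 0],
              [-3.35,  3.99,  3.89,  4.69, -2.84,  3.69, 0, 0, 0, 0],
              [0, 0, 0, 0, 0, 0, -2.18, -3.25,  4.38, -4.46],
              [0, 0, 0, 0, 0, 0, -4.92,  1.09,  3.18, -0.78]]).T  # 10 x 4

def simulate(N, noise_std=0.02, seed=0):
    """Simulate Eq. (3.40): one square-wave and one sinusoidal input
    drive the states; measurements receive i.i.d. Gaussian noise."""
    rng = np.random.default_rng(seed)
    x = np.zeros(A.shape[0])
    Y = np.zeros((N, C.shape[0]))
    for t in range(N):
        u = np.array([np.sign(np.sin(2 * np.pi * t / 400)),  # step-change pattern (assumed period)
                      np.sin(2 * np.pi * t / 50)])           # oscillation (assumed period)
        x = A @ x + B @ u
        Y[t] = C @ x + noise_std * rng.standard_normal(C.shape[0])
    return Y
```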
A regularized LSTM encoder-decoder model is trained using 8000 samples simulated under
normal conditions. Grid searches are conducted to identify the model parameters that yield the
best prediction results. The range of the input window size is selected from $\{2, 4, \ldots, 24\}$, and the latent space dimension from $\{1, 2, \ldots, 10\}$. The grid search results show that the best prediction
performance is achieved by setting the latent space dimension of the LSTM encoder-decoder
model to be l = 4 and the window size for the encoder to be K = 12. Samples are scaled to
a range between 0 and 1 using min-max normalization. After performing grid searches for the
learning rate and λ in the objective function, the model is trained using the Adam optimizer in
PyTorch, with a learning rate of 0.01 for 500 epochs. The penalty for the regularizing term in the
objective function is set to 1. It is found that predicting a longer sequence of future data enhances
the model’s ability to generalize and learn dynamic propagation, leading to the selection of a
prediction horizon T of 6 during training, despite the need for only one-step-ahead predictions
in the monitoring stage. With the trained neural network model, two PCA models for D and
S are constructed, using Equation 3.35 and Equation 3.38. For both PCA models, the number
of principal components is selected to capture 95% of the variance. The confidence level for
calculating the control limits is chosen to be 99%.
For comparison, fault detection methods based on dynamic-inner canonical correlation analysis (DiCCA) [17] and slow feature analysis (SFA) [76] are applied to the synthetic dataset. Unlike the commonly used PCA model, DiCCA and SFA are latent variable models capable of modeling dynamics in process data. Furthermore, prior research has introduced specific monitoring indices for each model to monitor dynamic and static variations in the data [21, 72, 71]. As a result, these two models are selected for comparison. Using the
same 8000 fault-free samples, models based on DiCCA and SFA are constructed. According to the
criteria stated in [21], the order of the AR model in the DiCCA model is selected to be 6, and the
number of dynamic latent variables used to represent dynamics is chosen to be 4. In addition, the
number of slow features for the SFA-based monitoring method is selected to be 6. The DiCCA,
SFA, and regularized LSTM encoder-decoder model are applied to two test datasets containing
two fault types to evaluate the performance of the fault detection methods.
3.4.1.1 Fault 1: Change in Static Relationship
One entry of the C matrix is changed after 300 normal samples are generated to simulate a fault
that only affects the static relationships between the measured variables. This modification leads
to shifts in one measured variable while the dynamic variations remain unaffected. The DiCCA,
SFA, and regularized LSTM encoder-decoder models built using fault-free data are applied to this
simulated data. The detection results are shown in Figure 3.3. For each method, indices related to
dynamic variations are shown in the first two rows, and the indices related to static variations are
plotted in the bottom two rows. For monitoring the static variations, all three methods can detect
the occurrence of the static fault, as their indices consistently rise above the control limits. However, when monitoring dynamic variations, the indices from DiCCA and SFA incorrectly signal
a fault after the static fault occurs, despite no changes to the dynamics within the data. In contrast, the indices for monitoring dynamic variations from the regularized LSTM encoder-decoder
model do not exhibit repeated abnormalities. This indicates that the dynamics of the new testing
data remain consistent with the normal data, which is consistent with the design of this simulation case.

Figure 3.3: Monitoring results for the synthetic case study with changes in static variation

Figures 3.4 and 3.5 illustrate how the proposed method distinguishes abnormal static variations from dynamic variations. In the design of this experiment, only one measurement is
influenced by introducing a static fault. The left plot in Figure 3.4 compares the distribution of
this variable before and after the fault occurs. This reveals a shift in the distribution due to the
fault. As a result, feeding this fault-affected measurement directly into the LSTM encoder-decoder
model without applying instance normalization can distort the latent states’ predictions, leading
to false alarms about dynamic variations due to abnormal prediction errors. However, as shown
in the right plot in Figure 3.4, instance normalization makes the distribution of this fault-affected
variable more consistent before and after introducing the static fault. Therefore, the latent states’
predictions remain accurate if the dynamic relationship remains unchanged before and after introducing the static fault. Consequently, the indices for monitoring dynamic variations remain
unaffected, allowing the proposed method to identify faults that impact only static variations. Figure 3.5 demonstrates the proposed method’s effectiveness in detecting the static fault by showing
distributions of reconstruction errors for both unaffected and affected variables. The left side of
the figure shows the distribution of reconstruction errors of the unaffected variable before and
after the fault occurs. Since this variable is unaffected by the static fault, its reconstruction errors
remain similar to the normal condition after the fault occurred. Conversely, the right subplot
shows that the distribution of reconstruction errors for the fault-affected variable deviates from
the normal condition. As a result, the indices derived from the PCA model, which monitors static
variation, will differ from those in the fault-free scenario, allowing the static fault to be detected.
Figure 3.4: Comparison of the distribution of the fault-affected variable: before and after instance
normalization
3.4.1.2 Fault 2: Change in Dynamic Relationship
To simulate a fault that changes the dynamic variations, the last entry of the $A$ matrix is altered
from -0.82 to -0.2 at t = 300. This change in the state transition matrix A shifts the poles of
the system, thus changing the dynamic variations of the measured variables. Additionally, the
static relationships among certain variables will differ from the training data due to changes in
dynamic relationships.
Figure 3.5: Comparison of reconstruction error distributions before and after fault occurrence for
unaffected and fault-affected variables
The DiCCA, SFA, and regularized LSTM encoder-decoder models are applied to this simulated
case, with detection results depicted in Figure 3.6. It can be seen that the DiCCA model can detect
changes in both dynamic and static data relationships. In the SFA model, indices monitoring
the static relationship exceed the control limit. However, indices for dynamic variations reveal
only a brief anomaly after the fault introduction at the 300th sample, despite the dynamic fault’s
persistence throughout the simulation. Compared to the DiCCA and SFA models, the monitoring
results from the regularized LSTM encoder-decoder model show violations in both the dynamic and static variations, which is consistent with the design of this experiment.
3.4.2 Case Study 2: Tennessee Eastman Process
The Tennessee Eastman Process (TEP) was initially developed as a challenging benchmark for developing and evaluating plant-wide control strategies [22]. Later, based on the process diagram
Figure 3.6: Monitoring results for the synthetic case study with changes in dynamic variation
and control strategies [43, 68], various simulated datasets were created by introducing different disturbances into the process [8, 66] and were used as benchmarks for testing fault detection
methods. Previously, numerous publications reported high fault detection rates for all disturbance
cases, despite many disturbances not being actual faults, as the feedback controllers within the
process are capable of rejecting these disturbances [63, 109, 110]. These normal control adjustments for dealing with disturbances lead to discrepancies between the distribution of new data
and the normal data used to build the models, which causes the previous methods to misidentify
some of the disturbances as faults. To illustrate the effectiveness of our proposed fault detection
method in distinguishing faults from changes in normal operating conditions, three simulation
cases are used in this case study. The first case includes a normal control setpoint change. The
second case involves a disturbance that can be compensated for by the feedback controller. Finally, the third case is a fault that the controllers cannot reject.
The dataset used for the analysis is from [66], which contains simulations of the TE process
under normal conditions, disturbances, and control setpoint changes. The dataset consists of two
sets of variables: 12 manipulated variables (XMV(1-12)) and 41 measured variables (XMEAS(1-41)). Since some XMEAS variables are related to product qualities that are not directly measured within the process, only XMEAS(1-22) are kept for analysis. Regarding the 12 manipulated variables in the simulation, because the compressor recycle valve and stripper steam valve remain
constant, they are excluded from the analysis. As a result, 31 variables collected under normal
conditions are used to train the LSTM encoder-decoder model.
To train a regularized LSTM encoder-decoder model, 2,000 simulated fault-free samples are
used, with 1400 samples used for training and 600 for validation. Before training, the variables are
scaled to a range between 0 and 1. Additionally, grid searches are performed to tune the model
parameters for optimal prediction performance. The range of input window size is chosen from
$\{10, 20, \ldots, 100\}$, and the latent space dimension from $\{5, 10, \ldots, 30\}$. Sensitivity analysis shows that the optimal result is achieved by setting the latent space dimension of the LSTM encoder-decoder model to $l = 10$ and the window size for the encoder to $K = 60$. In addition,
λ in the objective function for regularizing the hidden states is chosen to be 0.1. Finally, the
prediction horizon T used during training is determined to be 10. The model is trained using
the Adam optimizer in PyTorch with a learning rate of 0.001 for 500 epochs. It takes around 6
minutes to train the model on an NVIDIA Tesla P100 GPU.
Utilizing the identical set of hyperparameters, another LSTM encoder-decoder model is trained
by excluding the third regularization term specified in Eq. 3.31. Figure 3.7 illustrates the significance of hidden state regularization by comparing the distribution of prediction errors for hidden
states derived from regularized and unregularized models across two fault-free datasets. Notably,
without regularization, the distribution of prediction errors for fault-free testing data diverges
from that of the fault-free training data despite both datasets being collected under normal operating conditions. Consequently, under these circumstances, hidden states’ prediction errors
prove unsuitable for fault detection. Applying statistics derived from fault-free training data to
fault-free testing data may result in false alarms owing to distribution discrepancies. Conversely,
the distributions of prediction errors from the regularized model for both datasets are more consistent. Hence, through regularization, it is feasible to employ hidden states’ prediction errors for
identifying abnormal dynamic variations.
Figure 3.7: Comparison of the distribution of hidden states prediction errors from the regularized
and unregularized models on two fault-free datasets
To compare the performance of the proposed method, in addition to the DiCCA and SFA models referenced in Section 3.4.1, four other popular anomaly detection methods were implemented: Deep
One-Class Classification (DeepSVDD) [69], Variational Autoencoder (VAE) [4], LSTM Autoencoder (LSTM-AE) [47], and Multivariate Time-Series Anomaly Detection with Graph Attention
Network (MTAD-GAT) [107].
(a) Setpoint change (b) Disturbance case (IDV2) (c) Disturbance case (IDV8)
Figure 3.8: Visualization of disturbance cases with PCA model
3.4.2.1 Monitoring Result of Setpoint Change
In this case, the simulation incorporates a normal control setpoint change related to production
adjustments. The setpoint change, represented by a ramp signal, is introduced at the 600th sample
and persists for 400 samples, indicating a transition to a new normal operating condition starting
from the 1000th sample. This adjustment in the setpoint leads to a shift in data distribution, which
is visualized in Figure 3.8a by plotting the first three principal components of a PCA model built
from all measured variables.
Figure 3.9: Setpoint change: one-step-ahead prediction results for select variables obtained from
the LSTM encoder-decoder models without and with the application of RevIN
The DeepSVDD, VAE, LSTM-AE, MTAD-GAT, DiCCA, SFA, and the proposed regularized
LSTM encoder-decoder model are applied to this simulation to assess their ability to accurately
reflect this change. Figure 3.9 compares the prediction results of the LSTM encoder-decoder
model with and without the application of RevIN, showcasing the effectiveness of the proposed
method in adapting to changes in data distribution. Figure 3.10 illustrates the detection results
for all methods. Moreover, given that DiCCA, SFA, and the proposed method are equipped with
mechanisms for monitoring dynamic and static variations, their monitoring indices are depicted
in Figure 3.11 for better visualization and comparison.
The results show that the indices for DeepSVDD, VAE, LSTM-AE, MTAD-GAT, and the dynamic and static indices from DiCCA trigger alarms once the setpoint starts to change, and all the
indices stay above the control limit even after the process settles at a new steady state. As a result,
these methods cannot deal with the changes in data distribution caused by the setpoint change.
As for the SFA model, its index for dynamic variation indicates abnormality after the setpoint is
changed at the 600th sample and drops below the control limit once the setpoint becomes constant. Therefore, the monitoring of dynamic variations from the SFA model can accurately reflect
the transient period during the setpoint change. However, its indices used for static variations
do not return to normal after the process is moved to a new steady state. On the contrary, the
proposed method accurately represents the true operating condition. The index $\varphi_d$ used for monitoring dynamic variations rises above the control limit, indicating changes in dynamics within
the data during the transient period. Once the setpoint change ends and the process reaches
a new steady state, the indices used for dynamic variations return to normal. In addition, the
indices used for monitoring the static variations show abnormalities during the transition and
become normal once the process is at a new steady state. As a result, the proposed method can
Figure 3.10: Setpoint change: comparison of monitoring results obtained from DeepSVDD, VAE,
LSTM-AE, MTAD-GAT, DiCCA, SFA, and the proposed regularized LSTM encoder-decoder
Figure 3.11: Setpoint change: comparison of monitoring results between DiCCA, SFA, and the
proposed regularized LSTM encoder-decoder
deal with the changes in data distribution caused by normal setpoint changes, and the proposed
indices can better reflect both the dynamic and static variations.
3.4.2.2 Monitoring Result of Disturbance Case 2 (IDV2)
This disturbance is a step change in the composition of component B that occurs at the stripper
inlet stream. According to the feedback control strategies [68] on which the simulation is based,
the controller design is capable of compensating for this disturbance. After the composition
of B changes, the percentage of B in the purge and the feed to the reactor will increase first.
Subsequently, an imbalance between the reactant feed and liquid production rates causes the
reactor pressure to increase. Finally, the purge rate will escalate to mitigate the disturbance due
to the cascade control design. As a result, there will be temporary impacts on the stripper after the
disturbance occurs until the purge adjusts. Eventually, the controls will correct the disturbance.
Fault detection methods based on DeepSVDD, VAE, LSTM-AE, MTAD-GAT, DiCCA, SFA,
and the proposed regularized LSTM encoder-decoder model are applied to the simulation data
for IDV2, where the disturbance is introduced at the 600th sample. The detection outcomes for
these methods are depicted in Figure 3.12 and Figure 3.13. It can be observed that the indices from
DeepSVDD, VAE, LSTM-AE, and MTAD-GAT, and both the dynamic and static indices of DiCCA, remain above the control limits after the disturbance is introduced, even though the controller can
settle the disturbance. This is due to the variable "purge rate" being moved by the controller
to compensate for the disturbance, leading to a new data distribution that is different from the
fault-free data. The shift in data distribution is visualized in Figure 3.8b by plotting the first three
principal components of a PCA model built from all measured variables before and after the control adjustment. Consequently, these models generate false alarms after the disturbance has been
settled and fail to reflect the true operating conditions. As for the monitoring results from the
SFA model and the regularized LSTM encoder-decoder model, their indices used for monitoring
dynamic variations show minor abnormalities at the beginning of the introduction of the disturbance and the end of the controller compensation, indicating that the dynamic variations are not
violated during the control adjustment. However, the indices used for static variations of the SFA
model stay above the control limits. On the contrary, the indices for monitoring static variations
from the LSTM encoder-decoder model first rise above the control limit after introducing the
disturbance and then drop below the limit, indicating that the static variations return to normal
after the controller settles the disturbance. As a result, for the disturbance that the controllers
are designed to compensate for, the monitoring results from the proposed method can better represent the process’s operating conditions and reflect the effect of controller adjustments when
compensating for disturbances.
Figure 3.12: Disturbance case 2 (IDV2): comparison of monitoring results obtained from
DeepSVDD, VAE, LSTM-AE, MTAD-GAT, DiCCA, SFA, and the proposed regularized LSTM
encoder-decoder
Figure 3.13: Disturbance case 2 (IDV2): comparison of monitoring results between DiCCA, SFA,
and the proposed regularized LSTM encoder-decoder
3.4.2.3 Monitoring Result of Disturbance Case 8 (IDV8)
This disturbance comprises random fluctuations in the feed composition of components A, B, and
C. This random variation causes oscillations throughout the entire process, which are difficult to
compensate for by the feedback control systems. In addition, this disturbance will affect the
product qualities of the process. As a result, it should be identified as a fault. The changes caused
by this fault are visualized in Figure 3.8c.
Similar to the previous cases, DeepSVDD, VAE, LSTM-AE, MTAD-GAT, DiCCA, SFA, and the
proposed regularized LSTM encoder-decoder model are applied to the simulation data for IDV8.
Figures 3.14 and 3.15 show the monitoring results. Since the data distribution after the fault occurs
is different from the normal data used in training, DeepSVDD, VAE, LSTM-AE, and MTAD-GAT
models can detect the anomaly caused by this disturbance that the controller cannot reject. In
contrast to the IDV2 case, the index for dynamic variations of the regularized LSTM encoder-decoder model does not return to normal and oscillates around the control limit, indicating that
the controllers cannot reject the disturbance, which leads to persistent abnormal dynamics. In
addition, the indices used for static variation also show that abnormal static variation persists
after introducing the disturbance. The results from the DiCCA and SFA models also show similar
results where the disturbance constantly affects both the dynamic and static variations, which
indicates a fault.
3.4.3 Case Study 3: Geothermal Power Plant
The field data are collected from a binary cycle geothermal power plant. The dataset contains
five years of hourly measurements in the form of time-series data. After removing the missing
data points and data collected during shutdown periods, around 20500 data points are used for
training and validation, and the remaining 4000 data points are used for testing. Nineteen measurements are collected from the primary cycle, secondary cycle, and the turbine of the power
generation unit. Among the 19 variables, brine outlet flow, the turbine inlet guide vane (IGV) setpoint, and the R134a pump speed are selected as part of the exogenous variables for incorporating
the changes in the operational settings. In addition, ambient temperature is selected as the fourth
exogenous variable since the efficiency of the plant is highly related to the ambient temperature
because of the air-cooling system. Figure 3.16 shows a segment of the data. It can be seen that the
variables are correlated and dynamic, resulting from the control adjustment changes and ambient
temperature variations highlighted with a blue background.
A slight modification is made to the regularized LSTM encoder-decoder model to incorporate
exogenous variables during prediction. The input to the decoder model is changed to the four
exogenous variables instead of the prediction from the previous time step. The latent space dimension of the LSTM encoder-decoder model is set to be l = 10, and the window size for the
encoder is $K = 12$.

Figure 3.14: Disturbance case 8 (IDV8): comparison of monitoring results obtained from DeepSVDD, VAE, LSTM-AE, MTAD-GAT, DiCCA, SFA, and the proposed regularized LSTM encoder-decoder

Figure 3.15: Disturbance case 8 (IDV8): comparison of monitoring results between DiCCA, SFA, and the proposed regularized LSTM encoder-decoder

Figure 3.16: Visualization of measurements collected from a geothermal power plant

Additionally, $\lambda$ in the objective function for regularizing the hidden states is
chosen to be 10. Finally, the prediction horizon T used during training is determined to be 12.
Figures 3.17 and 3.18 show the prediction results from the trained model. It can be observed
that both the one-step-ahead predictions and the 12-step-ahead predictions closely follow the
general trend in the data and show consistent responses to changes in operational settings and
ambient conditions.
Figure 3.17: One-step ahead prediction results of data collected from a geothermal power plant
The fault detection method based on the regularized LSTM encoder-decoder model is applied
to a data segment that includes various anomalies documented in the operation logs. According
to the operation logs, multiple events occurred during this period:
• 12-15: A production well went offline.
• 12-21, 12-25, 12-29: Injection pump failures.
• 1-6: High expander seal oil flow.
• 1-7: High expander inlet pressure.
• 1-8: Low seal face differential pressure.
• 1-9: High seal oil supply flow.

Figure 3.18: 12-step ahead prediction results of data collected from a geothermal power plant
These events can be categorized into three types: external disturbances, external faults, and internal faults. The offline status of a production well is considered an external disturbance. This
is because when a production well goes offline, the quantity of brine produced and transported
to the power plant changes, impacting operational conditions without indicating a malfunction
within the plant itself. Injection pump failures are classified as external faults, given that these
malfunctions occur outside the power generation unit. Conversely, the events from January 6
to January 9 represent internal faults within the power plant, as they directly affect its internal
operational processes. Figure 3.19 demonstrates the detection results for this data segment, emphasizing the indices used to monitor dynamic and static variations in measurements, along with
their corresponding alarms.

Figure 3.19: Monitoring results for data collected from a geothermal power plant

An alarm is triggered if the index surpasses the control limit for two
consecutive time steps. It can be observed that the proposed method can detect all abnormal
events. These events appear in both the dynamic and static indices. The sudden change in dynamics caused by an anomaly first appears in the index for dynamic variation. Then, because the correlation structure between measurements changes following an anomaly and takes longer to settle, the alarms for static variation are triggered and remain active for multiple time steps, especially for the internal and external faults.
3.5 Summary
This study introduces a fault detection and condition monitoring approach using an LSTM encoder-decoder model with regularized latent dynamics. By applying regularization during model training, the hidden states of the LSTM encoder and decoder represent a low-dimensional latent space
that characterizes the main dynamics of the original high-dimensional data. Additionally, reversible instance normalization is applied to reduce the effect of discrepancies in data distribution on latent space prediction accuracy caused by normal adjustments made by feedback control
systems within the industrial process. Furthermore, two monitoring statistics are proposed using the regularized LSTM encoder-decoder model to detect faults that disrupt normal dynamics
and static relationships. As a result, the proposed fault detection method and monitoring indices
better reflect the true operating conditions of the process and reduce false alarms by distinguishing faults from normal adjustments, such as control setpoint changes and adjustments made by
controllers to compensate for disturbances. The effectiveness of the proposed fault detection measures is demonstrated using a synthetic dataset, three cases from the Tennessee Eastman process
benchmark dataset, and a real-world dataset collected from a geothermal power plant.
Chapter 4
Learning Latent Space Dynamics for Estimating and
Predicting System Conditions with Limited Measurements
4.1 Introduction
Geological CO2 storage is crucial for strategies targeting a net-zero energy system and addressing
climate change challenges. The industrial-scale deployment of geological CO2 storage requires
intensive monitoring for risk assessment and mitigation. In contrast to other industries, such as
manufacturing and energy, monitoring challenges in geological CO2 storage stem from high costs
and a lack of effective tools for measuring subsurface conditions. For instance, 4D seismic surveys,
which capture three-dimensional seismic data at various times over the same area to monitor
changes, are common for detecting CO2 plumes. However, these surveys are typically spaced
years apart, leading to substantial gaps in the data and delays in tracking the dynamic behavior of CO2 plumes, which hinders timely monitoring and understanding of plume evolution. Moreover, converting seismic waves to CO2 saturation during the inversion
process of 4D seismic data introduces significant uncertainty, complicating the interpretation of
subsurface conditions. Alternatively, tools like crosswell seismic and monitoring wells allow for
more frequent data collection. Yet, these measurements cover only a small portion of the storage
site, leaving vast areas unmonitored and undersampled.
The data measured from CO2 storage are high-dimensional, capturing the extensive scale
and intricate three-dimensional characteristics of the sites. As CO2 is injected into subsurface
formations, it expands and migrates, influenced by varying geological conditions. This process
results in complex dynamics that are challenging to track and predict.
This chapter addresses the limitations in understanding CO2 plume dynamics due to limited
measurements and introduces a novel framework designed to extract and represent the spatial
dynamics of CO2 plume migration from limited measurements in a reduced-dimensional latent
space. To accommodate the unique characteristics of the data from CO2 storage sites, a framework is proposed to integrate diverse input sources and varying measurement frequencies to
reconstruct and predict CO2 plume migration. A spatio-temporal neural network model named
CO2-4DNet is designed to effectively capture the spatio-temporal dynamics of CO2 plume migration in a compact latent space, offering a robust solution to improve the accuracy and reliability
of monitoring in geological CO2 storage.
The remainder of this chapter is organized as follows. Section 4.2 introduces details of the CO2-4DNet and the proposed framework for reconstructing and predicting the CO2 plume. Subsequently, Section 4.3 presents case studies using a simulated dataset to demonstrate the effectiveness of the proposed method. Finally, a summary is presented in Section 4.4.
4.2 Extract Latent Space Dynamics from Spatio-temporal Data
4.2.1 Overview
Figure 4.1 illustrates the proposed framework’s structure, which consists of two main parts: reconstruction and prediction of CO2 plume saturation. The CO2-4DNet model initially employs a
combination of global and local data to create a series of 3D reconstructions of the CO2 plume saturation. After this reconstruction phase, a spatio-temporal model uses these reconstructions and
future injection rates as inputs to forecast the trajectory of the CO2 plume. The following section
introduces the CO2-4DNet model and discusses its application in reconstructing and predicting
CO2 plume saturation.
Figure 4.1: Proposed framework: reconstruction and prediction of CO2 plume saturation with
CO2-4DNet
4.2.2 CO2-4DNet Model
The CO2-4DNet model is designed to capture the spatio-temporal dynamics of CO2 plume migration within a compact latent space. As shown on the right side of Figure 4.1, CO2-4DNet includes two main components. The first is a CNN-based spatial encoder and decoder that captures
spatial correlations and integrates various measurement forms into a unified low-dimensional
latent space. The second component is a processing unit designed to address the spatio-temporal
dynamics of the CO2 plume within this latent space, incorporating both spatial and temporal dependencies. The details of these two primary modules are discussed in the subsequent sections.
4.2.2.1 Spatial Encoder and Decoder with Depthwise Separable Convolutional Neural
Networks
Consider a sequence consisting of $T$ time steps of three-dimensional input data, characterized by height $H$, width $W$, depth $D$, and $C$ channels. This sequence is mathematically denoted by the tensor $X \in \mathbb{R}^{T \times C \times H \times W \times D}$. The proposed architecture incorporates a spatial
encoder and a spatial decoder based on Convolutional Neural Networks (CNNs) to capture spatial
correlations while preserving temporal dependencies effectively. Given the intricate 3D structure
of the CO2 plume, the model incorporates 3D convolution operations, which inherently add more
parameters than 2D convolutions, thereby increasing computational demands and the risk of
overfitting. To address this, the model adopts a depthwise separable convolutional approach,
decomposing a traditional 3D convolution into depthwise and pointwise convolutions [98, 32].
The depthwise separable convolution involves conducting a depthwise convolution with separate
filters for each channel initially, followed by using a 1×1 filter to execute a pointwise fusion of
the feature maps produced in the initial step.

Figure 4.2: Illustration of depthwise separable convolution.

The spatial encoder employs depthwise separable 3D convolutions and consists of multiple convolutional layers, mathematically expressed as:
$$Z_{i+1} = \mathrm{SiLU}(\mathrm{GroupNorm}(\mathrm{Conv3d}(Z_i))) \quad (4.1)$$
In this equation, SiLU represents the sigmoid linear unit, functioning as the nonlinear activation function [24], and GroupNorm refers to Group Normalization [94]. The encoder starts with $Z_1$ set to the input data tensor $X$ and generates a latent representation $Z \in \mathbb{R}^{T \times C' \times H' \times W' \times D'}$. Here, $C'$, $H'$, $W'$, and $D'$ denote the dimensions of the latent space. The spatial decoder reverses this mapping, converting $Z \in \mathbb{R}^{T \times C' \times H' \times W' \times D'}$ back into $X \in \mathbb{R}^{T \times C \times H \times W \times D}$. This is achieved by replacing the Conv3d operation in Equation 4.1 with a transposed convolutional operation. Additionally, a residual connection is added to link the spatial encoder's first layer
with the spatial decoder’s final layer. This feature, frequently used in U-Net architectures [5],
helps preserve key spatial features from the input data during the decoding process.
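A minimal PyTorch sketch of these building blocks is given below; the class names, kernel size, and group count are illustrative, and the stride and downsampling details of the actual encoder are omitted.

```python
import torch
import torch.nn as nn

class DepthwiseSeparableConv3d(nn.Module):
    """Depthwise separable 3D convolution: a per-channel (depthwise)
    spatial filter followed by a 1x1x1 pointwise fusion, as in Fig. 4.2."""
    def __init__(self, in_ch, out_ch, kernel_size=3):
        super().__init__()
        self.depthwise = nn.Conv3d(in_ch, in_ch, kernel_size,
                                   padding=kernel_size // 2, groups=in_ch)
        self.pointwise = nn.Conv3d(in_ch, out_ch, kernel_size=1)

    def forward(self, x):
        return self.pointwise(self.depthwise(x))

class EncoderLayer(nn.Module):
    """One spatial-encoder layer of Eq. (4.1): Conv3d -> GroupNorm -> SiLU."""
    def __init__(self, in_ch, out_ch, groups=4):
        super().__init__()
        self.conv = DepthwiseSeparableConv3d(in_ch, out_ch)
        self.norm = nn.GroupNorm(groups, out_ch)
        self.act = nn.SiLU()

    def forward(self, z):
        return self.act(self.norm(self.conv(z)))
```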
4.2.2.2 Processor for Modeling Spatio-temporal Dynamics
A processor analyzes spatial dependencies and temporal variations within the compressed latent
space created by the spatial encoders. In this research, the SimVP model, originally developed
for video prediction tasks [28], is employed for spatio-temporal modeling of the dynamics in
the latent space. SimVP has also been successfully utilized in other spatio-temporal prediction
scenarios, such as traffic flow forecasting and climate prediction [80]. Compared to alternative
spatio-temporal models like PredRNN [87], PhyDNet [31], and CrevNet [101], SimVP has exhibited enhanced performance across various benchmark datasets. Due to its simplicity and effectiveness, SimVP is selected as the backbone model in CO2-4DNet. As the original SimVP model
was intended for handling 2-dimensional spatial data, modifications involving three-dimensional
spatial data are essential for integration into the proposed model. The 2D convolution within
SimVP is modified to 3D convolution using the depthwise separable 3D convolution technique
outlined in Section 4.2.2.1, thus enabling effective processing of 3D spatial datasets. In the processor unit of the proposed framework, multiple spatio-temporal attention units are stacked to
capture the intricate dynamics associated with CO2 plume migration.
A key feature of the SimVP model is using large kernel convolutions to achieve effective
receptive fields [15], followed by spatio-temporal attention for capturing features based on temporal and spatial correlations. Given the latent representations $Z \in \mathbb{R}^{T \times C' \times H' \times W' \times D'}$, to capture the temporal evolution inside the sequential latent states, large kernel convolution is applied to reshaped tensors of shape $(T \times C') \times H' \times W' \times D'$. To enhance computational efficiency and reduce the parameter count, the large kernel convolution is divided into three sequential steps: (1) a
$(2d-1) \times (2d-1) \times (2d-1)$ depthwise convolution for capturing local receptive fields within the $T \times C'$ individual channels, (2) a $\frac{K}{d} \times \frac{K}{d} \times \frac{K}{d}$ depthwise dilated convolution for establishing connections between distant receptive fields, and (3) a $1 \times 1$ convolution for modeling channelwise interaction and temporal dynamics. Here, $d$ represents the dilation factor, and the receptive field size is defined as $K \times K \times K$. As a result, the large kernel convolution can be formalized as:
$$\tilde{Z} = \mathrm{Conv3d}_{1\times 1}\left(\mathrm{Conv3d}_{dw\_d}\left(\mathrm{Conv3d}_{dw}(Z)\right)\right) \tag{4.2}$$

where $\mathrm{Conv3d}_{dw}$ represents depthwise convolution, $\mathrm{Conv3d}_{dw\_d}$ represents depthwise dilated convolution, and $\mathrm{Conv3d}_{1\times 1}$ denotes channelwise $1 \times 1$ convolution. To obtain the spatio-temporal
attention, $\tilde{Z}$ is then divided channelwise into two components. One component is processed through a sigmoid gating function to generate the spatio-temporal attention coefficient, while the other serves as the latent features. These operations can be formulated mathematically as follows:

$$\tilde{Z}_1, \tilde{Z}_2 = \mathrm{split}(\tilde{Z}) \tag{4.3}$$

$$\hat{Z} = \sigma(\tilde{Z}_1) \odot \tilde{Z}_2 \tag{4.4}$$

where $\sigma$ is the sigmoid function and $\odot$ is element-wise multiplication. The output $\hat{Z}$ is a function of both the gated attention $\sigma(\tilde{Z}_1)$ and the latent features $\tilde{Z}_2$. This enables the model to capture spatio-temporal dynamics effectively while disregarding irrelevant features. Following the spatio-temporal attention module, two additional $1 \times 1$ convolutions are performed to further process $\hat{Z}$.
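The following PyTorch sketch illustrates Equations 4.2 through 4.4 together, under assumed values $K = 21$ and $d = 3$; the channel-doubling pointwise convolution is one simple way to realize the channelwise split and is an implementation assumption, not the dissertation's exact configuration.

```python
import torch
import torch.nn as nn

class LargeKernelAttention3d(nn.Module):
    """Decomposed large-kernel convolution (Eq. 4.2) with gated
    spatio-temporal attention (Eqs. 4.3-4.4)."""
    def __init__(self, channels, K=21, d=3):
        super().__init__()
        k_local = 2 * d - 1          # (2d-1)^3 depthwise kernel
        k_dilated = K // d           # (K/d)^3 depthwise dilated kernel
        self.conv_dw = nn.Conv3d(channels, channels, k_local,
                                 padding=k_local // 2, groups=channels)
        self.conv_dw_d = nn.Conv3d(channels, channels, k_dilated, dilation=d,
                                   padding=(k_dilated // 2) * d, groups=channels)
        # 1x1 convolution doubles the channels so the result can be split in two
        self.conv_pw = nn.Conv3d(channels, 2 * channels, kernel_size=1)

    def forward(self, z):
        z_tilde = self.conv_pw(self.conv_dw_d(self.conv_dw(z)))  # Eq. 4.2
        z1, z2 = torch.chunk(z_tilde, 2, dim=1)                  # Eq. 4.3
        return torch.sigmoid(z1) * z2                            # Eq. 4.4

# Latent sequence reshaped to channels-first form with T*C' channels.
z = torch.randn(1, 32, 16, 16, 5)   # batch of 1, T*C' = 32 channels
print(LargeKernelAttention3d(32)(z).shape)  # torch.Size([1, 32, 16, 16, 5])
```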
4.2.3 CO2-4DNet for Reconstruction and Prediction
The proposed CO2-4DNet enables reconstruction and prediction by training models to process
various types of input data. In the reconstruction phase, the system uses multiple spatial encoders
to effectively utilize global and local measurements. These encoders build the latent space and
produce individual representations, which are then combined and fed into the processor unit.
Let $X_g^{t-T+1} \in \mathbb{R}^{1 \times 1 \times H \times W \times D}$ denote a 3D global low-resolution reconstruction of the CO2 plume saturation, obtained from the inversion of 4D seismic data acquired at time $t-T+1$. $X_l^{t-T+1:t} \in \mathbb{R}^{T \times 1 \times H \times W \times D}$ denotes a sequence of local high-resolution but noisy measurements of the CO2 plume from time $t-T+1$ up to $t$. $U^{t-T+1:t} \in \mathbb{R}^{T \times 1 \times H \times W \times D}$ denotes a sequence of 3D tensors, where the values at the injection locations vary according to the injection rates. After extending the first dimension of $X_g^{t-T+1}$ to $T$ by replication, three spatio-temporal encoders take these three input tensors, resulting in three outputs in the latent space, $Z_g^{t-T+1:t}$, $Z_l^{t-T+1:t}$, and $Z_u^{t-T+1:t}$, each in $\mathbb{R}^{T \times C' \times H' \times W' \times D'}$, which can be mathematically formulated as:

$$Z_g^{t-T+1:t} = f_{\mathrm{enc}_g}\left(X_g^{t-T+1}\right)$$
$$Z_l^{t-T+1:t} = f_{\mathrm{enc}_l}\left(X_l^{t-T+1:t}\right)$$
$$Z_u^{t-T+1:t} = f_{\mathrm{enc}_u}\left(U^{t-T+1:t}\right)$$
These three encoded latent variables are subsequently combined through element-wise addition,
integrating spatial information from diverse sources before being sent to the processor unit. A
spatial decoder then processes the processor’s output, producing the final sequence of 3D reconstructions of the CO2 plume saturation $\tilde{X}^{t-T+1:t}$ over the interval from $t-T+1$ to $t$.
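The sketch below summarizes this data flow; the encoder, processor, and decoder arguments are placeholders for the modules of Sections 4.2.2.1 and 4.2.2.2, and the class and argument names are hypothetical.

```python
import torch
import torch.nn as nn

class CO24DNetSketch(nn.Module):
    """Dataflow sketch of CO2-4DNet: encode global, local, and control
    inputs separately, fuse the latents by element-wise addition, then
    process and decode."""
    def __init__(self, enc_g, enc_l, enc_u, processor, decoder):
        super().__init__()
        self.enc_g, self.enc_l, self.enc_u = enc_g, enc_l, enc_u
        self.processor, self.decoder = processor, decoder

    def forward(self, x_g, x_l, u):
        # x_g: (1, 1, H, W, D) global snapshot, replicated to T time steps
        T = x_l.shape[0]
        x_g = x_g.expand(T, -1, -1, -1, -1)
        z = self.enc_g(x_g) + self.enc_l(x_l) + self.enc_u(u)  # latent fusion
        return self.decoder(self.processor(z))

# Example with identity placeholders standing in for each module:
net = CO24DNetSketch(*(nn.Identity() for _ in range(5)))
x_g = torch.zeros(1, 1, 64, 64, 5)
x_l = torch.zeros(4, 1, 64, 64, 5)
u = torch.zeros(4, 1, 64, 64, 5)
print(net(x_g, x_l, u).shape)  # torch.Size([4, 1, 64, 64, 5])
```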
Given the high costs of acquiring 4D seismic data, the interval for collecting this data might
exceed the reconstruction period T. For instance, if seismic data are gathered every 2T time units
and the latest data collection occurred at $t-2T+1$, there may be a need for reconstruction every
T time units to utilize the most recent local measurements from t − T + 1 to t. In such cases, the
global information obtained from 4D seismic data can become outdated. To address this issue, an
additional network is explicitly trained for reconstruction tasks. In this modified model, the input
for one of the encoders changes from the outdated low-resolution 3D CO2 plume data, denoted by
$X_g^{t-2T+1}$, to more recent reconstructions from the initial CO2-4DNet, represented as $\tilde{X}^{t-2T+1:t-T}$.
This model can also operate recurrently, utilizing the most recent local measurements and prior
reconstructions as inputs. This feature becomes especially valuable during prolonged gaps between 4D seismic data collections, enabling the inclusion of the latest local measurements and
up-to-date global information on the CO2 plume.
To predict the future trajectory of the CO2 plume migration, denoted as $\hat{X}^{t+1:t+T}$, an additional CO2-4DNet model is utilized. Unlike the reconstruction models, for which data from the current time are used to reconstruct the CO2 plume, local measurements are not yet available for the prediction time period. As a result, this model deploys two spatial encoders to process the previously reconstructed CO2 plume saturation, $\tilde{X}^{t-T+1:t}$, and a sequence of future injection rates at different injection wells, denoted as $U^{t+1:t+T}$.
In summary, both the reconstruction and prediction of the CO2 plume utilize the fundamental CO2-4DNet architecture. The key difference is in the source of the input data. During the
reconstruction phase, one or two models are required depending on the interval at which global
measurements are taken. These models either process the 3D low-resolution CO2 plume saturation derived from seismic data inversion or use saturations from prior reconstructions. In the
prediction phase, a separate model is trained to map previous reconstructions and future control
inputs to CO2 plume predictions. The mathematical formulations of these models are as follows:
$$\tilde{X}^{t-T+1:t} = \text{CO2-4DNet\_1}\left(X_g^{t-T+1},\, X_l^{t-T+1:t},\, U^{t-T+1:t}\right)$$
$$\tilde{X}^{t-T+1:t} = \text{CO2-4DNet\_2}\left(\tilde{X}^{t-2T+1:t-T},\, X_l^{t-T+1:t},\, U^{t-T+1:t}\right)$$
$$\hat{X}^{t+1:t+T} = \text{CO2-4DNet\_3}\left(\tilde{X}^{t-T+1:t},\, U^{t+1:t+T}\right)$$
4.3 Case Study
4.3.1 Numerical Simulation for Generating Training Dataset
The dataset for this study is generated using a synthetic 3D two-phase fluid flow simulation
model created using CMG-GEM [14]. Situated 1000 meters deep, the storage reservoir is segmented into a grid system consisting of 20 vertical blocks, each 2.5 meters high, and a horizontal
layout of 120 × 120 blocks, with each block measuring 50 meters × 50 meters. This configuration
results in a total grid count of 288000 for the storage reservoir. The adjacent aquifer, located at the
same depth as the reservoir, retains identical vertical and central horizontal grid resolutions. To
optimize the computational efficiency, the grid dimensions at the lateral extremities of the aquifer
are coarser, comprising four blocks, each spanning 500 meters in the horizontal direction. Consequently, the overall grid count for the simulation model is 327680. The reservoir exhibits spatially
heterogeneous permeability, ranging from 0.1 mD to 2000 mD. Sequential Gaussian simulation is
utilized via the Stanford Geostatistical Modeling Software to produce various realizations of the
permeability map, each with different degrees of heterogeneity. Geometric anisotropy within the
model is characterized by a spherical variogram model. In this model, the maximum (x-direction)
and medium (y-direction) ellipsoid ranges are isotropically established across three distinct scenarios: 20, 60, and 100 units, while the minimum ellipsoid range is consistently set at two units.
Consequently, ten realizations are produced for each of the three scenarios. Figure 4.3 displays
the permeability for a subset of these realizations.
Figure 4.3: Subset of 3D log-permeability maps from generated realizations
For each permeability map realization, the injection process spans 15 years with data sampled
every three months, resulting in a 4D tensor for CO2 saturation with dimensions 60 × 128 ×
128 × 20. Simulation data are generated using varied well injection schedules, with two fixed
injection wells at coordinates {55, 55, 17} and {75, 75, 17}. The total CO2 injection volume over
this 15-year period, represented as $V_{TCO_2}^{inj}$, is fixed at $3.5 \times 10^9\ \mathrm{m}^3$ ($\sim$10 MtCO2). The injection duration is segmented into three intervals, each lasting five years, to capture a broad spectrum of realistic response scenarios. Each interval is assigned a specific injection volume, $V_{\mathrm{interval}_x}$, chosen from $V_{\mathrm{high}}$, $V_{\mathrm{med}}$, or $V_{\mathrm{low}}$, defined as $V_{\mathrm{high}} = 0.5\,V_{TCO_2}^{inj}$, $V_{\mathrm{med}} = 0.35\,V_{TCO_2}^{inj}$, and $V_{\mathrm{low}} = 0.15\,V_{TCO_2}^{inj}$. Using these volumes, ten unique combinations are
generated by adjusting the injection rate annually. This results in 300 simulation runs, each
generating distinct spatio-temporal CO2 saturation data. The saturation of the plumes at the end
of injection for the realization in Figure 4.3 is shown in Figure 4.4.
Figure 4.4: Samples of plume saturation at the end of injection
4.3.2 Data Processing
With a resolution of 128 × 128 × 20 for each 3D datapoint at every timestep, the data generated
by the simulation model need to be preprocessed to accurately simulate global CO2 saturation
measurements from 4D seismic surveys and local measurements from crosswell seismic and injection wells. Accounting for the inherent uncertainty and reduced resolution in determining
CO2 plume saturation from 4D seismic data inversion, the original dataset is downsampled using scale factors $d_H = 2$, $d_W = 2$, and $d_D = 4$. Specifically, each separate 3D block defined by dimensions $d_H$, $d_W$, and $d_D$ is assigned the mean CO2 saturation value within that block.
These selected scale factors mirror the comparatively lower vertical resolution of seismic surveys
relative to the horizontal resolution.
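A minimal NumPy sketch of this block-averaging step, assuming grid dimensions divisible by the scale factors, is given below.

```python
import numpy as np

def block_mean_downsample(sat, d_h=2, d_w=2, d_d=4):
    """Assign each (d_h, d_w, d_d) block of the saturation grid its mean
    value, mimicking the lower resolution of 4D seismic inversion."""
    H, W, D = sat.shape
    blocks = sat.reshape(H // d_h, d_h, W // d_w, d_w, D // d_d, d_d)
    return blocks.mean(axis=(1, 3, 5))

sat = np.random.rand(128, 128, 20)
print(block_mean_downsample(sat).shape)  # (64, 64, 5)
```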
In representing the CO2 saturation reconstructed from crosswell seismic data, saturation values between the two injection wells are retained while the values in all other grid blocks are set
to zero. This approach highlights the inherent limitations of crosswell seismic techniques, which
typically yield accurate data exclusively between wells. Additionally, Gaussian noise of 10% is
applied to these values, representing the intrinsic uncertainties and noise in the crosswell seismic dataset. Regarding CO2 saturation measurements at the injection well sites, saturation values
along the vertical lines at both wells are kept, while other values are zeroed. A 10% uncorrelated
Gaussian noise is introduced to these retained data points to mimic measurement noise.
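The sketch below illustrates this masking and noise model for the injection-well measurements, assuming the 10% Gaussian noise is relative to each retained saturation value, which is one common reading of the description above; the function name is hypothetical, while the well coordinates follow the case study setup.

```python
import numpy as np

def monitoring_well_measurement(sat, wells=((55, 55), (75, 75)),
                                noise_level=0.10, seed=0):
    """Keep saturation only along the vertical lines at the injection
    wells, zero all other cells, and add 10% uncorrelated Gaussian noise
    to the retained values."""
    rng = np.random.default_rng(seed)
    masked = np.zeros_like(sat)
    for i, j in wells:
        masked[i, j, :] = sat[i, j, :]
    return masked + noise_level * masked * rng.standard_normal(sat.shape)
```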
Finally, for the injection rates that function as control inputs, the 3D spatial details of the
injection sites are preserved by assigning time-varying injection rates to the voxels at coordinates
{55, 55, 17} and {75, 75, 17}. These rates are normalized by dividing by 2,500,000, ensuring they fall
within a suitable value range for neural network model inputs.
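The control tensor $U$ described above can be assembled as in the following sketch; the helper name and the (T, n_wells) layout of the rate array are assumptions for illustration.

```python
import numpy as np

def injection_rate_tensor(rates, grid=(128, 128, 20),
                          wells=((55, 55, 17), (75, 75, 17)),
                          scale=2_500_000.0):
    """Build the control tensor U: a zero 3D grid per time step with the
    normalized injection rate placed at each injection-well voxel."""
    rates = np.asarray(rates, dtype=float)      # shape (T, n_wells)
    u = np.zeros((rates.shape[0], 1) + grid)
    for t in range(rates.shape[0]):
        for w, (i, j, k) in enumerate(wells):
            u[t, 0, i, j, k] = rates[t, w] / scale
    return u
```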
4.3.3 Model Training
In this study, the simulation is performed at intervals of three months. Therefore, the reconstruction interval is determined to be one year, corresponding to four data points. Subsequent
predictions are also made in units of four data points, based on the results of the prior reconstruction and future control inputs. Based on these reconstruction and prediction intervals, multiple scenarios that incorporate a variety of input data sources are examined to evaluate the performance of the proposed method and assess the importance of different types of measurements. These scenarios are detailed in Table 4.1.

        Interval of 4D seismic    Crosswell       Monitoring from
        data acquisition          seismic data    injection wells
Case 1  1 year                    ✓               -
Case 2  1 year                    -               ✓
Case 3  2 years                   ✓               -
Case 4  2 years                   -               ✓
Case 5  5 years                   ✓               -
Case 6  5 years                   -               ✓
Case 7  -                         ✓               -

Table 4.1: Cases with varying types of measurements
Several models are developed for the reconstruction process due to the variety of input types
and differing acquisition intervals for global measurements across the seven cases. In Cases 1
and 2, two separate models are trained to reconstruct the CO2 plume, each specifically tailored
to a distinct input type. For Cases 3 to 6, the acquisition interval of 4D seismic data surpasses the
assigned reconstruction period, necessitating two additional models as detailed in Section 4.2.3.
In Case 7, where global data is absent, an alternative reconstruction model equipped with two
spatial encoders is employed. Furthermore, a separate prediction model is trained to integrate
both past reconstructions and future control inputs. A detailed summary of the inputs, outputs,
and applications for each model is presented in Table 4.2.
Model     Model Input                                            Model Output    Usage
Model 1   X_g^{t−3}, X_{l_crosswell}^{t−3:t}, U^{t−3:t}          X̃^{t−3:t}       Reconstruction: Cases 1, 3, 5
Model 2   X_g^{t−3}, X_{l_monitorwell}^{t−3:t}, U^{t−3:t}        X̃^{t−3:t}       Reconstruction: Cases 2, 4, 6
Model 3   X̃_g^{t−7:t−4}, X_{l_crosswell}^{t−3:t}, U^{t−3:t}     X̃^{t−3:t}       Reconstruction: Cases 3, 5
Model 4   X̃_g^{t−7:t−4}, X_{l_monitorwell}^{t−3:t}, U^{t−3:t}   X̃^{t−3:t}       Reconstruction: Cases 4, 6
Model 5   X_{l_crosswell}^{t−3:t}, U^{t−3:t}                     X̃^{t−3:t}       Reconstruction: Case 7
Model 6   X̃^{t−3:t}, U^{t+1:t+4}                                X̂^{t+1:t+4}     Prediction: all cases

Table 4.2: Model inputs and outputs and their corresponding cases
The dataset of 300 simulations is partitioned to train the models by allocating 243 simulations
for training and validation and reserving the remainder for testing. Each simulation comprises
60 3D data points, spanning a 15-year injection period. Given that the neural network models
are designed to perform reconstructions and predictions at four-datapoint intervals—equivalent
to one year—each simulation sequence is further segmented into smaller subsequences, each corresponding to inputs and outputs covering a one-year period. After optimizing hyperparameters through grid search, the models are implemented in PyTorch and trained with
the Adam optimizer. The batch size is set to 4, and the learning rate is fixed at 0.001. Training
proceeds for 100 epochs on a single NVIDIA A100 GPU. The detailed neural network structures
of the CO2-4DNets are presented in Figure 4.5. During training, because many grid cells in the data points may have zero values, relative errors are employed as the metric instead of mean square error
to monitor model convergence and assess prediction performance. For the output with a batch
size equal to one, the relative error can be defined as:
$$\text{Relative error} = \frac{\sum_{t=1}^{T}\sum_{i=1}^{H}\sum_{j=1}^{W}\sum_{k=1}^{D}\left(X_{\mathrm{output}} - X_{\mathrm{true}}\right)^2_{tijk}}{\sum_{t=1}^{T}\sum_{i=1}^{H}\sum_{j=1}^{W}\sum_{k=1}^{D}\left(X_{\mathrm{true}}\right)^2_{tijk}} \tag{4.5}$$
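For a single sample of shape (T, H, W, D), Equation 4.5 amounts to the following computation (a NumPy sketch):

```python
import numpy as np

def relative_error(x_output, x_true):
    """Relative squared error of Equation 4.5: total squared deviation
    normalized by the squared magnitude of the true field."""
    return np.sum((x_output - x_true) ** 2) / np.sum(x_true ** 2)
```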
4.3.4 Comparison with Neural Network-based Proxy Model Requiring
Geological Information as Input
The proposed framework in this work eliminates the need for geological information, such as
permeability and porosity maps, as input into the network while retaining the ability to continuously incorporate field measurements. To illustrate one of the advantages of this framework, the
study begins by examining the general characteristics of a neural network-based proxy model
that requires geological information as input. RU-Net is chosen as a representative example of
such a model. The RU-Net has previously been demonstrated to effectively predict CO2 plume
migration[92, 81]. It integrates a residual U-Net with convLSTM networks to forecast the evolution of CO2 saturation. Given the provided permeability map, porosity map, and initial field
conditions, the RU-Net recurrently predicts the CO2 plume. Since the RU-Net used constant control inputs in previous studies, the model has been modified by integrating additional control
inputs into its 3D ConvLSTM cell.
4.3.4.1 Comparison of Testing Errors
In this case study, the performance of the trained RU-Net model is compared with the reconstruction and prediction outcomes of the proposed framework under two distinct scenarios. In
the first scenario, RU-Net receives the exact permeability and porosity maps that are used to
generate the CO2 plumes in the simulations. The second scenario presents RU-Net with sample
permeability maps, simulating the uncertainties commonly associated with geological variables.
Figure 4.5: Details of the CO2-4DNet
For the proposed framework, two cases that receive varying degrees of information from field
measurements are considered: Case 3 and Case 7. As outlined in Table 4.1, Case 3 benefits from
global measurements taken at two-year intervals and includes a cross-section of reconstructed
CO2 saturation derived from crosswell seismic data. Conversely, Case 7 relies exclusively on local measurements obtained via crosswell seismic methods. Model 1, Model 3, and Model 5, as
detailed in Table 4.2, are utilized to reconstruct CO2 plumes.
The following error metric, previously used to quantify CO2 saturation errors and provide physical interpretations of them [91], is employed to evaluate model performance.
For each 3D CO2 plume at a specific time, the error is defined as:
$$\text{Error} = \frac{1}{\sum_{i \in \Omega} I_i} \sum_{i \in \Omega} I_i \left|S_i - \hat{S}_i\right|, \qquad I_i = 1 \ \text{if} \ (S_i > 0.01) \ \text{or} \ (|\hat{S}_i| > 0.01) \tag{4.6}$$

where $S$ is the actual CO2 saturation value, $\hat{S}$ is the reconstructed or predicted CO2 saturation value, and $\Omega$ includes all the cells in the 3D grid.
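Equation 4.6 restricts the mean absolute error to cells that the plume plausibly occupies; a direct NumPy sketch is given below.

```python
import numpy as np

def plume_error(s_true, s_hat, threshold=0.01):
    """Mean absolute saturation error over cells where either the true or
    the estimated saturation exceeds the threshold (Equation 4.6)."""
    mask = (s_true > threshold) | (np.abs(s_hat) > threshold)
    return np.abs(s_true[mask] - s_hat[mask]).mean() if mask.any() else 0.0
```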
Figure 4.6 depicts the comparison of errors, calculated using Equation 4.6, between the RU-Net model and the proposed models that employ field measurements to reconstruct and forecast
the 3D CO2 plume saturation one year in advance. In this plot, solid lines depict the average error
for each year across ten different realizations, while the shaded areas represent the corresponding standard deviations. The figure shows that RU-Net yields accurate forecasts when provided
with precise geological information. However, when RU-Net is provided with inaccurate geological data—a common scenario in real-world applications due to uncertainties in subsurface
characterization—its forecasts deviate from the actual CO2 plume saturation in the field, resulting in larger relative errors. Regarding the reconstruction error from the trained CO2-4DNet in Case 3, the error is comparable to that from RU-Net with exact input. Furthermore, since global measurements are taken at intervals of two years, a noticeable drop in reconstruction error occurs each time the model receives the latest global measurements, demonstrating its effective
integration of field measurements into its reconstructions. In contrast, the model that relies exclusively on crosswell seismic data incurs a larger error due to limited information from local
measurements. Nevertheless, despite this limitation in information, the model continuously integrates field measurements. As a result, its performance is slightly superior to that of RU-Net
with incorrect inputs.
The rightmost subplot in Figure 4.6 illustrates the one-year-ahead prediction errors. Since
the initial reconstruction occurs after the collection of four data points, the error metrics for the
one-year-ahead prediction models begin from the second year. During the initial injection phase,
rapid changes in the shape of the CO2 plume boundary and its relatively small volume in the early
years result in large prediction errors from the CO2-4DNet models. However, as time progresses
and more comprehensive global information is acquired, the CO2-4DNet yields more accurate
prediction outcomes. Conversely, when forecasts are based on the reconstruction outcomes of
the model solely dependent on crosswell seismic data, the results are less accurate due to the
significant reconstruction error.
4.3.4.2 Comparison of Onset Time Errors
Onset time errors are calculated and compared to better visualize the reconstruction and prediction results of the CO2 plume across different models. The concept of onset time, utilized in the
literature [41, 53], refers to the calendar time when CO2 saturation at a specific location surpasses
a predefined threshold value (established at 0.05 in this study). Onset time errors are computed by
comparing the onset times derived from the models’ outputs with the actual onset times obtained from the simulation dataset. Using an onset time error map provides a more concise visualization, as it eliminates the need to display 3D CO2 saturation error maps for every timestep. Instead, it consolidates these multiple 3D images from various timesteps into a single 3D image, effectively illustrating the error in the propagation of the CO2 plume.

Figure 4.6: Reconstruction and prediction results: CO2-4DNet with varying inputs vs. proxy models requiring geological information
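The onset-time map itself can be computed per grid cell as sketched below; the use of a time-step index and the -1 sentinel for never-reached cells are illustrative assumptions.

```python
import numpy as np

def onset_time(sat_series, threshold=0.05):
    """First time-step index at which CO2 saturation exceeds the 0.05
    threshold, per grid cell; cells the plume never reaches get -1."""
    exceeded = sat_series > threshold      # shape (T, H, W, D)
    reached = exceeded.any(axis=0)
    first = exceeded.argmax(axis=0)        # index of first True along time
    return np.where(reached, first, -1)
```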
Figure 4.7: Evolution of CO2 plume saturation for one test case
Figure 4.7 displays the evolution of the CO2 plume saturation from a simulation case not utilized in model training. Figure 4.8 illustrates the onset time error for the CO2 plume across various
models at differing depths specific to this simulation case. As depicted in Figure 4.8a, the RU-Net,
with known permeability input, accurately captures the general shape and migration direction of
the CO2 plume. However, as prediction saturation errors increase over time, the accuracy of onset time decreases. Conversely, the RU-Net with inaccurate permeability input leads to significantly larger onset errors, yielding inaccurate CO2 saturation values and incorrect representations of the plume’s migration path and overall shape.

Figure 4.8: Comparison between onset time errors for CO2-4DNet with varying inputs versus proxy models requiring geological information. (a) Reconstructed onset time errors; (b) predicted onset time errors.
In comparison, the CO2-4DNet’s reconstruction outcomes, which incorporate yearly global
measurements and crosswell seismic data, exhibit the lowest onset time errors. The magnitude
of these errors remains consistent throughout the injection process as the CO2 plume expands
outward from the injection wells. With the extension of the acquisition interval for global data,
the onset time error increases. The CO2-4DNet model relying solely on local crosswell seismic
data exhibits a significantly greater onset time error compared to models incorporating global
measurements. Although the model trained on local data initially provides relatively accurate
onset times, its reconstructions become less accurate over time due to limitations in the input
source. Figure 4.8b compares the onset time errors between the RU-Net models and the CO2-
4DNet, which predicts based on the reconstructed past trajectory of the CO2 plume. Due to error
accumulation during prediction, onset time errors from these CO2-4DNet models are generally larger than the errors from the reconstruction results. Nevertheless, the predicted onset times remain more accurate than those of the proxy model, which requires geological information as input.
4.3.5 Error Comparison of CO2-4DNet with Varying Input Sources
This section analyzes how various field measurements impact the CO2-4DNet model’s ability to
reconstruct and predict CO2 plume saturation over time. The analysis compares two aspects:
first, it evaluates the significance of the interval between 4D seismic data collections, which provide global measurements to the model; second, it assesses the influence of various types of local
measurements. Figure 4.9 displays the trajectories of reconstruction errors and one-year-ahead
prediction errors across different model configurations. In these plots, solid lines represent the
average error for each year across ten different realizations, while shaded areas depict the corresponding standard deviations. Additionally, the figures on the right illustrate the CO2 saturation
errors from different models for one of the realizations.
For all models utilizing global measurements, increased intervals between global data collections are observed to result in larger reconstruction and prediction errors. Notably, errors
during the initial injection phase are higher than those in later stages, indicating that the model’s
reconstruction accuracy is closely linked to the CO2 plume’s migration rate and dispersion. Furthermore, in all scenarios involving global measurements as input, a noticeable error reduction
becomes apparent each time global data is collected. This underscores the critical role of global
measurements in monitoring the CO2 plume’s evolution. In this example, global measurements
enhance the CO2-4DNet’s performance and capability to seamlessly integrate new data into its
reconstructions. Regarding local data types, in scenarios that include global measurements, integrating crosswell seismic data consistently yields superior outcomes compared to those using
monitoring wells. This observation aligns with expectations, as data from crosswell seismic provide broader spatial coverage, thereby equipping the models with more comprehensive information to reconstruct CO2 saturation. This trend is particularly evident when combined with
longer global measurement intervals, where the setup involving monitoring wells consistently
displays higher error rates. For the specific case relying exclusively on crosswell seismic data
without global measurements, the errors are the largest and exhibit a continuous increase, further highlighting the significance of global measurements in the effectiveness of the proposed
model.
4.3.6 Generalization to Unseen Scenarios
To illustrate the CO2-4DNet’s ability to continuously integrate field measurements and adapt to
diverse unseen scenarios, the trained models are evaluated using two new scenarios different
from those used during training. The first scenario involves two injection wells with variable
well locations. This scenario aims to showcase the trained models’ ability to generalize across
diverse geological contexts and varying injection well locations. The second scenario introduces
a single injection well at fixed coordinates {64, 64, 17}. This scenario is employed to demonstrate
the proposed framework’s capability to generalize across varying numbers of wells. In the single-well scenario, with only one injection well, crosswell seismic data are assumed to be collected
between the coordinates {54, 64} and {74, 64}. Ten simulations are conducted for each scenario
using permeability maps not included in the training dataset.
Figure 4.10 compares reconstruction and prediction errors across ten simulations in three distinct scenarios: two injection wells with fixed locations, two with random locations, and one at a
fixed coordinate. Among these scenarios, the first aligns with the configuration of the data used
to train the models, while the latter two are unseen during training. Because these scenarios are entirely new to the models, the errors for the new cases are slightly higher
than those in the scenario used for training. Nevertheless, given the magnitude of the errors,
the trained reconstruction and prediction models demonstrate robust generalization capabilities
attributable to their ability to incorporate field measurements as input. Across these scenarios, models that incorporate global measurements display smaller errors and reduced variability compared to the model relying solely on local crosswell seismic data. Furthermore, consistent with the discussion in Section 4.3.5, the new scenarios demonstrate that more frequent global measurements improve reconstruction and prediction results.

Figure 4.9: Comparative analysis of errors across different model inputs and local measurement types. (a) Reconstruction errors; (b) prediction errors.

Figure 4.10: Comparative analysis of errors for three scenarios: two fixed-location injection wells, two random-location injection wells, and one fixed-location injection well. (a) Comparison of reconstruction errors; (b) comparison of prediction errors.

Figure 4.11: Comparing the onset time for the scenario with two randomly located injection wells (color indicates different onset times). (a) Comparison of onset time from the reconstructed CO2 plume; (b) comparison of onset time from the predicted CO2 plume.

Figure 4.12: Comparing the onset time for the scenario with one fixed-location injection well (color indicates different onset times). (a) Comparison of onset time from the reconstructed CO2 plume; (b) comparison of onset time from the predicted CO2 plume.
To visualize the results for the new scenarios, top views of the onset times for three models
incorporating global measurements are plotted across four simulation cases. Figure 4.11 shows
the reconstructed and predicted onset times for scenarios with randomly varying well locations.
Despite variations in injection well positions and permeability maps, the models adeptly capture
the intricate evolution of the CO2 plume. When comparing the models, the onset map produced
by the model using global measurements at one-year intervals closely resembles that produced
by the model with two-year intervals. However, due to the lengthy gaps between updates, the
model employing a five-year global measurement interval yields a less accurate reconstructed
onset time. Similar observations are evident from the comparison of the onset times for the
scenario featuring one injection well at a fixed location, as shown in Figure 4.12.
4.4 Summary
This chapter introduces a novel framework for reconstruction and short-term prediction of CO2
plume saturation from field measurements. At the core of the reconstruction and prediction
stages is the CO2-4DNet model, which seamlessly integrates global and localized field measurements from multiple sources to efficiently capture the spatio-temporal dynamics of CO2 plume
migration. Within the CO2-4DNet, several CNN spatial encoders translate field measurements
into a compact latent space representation. Once the encoded measurements are fused in this
latent space, a processor model captures the spatio-temporal dynamics of the CO2 plume migration. Finally, a spatial decoder maps the latent representation back to the original data space
for reconstruction and prediction. Simulated datasets generated from a three-dimensional CO2
storage site are used to test the effectiveness of the proposed framework and CO2-4DNet model.
Comparisons with previously proposed neural network-based proxy models, which require geological information as inputs, demonstrate that the proposed model accurately reconstructs and
predicts 3D CO2 plume saturation by continuously integrating the latest field measurements.
Additionally, the impact of various input sources on the performance of the CO2-4DNet is evaluated. The results underscore the importance of global measurements for the model’s efficacy.
As anticipated, increased frequency in global measurement acquisition and a more comprehensive set of local measurements from crosswell seismic data lead to improved model performance.
Furthermore, the generalization performance of the proposed framework is evaluated using two
additional scenarios that are not used in training the model.
Chapter 5
Conclusion
The dissertation has developed and implemented several data-driven tools to extract and utilize
latent space dynamics in modern industrial systems. This research primarily contributes to three
areas: high-dimensional data visualization and interpretation, anomaly detection and monitoring,
and spatio-temporal estimation and prediction.
In the first application, the efficacy of the proposed DELFA troubleshooting procedure is
demonstrated through a comprehensive analysis of high-dimensional, plant-wide operational
data. By employing dynamic latent variable modeling through DiCCA, the procedure captures
both cross-correlation and autocorrelation within the data. Additionally, an extended version
of DiCCA enhances the model’s feature extraction by removing unwanted variations due to
external disturbances, thereby contributing to analyzing major oscillation features and troubleshooting anomalies. The integration of domain process knowledge allows the method to utilize composite loadings and weights effectively, pinpointing abnormal variables associated with
features present across multiple latent variables. The validation of DELFA’s troubleshooting efficacy through an industrial case study confirms its practical utility and showcases its potential
for broader application in similar industrial settings.
Furthermore, the dissertation presents a fault detection and condition monitoring approach
using an LSTM encoder-decoder model with regularized latent dynamics. The regularization
during model training ensures that the hidden states of the LSTM accurately represent a low-dimensional latent space, capturing the main dynamics of the original high-dimensional data.
The application of reversible instance normalization minimizes the impact of data distribution
discrepancies caused by normal operational adjustments within feedback control systems. The
proposed monitoring indices, derived from this model, distinguish between faults and normal operational adjustments, such as control setpoint changes and disturbance rejection within feedback
control systems, thereby reducing false alarms and enhancing the reliability of the monitoring
system. The effectiveness of the proposed approach is demonstrated using a synthetic dataset,
benchmark data from the Tennessee Eastman process, and a real-world dataset from a geothermal
power plant.
Lastly, the third application addresses challenges in estimating and predicting the behavior
of geological CO2 storage systems using a tailored spatio-temporal neural network model. This
model adeptly integrates sparse and infrequent data into a concise latent space, effectively capturing the essential dynamics of CO2 plume migration. The model integrates both global and localized field measurements, capturing the spatio-temporal dynamics of CO2 plume migration using
multiple CNN spatial encoders. The processor model within CO2-4DNet captures the spatio-temporal
dynamics within the encoded latent space, while the spatial decoder facilitates data space reconstruction and prediction. The effectiveness of this framework is tested using simulated datasets
from a 3D CO2 storage site, with performance comparisons indicating better accuracy compared
to previous neural network-based proxy models requiring geological information. Moreover, the
model’s generalization performance across additional scenarios further substantiates its robustness and practical applicability.
Bibliography
[1] Anam Abid, Muhammad Tahir Khan, and Javaid Iqbal. “A review on fault detection and
diagnosis techniques: basics and beyond”. In: Artificial Intelligence Review 54.5 (2021),
pp. 3639–3664.
[2] Temitope Ajayi, Jorge Salgado Gomes, and Achinta Bera. “A review of CO2 storage in
geological formations emphasizing modeling, monitoring and capacity estimation
approaches”. In: Petroleum Science 16 (2019), pp. 1028–1063.
[3] JB Ajo-Franklin, J Peterson, J Doetsch, and TM Daley. “High-resolution characterization
of a CO2 plume using crosswell seismic tomography: Cranfield, MS, USA”. In:
International Journal of Greenhouse Gas Control 18 (2013), pp. 497–509.
[4] Jinwon An and Sungzoon Cho. “Variational autoencoder based anomaly detection using
reconstruction probability”. In: Special lecture on IE 2.1 (2015), pp. 1–18.
[5] Reza Azad, Ehsan Khodapanah Aghdam, Amelie Rauland, Yiwei Jia,
Atlas Haddadi Avval, Afshin Bozorgpour, Sanaz Karimijafarbigloo, Joseph Paul Cohen,
Ehsan Adeli, and Dorit Merhof. “Medical image segmentation review: The success of
u-net”. In: arXiv preprint arXiv:2211.14830 (2022).
[6] Anna Rosaria Boccella, Piera Centobelli, Roberto Cerchione, Teresa Murino, and
Ralph Riedel. “Evaluating centralized and heterarchical control of smart manufacturing
systems in the era of Industry 4.0”. In: Applied Sciences 10.3 (2020), p. 755.
[7] G.E.P. Box. “Some Theorems on Quadratic Forms Applied in the Study of Analysis of
Variance Problems, I. Effect of Inequality of Variance in the One-way Classification”. In:
Ann. Math. Statistics 25 (1954), pp. 290–302.
[8] Richard D. Braatz. Tennessee Eastman Problem Simulation Data. url:
http://web.mit.edu/braatzgroup/links.html.
[9] Junghui Chen and Chien-Mao Liao. “Dynamic process fault monitoring based on neural
network and PCA”. In: Journal of Process Control - J PROCESS CONTROL 12 (Feb. 2002),
pp. 277–289. doi: 10.1016/S0959-1524(01)00027-0.
[10] Feifan Cheng, Q. Peter He, and Jinsong Zhao. “A novel process monitoring approach
based on variational recurrent autoencoder”. In: Computers & Chemical Engineering
129 (2019), p. 106515. doi: 10.1016/j.compchemeng.2019.106515.
[11] L.H. Chiang, E.L. Russell, and R.D. Braatz. Fault Detection and Diagnosis in Industrial
Systems. Advanced Textbooks in Control and Signal Processing. London, Great Britain:
Springer-Verlag, 2001.
[12] M.A.A.S. Choudhury, S.L. Shah, and N.F. Thornhill. “Diagnosis of poor control-loop
performance using higher-order statistics”. In: Automatica 40 (2004), pp. 1719–1728.
[13] A. Cinar, A. Palazoglu, and F. Kayihan. Chemical Process Performance Evaluation. Boca
Raton, FL: Taylor & Francis CRC Press, 2007. isbn: 0-8493-3806-9.
[14] CMG. GEM user manual. 2019.
[15] Xiaohan Ding, Xiangyu Zhang, Jungong Han, and Guiguang Ding. “Scaling up your
kernels to 31x31: Revisiting large kernel design in cnns”. In: Proceedings of the IEEE/CVF
conference on computer vision and pattern recognition. 2022, pp. 11963–11975.
[16] Dragan Djurdjanovic, Laine Mears, Farbod Akhavan Niaki, Asad Ul Haq, and Lin Li.
“State of the art review on process, system, and operations control in modern
manufacturing”. In: Journal of Manufacturing Science and Engineering 140.6 (2018),
p. 061010.
[17] Yining Dong, Yingxiang Liu, and S Joe Qin. “Efficient Dynamic Latent Variable Analysis
for High-Dimensional Time Series Data”. In: IEEE Transactions on Industrial Informatics
16.6 (2020), pp. 4068–4076.
[18] Yining Dong and S Joe Qin. “A novel dynamic PCA algorithm for dynamic data
modeling and process monitoring”. In: Journal of Process Control 67 (2018), pp. 1–11.
[19] Yining Dong and S Joe Qin. “Dynamic latent variable analytics for process operations
and control”. In: Computers & Chemical Engineering 114 (2018), pp. 69–80.
[20] Yining Dong and S. Joe Qin. “Dynamic-Inner Canonical Correlation and Causality
Analysis for High Dimensional Time Series Data”. In: IFAC-PapersOnLine 51.18 (2018),
pp. 476–481.
[21] Yining Dong and S. Joe Qin. “New Dynamic Predictive Monitoring Schemes Based on
Dynamic Latent Variable Models”. In: Industrial & Engineering Chemistry Research 59.6
(2020), pp. 2353–2365.
[22] J J Downs and E F Vogel. “A plant-wide industrial process control problem”. In:
Computers & Chemical Engineering 17.3 (1993), pp. 245–255.
[23] Yao Duan, Chuanchuan Yang, Hao Chen, Weizhen Yan, and Hongbin Li.
“Low-complexity point cloud denoising for Lidar by PCA-based Dimension Reduction”.
In: Optics Communications 482 (2021), p. 126567. doi: 10.1016/j.optcom.2020.126567.
[24] Stefan Elfwing, Eiji Uchibe, and Kenji Doya. “Sigmoid-weighted linear units for neural
network function approximation in reinforcement learning”. In: Neural networks 107
(2018), pp. 3–11.
[25] Deniz Ender. “Process Control Performance : Not as Good as You Think”. In: Control
Engineering 24 (1993).
[26] Ming Fan, Dan Lu, and Siyan Liu. “A deep learning-based direct forecasting of CO2
plume migration”. In: Geoenergy Science and Engineering 221 (2023), p. 211363.
[27] Shihang Feng, Xitong Zhang, Brendt Wohlberg, Neill P Symons, and Youzuo Lin.
“Connect the Dots: In Situ 4-D Seismic Monitoring of CO2 Storage With
Spatio-Temporal CNNs”. In: IEEE Transactions on Geoscience and Remote Sensing 60
(2021), pp. 1–16.
[28] Zhangyang Gao, Cheng Tan, Lirong Wu, and Stan Z Li. “Simvp: Simpler yet better video
prediction”. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern
Recognition. 2022, pp. 3170–3180.
[29] Zhiqiang Ge. “Review on data-driven modeling and monitoring for plant-wide industrial
processes”. In: Chemometrics and Intelligent Laboratory Systems 171 (2017), pp. 16–25.
doi: 10.1016/j.chemolab.2017.09.021.
[30] Raoof Gholami, Arshad Raza, and Stefan Iglauer. “Leakage risk assessment of a CO2
storage site: A review”. In: Earth-Science Reviews 223 (2021), p. 103849.
[31] Vincent Le Guen and Nicolas Thome. “Disentangling physical dynamics from unknown
factors for unsupervised video prediction”. In: Proceedings of the IEEE/CVF Conference on
Computer Vision and Pattern Recognition. 2020, pp. 11474–11484.
[32] Yunhui Guo, Yandong Li, Liqiang Wang, and Tajana Rosing. “Depthwise convolution is
all you need for learning multiple visual domains”. In: Proceedings of the AAAI
Conference on Artificial Intelligence. Vol. 33. 01. 2019, pp. 8368–8375.
[33] Thomas J Harris, CT Seppala, and LD Desborough. “A review of performance
monitoring and assessment techniques for univariate and multivariate control systems”.
In: Journal of process control 9.1 (1999), pp. 1–17.
[34] Sepp Hochreiter and Jürgen Schmidhuber. “Long short-term memory”. In: Neural
Computation 9.8 (1997), pp. 1735–1780. doi: 10.1162/neco.1997.9.8.1735.
[35] H. Hotelling. “Analysis of a complex of statistical variables into principal components.
Journal of Educational Psychology”. In: Journal of Educational Psychology 24 (1933),
pp. 417–441.
[36] M. Jelali and B. Huang. Detection and Diagnosis of Stiction in Control Loops: State of the
Art and Advanced Methods. Springer, 2010.
[37] Charles Jenkins, Andy Chadwick, and Susan D Hovorka. “The state of the art in
monitoring and verification—ten years on”. In: International Journal of Greenhouse Gas
Control 40 (2015), pp. 312–349.
[38] Xin Ju, François P Hamon, Gege Wen, Rayan Kanfar, Mauricio Araya-Polo, and
Hamdi A Tchelepi. “Learning CO2 plume migration in faulted reservoirs with Graph
Neural Networks”. In: arXiv preprint arXiv:2306.09648 (2023).
[39] Taesung Kim, Jinhee Kim, Yunwon Tae, Cheonbok Park, Jang-Ho Choi, and Jaegul Choo.
“Reversible instance normalization for accurate time-series forecasting against
distribution shift”. In: International Conference on Learning Representations. 2021.
[40] Bei Li and Yunyue Elita Li. “Neural network-based CO2 interpretation from 4D sleipner
seismic images”. In: Journal of Geophysical Research: Solid Earth 126.12 (2021),
e2021JB022524.
[41] Tian Liu, Gill Hetz, Hongquan Chen, and Akhil Datta-Gupta. “Integration of time-lapse
seismic data using the onset time approach: The impact of seismic survey frequency”. In:
SPE Annual Technical Conference and Exhibition? SPE. 2019, D011S008R001.
[42] Ildar Lomov, Mark Lyubimov, Ilya Makarov, and Leonid E. Zhukov. “Fault detection in
Tennessee Eastman process with temporal deep learning models”. In: Journal of
Industrial Information Integration 23 (2021), p. 100216. doi: 10.1016/j.jii.2021.100216.
[43] P.R. Lyman and C. Georgakis. “Plant-wide control of the Tennessee Eastman problem”.
In: Computers & Chemical Engineering 19.3 (1995), pp. 321–331.
[44] Ji Ma and Yuyu Yuan. “Dimension reduction of image deep feature using PCA”. In:
Journal of Visual Communication and Image Representation 63 (2019), p. 102578. doi:
10.1016/j.jvcir.2019.102578.
[45] Jin-Feng Ma, Lin Li, Hao-Fan Wang, Ming-You Tan, Shi-Ling Cui, Yun-Yin Zhang,
Zhi-Peng Qu, Ling-Yun Jia, and Shu-Hai Zhang. “Geophysical monitoring technology
for CO2 sequestration”. In: Applied Geophysics 13.2 (2016), pp. 288–306.
[46] J. F. MacGregor, C. Jaeckle, C. Kiparissides, and M. Koutoudi. “Process Monitoring and
Diagnosis by Multiblock PLS Methods”. In: AIChE Journal 40 (1994), pp. 826–838.
[47] Pankaj Malhotra, Anusha Ramakrishnan, Gaurangi Anand, Lovekesh Vig,
Puneet Agarwal, and Gautam Shroff. “LSTM-based encoder-decoder for multi-sensor
anomaly detection”. In: arXiv preprint arXiv:1607.00148 (2016).
[48] Roshan Joy Martis, U. Rajendra Acharya, and Lim Choo Min. “ECG beat classification
using PCA, Lda, ICA and discrete wavelet transform”. In: Biomedical Signal Processing
and Control 8.5 (2013), pp. 437–448. doi: 10.1016/j.bspc.2013.01.005.
[49] T. J. McAvoy and N. Ye. “Base control for the Tennessee Eastman problem”. In:
Computers & Chemical Engineering 18.5 (1994), pp. 383–413.
[50] Norazwan Md Nor, Che Rosmani Che Hassan, and Mohd Azlan Hussain. “A review of
data-driven fault detection and diagnosis methods: Applications in chemical process
systems”. In: Reviews in Chemical Engineering 36.4 (2020), pp. 513–553.
[51] Alicia Millinger. “Five steps to select predictive analytics software”. In: Control
Engineering (July 2021).
[52] Ke Mu, Lin Luo, Qiao Wang, and Fushun Mao. “Industrial process monitoring and fault
diagnosis based on temporal attention augmented deep network”. In: Journal of
Information Processing Systems 17.2 (2021), pp. 242–252.
[53] Masahiro Nagao, Changqing Yao, Tsubasa Onishi, Hongquan Chen, and
Akhil Datta-Gupta. “An Efficient Deep Learning-Based Workflow for CO2 Plume
Imaging With Distributed Pressure and Temperature Measurements”. In: SPE Journal
(2023), pp. 1–15.
[54] Pangun Park, Piergiuseppe Di Marco, Hyejeon Shin, and Junseong Bang. “Fault
detection and diagnosis using combined autoencoder and long short-term memory
network”. In: Sensors 19.21 (2019), p. 4612.
[55] Seong Hyeon Park, ByeongDo Kim, Chang Mook Kang, Chung Choo Chung, and
Jun Won Choi. “Sequence-to-sequence prediction of vehicle trajectory via LSTM
encoder-decoder architecture”. In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018).
doi: 10.1109/ivs.2018.8500658.
[56] You-Jin Park, Shu-Kai S Fan, and Chia-Yu Hsu. “A review on fault detection and process
diagnostics in industrial processes”. In: Processes 8.9 (2020), p. 1123.
[57] K. Pearson. “On lines and planes of closest fit to systems of points in space”. In: Phil.
Mag. 2 (1901), pp. 559–572.
[58] Matthias Preisig and Jean H Prévost. “Coupled multi-phase thermo-poromechanical
effects. Case study: CO2 injection at In Salah, Algeria”. In: International Journal of
Greenhouse Gas Control 5.4 (2011), pp. 1055–1064.
[59] S. J. Qin. “Statistical process monitoring: Basics and Beyond”. In: J. of Chemometrics 17
(2003), pp. 480–502.
[60] S. Joe Qin. “Survey on data-driven industrial process monitoring and diagnosis”. In:
Annual Reviews in Control 36.2 (2012), pp. 220–234.
[61] S. Joe Qin and Leo H. Chiang. “Advances and opportunities in machine learning for
process data analytics”. In: Computers & Chemical Engineering 126 (2019), pp. 465–473.
[62] S. Joe Qin, Yining Dong, Qinqin Zhu, Jin Wang, and Qiang Liu. “Bridging Systems
Theory and Data Science: A Unifying Review of Dynamic Latent Variable Analytics and
Process Monitoring”. In: Annual Reviews in Control 50 (2020), pp. 29–48.
[63] S. Joe Qin and Yingying Zheng. “Quality-relevant and process-relevant fault monitoring
with concurrent projection to latent structures”. In: AIChE Journal 59.2 (2013),
pp. 496–504.
[64] Yao Qin, Dongjin Song, Haifeng Chen, Wei Cheng, Guofei Jiang, and
Garrison W. Cottrell. “A dual-stage attention-based recurrent neural network for time
series prediction”. In: Proceedings of the Twenty-Sixth International Joint Conference on
Artificial Intelligence (2017). doi: 10.24963/ijcai.2017/366.
[65] Mark Rafferty, Xueqin Liu, David M. Laverty, and Sean McLoone. “Real-time multiple
event detection and classification using moving window PCA”. In: IEEE Transactions on
Smart Grid 7.5 (2016), pp. 2537–2548. doi: 10.1109/tsg.2016.2559444.
[66] Christopher Reinartz, Murat Kulahci, and Ole Ravn. “An extended Tennessee Eastman
simulation dataset for fault-detection and decision support systems”. In: Computers &
Chemical Engineering 149 (2021), p. 107281.
[67] Marco Reis and Geert Gins. “Industrial process monitoring in the Big Data/Industry 4.0
ERA: From detection, to diagnosis, to prognosis”. In: Processes 5.4 (2017), p. 35. doi:
10.3390/pr5030035.
[68] N. Lawrence Ricker. “Decentralized control of the Tennessee Eastman Challenge
Process”. In: Journal of Process Control 6.4 (1996), pp. 205–221.
[69] Lukas Ruff, Robert Vandermeulen, Nico Goernitz, Lucas Deecke, Shoaib Ahmed Siddiqui,
Alexander Binder, Emmanuel Müller, and Marius Kloft. “Deep one-class classification”.
In: International conference on machine learning. PMLR. 2018, pp. 4393–4402.
[70] Evan L. Russell, Leo H. Chiang, and Richard D. Braatz. “Fault detection in industrial
processes using canonical variate analysis and dynamic principal component analysis”.
In: Chemometrics and Intelligent Laboratory Systems 51.1 (2000), pp. 81–93.
[71] Chao Shang, Biao Huang, Fan Yang, and Dexian Huang. “Slow feature analysis for
monitoring and diagnosis of control performance”. In: Journal of Process Control 39
(2016), pp. 21–34.
[72] Chao Shang, Fan Yang, Xinqing Gao, Xiaolin Huang, Johan AK Suykens, and
Dexian Huang. “Concurrent monitoring of operating condition deviations and process
dynamics anomalies with slow feature analysis”. In: AIChE Journal 61.11 (2015),
pp. 3666–3682.
[73] Hanlin Sheng, Xinming Wu, Xiaoming Sun, and Long Wu. “Deep learning for
characterizing CO2 migration in time-lapse seismic images”. In: Fuel 336 (2023),
p. 126806.
[74] Parisa Shokouhi, Vikas Kumar, Sumedha Prathipati, Seyyed A Hosseini, Clyde Lee Giles,
and Daniel Kifer. “Physics-informed deep learning for prediction of CO2 storage site
response”. In: Journal of Contaminant Hydrology 241 (2021), p. 103835.
[75] M Adeel Sohal, Yann Le Gallo, Pascal Audigane, J Carlos De Dios, and Sean P Rigby.
“Effect of geological heterogeneities on reservoir storage capacity and migration of CO2
plume in a deep saline fractured carbonate aquifer”. In: International Journal of
Greenhouse Gas Control 108 (2021), p. 103306.
[76] Pengyu Song and Chunhui Zhao. “Slow down to go better: A survey on slow feature
analysis”. In: IEEE Transactions on Neural Networks and Learning Systems (2022).
[77] Pengyu Song, Chunhui Zhao, and Biao Huang. “SFNet: A slow feature extraction
network for parallel linear and nonlinear dynamic process monitoring”. In:
Neurocomputing 488 (2022), pp. 359–380.
[78] Kevin D. Starr, Heiko Petersen, and Margret Bauer. “Control Loop Performance
Monitoring – ABB’s experience over two decades”. In: IFAC-PapersOnLine 49.7 (2016),
pp. 526–532. doi: 10.1016/j.ifacol.2016.07.396.
[79] Ilya Sutskever, Oriol Vinyals, and Quoc V Le. “Sequence to sequence learning with
neural networks”. In: Advances in neural information processing systems 27 (2014).
[80] Cheng Tan, Zhangyang Gao, Siyuan Li, and Stan Z Li. “Simvp: Towards simple yet
powerful spatiotemporal predictive learning”. In: arXiv preprint arXiv:2211.12509 (2022).
[81] Meng Tang, Xin Ju, and Louis J Durlofsky. “Deep-learning-based coupled
flow-geomechanics surrogate model for CO2 sequestration”. In: International Journal of
Greenhouse Gas Control 118 (2022), p. 103692.
[82] Syed Ali Ammar Taqvi, Haslinda Zabiri, Lemma Dendena Tufa, Fahim Uddin,
Syeda Anmol Fatima, and Abdulhalim Shah Maulud. “A review on data-driven learning
approaches for fault detection and diagnosis in chemical processes”. In: ChemBioEng
Reviews 8.3 (2021), pp. 239–259.
[83] N.F. Thornhill, J. Cox, and M. Paulonis. “Diagnosis of plant-wide oscillation through
data-driven analysis and process understanding”. In: Control Engineering Practice 11
(2003), pp. 1481–1490.
[84] N.F. Thornhill and T. Hagglund. “Detection and diagnosis of oscillation in control
loops”. In: Control Eng. Practice 5 (1997), pp. 1343–1354.
[85] Nola D. Tracy, John C. Young, and Robert L. Mason. “Multivariate Control Charts for
Individual Observations”. In: Journal of Quality Technology 24.2 (1992), pp. 88–95. doi:
10.1080/00224065.1992.12015232.
[86] Evan Schankee Um, David Alumbaugh, Youzuo Lin, and Shihang Feng. “Real-time
deep-learning inversion of seismic full waveform data for CO2 saturation and
uncertainty in geological carbon storage monitoring”. In: Geophysical Prospecting (2022).
[87] Yunbo Wang, Zhifeng Gao, Mingsheng Long, Jianmin Wang, and S Yu Philip.
“Predrnn++: Towards a resolution of the deep-in-time dilemma in spatiotemporal
predictive learning”. In: International Conference on Machine Learning. PMLR. 2018,
pp. 5123–5132.
[88] Zhumei Wang, Xing Su, and Zhiming Ding. “Long-term traffic prediction based on
LSTM encoder-decoder architecture”. In: IEEE Transactions on Intelligent Transportation
Systems 22.10 (2021), pp. 6561–6571. doi: 10.1109/tits.2020.2995546.
[89] Maria Weese, Waldyn Martinez, Fadel M. Megahed, and L. Allison Jones-Farmer.
“Statistical learning methods applied to process monitoring: An overview and
perspective”. In: Journal of Quality Technology 48.1 (2016), pp. 4–24. doi:
10.1080/00224065.2016.11918148.
[90] Gege Wen, Zongyi Li, Kamyar Azizzadenesheli, Anima Anandkumar, and
Sally M Benson. “U-FNO—An enhanced Fourier neural operator-based deep-learning
model for multiphase flow”. In: Advances in Water Resources 163 (2022), p. 104180.
[91] Gege Wen, Zongyi Li, Qirui Long, Kamyar Azizzadenesheli, Anima Anandkumar, and
Sally M Benson. “Real-time high-resolution CO2 geological storage prediction using
nested Fourier neural operators”. In: Energy & Environmental Science 16.4 (2023),
pp. 1732–1741.
[92] Gege Wen, Meng Tang, and Sally M Benson. “Towards a predictor for CO2 plume
migration using deep neural networks”. In: International Journal of Greenhouse Gas
Control 105 (2021), p. 103223.
[93] Hao Wu and Jinsong Zhao. “Deep convolutional neural network model based Chemical
Process Fault Diagnosis”. In: Computers & Chemical Engineering 115 (2018),
pp. 185–197. doi: 10.1016/j.compchemeng.2018.04.009.
[94] Yuxin Wu and Kaiming He. “Group normalization”. In: Proceedings of the European
conference on computer vision (ECCV). 2018, pp. 3–19.
[95] Bicheng Yan, Bailian Chen, Dylan Robert Harp, Wei Jia, and Rajesh J Pawar. “A robust
deep learning workflow to predict multiphase flow behavior during geological CO2
sequestration injection and Post-Injection periods”. In: Journal of Hydrology 607 (2022),
p. 127542.
[96] Wenhao Yan, Jing Wang, Shan Lu, Meng Zhou, and Xin Peng. “A review of real-time
fault diagnosis methods for industrial smart manufacturing”. In: Processes 11.2 (2023),
p. 369.
[97] Xianjin Yang, Xiao Chen, and Megan M Smith. “Deep learning inversion of gravity data
for detection of CO2 plumes in overlying aquifers”. In: Journal of Applied Geophysics 196
(2022), p. 104507.
[98] Rongtian Ye, Fangyu Liu, and Liqiang Zhang. “3D depthwise convolution: Reducing
model parameters in 3D vision tasks”. In: Advances in Artificial Intelligence: 32nd
Canadian Conference on Artificial Intelligence, Canadian AI 2019, Kingston, ON, Canada,
May 28–31, 2019, Proceedings 32. Springer. 2019, pp. 186–199.
[99] Jianbo Yu, Xing Liu, and Lyujiangnan Ye. “Convolutional long short-term memory
Autoencoder-based feature learning for fault detection in Industrial Processes”. In: IEEE
Transactions on Instrumentation and Measurement 70 (2021), pp. 1–15. doi:
10.1109/tim.2020.3039614.
[100] Jianbo Yu and Chengyi Zhang. “Manifold regularized stacked autoencoders-based
feature learning for fault detection in Industrial Processes”. In: Journal of Process Control
92 (2020), pp. 119–136. doi: 10.1016/j.jprocont.2020.06.001.
[101] Wei Yu, Yichao Lu, Steve Easterbrook, and Sanja Fidler. “Efficient and
information-preserving future frame prediction and beyond”. In: (2020).
[102] Tao Yuan, Gang Li, Zhaohui Zhang, and S. Joe Qin. “Deep causal mining for plant-wide
oscillations with multilevel Granger causality analysis”. In: 2016 American Control
Conference (ACC). 2016, pp. 5056–5061.
[103] H. Henry Yue and S. Joe Qin. “Reconstruction-based fault identification using a
combined index”. In: Industrial & Engineering Chemistry Research 40.20 (2001),
pp. 4403–4414. doi: 10.1021/ie000141+.
[104] Ang Zhang, Xiaoyong Zhao, and Lei Wang. “CNN and LSTM based encoder-decoder for
anomaly detection in multivariate time series”. In: 2021 IEEE 5th Information
Technology,Networking,Electronic and Automation Control Conference (ITNEC) (2021).
doi: 10.1109/itnec52019.2021.9587207.
[105] Shuyuan Zhang, Kexin Bi, and Tong Qiu. “Bidirectional recurrent neural network-based
chemical process fault diagnosis”. In: Industrial & Engineering Chemistry Research
59.2 (2019), pp. 824–834. doi: 10.1021/acs.iecr.9b05885.
[106] Haitao Zhao, Shaoyuan Sun, and Bo Jin. “Sequential fault diagnosis based on LSTM
Neural Network”. In: IEEE Access 6 (2018), pp. 12929–12939. doi:
10.1109/access.2018.2794765.
[107] Hang Zhao, Yujing Wang, Juanyong Duan, Congrui Huang, Defu Cao, Yunhai Tong,
Bixiong Xu, Jing Bai, Jie Tong, and Qi Zhang. “Multivariate time-series anomaly
detection via graph attention network”. In: 2020 IEEE International Conference on Data
Mining (ICDM). IEEE. 2020, pp. 841–850.
[108] Fangning Zheng, Atefeh Jahandideh, Birendra Jha, and Behnam Jafarpour. “Geologic
CO2 storage optimization under geomechanical risk using coupled-physics models”. In:
International Journal of Greenhouse Gas Control 110 (2021), p. 103385.
[109] Qinqin Zhu, Qiang Liu, and S Joe Qin. “Concurrent monitoring and diagnosis of process
and quality faults with canonical correlation analysis”. In: IFAC-PapersOnLine 50.1
(2017), pp. 7999–8004.
[110] Qinqin Zhu and S Joe Qin. “Supervised diagnosis of quality and process faults with
canonical correlation analysis”. In: Industrial & Engineering Chemistry Research 58.26
(2019), pp. 11213–11223.
[111] Qinqin Zhu, S. Joe Qin, and Yining Dong. “Dynamic latent variable regression for
inferential sensor modeling and monitoring”. In: Computers & Chemical Engineering 137
(2020), p. 106809.
116
Abstract
Data collected from modern industrial systems, including sectors such as manufacturing, renewable energy generation, and carbon storage, are inherently high-dimensional and dynamic. These datasets typically take the form of time series and spatio-temporal data, characterized by strong correlations among variables and autocorrelation in time. Consequently, despite the complexity and high dimensionality of these datasets, underlying patterns can be compactly represented by a set of latent variables that capture the dynamics of these systems in a reduced-dimensional latent space. This dissertation develops data-driven tools to extract and utilize these latent space dynamics in industrial systems, focusing on several key aspects: visualizing and interpreting high-dimensional data, monitoring conditions and detecting anomalies, and estimating and predicting system behavior.
In the first application, a workflow utilizing a linear dynamic latent variable model is proposed to visualize and interpret high-dimensional time-series data from chemical processes. The extracted dynamic latent features capture the dominant dynamics within the process data, facilitating the visualization of high-dimensional data and enhancing the understanding of complex process interactions. Furthermore, these dynamic latent variables are employed to identify and troubleshoot the root causes of abnormal plant-wide oscillations, with interpretations informed by model structure and domain knowledge.
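For illustration only, the sketch below shows one generic way such dynamic latent variables can be extracted from multivariate time-series data: a static projection (PCA) followed by a first-order autoregressive fit on each latent score. The data shape, the number of components, and the AR order are arbitrary assumptions, and this is a simplified stand-in rather than the dissertation's actual linear dynamic latent variable model.

```python
# A minimal, generic illustration (not the dissertation's exact algorithm):
# extract latent variables from multivariate time series via PCA, then fit a
# first-order autoregressive model to each latent score to capture dynamics.
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((500, 20))   # hypothetical data: 500 samples, 20 sensors
X = X - X.mean(axis=0)               # center each variable

# Static dimensionality reduction: keep the top-3 principal directions.
U, S, Vt = np.linalg.svd(X, full_matrices=False)
scores = X @ Vt[:3].T                # latent variables, shape (500, 3)

# Dynamic part: fit an AR(1) coefficient to each latent score by least squares.
phi = np.array([
    np.dot(scores[:-1, k], scores[1:, k]) / np.dot(scores[:-1, k], scores[:-1, k])
    for k in range(scores.shape[1])
])
pred = scores[:-1] * phi             # one-step-ahead latent predictions
residual = scores[1:] - pred         # prediction errors, usable for interpretation
print("AR(1) coefficients:", np.round(phi, 3))
```

Plotting the low-dimensional scores and their prediction residuals is what makes high-dimensional process behavior visualizable in practice.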
In the second application, an LSTM encoder-decoder model with regularized latent dynamics is designed to monitor operating conditions and detect anomalies in industrial systems. Normalization modules are incorporated to address the non-stationary nature of the data. Additionally, regularizing the latent states during training enables the hidden states of the LSTM encoder-decoder to define a low-dimensional latent space, effectively capturing the primary dynamics of the original high-dimensional data. Furthermore, monitoring indices are proposed to identify faults that disrupt normal dynamics and static relationships. As a result, these indices accurately reflect the actual operating conditions of the system, reducing false alarms by effectively distinguishing between faults and normal operational adjustments.
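As a rough, hypothetical sketch of this idea (not the dissertation's implementation), an LSTM encoder-decoder can be trained to reconstruct input windows while penalizing the magnitude of its latent states, and a squared-reconstruction-error index can then serve as a simple monitoring statistic. All layer sizes, the regularization weight `lam`, and the index definition below are assumptions.

```python
# A minimal PyTorch sketch of reconstruction-based monitoring with an LSTM
# encoder-decoder and a simple latent regularization term.
import torch
import torch.nn as nn

class LSTMEncoderDecoder(nn.Module):
    def __init__(self, n_vars=20, latent_dim=4):
        super().__init__()
        self.encoder = nn.LSTM(n_vars, latent_dim, batch_first=True)
        self.decoder = nn.LSTM(latent_dim, n_vars, batch_first=True)

    def forward(self, x):
        z, _ = self.encoder(x)      # hidden-state trajectory = latent dynamics
        x_hat, _ = self.decoder(z)  # reconstruct the input window
        return x_hat, z

model = LSTMEncoderDecoder()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
x = torch.randn(32, 50, 20)         # hypothetical batch: 32 windows of 50 steps
lam = 1e-3                          # latent regularization strength (assumed)

x_hat, z = model(x)
# Reconstruction loss plus an L2 penalty on the latent states, loosely
# mimicking the "regularized latent dynamics" idea described above.
loss = nn.functional.mse_loss(x_hat, x) + lam * z.pow(2).mean()
opt.zero_grad(); loss.backward(); opt.step()

# A simple SPE-style monitoring index: squared reconstruction error per window.
spe = (x - x_hat).detach().pow(2).sum(dim=(1, 2))
```

In a monitoring setting, a control limit on such an index would be estimated from fault-free data, and exceedances would flag departures from normal dynamics.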
In the third application, a spatio-temporal neural network model is proposed to learn the latent space dynamics in industrial applications involving high-dimensional 3D data with limited measurements. The model is specifically designed to estimate the dispersion and predict the migration paths of CO2 plumes in geological CO2 storage applications, which are characterized by spatially sparse and temporally infrequent data collection. The model continuously integrates various forms of field measurements into a concise latent space representation, capturing and learning the spatio-temporal dynamics of CO2 plume migration. As a result, the model is adept at handling diverse data inputs and managing varying measurement frequencies, which are critical for accurately estimating and predicting the behavior of 3D CO2 plumes in geological CO2 storage.
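A minimal sketch of this kind of architecture, under assumed inputs, might pair a 3D convolutional encoder that fuses the previous plume estimate with sparse measurements into a compact latent code, and a transposed-convolution decoder that produces the next 3D saturation field. The grid dimensions, channel counts, and measurement representation below are all hypothetical, not the dissertation's configuration.

```python
# A minimal sketch of a 3D encoder-decoder that maps the current state estimate
# (plus sparse measurements embedded as an input channel) to the next 3D field.
import torch
import torch.nn as nn

class PlumePredictor(nn.Module):
    def __init__(self):
        super().__init__()
        # Two input channels: previous saturation estimate + sparse observations.
        self.encoder = nn.Sequential(
            nn.Conv3d(2, 8, kernel_size=3, stride=2, padding=1), nn.ReLU(),
            nn.Conv3d(8, 16, kernel_size=3, stride=2, padding=1), nn.ReLU(),
        )
        self.decoder = nn.Sequential(
            nn.ConvTranspose3d(16, 8, kernel_size=4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose3d(8, 1, kernel_size=4, stride=2, padding=1),
        )

    def forward(self, prev_field, obs):
        z = self.encoder(torch.cat([prev_field, obs], dim=1))  # compact latent code
        return self.decoder(z)                                 # next saturation field

model = PlumePredictor()
prev_field = torch.zeros(1, 1, 32, 32, 16)  # hypothetical 32x32x16 grid
obs = torch.zeros(1, 1, 32, 32, 16)         # sparse well data mapped to the grid
next_field = model(prev_field, obs)
print(next_field.shape)                     # torch.Size([1, 1, 32, 32, 16])
```

Because new measurements enter only as an extra input channel, a loop of this kind can, in principle, absorb observations at irregular times while repeatedly advancing the latent state.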
Linked assets
University of Southern California Dissertations and Theses
Conceptually similar
Dynamic latent structured data analytics
Deep learning architectures for characterization and forecasting of fluid flow in subsurface systems
Modeling and predicting with spatial-temporal social networks
Optimization of CO2 storage efficiency under geomechanical risks using coupled flow-geomechanics-fracturing model
Advanced machine learning techniques for video, social and biomedical data analytics
Dynamical representation learning for multiscale brain activity
Physics-aware graph networks for spatiotemporal physical systems
Scaling up temporal graph learning: powerful models, efficient algorithms, and optimized systems
Human motion data analysis and compression using graph based techniques
Deep learning for subsurface characterization and forecasting
Theoretical and computational foundations for cyber-physical systems design
Incorporating aggregate feature statistics in structured dynamical models for human activity recognition
Analysis, design, and optimization of large-scale networks of dynamical systems
Data-driven learning for dynamical systems in biology
Learning logical abstractions from sequential data
A computational framework for diversity in ensembles of humans and machine systems
Deep learning models for temporal data in health care
Efficient control optimization in subsurface flow systems with machine learning surrogate models
Integration of multi-physics data into dynamic reservoir model with ensemble Kalman filter
Exploiting latent reliability information for classification tasks
Asset Metadata
Creator
Liu, Yingxiang
(author)
Core Title
Latent space dynamics for interpretation, monitoring, and prediction in industrial systems
School
Viterbi School of Engineering
Degree
Doctor of Philosophy
Degree Program
Electrical Engineering
Degree Conferral Date
2024-08
Publication Date
07/16/2024
Defense Date
05/01/2024
Publisher
Los Angeles, California (original), University of Southern California (original), University of Southern California. Libraries (digital)
Tag
industrial systems, machine learning, neural network, process monitoring and anomaly detection, time series data
Format
theses (aat)
Language
English
Contributor
Electronically uploaded by the author (provenance)
Advisor
Jafarpour, Behnam (committee chair), Nuzzo, Pierluigi (committee member), Ortega, Antonio (committee member)
Creator Email
lyxdydyydyy@gmail.com,yingxian@usc.edu
Permanent Link (DOI)
https://doi.org/10.25549/usctheses-oUC113997P1X
Unique identifier
UC113997P1X
Identifier
etd-LiuYingxia-13242.pdf (filename)
Legacy Identifier
etd-LiuYingxia-13242
Document Type
Dissertation
Rights
Liu, Yingxiang
Internet Media Type
application/pdf
Type
texts
Source
20240716-usctheses-batch-1183 (batch), University of Southern California (contributing entity), University of Southern California Dissertations and Theses (collection)
Access Conditions
The author retains rights to his/her dissertation, thesis or other graduate work according to U.S. copyright law. Electronic access is being provided by the USC Libraries in agreement with the author, as the original true and official version of the work, but does not grant the reader permission to use the work if the desired use is covered by copyright. It is the author, as rights holder, who must provide use permission if such use is covered by copyright.
Repository Name
University of Southern California Digital Library
Repository Location
USC Digital Library, University of Southern California, University Park Campus MC 2810, 3434 South Grand Avenue, 2nd Floor, Los Angeles, California 90089-2810, USA
Repository Email
cisadmin@lib.usc.edu