ROBUST REAL-TIME ALGORITHMS FOR
PROCESSING DATA FROM
OIL AND GAS FACILITIES
by
Alisha Deshpande
A Dissertation Presented to the
FACULTY OF THE USC GRADUATE SCHOOL
UNIVERSITY OF SOUTHERN CALIFORNIA
In Partial Fulfillment of the
Requirements for the Degree
DOCTOR OF PHILOSOPHY
(CHEMICAL ENGINEERING)
May 2017
Copyright 2017 Alisha Deshpande
DEDICATION
To mom and dad.
ACKNOWLEDGEMENTS
I would like to thank everyone who has helped me on the journey to my doctoral
degree. It would not have been possible without the support and inspiration of all the
mentors, colleagues, and friends that I have met during my years at USC.
First, I would like to express my gratitude towards my advisor, Dr. Joe Qin, who
welcomed me to his lab and whose knowledge and supervision have been critical during
my time here. I would also like to thank Dr. Qin for supporting my interests right from
the beginning and nominating me for the Chevron Fellowship, which would eventually
lead to an incredible collaboration with the company and the fascinating project that
forms this dissertation. Next I would like to thank all the members of my committee
particularly Dr. Iraj Ershaghi and Dr. Cyrus Shahabi for their guidance. I am also grateful
for their insights over the years as part of the Center for Smart Oilfield Technologies
(CiSoft). Thank you also to the Shahabi Group for their collaboration on the Advanced
Streaming Data Cleansing project.
I would like to express my immense gratitude towards Chevron and CiSoft for
their financial support during my graduate studies (through the Chevron Fellowship), all
of the data they provided for this project, as well as the extensive support system that
they created to help me succeed. In particular, I would like to thank Dr. Lisa Brenskelle
for her mentorship and technical expertise and Brian Thigpen for the consistent support. I
am also grateful to various others at Chevron, including Keith Hall, Yvonne Bishop and
Ana Jaquez for making my experiences in Houston so productive and for giving me the
exposure to both the industrial and the intellectual property/patent world.
The years at USC have been an incredible learning experience, and working with
the Qin Group has made the experience all the more enjoyable. Thank you to all the
members of the group, including Johnny Pan, Yining Dong, Qinqin Zhu, Gang Li, Zora
Zhang and Yuan Jin, for your friendship and your technical knowledge. I would also like
to express my appreciation towards the USC personnel who went above and beyond to
support me in my endeavors, specifically, Juli Legat, Andy Chen and Nicole Kerns.
Last, but certainly not least, I would like to thank my friends and family. To all
my friends at USC who went through all the ups and downs together with me, you all
rock! To my sister-in-law: thank you for your friendship and support. To my brother:
thank you for always cheering me on and for helping me keep perspective. To my father:
thank you for your constant encouragement and help and for introducing me to the world
of engineering. Finally to my mother: thank you for inspiring me. It is your love of
knowledge and incredible work ethic that led me here. This one’s for you, Dr. Deshpande.
TABLE OF CONTENTS
DEDICATION................................................................................................................... ii
ACKNOWLEDGEMENTS ............................................................................................ iii
LIST OF TABLES .......................................................................................................... vii
LIST OF FIGURES ....................................................................................................... viii
LIST OF ABBREVIATIONS ......................................................................................... xi
ABSTRACT ..................................................................................................................... xii
CHAPTER 1: INTRODUCTION .................................................................................... 1
1.1 Overview .............................................................................................................. 1
1.2 Process Data Analytics ......................................................................................... 3
1.3 What is Big Data? ................................................................................................ 5
1.4 Data-Driven Modelling ........................................................................................ 7
1.4.1 Univariate Process Monitoring and Analysis Methods ........................... 8
1.4.2 Multivariate Process Monitoring and Analysis Methods ....................... 10
1.5 Considerations for Oil and Gas Applications ..................................................... 11
1.6 Outline ................................................................................................................ 13
CHAPTER 2: APPLICATION OF THE WAVELET TRANSFORM FOR
UNIVARIATE DATA ANALYSIS ............................................................................... 15
2.1 Introduction ........................................................................................................ 15
2.2 The Wavelet Transform ..................................................................................... 16
2.2.1 Mallat’s Algorithm................................................................................. 17
2.3 Single Tag Cleansing ......................................................................................... 20
2.3.1 Data Cleansing ....................................................................................... 20
2.3.2 Fault Detection using the Wavelet Transform ....................................... 23
2.3.3 Reconstruction using Recursive Least Squares ..................................... 30
2.3.4 Challenges in Application to Oil and Gas Facilities .............................. 43
2.3.5 Complete Single Tag Data Cleansing Algorithm .................................. 47
2.3.6 Results .................................................................................................... 49
2.4 Automated Real-time Data Compression ........................................................... 54
2.4.1 Overview of Process Data Compression Techniques ............................ 54
2.4.2 Automated Real-time Wavelet-based Compression Algorithm ............ 56
2.4.3 Compression Results .............................................................................. 61
2.5 Summary ............................................................................................................ 73
2.5.1 Single Tag Cleansing ............................................................................. 73
2.5.2 Automated Real-Time Wavelet-based Compression ............................. 73
CHAPTER 3: ROBUST DYNAMIC PRINCIPAL COMPONENT ANALYSIS FOR
MULTIVARIATE DATA ANALYSIS ......................................................................... 75
3.1 Introduction ........................................................................................................ 75
3.2 Principal Component Analysis ........................................................................... 76
3.2.1 Classical Principal Component Analysis ............................................... 76
3.2.2 Dynamic Principal Component Analysis ............................................... 80
3.3 Robust PCA Methods ......................................................................................... 81
3.3.1 Robust Estimators .................................................................................. 81
3.3.2 Probabilistic Methods ............................................................................ 83
3.4 Principal Component Pursuit ............................................................................. 85
3.5 Application of PCP to Robustify DPCA ............................................................ 90
3.5.1 A Sequential Method ............................................................................. 91
3.5.2 An Integrated Method ............................................................................ 92
3.6 Case Studies: Application to Process Data ........................................................ 93
3.6.1 Data Sources and Performance Evaluation ............................................ 93
3.6.2 Case Study 1 .......................................................................................... 96
3.6.3 Case Study 2 ........................................................................................ 104
3.7 Recursion and Implementation in DPCA ......................................................... 113
3.7.1 Recursive DPCA .................................................................................. 114
3.8 Preliminary Results for Recursion ................................................................... 116
3.9 Summary .......................................................................................................... 120
CHAPTER 4: EXTENSIONS FOR PROCESS FAULT DETECTION
APPLICATION............................................................................................................. 121
4.1 Introduction ...................................................................................................... 121
4.2 Process Faults vs. Sensor Faults ....................................................................... 122
4.3 The Tennessee-Eastman Process ...................................................................... 123
4.4 Results .............................................................................................................. 125
4.5 Summary .......................................................................................................... 132
CHAPTER 5: CONCLUSIONS .................................................................................. 133
BIBLIOGRAPHY ......................................................................................................... 135
APPENDIX: WORKING EXAMPLE OF NOVEL WAVELET COMPRESSION
ALGORITHM ............................................................................................................... 141
LIST OF TABLES
Table 2.1: Original Wavelet Method with Evenly Sampled Data .................................... 44
Table 2.2: Suggested Strategy to Mitigate Uneven Time Sampling ................................. 44
Table 2.3: Pros and Cons of Repeated Value Strategy for Uneven Time Sampling ........ 44
Table 2.4: Numerical Results for Data Cleansing Results for Figure 2.16....................... 50
Table 3.1: Matrix Similarity Test...................................................................................... 96
Table 3.2: 4000 Samples for Training Data, 6000 for Testing, λ = 0.02 .......................... 97
Table 3.3: Same Number of Samples as Previous Test, λ = 0.022 ................................... 98
Table 3.4: Same as Test 1, but with 3000 Samples Training Data ................................... 98
Table A.1: Original Data for Working Example ............................................................ 141
Table A.2: Algorithm Steps with Working Example...................................................... 142
Table A.3: Compressed Data Stream from Working Example....................................... 146
LIST OF FIGURES
Figure 1.1: Data from a Single Piece of Process Equipment .............................................. 3
Figure 1.2: Incoming Data Streams from a Sample Facility .............................................. 4
Figure 2.1: The Haar Wavelet ........................................................................................... 17
Figure 2.2: Mallat's Algorithm for the Decomposition of a Signal [17] ........................... 18
Figure 2.3: Hard and Soft Thresholds ............................................................................... 19
Figure 2.4: Outlier Detection Using Wavelet Transform ................................................. 25
Figure 2.5: Results for Real-time Wavelet Fault Detection .............................................. 26
Figure 2.6: Frozen Value Identification ............................................................................ 28
Figure 2.7: Linear Drift Detection .................................................................................... 28
Figure 2.8: Real-time Detection of Frozen Values and Drift............................................ 29
Figure 2.9: RLS with Forgetting Factor ............................................................................ 30
Figure 2.10: RLS with Forgetting Factor for Prediction................................................... 32
Figure 2.11: RLS Algorithm for Various Forgetting Factors ........................................... 37
Figure 2.12: RLS Algorithm for Various Numbers of Previous Values (1) ..................... 38
Figure 2.13: RLS Algorithm for Various Numbers of Previous Values (2) ..................... 39
Figure 2.14: RLS Algorithm for Various Initial Values of Inverse Covariance Matrix ... 40
Figure 2.15: Consecutive Errors - "Zoomed In" ............................................................... 42
Figure 2.16: Hopping Window Algorithm Visualization ................................................. 46
Figure 2.17: Rolling Window Algorithm Visualization ................................................... 46
Figure 2.18: Fault Detection and Data Cleansing Algorithm Applied to Process Data ... 51
Figure 2.19: Reconstruction of a Fault Over Time using the RLS Model ........................ 52
Figure 2.20: Automated Real-Time Wavelet-based Compression Flowchart .................. 60
Figure 2.21: Data Compression to Two Levels Using Wavelet Transform ..................... 62
Figure 2.22: Relationship of Compression Ratio with Local Point Error......................... 63
Figure 2.23: Comparison of Compression with PI and Novel Algorithm ........................ 66
Figure 2.24: Compression of a Level Tag, Accuracy = 0.5%, Range = 35 in .................. 67
Figure 2.25: Compression of a Level Tag, Accuracy = 1%, Range = 35 in ..................... 68
Figure 2.26: Compression of a Pressure Tag, Accuracy = 0.25%, Range = 1500 psig .... 69
Figure 2.27: Compression of a Level Tag, Accuracy = 0.25%, Range = 1500 psig ........ 70
Figure 2.28: Compression of a Flow Tag, Accuracy = 0.5%, Range = 1800 BSPD ........ 71
Figure 2.29: Compression of a Flow Tag, Accuracy = 1%, Range = 1800 BSPD ........... 72
Figure 3.1: Effect of Lambda on the Detection and False Alarm Rates ......................... 100
Figure 3.2: Detection with Various DPCA/PCP Combinations ..................................... 101
Figure 3.3: Location and Magnitude of False Alarms .................................................... 102
Figure 3.4: Detection in Training Data after Cleansing with PCP ................................. 103
Figure 3.5: Raw Data for Case Study 2 .......................................................................... 105
Figure 3.6: Detection with Various DPCA/PCP Combinations ..................................... 106
Figure 3.7: Comparison in Reconstructions from PCP Methods to Raw Data ............... 108
Figure 3.8: Numerical Results for Integrated and Sequential PCP/DPCA Methods ...... 109
Figure 3.9: Raw Data: Outliers Added to Tags 2, 3, 5, 6, 7............................................ 110
Figure 3.10: Detection with Novel Robust DPCA (with Outliers) ................................. 111
Figure 3.11: Reconstruction of Data Tag with Outlier ................................................... 112
Figure 3.12: Numerical Results for PCP Methods - with Outliers ................................. 113
Figure 3.13: Fault Detection on Clean Data using Recursive DPCA ............................. 117
Figure 3.14: Fault Detection on Clean Data (Smaller Interval) using Recursive DPCA 118
Figure 3.15: Fault Detection in Faulty Data using Recursive DPCA ............................. 119
Figure 4.1: A Simplified Process Diagram of the Tennessee-Eastman Process ............. 124
Figure 4.2: DPCA Model for Correlated Variables in the Tennessee-Eastman Process 126
Figure 4.3: Detection of Sensor Faults Using DPCA Model Trained on Clean Data .... 128
Figure 4.4: Detection of Sensor Faults Using DPCA Model Trained on Faulty Data.... 129
Figure 4.5: Fault Detection in Tennessee-Eastman Process Using Robust DPCA ......... 130
Figure 4.6: Comparison between Raw Data and Robust DPCA Reconstruction ........... 131
Figure A.1: Comparison of Original Data with Archived Data for Working Example .. 147
LIST OF ABBREVIATIONS
ALM Augmented Lagrange Multiplier
ADM Alternating Directions Method
BSPD Barrels of Steam Per Day
CR Compression Ratio
CUSUM Cumulative Sum
CVA Canonical Variate Analysis
DPCA Dynamic Principal Component Analysis
EWMA Exponentially Weighted Moving Average
LPE Local Point Error
MA Moving Average
OWT Online Wavelet Transform
PCA Principal Component Analysis
PCP Principal Component Pursuit
PLS Partial Least Squares
RDPCA Robust Dynamic Principal Component Analysis
RLS Recursive Least Squares
SVD Singular Value Decomposition
ABSTRACT
Advancing sensor and data gathering technology has resulted in a substantial
increase in the amount and frequency of process data collected, creating a valuable
opportunity for data-driven modelling techniques in process monitoring. Since process
facilities are large interconnected systems, data collected may also be strongly correlated.
Therefore, this dissertation approaches the problem of data-driven modelling of process
data from two perspectives, first for univariate, uncorrelated data sets, and second for
multivariate, correlated data sets. Common methods for monitoring and analysis of
univariate data include the use of cumulative sums, moving average charts, interpolation
and other simple statistical methods. For correlated, multivariate data, methods such as
principal component analysis, partial least squares, and contribution plots are
conventionally used.
In this dissertation, the application of online wavelet transforms (OWT) to several
aspects of real-time analysis of univariate data was considered, namely, fault detection,
data cleansing and data compression. In the first application, the OWT is used to build a
model and detect faults in an uncorrelated tag (variable) which cannot use traditional
data-driven modelling that relies on underlying relationships between variables. The
OWT is successfully able to detect and identify various common sensor faults including
missing and frozen values, linear drift and spikes. Recursive least squares (RLS) is then
used to predict replacement values and is a significant improvement on current standards,
particularly with highly variable data. The new algorithm also updates statistics with each
new value, so decision making is accurate but does not require repeated calculations on
the whole data set. This application directly affects real-time decision making as faults
can be addressed immediately, and any predictive models built from the data will be
representative of real operation. The OWT is also adapted to create a novel data
compression algorithm to economically store large quantities of data in real-time. The
industry standard swinging door algorithm requires extensive manual input, in addition to
other drawbacks. The new method is an online filter that is completely automated, with
calculations unique to each tag.
There are numerous methods for building data-driven models for multivariate data
sets, including the commonly used principal component analysis (PCA) and its extension,
dynamic PCA (DPCA). However, there are several drawbacks to PCA and DPCA,
including the inability to handle any missing or corrupted data. Two new algorithms are
developed in this work by incorporating a robust PCA method called principal
component pursuit (PCP) into DPCA to create a robust DPCA method. PCP uses a
convex minimization problem to recover a fault-free matrix even when there are gross
sparse errors present, and is therefore more robust to outliers than other methods. The
PCP method is incorporated into DPCA in two ways. First, PCP and DPCA are combined
sequentially where PCP is used to generate a ‘fault free’ training data set as a
preprocessing step. In the second method, PCP is incorporated into DPCA to create an
integrated robust DPCA method. The two methods are compared to each other and to
traditional DPCA in two case studies using real process data. Both methods present a
significant improvement for fault detection over DPCA models trained on faulty data.
Extension of this robust DPCA method to a recursive method is then discussed. In the
final part of the dissertation, the extension of these methods from sensor fault detection to
process fault detection is discussed and simulated data from the Tennessee-Eastman
process is used to show the method’s effectiveness.
CHAPTER 1: INTRODUCTION
1.1 Overview
Vast amounts of data are generated in process facilities from sensors that record
information from all aspects of the process, including flows, temperatures, levels, etc.
Advances in sensor technology and improvements in data collection in the past few
decades have also substantially increased not only the type of information that can be
collected, but also its frequency. As a result, effective data storage and analysis have become pressing challenges. The unprecedented access to detailed
information about process facilities also presents an opportunity for building data-driven
models, where a system is represented only by the information it has created instead of
using first-principles. This is particularly valuable for large-scale processes, where
models based on physical information can become very complicated, and require
expertise and time to construct. Furthermore, reliable data-driven models can lead to
online analysis of incoming data so that sensor and process faults can be detected and
managed. Investigating patterns in this data can also help prevent future problems.
Therefore, data-driven modelling is a subject that has gained particular interest in the oil
and gas industry, though it was originally developed for and applied in the computing and technology industries.
In the rest of this chapter, the context and the background behind the data-driven
modelling techniques developed in this dissertation are discussed. The overall objective
of this research is to develop algorithms that can be seamlessly and practically integrated
into process data applications. The goal of these algorithms is to use incoming process
data as effectively as possible to optimize the operation of oil and gas facilities. In this
dissertation, several algorithms are developed and tested for a variety of tasks including
fault detection, data cleansing, compression, and reliable and robust model-building.
First, the field of process data analytics is introduced, along with its current
challenges. Then various methods of data-driven modelling currently in use are
presented, first for univariate data sets, then for multivariate data sets. For univariate data
sets, these include the use of basic statistics like interpolation as well as monitoring
techniques such as cumulative sum charts. For multivariate data sets, data-driven
modelling techniques such as partial least squares (PLS) and canonical variate analysis
(CVA) are briefly introduced prior to discussing principal component analysis (PCA), the
focus of the work in this dissertation. Additionally, potential issues specific to the
application of these modelling techniques to oil and gas facilities are presented. Finally
an outline of the complete dissertation is provided.
1.2 Process Data Analytics
Consider the single piece of process equipment shown in Figure 1.1: a heater.
This equipment has sensors that could be generating all of the information labeled in the
figure, in real time, all the time, sometimes with a new value recorded every second or every few seconds.
Figure 1.1: Data from a Single Piece of Process Equipment
If we extend this now to just a very small part of a facility, suddenly the volume
of data becomes much larger (Figure 1.2). Note that only basic information like
temperature, pressure and flow rate is labeled in Figures 1.1 and 1.2. The amount of
information that can be collected these days because of improvements in sensor
technology is incredible, however, it is also overwhelming. Once we have the data,
several important questions must be answered: how do we model the data? How do we
actually use this data before it ends up in the proverbial data graveyard? How do we use
it to understand the system and predict future events? Using this vast amount of data
effectively requires not only advanced analytical tools but also the computing technology
and capability to actually extract useful information from the data [1].
Figure 1.2: Incoming Data Streams from a Sample Facility
For processes with well understood mechanisms, such as those in process
facilities, physical relationships and thermodynamic principles are used to create models
of the systems. However, while this is straightforward for smaller systems, it requires
time and expertise to create and interpret these models on a large scale. Instead, data can
be used to track these processes in real-time and follow changes in the operation. These
data are often the only reliable source of information during uncertain or abnormal
situations [1].
The field of process data analytics provides the answer to the question: what if we
thought of a facility and the information collected within it not as a process problem, but
as a data problem? In this field, we either adapt current techniques or develop novel
mathematical methods for streaming data applications. Current ideas in machine learning,
statistics, computer science and many other fields can be considered. In recent times, “big
data” techniques have been developed and investigated, particularly in the technology
industry [2]. It is these techniques that may be best suited to adapt older process data
analytics methods to meet the data needs of today. The next section introduces the idea of
big data and why these methods are well suited to a process data application.
1.3 What is Big Data?
Big data, as the name suggests, refers to very large volumes of data that traditional
computing methods have difficulty in managing and analyzing. Process data, with
increasingly advanced collection strategies, better sensors, and increased computing
power, can be investigated as a big data problem. In fact, challenges faced in process data
analytics fall neatly under what are referred to as the four Vs of big data: volume, variety,
velocity and veracity [2]:
1. Volume
As shown in the previous section, better technology in process facilities has led to
the collection of massive data sets. Strategies must be developed to glean useful
information from this data (or find the ‘needle in the haystack’). Current process data
analytics techniques are hindered by their reliance on smaller known data sets that can be
used to create local/process specific models. Instead, the ability to use all of the incoming
data and historical data available (as is done in other applications) can lead to improved
decision making.
2. Variety
Big data consists of a wide variety of data. For example, a large-scale internet
company may retain a wide variety of data from their users (such as audio, video, text
etc.). Process data is analogous to this in the sense that data is collected from a wide
variety of sensors, equipment and processes. The data streams themselves may include
time series data, text, logs, lab data and so on. The capability of handling all of this data
at once could lead to an improved understanding of the overall process as it occurs in real
time.
3. Velocity
Velocity refers to the effective use of big data as it is collected in real-time (likely
at varying frequencies). For example, instantly sorting through incoming data to intercept
fraud or imminent danger is a concern for security. Similarly, while data is constantly
being collected at process facilities, the ability to track and predict abnormal situations in
real-time is critical in maintaining the safety and productivity of the system.
4. Veracity
Traditional data analysis and modelling requires the presence of ‘clean’ data that
can be used to reliably represent normal or expected scenarios. However, in the real
world this is not necessarily the case. Big data methods attempt to use data sets as-is, that
is, in spite of any errors or problems that may be present in the data, to generate models
that are still reliable. This is possible because the sheer volume of data can provide a
general understanding of the important features regardless of any imperfections that may
be present. In the process data application, collected data may have faults or errors due to
a variety of reasons including damaged sensors or missing values. Currently, this data is
ignored, and instead, analytical techniques require meticulously finding and using only
clean data to generate models. Approaching process data with big data techniques can
allow for reliable model-building regardless of the veracity of the data.
From the four Vs of big data and their direct applications to process data
analytics, it follows that we should be applying big data techniques to this process data challenge.
1.4 Data-Driven Modelling
This section provides a brief review of statistical techniques currently being used
for the process data application. As with the rest of the dissertation, the discussion is
divided into two main topics by data type. First, univariate process monitoring and
analysis methods will be covered including various charts, the use of residuals and
autocorrelation. Univariate techniques, though not necessarily developed for the analysis
of uncorrelated data, will be applied to those variables in this work. Then, a discussion of
more advanced multivariate methods including PLS, CVA and PCA follows. These
methods will be applied to correlated data. A combination of the univariate and
multivariate data modelling techniques will be required to encompass all the collected
data in a process facility.
1.4.1 Univariate Process Monitoring and Analysis Methods
The performance of a process is often evaluated by tracking the changes in
important process variables [3]. Informally, this can be done by observing trends,
variations and abnormal behavior of these variables to collect information about the
overall process. That is not to say that these variables are uncorrelated with others, in
fact, in classical process monitoring, they may be key in understanding the behavior of
other parts of a process. However, the automated analysis of univariate data that is
uncorrelated depends on the ability to develop methods that solely consider the variable
being investigated, and therefore it is valuable to consider these classic techniques.
Univariate statistical process monitoring methods all share a similar goal: to carry out, on a continuous basis, a hypothesis test that determines whether the mean and variance of the variable have changed significantly. Graphical procedures such as the Shewhart
chart, Cumulative Sum or CUSUM chart, moving average or MA chart, and
exponentially weighted moving average or EWMA chart are all examples of univariate
monitoring techniques. In Shewhart charts, the current measurement is compared to
control limits, while in CUSUM charts, all the recorded measurements are considered to
determine whether the current measurement is acceptable. Both the MA and EWMA
charts focus on recent measurements, with the former dropping old measurements and the
latter gradually down-weighting them. Details on these charts and the specific hypothesis
tests associated with each can be found in Chemical Process Evaluation [3].
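To make these recursions concrete, the following minimal sketch implements a two-sided tabular CUSUM and an EWMA filter; the target, slack (k), decision interval (h), and smoothing constant are arbitrary illustrative values, not parameters taken from this work.

```python
import numpy as np

def cusum(x, target, k=0.5, h=5.0):
    """Two-sided tabular CUSUM; returns sample indices where either cumulative
    sum exceeds the decision interval h."""
    s_hi = s_lo = 0.0
    alarms = []
    for i, xi in enumerate(x):
        s_hi = max(0.0, s_hi + (xi - target) - k)   # accumulates upward shifts
        s_lo = max(0.0, s_lo - (xi - target) - k)   # accumulates downward shifts
        if s_hi > h or s_lo > h:
            alarms.append(i)
    return alarms

def ewma(x, lam=0.2):
    """Exponentially weighted moving average: recent samples carry the most
    weight, and older samples are gradually down-weighted."""
    z = np.empty(len(x))
    z[0] = x[0]
    for i in range(1, len(x)):
        z[i] = lam * x[i] + (1 - lam) * z[i - 1]
    return z

# A simulated mean shift after sample 100 shows up in both statistics.
rng = np.random.default_rng(0)
data = np.concatenate([rng.normal(50, 1, 100), rng.normal(52, 1, 100)])
print(cusum(data, target=50.0)[:3])      # first alarms occur shortly after the shift
print(round(float(ewma(data)[-1]), 2))   # the EWMA settles near the new mean
```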
Additional univariate methods of particular interest to this application are those
that consider autocorrelation within the variable. In these cases, predicted values could
first be calculated for each new measurement and compared with the actual measurement
to get an error or ‘residual’. Changes in the system could also be monitored by using
recursive least squares (RLS) and tracking changes in the model parameters. RLS is extremely valuable
in fault reconstruction, and is discussed in greater detail in Chapter 2.
Finally, wavelet transforms, like Fourier or Laplace transforms, are used to
convert a signal into coefficients that represent the data. These can be used to transform large data sets; however, in their basic form, they operate on one variable at a time. The
coefficients calculated from the wavelet transform can also provide valuable information
about the data as they represent the data in a different way. For example, wavelet
transforms have been applied for signal denoising since it has been shown that detail
coefficients of the wavelet transform represent noise and large values are outliers. These
properties make wavelet transforms a valuable tool for process data analysis of single
variables [4], [5].
While univariate techniques are simple, both in terms of computational
complexity and the interpretation of results, there are several limitations. As mentioned,
process facilities are integrated, connected systems and groups of variables can be
strongly correlated to one another. For example, a fault that is detected in one variable
may propagate through other variables. With univariate techniques, this would be
incorrectly detected as a process change, or there would be numerous false alarms with
no way of knowing where the problem actually originated. Simply put, these methods do
not allow the engineer or operator to see the big picture. Multivariate techniques not only allow the user to see the big picture; combined with improved computational capability and better access to data, they can also provide a thorough real-time understanding of the process while letting the computers do all of the calculations [3].
1.4.2 Multivariate Process Monitoring and Analysis Methods
There are numerous methods that are used for multivariate analysis of process
data, including the development of data-driven models and real-time process monitoring.
In this section, several of these methods are discussed, including regression methods,
CVA, contribution plots, PLS and finally, PCA.
Regression models can be built by developing a relationship between certain sets
of variables (for example, measurements), and other variables (for example, quality
variables). Numerous regression techniques exist, ranging from basic linear regression to other forms such as nonlinear regression and stepwise regression, which handles collinear
data [3]. Canonical correlation analysis is a multivariate technique where a relationship is
developed between two sets of variables. The division of these variables is chosen based
on the purpose of the study/analysis. CVA can be viewed as a more specific form of this in which one set of variables serves as the indicator variables [6].
Multivariate methods, similar to univariate methods, also directly use process
measurements to determine whether a change or fault has occurred. Process monitoring
techniques use the T² and SPE statistics (discussed in Chapter 3); however, these statistics
do not indicate which variable caused the problem. Contribution plots use these statistics
but compare the contribution of each variable to the change, therefore allowing the
operator to find the source of the problem. The determination of variable contributions to
the T² statistic was developed in MacGregor and Miller [7], [8].
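As a simplified illustration of these statistics, the sketch below computes a Hotelling T² value for a single sample from a covariance model fitted to made-up training data, together with a per-variable decomposition whose terms sum to T². The PCA-based contribution plots developed in [7], [8] are analogous but are defined in the score space of a PCA model, so the decomposition used here is an illustrative stand-in rather than the exact formulation of those references.

```python
import numpy as np

def t2_and_contributions(x, mu, S_inv):
    """Hotelling T^2 for one sample, plus a simple per-variable decomposition
    c_j = z_j * (S^-1 z)_j whose elements sum to T^2."""
    z = x - mu
    w = S_inv @ z
    return float(z @ w), z * w

# Made-up training data define 'normal' behaviour; a disturbed sample is scored.
rng = np.random.default_rng(0)
X = rng.multivariate_normal([0, 0, 0], [[1, 0.8, 0], [0.8, 1, 0], [0, 0, 1]], 1000)
mu, S_inv = X.mean(0), np.linalg.inv(np.cov(X, rowvar=False))
T2, contrib = t2_and_contributions(np.array([0.1, 3.0, 0.2]), mu, S_inv)
print(round(T2, 1), np.round(contrib, 1))   # the second variable dominates T^2
```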
Another empirical model-building technique is PLS, which was developed for
chemometrics by Wold [9]. Similar to CVA, the goal is to extract the relationship
between two sets of variables, however, in this case, PLS is a latent (not directly
observed) variable model. The latent variables are selected so that the variation in the
process variables that is most predictive of the quality variables is extracted. It is a linear
modelling method though nonlinear extensions have been developed [3].
Finally, PCA is one of the most common methods of data analysis. It converts a
set of correlated observations into a set of linearly uncorrelated variables called principal
components (PCs) using an orthogonal transformation. It is also a dimensionality
reduction technique, so in addition to modelling data, it can represent data compactly
[10]. This method is further investigated and extended in the multivariate data analysis
section of this dissertation (Chapters 3 and 4), where its specific advantages and
disadvantages are discussed.
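The following minimal sketch, using made-up data driven by two latent factors, illustrates both properties at once: the principal component scores produced by the orthogonal transformation are mutually uncorrelated, and a small number of them represents the correlated data almost exactly. It is an illustration of the general technique, not of the specific models developed later in this work.

```python
import numpy as np

rng = np.random.default_rng(0)
# Two latent factors drive six measured variables (plus a little noise), so the
# measurements are strongly correlated and well captured by two PCs.
latent = rng.normal(size=(300, 2))
X = latent @ rng.normal(size=(2, 6)) + 0.05 * rng.normal(size=(300, 6))

Xc = X - X.mean(0)                       # mean-center the data
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
k = 2
scores = Xc @ Vt[:k].T                   # principal component scores
X_hat = scores @ Vt[:k]                  # compact rank-k representation

print(np.round(np.corrcoef(scores.T), 3))         # ~identity: PCs are uncorrelated
print(round(float(np.abs(X_hat - Xc).max()), 3))  # small error relative to the signal
```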
1.5 Considerations for Oil and Gas Applications
There are several important considerations for developing data-driven modelling
techniques for applications in the oil and gas industry. As discussed earlier, oil and gas
facilities are classically modelled using the physical and thermodynamic relationships
within the system. The objective of using data-driven techniques is to require as little
knowledge of the process as possible. To achieve this, models must ideally be generated
without requiring any subjective inputs. Furthermore, it is important to build models that
can create clear, easily understandable results. This way, models can be easily
incorporated into operational systems and interpreted quickly by both engineers and
operators without requiring extensive knowledge of the physical process.
Data collection in process facilities can also present a challenge for data-driven
modelling. Extremely large volumes of time series data are collected at varying
frequencies. Therefore, challenges that arise from handling these data in real-time must
be addressed. The model-building process cannot be too computationally complex and
expensive, since the ultimate goal is to process data in real time. Data-driven models
must also be able to adapt and change over time, as most processes are dynamic, and a
stationary model would view the changes as faults, causing false alarms.
Another major challenge is that the incoming data can suffer from problems that commonly occur in the collection of process data, such as faulty or missing values caused by sensor faults. For example, malfunctioning sensors can cause frozen values
(where a reading will be stuck on a certain value), drift, outliers, and can even stop
reporting altogether for short periods of time. This type of data corruption results in
unreliable models, which will in turn result in incorrectly set alarms. False alarms are
tedious and operators will eventually learn to ignore and work around these. Clearly, a
model that is unreliable and cannot differentiate between normal and faulty data could
lead to dangerous results. Therefore, data-driven modelling techniques must be used
together with online data cleansing methods, which are used to detect and replace faults
in the data. Models built with data cleansing capabilities will ideally result in more
successful detection of faults and can be used to predict and prevent abnormal situations
which could result in safety issues and loss of production.
In process facilities, particularly in those related to oil production, the equipment
is designed to handle very high pressures and temperatures, and the ability to make
accurate, quick decisions is vital for maintaining safety and standards. While the sensor
faults that create data analysis challenges can be managed after they occur, they can be
considered as minor problems that do not require immediate attention in the physical
facility. However, process faults, where a problem occurs in the physical system, can at
the least cause process upsets and loss of production, and at the most, catastrophic failure
of the system. Therefore, another important extension of modelling process data is to be
able to differentiate between a sensor fault and a process fault, or at least to diagnose
the fault to some extent.
1.6 Outline
This section details the organization of the rest of this dissertation. The first part
of the dissertation (Chapter 2) discusses novel ways of applying the wavelet transform for
the analysis of univariate data. The second part of the dissertation (Chapters 3 and 4) discusses the application and development of methods using principal component analysis
and principal component pursuit for robust analysis of multivariate data. The work
presented in this dissertation is compiled from the works referenced here [11], [12], [13],
[14], [15].
In Chapter 2, the mathematical background for the wavelet transform and its use
in data compression and cleansing is first introduced. Next, the development of the single
tag cleansing algorithm is discussed by dividing it into two parts: the fault detection
portion and the reconstruction portion. The online wavelet transform (OWT) is used for
the detection portion, while recursive least squares is used for the reconstruction. Specific
challenges within the oil and gas industry as they pertain to this algorithm are then
discussed. In the second part of the chapter, a new algorithm for automated wavelet-
based data compression is developed and tested.
In Chapter 3, the work moves to model building of multivariate data sets. The
step-by-step development of a new robust dynamic PCA method is outlined. First, a
mathematical introduction to PCA, along with a discussion of its advantages and
limitations, is presented. Then a brief review of robust extensions to the original method is provided. PCP, the method chosen in this work, is then discussed. Two new algorithms using PCP to robustify DPCA are presented. Results are shown using two case studies,
both with real process data. In the final part of the chapter, the goal is to extend the robust
DPCA method to also be recursive. The already developed recursive PCA and its
extension to recursive DPCA are explored.
In Chapter 4, the possible extension of these multivariate methods from sensor
fault detection to process fault detection is explored. The method is tested using data
from the Tennessee-Eastman simulation.
Chapter 5 summarizes the conclusions from the dissertation and potential future
directions for the research work.
CHAPTER 2: APPLICATION OF THE WAVELET TRANSFORM
FOR UNIVARIATE DATA ANALYSIS
2.1 Introduction
Chemical process data are vital for the control and operation of the process
facilities where they are collected. The ability to accurately and automatically cleanse and
compress data in real-time could potentially provide users with unprecedented access to a
process while using as few resources (including human hours and server space) as
possible. However, as discussed in Chapter 1, many of these data streams may not have
strong correlations with other variables, forcing the user to rely on univariate process
monitoring and analysis techniques. Several examples of commonly used univariate
techniques were discussed in Chapter 1.
In this chapter, the application of wavelet transforms to both online fault detection
and data compression is discussed, with new algorithms defined for each with a direct
application to real-time process data analysis. First, the wavelet transform is introduced
and background information is provided on wavelet decomposition, and why it is
advantageous over simpler univariate techniques for this application. Then, a novel
algorithm for real-time fault detection and data cleansing is described, and results are
provided using real process data, along with a brief discussion on the selection of initial
parameters and specific challenges that may be faced in the application of this
algorithm for process data analysis. Next, a novel algorithm for automated real-time data
compression is presented, once again with results using actual process data. Finally, a
summary of the work is provided in the last section.
2.2 The Wavelet Transform
An overview of the wavelet transform is provided in this section. A wavelet is an
oscillation that starts at zero, increases and decreases back to zero. There are two
formulations, as in other data transforms: continuous and discrete. The mathematical
definition of the continuous wavelet transform is shown in equation (2.1). There are two
parameters: “a” is the scale or dilation parameter which corresponds to the frequency
information and “b” relates to the location of the wavelet function as it is shifted through
the signal, and thus corresponds to the time information.
$$X(a,b) \;=\; \frac{1}{\sqrt{a}} \int_{-\infty}^{\infty} \varphi\!\left(\frac{t-b}{a}\right) x(t)\,dt \qquad (2.1)$$
The discrete wavelet transform converts discrete signals into coefficients using
differences and sums instead of defining parameters. Various types of wavelets have been
developed and adapted to specific applications; however, the analysis here uses the most basic, the Haar wavelet, which is shown in Figure 2.1. The equation for the Haar wavelet is shown in equation (2.2). It is clear that substituting equation (2.2) into equation (2.1)
will result in a discrete function.
$$\varphi(t) \;=\; \begin{cases} 1, & 0 \le t < 1/2 \\ -1, & 1/2 \le t < 1 \\ 0, & \text{otherwise} \end{cases} \qquad (2.2)$$
It is important to notice, however, that unlike the Fourier transform, which has
only one basis function, there are infinitely many possible choices for basis functions in wavelet
transforms. In addition, the wavelet transform differs from a Fourier transform as it is
localized in both time and frequency, while the latter is localized only in frequency [16].
Figure 2.1: The Haar Wavelet
2.2.1 Mallat’s Algorithm
A discrete method called Mallat’s algorithm is used to illustrate the effect of a one-dimensional wavelet transform and how it can be applied to data compression [17]. According to Mallat’s algorithm, data is decomposed using a high-pass filter, which results in a set of detail coefficients, and a low-pass filter, which results in a set of
approximation coefficients. The data can be further decomposed by down-sampling by a
factor of two and then further dividing the approximation coefficients into a new set of
detail and approximation coefficients. This process is outlined in Figure 2.2. In the figure,
G and H refer to the high and low frequency filters, respectively, d(n) are the detail
coefficients and ↓ 2 refers to the down-sampling step.
Figure 2.2: Mallat's Algorithm for the Decomposition of a Signal [17]
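As a concrete illustration of a single decomposition step, the minimal sketch below implements one level of the Haar filter bank and its inverse; repeatedly applying the same step to the approximation coefficients would produce the deeper levels shown in Figure 2.2. It is a sketch of the standard transform, not the implementation used in this work.

```python
import numpy as np

def haar_level1(x):
    """One level of the Haar DWT: pairwise low-pass (approximation) and
    high-pass (detail) filtering followed by down-sampling by two."""
    x = np.asarray(x, dtype=float)
    even, odd = x[0::2], x[1::2]
    approx = (even + odd) / np.sqrt(2)   # low-pass output
    detail = (even - odd) / np.sqrt(2)   # high-pass output
    return approx, detail

def haar_inverse_level1(approx, detail):
    """Perfect reconstruction of the signal from its level-1 coefficients."""
    even = (approx + detail) / np.sqrt(2)
    odd = (approx - detail) / np.sqrt(2)
    x = np.empty(2 * len(approx))
    x[0::2], x[1::2] = even, odd
    return x

x = [4.0, 6.0, 10.0, 12.0, 8.0, 8.0]
a, d = haar_level1(x)
print(a, d)                        # the detail coefficient is 0 where a pair is equal
print(haar_inverse_level1(a, d))   # recovers the original signal exactly
```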
The idea here is to decompose the data, remove the noise and then reconstruct the
denoised data without losing trends and important outliers. The low frequency
approximation coefficients are not modified prior to reconstruction since they include the
“important information”. Conversely, the detail coefficients are composed of the noise in
the system. A threshold is determined for these, and detail coefficients with a magnitude
below a certain number are eliminated (this gets rid of noise but keeps significant
variations – like important outliers – so that the reconstructed data retains accuracy).
There are two types of thresholds (hard and soft) which use different methods of
determining whether data is kept or eliminated. In a hard threshold, data outside the threshold is kept
the same. Conversely, in a soft threshold, data outside the threshold is adjusted by
“shrinking” it by the amount of the threshold (see Figure 2.3).
Figure 2.3: Hard and Soft Thresholds
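The two rules can be written compactly, as in the sketch below (with an arbitrary threshold of 1.0): the hard threshold zeroes small detail coefficients and keeps the rest unchanged, while the soft threshold additionally shrinks the surviving coefficients toward zero by the threshold amount.

```python
import numpy as np

def hard_threshold(d, thr):
    """Keep detail coefficients whose magnitude exceeds thr; zero the rest."""
    return np.where(np.abs(d) > thr, d, 0.0)

def soft_threshold(d, thr):
    """Zero small coefficients and shrink the remaining ones toward zero by thr."""
    return np.sign(d) * np.maximum(np.abs(d) - thr, 0.0)

d = np.array([-0.3, 0.1, 2.5, -4.0, 0.05])
print(hard_threshold(d, 1.0))   # small values zeroed; 2.5 and -4.0 kept as-is
print(soft_threshold(d, 1.0))   # small values zeroed; 2.5 -> 1.5, -4.0 -> -3.0
```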
While decomposition of the signal to further levels can provide insight, valuable
information can be gleaned from the first level coefficients. First, it is known that a signal
can be perfectly reconstructed from its approximation and detail coefficients [18].
Further, the originally developed method for signal denoising using wavelets showed that
detail coefficients of the wavelet transform represent noise and large values can be
considered as outliers [5]. Finally, determining first level coefficients using a Haar
wavelet requires at the minimum only two data points, which allows for the application
of the wavelet transform to real-time applications (often referred to as the online wavelet
transform (OWT)). Therefore, the methods developed and described in this dissertation
exploit the use of these first level wavelet coefficients in the application of real-time
decision making for both fault detection and online data compression.
2.3 Single Tag Cleansing
2.3.1 Data Cleansing
Prior to analyzing and storing data, the data must first be cleansed. Data cleansing refers
to developing methods that identify faults in the data and either eliminate them or
calculate appropriate replacements for them based on historical data (as well as expected
correlations between variables if dealing with a multivariate case). Faults, or variations
from the accepted or expected range of data, can arise from sensor failures such as
freezing at a certain value, biases and drifts, or missing values and outliers [19]. Methods
that identify sensor faults can also facilitate the building of models that identify if future
incidents are the result of process upsets or sensor faults.
In the field of process monitoring, correlations between various variables are used
to develop fault detection and diagnosis techniques. However, the variables that correlate
must first be found through a clustering procedure. While clustering is very effective,
there may be some “lone” variables that show no strong correlation to any other variable.
In this case, single tag cleansing must be applied separately to detect and replace faults in
the data. Common methods for the cleansing of univariate data, or “single tag cleansing”,
as it is referred to from here on, involve the use of thresholds to detect outliers, which are
then replaced by means, moving averages and so on.
There are several commonly used methods in the field of fault detection of
univariate data in chemical facilities [20]. Briefly discussed in Chapter 1, these include
various charts, simple regression, autocorrelation and the use of residuals. In the
presented method, the properties of the coefficients from the wavelet transform are
exploited to detect, and in some cases, diagnose faults in single variables.
The reconstruction of data after the faults have been detected can be carried out
by using an adaptive filter like recursive least squares. Recursive least squares (RLS) is
an algorithm that recursively minimizes a weighted cost function that relates to the input.
This algorithm can be adapted to include a forgetting factor (between 0 and 1) that is used
to determine how much the previous information should be weighted; a very small
forgetting factor results in the retention of only the most recent samples. Finally, since
the application is specific towards single tag cleansing, the RLS algorithm is written in an
autoregressive manner, that is, only the relationships between previous samples in a
single variable are used. This application of RLS is inherently an “online” application as
it only considers previous data, and therefore, does not need to be modified in the same
way as the wavelet-based fault detection (which needs to be modified from a semi-batch
to real-time methodology).
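A minimal sketch of such an autoregressive RLS recursion with a forgetting factor is given below; the model order, forgetting factor, and initialization of the inverse covariance matrix are illustrative choices, not the settings used in this work. Each new sample is first predicted from the previous few samples (the prediction serves as a candidate replacement value), and the model parameters are then updated recursively.

```python
import numpy as np

class ARRecursiveLeastSquares:
    """Autoregressive one-step-ahead predictor updated by RLS with a forgetting
    factor lam (0 < lam <= 1); a smaller lam forgets old samples faster."""

    def __init__(self, order=3, lam=0.98, delta=100.0):
        self.theta = np.zeros(order)       # autoregressive coefficients
        self.P = delta * np.eye(order)     # inverse-covariance estimate
        self.lam = lam

    def _phi(self, history):
        # Regressor: the most recent 'order' samples, newest first.
        return np.asarray(history[-len(self.theta):], dtype=float)[::-1]

    def predict(self, history):
        return float(self._phi(history) @ self.theta)

    def update(self, history, y):
        phi = self._phi(history)
        k = self.P @ phi / (self.lam + phi @ self.P @ phi)   # gain vector
        err = y - phi @ self.theta                           # prediction error
        self.theta = self.theta + k * err
        self.P = (self.P - np.outer(k, phi @ self.P)) / self.lam
        return err

# Usage: predict each new sample from the previous three, then update the model.
rng = np.random.default_rng(0)
x = np.cumsum(rng.normal(size=500))        # a slowly drifting univariate signal
rls = ARRecursiveLeastSquares(order=3, lam=0.98)
for t in range(3, len(x)):
    y_hat = rls.predict(x[:t])             # candidate replacement if x[t] is faulty
    rls.update(x[:t], x[t])
```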
In this section, a novel method of single tag cleansing is described and
demonstrated. The single tag cleansing is carried out in two steps. First, the wavelet
transform is used to detect, and in some cases, identify, the faults in a single process
variable. Then, an autoregressive form of the recursive least squares algorithm with a
forgetting factor is applied for data reconstruction. While the two methods are well
known separately, they have not previously been combined in the context of process data
cleansing.
Both the error detection and reconstruction parts of the algorithm can be
compared with simple operators that are currently in use. For error detection, this
algorithm is compared with simple operators such as the setting of thresholds with no
data transformation, searching for “0” values as well as using the variance to determine
flat-lines. While the wavelet algorithm proves to be as effective as the simple operators, its major benefit is that it uses fewer user-defined parameters and can detect
multiple types of faults using one of two methods. For example, outliers, spikes, null
values and missing values are all detected using one single method and only one user
defined parameter, the error threshold. Linear drifts and flat-lines can also be detected by
a slightly different application of wavelets in the same way, although they require the use
of slightly larger windows of data (more on this later). This can also be extended to semi-
batch and online implementations. The real-time detection of faults using this particular
wavelet-based method has not been explored previously, to our knowledge.
The use of RLS with a forgetting factor for reconstruction effectively replaces all
the simple operators (Substitute Fixed Value, Substitute Last Good Value, Interpolate,
Substitute Mean). The user defines the number of previous values to use in the RLS
model, as well as the importance that must be placed on the previous data via the
forgetting factor (e.g. if the data is highly non-stationary, focus on the recent values
only). There is clearly an ideal combination of the two parameters (which can be seen
when the reconstruction of an artificially created fault is compared to the actual value),
but the optimization of the two parameters in a general setting is not straightforward. It
can be noted that the use of only two previous values while forgetting nearly all the data
results in a simple linear extrapolation.
2.3.2 Fault Detection using the Wavelet Transform
In this method, a Haar wavelet transform is applied to incoming process data to
determine approximation and detail coefficients. The use of wavelet coefficients in signal
denoising has been well developed and used in many applications, including image
analysis, physiological sound analysis and meteorological data [21]. In these methods,
denoising is carried out by setting a threshold on the detail coefficients; small values are
considered noise and large values as outliers. Further, the wavelet transform also retains
the time relationship of the data and thus can be used to find time dependent patterns in
data. The ability to use the coefficients to detect outliers and time-dependent patterns can then be exploited for fault detection [22]. A search of the literature shows that
a few authors have attempted various versions of these and other strategies in an attempt
to effectively use wavelet transforms for data cleansing [5], [23], [24]. However, in large
part, these methods are batch analysis methods that require decomposition of large
amounts of data in simultaneous calculations, which would not translate well to real-time
fault detection.
Therefore, a method developed by Misra et al. using the OWT for real-time
compression of process data is applied and extended [4]. In the original method, wavelet
decomposition is carried out only on small windows of data (with a minimum of two data
points), which reduces buffering of the incoming data. However, data is saved as
coefficients and needs post-processing to be recovered. In the extension developed in this work, fault detection and cleansing are done directly on the incoming data, eliminating the
need to save or process any additional information beyond the data itself. A further
advantage is that this method is completely automated and can be carried out in real-time
parallel to the actual data collection with no user involvement.
In the fault detection step of the algorithm, first level detail coefficients are
determined using new pairs of data points (direct application of the OWT). A running
calculation of the mean and standard deviation of these coefficients is also updated, and a
cut-off threshold is determined based on a setting that combines the accuracy and range
of an instrument. This setting can be determined using available information from the
sensor manufacturer, or it can be a general value set by the user. Outliers and spikes are
detected whenever the detail coefficient surpasses this threshold. Faults that occur over
longer periods of time (such as flat-line or frozen sensors) are determined by analyzing
the differences between several consecutive detail coefficients. For example, if the
difference between several consecutive detail coefficients is exactly zero, then the sensor
may be frozen. Currently, this method is able to detect outliers, spikes, missing and null values, frozen values, flat-lines, and exactly linear drift (which can occur as a result of sensor malfunction). Step changes (which could be a result of bias) can also be detected; however, detection only occurs at the time of the step change. Once faults are detected,
they are immediately replaced by a reconstruction.
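As an illustration of this detection step, a minimal Python sketch is shown below. It assumes scaled, evenly spaced samples and a list of previously computed detail coefficients; the function names, the three-sigma style test and the parameter k are illustrative choices rather than the exact settings used in this work.

    import numpy as np

    SQRT2 = np.sqrt(2.0)

    def haar_detail(x_prev, x_new):
        """First-level Haar detail coefficient of a pair of samples."""
        return (x_new - x_prev) / SQRT2

    def detect_outlier(x_prev, x_new, coeff_history, k=3.0):
        """Flag the newest sample when its detail coefficient falls outside
        k standard deviations of the coefficients seen so far; in practice
        the threshold would be tied to the instrument accuracy and range."""
        d = haar_detail(x_prev, x_new)
        mu = float(np.mean(coeff_history))
        sigma = float(np.std(coeff_history))
        is_fault = abs(d - mu) > k * sigma
        coeff_history.append(d)
        return d, is_fault

In practice the coefficient history would be seeded from a short, fault-free training window, as in the complete algorithm described later in this chapter.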
2.3.2.1 Fault Detection Results for Single Faults
To first determine whether outliers would be observed in the detail coefficients of
the data, two outliers were artificially introduced into the data set. The data set was
decomposed to the final level (only one approximation coefficient left). Figure 2.4 is a
plot of the detail coefficients resulting from the decomposition. It can be observed that
even a relatively loose threshold (for example, ±150) leads to correctly identifying the
two outliers. Outliers can therefore successfully be identified by decomposing the data to
the first level. A cleansing procedure could then be carried out, or the two values could
simply be replaced by a mean to obtain a more accurate representation of the data. It is
important to note that tags operating close to 0 must first be scaled for this method to
effectively detect null values.
Figure 2.4: Outlier Detection Using Wavelet Transform
Results for outliers and spikes are presented in Figure 2.5. In the presented
example, three outliers/null values are added into a data set and are successfully detected
by the wavelet-based real-time detection method. The fault location is recorded to allow
the reconstruction algorithm to calculate a new value.
Figure 2.5: Results for Real-time Wavelet Fault Detection
2.3.2.2 Fault Detection Results for Continuous Faults
The results for frozen value and linear drift detection are shown in Figures 2.6 and
2.7. To test the detection strategy, these faults were artificially added into the original
data. In the frozen value case, the fault was identified when several successive first level
detail coefficients were exactly zero. Through this, one can identify where the fault began
and ended.
However, there are two issues with this strategy. First, detail coefficients at the
first level can be zero or close to zero when the process is stationary so it is difficult to
identify the difference between the normal process and a fault. Second, depending on the
data set, different numbers of identical measurements can signify a frozen value fault. In this case, because the fault was artificially added, three successive zero detail coefficients were considered to indicate a frozen value fault (corresponding to six frozen values); this choice may not be appropriate for other data sets.
As in the frozen value case, a linear drift fault was artificially added into the data set and successfully identified using the detail coefficient strategy. Since the linear drift that
was added was within the normal range, it would be difficult to identify it using other
common methods. However, by analyzing which of the detail coefficients remain the
same, the strategy can be used to highlight the portion of the data that is affected (see
Figure 2.7). It is clear that it would not be possible to visually identify the drift as it looks
like a “normal” part of the data.
In these results the drift introduced was exactly linear. The strategy can be
extended to an approximately linear drift by introducing a tolerance parameter, or a
quadratic drift by decomposing the data to one further level (second order differences).
However, this method assumes that there are no missing values in the data and can only
be used to identify simple slope based faults like the ones presented here.
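A minimal sketch of the consecutive-coefficient checks described above is given below; the run length of four and the zero tolerance are illustrative choices, not fixed parts of the method.

    def check_continuous_fault(recent_details, run_length=4, tol=0.0):
        """Classify the most recent run of first-level detail coefficients:
        a run of (near-)zero coefficients suggests a frozen sensor, while a
        run of identical non-zero coefficients suggests an exactly linear
        drift."""
        d = recent_details[-run_length:]
        if len(d) < run_length:
            return "insufficient history"
        if all(abs(v) <= tol for v in d):
            return "possible frozen sensor"
        if max(d) - min(d) <= tol:
            return "possible linear drift"
        return "normal"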
Figure 2.6: Frozen Value Identification
Figure 2.7: Linear Drift Detection
In the real-time detection of longer faults, if several values (four in this case) are found to be identical or to follow an exactly linear trend, they are detected as faults and replaced by the reconstruction algorithm (see Figure 2.8). This can pose a problem when the fault
continues for a longer time, because the reconstruction model will become more and
more outdated. Also note that the overlap seen in Figure 2.8 would not occur with the
reconstruction algorithm running alongside (the values would have been cleansed
already).
Figure 2.8: Real-time Detection of Frozen Values and Drift
2.3.3 Reconstruction using Recursive Least Squares
Recursive least squares is a filter that is similar to least-mean squares; however, it is more adaptive and can therefore be used with time-varying input statistics [25]. Least-mean squares is an adaptive filter that updates its parameters using a gradient method. RLS, on the other hand, directly updates the parameters at each time step, which explains why it is more adaptive. The basic RLS algorithm can be explained by Figure 2.9 (adapted
from [25]) and the steps shown in Algorithm 1.
Figure 2.9: RLS with Forgetting Factor
Algorithm 1: RLS with Forgetting Factor
1. Using the previous set of coefficients, calculate the output $y$:
   $y(i) = \mathbf{x}^T(i)\,\mathbf{param}(i-1)$
2. Calculate the error using a desired signal $d(i)$:
   $e(i) = d(i) - y(i)$
3. Calculate the gain vector $\mathbf{k}$:
   $\mathbf{k}(i) = \dfrac{\mathbf{P}(i-1)\,\mathbf{x}(i)}{\lambda + \mathbf{x}^T(i)\,\mathbf{P}(i-1)\,\mathbf{x}(i)}$
4. Update the inverse covariance matrix:
   $\mathbf{P}(i) = \lambda^{-1}\left[\mathbf{P}(i-1) - \mathbf{k}(i)\,\mathbf{x}^T(i)\,\mathbf{P}(i-1)\right]$
5. Update the coefficients for the next iteration:
   $\mathbf{param}(i) = \mathbf{param}(i-1) + \mathbf{k}(i)\,e(i)$
6. Update the data by replacing $y(i)$. Use the updated data and start again at Step 1.
where $y(i)$ is the output, $\mathbf{x}(i)$ is the vector that contains the $p$ (model order/number of previous points) most recent data values, $\mathbf{P}(i)$ is the inverse of the sample covariance matrix, $\mathbf{k}(i)$ is the gain vector, $d(i)$ is the desired signal output, and $\mathbf{param}(i)$ is the vector of model coefficients/parameters.
The forgetting factor is a user-defined parameter between 0 and 1 that decides the
importance placed on previous data. The smaller the forgetting factor, the smaller the
contribution of the previous samples. Conversely, when the forgetting factor is 1, all
previous values are considered and the case is called the “growing window RLS
algorithm” [26].
If the desired signal in Figure 2.9 is replaced by an estimate calculated using the previous values of the data, the algorithm can be adapted for a trend prediction application as follows (Figure 2.10, adapted from [25]). The only change in the steps
above is that the error is now calculated as the difference between the predicted value
from the current iteration and the previous one.
Figure 2.10: RLS with Forgetting Factor for Prediction
Steps of RLS Algorithm with Forgetting Factor for Prediction:
1. Same as before.
2. Calculate the error as,
$e(i) = y(i) - \mathbf{x}^T(i)\,\mathbf{param}(i-1)$
3. Steps 3-6 same as before.
Proper initialization of the algorithm is also important in RLS [25]. The initial
parameter coefficients can be set as zero if there is no prior knowledge. Additionally, the
initial value of the covariance matrix must be set as a large number multiplied by the
identity matrix. As seen later, the exact value of this “large number” matters less and less
as more data points are considered. Finally, the issue of setting the initial input data
history is circumvented by starting the algorithm one data point ahead of the model order,
p (essentially the number of coefficients). As a result, a fault within the first p data points
will not be fixed in this application.
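A compact sketch of the RLS recursion in its prediction form is given below, assuming a single (scalar) tag. The class name and default settings are illustrative, but the update equations follow Algorithm 1 and Figure 2.10, and the initialization follows the discussion above.

    import numpy as np

    class RLSPredictor:
        """Recursive least squares with a forgetting factor, for one-step-
        ahead prediction of a single tag from its p previous values."""

        def __init__(self, p=5, lam=0.65, p_init=1e4):
            self.p = p                   # model order (previous values used)
            self.lam = lam               # forgetting factor in (0, 1]
            self.param = np.zeros(p)     # coefficients; no prior knowledge
            self.P = p_init * np.eye(p)  # inverse covariance, large initial value

        def predict(self, history):
            """One-step-ahead prediction from the last p values."""
            x = np.asarray(history[-self.p:], dtype=float)
            return float(x @ self.param)

        def update(self, history, y_new):
            """Update the gain, inverse covariance and coefficients with a
            newly measured value y_new (steps 2-5 of Algorithm 1)."""
            x = np.asarray(history[-self.p:], dtype=float)
            e = y_new - x @ self.param                      # prediction error
            k = (self.P @ x) / (self.lam + x @ self.P @ x)  # gain vector
            self.P = (self.P - np.outer(k, x) @ self.P) / self.lam
            self.param = self.param + k * e

Detected faults are then replaced with predict(), and for sequential faults the update() call is skipped so that reconstructed values do not feed back into the model, as required by the reconstruction algorithm that follows.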
This reconstruction algorithm can work in tandem with any fault detection
algorithm as its main requirement is the prior knowledge of fault locations. Aside from
the user-defined model order and forgetting factor, only the fault location is necessary for
the reconstruction algorithm to be carried out. Once these inputs are defined, the
reconstruction algorithm is as follows.
Steps of the RLS Fault Reconstruction Algorithm:
1. Load data and fault vector from detection algorithm.
2. Read the data up to the first fault location.
a. If the data is highly non-stationary, read only the previous 50-100 points. Otherwise, start from the beginning every time.
3. Run RLS to determine the best set of model parameters (with the specified forgetting factor and model order).
4. Use the model parameters to predict the value at the fault location, and then
replace original value.
5. Move to next fault location and repeat steps 2-4 until all faults are replaced with
predicted values.
a. If faults are sequential (e.g. in drift or frozen values), do not update model
with predicted values – continue using the same parameters.
While the RLS algorithm essentially eliminates the need for any other simple
operators, it has a clear disadvantage in Step 5 – faults that are present for longer periods
of time cannot be accurately predicted as the model will remain unchanged. If the model
is updated with the replaced values, then there will be a cumulative error effect as the
fault continues. However, the RLS method still leads to equal or better estimates than the
simple operators in most cases. It is also useful to note that when the forgetting factor is
set very close to zero and the model order p is chosen as 2, it is equivalent to linearly
extrapolating the last two points in the data set.
In addition to comparing the RLS algorithm to linear regression and other simple
operators currently in use, an average prediction error can be determined to assess the
level of confidence that can be placed on the reconstruction. The prediction error is based
on single faults only, as the RLS algorithm has a cumulative error effect on longer faults.
2.3.3.1 Effects of Various RLS Setup Parameters
The results for the use of RLS with forgetting factor are presented in the next
section. First, the effects of the user defined parameters, including the forgetting factor,
model order and P-matrix initial values are discussed. In this case, the results are
compared with the actual value in the data set (as the tests are carried out on artificial
faults), as well as the results using linear regression done with p (the model order) data
points. Finally, comparisons to other simple operators will be discussed.
1. Effect of Forgetting Factor
Figure 2.11 presents the relationship between the predicted value and the value of
the forgetting factor used in the algorithm for an artificially created missing value fault.
The top graph compares the result of the RLS algorithm with the actual value and the
linear fit value, while the bottom graph compares the error of the linear fit value with the
RLS prediction. In this case both create a model with only two parameters. As mentioned
earlier, it is clear that as the forgetting factor approaches 0, the result of the RLS
algorithm approaches that of the linear regression (since the model order is 2). The closer the forgetting factor gets to 1, the better the estimate. The rule of thumb when using a forgetting factor is to choose a value between 0.95 and 1. However, the data set presented
here is relatively stationary so a high forgetting factor is reasonable. For highly non-
stationary data, a low forgetting factor should be chosen so that previous data does not
affect future predictions.
2. Effect of Model Order
Figures 2.12 and 2.13 present the relationship between the predicted value and the
number of previous values used in the model, also referred to as the model order or the
number of model coefficients. Two different sections of data are presented, and it is clear
that using a higher model order results in a more accurate estimate. The simulation is run
with a forgetting factor of 0.9, as the data set is relatively stationary. In the case of non-
stationary data, it would be more prudent to use a lower model order as only the most
recent values can be useful in making a future prediction.
3. Effect of Initial Value of P-Matrix
While running the RLS algorithm, one of the user defined parameters can be the
initial value of the P-matrix. A rule of thumb is simply to use a large number multiplied
by an identity matrix with the same dimensions as the model order. To determine whether
the initial value of P has any real effect on the reconstruction, a relationship was drawn
between the predicted value and the actual value for varying powers of P-initial. Figure
2.14 presents the results. The simulation was run with 5 previous values and a forgetting
factor of 0.9, although changing these has little effect on the relationship that is observed.
It is clear that changing the initial value of the P-matrix has little effect on the prediction.
However, it is important to note that if the fault is very early on, the initial value will
have an effect. Therefore it is important to have at least a small set of fault free data at the
beginning of the set. If there are faults in the initial data set, their effect will be
diminished over time as the model focusses on more recent data.
Figure 2.11: RLS Algorithm for Various Forgetting Factors
Figure 2.12: RLS Algorithm for Various Numbers of Previous Values (1)
Figure 2.13: RLS Algorithm for Various Numbers of Previous Values (2)
Figure 2.14: RLS Algorithm for Various Initial Values of Inverse Covariance Matrix
It is clear that the forgetting factor and the model order are the key variables for achieving a good reconstruction. Therefore, an initial value of 10000 for P is suggested, removing it from the set of user-defined parameters. While the other two variables are kept as user-defined parameters in the above tests, they could be optimized by cross-comparison to determine the ideal combination. In most cases, a relatively low forgetting factor of 0.65 combined with 5 previous values works well across the different types of data sets encountered (highly variable, faulty, stationary, etc.).
2.3.3.2 Reconstruction of Continuous Faults
The reconstruction algorithm is now tested when there are consecutive errors
present, as in drifts and flat-lines. The results presented in this section (Figure 2.15) are
generated by using a relatively low forgetting factor of 0.65 and 5 previous values in an
attempt to build a model that will hold over several samples. Figure 2.15 compares the
results of the RLS algorithm and ordinary least squares to the actual data set. It can be
observed that in this case, the RLS algorithm provides a better representation of the data
than the linear least squares.
Figure 2.15: Consecutive Errors - "Zoomed In"
2.3.4 Challenges in Application to Oil and Gas Facilities
Since the application of this algorithm will be in oil and gas facilities, there are
additional issues that must be taken into consideration. Two of these will be discussed in
the following section: unevenly sampled time series and the use of a hopping/rolling
window (defined in section 2.3.4.2).
2.3.4.1 Unevenly sampled time series
Since an oil and gas facility is a very large system made up of a variety of small
parts, the amount of data collected in real-time is vast. However, to optimize the amount
of data collected, there is an initial step where data is collected by a ‘report by exception’
system [27]. For example, the level of a tank may be steady for long periods of time, and
may only change significantly when it is being filled or emptied. In the latter case, data is
automatically collected more frequently to capture these changes. In general, real-time
data collection may also not be exact, with values being recorded at an approximate time
interval. Since the classic wavelet algorithm assumes evenly sampled data, the effect of
unevenly timed samples must be considered for this application. One strategy that was
considered was to determine ‘missing’ time stamps based on the smallest time interval
observed so far and fill in these values by repeating the previously collected value (see
Tables 2.1 and 2.2). Table 2.3 then outlines the pros and cons of applying this strategy to
the current data cleansing method.
44
Table 2.1: Original Wavelet Method with Evenly Sampled Data
Time:               1:00  1:01  1:03  1:04  1:05  1:06  1:08  1:09  1:10  1:11
Data:               P1    P2    P3    P4    P5    P6    P7    P8    P9    P10
Detail coefficient: D1    D2    D3    D4    D5    D6    D7    D8    D9

Table 2.2: Suggested Strategy to Mitigate Uneven Time Sampling
Time:               1:00  1:01  1:02  1:03  1:04  1:05  1:06  1:07  1:08  1:09
Data:               P1    P2    P2    P3    P4    P5    P6    P6    P7    P8
Detail coefficient: D1    0     D2    D3    D4    D5    0     D6    D7
Table 2.3: Pros and Cons of Repeated Value Strategy for Uneven Time Sampling
Pros:
• Mean/standard deviation calculations would be more accurate.
• Would be similar to the hold last good value simple operator.
• Reconstruction would be simple.
Cons:
• First level detail coefficients when the same value is held are 0. If D1, for example, is very different from 0, it would read incorrectly as an outlier.
• Longer faults are detected based on trends in the detail coefficients. Repeated values result in detail coefficients that are 0, possibly disrupting trends.
• Data collection is already not exact – in the provided data, the time interval is approximate: it varies by a few seconds. It may be difficult to fill in exact 'missing' time stamps.
• Detecting missing values is no longer a capability.
Based on the advantages and disadvantages shown in Table 2.3, the negative aspects of the suggested strategy outweigh the positive, and the recommendation was to accept uneven time sampling as long as the time intervals do not differ greatly from sample to sample. Since the wavelet detection relies on trends in the detail coefficients, it is not necessary to change the detection portion. Because the method considers mainly first level detail coefficients (and at most second level), the uneven time sampling is acceptable. The timing would be more of a concern if the traditional wavelet transform were being used with more data points in each calculation.
2.3.4.2 Hopping window vs Rolling Window Data Analysis
A second important consideration for the application of the single tag cleansing
algorithm to data from oil and gas facilities relates to the collection of data in real-time.
Since there are several steps that the data passes through, including the sensor, a
temporary server, and the final archive, the goal is to temporarily buffer as little data as
possible to reduce computational load to the system. Two strategies were initially
considered for the real-time implementation of the single tag cleansing algorithm.
First a ‘hopping window’ strategy was considered. This is the original strategy
behind batch or semi-batch implementations of wavelet-based outlier detection. In the
hopping window implementation, small ‘windows’ of data are analyzed simultaneously
(see Figure 2.16). For example, for a window size of 8 points (as in Figure 2.16), 4 pairs
of points will be used to determine 4 detail coefficients. The standard deviation is
calculated using only the values in the current window. This method requires the window
size and detail coefficient threshold as user inputs, where the latter is set as a multiple of
the standard deviation.
Figure 2.16: Hopping Window Algorithm Visualization
The second strategy is referred to as a 'rolling window', which requires buffering 4 data points (or 3 staggered pairs) at a time, along with the mean and the standard deviation. This means a minimum of 7 data values must be temporarily buffered at a time. The
mean and standard deviation must also be updated with every new pair of points. The
visual representation of this strategy is shown in Figure 2.17.
Figure 2.17: Rolling Window Algorithm Visualization
There are several advantages to using the rolling window strategy. It is more
accurate in the application of the wavelet transform, since detail coefficients are
calculated using staggered pairs of data, and therefore every possible pair is considered. It
also requires far fewer points to be considered as the accuracy of the hopping window
strategy increases with window size. It was found that ideally 64-128 points would be
required for accurate results in a hopping window implementation. The rolling window
strategy also requires far fewer points to be buffered, reducing computational load. A
constantly updating standard deviation and mean also results in better accuracy for this
strategy. There are also some disadvantages to using the rolling window strategy. First, several calculations have to be carried out simultaneously, including determining the detail coefficient, the updated standard deviation, the new threshold and a comparison against the previous threshold. In the case of a continuous fault, there also needs to be a safeguard so that data reconstructed in real time is not included in the updated mean and standard deviation.
Considering these advantages and disadvantages, the rolling window strategy was
chosen as the best way to implement the complete single tag data cleansing algorithm in
real time. For an algorithm that is intended to be implemented for an indefinite period of
time, reducing the computational load has to be prioritized over doing extremely accurate
calculations using large amounts of data.
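The bookkeeping implied by the rolling window can be kept very small. The sketch below uses an incremental (Welford-style) update for the mean and standard deviation of the detail coefficients; this is one reasonable choice for the running statistics, not necessarily the exact formulation deployed here.

    class RunningStats:
        """Incrementally updated mean and standard deviation of the
        first-level detail coefficients (Welford's method)."""

        def __init__(self):
            self.n = 0
            self.mean = 0.0
            self.m2 = 0.0    # running sum of squared deviations

        def update(self, d):
            self.n += 1
            delta = d - self.mean
            self.mean += delta / self.n
            self.m2 += delta * (d - self.mean)

        def std(self):
            return (self.m2 / (self.n - 1)) ** 0.5 if self.n > 1 else 0.0

Values reconstructed in real time are simply never passed to update(), which implements the safeguard noted above, and resetting the counter after a fixed number of samples gives the periodic refresh described in the next section.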
2.3.5 Complete Single Tag Data Cleansing Algorithm
In this section, the finalized and complete method for real-time single tag
cleansing is discussed. For the real-time application, a method developed by Misra et al
for online wavelet compression of process data is applied and extended as outlined in the
previous section [4]. The complete algorithm for the data cleansing method combines the
fault detection and reconstruction steps and is outlined in Algorithm 2. An important
aspect of this method is the ability to apply this method automatically and in real-time to
incoming process data. Since statistical information such as the mean, standard deviation
as well as the RLS parameters must be updated in real-time, the number of data points
represented in these values – and by extension, the calculation index – would increase
infinitely. Therefore, an interval must be defined, where the calculation index is
refreshed, and weighted calculations of the required statistics are carried out using new
indices.
Algorithm 2: Single Tag Data Cleansing
1. Initialize training data, n, and interval length, n2.
2. Determine initial model using training data:
   a. Calculate 1st level detail coefficients using the Haar wavelet.
   b. Calculate mean and standard deviation of detail coefficients.
   c. Calculate threshold.
   d. Detect faults in training data by comparing detail coefficients to threshold.
   e. Run RLS to replace detected faults.
   f. Calculate new mean, standard deviation.
3. Initialize RLS parameters.
4. For every new pair of data points,
   a. Calculate 1st level detail coefficients using the Haar wavelet.
   b. Update the following with the new value:
      i. Mean
      ii. Standard deviation
      iii. Threshold
      iv. RLS parameters
5. Fault detection:
   a. If several detail coefficients are exactly 0, frozen sensor.
   b. If several detail coefficients are exactly identical, linear drift.
   c. If detail coefficient exceeds threshold, outlier.
6. If fault detected, run RLS using most recent parameters, generate predicted value.
7. Replace and return to step 4.
8. For every n2 values, temporarily hold the mean and standard deviation to reset the calculation index.
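A high-level sketch of how the pieces of Algorithm 2 fit together is shown below. It reuses the illustrative RLSPredictor and RunningStats classes sketched earlier, processes the stream point by point for brevity, and omits the initial training window and the periodic index reset.

    def cleanse_stream(values, rls, stats, k=3.0):
        """Detect faults from first-level Haar detail coefficients and
        replace them with RLS predictions (abbreviated sketch of Algorithm 2)."""
        cleansed = []
        for x in values:
            if len(cleansed) <= rls.p:              # not enough history yet
                cleansed.append(x)
                continue
            d = (x - cleansed[-1]) / 2 ** 0.5       # first-level Haar detail
            faulty = stats.n > 1 and abs(d - stats.mean) > k * stats.std()
            if faulty:
                x_clean = rls.predict(cleansed)     # replace with prediction
            else:
                x_clean = x
                stats.update(d)                     # only clean data updates
                rls.update(cleansed, x)             # the statistics and model
            cleansed.append(x_clean)
        return cleansed

For example, cleanse_stream(raw, RLSPredictor(p=5, lam=0.65), RunningStats()) would return the cleansed series for one tag under these illustrative settings.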
The fault detection and data cleansing algorithm was tested on real process data,
with faults artificially added to test the effectiveness of the method. The detected faults
were then reconstructed using the second half of the algorithm and then compared to the
actual measurements. Faults were successfully detected on a variety of measurement
types, and reconstruction results using RLS were almost always an improvement over
currently used methods such as mean and linear interpolation, particularly in the case of
faults that occurred over longer periods of time. In this section, results are shown for
several cases, and the effect of various parameters on the success of the algorithm is
discussed.
2.3.6 Results
The following examples show the application of the complete algorithm, with
both detection and reconstruction of the fault (complete single tag data cleansing).
Figures 2.18 and 2.19 show the original data as well as the cleaned data for a real
measurement occurring over time, with several artificially added faults. Faults detected
and reconstructed using RLS include a missing value, a spike and a flat-line. Numerical
results for this detection and reconstruction are shown in Table 2.4, along with
comparisons to the actual data as well as simple reconstruction methods such as mean
and linear estimation. It is clear from the numerical results that the RLS predictions not
only have below 0.5% error from the actual value in every case, they also have lower
error in almost every case when compared to the other reconstruction methods. In
particular, this data cleansing method is able to very closely recreate the data missing
because of faults that occur over a short time period.
Table 2.4: Numerical Data Cleansing Results for Figure 2.18
Fault Type  | Actual Value | RLS Value (% Error) | Linear Estimation Value (% Error) | Mean of Data Set (% Error)
Consecutive | 690.478 | 690.177 (0.04)  | 691.063 (0.08) | 683.232 (1.05)
            | 689.600 | 690.043 (0.06)  | 691.647 (0.30) | 683.232 (0.92)
            | 689.600 | 689.627 (0.003) | 692.232 (0.38) | 683.232 (0.92)
            | 687.844 | 689.241 (0.20)  | 692.817 (0.72) | 683.232 (0.67)
            | 686.968 | 689.312 (0.34)  | 693.402 (0.94) | 683.232 (0.54)
Single (1)  | 679.989 | 680.88 (0.13)   | 679.380 (0.09) | 679.555 (0.06)
Single (2)  | 689.599 | 689.331 (0.04)  | 689.332 (0.04) | 690.039 (0.06)
As seen previously in Figure 2.15, the RLS reconstruction, with its more detailed
model and consideration of autoregression, generates predictions that attempt to ‘follow’
along the original pattern of the data, as opposed to the linear or mean models, which do
not consider the entirety of the data.
Figure 2.18: Fault Detection and Data Cleansing Algorithm Applied to Process Data
Figure 2.19: Reconstruction of a Fault Over Time using the RLS Model
2.3.6.1 Confidence of Results
The actual implementation of this method takes place in real time and, as such, there is no basis for comparison as there was with the artificially created faults seen in the results so far. Therefore, it is important to develop confidence metrics with which the results of
both the detection and reconstruction can be assessed. A traditional prediction interval
could be determined based on the current mean and variance of the data. This method
would require constantly updating the mean and the variance. Alternatively, a noise
model could be estimated from the data, however there would be no guarantee that this
model works for all types of data. Finally, another way to determine the confidence of the
result would be to apply a dead-band concept which identifies the range of possible
values based on the standard deviation.
For the detection portion of the method, the dead-band method is ideal, as it is a
classic way to determine whether a value is an outlier or not. Since outliers are
determined by whether their detail coefficients exceed a certain threshold, adding on a
dead-band based on the previous values can be a valuable way to double-check results. A
common definition of an outlier is a value that falls outside of three standard deviations.
In the data shown in Figure 2.18, all faults were certainly far outside three standard
deviations and could confidently be stated as faults.
Finally, for the reconstruction portion, the traditional prediction interval is the
ideal choice to determine whether the predicted value is within an acceptable range. A
95% prediction interval can be determined using the classic expression:
$\bar{X}_n \pm T_{\alpha}\, s_n \sqrt{1 + 1/n}$   (2.3)
where $\bar{X}_n$ is the sample mean, $s_n$ is the sample standard deviation, $n$ is the sample size, and $T_{\alpha}$ is the
score at α percentile (1-0.95=0.05 in this case) and a large degree of freedom. A key issue
here is that if the sample size is large, the prediction interval will also be very large and
consequently not very useful when it comes to assessing the quality of the results.
Therefore, only a small selection (5-10) of the previous values should be used to generate
sample statistics.
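A small sketch of this calculation, using the t-distribution quantile from SciPy and only the most recent handful of cleansed values, is shown below; the two-sided quantile and the default window size are assumptions of the sketch.

    import numpy as np
    from scipy.stats import t

    def prediction_interval(recent_values, alpha=0.05):
        """Two-sided (1 - alpha) prediction interval for the next value,
        following equation (2.3) and a small sample of recent data."""
        x = np.asarray(recent_values, dtype=float)
        n = x.size
        x_bar = x.mean()
        s = x.std(ddof=1)                          # sample standard deviation
        t_score = t.ppf(1 - alpha / 2, df=n - 1)
        half_width = t_score * s * np.sqrt(1 + 1.0 / n)
        return x_bar - half_width, x_bar + half_width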
2.4 Automated Real-time Data Compression
2.4.1 Overview of Process Data Compression Techniques
As previously discussed, improvements in sampling methods and sensor
technology have resulted in the creation of vast amounts of process data that can be used
to monitor and improve the operation of chemical plants. There is, therefore, a need to
store the data in a compact form that is easy to access and manage. Another important
consideration is to ensure that data can be retrieved and transferred over long distances
efficiently and accurately [28]. Data compression is a method in which raw data is
analyzed and insignificant portions of data are eliminated. The intent is to control the loss
of information and maintain the integrity of the data while eliminating the noise [4].
Several methods have been proven successful in the application of process data
compression. There are three basic methods of data compression used for process data:
piecewise linear interpolation compression, data transforms and quantization [28]. The
first includes methods like the boxcar, in which the algorithm records the previous value
when a value outside a set error bound is detected, as well as the backward slope method,
where two previously recorded values are used to project a recording limit based on the
slope of the line between them. Once again if the new value falls outside the limit, the
previous value is recorded. These methods are used in chemical engineering software from AspenTech. As is evident from their descriptions, these are simple algorithms that can be operated effectively in steady state environments and with limited computational resources [28].
Data transforms include the commonly known Laplace and Fourier transforms, as well as
the wavelet transform which is used here. There are two formulations: continuous, which
results in lossless compression, and discrete, which results in lossy compression. Finally,
quantization is a procedure of mapping a continuous set of values to a smaller discrete set, for example, constraining real numbers to a set of integers [28].
Methods have also been developed for the online compression of process data. An
example of such a method is the swinging door compression algorithm used in the PI software by OSIsoft. This algorithm is similar to the backward slope algorithm and
determines whether or not a value will be stored based on the amount it deviates from the
last recorded value. Data is compressed at two stages; first, at the interface, where an
‘exception’ algorithm determines which data to store based on the previous data point,
and second, at the PI server where data is put through an additional ‘compression’
algorithm. Values are also stored after certain time periods regardless of whether they
meet the deviation requirement to ensure that sufficient data is being stored. While this
method is easy to implement online – as it acts sequentially – and has high efficiency
when applied to steady state or pseudo-steady state behavior, the decision to store
information is based on univariate analysis, and cannot be extended to preserve
correlations between variables. It also has a low level of compression and is not reliable
when the data are “jagged”, that is, have many quick changes and spikes [29].
2.4.2 Automated Real-time Wavelet-based Compression Algorithm
The application of wavelet transforms to data compression has been well
developed, particularly in the technology field for applications such as image
compression [4]. However, until recently, these methods were largely used for batch data
applications. Misra et al developed the OWT for real-time data compression, in particular
for process data [4]. However, this method uses the classical denoising technique
described earlier, where thresholds are set on the detail coefficients, and coefficients
lower than the threshold are eliminated. This method still stores the data as coefficients,
and requires post-processing in the future to reconstruct the compressed data set. The
effectiveness of data compression depends on the threshold as well as the number of
levels of decomposition. An iterative method has been used here to automatically
determine the threshold based on a user defined maximum acceptable error between the
original and reconstructed data. The algorithm tries successively smaller thresholds to try
and achieve the maximum allowable error. Three quantities are used to evaluate the
effectiveness of the data compression. The compression ratio (CR, see equation (2.4)) compares the compressed size to the original uncompressed size; the lower the ratio, the fewer points are retained and the stronger the compression. The threshold is determined using the local point error (LPE),
which is the maximum distance between the original and reconstructed data [30].
Calculating the maximum distance between every data point separately allows the
algorithm to capture spikes and important outliers in the data as well.
$\mathrm{CR} = \dfrac{\text{Compressed Size}}{\text{Original Size}}$   (2.4)
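Both evaluation quantities are straightforward to compute once the compressed series has been re-expanded to the original time stamps; a minimal sketch (with illustrative names) follows.

    import numpy as np

    def compression_metrics(original, reconstructed, points_kept):
        """Compression ratio (fraction of points retained, equation (2.4))
        and local point error (maximum pointwise deviation between the
        original and reconstructed series)."""
        cr = points_kept / len(original)
        lpe = float(np.max(np.abs(np.asarray(original, dtype=float)
                                  - np.asarray(reconstructed, dtype=float))))
        return cr, lpe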
In the new algorithm, the OWT is still used to determine detail coefficients,
however, the raw data point is directly compared to an estimated data point (determined
by eliminating the noise portion). Comparing the two values provides an idea of whether
the incoming value contains ‘important’ information about the process and must be saved
or whether it is simply following the pattern of the data (allowing for the noise created
by the instrument) and does not need to be saved. Further, the threshold in the new
method is set based on the accuracy and range expected from the type of measurement
being made and the instrument used to make it (e.g. pressure, temperature etc.). The
complete automated real-time wavelet-based compression algorithm is described in
Algorithm 3. To better understand the iterative nature of the algorithm, Figure 2.20
visually presents the steps of the algorithm as a flowchart.
As discussed, this algorithm is adapted from the original online wavelet algorithm
presented by Misra et al [4]. However, there are several key differences and improvements
from their method. As previously described, the wavelet compression method requires a
threshold to determine whether information should be kept or deleted. In this algorithm,
instead of setting these thresholds as multiples of standard deviations (as in single tag
cleansing), the threshold is determined by a product of the instrument accuracy and
range. This information is easily made available by the manufacturers of the sensors.
However, if the information is not available, a generic setting can also be used.
The novel algorithm is also more of a data filter in that it compresses data sets by
immediately deciding which data points need to be kept or eliminated. This is more
similar to the boxcar strategy, which has been proven effective in online implementation.
Furthermore, it only requires the calculation of the detail coefficients to determine
whether data points should be retained. In the original method, both approximation and
detail coefficients must be calculated, saved and then reconstructed. Most of these steps
are eliminated in the new algorithm which reduces the computational load. Since the
decision to keep or eliminate data is also made ‘upfront’, there is no requirement to save
anything other than the compressed data set. Finally, by setting a maximum tree size, the
number of buffered data points is minimized while forcing intermittent saves (similar to
the swinging door algorithm).
This algorithm has several advantages over the traditional swinging door method
used for PI. First and foremost, it is an automated method, which requires no input or
decision making from the user. The use of the OWT also allows for much higher
compression ratios compared to swinging door [4]. The compression errors in this new
method are very low compared to the original method, while the compression ratios are
very high. Another advantage is that this method compresses each unique tag separately,
eliminating the tedious step within PI where five parameters must be set for every single
tag. Finally, the algorithm is also designed to work in a PI environment as it takes
advantage of the fact that time stamps and data points are saved together in the software.
Algorithm 3: Automated Real-time Wavelet-based Compression
1. User selects tags for archiving.
2. Identify tag type based on a string search of the engineering units.
   a. If the tag type is not identifiable this way, do a string search of the tag name (any tags that have "%" units, or where the unit string is empty).
3. Determine the performance requirement based on instrumentation documentation or instrument precision.
4. Receive data + time stamps. Two points are needed to begin.
5. Wavelet decomposition on the first two points (begin the "tree").
6. Delete the detail coefficient, reconstruct, and compare against the original.
   a. If each value is within half the accuracy, the test passes; go to step 8. Note: using half the accuracy ensures that the two values are within the full accuracy limit of each other.
   b. If not, the test fails; go to step 7.
7. If the test failed, save the first of the two points, and use the second as the new starting point.
8. If the test passed, grow the tree by two more points and carry out the wavelet decomposition again.
9. Reconstruct and perform the same test.
   a. If the test failed, save the first value, move to the next new value, and go back to step 5.
   b. If the test passed, grow by four more points.
10. Forcibly dispatch the tree if it grows to 8 points so as not to overload the server. Start a new tree.
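A condensed sketch of the filtering logic is given below. It collapses the pairwise tree-growing of Algorithm 3 into a simpler running check, takes the reconstruction with all detail coefficients removed to be the segment mean, and uses a threshold of half the accuracy times the range; it is meant only to convey the flow of the method, not to reproduce the patented algorithm exactly.

    import numpy as np

    def wavelet_compress(values, accuracy, value_range, max_tree=8):
        """Sequential wavelet-based compression filter (simplified). A 'tree'
        of buffered points grows as long as removing the detail coefficients
        and reconstructing keeps every point within the threshold; otherwise
        (or once the tree reaches max_tree points) the first buffered point
        is archived and a new tree is started."""
        threshold = 0.5 * accuracy * value_range   # accuracy as a fraction, e.g. 0.005
        kept = []                                  # indices of archived points
        tree = [0]                                 # indices currently buffered
        for i in range(1, len(values)):
            tree.append(i)
            segment = np.asarray([values[j] for j in tree], dtype=float)
            approx = segment.mean()                # reconstruction without details
            if np.max(np.abs(segment - approx)) > threshold or len(tree) >= max_tree:
                kept.append(tree[0])               # archive the first point of the tree
                tree = [i]                         # restart from the current point
        kept.append(tree[0])                       # flush the final tree
        return kept

The retained indices (with their time stamps) form the compressed record, and either last-good-value holding or linear interpolation can be used to re-expand them to the original time stamps, as discussed later in the results.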
Figure 2.20: Automated Real-Time Wavelet-based Compression Flowchart
2.4.3 Compression Results
Data compression using wavelet decomposition was carried out for several sets of
data. While the finalized algorithm from Figure 2.20 only decomposes the data to the first
level to reduce computational load, it was important to determine whether decomposition
to further levels would result in an improvement that warranted the increased load.
Presented in Figure 2.21 is an example of the results for two levels of decomposition.
While the two plots appear nearly identical, it is important to note the difference in
compression and local point error achieved at the two levels. In the first level, the CR is
61.5%, while the LPE is approximately 0.44. In the second level, the CR is 38.5% and
the LPE is 0.62. This means that for a small increase in error, the second level
decomposition results in a 20% decrease in the amount of data that must be retained.
However, this result must be considered along with the computational load concern.
Since the implementation of this algorithm is in real-time for an unlimited amount of
time, and since (as shown in Figure 2.23 and discussed shortly) the improvement over the
industry standard PI is enormous, the practical choice is to prioritize the reduction in
computational load and use the first level coefficients only.
Figure 2.21: Data Compression to Two Levels Using Wavelet Transform
Figure 2.22: Relationship of Compression Ratio with Local Point Error
Continuing this analysis on various data sets and to increasing levels results in a
relationship between the compression ratio and the maximum allowable LPE (see Figure
2.22). It is clear that while the CR reduces drastically with increasing LPE (that is, the
less data there is, the greater the LPE), a point is reached when it remains approximately
the same for increasing LPE.
The next step in testing the algorithm is to compare the compression results with
the industry standard PI method. Figure 2.23 shows a real data set where the original data
and the PI compression settings were provided and replicated (red) and then compared to
the new wavelet method (blue). The PI compression settings were 1% for both
compression and exception deviation. The wavelet compression was set to 1% accuracy
to provide directly comparable results. As seen in Figure 2.23, both methods for
compression follow approximately the same trend with the PI method capturing more
outliers. However, the difference in the compression ratio is immense: in PI compression,
79.8% of the data are preserved with this setting, while with wavelet, only 22.5% of the
data are retained.
Figures 2.24 – 2.29 show examples of the application of the final compression
algorithm to actual data sets. In Figure 2.24, compression of a level tag with instrument
accuracy 0.5% and range 35 inches results in a compression ratio of 71.20%. Figure 2.25
shows the compression of the same level tag, but with the instrument accuracy set at 1%
instead. Now, the compression ratio reduces to 53.35%, however, the faulty data/unstable
behaviour is still captured by the compressed data set. Figure 2.26 shows a pressure tag
with instrument accuracy 0.25% and range 1500 psig. The compression ratio is 12.5%,
and only 108 of the 861 data points are preserved. This is because other than the ‘fault’
towards the end of the data set, it is stationary and moves in a predictable oscillatory
pattern. As a result far less data points are required to accurately depict the data set.
Furthermore, if only the first, stationary half, of the data set is considered, as in Figure
2.27, the compression ratio is slightly lower at 12.2%.
The final set of results (Figures 2.28 – 2.29) depicts the results for a flow tag with accuracy 0.5% and 1%, with compression ratios of 74.22% and 60.35%, respectively. In these figures there is also a third trend that depicts the linear interpolation of the values to recreate the original time stamps. Two strategies are included in the algorithm to plot the data at the recreated time steps: the first simply holds the last good value until the next compressed data point is reached, and the second linearly interpolates the data to recreate the approximate trend of the original data set. The second is used in practical applications and therefore its results are presented here. A full working example from the patent application is included in the Appendix [13].
Figure 2.23: Comparison of Compression with PI and Novel Algorithm
Figure 2.24: Compression of a Level Tag, Accuracy = 0.5%, Range = 35 in
Figure 2.25: Compression of a Level Tag, Accuracy = 1%, Range = 35 in
Figure 2.26: Compression of a Pressure Tag, Accuracy = 0.25%, Range = 1500 psig
Figure 2.27: Compression of a Pressure Tag (First Half of the Data Set), Accuracy = 0.25%, Range = 1500 psig
Figure 2.28: Compression of a Flow Tag, Accuracy = 0.5%, Range = 1800 BSPD
Figure 2.29: Compression of a Flow Tag, Accuracy = 1%, Range = 1800 BSPD
2.5 Summary
2.5.1 Single Tag Cleansing
In this chapter, two novel automated real-time data analysis algorithms were
developed for process data analysis in chemical facilities and then tested using actual
process data. In the first method, the OWT was used to build a model and detect faults in
a single variable. The OWT was shown to be successful in detection and identifying
several common sensor faults, including missing values, frozen sensors, outliers and
linear drift. To complete the data cleansing procedure, RLS was used to predict
replacement values. This data cleansing method showed improvements on simple
reconstruction methods such as mean and linear modelling, and is able to consider the
data set as a whole, while weighing recent information heavily (an improvement on using
only the recent information). The new algorithm (OWT and RLS together) also updates
statistics with each new value, so decision making is accurate but does not require
repeated calculations on the whole data set. This application directly affects real-time
decision making as detected faults can be addressed immediately, and any predictive
models built from the data will be representative of real operation.
2.5.2 Automated Real-Time Wavelet-based Compression
The OWT is also used to create a novel real-time data compression algorithm to
economically store large quantities of data, with as little user input required as possible.
The OWT threshold in this method is based on the accuracy and range of the instrument,
and therefore does not require any decision making on the part of the user, unlike the
industry standard swinging door algorithm which requires extensive manual input, in
addition to other drawbacks. The new method is also an online filter that is completely
automated, with calculations unique to each tag. This new algorithm showed high
compression ratios with low error, and also resulted in superior compression ratios when
compared to the swinging door method in similar conditions.
CHAPTER 3 : ROBUST DYNAMIC PRINCIPAL COMPONENT
ANALYSIS FOR MULTIVARIATE DATA ANALYSIS
3.1 Introduction
The previous chapter outlined novel methods for the analysis of univariate data.
However, process facilities are large interconnected systems producing massive data sets
with many variables that may or may not be correlated. Clustering methods can be used
to separate correlated multivariate data sets from the univariate data analyzed in the
previous chapter. While the methods from the previous chapter could potentially be
applied to multivariate data sets, it would be far more valuable to investigate and model
correlated data sets as one system. For example, a fault that occurs on a sensor may
actually propagate through different variables; in a univariate data set, this would be
detected as separate faults with no possible way to predict or correct for it in the future.
Furthermore, as seen in the previous work, incoming data may be faulty and erroneous,
and it is important to be able to model data reliably regardless of these problems.
Commonly used multivariate data-driven modelling techniques, including partial least
squares (PLS), canonical variates analysis (CVA) and contribution plots are all briefly
discussed in the introduction.
In this chapter, principal component analysis (PCA), a well-known technique for
modelling correlated data sets is discussed in detail. The limitations and extensions of
PCA are also discussed. A brief review of various robust PCA techniques, that is, PCA techniques that allow for the modelling of faulty data, is then provided. Principal component pursuit (PCP), a method that is analogous to PCA, is also introduced. Novel methods that apply PCP to robustify dynamic PCA are introduced, and two case studies using real process data are presented to test these methods. Next, a potential extension to a recursive robust DPCA method using the Lanczos algorithm is discussed. Finally, a summary of the chapter is provided in the last section.
3.2 Principal Component Analysis
3.2.1 Classical Principal Component Analysis
Classical PCA is one of the most common methods of data analysis, and has
successfully been applied in a wide variety of disciplines. It converts a set of correlated
observations into a set of linearly uncorrelated variables called principal components
(PCs) using an orthogonal transformation. These PCs take into account the variability of
the original data; the first PC has the largest possible variance, the second has the second
largest, and so on [10]. It can be viewed as a method used to create a new orthogonal
coordinate system formed with the most ‘useful’ dimensions, that is, the dimensions that
contain the most information. For example, a data set containing 3 or more variables can
be represented by two principal components that account for most of the variability,
resulting in a reduction in dimensionality. The formulation of the PCA problem can be
stated as follows.
Suppose there exists a large data matrix $\mathbf{X}$ (size $n \times p$). Then a linear combination of all the variables within $\mathbf{X}$ can be considered:
$\mathbf{t}_1 = \mathbf{X}\,\mathbf{p}_1$   (3.1)
where $\mathbf{t}_1$ is a 'score vector' that refers to information between samples and $\mathbf{p}_1$ is a 'loading vector' that refers to information between variables. For more dimensions, the problem generalizes to:
$\mathbf{T} = \mathbf{X}\,\mathbf{P}$   (3.2)
The loadings ($\mathbf{P}$) are determined by maximizing the variance of $\mathbf{t}_i$ to transfer as much information from the original data as possible. This can be done by maximizing $\mathbf{t}_i^T \mathbf{t}_i$, as long as the data is mean centered (mean = 0). Once $\mathbf{T}$ and $\mathbf{P}$ are determined, the data can be reconstructed approximately as follows (there is some information loss when the dimension is reduced):
$\mathbf{X} \approx \mathbf{t}_i \mathbf{p}_i^T$;  maximize $\mathbf{t}_i^T \mathbf{t}_i$   (3.3)
where $\mathbf{t}_i = \mathbf{X}\,\mathbf{p}_i$ and $\mathbf{p}_i^T \mathbf{p}_i = 1$   (3.4)
PCA can be solved by the eigendecomposition of the covariance matrix. The
eigenvectors are the principal components and the corresponding eigenvalues of the
covariance matrix are the variance of the principal components. Therefore, the
eigenvector of the largest eigenvalue is the first principal component. Since the
covariance matrix (or some approximation of it) is required for this solution, PCA is
often more generally solved using singular value decomposition (SVD). In SVD, every
matrix can be decomposed as:
$\mathbf{X} = \mathbf{U}\,\boldsymbol{\Sigma}\,\mathbf{V}^T$   (3.5)
with $\mathbf{U}^T\mathbf{U} = \mathbf{I}$, $\mathbf{V}^T\mathbf{V} = \mathbf{I}$   (3.6)
where $\boldsymbol{\Sigma}$ is a diagonal matrix containing the singular values, $\sigma_1 \geq \sigma_2 \geq \cdots \geq \sigma_n \geq 0$, and $\mathbf{U}$ and $\mathbf{V}$ contain the left and right singular vectors of $\mathbf{X}$, respectively. Multiplying both sides by $\mathbf{X}^T$ yields:
$\mathbf{X}^T\mathbf{X}\cdot\mathbf{V} = \mathbf{V}\,\boldsymbol{\Sigma}^T\mathbf{U}^T\cdot\mathbf{U}\,\boldsymbol{\Sigma}\,\mathbf{V}^T\cdot\mathbf{V} = \mathbf{V}\cdot\boldsymbol{\Sigma}^T\boldsymbol{\Sigma}$   (3.7)
Then:
$\mathbf{X}^T\mathbf{X}\cdot\mathbf{v}_i = \sigma_i^2\,\mathbf{v}_i, \quad i = 1, \ldots, n$   (3.8)
where $\mathbf{v}_i$ is the $i$-th column vector of $\mathbf{V}$. Therefore, $\sigma_i^2$ is an eigenvalue and $\mathbf{v}_i$ is the corresponding eigenvector of $\mathbf{X}^T\mathbf{X}$. Since we have established that the columns of the right singular vectors of the matrix $\mathbf{X}$ are the eigenvectors of $\mathbf{X}^T\mathbf{X}$, and the covariance of $\mathbf{X}$ is $\mathbf{X}^T\mathbf{X}/(n-1)$, these are also the eigenvectors of the covariance matrix. The loadings and scores are then determined by:
$\mathbf{P} = [\mathbf{v}_1\ \mathbf{v}_2\ \cdots\ \mathbf{v}_k]$   (3.9)
$\mathbf{T} = \mathbf{X}\cdot\mathbf{P}$   (3.10)
where $k$ is the number of principal components, limited by $k \leq \min(n, p)$.
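A short sketch of this SVD route to the loadings and scores is given below; it assumes the rows of $\mathbf{X}$ are observations and that, as noted above, the columns have been mean centered (and in practice scaled).

    import numpy as np

    def pca_svd(X, k):
        """PCA via singular value decomposition of the (centered) data
        matrix. Returns the loadings P, the scores T = X P, and the
        fraction of variance captured by each of the k components."""
        X = np.asarray(X, dtype=float)
        Xc = X - X.mean(axis=0)                    # mean centering
        U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
        P = Vt[:k].T                               # first k right singular vectors
        T = Xc @ P                                 # scores
        explained = s[:k] ** 2 / np.sum(s ** 2)    # per-component variance fraction
        return P, T, explained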
This classical formulation of PCA is almost ubiquitous and can be found as an
inbuilt function in most common data analysis software. However, it cannot be
universally applied as it has several limitations. First, PCA is only useful as a
dimensionality reduction technique when some of the variables in the original data are
correlated. If all the variables are completely independent, PCA does nothing other than
reordering according to variance. In addition, PCA is sensitive to scaling; data must be
normalized before using it. Also, not all distributions are completely characterized by the
mean and the covariance, resulting in the data not being accurately represented. All of
these limitations must be considered prior to the use of PCA for data analysis.
In this case, the drawback that we want to overcome is the sensitivity of PCA to
outliers and faults. The training data used to build PCA models must be fault-free as even
a single gross error can have a strong effect on the resulting low rank matrix. It is
important to overcome this drawback in the application of process data analysis as
outliers can commonly occur for any number of reasons (for example, sensor failure).
The goal of robust PCA is to address this disadvantage of PCA and build reliable models
in the absence of fault-free data [31]. Mathematically, PCA finds the optimal solution to a
quadratic criterion (maximizing variance, minimizing mean squared error) whereas the
idea in robust PCA is to replace the quadratic criteria with one that grows more slowly
[32]. Finally, since the PCA problem does not take into account the changes in the
variables over time, it must be adapted to be able to handle process dynamics.
3.2.2 Dynamic Principal Component Analysis
Dynamic PCA (DPCA) is an extension of the classic PCA problem that considers
an augmented matrix comprised of time lagged (shifted) variables from the original
matrix X, as follows,
$$\mathbf{X}(l) = \begin{bmatrix} \mathbf{x}_t^T & \mathbf{x}_{t-1}^T & \cdots & \mathbf{x}_{t-l}^T \\ \mathbf{x}_{t-1}^T & \mathbf{x}_{t-2}^T & \cdots & \mathbf{x}_{t-l-1}^T \\ \vdots & \vdots & \ddots & \vdots \\ \mathbf{x}_{t+l-n}^T & \mathbf{x}_{t+l-n-1}^T & \cdots & \mathbf{x}_{t-n}^T \end{bmatrix}$$   (3.11)
where $\mathbf{x}_t^T$ represents the observation at time $t$, and $l$ represents the number of previous observations being considered. This allows for the consideration of correlations between variables as well as the extraction of time-dependent relationships in the data [31], [33].
The steps for DPCA for fault detection are otherwise identical to that of PCA, and are
as follows:
1. Scale data matrix to zero mean and unit variance.
2. Generate the augmented matrix using l previous observations.
3. Do SVD on the covariance matrix of this augmented matrix.
4. Determine the number of principal components to retain for the model*
5. Determine the Q statistic by using the residuals (defined in section 4.2)
*There are a few ways to do this task [34]. In this research, the cumulative percent
variance is calculated, and then a number is chosen from the plot that represents
approximately 90% of the variance in the data. Normally, a higher percentage is desired
to represent the model, but this small tradeoff must be made to use DPCA with the
chosen robust PCA method.
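A brief sketch of steps 1-3 is shown below, illustrating how the lag-augmented matrix of equation (3.11) can be formed before applying the pca_svd sketch above; the row-wise data layout and the choice of l are assumptions of the sketch.

    import numpy as np

    def augment_with_lags(X, l):
        """Stack each observation with its l previous observations, forming
        the augmented matrix of equation (3.11)."""
        X = np.asarray(X, dtype=float)
        n, _ = X.shape
        rows = [np.concatenate([X[t - j] for j in range(l + 1)])
                for t in range(l, n)]
        return np.asarray(rows)                    # shape: (n - l, p * (l + 1))

    def dpca(X, l, k):
        """Dynamic PCA: scale to zero mean and unit variance, augment with
        lags, then apply ordinary PCA."""
        X = np.asarray(X, dtype=float)
        Xs = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
        Xa = augment_with_lags(Xs, l)
        return pca_svd(Xa, k)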
3.3 Robust PCA Methods
3.3.1 Robust Estimators
The main problem that robust PCA addresses is the presence of missing, corrupt
or outlying observations [31]. To overcome this, robust estimation methods carry out
conventional computations on robust covariance or correlation matrices instead of
standard non-robust ones that can be adversely affected by corrupt measurements [35].
These are the first robust PCA techniques to have been developed and are often presented
in a purely mathematical sense rather than in the context of industrial applications [10].
Robust measures for determining covariance matrices were being developed
through the 1970s, most notably by Devlin et al, Maronna and Huber [36-38]. Devlin et
al’s work in 1981 was the first of several studies to compare these methods when used for
PCA [35], [39]. They used Monte Carlo methods (repeated random sampling) to compare
various robust estimation techniques including separate bivariate analysis (element-by-
element basis), multivariate trimming and M-estimators (simultaneous manipulation of
all variables). In their study, M-estimators were found to have the best performance, but
were found to break down with large dimensionality or when outliers are asymmetric.
The fraction of outliers that can be tolerated in the M-estimators approach is 1/(p+1) or
1/p where p is the number of variables.
Devlin et al’s study considered four robust estimation methods [35]. The first
approach was elementwise and involved the separate estimation of each off diagonal
element by a robust correlation coefficient. The resulting matrix may not be positive-
definite in this case and must be adjusted by shrinking. If there is missing data, the new
correlation matrix is formed using only the number of complete data pairs. It follows that
an estimate for the covariance matrix can be calculated by simply rescaling the
correlation matrix. The remaining approaches are all multivariate (involve the
simultaneous manipulation of all variables) and iterative. In the first, robust regression
calculations using weighted least squares (with ordinary least squares as a starting point)
result in a robust covariance matrix estimate. The second is called multivariate trimming
and also carries out iterative calculations with the Fisher-z transform being used to
determine when to stop. The last approach is similar to multivariate trimming, except in
this case, Huber type weights are assigned to each vector based on the squared distances.
The drawback to the multivariate approaches is that there is an inherent assumption that
there is no missing data; however, the estimated matrices are positive-definite.
Another key study in the field of robust estimators was published by Croux and
Haesbroeck [39]. Similar to Devlin et al, this study compares the performance of several
different kinds of robust estimators, including an “S-estimator” as well as a “one-step
reweighted minimum covariance determinant estimator”. S-estimators have higher
breakdown points than M-estimators and were found in this study to also be efficient to
compute.
One of the most recent studies on the topic of robust estimators in PCA was by
Zhang and Lerman [40]. They developed and demonstrated a new M-estimator that
converted the problem into a convex minimization that was then used to estimate a
“robust inverse sample covariance”. The study successfully demonstrated an example
using image processing. Tharrault et al combined such a robust PCA method with the use
of structured residuals and then applied it to fault detection [41]. Robust estimation
techniques can generally be found in literature that pertains closely to machine learning
and there is not much literature applying this technique to process monitoring.
While robust estimators are the most classical way to robustify PCA, newer and more sophisticated robust PCA methods have been developed in the past two decades, including probabilistic methods, which are presented next, and principal component pursuit, the method ultimately used in this research. Furthermore, the newer methods are capable of handling a wider variety of faults and have been shown to improve on the earlier methods.
3.3.2 Probabilistic Methods
Probabilistic PCA was first proposed by Tipping and Bishop and is, as the name
suggests, a probability-based version of PCA that allows one to capture information
about the noise in the data while simultaneously presenting a model [42]. The incipient
paper in the field discussed the method of finding the principal axes of observed data by
assuming a noise model and then using maximum-likelihood estimation for the
parameters. They then outlined an expectation maximization (EM) algorithm for
estimating this subspace iteratively using a latent variable model (model that uses latent
variables to describe observable variables) called factor analysis.
A key assumption in the original probabilistic PCA method was previous
knowledge of the number of latent variables (effectively, the number of retained principal
components). However, it presented a massive advantage over conventional PCA; the
projections could now be obtained even with missing values resulting in a new robust
PCA method. Studies that develop and apply probabilistic PCA methods have used
several different distributions to model the noise. Zhao and Jiang’s and Khan and
Dellaert’s studies used a Student-t distribution [43], [44]. Luttinen et al also used a
Student-t distribution, however, they assumed that outliers could be present
independently and therefore applied the distribution to each dimension [32].
An extension of probabilistic PCA, called Bayesian PCA was also first developed
by Bishop [45] but has recently gained increased interest in machine learning as well as
in its possible applications to process monitoring. Bayesian PCA assumes a prior
distribution over the parameter set. It presents a distinct advantage over probabilistic
PCA since it can automatically determine the number of latent variables and therefore the
number of retained principal components [45]. It is also able to better utilize limited data
sets, which is an advantage of Bayesian modelling in general. Luttinen et al demonstrated
the use of Bayesian PCA on two data sets and showed that it was possible to accurately
reconstruct corrupt data using other elements from the same data vector (it was
previously not possible to differentiate outliers between variables) [32]. Ding et al
applied Bayesian PCA and used it to differentiate between small dense noise and sparse
outliers (also not possible in other robust PCA methods) [46].
Nounou et al applied Bayesian PCA for process monitoring by assuming prior
knowledge of both the parameters and the measurements themselves (most applications
use prior knowledge of the parameters alone) [47]. More recently, Ge and Song
successfully proved the application of Bayesian PCA to robust process monitoring and
proposed a fault reconstruction method using the resulting model [48]. They also
improved the original Bayesian PCA method by using a “variational inference” algorithm
to make it more efficient. Liu et al apply a variational Bayesian PCA for process
monitoring in a wastewater treatment plant [49].
As noted earlier, one of the key advantages of probabilistic PCA – its ability to incorporate prior knowledge – also rests on one of its strongest assumptions.
Also, though they are very promising, these methods are more suited to modelling limited
data sets where the number of features is greater than the number of samples. Finally,
since DPCA has already been tested and used successfully in this application, a robust
PCA method that least intrusively fits into the current method would be ideal. Therefore,
while these methods are studied and considered, they were not chosen for this
application.
3.4 Principal Component Pursuit
The third and the most recently developed method for robust PCA is a family of
methods that can be referred to as principal component pursuit methods. Recall that
conventional PCA utilizes the $\ell_2$ norm and can be solved using singular value decomposition, but is not able to handle any corrupted entries. Principal component
pursuit (PCP) is a robust PCA method that uses a convex minimization problem to
recover the low rank matrix even when there are gross sparse errors present and is
therefore more robust to outliers [50], [51]. It was proven to be successful at recovering
both the low rank and sparse matrix exactly (under a few conditions) by Candes et al
[31]. Further, no other robust PCA method has a polynomial time algorithm that still has
a guarantee of strong performance [51].
There are many popular applications of PCP, particularly in image processing
[51]. PCP has been applied to the field of process fault detection and diagnosis in very
few studies. Isom and Labarre first showed that PCP-based process monitoring was able
to improve on conventional PCA by its ability to isolate more faults [52]. However, their
study was built upon the same assumptions used in PCA of fault-free data. More recently,
Cheng and Chen applied PCP to process monitoring including the key step of training
data preprocessing [51]. They noted that scaling is vital in PCP even though intuitively it
should not be affected (since the rank and sparsity of a matrix do not change with scale).
The PCP process resulted in a PC subspace and a residual subspace (as in PCA),
however, they were calculated simultaneously because of its setup as a convex
optimization. Cheng and Chen further discussed applications of PCP to process
monitoring by representing the fault signal as a post-filtered residual that makes the
detection/isolation/reconstruction steps more straightforward [51]. The main advantage of
using PCP instead of conventional PCA for process monitoring is its ability to be
sensitive to errors that are represented by the two PCA statistics ($T^2$ and Q) simultaneously.
The PCP problem can be defined as follows, for a matrix M that can be written as
the combination of a low rank matrix $L_0$ and a sparse matrix $S_0$:
$$\text{minimize} \quad \|L\|_* + \lambda \|S\|_1 \qquad (3.12)$$
$$\text{subject to} \quad M = L + S \qquad (3.13)$$
In the above problem, $\|\cdot\|_*$ refers to the nuclear norm (the sum of the singular values) and $\|\cdot\|_1$ refers to the $\ell_1$ norm (the sum of the magnitudes of the entries). The λ parameter
represents a trade-off between rank and sparsity. An effective way to solve the PCP
algorithm is the augmented Lagrangian multiplier (ALM) with alternating directions
method [53], [54]. ALM is a common method for solving constrained optimization
problems, by instead creating unconstrained problems but adding a penalty term. The
ALM function of the above problem is [54]:
$$\mathcal{L}(L, S, Y, \mu) \doteq \|L\|_* + \lambda \|S\|_1 + \langle Y, M - L - S \rangle + \frac{\mu}{2}\|M - L - S\|_F^2 \qquad (3.14)$$
The alternating directions method (ADM) is an improvement on ALM for solving
convex minimization problems with linear constraints. In the classic ALM method, both
L and S are minimized simultaneously as follows [53]:
$$(S_{k+1}, L_{k+1}) \in \arg\min_{S, L} \mathcal{L}(S, L, Y_k) \qquad (3.15)$$
$$Y_{k+1} = Y_k + \mu (M - L_{k+1} - S_{k+1})$$
However, the structure and consequently the problem itself is separable, so ADM
can be applied to minimize them consecutively in each iteration [54].
$$L_{k+1} \in \arg\min_{L} \mathcal{L}(L, S_k, Y_k, \mu_k) \qquad (3.16)$$
$$S_{k+1} \in \arg\min_{S} \mathcal{L}(L_{k+1}, S, Y_k, \mu_k)$$
$$Y_{k+1} = Y_k + \mu (M - L_{k+1} - S_{k+1})$$
The determination of $L_{k+1}$ requires an SVD calculation with complexity $O(n_2^3)$, which accounts for most of the computational load in the solution. The overall steps of the algorithm are summarized from [54] in Algorithm 4 (note that matrix D is the diagonal matrix from the SVD calculation within the algorithm).
Algorithm 4: ALM with Alternating Directions
1. Input $M \in \mathbb{R}^{n_1 \times n_2}$
2. Initialize $S_0 = Y_0 = 0$; $\lambda = 1/\sqrt{\max\{n_1, n_2\}}$; $\mu = n_1 n_2 / (4\|M\|_1)$; $k = 0$
3. while not converged, do
   $L_{k+1} \leftarrow D_{1/\mu}(M - S_k + \mu^{-1} Y_k)$;
   $S_{k+1} \leftarrow S_{\lambda/\mu}(M - L_{k+1} + \mu^{-1} Y_k)$;
   $Y_{k+1} \leftarrow Y_k + \mu (M - L_{k+1} - S_{k+1})$;
4. end while
5. Output: $L_k$ and $S_k$
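The following numpy sketch is one possible reading of Algorithm 4, interpreting $D_\tau$ as the singular value thresholding operator and $S_\tau$ as entry-wise soft thresholding, as in the principal component pursuit literature [31], [54]; the stopping tolerance, the iteration cap and the test data are illustrative choices and not part of the source.

```python
import numpy as np

def soft_threshold(A, tau):
    """Entry-wise shrinkage operator S_tau."""
    return np.sign(A) * np.maximum(np.abs(A) - tau, 0.0)

def sv_threshold(A, tau):
    """Singular value thresholding operator D_tau."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    return U @ np.diag(np.maximum(s - tau, 0.0)) @ Vt

def pcp_alm(M, lam=None, tol=1e-7, max_iter=500):
    """PCP solved by ALM with alternating directions (a sketch of Algorithm 4)."""
    n1, n2 = M.shape
    if lam is None:
        lam = 1.0 / np.sqrt(max(n1, n2))              # default lambda from Algorithm 4
    mu = n1 * n2 / (4.0 * np.abs(M).sum())            # mu = n1*n2 / (4 ||M||_1)
    S = np.zeros_like(M)
    Y = np.zeros_like(M)
    for _ in range(max_iter):
        L = sv_threshold(M - S + Y / mu, 1.0 / mu)    # L-update
        S = soft_threshold(M - L + Y / mu, lam / mu)  # S-update
        residual = M - L - S
        Y = Y + mu * residual                         # dual (Y) update
        if np.linalg.norm(residual, "fro") <= tol * np.linalg.norm(M, "fro"):
            break
    return L, S

# Illustrative check on synthetic low rank plus sparse data
rng = np.random.default_rng(2)
L0 = rng.normal(size=(200, 3)) @ rng.normal(size=(3, 10))   # low rank part
S0 = np.zeros_like(L0)
mask = rng.random(L0.shape) < 0.05
S0[mask] = rng.normal(scale=10.0, size=mask.sum())          # sparse gross errors
L, S = pcp_alm(L0 + S0)
print(np.abs(L - L0).max(), np.abs(S - S0).max())
```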
Zhou et al. extend the original method to result in an estimate of the low rank matrix that is stable to both small entry-wise noise and gross sparse errors (only the latter had been shown) by extending the method to a noisy model, where the matrix is a combination of $L_0$, $S_0$, and a noise term $Z_0$ [51]. The new problem becomes:
$$\text{minimize} \quad \|L\|_* + \lambda \|S\|_1 \qquad (3.17)$$
$$\text{subject to} \quad \|M - L - S\|_F \le \delta$$
The $\delta$ represents a bound with two conditions: it must be greater than 0, and the norm of the $Z_0$ term must fall below this value. This more relaxed version of PCP results in stable estimates of the low rank and sparse matrices, and is therefore referred to as stable PCP. The algorithm follows the same steps outlined in Algorithm 4; however, the convergence criterion is now $\delta$ instead of 0, and the parameter λ can no longer be set by the closed-form expression above ($\lambda = 1/\sqrt{\max\{n_1, n_2\}}$) and must instead be tuned.
In the current state of this research, PCP has been chosen as the robust PCA
method to adapt to DPCA and recursive DPCA. First, the PCP algorithm is flexible
enough that it can be adapted to DPCA in two different ways. Secondly, PCP has been
proven to recover the underlying matrices exactly [31]. Thirdly, one of the goals of the
project will be to create a recursive method so that the model can be updated if there are
changes in the process. If PCP is successfully adapted to the DPCA algorithm, the
extension from robust DPCA to robust recursive DPCA could mirror the extension of
normal DPCA to recursive DPCA. Finally, PCP is an algorithm that has attracted particular interest in recent years for process monitoring; this research is therefore relevant to today's industry needs.
However, there are some drawbacks to using PCP. Since the solution of PCP also
requires an SVD calculation, it has high computational complexity, which is also the
main reason why PCA cannot be easily applied to very large systems. In particular, the
second method tested in this work (see Algorithm 6) carries out SVD on a much larger
matrix (the time lagged matrix used in DPCA). While this is successful on the tested data
sets, other strategies to adapt PCP for use with very large data sets may need to be
considered.
3.5 Application of PCP to Robustify DPCA
The stable PCP algorithm described in the previous section has been used as a
robust PCA method. In this research, it is incorporated into DPCA in two ways to
determine the more effective robust DPCA method. The two ways are referred to as the
Sequential Method and the Integrated Method in this work. As seen in the Case Studies in
the following section, the integrated method is proven to be effective. With normal data,
it mimics the results of ‘normal’ DPCA. With faulty data, the method is able to detect and
eliminate faults, creating a much more reliable model.
3.5.1 A Sequential Method
In the first method, PCP and DPCA are combined sequentially. PCP is used to
generate a ‘fault free’ training data set from faulty data. Then DPCA is carried out as
usual on the newly cleaned data set.
Algorithm 5: Sequential Method (PCP, then DPCA)
1. Input $M \in \mathbb{R}^{n_1 \times n_2}$
2. Initialize $S_0 = Y_0 = 0$; $\lambda \approx 10^{-2}$; $\mu = n_1 n_2 / (4\|M\|_1)$; $k = 0$
3. while not converged, do (same as Algorithm 4, different convergence criterion)
   $L_{k+1} \leftarrow D_{1/\mu}(M - S_k + \mu^{-1} Y_k)$;
   $S_{k+1} \leftarrow S_{\lambda/\mu}(M - L_{k+1} + \mu^{-1} Y_k)$;
   $Y_{k+1} \leftarrow Y_k + \mu (M - L_{k+1} - S_{k+1})$;
4. end while
5. Output: $L_k$ and $S_k$, where the former is the uncorrupted, unscaled data and the latter is a sparse matrix
6. Scale $L_k$ to zero mean and unit variance
7. Generate the augmented matrix $L_{k,aug}$ with lagged variables from $L_k$
8. Do DPCA on $L_{k,aug}$:
   (a) Compute cov($L_{k,aug}$)
   (b) Do SVD on cov($L_{k,aug}$)
   (c) Separate the set of variables that represent the most variance (the DPCA model) from the least variance (for the Q statistic)
9. Calculate the Q statistic for fault detection
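A compact sketch of the sequential method is given below; it assumes the pcp_alm and augment helpers sketched earlier in this chapter, the 90% cumulative-variance cutoff follows the choice described in Section 3.2.2, and all other names and defaults are illustrative.

```python
import numpy as np

def sequential_robust_dpca(M, l, lam=0.02, var_cutoff=0.90):
    """PCP first (steps 1-5), then DPCA on the cleansed data (steps 6-9)."""
    L, _ = pcp_alm(M, lam=lam)                          # PCP recovers the low rank (cleansed) data
    Ls = (L - L.mean(axis=0)) / L.std(axis=0)           # scale to zero mean and unit variance
    L_aug = augment(Ls, l)                              # augmented matrix with lagged variables
    U, eigvals, _ = np.linalg.svd(np.cov(L_aug, rowvar=False))
    k = int(np.searchsorted(np.cumsum(eigvals) / eigvals.sum(), var_cutoff)) + 1
    P = U[:, :k]                                        # retained principal directions (DPCA model)
    Q = ((L_aug - L_aug @ P @ P.T) ** 2).sum(axis=1)    # Q statistic on the training data
    return P, eigvals, Q
```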
3.5.2 An Integrated Method
In the second method, an augmented matrix is first created (the first step in
DPCA). Then, PCP is done on this matrix. Finally, the recovered (and still augmented)
matrix is used to build the DPCA model. This integrates the two algorithms to create a
new robust DPCA method.
Algorithm 6: Integrated Method (PCP on augmented matrix)
1. Input $M \in \mathbb{R}^{n_1 \times n_2}$
2. Generate the augmented matrix $M_{aug}$ with lagged variables from $M$
3. Initialize $S_0 = Y_0 = 0$; $\lambda \approx 10^{-2}$; $\mu = n_1 n_2 / (4\|M\|_1)$; $k = 0$
4. while not converged, do (same as Algorithm 4, different convergence criterion)
   $L_{k+1} \leftarrow D_{1/\mu}(M_{aug} - S_k + \mu^{-1} Y_k)$;
   $S_{k+1} \leftarrow S_{\lambda/\mu}(M_{aug} - L_{k+1} + \mu^{-1} Y_k)$;
   $Y_{k+1} \leftarrow Y_k + \mu (M_{aug} - L_{k+1} - S_{k+1})$;
5. end while
6. Output: $L_k$ and $S_k$, where the former is the uncorrupted, unscaled augmented matrix and the latter is a sparse matrix
7. Scale $L_k$ to zero mean and unit variance
8. Do DPCA on $L_k$:
   (a) Compute cov($L_k$)
   (b) Do SVD on cov($L_k$)
   (c) Separate the set of variables that represent the most variance (the DPCA model) from the least variance (for the Q statistic)
9. Calculate the Q statistic for fault detection
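For comparison, a parallel sketch of the integrated method is shown below; the only structural difference from the sequential sketch is that the augmentation happens before PCP is applied. It again assumes the pcp_alm and augment helpers from earlier, with illustrative parameter values.

```python
import numpy as np

def integrated_robust_dpca(M, l, lam=0.02, var_cutoff=0.90):
    """Augment first, run PCP on the lagged matrix, then build the DPCA model on the result."""
    M_aug = augment(M, l)                               # augmented matrix from the raw data
    L, _ = pcp_alm(M_aug, lam=lam)                      # PCP on the (larger) augmented matrix
    Ls = (L - L.mean(axis=0)) / L.std(axis=0)           # scale the recovered augmented matrix
    U, eigvals, _ = np.linalg.svd(np.cov(Ls, rowvar=False))
    k = int(np.searchsorted(np.cumsum(eigvals) / eigvals.sum(), var_cutoff)) + 1
    P = U[:, :k]
    Q = ((Ls - Ls @ P @ P.T) ** 2).sum(axis=1)          # Q statistic for fault detection
    return P, eigvals, Q
```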
3.6 Case Studies: Application to Process Data
3.6.1 Data Sources and Performance Evaluation
There are two case studies presented in this section. Case Study 1 uses data from a
field test on steam generator data. Two sets of data are available for this equipment: a
clean data set representing normal operation and a corrupted data set, with sensor faults
introduced during the field test. The set contains 5 variables and 10000 samples. Case
Study 2 uses operation data from a single piece of process equipment to ensure that tags
are correlated. There is no missing data and no obvious faults, and the data is from
normal process operation. Therefore, the data is considered to be ‘largely normal’ and a
low detection rate is desired (though it is unknown whether the data contains faults). This
data set contains 7 variables and 4200 samples.
The two case studies allow for two different approaches for analysis. Since the
first data set has a known clean data set available, detection and false alarm rates can be
calculated, as well as the similarity between the normal data set and the one cleaned by
the PCP algorithm. Detection will be determined using the square prediction error (SPE),
also known as the Q statistic. The second data set does not have any basis for
comparison, but is considered to be normal operation data and is therefore expected to
have a low detection rate. Faults are then added to several variables to determine the
effectiveness of the algorithm. Data reconstructions based on the models with ‘normal’
data and ‘faulty’ data will then be compared. The calculations for these performance
metrics are listed below.
1. Similarity
The similarity between two matrices, $X_1$ and $X_2$, is determined by calculating the cosine of the angle between them:
$$\cos\angle(X_1, X_2) = \frac{\mathrm{trace}(X_1 X_2^T)}{\sqrt{\mathrm{trace}(X_1 X_1^T)}\,\sqrt{\mathrm{trace}(X_2 X_2^T)}} \qquad (3.18)$$
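A one-function numpy implementation of equation (3.18) could look like the following sketch; the function name is illustrative.

```python
import numpy as np

def matrix_similarity(X1, X2):
    """Cosine of the angle between two matrices of the same shape (eq. 3.18)."""
    num = np.trace(X1 @ X2.T)
    den = np.sqrt(np.trace(X1 @ X1.T)) * np.sqrt(np.trace(X2 @ X2.T))
    return num / den
```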
2. SPE (or Q) statistic
This statistic is the squared norm of the residual vector (it uses the residuals to
determine how well a sample conforms to the model).
$$Q = e^T e, \qquad e = (I - P P^T)\,x \qquad (3.19)$$
The control limits for the Q statistic can be calculated using the method by
Nomikos and Macgregor [55]:
$$\delta^2 = g_{SPE}\,\chi^2_{\alpha}(h_{SPE}) \qquad (3.20)$$
where $(1-\alpha)\times 100\%$ is the confidence level, $g_{SPE} = \theta_2/\theta_1$, $h_{SPE} = \theta_1^2/\theta_2$, $\theta_1 = \sum_{i=l+1}^{n}\lambda_i$, $\theta_2 = \sum_{i=l+1}^{n}\lambda_i^2$, and $\lambda_i$ is the i-th eigenvalue of the covariance.
The Q statistic is used for DPCA specifically because it is statistically independent with respect to the time lags if enough are included (the $T^2$ statistic, which is also commonly used, cannot be extended in the same way) [56].
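A hedged sketch of equations (3.19)–(3.20) is given below; it uses scipy's chi-squared quantile for $\chi^2_\alpha$ and assumes the eigenvalues are ordered from largest to smallest, with all names being illustrative.

```python
import numpy as np
from scipy.stats import chi2

def q_statistic(X, P):
    """Row-wise Q = e^T e with e = (I - P P^T) x (eq. 3.19)."""
    resid = X - X @ P @ P.T
    return (resid ** 2).sum(axis=1)

def q_control_limit(eigvals, n_retained, alpha=0.05):
    """Control limit delta^2 = g_SPE * chi2_alpha(h_SPE) from the residual eigenvalues (eq. 3.20)."""
    lam = np.asarray(eigvals)[n_retained:]       # eigenvalues of the discarded (residual) directions
    theta1, theta2 = lam.sum(), (lam ** 2).sum()
    g = theta2 / theta1
    h = theta1 ** 2 / theta2
    return g * chi2.ppf(1.0 - alpha, h)
```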
3. Detection rate
This value represents the fraction of the actual faults that are successfully detected
based on the threshold for the Q statistic ($Q_\alpha$):
$$\text{Detection rate} = \frac{\text{Number of Correctly Detected Faults}}{\text{Number of Total Detected Faults}} \qquad (3.21)$$
where the number of correctly detected faults is determined by the overlap between the samples where $Q > Q_\alpha$ and the faulty samples.
4. False alarm rate
This value represents the fraction of the normal samples that were falsely reported as faulty:
$$\text{False alarm rate} = \frac{\text{Number of False Alarms}}{\text{Number of Normal Samples}} \qquad (3.22)$$
where the number of false alarms is determined by the overlap between the samples where $Q > Q_\alpha$ and the normal samples.
5. Percent faulty
This value represents the fraction of the total data set that was reported as faulty
(see Case Study 2):
$$\%\ \text{faulty} = \frac{\text{Number of Total Detected Faults}}{\text{Number of Samples}} \qquad (3.23)$$
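The three performance measures in equations (3.21)–(3.23) can be computed from boolean indicator arrays, as in the small sketch below; here 'alarm' marks samples where Q exceeds the threshold and 'faulty' marks samples known to contain introduced faults, and the names are illustrative.

```python
import numpy as np

def performance_metrics(alarm, faulty):
    """Detection rate (3.21), false alarm rate (3.22) and percent faulty (3.23), as defined above."""
    alarm = np.asarray(alarm, dtype=bool)
    faulty = np.asarray(faulty, dtype=bool)
    correctly_detected = np.sum(alarm & faulty)          # alarms overlapping the faulty samples
    false_alarms = np.sum(alarm & ~faulty)               # alarms on normal samples
    detection_rate = correctly_detected / max(np.sum(alarm), 1)
    false_alarm_rate = false_alarms / max(np.sum(~faulty), 1)
    percent_faulty = np.sum(alarm) / alarm.size
    return detection_rate, false_alarm_rate, percent_faulty
```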
3.6.2 Case Study 1
Four different scenarios are compared in this case study. In the first scenario,
DPCA is carried out on normal data. This is the ideal scenario, since DPCA modelling
has been successfully carried out on uncorrupted data. The second scenario is the worst
case, where DPCA is carried out on corrupted data. The third scenario is the sequential
method outlined in Algorithm 5, where PCP is used to clean the data first, and this data is
then used to build the DPCA model. The final scenario is the integrated method outlined
in Algorithm 6, where PCP is carried out directly on the augmented matrix to be used in
DPCA.
The first test is a simple comparison of the similarity between the various
scenarios to the ideal case. It is clear from Table 3.1 that the faulty DPCA is unacceptable
and did not result in a reliable model. However, both the sequential and integrated robust
PCA methods are very similar to the original DPCA results.
Table 3.1: Matrix Similarity Test
Test Similarity
Normal DPCA vs Faulty DPCA 0.4361
Normal DPCA vs Sequential Method 0.9981
Normal DPCA vs Integrated Method 0.9969
In the next test, detection and false alarm rates are calculated using 4000 samples
for training data, and the remaining 6000 as testing data. The results are presented in
Table 3.2. Once again, the goal is to meet or surpass the results of the normal DPCA
results, which show approximately 77% successful detection, and a low 6.5% false alarm
rate. The faulty data DPCA is unable to detect most of the faults, while the sequential
method results in a prohibitively high false alarm rate. The integrated method results in a
comparable fault detection rate to the normal DPCA, however, it also has a higher false
alarm rate.
Table 3.2: 4000 Samples for Training Data, 6000 for Testing, λ = 0.02
Test Detection of Fault False Alarm
DPCA Trained with Normal Data 0.7688 0.0651
DPCA Trained with Faulty Data 0.0815 0.0028
PCP then DPCA (Sequential), Faulty Data 0.9104 0.6178
Robust DPCA (Integrated), Faulty Data 0.7746 0.2995
There are several parameters that are required as inputs for the PCP and DPCA
method, including the number of principal components and 𝜇 , the parameter that
determines the convergence rate [24]. However, there are known methods to quickly
determine these values. The λ parameter, on the other hand, must be chosen for the stable
PCP method. Although there is an equation for it (see Algorithm 4), it only applies in the
original PCP problem where there is no noise matrix. Therefore, the next test (Table 3.3)
is identical to the previous; however, λ has been slightly changed. It is clear that even
with a slight change in λ from 0.02 to 0.022, the results are significantly affected.
Therefore it is vital that this parameter be tuned properly. Figure 3.1 shows the effect of λ
on the detection and false alarm rates on this data set. The parameter must be tuned to
balance the detection rate with the false alarm rate, both of which reduce as λ increases.
Table 3.3: Same Number of Samples as Previous Test, λ = 0.022
Test Detection of Fault False Alarm
DPCA Trained with Normal Data 0.7688 0.0651
DPCA Trained with Faulty Data 0.0815 0.0028
PCP then DPCA (Sequential), Faulty Data 0.8821 0.4217
Robust DPCA (Integrated), Faulty Data 0.5873 0.1021
In the next test, the number of training samples has been changed to 3000. The
results are very similar to the first test where the methods that include PCP show high
detection rates. However, the reduction in the training data set results in high false alarms
for both these methods as well. This shows that while the robust DPCA method can be
effective, it is important to have sufficient training data to create a reliable model.
Table 3.4: Same as Test 1, but with 3000 Samples Training Data
Test Detection of Fault False Alarm
DPCA Trained with Normal Data 0.7994 0.0976
DPCA Trained with Faulty Data 0.0507 0.0025
PCP then DPCA (Sequential), Faulty Data 0.9961 0.9769
Robust DPCA (Integrated), Faulty Data 0.8361 0.6151
Several other important observations can be made from these tests as well. First, it
is clear that in all tests, the DPCA model trained with faulty data results in very poor
detection, therefore proving the need for an alternative method. Figure 3.2 graphically
shows that the PCP algorithms are both comparably successful in detection using faulty
data as the DPCA model created with clean data. However, the integrated method is able
to detect some faults that the sequential method is not able to. Furthermore, though the
sequential method results in high detection rates in all the tests, it also consistently results
in high false alarm rates, making the results unreliable. The integrated method is effective
with proper tuning and sufficient training data; without one or the other, the detection rate
and false alarm rates can be adversely affected. However, it is important to recall that the
integrated method requires an SVD calculation on a much larger matrix (the time lagged
DPCA matrix), so it will be more complex and time consuming for very large data sets.
Figures 3.3 and 3.4 allow for a closer look at the detection results for the data set
analyzed in Figure 3.2. First, Figure 3.3 shows the location and magnitude of the false
alarms identified in this data set. It is clear that although the results from Tables 3.2-3.4
show high false alarm rates for this method, in reality the magnitude of the false alarms is
very small for most of the data set. The magnitude is only large where longer faults are
detected. This shows that the false alarm issue is of concern mainly when data sets are
very faulty; however, the magnitude of the false alarms for fault-free data sets would be
very small. A threshold can be set for the false alarm magnitude that allows for the
elimination of the false alarm readings in this case. Figure 3.4 shows the detection in the
training data set after the PCP step has been carried out (that is, during an intermediate
step in the algorithm). It can be observed that PCP is successful in eliminating all of the
faults in the original data set, and therefore is creating a reliable model for detection in
the testing data set.
Figure 3.1: Effect of Lambda on the Detection and False Alarm Rates
[Plot: detection rate and false alarm rate versus lambda (0.01–0.04)]
Figure 3.2: Detection with Various DPCA/PCP Combinations
[Plot: detection index (Q) versus samples for normal DPCA, sequential PCP, and integrated PCP]
Figure 3.3: Location and Magnitude of False Alarms
[Plot: magnitude of false alarm versus sample]
Figure 3.4: Detection in Training Data after Cleansing with PCP
[Plot: detection index versus samples, comparing faulty data detection and cleansed data detection]
3.6.3 Case Study 2
The results in this section are from analysis of data from a single unknown piece
of process equipment. In the first part, the data is used as-is (see Figure 3.5 for plots of
the raw data). In the second part, several outliers are added to a few of the tags. In both
sets of results, 2000 samples are used for training data, and the remaining as testing data.
Figure 3.6 shows the detection results for the data set using normal DPCA, and
both the sequential and integrated methods. Note that although it is unknown whether
there are faults within the data, it is clear that the integrated method detection results are
very similar to the normal DPCA results and therefore the data is ‘largely normal’, and a
low percentage of faults is acceptable. The sequential method in this case has a very high
percent faulty result; this is likely reflective of the same problem of high false alarm rates
in this method.
Figure 3.5: Raw Data for Case Study 2
[Plots of the seven raw tags versus samples: Oil Cleaning Temp (deg F), Oil and Water Temp (deg F), Oil and Water Level (ft), Oil and Water Temp (deg F), Oil and Water Level 1 (ft), Oil Cleaning Level 2 (ft), Oil and Water Temp (deg F)]
Figure 3.6: Detection with Various DPCA/PCP Combinations
[Plot: detection index (Q) versus samples for normal DPCA (Xd), integrated (XL), and sequential (Xd4) methods]
Figure 3.7 shows the reconstruction of one of the tags using the two PCP based
models and compares them to the raw data. The integrated method is successful at
recreating the raw data, while the sequential method is not.
Figure 3.8 presents this information numerically. The second row in each table, which
refers to the sequential method, is not as close to the raw data as the integrated method
(third row). Finally, the integrated method also results in relatively low detection (Detint
refers to percent faulty) which is reasonable since the data is supposed to be
representative of normal operation.
Figure 3.7: Comparison in Reconstructions from PCP Methods to Raw Data
[Plot: Oil and Water Temperature (deg F) versus samples, comparing raw data with the sequential PCP and integrated PCP reconstructions]
Figure 3.8: Numerical Results for Integrated and Sequential PCP/DPCA Methods
In the next test, several obvious outliers/missing data points have been introduced
to the data set (see Figure 3.9 for the raw data). From Figure 3.10, it can be seen that the
normal DPCA method is no longer effective as the presence of outliers has greatly
affected the error bound. While the sequential method still has the same problem of over-
detection, both PCP methods have successfully detected all 6 introduced outliers as
anomalies. The reconstruction of one of the variables’ training data set (Figure 3.11)
shows that the integrated PCP method is able to completely eliminate the outlier, while
the sequential method is able to reduce its magnitude. The integrated method is also particularly successful, as the previously low detection rate has only been affected slightly (from 10.07% to 10.32%, Figure 3.12), meaning that the new outliers have been detected but nothing else has changed from the previous test.
Figure 3.9: Raw Data: Outliers Added to Tags 2, 3, 5, 6, 7
[Plots of the seven raw tags versus samples, with outliers visible in tags 2, 3, 5, 6 and 7: Oil Cleaning Temp (deg F), Oil and Water Temp (deg F), Oil and Water Level (ft), Oil and Water Temp (deg F), Oil and Water Level 1 (ft), Oil Cleaning Level 2 (ft), Oil and Water Temp (deg F)]
Figure 3.10: Detection with Novel Robust DPCA (with Outliers)
[Plot: detection index (Q) versus samples for normal DPCA (Xd), integrated (XL), and sequential (Xd4) methods, with outliers present]
Figure 3.11: Reconstruction of Data Tag with Outlier
[Plot: Oil and Water Temperature (deg F) versus samples, comparing raw data containing an outlier with the sequential PCP and integrated PCP reconstructions]
Figure 3.12: Numerical Results for PCP Methods - with Outliers
3.7 Recursion and Implementation in DPCA
One of the disadvantages of all PCA and PCP methods discussed in this chapter is
that these models are time invariant [56]. However, that is not the case with real
processes, particularly in oil and gas facilities. Changes in the process such as varying
product requirements, quality of raw materials, operational changes and maintenance can
all affect incoming measurements. This in turn would cause changes in mean and
variance, correlation amongst variables and ultimately a change in the empirical model
itself. Therefore, modelling process data should not only consider the accuracy of the
‘first’ model created based on the available training data, but should also consider how
this model can be efficiently updated if the process were to change with respect to time.
Li et al listed a number of considerations to extend PCA to a recursively updating
method [57]. They include the recursive updates of mean, efficient methods to carry out
PCA, and recursively determining the number of principal components and the
monitoring statistics. In their study, an efficient recursive updating method for the
correlation matrix is derived, and following this, several methods for recursive PCA are
presented. In a technical report by Dong [58], one of these methods is successfully
extended to the application of DPCA. This method is described in the following section.
3.7.1 Recursive DPCA
Prior to the recursive PCA calculations, the mean, variance and correlation
matrices must be recursively updated with every new batch of incoming data. As shown
in Li et al, the recursive calculations for the mean, variance and correlation matrix are
shown in equations 3.24 – 3.26 [57]. The first task is to calculate the recursive mean (a
weighted mean of old and new information) as this information is then used to calculate
the variance and correlation matrix. The recursive calculation for the mean is:
$$m_{new} = \mu\, m_{old} + (1-\mu)\,\frac{1}{n_{new}}\,X_{new}^T \mathbf{1} \qquad (3.24)$$
where m is the mean, $X_{new}$ is the new data, $\mathbf{1}$ is a column vector of ones and μ is a forgetting factor. In a traditional weighted mean calculation, $\mu = n_{old}/(n_{old} + n_{new})$. However, if $\mu < n_{old}/(n_{old} + n_{new})$, then more weight is placed on the new data and this parameter behaves as a forgetting factor.
The recursive calculation for the variance is:
$$\sigma_{new,i} = \mu\left(\sigma_{old,i} + \Delta m^2(i)\right) + (1-\mu)\,\frac{1}{n_{new}}\left\|X_{new}(:,i) - \mathbf{1}\,m(i)\right\|^2 \qquad (3.25)$$
where, along with the previously defined quantities, $\sigma_i$ is the variance of the i-th variable and $\Delta m = m_{new} - m_{old}$.
The recursive calculation for the correlation matrix is:
$$R_{new} = \mu\,\Sigma_{new}^{-1}\left(\Sigma_{old}\, R_{old}\, \Sigma_{old} + \Delta m\, \Delta m^T\right)\Sigma_{new}^{-1} + (1-\mu)\,\frac{1}{n_{new}}\,\Sigma_{new}^{-1}\left(X_{new} - \mathbf{1}\, m^T\right)^T\left(X_{new} - \mathbf{1}\, m^T\right)\Sigma_{new}^{-1} \qquad (3.26)$$
where, along with the previously defined quantities, $R$ is the correlation matrix and $\Sigma$ is a diagonal matrix of the standard deviations of the variables [57].
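A numpy sketch of the three recursive updates (3.24)–(3.26) is shown below for a single incoming batch; it assumes m_old, var_old and R_old come from the previous model, uses the new mean inside the centering step, and all names are illustrative.

```python
import numpy as np

def recursive_update(m_old, var_old, R_old, X_new, mu):
    """One batch update of the mean (3.24), variance (3.25) and correlation matrix (3.26)."""
    n_new, p = X_new.shape
    ones = np.ones((n_new, 1))
    m_new = mu * m_old + (1.0 - mu) * (X_new.T @ ones).ravel() / n_new
    dm = m_new - m_old
    centered = X_new - ones @ m_new[None, :]
    var_new = mu * (var_old + dm ** 2) + (1.0 - mu) * (centered ** 2).sum(axis=0) / n_new
    sig_old = np.diag(np.sqrt(var_old))
    sig_new_inv = np.diag(1.0 / np.sqrt(var_new))
    term1 = sig_new_inv @ (sig_old @ R_old @ sig_old + np.outer(dm, dm)) @ sig_new_inv
    term2 = sig_new_inv @ (centered.T @ centered) @ sig_new_inv / n_new
    R_new = mu * term1 + (1.0 - mu) * term2
    return m_new, var_new, R_new
```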
Now that the recursive updates for the three main quantities are defined, the next
task is to recursively implement PCA. For this, an iterative algorithm called the Lanczos
algorithm can be used. This algorithm described by Golub and Van Loan has several
advantages: there is no need to overwrite the original matrix, it does not make any
assumptions about the correlation matrix, it requires less storage and only the extreme
eigenvalues need to be calculated [57], [58], [59]. The goal in the Lanczos algorithm is to
determine a tridiagonal matrix using the correlation matrix:
$$\Gamma = V^T R_{new} V \qquad (3.27)$$
where $\Gamma$ is a tridiagonal matrix. It has been shown that the extreme eigenvalues of $\Gamma$ and $R_{new}$ are approximately the same [57]. This method is also referred to as Lanczos tridiagonalization.
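The short sketch below illustrates Lanczos tridiagonalization (equation 3.27) on a symmetric correlation matrix; the number of Lanczos steps, the random starting vector and the demonstration data are illustrative assumptions, and no reorthogonalization is included.

```python
import numpy as np

def lanczos(R, m, seed=0):
    """Return the m x m tridiagonal matrix Gamma = V^T R V and the Lanczos vectors V."""
    n = R.shape[0]
    rng = np.random.default_rng(seed)
    V = np.zeros((n, m))
    alpha = np.zeros(m)
    beta = np.zeros(m - 1)
    v = rng.normal(size=n)
    V[:, 0] = v / np.linalg.norm(v)
    for j in range(m):
        w = R @ V[:, j]
        alpha[j] = V[:, j] @ w
        w = w - alpha[j] * V[:, j]
        if j > 0:
            w = w - beta[j - 1] * V[:, j - 1]
        if j < m - 1:
            beta[j] = np.linalg.norm(w)
            V[:, j + 1] = w / beta[j]
    Gamma = np.diag(alpha) + np.diag(beta, 1) + np.diag(beta, -1)
    return Gamma, V

# Illustrative check: the extreme eigenvalues of Gamma approximate those of R
rng = np.random.default_rng(3)
R = np.corrcoef(rng.normal(size=(500, 60)), rowvar=False)
Gamma, V = lanczos(R, m=15)
print(np.linalg.eigvalsh(Gamma)[-2:], np.linalg.eigvalsh(R)[-2:])
```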
3.8 Preliminary Results for Recursion
Now that the recursive method for DPCA has been described in the previous
section, this section provides an example. In Figure 3.13, a normal data set (the same
used in section 3.6.2: Case Study 1) is analyzed using the recursive DPCA algorithm. It is
clear that with an updating model, the control limit is able to follow along with the
detection index and result in relatively few false alarms. However, this also depends on
the batch-wise interval chosen. In Figure 3.13, an interval of 500 points is chosen, while a
smaller interval of 300 in Figure 3.14 refines the model even further and perhaps too
much since we already know this data set is clean.
Figure 3.15 shows the same data set, this time with outliers present in the testing
data set. This shows that as long as there is a reliable way to cleanse the training data set,
the future model updates can compensate for outliers, and the control limit shows that the
outliers will be successfully detected. While this method may be very effective for
recursive DPCA, the direct application to the robust DPCA methods may only be
successful using the sequential method (not the integrated method). This is because in the
sequential method, the PCP part is completed before the DPCA setup, and the recursive method can be applied at that point. However, in the integrated method, PCP is applied within the process and is inherently a first-order optimization, unlike PCA, which is second order. In this case, other methods such as a gradient approach may need to be
investigated.
Figure 3.13: Fault Detection on Clean Data using Recursive DPCA
[Plot: Q index and control limit versus sample, on a logarithmic scale]
Figure 3.14: Fault Detection on Clean Data (Smaller Interval) using Recursive DPCA
[Plot: Q index and control limit versus sample, on a logarithmic scale, for the smaller update interval]
Figure 3.15: Fault Detection in Faulty Data using Recursive DPCA
[Plot: Q index and control limit versus sample, on a logarithmic scale, for the data set with outliers]
3.9 Summary
There are several ways to build data-driven models including the classic PCA,
which extracts correlation information to build a model. This can be extended to DPCA
which includes time lagged variables to be able to represent dynamic processes.
However, there are several drawbacks to DPCA, including its inability to handle any
missing or corrupted data. Robust PCA methods can address this problem. There are
many types of robust PCA being explored currently with process monitoring applications,
including probabilistic PCA and robust estimators. This work uses PCP and incorporates
it into DPCA to create a robust PCA method. The PCP method is incorporated in two
ways and compared to traditional PCA with clean and faulty data. Both the methods
present a significant improvement over DPCA on faulty data, which fails to detect most
of the faults. The integrated method, where PCP is done on the augmented matrix to be
used in DPCA, is effective under certain conditions which include sufficient training data
and correct tuning of the lambda parameter.
The next step for this research was to extend the method to a recursive robust
DPCA method that can update the model automatically and to test the method. This will
allow the new method to be incorporated into the field and tested for effectiveness. An
example of recursive dynamic PCA methods is discussed. Finally, the extension of this
method to process fault detection should be investigated. Ideally, the result should be a
method that is robust, online and automatic, and can also attempt to diagnose a fault to
decide whether it is a sensor fault or process fault. These ideas will be discussed in
Chapter 4.
CHAPTER 4 : EXTENSIONS FOR PROCESS FAULT
DETECTION APPLICATION
4.1 Introduction
The majority of this dissertation discusses process data analysis with the presence
of faulty data. These data sets may include faults such as spikes, missing values, frozen
sensors, flat-lines and so on, which have been discussed extensively in the previous
chapters. However, these are all examples of sensor faults, faults that are the result of a
discrepancy of a sensor reading and the actual measurement [60]. These are simpler to
detect as they usually do not cause massive changes in the process. Process faults,
however, or problems occurring in the physical facility, are more difficult to predict and
can present as process changes that may never be detected in the same way as sensor
faults. In this chapter, a brief introduction to process faults and sensor faults is provided.
Then the Tennessee-Eastman process is introduced and used to check whether the robust
DPCA method developed in Chapter 3 could potentially be used to detect process faults.
Since the results provided in the previous chapters are also all on real process data, the
use of the Tennessee-Eastman process allows for a check of the work using simulated
data sets.
4.2 Process Faults vs. Sensor Faults
For a fault detection method to be fully usable in a process facility, it must be able
to detect both sensor faults and process faults. In order to distinguish between the two, it
is important to understand the various causes and types of faults and the differences
between the two.
Sensor faults are caused by faulty sensor readings which can arise from incorrect
hardware design, calibration issues, and low battery levels, among other reasons [60].
These problems can cause a discrepancy between the recorded reading and the actual
measurement. The sensor faults that have been tested in this dissertation have all been
observed in real sensors, and include spikes (occasions where a single sample will
register a reading far from the current trend), added noise in measurements, and longer
term offsets (where the recorded data may be biased or drift away from the actual
measurements) [60]. Sensor faults are exactly that, problems that occur with sensors that
may cause difficulty with automated control strategies and process performance
evaluation. They are not an indicator of physical problems in the facility such as process
upsets. The latter are referred to as process faults.
Process faults occur when a physical change occurs in the system such as stuck
valves, unexpected temperatures or pressures, runaway reactions, pump failures etc. As a
result of these abnormal events, the recorded data will diverge from the expected data
recorded during normal operation. The ability to detect process faults can have a direct
and immediate impact on improving the safety and productivity of a process facility and
is an activity that has generally relied on the experience and manual intervention of
process operators. However, as the complexity in systems and the capability of sensors to
collect vast amounts of data grows, it is difficult to rely on manual intervention alone
[61]. The problem with automatically detecting process faults is evident in how they will
be represented in data; while sensor faults can be categorized as spikes, outliers and so
on, process faults may present themselves in a variety of ways and affect multiple
variables across the system as the abnormal situation propagates.
To understand if the algorithms developed here can be applied to all kinds of
faults, it is important to use a process that is well-defined. Therefore, the Tennessee-
Eastman process is used to further test the multivariate methods developed in Chapter 3
on simulated data from a very well-known process.
4.3 The Tennessee-Eastman Process
The Tennessee-Eastman Process was developed by Downs and Vogel from the
Eastman Chemical Company in the early 90s with the intention of studying process
control and monitoring strategies [62]. The process produces two products from four
reactants and contains five major unit operations: a reactor, a condenser, a compressor, a
separator and a stripper. A simplified process diagram is shown in Figure 4.1. Data is
collected from various sensors throughout the process, including pressures, flowrates,
temperatures and component concentrations from several analyzers. Detailed descriptions
of the sensor placement and individual reactions can be found in the original study [62].
Simulation data used in this chapter was downloaded from the website of Professor R.D.
Braatz.
Figure 4.1: A Simplified Process Diagram of the Tennessee-Eastman Process
The DPCA model for the Tennessee-Eastman process was set up using the model
developed by Rato and Reis, who used DPCA for fault detection based on decorrelated
residuals [63]. In this work, it is used as the base DPCA model to calculate detection rates
and false alarm rates, and to compare to the models build using faulty data and robust
DPCA.
4.4 Results
The first task was to double-check the DPCA model using clean data and
correlated sets of variables, determined using the correlation coefficients between
variables. In the following results, separator pressure, stripper pressure, reactor pressure,
Component C in product and recycle valve all have strong correlations and are therefore
tested. The locations of these within the process are shown by red stars in Figure 4.1. To
check the effectiveness of the DPCA model on clean data, the control limits are set so
that the false alarm rate is very low, that is, the vast majority of the data is normal, with
some very small exceptions. In this case, the false alarm rate is set to 2.2%, so 97.8% of
the data is read as normal. This setup is shown graphically in Figure 4.2, where the blue
line represents the Q statistic control limit.
Figure 4.2: DPCA Model for Correlated Variables in the Tennessee-Eastman Process
[Plot: Q statistic versus sample, with the Q statistic control limit shown]
Once the DPCA model was confirmed as accurate for this data set, the next task
was to check the algorithm on simulated data with a simple sensor fault (spike) and a
longer potential process fault (large extended change in the readings of multiple
variables). Since the Case Studies shown in Chapter 3 were all on actual process data,
including a sensor fault provides an additional proof of concept for the robust DPCA
algorithm.
In Figure 4.3, the original method of fault detection using a DPCA model trained
on clean data is presented. Both types of faults are successfully detected with a detection
rate of 100%. Note that this detection rate is an idealized case since we are using a
simulated process and data set. In Figure 4.4, the same faults are not detected using a
DPCA model trained with faulty data. The detection rate is only 2.5% since neither added
fault is detected. Additionally, the false alarm rate is now 69.89% since the model is no
longer reliable. This once again reinforces the need for a robust DPCA method.
In Figure 4.5, the same faults are added into the data, but the robust DPCA
methods are now used to test detection. The integrated method is able to detect all the
introduced faults. The false alarm rate can be calculated since the clean data set is
available: the integrated method (previously shown to be the superior method) has a false
alarm rate of 12.2% compared to the sequential method’s 90%. Finally Figure 4.6 shows
the reconstruction of the training data using PCP-based cleansing. Both methods are able
to eliminate all the introduced faults in the original training data set. However, it is clear
that both the methods tend to over-smooth the data which explains the high false alarm
rates observed in the Case Study results.
Figure 4.3: Detection of Sensor Faults Using DPCA Model Trained on Clean Data
[Plot: Q statistic versus sample]
Figure 4.4: Detection of Sensor Faults Using DPCA Model Trained on Faulty Data
[Plot: Q statistic versus sample]
Figure 4.5: Fault Detection in Tennessee-Eastman Process Using Robust DPCA
[Plot: Q statistic versus sample for the integrated and sequential methods]
Figure 4.6: Comparison between Raw Data and Robust DPCA Reconstruction
[Plot: measurement versus samples, comparing raw data with the sequential PCP and integrated PCP reconstructions]
These results show that the newly developed robust DPCA methods can be used
to successfully detect short sensor faults and longer process faults, however, this is not a
diagnostic method. This means that although both can be detected, there is no way to
differentiate between the two, a capability which is vital in real-time decision making.
Therefore, the future direction for this study should be the development of a tool that can
distinguish between process and sensor faults.
4.5 Summary
In this Chapter, the differences between process and sensor faults are discussed
and the potential extension of the robust DPCA methods to process faults is explored.
Data from the Tennessee-Eastman simulation is used to test the robust DPCA algorithms
from Chapter 3 to simulated data. The integrated PCP method is able to eliminate the
faults introduced to the training data set in this example, and is able to detect all
introduced faults in the testing data with a low false alarm rate. The sequential method
results in a high number of false alarms in this example, however, it is also able to
eliminate the faults introduced to the training data set. While the capability of the method
to eliminate short sensor faults and longer process faults is promising, the method must
be explored in tandem with diagnostic methods that can distinguish between the two
types of faults to determine the immediacy of any abnormal situation.
CHAPTER 5 : CONCLUSIONS
Advancing sensor and data gathering technology over the past few decades has
resulted in the substantial increase in the amount and frequency of data collected from
any process. While this has created new challenges in terms of storage and analysis, it
can be seen as a valuable opportunity for building data-driven models, particularly in
large-scale processes where models based on physical information can become too
complicated. In particular, reliable data-driven models can lead to online analysis of
incoming data so that sensor and process faults can be detected and managed.
Investigating patterns in this data can also help prevent future problems.
In this dissertation, data-driven techniques are developed and tested for both
univariate and multivariate data sets. The first part of the dissertation (Chapter 2)
considers univariate data sets. For univariate data sets, a complete single tag cleansing
method using a discrete online wavelet transform for fault detection and recursive least
squares for reconstruction is presented. The method is developed while considering the
process data application in mind, including considerations for real-time implementation,
computational load and complexity, and data collection frequency and limitations. A
novel wavelet-based real-time data compression technique is also developed and tested.
This method proves to be superior to the industry standard method, and requires little
user input unlike the latter.
The latter half of the dissertation (Chapters 3 and 4) covers data-driven modelling
for multivariate correlated data sets. First, the classic modelling technique PCA is
discussed, and its limitations are outlined. The objective is to develop a method that can
extend PCA to be robust to faulty data and therefore can be used to build reliable models
regardless of data quality. Therefore, two new algorithms based on principal component
pursuit, a method analogous to PCA, are developed and tested. Both methods are
successful in modelling the data, however, present high false alarm rates when compared
to model-building with clean data. The integrated method results in high detection rates,
and is successful in using faulty data for data-driven modelling. Possibilities for
extending this method to a recursive robust dynamic PCA method are also discussed in
this chapter. Finally, the extension of these multivariate methods to process fault
detection is presented in Chapter 4, and the Tennessee-Eastman process is used to check
its effectiveness.
The next steps of this work include finalizing and testing the recursive robust
dynamic PCA method so that it is completely automated and functional. The objective is to
field test this method so that it can be practically implemented into currently operational
facilities. Finally, further investigating the possibility of extending these methods for
having increased functionalities such as prediction of process upsets and process fault
detection is recommended.
BIBLIOGRAPHY
[1] S.J. Qin, “Process data analytics in the era of big data,” AIChE Journal., vol. 60, no.
9, pp. 3092–3100, 2014.
[2] IBM. "The Four V's of Big Data." IBM Big Data and Analytics Hub. n.d. Web. 20
Jan. 17.
[3] A. Cinar, A. Palazoglu, and F. Kayihan, Chemical Process Performance Evaluation. CRC Press, 2007.
[4] M. Misra, S.J. Qin, S. Kumar, and D. Seeman, “Online data compression and error
analysis using wavelet technology,” Process Systems Engineering, vol. 46, no. 1,
pp. 119-132, 2000.
[5] R. Ranta, “Iterative Wavelet-Based Denoising Methods and Robust Outlier
Detection,” IEEE Signal Processing Letters vol. 12, pp. 557-560, 2005.
[6] R. Gittins, Canonical Analysis: A Review with Applications in Ecology. Springer,
1985.
[7] J.F. MacGregor, C. Jaeckle, C. Kiparissides, and M. Koutoudi, “Process monitoring
and diagnosis by multiblock PLS methods,” AIChE Journal, vol. 40, no. 5, pp. 826–
838, 1994.
[8] P. Miller, R.E. Swanson, and C.F. Heckler, “Contribution plots: The missing link in
multivariate quality control,” Int. J. App. Math. & Comp. Science, vol. 8, no. 4, pp.
775–792, 1998.
[9] H. Wold, “Partial least squares,” Encyclopedia of statistical sciences, 1985.
[10] I. Jolliffe, Principal Component Analysis. Springer, 2002.
[11] A. Deshpande, S.J.Qin, and L.A. Brenskelle, “Fault Detection System Utilizing
Dynamic Principal Components Analysis”, U.S. Provisional Patent Application No.
62/421080, filed November 2016 by Chevron.
[12] A. Deshpande, S.J. Qin and L.A. Brenskelle, “Automated Real-time Process
Data Analysis Using Online Wavelet Transforms: Fault Detection, Data Cleansing
and Compression,” SPE Intelligent Energy International Conference and
Exhibition, Aberdeen, UK, 2016.
[13] A. Deshpande and L.A. Brenskelle, “Automated Wavelet-Based Data Compression
Systems and Methods,” U.S. Provisional Patent Application No. 62/348023, filed
June 2016 by Chevron.
[14] A. Deshpande, Y. Dong, G. Li, Y. Zheng, S.J. Qin, and L.A. Brenskelle, “Data
Processing Framework for Data Cleansing,” U.S. Patent Application No.
20160179599, filed Nov 2015 by Chevron/USC. Patent Pending.
[15] A. Deshpande, Y. Dong, G. Li, Y. Zheng, S.J. Qin and L.A. Brenskelle, “Advanced
Streaming Data Cleansing,” SPE Digital Energy Conference and Exhibition, The
Woodlands, Texas, USA, 2015.
[16] R. Cohen, “Signal Denoising Using Wavelets,” Technicon, Israel Institute of
Technology, 2011.
[17] S.G. Mallat, “A theory for multiresolution signal decomposition: The wavelet
representation,” IEEE Trans. on Pattern Analysis and Machine Intelligence, vol. 11,
pp. 674–693, 1989.
[18] P.H. Hsu, “Spectral Feature Extraction of Hyperspectral Images using Wavelet
Transform,” Ph.D. Thesis, National Cheng Kung University, Tainan, Taiwan,
R.O.C., 2003.
[19] D.M. Himmelblau. Fault detection and diagnosis in chemical and petrochemical
processes. Elsevier, 1978.
[20] L.H. Chiang, E.L. Russell, R.D. Braatz, “Fault diagnosis in chemical processes
using Fisher discriminant analysis, discriminant partial least squares, and principal
component analysis,” Chemometrics and Intelligent Laboratory Systems vol. 50, pp.
243–252, 2000.
[21] N.T. Altan and B.B. Ustundag, “Reconstruction of Missing Meteorological Data
Using Wavelet Transform,” Center of Agricultural& Environmental Informatics,
Istanbul Technical University, 2011.
[22] H. Li, “Application of wavelet transform to fault detection and data reconstruction,”
Informal Report, 2013.
[23] T.R. Rocha, S.P. Paredes, J.H. Henrique, “A Wavelet Scheme for Reconstruction of
Missing Sections in Time Series Signals,” Computing in Cardiology, vol. 37, pp.
461-464, 2010.
[24] D. Vogiatzis and N. Tsapatsoulis, “Missing Value Estimation for DNA Microarrays
with Multiresolution Schemes,” ICANN no. 4132, pp. 141-150, 2006.
[25] MIT OpenCourseWare, “Signal Processing: Continuous and Discrete”, 2008.
[26] K. Belpatre and M.R. Bachute, “Comparative performance study between the Time-
Varying LMS algorithm, LMS algorithm and RLS algorithm,” International
Journal of Computer Applications (National Conference on Innovative Paradigms
in Engineering and Technology), pp. 6-10, 2012.
[27] OSIsoft, “Exception reporting and compression testing,” OSIsoft Live Library, PI Server, n.d. Web. 24 Jan 2017.
[28] M.J. Watson, A. Liakopoulos, D. Brzakovic and C. Georgakis, “A Practical
Assessment of Process Data Compression Techniques,” Ind. Eng. Chem. Res., vol.
37, pp. 267-274, 1998.
[29] E. H. Bristol, “Swinging door trending: Adaptive trend recording?” Proc. ISA Nat.
Conf., pp. 749–756, 1990.
[30] M. Misra, S. Kumar, S.J. Qin, and D. Seemann, “Error based criterion for on-line
wavelet data compression,” Journal of Process Control, vol. 11, no. 6, pp. 717-731,
2001.
[31] E.J. Candes, X. Li, Y. Ma, and J. Wright, “Robust Principal Component Analysis?”
Journal of the Association for Computing Machinery, vol. 58, no. 3, pp. 1-37, 2009.
[32] J. Luttinen, A. Ilin, and J. Karhunen, “Bayesian Robust PCA of Incomplete Data,”
Neural Process Letters, vol. 36, pp. 189-202, 2012.
[33] W. Ku, R.H. Storer and C. Georgakis, “Disturbance detection and isolation by
dynamic principal component analysis,” Chem Intell Lab Syst, vol. 30, no. 1, pp.
179-196, 1995.
[34] S.J. Qin and R. Dunia, “Determining the number of principal components for best
reconstruction,” Journal of Process Control, vol. 10, pp. 245-250, 2000.
[35] S.J. Devlin, R. Gnanadesikan, and J.R. Kettenring, “Robust Estimation of
Dispersion Matrices and Principal Components,” Journal of the American
Statistical Association, vol. 76, no. 374, pp. 354-362, 1981.
[36] S.J. Devlin, R. Gnanadesikan, and J.R. Kettenring, “Robust Estimation and Outlier
Detection With Correlation Coefficients,” Biometrika, vol. 62, pp. 531-545, 1975.
[37] R.A. Maronna, “Robust M-Estimators of Multivariate Location and Scatter,”
Annals of Statistics, vol. 4, pp. 51-67, 1976.
[38] P.J. Huber, “Robust Covariances,” in Gupta S.S. & Moore D. (Eds) Statistical
Decision Theory and Related Topics II, New York: Academic Press, pp. 165-191,
1977.
[39] C. Croux and G. Haesbroeck, “Principal Component Analysis based on Robust
Estimators of the Covariance or Correlation Matrix: Influence Functions and
Efficiencies,” Biometrika, vol. 87, no. 3, pp. 603-618, 2000.
[40] T. Zhang and G. Lerman, “A novel M-estimator for Robust PCA,” Journal of
Machine Learning Research, pp. 749-808, 2014.
[41] Y. Tharrault, G. Mourot, J. Ragot, and D. Maquin, “Fault Detection and Isolation
with Robust Principal Component Analysis,” Int. J. Appl. Math. Comput. Sci., vol.
18, no. 4, pp. 429–442, 2008.
[42] M.E. Tipping and C.M. Bishop, “Probabilistic principal component analysis,” J.
Royal Stat. Soc., vol. 61, pp. 611-622, 1999.
[43] J. Zhao and Q. Jiang, “Probabilistic PCA for t distributions,” Neurocomputing, vol.
69, pp. 2217-2226, 2006.
[44] Z. Khan and F. Dellaert, “Robust generative subspace modeling: the subspace t
distribution,” Technical Report, GVU Center, College of Computing, Georgia,
2004.
[45] C.M. Bishop, “Bayesian PCA,” in Kearns MS, Solla SA and Cohn DA (Eds)
Advances in Neural Information Processing Systems, MIT Press, vol. 11, pp. 382-
388, 1999.
[46] X. Ding, L. He, and L. Carin, “Bayesian Robust Principal Component Analysis,”
IEEE Transactions on Image Processing, vol. 20, no. 12, pp. 3419-3430, 2011.
[47] M.N. Nounou, B.R. Bakshi, P.K. Goel and X. Shen, “Bayesian Principal
Component Analysis,” Journal of Chemometrics, vol. 11, pp. 576-595, 2002.
[48] Z. Ge and Z. Song, “Robust monitoring and fault reconstruction based on
variational inference component analysis,” Journal of Process Control, vol. 21, pp.
464-474, 2011.
[49] Y. Liu, Y. Pan, Z. Sun and D. Huang, “Statistical monitoring of wastewater
treatment plants using variational Bayesian PCA,” Ind. Eng. Chem. Res., vol. 53,
no. 8, pp.3272–3282, 2014.
[50] Z. Zhou, X. Li, J. Wright, E. Candes and Y. Ma, “Stable Principal Component
Pursuit,” International Symposium on Information Theory, 2010.
[51] Y. Cheng and T. Chen, “Application of Principal Component Pursuit to Process
Fault Detection and Diagnosis,” American Control Conference, pp. 3535-3540,
2013.
[52] J.D. Isom and R.E. LaBarre, “Process Fault Detection, Isolation and Reconstruction
by Principal Component Pursuit,” American Control Conference, pp. 238-243,
2011.
[53] X. Yuan and J. Yang, “Sparse and low-rank matrix decomposition via alternating
direction methods,” preprint, 2009.
[54] Z. Lin, L.W. Chen and Y. Ma, “The augmented Lagrange multiplier method for
exact recovery of corrupted low-rank matrices,” arXiv preprint arXiv:1009.5055,
2010.
[55] P. Nomikos and J.F. MacGregor, “Multivariate SPC charts for monitoring batch
processes,” Technometrics, vol. 37, no. 1, pp. 41-59, 1995.
[56] E.L. Russell, L.H. Chiang, and R.D. Braatz, “Fault detection in industrial processes
using canonical variate analysis and dynamic principal component analysis,”
Chemometrics and Intelligent Laboratory Systems, vol. 51, pp. 81-93, 2000.
[57] W. Li, H.H. Yue, S. Valle-Cervantes, and S.J. Qin, “Recursive PCA for adaptive
process monitoring,” Journal of Process Control, vol. 10, no. 5, pp. 471-486, 2000.
[58] Y. Dong, “Recursive DPCA for Process Monitoring,” CiSoft Technical Report,
University of Southern California, 2014.
[59] G.H. Golub and C.F. Van Loan. Matrix computations, Volume 3. JHU Press, 2012.
[60] A.B. Sharma, L. Golubchik and R. Govindan, “Sensor faults: Detection methods
and prevalence in real-world datasets,” ACM Transactions on Sensor Networks, vol.
6, no. 3, article 23, 2010.
[61] V. Venkatasubramanian, R. Rengaswamy, K. Yin, and S.N. Kavuri, “A review of
process fault detection and diagnosis: Part I: Quantitative model-based methods,”
Computers & Chemical Engineering, vol. 27, no. 3, pp. 293-311, 2003.
[62] J.J. Downs and E.F. Vogel, “A plant-wide industrial process control problem,”
Computers & Chemical Engineering, vol. 17, no. 3, pp. 245–255, 1993.
[63] T.J. Rato and M.S. Reis, “Fault detection in the Tennessee-Eastman benchmark
process using dynamic principal components analysis based on decorrelated
residuals (DPCA-DR),” Chemometrics and Intelligent Laboratory Systems, doi:
10.1016/j.chemolab.2013.04.002, 2013.
APPENDIX: WORKING EXAMPLE OF NOVEL WAVELET
COMPRESSION ALGORITHM
The following working example of the automated real-time wavelet-based compression algorithm is included in the patent application [13]. It uses simulated data (Table A.1) to illustrate the steps from the algorithm flowchart (Figure 2.22); the algorithm proceeds through the steps shown in Table A.2.
Table A.1: Original Data for Working Example
Data Point ID    Time       Pressure (bar)
P1               1:00 pm    100.0
P2               1:01 pm    110.0
P3               1:02 pm    120.0
P4               1:03 pm    120.1
P5               1:04 pm    110.0
P6               1:05 pm    110.2
P7               1:06 pm    110.0
P8               1:07 pm    110.3
P9               1:08 pm    110.1
P10              1:09 pm    110.1
P11              1:10 pm    110.5
P12              1:11 pm    110.3
P13              1:12 pm    110.0
P14              1:13 pm    120.5
P15              1:14 pm    115.0
Table A.2: Algorithm Steps with Working Example
0. Preprocessing step: Once tags are chosen, a text search procedure (through the name and measurement units) identifies the tag type and sets the accuracy accordingly. If the tag type cannot be identified, a general threshold is chosen.* The range of the tag is available within PI, so the threshold is determined as PThr = PAcc*PRange.
   Example: Tag name: PI100.PV; Units: bar. Example accuracy for pressure tags: PAcc = 0.5%, PRange = 200, so PThr = (0.5/100)*200 = 1.

1. Receive two data points (value and time stamp) from the chosen tag and buffer them.
   Example: P1 = (1:00 pm, 100.0 bar), P2 = (1:01 pm, 110.0 bar).

2. Apply Haar wavelet decomposition to both data points to generate one approximation coefficient (one time use***) and one detail coefficient: A = (P1 + P2)/√2 and D = (P1 - P2)/√2.
   Example: A = (100.0 + 110.0)/√2 = 148.49; D = (100.0 - 110.0)/√2 = -7.071.

3. Use the generated approximation coefficient to calculate a single estimate (one time use) to represent both values. The normal Haar reconstruction is (A + D)/√2 and (A - D)/√2; the estimate uses only the approximation coefficient: Estimate = A/√2.
   Example: EP12 = 148.49/√2 = 105.0.

4. Determine the error (one time use) between the estimate and the two data points.
   Example: Err12 = max(|EP12 - (P1, P2)|) = max(|105 - (100, 110)|) = 5.

5. Determine whether the error is above a threshold associated with the tag type.
   Example: Test whether Err12 > PThr or Err12 < PThr.

6. If the error is above the threshold, store the first data point in long-term storage and continue to buffer the second data point.
   Example: Err12 > PThr, so store P1 and continue to buffer P2.

6a. Receive a third data point.
   Example: P3 = (1:02 pm, 120.0 bar).

6b. Repeat steps 2-6 until the error between the estimate and the two buffered values is below the threshold. (Note: the error will be consistently above the threshold when the data is highly variable, as in this example where the data increases by about 10% over the first two steps.)
   Example, steps 2-6 with P2 and P3: A23 = 162.63, D23 = -7.07, EP23 = 115, Err23 = 4.55 > PThr, so store P2.
   Steps 2-6 with P3 and P4 = (1:03 pm, 120.1 bar): A34 = 169.78, D34 = -0.071, EP34 = 120.05, Err34 = 0.042 < PThr.

7. If the error is below the threshold, continue to buffer the first two points, i.e. do not put anything into long-term storage.
   Example: Err34 < PThr, so keep buffering P3 = (1:02 pm, 120.0 bar) and P4 = (1:03 pm, 120.1 bar).

8. Receive two more data points.**
   Example: P5 = (1:04 pm, 110.0 bar), P6 = (1:05 pm, 110.2 bar).

9. Apply Haar wavelet decomposition to all four buffered data points to generate one approximation coefficient (one time use***): A1 = (P3 + P4)/√2, A2 = (P5 + P6)/√2, A12 = (A1 + A2)/√2.
   Example: A1 = 169.78, A2 = 155.70, A12 = (169.78 + 155.70)/√2 = 230.15; detail coefficients D1 = -0.071, D2 = -0.141.

10. Use the generated approximation coefficient to calculate a single estimate (one time use) to represent all four values: EP3456 = (A12/√2)/√2.
   Example: EP3456 = (230.15/√2)/√2 = 115.075.

11. Determine the maximum error (one time use) between the estimate and the four data points.
   Example: Max(Err3456) = 4.61.

12. Determine whether the error is above a threshold associated with the tag type.
   Example: Test whether Err3456 > PThr or Err3456 < PThr.

13. If the error is above the threshold, store the first data point of the group of four (P3) in long-term storage, drop the second point (P4), and start testing again with the third point (P5). See the note below for a detailed explanation of the logic behind this step.****
   Example: of the group P3, P4, P5, P6, store P3, delete P4, and go back to step 1 with P5.

14. If the error is below the threshold once again (with four points), receive four more data points, so that a total of eight data points is buffered.
   Example: in the working example this requirement is satisfied for P5 to P12, since Max(Err5678) < PThr. Buffered data: P5 = (1:04 pm, 110.0 bar), P6 = (1:05 pm, 110.2 bar), P7 = (1:06 pm, 110.0 bar), P8 = (1:07 pm, 110.3 bar), P9 = (1:08 pm, 110.1 bar), P10 = (1:09 pm, 110.1 bar), P11 = (1:10 pm, 110.5 bar), P12 = (1:11 pm, 110.3 bar).

15. Apply Haar wavelet decomposition to all eight data points to generate four approximation coefficients (one time use) and four detail coefficients.
   Example: A1 = 155.70, A2 = 155.78, A3 = 155.70, A4 = 155.99; A12 = 220.25, A34 = 220.40, A1234 = 311.59; D1 = -0.141, D2 = -0.212, D3 = 0, D4 = 0.

16. Determine the maximum error (one time use) between the estimate and the eight data points.
   Example: Max(Err5-12) = 0.15.

17. Determine whether the error is above a threshold associated with the tag type.
   Example: Test whether Err5-12 > PThr or Err5-12 < PThr.

18. If the error is below the threshold, do not add any more points (buffering too much data will overload the computer); put the first point (P5) into long-term storage and delete the seven following points. (If the error had instead been above the threshold, the first point would be kept, the second, third and fourth points dropped, and testing would start again from the fifth point; see the note below.****)
   Example: of the group P5 through P12, store P5 and delete P6-P12.

19. Go back to step 1.
   Example: repeating the steps for the remaining data stores P13, P14 and P15.
* Testing on real process data has shown that a general threshold also results in compression ratios that are a large improvement over conventional PI data compression.
** The Haar wavelet transform requires 2^n data points.
*** “One time use” refers to values that are calculated within the algorithm but not stored (they are instead transformed in subsequent calculations).
**** To better explain this step, consider the sets of buffered data points as trees. The first tree contains 2 data points; if they are close enough to each other that Err < Thr, the tree is grown to 4 points. If, with 4 points, Err > Thr, then only the smaller tree (containing the first two data points) ‘passes the test’ of Err < Thr, so the algorithm drops down to that tree and compresses it. If instead the 4-point tree still gives Err < Thr, the tree is grown to 8 points. If, with 8 points, Err > Thr, the smaller tree of 4 data points ‘passed the test’, so that tree is compressed to one point and testing starts again from the 5th point. Conversely, if the 8-point tree also gives Err < Thr, the tree could hypothetically be grown indefinitely; however, every time the tree grows, more values have to be buffered, which quickly places too much load on the computer. Therefore, a maximum tree size has to be set. A maximum tree of 8 points was found to give results sufficiently superior to conventional methods while controlling the amount of buffered data, and this limit can easily be changed.
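To make the tree-growth logic above concrete, the following Python sketch applies the same test offline to a finite list of (time stamp, value) pairs: collapse the buffered window to a single Haar-based estimate, grow the window from 2 to 4 to 8 points while the maximum deviation stays below the threshold, and otherwise archive the first point of the last window that passed. It is a minimal illustration of the steps in Table A.2, not the implementation from the patent application [13]; the function and variable names (haar_estimate, compress_stream, max_tree) are illustrative only, and the sketch processes a stored list rather than a live stream.

import math

def haar_estimate(values):
    # Repeatedly keep only the Haar approximation coefficients, (a + b)/sqrt(2),
    # then undo the accumulated sqrt(2) scaling so one number represents the buffer.
    coeffs = list(values)
    levels = 0
    while len(coeffs) > 1:
        coeffs = [(a + b) / math.sqrt(2) for a, b in zip(coeffs[0::2], coeffs[1::2])]
        levels += 1
    return coeffs[0] / (math.sqrt(2) ** levels)

def compress_stream(points, threshold, max_tree=8):
    # points: list of (time stamp, value) pairs; returns the points to archive.
    archived = []
    i = 0
    while i < len(points):
        if len(points) - i < 2:
            archived.append(points[i])   # lone trailing point: forced dispatch
            i += 1
            continue
        size, last_passed = 2, 0
        while i + size <= len(points) and size <= max_tree:
            window = [v for _, v in points[i:i + size]]
            estimate = haar_estimate(window)
            if max(abs(estimate - v) for v in window) < threshold:
                last_passed = size       # this tree passes the test; try to grow it
                size *= 2
            else:
                break                    # tree fails; fall back to the last passing tree
        archived.append(points[i])       # the first point of the tree is always archived
        i += last_passed if last_passed else 1   # a failing first pair keeps its second point buffered
    return archived

# Working example: 0.5% accuracy on a 200 bar range gives PThr = 1 bar.
threshold = (0.5 / 100) * 200
data = [("1:00", 100.0), ("1:01", 110.0), ("1:02", 120.0), ("1:03", 120.1),
        ("1:04", 110.0), ("1:05", 110.2), ("1:06", 110.0), ("1:07", 110.3),
        ("1:08", 110.1), ("1:09", 110.1), ("1:10", 110.5), ("1:11", 110.3),
        ("1:12", 110.0), ("1:13", 120.5), ("1:14", 115.0)]
print([t for t, _ in compress_stream(data, threshold)])

Because only approximation coefficients are retained, the Haar estimate reduces to the window mean (for example, EP12 = 105.0 and EP3456 = 115.075 above), and running the sketch on the data of Table A.1 with PThr = 1 archives P1, P2, P3, P5, P13, P14 and P15, matching Table A.3.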
Summary of Results for Working Example
Following the above steps for the working example, the resulting compressed data stream is shown in Table A.3. The working example includes the various combinations that can occur in the algorithm when comparing the error with the threshold (above-above, below-above, above-below, below-below, forced dispatch). Finally, a plot comparing the compressed and original data is shown in Figure A.1 to confirm that the compressed data represents the original data very closely.
Table A.3: Compressed Data Stream from Working Example
Data Point ID    Time       Pressure (bar)
P1               1:00 pm    100.0
P2               1:01 pm    110.0
P3               1:02 pm    120.0
P5               1:04 pm    110.0
P13              1:12 pm    110.0
P14              1:13 pm    120.5
P15              1:14 pm    115.0
Figure A.1: Comparison of Original Data with Archived Data for Working Example
Note that PI will automatically carry out linear interpolation between the data points preserved after compression (shown in Figure A.1). While this is usually sufficient, it can be observed in this very short example data set that there can be slight differences in the trend. Therefore, there is an option in the algorithm called “Hold Last Value” which will generate a dataset that recreates the original time steps (that is, it will fill in 1:03 pm between 1:02 and 1:04 pm) and will duplicate the preserved value, as a post-processing step. This does not affect the online version of the compression.
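The “Hold Last Value” option can be pictured as a zero-order hold over the archived points. The short Python sketch below illustrates that idea under the assumption that archived time stamps match the original time grid exactly; the function name and arguments are hypothetical and not taken from the dissertation's implementation.

def hold_last_value(archived, time_grid):
    # archived: list of (time stamp, value) pairs kept by the compression step.
    # time_grid: every time stamp of the original stream, in order.
    kept = dict(archived)
    rebuilt, last = [], None
    for t in time_grid:
        if t in kept:
            last = kept[t]
        rebuilt.append((t, last))
    return rebuilt

# hold_last_value([("1:02", 120.0), ("1:04", 110.0)], ["1:02", "1:03", "1:04"])
# returns [("1:02", 120.0), ("1:03", 120.0), ("1:04", 110.0)]

Called on the archived stream from Table A.3 with the original minute-by-minute time stamps, this fills 1:03 pm with the value held from 1:02 pm (120.0 bar) rather than the linearly interpolated 115.0 bar.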
[Figure A.1 plots pressure (bar) against time (minutes after 1:00 pm), overlaying the original data and the compressed data.]
Abstract
Advancing sensor and data gathering technology has resulted in a substantial increase in the amount and frequency of process data collected, creating a valuable opportunity for data-driven modelling techniques in process monitoring. Since process facilities are large interconnected systems, data collected may also be strongly correlated. Therefore, this dissertation approaches the problem of data-driven modelling of process data from two perspectives, first for univariate, uncorrelated data sets, and second for multivariate, correlated data sets. Common methods for monitoring and analysis of univariate data include the use of cumulative sums, moving average charts, interpolation and other simple statistical methods. For correlated, multivariate data, methods such as principal component analysis, partial least squares, and contribution plots are conventionally used.
In this dissertation, the application of online wavelet transforms (OWT) to several aspects of real-time analysis of univariate data was considered, namely, fault detection, data cleansing and data compression. In the first application, the OWT is used to build a model and detect faults in an uncorrelated tag (variable) which cannot use traditional data-driven modelling that relies on underlying relationships between variables. The OWT is successfully able to detect and identify various common sensor faults including missing and frozen values, linear drift and spikes. Recursive least squares (RLS) is then used to predict replacement values and is a significant improvement on current standards, particularly with highly variable data. The new algorithm also updates statistics with each new value, so decision making is accurate but does not require repeated calculations on the whole data set. This application directly affects real-time decision making as faults can be addressed immediately, and any predictive models built from the data will be representative of real operation. The OWT is also adapted to create a novel data compression algorithm to economically store large quantities of data in real-time. The industry standard swinging door algorithm requires extensive manual input, in addition to other drawbacks. The new method is an online filter that is completely automated, with calculations unique to each tag.
There are numerous methods for building data-driven models for multivariate data sets, including the commonly used principal component analysis (PCA) and its extension, dynamic PCA (DPCA). However, there are several drawbacks to PCA and DPCA, including the inability to handle any missing or corrupted data. Two new algorithms are developed in this work by incorporating a robust PCA method called principal component pursuit (PCP) into DPCA to create a robust DPCA method. PCP uses a convex minimization problem to recover a fault-free matrix even when there are gross sparse errors present, and is therefore more robust to outliers than other methods. The PCP method is incorporated into DPCA in two ways. First, PCP and DPCA are combined sequentially where PCP is used to generate a ‘fault free’ training data set as a preprocessing step. In the second method, PCP is incorporated into DPCA to create an integrated robust DPCA method. The two methods are compared to each other and to traditional DPCA in two case studies using real process data. Both the methods present a significant improvement for fault detection over DPCA models trained on faulty data. Extension of this robust DPCA method to a recursive method is then discussed. In the final part of the dissertation, the extension of these methods from sensor fault detection to process fault detection is discussed, and simulated data from the Tennessee-Eastman process is used to show the method’s effectiveness.