DYNAMIC LATENT STRUCTURED DATA ANALYTICS

by Yining Dong

A Dissertation Presented to the FACULTY OF THE USC GRADUATE SCHOOL, UNIVERSITY OF SOUTHERN CALIFORNIA, in Partial Fulfillment of the Requirements for the Degree DOCTOR OF PHILOSOPHY (ELECTRICAL ENGINEERING)

December 2016

Copyright 2016 Yining Dong

Dedication

To my family.

Acknowledgments

I would first like to express sincere appreciation to my advisor, Professor S. Joe Qin, for being a truly excellent mentor throughout my time at USC. His encouragement and guidance have been indispensable to me. His brilliant insights and thoughtful feedback have allowed me to blossom as a doctoral student. Furthermore, his continuing assistance and support beyond my time as a Ph.D. student will ultimately allow me to reach my full potential. It has been a great pleasure and an honor working with him.

I would also like to express my gratitude to my committee members, Professor Jay Kuo from the Ming Hsieh Department of Electrical Engineering and Professor Qiang Huang from the Daniel J. Epstein Department of Industrial and Systems Engineering. I would like to thank them for attending my dissertation defense and providing insightful comments.

I would like to thank the members of Professor Qin's research group: Hu, Yingying, Jingran, Yu, Tao, Zhijie, Johnny, Alisha, Yuan, Zhaohui, and Qinqin. Thank you for always supporting me. I feel honored to be part of this community of dedicated and talented individuals.

I would like to thank the Center for Interactive Smart Oilfield Technologies (CiSoft) for the financial support during my graduate studies.

Last but not least, I want to thank my parents for their endless love, care and encouragement.

Table of Contents

Dedication
Acknowledgments
List of Tables
List of Figures
Abstract
Chapter 1. Introduction
    1.1 Background
        1.1.1 PCA
        1.1.2 PLS
    1.2 Objective
    1.3 Outline
Chapter 2. Dynamic-Inner PCA for Dynamic Data Modeling
    2.1 Introduction
    2.2 PCA and DiPCA Objective Functions
    2.3 Dynamic-Inner PCA Algorithm
        2.3.1 Extracting One Dynamic Component
        2.3.2 Deflation
        2.3.3 DiPCA Model with l Components
        2.3.4 Determination of Model Orders
    2.4 DiPCA Geometric Properties and Relations
        2.4.1 DiPCA Geometric Properties
        2.4.2 DiPCA Model Relations
        2.4.3 Dynamic Latent Structured Whitening Filter
    2.5 Case Studies
        2.5.1 Simulation Data
        2.5.2 Data from an Industrial Boiler
        2.5.3 TEP Data
    2.6 Summary
Chapter 3. DiPCA-based Process Monitoring
    3.1 Introduction
    3.2 PCA based Process Monitoring
        3.2.1 PCA Decomposition
        3.2.2 PCA Fault Detection
    3.3 DiPCA-based Process Monitoring
        3.3.1 DiPCA Decomposition
        3.3.2 DiPCA Fault Detection Indices
    3.4 Case Studies
        3.4.1 Simulation Data
        3.4.2 Data from an Industrial Boiler
        3.4.3 TEP Data
    3.5 Summary
Chapter 4. DiPLS Modeling
    4.1 Introduction
    4.2 PLS and DiPLS Objective Functions
    4.3 Dynamic-Inner PLS Algorithm
        4.3.1 Objective
        4.3.2 Outer Modeling
        4.3.3 Inner Modeling and Deflation
        4.3.4 Procedure of DiPLS Modeling
        4.3.5 Determination of Model Parameters
    4.4 DiPLS Geometric Properties and Relations
        4.4.1 DiPLS Geometric Properties
        4.4.2 DiPLS Model Relations
        4.4.3 Exhausting DiPLS Residuals
        4.4.4 Eigenvector and Singular Vector Interpretations
    4.5 Case Studies on Simulation Data
        4.5.1 Scenario 1
        4.5.2 Scenario 2
        4.5.3 Scenario 3
    4.6 Case Study on the Tennessee Eastman Process
    4.7 Summary
Chapter 5. DiPLS-based Process Monitoring
    5.1 Introduction
    5.2 PLS based Process Monitoring
    5.3 DiPLS-based Fault Detection Indices
    5.4 Case Study on TEP
    5.5 Summary
Chapter 6. Conclusions
Bibliography

List of Tables

Table 1.1 PCA NIPALS Algorithm
Table 1.2 PLS NIPALS Algorithm
Table 2.1 DiPCA Algorithm
Table 2.2 Description of 22 measurements and 11 MVs
Table 3.1 Fault detection indices for DiPCA
Table 3.2 Description of 15 disturbances
Table 3.3 Fault detection rates of three DiPCA fault detection indices on TEP
Table 4.1 DiPLS Algorithm
Table 4.2 β0, β1 for each factor in Scenario 1
Table 4.3 β0, β1, β3, β4 for each factor in Scenario 2
Table 4.4 β0, β1, β3, β4 for each factor in Scenario 3
Table 4.5 Mean squared errors of PLS and DiPLS for XMEAS(38)
Table 5.1 Fault detection rates of two DiPLS fault detection indices on TEP
Table 5.2 False alarm rates of two DiPLS fault detection indices on TEP

List of Figures

Figure 2.1 DiPCA dynamic whitening filter structure
Figure 2.2 Autocorrelation and crosscorrelation of x_k of simulation data
Figure 2.3 Autocorrelation and crosscorrelation of t_k of simulation data
Figure 2.4 Autocorrelation and crosscorrelation of e_k of simulation data
Figure 2.5 Autocorrelation and crosscorrelation of v_k of simulation data
Figure 2.6 Crosscorrelation of e_k of data from an industrial boiler
Figure 2.7 Crosscorrelation of t_k of data from an industrial boiler
Figure 2.8 Crosscorrelation of v_k of data from an industrial boiler
Figure 2.9 Schematic diagram of the Tennessee Eastman Process
Figure 2.10 Autocorrelation and crosscorrelation of t_k of TEP data
Figure 2.11 Autocorrelation and crosscorrelation of v_k of TEP data
Figure 3.1 Comparison of control regions for three indices
Figure 3.2 Control region of t_k indicated by φ_v
Figure 3.3 Fault detection results of fault 1 on simulation data
Figure 3.4 Fault detection results of fault 2 on simulation data
Figure 3.5 Fault detection results of faulty data with unstable dynamics
Figure 3.6 Control region for dynamic components at different time stamps
Figure 3.7 Fault detection results of IDV(4)
Figure 3.8 XMV(10) in the training dataset and the IDV(4) testing dataset
Figure 3.9 XMV(9) in the training dataset and the IDV(4) testing dataset
Figure 3.10 Fault detection results of IDV(5)
Figure 3.11 XMV(11) in the training dataset and the IDV(5) testing dataset
Figure 3.12 XMV(4) in the training dataset and the IDV(5) testing dataset
Figure 4.1 Prediction result of DiPLS and PLS for Scenario 1
Figure 4.2 Prediction result of DiPLS and PLS for Scenario 2
Figure 4.3 Prediction result of DiPLS and PLS for Scenario 3
Figure 4.4 Prediction result of DiPLS and PLS for TEP
Figure 5.1 XMEAS(38) in the training dataset and the IDV(8) testing dataset
Figure 5.2 DiPLS fault detection results of IDV(8)
Figure 5.3 PLS fault detection results of IDV(8)
Figure 5.4 XMEAS(38) in the training dataset and the IDV(14) dataset
Figure 5.5 DiPLS fault detection results of IDV(14)
Figure 5.6 PLS fault detection results of IDV(14)
Figure 5.7 XMV(10) in the training dataset and the IDV(14) dataset
Figure 5.8 DiPLS fault detection results on a fault-free dataset
Figure 5.9 PLS fault detection results on a fault-free dataset
Abstract

Data collected from industrial processes are often of high dimension, highly cross-correlated and auto-correlated. Therefore, such data can often be represented with a lower dimensional latent model, which extracts the major cross-correlations and/or auto-correlations and leaves only insignificant noise in the residuals. Principal component analysis (PCA) and partial least squares (PLS) are two latent modeling methods that have been widely used for static data modeling and process monitoring. However, it is not appropriate to directly apply PCA and PLS to cross-correlated data from a dynamic process, since PCA and PLS are only able to model the static components of the data.

In this dissertation, a novel dynamic-inner PCA (DiPCA) algorithm and a novel dynamic-inner PLS (DiPLS) algorithm are proposed for dynamic data modeling. DiPCA extracts a set of dynamic latent variables that capture the most dynamic variations in the data. After the dynamic variations are extracted, the residuals are essentially uncorrelated in time and static PCA can be applied. Unlike existing dynamic PCA algorithms, DiPCA generates a subspace of principal time series that are most predictable from their past data.

DiPLS extracts two sets of latent variables: one set from the input data and another set from the output data. These latent variables are extracted such that the output latent variables are most predictable from the current and past input latent variables. In the case where no dynamic relationships exist between the input and the output, DiPLS reduces to PLS, as one would expect. Unlike existing dynamic PLS algorithms, DiPLS provides an explicit model between the output latent variables and the current and past input latent variables. In addition, the outer model that generates the latent variables in DiPLS is built with the assumption of a dynamic inner model structure, rather than a static inner model structure as found in the literature, making the outer modeling compatible with the inner dynamic modeling.

For the purpose of process monitoring, DiPCA and DiPLS based fault detection indices are derived. DiPCA based monitoring focuses on process faults, and the derived fault detection indices monitor the dynamic relationships, principal static relationships, and residual innovations separately. In contrast, DiPLS based monitoring emphasizes quality related faults. The derived DiPLS based fault detection indices monitor quality relevant faults and quality irrelevant faults separately, leading to greater interpretability of the detection results.

Case studies on simulation data, a real industrial boiler process dataset and the Tennessee Eastman Process demonstrate the effectiveness of the proposed DiPCA and DiPLS modeling algorithms and their corresponding process monitoring strategies.

Chapter 1
Introduction

1.1 Background

In modern industrial processes, data are gathered from thousands of sensors for the purpose of monitoring and control [27, 58, 28]. Process control and process monitoring can greatly affect production performance and profits in such processes. Process control ensures that quality variables stay within a specified range [8, 60, 49], and process monitoring detects faults when they occur [5, 87]. To ensure good process control and monitoring performance, it is important to develop models that can accurately describe the underlying processes [59]. In general, process modeling methods can be classified into two types: physics-based modeling and data-driven modeling.
Physics-based methods derive mathematical models based on process knowledge, such as first principles. In a large scale chemical process, a physics-based mathematical model is often prohibitively complex and difficult to derive. In addition, physics-based models are very case specific and cannot be easily generalized and applied to different processes. In contrast, data-driven methods derive mathematical models from the analysis of process data, and the resulting models generalize readily to data from various processes. Therefore, when a large amount of process data is available, data-driven modeling methods are better suited to model the underlying processes.

In chemical processes, due to mass balance, energy balance and other operational restrictions, there are often significant correlations among process variables. Therefore, it is reasonable to assume that the process variables are generated from a lower dimensional hidden latent variable model. Latent variables are not directly observed (they are usually a transformation of observed variables) and can be interpreted as the driving force behind the corresponding chemical processes. PCA (principal component analysis) and PLS (partial least squares) are the two most popular among the various latent variable modeling methods [37, 29, 12, 22, 68]. They have been successfully and widely applied to describe the correlations among process variables and extract the underlying latent structure [81, 82, 33, 53].

1.1.1 PCA

PCA, first proposed by Pearson [55], is popularly used as a dimension reduction tool in computer science, economics, and biomedical engineering [84, 4, 16, 70, 85]. The dimension reduction feature of PCA is achieved by extracting a direction such that the projection of the measurements onto this direction has the largest variance. Let x ∈ R^m denote a sample vector of m variables, X ∈ R^{N×m} denote the data matrix with each row representing a sample, p denote the direction corresponding to the largest variance with ||p||_2 = 1, and t denote the latent score. For a single sample, the score is t = p^T x; stacking all samples gives the score vector t = Xp. Therefore, the objective of maximizing the variance of t can be formulated as

$$\max_p \; p^T X^T X p \quad \text{s.t.} \quad \|p\| = 1 \qquad (1.1)$$

By using a Lagrange multiplier, the solution to this optimization problem satisfies

$$X^T X p = \lambda p \qquad (1.2)$$

where λ = p^T X^T X p. This implies that p is the eigenvector of X^T X corresponding to the largest eigenvalue. After the loading vector p is obtained, the latent score vector t is calculated as t = Xp. To extract the next principal component, the same optimization problem is solved on the deflated matrix:

$$X := X - tp^T$$

It can be proven that the principal component loadings of X are the eigenvectors of X^T X, with the corresponding eigenvalues in descending order.

In addition to eigen-decomposition, the nonlinear iterative partial least squares (NIPALS) algorithm was developed to calculate PCA models, with the advantage of handling missing values in X [78]. The NIPALS algorithm is summarized in Table 1.1.

Table 1.1: PCA NIPALS Algorithm
1. Scale X to zero mean and unit variance. Set X_1 = X and i = 1.
2. Choose a starting t_i as some column of X_i and iterate the following equations until t_i converges or a maximum number of iterations is reached:
   p_i = X_i^T t_i / ||X_i^T t_i||
   t_i = X_i p_i
3. Residual deflation: X_{i+1} = X_i − t_i p_i^T
4. Set i := i + 1 and return to Step 2 until i = i_max.
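As an illustration, here is a minimal NumPy sketch of the NIPALS iteration in Table 1.1 (our code, not part of the dissertation; X is assumed to be pre-scaled):

```python
import numpy as np

def nipals_pca(X, n_components, max_iter=500, tol=1e-8):
    """NIPALS PCA following Table 1.1; X is assumed scaled to zero mean, unit variance."""
    X = X.copy()
    scores, loadings = [], []
    for _ in range(n_components):
        t = X[:, 0].copy()                    # start t_i as some column of X_i
        for _ in range(max_iter):
            p = X.T @ t
            p /= np.linalg.norm(p)            # p_i = X_i' t_i / ||X_i' t_i||
            t_new = X @ p                     # t_i = X_i p_i
            if np.linalg.norm(t_new - t) < tol * np.linalg.norm(t_new):
                t = t_new
                break
            t = t_new
        X -= np.outer(t, p)                   # residual deflation: X_{i+1} = X_i - t_i p_i'
        scores.append(t)
        loadings.append(p)
    return np.column_stack(scores), np.column_stack(loadings)
```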
Let P denote a collection of l loading vectors; a sample vector x can then be approximated as

$$\hat{x} = PP^T x$$

The quality of this approximation depends on l, i.e., how many principal components are retained. Underestimating the number of principal components leads to lower predictability, while overestimating it introduces noise. Several methods have been proposed to select the number of principal components, such as the Akaike information criterion (AIC) [1], cumulative percent variance (CPV) [74], the scree test [9, 91], cross validation [80, 15] and the variance of reconstruction error (VRE) [61]. Among these methods, CPV is popularly used for its good performance in practical applications. It chooses the smallest number of principal components required to capture a certain percentage of the total variance (usually 95 percent).

After the number of principal components is determined, the measurement space is divided into the principal component subspace (PCS) and the residual subspace (RS). The PCS is spanned by P, and the RS is spanned by P̃, the orthogonal complement of P. The PCS contains normal or major variations, and the RS contains small variations or noise. When applied to data from industrial processes, the RS also represents the physical constraints between process variables [34].

1.1.1.1 PCA in Statistical Process Monitoring

One important application of PCA is statistical process monitoring (SPM) [25, 46]. By applying PCA to data collected under normal operating conditions, different fault detection indices can be defined to monitor different aspects of the process [75]. For example, the SPE (squared prediction error), also called the Q index, was proposed to monitor faults in the RS [53]. Faults in the RS correspond to violations of the physical constraints or abnormal innovations. The Q index is defined as

$$Q(x) = \|\tilde{x}\|^2 = x^T \tilde{P}\tilde{P}^T x \qquad (1.3)$$

To monitor the PCS, the T^2 index is defined as

$$T^2 = x^T P \Lambda^{-1} P^T x \qquad (1.4)$$

where Λ is the covariance matrix of the principal components t. Faults in the PCS correspond to abnormal major variations. Statistical methods can be applied to calculate the control limits of the Q and T^2 indices at a given confidence level; faults are detected when the fault detection indices exceed the corresponding control limits. Other fault detection indices with different emphases have also been developed for PCA based process monitoring [21, 48, 65], such as the φ index [88].
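A small sketch (ours) of how the two indices in (1.3) and (1.4) can be computed for a scaled sample x, given the loading matrix P and the score covariance Λ estimated from normal operating data:

```python
import numpy as np

def pca_fault_indices(x, P, Lam):
    """Q (SPE) and T^2 of (1.3)-(1.4) for one scaled sample x.

    P:   m-by-l loading matrix from PCA on normal operating data.
    Lam: l-by-l covariance matrix of the principal component scores.
    """
    t = P.T @ x                               # scores in the principal component subspace
    x_res = x - P @ t                         # residual part, equal to ~P ~P' x
    Q = float(x_res @ x_res)                  # squared prediction error (1.3)
    T2 = float(t @ np.linalg.solve(Lam, t))   # Hotelling's T^2 (1.4)
    return Q, T2
```

A fault is flagged when an index exceeds its control limit computed from the training data at a chosen confidence level.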
Although the PCA modeling method has been successfully applied in real industrial processes, there are limitations. First, the PCA model is linear: it assumes that process variables are linearly related. Second, the PCA model is static: it only considers static relationships among process variables. However, real industrial processes are often nonlinear and dynamic, so directly applying the traditional PCA method leaves the nonlinear and dynamic characteristics unmodeled.

1.1.1.2 PCA Variants

To address this issue, several revised PCA methods have been developed. For example, kernel PCA [69] and probabilistic PCA [71] are two popular methods proposed to deal with nonlinearities. Kernel PCA maps variables from the original measurement space to a higher dimensional feature space, in which standard PCA is performed. Probabilistic PCA assumes a probability distribution of x, given by

$$p(x \mid z, W, \mu, \sigma) = \mathcal{N}(Wz + \mu, \sigma^2 I)$$

where z is a set of lower dimensional latent variables, W is the loading matrix, μ is the mean of the observations, and the noise has zero mean and isotropic covariance σ^2 I. Maximum-likelihood estimates can be computed for the model parameters. In addition, multiple PCA models can be combined as a probability mixture model. Successful applications of kernel PCA and probabilistic PCA to fault detection are reported in the literature [36, 89, 13, 32, 19].

Compared to the extensive studies on nonlinear PCA, work addressing dynamic data is limited. Here we enumerate the dynamic PCA methods that have been developed, to the best of our knowledge.

1. PCA on an augmented data matrix [34]. This method applies standard PCA to an augmented data matrix. With the data matrix defined as X = [x_1 x_2 ⋯ x_N]^T, where N is the number of sample points, the augmented data matrix with time lagged measurements is

$$Z = \begin{bmatrix} x_s^T & x_{s-1}^T & \cdots & x_1^T \\ x_{s+1}^T & x_s^T & \cdots & x_2^T \\ \vdots & \vdots & \ddots & \vdots \\ x_N^T & x_{N-1}^T & \cdots & x_{N-s+1}^T \end{bmatrix} = [X_s \; X_{s-1} \; \cdots \; X_1] \qquad (1.5)$$

This is a straightforward extension of the traditional PCA method, in which both cross-correlations and auto-correlations are captured in the augmented loading matrix P. Parallel analysis, in combination with cross-correlation plots of the scores, is proposed to determine the number of time lags [34]. Although easy to implement, this method suffers from several issues. First, it does not provide an explicit representation of the dynamic relationships: both dynamic and static relationships are buried implicitly in the augmented loading matrix P, which makes it difficult to interpret the model and explore the underlying variable dependence structure. Second, as the matrix form of Z shows, the dimension of the problem grows quickly as the number of time lags increases, which can cause computational issues. Third, the number of principal components can be larger than the number of variables, in which case the method fails to perform dimension reduction. Several variants have been proposed to deal with various cases, such as nonlinearity and batch processes [10, 45, 43, 3]. However, augmented data matrices are used in all of these algorithms, so they share the same limitations as Ku et al. [34].

2. Dynamic latent variables (DLV) [39]. This method extracts latent variables such that the auto-covariance at a particular time lag is maximized. The objective function is

$$\max_w \; w^T X_1^T X_s w \quad \text{s.t.} \quad \|w\| = 1$$

where X_s and X_1 are defined as in (1.5). After the latent variables are extracted, a vector autoregressive (VAR) model, referred to here as an inner model, is built to represent the dynamic relations, using auto-covariances at more than one time lag. This algorithm provides an explicit representation of the dynamic relationships and a compact model of the data structure (the loading vector w has the same dimension as x). However, when extracting the latent variables, only the maximum auto-covariance at one time lag is considered; the auto-covariances at other time lags are ignored. As a consequence, the VAR inner model, which aims to predict the current value from its past, is not induced by the objective function.

3. Structured DLV [41]. This method was developed as an improvement of DLV. It applies standard PCA to a weighted combination of a number of lagged data matrices.
The objective function is

$$\max_w \; w^T(\alpha_1 X_1 + \alpha_2 X_2 + \cdots + \alpha_s X_s)^T(\alpha_1 X_1 + \alpha_2 X_2 + \cdots + \alpha_s X_s)w \quad \text{s.t.} \quad \|w\| = 1, \; \|\alpha\| = 1$$

After a certain number of latent variables are extracted, a VAR model is built to capture the dynamic relationships of the latent variables. This method gives a compact model of the data structure and provides an explicit dynamic model. However, although the objective encourages the extraction of dynamic relationships, collinear static relationships can sometimes dominate a latent variable, which implies that the objective does not always extract dynamic relationships in the latent variables.

In summary, existing dynamic PCA methods suffer from the following problems. First, dimensions and computational complexity increase dramatically with the number of time lags if augmented data matrices are utilized. Second, the models are implicit and difficult to interpret. Third, the model objective functions cannot guarantee that dominant dynamics are extracted in the latent variables. Fourth, the dynamic models for the latent variables are not consistent with the objective function from which the latent variables are extracted. To address these issues, a novel dynamic-inner PCA (DiPCA) modeling algorithm is proposed in this dissertation.

1.1.2 PLS

In contrast to PCA, PLS models the relationships between process variables and quality variables (or the input and output variables). To achieve good predictions, latent variables of the inputs and outputs are extracted such that their covariance is maximized [79]. Given the input matrix X and output matrix Y, the objective of PLS is

$$\max_{w, q} \; q^T Y^T X w \quad \text{s.t.} \quad \|w\| = 1, \; \|q\| = 1 \qquad (1.6)$$

where w and q are the input and output weights, respectively. The NIPALS algorithm is popularly used to solve this optimization problem, comprising outer modeling, inner modeling, and residual deflation [77]. Table 1.2 gives the procedure of the NIPALS algorithm for PLS; details can be found in [24] and [20]. After l latent variables are extracted, the output matrix Y can be predicted as

$$\hat{Y} = \sum_{i=1}^{l} b_i t_i q_i^T$$

This implies that the selection of the number of latent variables affects the prediction result. Similar to the selection of the number of principal components in PCA, underestimating the number of latent variables leads to lower predictability, while overestimating it introduces errors that can mask important features. Cross-validation is popularly used to select the number of latent variables; it achieves a good bias-variance trade-off by minimizing the predicted error sum of squares (PRESS).

Table 1.2: PLS NIPALS Algorithm
1. Scale X and Y to zero mean and unit variance. Set X_1 = X, Y_1 = Y, and i = 1.
2. Outer modeling: choose a starting u_i as some column of Y_i and iterate the following equations until t_i converges or a maximum number of iterations is reached:
   w_i = X_i^T u_i / ||X_i^T u_i||
   t_i = X_i w_i
   q_i = Y_i^T t_i / ||Y_i^T t_i||
   u_i = Y_i q_i
3. Inner modeling: calculate the inner regression coefficient b_i = u_i^T t_i / t_i^T t_i
4. Residual deflation:
   p_i = X_i^T t_i / t_i^T t_i
   X_{i+1} = X_i − t_i p_i^T
   Y_{i+1} = Y_i − û_i q_i^T = Y_i − b_i t_i q_i^T
5. Set i := i + 1 and return to Step 2 until i = i_max.
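For comparison with the PCA version above, a minimal NumPy sketch (ours) of the PLS NIPALS steps in Table 1.2, for pre-scaled X and Y:

```python
import numpy as np

def nipals_pls(X, Y, n_components, max_iter=500, tol=1e-8):
    """PLS NIPALS following Table 1.2; X and Y are assumed pre-scaled."""
    X, Y = X.copy(), Y.copy()
    Ws, Ts, Ps, Qs, bs = [], [], [], [], []
    for _ in range(n_components):
        u = Y[:, 0].copy()                         # start u_i as some column of Y_i
        for _ in range(max_iter):
            w = X.T @ u; w /= np.linalg.norm(w)    # outer modeling
            t = X @ w
            q = Y.T @ t; q /= np.linalg.norm(q)
            u_new = Y @ q
            if np.linalg.norm(u_new - u) < tol * np.linalg.norm(u_new):
                u = u_new
                break
            u = u_new
        b = float(u @ t) / float(t @ t)            # inner regression coefficient
        p = X.T @ t / float(t @ t)
        X -= np.outer(t, p)                        # residual deflation
        Y -= b * np.outer(t, q)
        Ws.append(w); Ts.append(t); Ps.append(p); Qs.append(q); bs.append(b)
    return (np.column_stack(Ws), np.column_stack(Ts),
            np.column_stack(Ps), np.column_stack(Qs), np.array(bs))
```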
1.1.2.1 PLS based Soft Sensors and Statistical Process Monitoring

In industrial processes, while process variables can be easily and frequently sampled, quality variables are difficult to measure. The predictive ability of PLS makes it widely used as a soft sensor, or inferential sensor, to estimate the values of quality variables from process variables [83, 54, 18].

Another important application of PLS is PLS based process monitoring. Similar to PCA, PLS partitions the measurement space into a principal subspace and a residual subspace. The principal subspace contains variations related to the outputs, while the residual subspace contains variations that do not help to explain the outputs. Fault detection indices, the Q index and the T^2 index, have been developed to monitor the two subspaces separately; their definitions in the case of PLS are similar to those in PCA [47]. Different from PCA based monitoring, PLS based monitoring uses T^2 to detect quality-relevant faults and the Q index to detect quality-irrelevant faults.

1.1.2.2 PLS Variants

A major drawback of PLS is that the principal subspace can still contain variations that are orthogonal to the outputs. For example, when the principal subspace has more dimensions than the outputs, it obviously contains variations that are not relevant to the outputs. The consequence is that the fault detection index T^2 is not affected only by quality-related faults; quality-irrelevant faults also affect T^2, which can cause high false alarm rates. Several revised PLS algorithms have been proposed to address this issue, such as orthogonal PLS (OPLS), total PLS (TPLS) and concurrent PLS (CPLS) [73, 90, 64]. OPLS preprocesses the data such that the variations orthogonal to the outputs are removed from X; after that, standard PLS is applied to the outputs and the filtered inputs. Compared to OPLS, TPLS and CPLS are post-processing methods, in which standard PLS is applied to the data first. TPLS further decomposes the principal input matrix X̂ by projecting it onto the subspace defined by the predicted output matrix Ŷ, so the variations orthogonal to Ŷ are removed from X̂. TPLS based fault detection indices have also been developed, and case studies demonstrate improved fault detection results over PLS. CPLS is proposed as an improvement of TPLS. It decomposes the input space into a covariation subspace that is quality relevant only, an input principal subspace and an input residual subspace. In addition, it decomposes the output space into a predictable subspace, an output principal subspace and an output residual subspace. This decomposition achieves separate monitoring of output-relevant faults that affect the quality and input-relevant process faults.

In addition, PLS suffers from the same problems as standard PCA: it cannot model nonlinear or dynamic data effectively. To address the nonlinearity issue, several approaches have been proposed for PLS; for example, kernel PLS and nonlinear PLS with neural networks are two popular methods proposed to deal with nonlinearities, with successful applications reported in the literature [44, 67, 62, 2]. Compared to nonlinear PLS, there is little work on dynamic PLS. The proposed dynamic PLS algorithms, to the best of our knowledge, are listed below.

1. A straightforward extension of PLS is to include a number of lagged inputs in an augmented input matrix [66, 63]. Finite impulse response (FIR) inner models are built between the scores of the inputs and outputs after the outer models converge.
The disadvantage of this method is that it does not give an explicit representation of the dynamic relationship. The augmented loading matrix makes it difficult to interpret the model and explore the underlying data structures. Also, the dimension of the loading vector increases with the number of lags, which tends to make the model over-parameterized.

2. A modified PLS modeling algorithm was proposed by Kaspar and Ray [30]. It provides a compact representation, as no lagged variables are included in the outer model. Prior dynamic knowledge is used to design filters that remove the dynamic components of the inputs; a dynamic inner model is then built between input scores and output scores after a static model is built between the filtered inputs and outputs. Lakshminarayanan et al. proposed a similar method that builds a dynamic inner relationship between input scores and output scores [35]. A shortcoming of these methods is that the dynamic inner model is inconsistent with the static outer model.

3. Li et al. proposed a dynamic PLS method that utilizes a weighted combination of lagged input data as the input to PLS [40]. An inner model is built between the output scores and a weighted combination of lagged input scores. This gives a compact inner and outer model; however, the inner model is not explicit and is difficult to interpret.

In summary, the existing dynamic PLS algorithms suffer from the following problems. First, dimensions and computational complexity increase dramatically with the number of time lags if augmented data matrices are utilized. Second, the outer model is not consistent with the inner model. Third, the models are implicit and difficult to interpret. To address these issues, a novel dynamic-inner PLS (DiPLS) modeling algorithm is proposed in this dissertation.

1.2 Objective

The main objective of this dissertation is to provide novel PCA and PLS data modeling methods that account for data dynamics. The proposed dynamic PCA and PLS methods are called dynamic-inner principal component analysis (DiPCA) and dynamic-inner partial least squares (DiPLS), respectively. The main difference between existing dynamic modeling methods and the proposed methods is the way the outer model objective functions and the dynamic inner models are formulated. In existing dynamic PCA and dynamic PLS methods, latent variables are extracted based on a pre-defined objective function; a structure for the dynamic inner model is then selected and applied to the extracted latent variables. The choice of the outer model objective function is not directly related to, or motivated by, the dynamic inner model structure, which causes the inconsistency between the outer and inner models. In our methods, the structures of the dynamic inner models are determined first, and the outer model objectives are derived from these structures. In this way, the dynamic inner models are explicit and consistent with the outer model objective functions. In addition, the structures of DiPCA and DiPLS enable separate monitoring of dynamic relations and static relations, which allows for a clearer physical interpretation than the existing dynamic PCA and dynamic PLS monitoring methods. DiPCA and DiPLS based process monitoring schemes are also derived in this thesis.

The following objectives are achieved in this dissertation.

1. Propose the structures of DiPCA and DiPLS models and derive the corresponding objective functions.
2. Propose iterative methods, similar to the NIPALS algorithms, to calculate the DiPCA and DiPLS model parameters.

3. Analyze the geometric properties of DiPCA and DiPLS and compare them to standard PCA and PLS.

4. Propose a prediction algorithm for DiPLS such that DiPLS can be used for soft sensing.

5. Propose fault detection indices for DiPCA and DiPLS based process monitoring.

6. Derive the control limits for the proposed fault detection indices.

7. Discuss the physical meanings of the DiPCA and DiPLS fault detection indices.

1.3 Outline

Chapter 2 presents the DiPCA modeling method. First, the objective of PCA and the method of calculating the PCA model parameters are introduced. Then the DiPCA objective function is derived and the calculation of the DiPCA model parameters is discussed; an iterative method similar to NIPALS is proposed to solve the optimization problem. Furthermore, the geometric properties of DiPCA are explored. Case studies on simulation data, a TEP (Tennessee Eastman Process) dataset, and a real industrial dataset demonstrate the effectiveness of the DiPCA method.

In Chapter 3, fault detection indices based on the DiPCA model are proposed. Statistical methods are applied to calculate the control limits of the proposed fault detection indices. The proposed fault detection methods are tested on TEP.

Chapter 4 transitions from the DiPCA modeling method to the DiPLS modeling method. A brief discussion of PLS is given first. Then the structure of the dynamic inner model in PCA is extended to PLS. Based on the dynamic inner model, the objective of DiPLS is formulated and the calculations of the model parameters are derived. Similar to DiPCA, the geometric properties of DiPLS are studied as well. Case studies on simulation data and TEP data provide a comparison between PLS and DiPLS.

In Chapter 5, DiPLS based fault detection indices and the corresponding control limits are proposed. A case study on TEP process data demonstrates improved fault detection results over PLS.

Chapter 6 gives general conclusions for the dissertation.

Chapter 2
Dynamic-Inner PCA for Dynamic Data Modeling

2.1 Introduction

In this chapter, dynamic-inner principal component analysis (DiPCA) is proposed to address the drawbacks of existing dynamic PCA methods. DiPCA extracts one or more latent variables that are linear combinations of the original variables and have maximized auto-covariance. In other words, the current values of these extracted latent variables are best predictable from their past values. In the complement, the residuals after extracting the most predictable latent variables from the data will be least predictable and, in the limiting case, tend to be white noise. In this sense, the DiPCA algorithm in this chapter extracts a principal time series from a set of multi-dimensional time series. The residuals, with little or no auto-correlation, can then be treated as static data and analyzed with the traditional PCA method.

The remainder of this chapter is organized as follows. Section 2 reviews the traditional PCA algorithm and gives the objective function of the DiPCA algorithm. Section 3 presents the proposed DiPCA algorithm. Geometric properties and relations of DiPCA are discussed in Section 4. Case studies on simulation data, a boiler process dataset and a TEP dataset are presented in Section 5 to show the effectiveness of the proposed algorithm. Section 6 summarizes the work and gives conclusions.

2.2 PCA and DiPCA Objective Functions

Let x ∈ R^m denote a sample vector of m variables.
Assuming that there are N samples of each variable, a data matrix X ∈ R^{N×m}, with each row representing a sample, can be used to build a PCA model. The objective of PCA is to extract a direction p of the largest variance in the m dimensional measurement space. For an arbitrary direction p with ||p||_2^2 = p^T p = 1, the projection of X onto this direction is

$$t = Xp$$

Therefore, the objective of maximizing the variance of t can be formulated as

$$\max_p \; p^T X^T X p \quad \text{s.t.} \quad \|p\| = 1 \qquad (2.1)$$

It is clear from the objective of PCA that only static cross-correlations among the variables are extracted. If the data come from a dynamic process, traditional PCA will leave the dynamics unmodeled, making the principal components, and even the residuals, contain unmodeled dynamics. This restricts the applicability of PCA and makes it unsuitable for dynamic data modeling.

The main objective of this chapter is to develop a dynamic PCA objective that is consistent with its inner dynamic VAR relation, i.e., one whose latent variable is most predictable from its past. While this seems to be a natural and useful component to derive from the data, no prior work has been found in the literature. This approach is referred to as dynamic-inner PCA (DiPCA) in this chapter.

In general, the dynamics of a latent variable that predicts the current value from the past can be expressed as

$$t_k = \beta_1 t_{k-1} + \cdots + \beta_s t_{k-s} + r_k$$

with the latent variable defined as a linear combination of the original variables,

$$t_k = x_k^T w$$

where ||β|| = 1, ||w|| = 1, x_k is the sample vector at time k, and w is the weight vector. If the number of time lags s is chosen long enough that the residual r_k is essentially white noise, the prediction of each latent variable from the dynamic inner model is

$$\hat{t}_k = \beta_1 x_{k-1}^T w + \cdots + \beta_s x_{k-s}^T w = [x_{k-1}^T \; \cdots \; x_{k-s}^T](\beta \otimes w)$$

where β = [β_1 β_2 ⋯ β_s]^T and ⊗ denotes the Kronecker product. Based on the above dynamic model, the objective function for extracting the latent variables should maximize the covariance between t_k and t̂_k. Let x_k be the sample vector at time k, k = 1, 2, ..., N+s. The objective is therefore to maximize

$$\frac{1}{N}\sum_{k=s+1}^{s+N} w^T x_k \, [x_{k-1}^T \; \cdots \; x_{k-s}^T](\beta \otimes w) \qquad (2.2)$$

This objective leads to the dynamic-inner PCA algorithm derived in the next section.
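To make the objective concrete, the sum in (2.2) can be evaluated directly for a candidate (w, β); the following NumPy sketch (ours, with illustrative names) does so for a data array whose rows are x_1, ..., x_{N+s}:

```python
import numpy as np

def dipca_objective(X, w, beta):
    """Evaluate the DiPCA covariance objective (2.2).

    X:    (N+s)-by-m array with rows x_1, ..., x_{N+s}.
    w:    m-vector with ||w|| = 1.
    beta: s-vector with ||beta|| = 1.
    """
    s = len(beta)
    t = X @ w                        # latent scores t_k = x_k' w
    N = len(t) - s
    # prediction t_hat_k = beta_1 t_{k-1} + ... + beta_s t_{k-s}, for k = s+1,...,s+N
    t_hat = sum(beta[i] * t[s - 1 - i : s - 1 - i + N] for i in range(s))
    return float(t[s:] @ t_hat) / N  # (1/N) * sum of t_k * t_hat_k
```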
2.3 Dynamic-Inner PCA Algorithm

2.3.1 Extracting One Dynamic Component

Denote the data matrix as X = [x_1 x_2 ⋯ x_{N+s}]^T and form the following data matrices from X:

$$X_i = [x_i \; x_{i+1} \; \cdots \; x_{N+i-1}]^T \quad \text{for } i = 1, 2, \ldots, s+1$$
$$Z_s = [X_1 \; X_2 \; \cdots \; X_s]$$

The objective of DiPCA consistent with (2.2) is formulated as

$$\max_{w, \beta} \; w^T X_{s+1}^T Z_s (\beta \otimes w) \quad \text{s.t.} \quad \|w\| = 1, \; \|\beta\| = 1 \qquad (2.3)$$

where s is the dynamic order of the model. The dimension of w equals the number of variables and does not increase with the dynamic order of the model. After the weight vector w is extracted, the latent score t_k is calculated as t_k = x_k^T w. It is clear that the most dominant dynamic relationship between t_k and t_{k−1}, ..., t_{k−s} is extracted by the objective function, so an explicit dynamic model can be built between the latent variables. Compared to other dynamic PCA algorithms, the objective function of DiPCA leads to the extraction of only dynamic latent variables; the residuals after all dynamic components are extracted contain little dynamic information and can therefore be further analyzed using static PCA.

Lagrange multipliers are used to solve the optimization in (2.3). Define

$$J = w^T X_{s+1}^T Z_s(\beta \otimes w) + \tfrac{1}{2}\lambda_\beta(1 - \beta^T\beta) + \lambda_w(1 - w^T w)$$

where

$$\beta \otimes w = (\beta \otimes I)w = (I \otimes w)\beta$$

Taking derivatives with respect to w and β and setting them to zero, we have

$$\frac{\partial J}{\partial w} = X_{s+1}^T Z_s(\beta \otimes I)w + (\beta \otimes I)^T Z_s^T X_{s+1} w - 2\lambda_w w = 0$$
$$\frac{\partial J}{\partial \beta} = (I \otimes w)^T Z_s^T X_{s+1} w - \lambda_\beta \beta = 0$$

Defining G = X_{s+1}^T Z_s(β ⊗ I), the above equations simplify to

$$\frac{\partial J}{\partial w} = Gw + G^T w - 2\lambda_w w = 0 \qquad (2.4)$$
$$\frac{\partial J}{\partial \beta} = (I \otimes w)^T Z_s^T X_{s+1} w - \lambda_\beta \beta = 0 \qquad (2.5)$$

Pre-multiplying (2.4) by w^T and (2.5) by β^T, and using the fact that these vectors have unit norm, we have

$$2\lambda_w = w^T(G + G^T)w = 2J$$
$$\lambda_\beta = \beta^T(I \otimes w)^T Z_s^T X_{s+1} w = J$$

These results imply that λ_w and λ_β are both equal to the maximum value of J, and that w is the eigenvector of G + G^T corresponding to the largest eigenvalue. However, since β and w are coupled, there is no analytical solution to the optimization problem (2.3). Defining the latent scores t = Xw and denoting

$$t_i = X_i w, \quad \text{for } i = 1, 2, \ldots, s+1$$

as the scores of the sub-matrices, (2.5) can be rewritten as

$$\lambda_\beta \beta = [t_1 \; t_2 \; \cdots \; t_s]^T t_{s+1}$$

It is clear that β depends on w implicitly, entirely through its dependence on t_i for i = 1, 2, ..., s+1. Furthermore, (2.4) can be reorganized as

$$2\lambda_w w = \sum_{i=1}^{s} \beta_i (X_{s+1}^T t_i + X_i^T t_{s+1})$$

Therefore, the following iterative method is proposed to solve the problem.

(1) Initialize w as a unit vector.

(2) Calculate w and β by iterating the following relations until convergence:
   t = Xw, and form t_i from t for i = 1, 2, ..., s+1
   β = [t_1 t_2 ⋯ t_s]^T t_{s+1}
   w = Σ_{i=1}^s β_i(X_{s+1}^T t_i + X_i^T t_{s+1})
   w := w/||w||, β := β/||β||

(3) Calculate J = Σ_{i=1}^s β_i t_i^T t_{s+1}.

This iteration is similar to the NIPALS algorithm for PCA listed in Table 1.1. To extract the next latent variable, the same iteration is applied to the residual matrices of X_{s+1} and Z_s; the deflation method is discussed in the next subsection.

2.3.2 Deflation

After the weight vector w and the latent scores t are obtained from the above algorithm, X is deflated as

$$X := X - tp^T \qquad (2.6)$$

where the loading vector p is defined as

$$p = X^T t / t^T t \qquad (2.7)$$

Form X_i from X for i = 1, 2, ..., s+1 and repeat the same algorithm to extract the next latent variable.

Remark 2.1. Similar to the deflation method applied to PLS [82, 20], the predicted t̂_{s+1} could be used to deflate X_{s+1} as

$$X_{s+1} := X_{s+1} - \hat{t}_{s+1} q^T$$
$$\hat{t}_{s+1} = \beta_1 t_1 + \beta_2 t_2 + \cdots + \beta_s t_s$$
$$q = X_{s+1}^T \hat{t}_{s+1} / \hat{t}_{s+1}^T \hat{t}_{s+1}$$

with the deflation of the other matrices in Z_s remaining as in (2.6). In general, this deflation approach maximizes the predictive power of the latent variables under the PCA structure. However, it suffers from a serious flaw. Unlike PLS, all data share the same latent structure in PCA; therefore, it is natural to require all the latent variables to form consistent time series. Mathematically, this is equivalent to

$$t_{s+1} = X_{s+1} w \qquad (2.8)$$

where t_{s+1} = [t_{s+1} t_{s+2} ⋯ t_{s+N}]^T. It is obvious that (2.8) holds for the first latent variable. However, since t̂_{s+1}, instead of t_{s+1}, is used to deflate X_{s+1}, (2.8) no longer holds for the second latent variable. Therefore, although this deflation approach seems reasonable, it no longer extracts the principal auto-correlative time series beyond the first latent time series.
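A compact NumPy sketch (ours) of the single-component iteration above together with the deflation (2.6) and (2.7); a sign-insensitive convergence test is added, since w is only determined up to sign:

```python
import numpy as np

def extract_dipca_component(X, s, max_iter=500, tol=1e-8):
    """Extract one DiPCA latent variable from X (rows x_1, ..., x_{N+s})."""
    N = X.shape[0] - s
    rng = np.random.default_rng()
    w = rng.standard_normal(X.shape[1]); w /= np.linalg.norm(w)
    beta = np.ones(s) / np.sqrt(s)
    for _ in range(max_iter):
        t = X @ w
        T_lag = np.column_stack([t[i:i + N] for i in range(s)])  # [t_1 ... t_s]
        t_next = t[s:s + N]                                       # t_{s+1}
        beta = T_lag.T @ t_next
        beta /= np.linalg.norm(beta)
        X_next = X[s:s + N]                                       # X_{s+1}
        w_new = sum(beta[i] * (X_next.T @ T_lag[:, i] + X[i:i + N].T @ t_next)
                    for i in range(s))
        w_new /= np.linalg.norm(w_new)
        if min(np.linalg.norm(w_new - w), np.linalg.norm(w_new + w)) < tol:
            w = w_new
            break
        w = w_new
    t = X @ w
    p = X.T @ t / float(t @ t)          # loading vector (2.7)
    X_deflated = X - np.outer(t, p)     # deflation (2.6)
    return w, beta, t, X_deflated
```

Calling this routine l times on the successively deflated matrix yields the l dynamic latent variables used in the next subsection.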
2.3.3 DiPCA Model with l Components

Let t_j, j = 1, 2, ..., l, denote the j-th latent component score vector, and let T = [t_1 t_2 ⋯ t_l]. Form T_i from T for i = 1, 2, ..., s+1 in the same way as X_i is formed from X. A VAR model can be built to represent the dynamic relations between T_{s+1} and T_1, T_2, ..., T_s as follows:

$$T_{s+1} = T_1\Theta_1 + T_2\Theta_2 + \cdots + T_s\Theta_s + V = [T_1 \; T_2 \; \cdots \; T_s]\Theta + V$$

The least squares estimate of Θ is

$$\hat{\Theta} = \left([T_1 \; \cdots \; T_s]^T [T_1 \; \cdots \; T_s]\right)^{-1} [T_1 \; \cdots \; T_s]^T T_{s+1} \qquad (2.9)$$

Once Θ̂ is obtained, T_{s+1} can be predicted as

$$\hat{T}_{s+1} = [T_1 \; T_2 \; \cdots \; T_s]\hat{\Theta} \qquad (2.10)$$

which can further be used to predict X_{s+1} as

$$\hat{X}_{s+1} = \hat{T}_{s+1} P^T$$

where P = [p_1 p_2 ⋯ p_l] is the loading matrix and each p_i is defined as in (2.7). With the prediction from past data, the prediction errors (PE) from DiPCA are calculated as

$$E_{s+1} = X_{s+1} - \hat{T}_{s+1} P^T \qquad (2.11)$$

Then traditional PCA is performed on the prediction errors:

$$E_{s+1} = T_r P_r^T + E_r \qquad (2.12)$$

Therefore, X_{s+1} is decomposed as

$$X_{s+1} = \hat{T}_{s+1} P^T + T_r P_r^T + E_r$$

The procedure of DiPCA modeling is summarized as the DiPCA Algorithm in Table 2.1.

Table 2.1: DiPCA Algorithm
1. Scale X to zero mean and unit variance. Initialize w to a random unit vector.
2. Extracting latent variables: iterate the following relations until convergence.
   t = Xw; form t_i from t for i = 1, 2, ..., s+1, similar to the formation of X_i
   β = [t_1 t_2 ⋯ t_s]^T t_{s+1}
   w = Σ_{i=1}^s β_i(X_{s+1}^T t_i + X_i^T t_{s+1})
   w := w/||w||, β := β/||β||
3. Deflation: X := X − tp^T, with p = X^T t / t^T t.
4. Return to Step 2 to extract the next latent variable, until l latent variables are extracted.
5. Dynamic inner modeling: build a VAR model for the latent scores based on (2.9); T_{s+1} is then predicted as T̂_{s+1} = [T_1 T_2 ⋯ T_s]Θ̂ as in (2.10).
6. Static modeling of prediction errors: perform traditional PCA on the prediction error matrix E_{s+1} = X_{s+1} − T̂_{s+1}P^T = T_r P_r^T + E_r.

It is possible that the iteration converges to a local maximum, which leads to a smaller value of J. To avoid local maxima, initialize w randomly and perform multiple trials; the optimal w and β correspond to the largest value of J.

2.3.4 Determination of Model Orders

In DiPCA modeling, three parameters need to be determined: the dynamic order s, the number of dynamic latent variables l, and the number of static latent variables. First, assuming s is determined, l can be chosen such that 95 percent of the auto-covariance is captured by the first l dynamic latent variables; l is thus viewed as a function of s, written as l = l(s). To determine the optimal s, a DiPCA model is first built on the training data. Applying the model to the validation dataset yields the prediction error matrix E^V_{s+1} of the validation data. According to the previous analysis, little dynamic relationship is left in E^V_{s+1}; therefore, the sample crosscorrelation of any two variables in E^V_{s+1} should be close to 0, except at lag 0. The calculation of the corresponding confidence bounds can be found in [7]. When all pairs of variables are considered, the total number of violations of the confidence bounds can be obtained for any (s, l(s)), and the parameter pair (s, l(s)) corresponding to the minimum number of violations is chosen as optimal. To determine the number of static components, the cumulative percentage of variance (CPV) criterion is applied [74].
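The inner VAR fit of (2.9) and (2.10) in Section 2.3.3 is an ordinary least-squares problem on lagged score matrices; a possible sketch (ours):

```python
import numpy as np

def fit_inner_var(T, s):
    """Least-squares fit of the inner VAR model (2.9) on the DiPCA scores.

    T: (N+s)-by-l matrix of dynamic latent scores [t_1 ... t_l].
    Returns Theta_hat and the prediction T_hat of T_{s+1} as in (2.10).
    """
    N = T.shape[0] - s
    Z = np.hstack([T[i:i + N] for i in range(s)])   # [T_1 T_2 ... T_s]
    T_next = T[s:s + N]                             # T_{s+1}
    Theta_hat, *_ = np.linalg.lstsq(Z, T_next, rcond=None)
    T_hat = Z @ Theta_hat
    return Theta_hat, T_hat
```

With P the DiPCA loading matrix, the prediction error matrix of (2.11) then follows as X[s:] - T_hat @ P.T, on which static PCA is performed.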
2.4 DiPCA Geometric Properties and Relations

2.4.1 DiPCA Geometric Properties

The DiPCA algorithm has a different structure from static PCA and other dynamic PCA algorithms. Therefore, it is important to understand the geometric properties of the DiPCA projections and how the data space is partitioned. To explore the DiPCA geometric properties with j latent variables extracted, we use a subscript to denote the succession from one latent variable to the next:

$$X_{j+1} = X_j - t_j p_j^T, \quad \text{with} \quad p_j = X_j^T t_j / t_j^T t_j \qquad (2.13)$$

From (2.13) and the relations in the DiPCA algorithm, we obtain the following lemma.

Lemma 2.1. Suppose i < j. The following relationships hold among the residual matrices and loading vectors:
1. X_j = H X_{i+1}
2. X_j = X_{i+1} G
3. X_j w_i = 0 for i < j
4. X_j^T t_i = 0 for i < j
5. w_i^T p_i = 1
where H and G are matrices formed from the DiPCA latent vectors and loadings, given in the proof.

Proof.
1. X_j = H X_{i+1}. From (2.13) we have

$$X_j = X_{j-1} - t_{j-1}p_{j-1}^T = X_{j-1} - t_{j-1}t_{j-1}^T X_{j-1}/t_{j-1}^T t_{j-1} = (I - t_{j-1}t_{j-1}^T/t_{j-1}^T t_{j-1})X_{j-1} \qquad (2.14)$$

This is a recursive relation for X_j. Iterating it until X_{i+1} appears yields Relation 1 with

$$H = (I - t_{j-1}t_{j-1}^T/t_{j-1}^T t_{j-1}) \cdots (I - t_{i+1}t_{i+1}^T/t_{i+1}^T t_{i+1})$$

2. X_j = X_{i+1} G. In (2.13), replacing t_{j−1} by t_{j−1} = X_{j−1}w_{j−1} gives

$$X_j = X_{j-1} - X_{j-1}w_{j-1}p_{j-1}^T = X_{j-1}(I - w_{j-1}p_{j-1}^T)$$

Again this is a recursive relation for X_j; iterating until X_{i+1} appears yields Relation 2.

3. X_j w_i = 0 for i < j. From Relation 1 and (2.13),

$$X_j w_i = H X_{i+1}w_i = H(I - t_i t_i^T/t_i^T t_i)X_i w_i = H(I - t_i t_i^T/t_i^T t_i)t_i = 0$$

4. X_j^T t_i = 0 for i < j. From Relation 2 and (2.13),

$$t_i^T X_j = t_i^T X_{i+1}G = t_i^T(I - t_i t_i^T/t_i^T t_i)X_i G = 0$$

by the same reasoning.

5. w_i^T p_i = 1. Using the expression for p_i,

$$w_i^T p_i = w_i^T X_i^T t_i/t_i^T t_i = t_i^T t_i/t_i^T t_i = 1$$

With the above lemma, the following theorem summarizes the orthogonality properties of the DiPCA latent scores and loadings.

Theorem 2.2. In the DiPCA algorithm, denoting X̄_s = Σ_{d=1}^s β_d X_d, we have

$$w \propto [X_{s+1}^T \bar{X}_s + \bar{X}_s^T X_{s+1}]w$$

which means that w is the eigenvector of the above symmetric matrix corresponding to the largest eigenvalue. In addition, for components i, j in the model, the following relations hold:
1. w_i^T w_j = 0 for all i ≠ j
2. t_i^T t_j = 0 for all i ≠ j
3. w_i^T p_j = 0 for i < j

Proof. To show Relations 1 and 2 for i ≠ j, it suffices to show them for i < j, since the case i > j follows by symmetry. Using the expression for w_j,

$$w_j = c \sum_{d=1}^{s} \beta_{d,j}(X_{s+1,j}^T t_{d,j} + X_{d,j}^T t_{s+1,j})$$

where c is a proportionality constant. Since X_{d,j} and X_{s+1,j} are sub-matrices of X_j, Relation 3 of Lemma 2.1 gives w_i^T X_j^T = 0, which implies w_i^T X_{d,j}^T = 0 and w_i^T X_{s+1,j}^T = 0, and hence w_i^T w_j = 0. Furthermore, using Relation 4 of Lemma 2.1,

$$t_i^T t_j = t_i^T X_j w_j = 0$$
Using Relation 3 of Lemma 2.1 again,

$$w_i^T p_j = w_i^T X_j^T t_j/t_j^T t_j = 0$$

2.4.2 DiPCA Model Relations

Assuming the number of latent variables in the DiPCA model is chosen as l, and collecting all latent score vectors and model vectors into the matrices

$$T = [t_1 \; t_2 \; \cdots \; t_l], \quad W = [w_1 \; w_2 \; \cdots \; w_l], \quad P = [p_1 \; p_2 \; \cdots \; p_l]$$

we have the following relation by iterating (2.13):

$$X_{l+1} = X - \sum_{j=1}^{l} t_j p_j^T = X - TP^T \qquad (2.15)$$

or

$$X = TP^T + X_{l+1} \qquad (2.16)$$

In many cases it is desirable to express the scores in terms of the original data for the convenience of model calculations. Post-multiplying (2.16) by W and using Relation 3 of Lemma 2.1, we obtain

$$XW = TP^TW + X_{l+1}W = TP^TW$$

or

$$T = XR \qquad (2.17)$$

where

$$R = W(P^TW)^{-1} \qquad (2.18)$$

Pre-multiplying (2.18) by P^T gives

$$P^TR = P^TW(P^TW)^{-1} = I \qquad (2.19)$$

Therefore,

$$(PR^T)(PR^T) = PR^T \qquad (2.20)$$

that is, PR^T is idempotent. Relations (2.18) and (2.19) are the same as the relations of the R and P matrices of PLS given in [50], [23]. Equation (2.16) can be rewritten as

$$X = XRP^T + X_{l+1}$$

For a given vector x_k that would appear as a row of X, the score vector is calculated from (2.17) as

$$t_k = R^T x_k \qquad (2.21)$$

and x_k can be decomposed as

$$x_k = Pt_k + \tilde{x}_k \qquad (2.22)$$

The decomposition (2.22) gives the partition of the space formed by the current data. To explore how the past data relate to the current data, (2.11) should be used to derive the sample relation at time k:

$$e_k = x_k - P\hat{t}_k$$

where e_k is the one-step-ahead prediction error and t_k can be predicted using the result in (2.10) as

$$\hat{t}_k = \sum_{i=1}^{s} \Theta_{s+1-i}^T t_{k-i} = G(q^{-1})t_k$$

where G(q^{-1}) = Σ_{i=1}^s Θ_{s+1−i}^T q^{-i} is the transfer function of the dynamic model and q^{-1} is the backward shift operator. Therefore,

$$x_k = PG(q^{-1})t_k + e_k \qquad (2.23)$$

With the additional static PCA on e_k, a sample vector x_k satisfies

$$x_k = PG(q^{-1})t_k + P_r t_{k,r} + e_{k,r} \qquad (2.24)$$

We call t_k the dynamic latent variables and t_{k,r} the static latent variables.

2.4.3 Dynamic Latent Structured Whitening Filter

DiPCA can also be viewed as a preprocessing filter applied to x_k before PCA modeling. Since there are possible dynamic relationships in the data, directly applying traditional PCA to x_k is not appropriate. After DiPCA is applied to the data x_k, the prediction errors are essentially white, which makes them suitable for PCA modeling of the latent structure in e_k. Therefore, DiPCA can be interpreted as a whitening filter that removes all the dynamic relationships in the data. Mathematically, based on (2.21) and (2.23), the dynamic whitening filter can be written as

$$e_k = [I - PG(q^{-1})R^T]x_k$$

The block diagram of the DiPCA dynamic whitening filter is shown in Figure 2.1.

[Figure 2.1: DiPCA dynamic whitening filter structure]

2.5 Case Studies

Three case studies are presented in this section. The first case study is performed on simulation data, the second on a dataset from an industrial boiler process, and the third on Tennessee Eastman Process (TEP) data. The effectiveness of the proposed DiPCA modeling algorithm is demonstrated in all three cases.

2.5.1 Simulation Data

In this simulation, t_k is generated from a VAR(1) process and x_k is generated from a latent variable model:

$$t_k = c + At_{k-1} + v_k$$
$$x_k = Pt_k + e_k$$

$$A = \begin{bmatrix} 0.5205 & 0.1022 & 0.0599 \\ 0.5367 & 0.0139 & 0.4159 \\ 0.0412 & 0.6054 & 0.3874 \end{bmatrix}, \quad c = \begin{bmatrix} 0.5205 \\ 0.5367 \\ 0.0412 \end{bmatrix}$$

$$P = \begin{bmatrix} 0.4316 & 0.1723 & 0.0574 \\ 0.1202 & 0.1463 & 0.5348 \\ 0.2483 & 0.1982 & 0.4797 \\ 0.1151 & 0.1557 & 0.3739 \\ 0.2258 & 0.5461 & 0.0424 \end{bmatrix}$$

where e_k ∈ R^5 ~ N(0, 0.1^2) and v_k ∈ R^3 ~ N(0, 1^2). 3000 data points are generated. The first 1000 data points are used as a training dataset to train the DiPCA model, the next 1000 data points as a validation dataset to determine the dynamic order s and the number of dynamic latent variables l, and the last 1000 data points as a testing dataset to evaluate the proposed DiPCA modeling method. Using the method described in Section 2.3.4, s = 1 and l = 3 are selected.
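A sketch (ours) of the data-generating model above; the coefficients are typed as printed, though minus signs may have been lost in the scanned matrices, so treat the numbers as illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
A = np.array([[0.5205, 0.1022, 0.0599],
              [0.5367, 0.0139, 0.4159],
              [0.0412, 0.6054, 0.3874]])
c = np.array([0.5205, 0.5367, 0.0412])
P = np.array([[0.4316, 0.1723, 0.0574],
              [0.1202, 0.1463, 0.5348],
              [0.2483, 0.1982, 0.4797],
              [0.1151, 0.1557, 0.3739],
              [0.2258, 0.5461, 0.0424]])

n = 3000
t = np.zeros(3)
X = np.empty((n, 5))
for k in range(n):
    t = c + A @ t + rng.normal(0.0, 1.0, 3)   # latent VAR(1): t_k = c + A t_{k-1} + v_k
    X[k] = P @ t + rng.normal(0.0, 0.1, 5)    # observation:   x_k = P t_k + e_k

X_train, X_valid, X_test = X[:1000], X[1000:2000], X[2000:]
```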
Figure 2.2 and Figure 2.3 show the autocorrelation and crosscorrelation of the original variables x_k and the dynamic latent variables t_k, respectively. It is clear from both figures that strong dynamic relationships exist in x_k and t_k. After DiPCA modeling, the autocorrelation and crosscorrelation of the prediction error e_k and the innovation v_k are plotted in Figure 2.4 and Figure 2.5. The results show that all the autocorrelation and crosscorrelation coefficients are close to zero except at lag 0. This indicates that the dynamic relationships are removed from x_k and t_k, and only static relationships remain in e_k and v_k, which can be further modeled by PCA.

[Figure 2.2: Autocorrelation and crosscorrelation of x_k of simulation data]

[Figure 2.3: Autocorrelation and crosscorrelation of t_k of simulation data]

[Figure 2.4: Autocorrelation and crosscorrelation of e_k of simulation data]

[Figure 2.5: Autocorrelation and crosscorrelation of v_k of simulation data]

2.5.2 Data from an Industrial Boiler

Four sensors with 430 samples were collected from a boiler process. 330 samples are used to build the DiPCA model, and the remaining samples are used for testing. s = 2 and l = 2 are selected according to the method described in Section 2.3.4. In this example, there is no residual subspace for v_k; therefore, the combined index φ_v reduces to T²_v. Figure 2.6 shows the crosscorrelations of e_k, from which we can see that negligible dynamics remain in the prediction error e_k after DiPCA modeling.

[Figure 2.6: Crosscorrelation of e_k of data from an industrial boiler]

Figure 2.7 and Figure 2.8 show the crosscorrelation of t_k and v_k, respectively. It is obvious that strong dynamic relationships exist in t_k, while only static relationships remain in v_k after a VAR model is built for t_k.
Figure 2.7: Crosscorrelation of $t_k$ of data from an industrial boiler
Figure 2.8: Crosscorrelation of $v_k$ of data from an industrial boiler
Figure 2.9: Schematic diagram of the Tennessee Eastman Process

2.5.3 TEP Data

The Tennessee Eastman Process was developed to provide a realistic simulation of an industrial process for the evaluation of monitoring methods [17, 11]. Figure 2.9 shows the schematic diagram of TEP, which contains 12 manipulated variables and 41 measured variables. The measured variables consist of 22 process variables sampled every 3 minutes and 19 quality variables sampled with dead time and time delays. In this case study, the 22 process variables XMEAS(1-22) and 11 manipulated variables XMV(1-11) are collected as $X$; a detailed description of the 33 variables is given in Table 2.2. 480 data points are used as the training dataset. By using the parameter determination method described earlier, $s = 3$ and $l = 13$ are selected. Figure 2.10 shows the autocorrelation and crosscorrelation of the first four dynamic latent variables $t_k$, and Figure 2.11 shows the autocorrelation and crosscorrelation of the innovations $v_k$. From Figure 2.10, we can see that there are obvious autocorrelations and crosscorrelations in $t_k$. After dynamic modeling, little dynamic crosscorrelation is left in $v_k$, as can be seen from Figure 2.11.

Figure 2.10: Autocorrelation and crosscorrelation of $t_k$ of TEP data

Table 2.2: Description of 22 measurements and 11 MVs

Variable | Description
Xmeas(1) | A feed (stream 1)
Xmeas(2) | D feed (stream 2)
Xmeas(3) | E feed (stream 3)
Xmeas(4) | Total feed (stream 4)
Xmeas(5) | Recycle flow (stream 8)
Xmeas(6) | Reactor feed rate (stream 6)
Xmeas(7) | Reactor pressure
Xmeas(8) | Reactor level
Xmeas(9) | Reactor temperature
Xmeas(10) | Purge rate (stream 9)
Xmeas(11) | Separator temperature
Xmeas(12) | Separator level
Xmeas(13) | Separator pressure
Xmeas(14) | Separator underflow
Xmeas(15) | Stripper level
Xmeas(16) | Stripper pressure
Xmeas(17) | Stripper underflow
Xmeas(18) | Stripper temperature
Xmeas(19) | Stripper steam flow
Xmeas(20) | Compressor work
Xmeas(21) | Reactor cooling water outlet temperature
Xmeas(22) | Condenser cooling water outlet temperature
Xmv(1) | D feed flow (stream 2)
Xmv(2) | E feed flow (stream 3)
Xmv(3) | A feed flow (stream 1)
Xmv(4) | Total feed flow (stream 4)
Xmv(5) | Compressor recycle valve
Xmv(6) | Purge valve (stream 9)
Xmv(7) | Separator pot liquid flow
Xmv(8) | Stripper liquid product flow
Xmv(9) | Stripper steam valve
Xmv(10) | Reactor cooling water flow
Xmv(11) | Condenser cooling water flow

Figure 2.11: Autocorrelation and crosscorrelation of $v_k$ of TEP data
2.6 Summary

In this chapter, a dynamic inner PCA algorithm is developed for dynamic data modeling. In the proposed method, a dynamic latent variable model is extracted first to capture the most auto-correlative dynamics in the data. The captured dynamic components form the most self-predictable subspace, while the remaining prediction-error subspace is least predictable and can be treated using static PCA. A cross-validation method is given to determine the order and the number of dynamic latent variables.

Geometric properties of the DiPCA method are derived to give insight into the DiPCA model structure. The DiPCA objective guarantees that static variations in the data are left in the residuals after extracting the dynamic components. In this sense, the DiPCA model can be interpreted as a whitening filter with a dynamic latent structure, where the dimension of the dynamic latent space is generally smaller than the dimension of the original variable space. This feature is not available in other time series modeling or Kalman filter approaches.

Chapter 3

DiPCA-based Process Monitoring

3.1 Introduction

In industrial processes, various types of faults can occur, and these faults affect both production performance and profits. Therefore, it is important to detect faults when they occur. By applying multivariate analysis techniques, the information contained in the collected data can be used for fault detection.

Multivariate statistical methods, such as PCA, have been widely applied to process monitoring [26, 42, 51]. The traditional PCA modeling method divides the measurement space into a principal component subspace and a residual subspace: the principal component subspace contains normal variations, and the residual subspace contains noise or innovations. Several fault detection indices have been developed to monitor these two subspaces. Hotelling's $T^2$ statistic is used to monitor the principal component subspace, and the $Q$ statistic is used to monitor the residual subspace. A combined index has also been proposed to monitor the two subspaces together, with a weight assigned to each subspace.

One important shortcoming of PCA-based monitoring is the lack of focus on time dependence; that is, no autocorrelations in the data are exploited. Having dynamic content in a component means that its future is predictable from its past to a certain extent, so the uncertainty lies only in the prediction error, not in the entire component. Therefore, ignoring dynamic information in the analysis makes the model inconsistent with the data used for process monitoring. The consequence is that, while the false alarm rate can be controlled with a larger-than-necessary control limit, this leads to an overly large missed alarm rate.

As mentioned in Chapter 1, several dynamic PCA methods have been proposed to model dynamic data, and the corresponding fault detection schemes have been developed as well. However, due to the limitations of these dynamic PCA modeling methods discussed in Chapter 1, the corresponding monitoring methods have drawbacks. For example, regular PCA detection indices are applied to the augmented matrix in the method of Ku et al. Therefore, the control regions determined by the Hotelling's $T^2$ fault detection index are hyper-ellipses centered at the origin of the principal component coordinates. This means that the control region is time independent and does not reflect the information that is already known from the past. Not utilizing the past information makes the control region larger than necessary, which causes a high missed alarm rate.
In addition, the monitoring of dynamic relationships and the monitoring of static relationships are mixed together, which makes this monitoring scheme lack a physical interpretation.

In this chapter, a DiPCA-based process monitoring scheme is proposed. DiPCA models dynamic relationships and static relationships separately, and it gives an explicit representation of the dynamic latent model. Therefore, the fault detection indices developed here are able to monitor the dynamic relationships and static relationships separately. In addition, by utilizing the past information, the control region is more compact and reflects time dependence.

The remainder of this chapter is organized as follows. Section 2 reviews the traditional PCA-based process monitoring algorithms. Section 3 presents the proposed DiPCA-based fault detection indices. Case studies on simulation data, data from a boiler process, and TEP data are presented in Section 4 to show the effectiveness of the proposed process monitoring algorithm. Section 5 gives conclusions.

3.2 PCA-based Process Monitoring

3.2.1 PCA Decomposition

As discussed in Chapter 2, PCA gives an eigen-decomposition of the sample covariance matrix of $X$,
\[
S = \frac{1}{N-1} X^T X = P \Lambda P^T + \tilde{P} \tilde{\Lambda} \tilde{P}^T
\]
where $S$ is the sample covariance, $P$ contains the principal loadings and spans the principal component subspace, and $\tilde{P}$ contains the residual loadings and spans the residual subspace. A given sample vector $x$ can be decomposed as
\[
x = \hat{x} + \tilde{x}, \qquad \hat{x} = P t = P P^T x, \qquad \tilde{x} = x - \hat{x} = (I - P P^T) x
\]
where $\hat{x}$ is the projection onto the principal component subspace and $\tilde{x}$ is the projection onto the residual subspace.

3.2.2 PCA Fault Detection

Among the fault detection indices developed for PCA, the $Q$ index or SPE (squared prediction error), Hotelling's $T^2$ index, and the combined index $\varphi$ are the most popular.

1. SPE or $Q$ index. The SPE index is defined as the squared norm of the residual vector $\tilde{x}$:
\[
\mathrm{SPE}(x) = \|\tilde{x}\|^2 = x^T \tilde{P} \tilde{P}^T x
\]
By using Box's theorem [6], the control limit of the SPE or $Q$ index can be derived as
\[
\delta^2 = g_{\mathrm{SPE}}\, \chi^2_{\alpha}(h_{\mathrm{SPE}}) \qquad (3.1)
\]
where
\[
g_{\mathrm{SPE}} = \frac{\theta_2}{\theta_1}, \qquad h_{\mathrm{SPE}} = \frac{\theta_1^2}{\theta_2}, \qquad \theta_1 = \sum_{i=l+1}^{m} \lambda_i, \qquad \theta_2 = \sum_{i=l+1}^{m} \lambda_i^2
\]
$\chi^2_\alpha(h_{\mathrm{SPE}})$ is a $\chi^2$ distribution with $h_{\mathrm{SPE}}$ degrees of freedom, $\alpha$ indicates the confidence level, $\lambda_i$ is the $i$th eigenvalue of the covariance matrix $S$, and $l$ is the number of principal components retained in the principal component subspace [53]. If a fault occurs such that the static relationships do not hold, the value of the SPE index will exceed the control limit and a fault is declared.

2. $T^2$ index. The $T^2$ index complements the SPE or $Q$ index and monitors the normal variations in the principal component subspace. It is defined as
\[
T^2 = x^T P \Lambda^{-1} P^T x
\]
It can be proven that the $T^2$ statistic follows a scaled F distribution,
\[
\frac{N(N-l)}{l(N^2-1)} T^2 \sim F_{l,\,N-l}
\]
where $F_{l,N-l}$ is an F distribution with $l$ and $N-l$ degrees of freedom [72]. Therefore, for a given confidence level $\alpha$, the control limit can be calculated from the $F_{l,N-l}$ distribution. If $N$ is large enough, the F distribution can be approximated by a $\chi^2$ distribution [26]; in this case $T^2$ approximately follows
\[
T^2 \sim \chi^2(l) \qquad (3.2)
\]
An approximate control limit can then be calculated using the $\chi^2(l)$ distribution and the confidence level. If the normal variations increase significantly, the $T^2$ index will exceed the control limit and a fault is declared.

3. Combined index. The combined index was proposed by Yue and Qin [88]. It assigns weights to the SPE and $T^2$ indices and combines them:
\[
\varphi = \frac{\mathrm{SPE}(x)}{\delta^2} + \frac{T^2(x)}{\tau^2_{l,\alpha}} = x^T \Phi x
\]
where
\[
\Phi = \frac{P \Lambda^{-1} P^T}{\tau^2_{l,\alpha}} + \frac{I - P P^T}{\delta^2}
\]
By Box's theorem [6], $\varphi$ approximately follows
\[
\varphi \sim g \chi^2_h \qquad (3.3)
\]
where
\[
g = \frac{\mathrm{tr}\left[(S\Phi)^2\right]}{\mathrm{tr}(S\Phi)}, \qquad h = \frac{\left[\mathrm{tr}(S\Phi)\right]^2}{\mathrm{tr}\left[(S\Phi)^2\right]}
\]
It can be shown that
\[
\mathrm{tr}(S\Phi) = \frac{l}{\tau^2_{l,\alpha}} + \frac{\sum_{i=l+1}^{m} \lambda_i}{\delta^2}, \qquad
\mathrm{tr}\left[(S\Phi)^2\right] = \frac{l}{\tau^4_{l,\alpha}} + \frac{\sum_{i=l+1}^{m} \lambda_i^2}{\delta^4}
\]
The control limit can be determined at a given confidence level, and a fault is detected if the value of $\varphi$ is greater than the control limit. The advantage of the combined index over the SPE or $T^2$ index is that the control region defined by the combined index is more compatible with the multinormal assumption of the data. A sketch of these PCA index calculations is given after this list.
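The $T^2$ and SPE calculations above, together with their control limits (3.1)-(3.2), can be sketched as follows. This is a minimal illustration in Python with NumPy/SciPy; the function and argument names are illustrative, and the loadings and eigenvalues are assumed to come from a PCA model fitted elsewhere.

```python
import numpy as np
from scipy import stats

def pca_limits(eigvals, l, n, alpha=0.99):
    """Control limits: T^2 via the F distribution, SPE via Box's theorem."""
    theta1 = eigvals[l:].sum()
    theta2 = (eigvals[l:] ** 2).sum()
    g, h = theta2 / theta1, theta1 ** 2 / theta2
    t2_lim = l * (n**2 - 1) / (n * (n - l)) * stats.f.ppf(alpha, l, n - l)
    spe_lim = g * stats.chi2.ppf(alpha, h)       # SPE ~ g * chi^2(h)
    return t2_lim, spe_lim

def pca_indices(x, P, eigvals, l):
    """T^2 and SPE for one scaled sample x, given principal loadings P (m x l)."""
    t = P.T @ x
    t2 = t @ (t / eigvals[:l])                   # t^T Lambda^{-1} t
    x_res = x - P @ t                            # (I - P P^T) x
    spe = x_res @ x_res
    return t2, spe
```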
Figure 3.1 shows a comparison of the control regions for the combined index, the SPE index, and the $T^2$ index. The four dotted lines define the control region determined by the SPE and $T^2$ indices together, and the ellipse defines the control region corresponding to the combined index. Although the control region determined by the combined index is similar to the region defined by the SPE and $T^2$ indices together, its elliptical shape is closer to the multinormal distribution.

Figure 3.1: Comparison of control regions for three indices

3.3 DiPCA-based Process Monitoring

3.3.1 DiPCA Decomposition

For a given sample vector $x_k$, DiPCA decomposes it as
\[
x_k = P \hat{t}_k + P_r t_{r,k} + e_{r,k}
\]
where $P\hat{t}_k$ represents the dynamic component that is predictable from the past, $P_r t_{r,k}$ represents the principal part of the static component, and $e_{r,k}$ is the residual part of the static component. The dynamic latent scores $t_k$, dynamic residuals $v_k$, static latent scores $t_{r,k}$, and static residuals $e_{r,k}$ are calculated as follows:
\[
\begin{aligned}
t_k &= R^T x_k \\
v_k &= t_k - \hat{t}_k, \qquad \hat{t}_k = \sum_{i=1}^{s} \Theta_i\, t_{k-i} \\
e_k &= x_k - P \hat{t}_k \\
t_{r,k} &= P_r^T e_k \\
e_{r,k} &= (I - P_r P_r^T)\, e_k
\end{aligned}
\]

3.3.2 DiPCA Fault Detection Indices

The monitoring of dynamic data can be divided into two parts: the monitoring of dynamic relationships and the monitoring of static relationships.

1. Monitoring of dynamic latent relationships. The monitoring of dynamic relationships can be performed on $t_k$, $\hat{t}_k$, or $v_k$. Although $t_k$ and $\hat{t}_k$ have significant dynamics, the prediction error $v_k$ of $t_k$ is essentially static. Since $t_k$ and $\hat{t}_k$ are dynamic and could even be nonstationary, monitoring them directly can result in a high false alarm rate; it is therefore more appropriate to monitor $t_k$ and $\hat{t}_k$ through $v_k$. A simulation study in the next section will illustrate this. Although little autocorrelation remains in $v_k$, there can still be high cross-correlations in $v_k$. Therefore, it is necessary to build a PCA model on $v_k$ and construct a combined index to monitor it. The combined index for $v_k$ is defined as
\[
\varphi_v = v_k^T \Phi_v v_k = \frac{T^2_v}{\tau^2_v} + \frac{Q_v}{\delta^2_v}, \qquad
\Phi_v = \frac{P_v \Lambda_v^{-1} P_v^T}{\tau^2_v} + \frac{I - P_v P_v^T}{\delta^2_v}
\]
where $\Lambda_v = \frac{1}{N-1} V^T V$, $T^2_v$ and $Q_v$ are the fault detection indices for PCA-based fault detection on $v_k$, and $\tau^2_v$ and $\delta^2_v$ are the corresponding control limits as in traditional PCA. The calculation of the control limit follows Section 3.2.2, as in Equation (3.3). Figure 3.2 shows the control region for the combined index on $v_k$: the elliptical area indicates the control region of the combined index $\varphi_v$, which is centered around the predicted dynamic principal components $\hat{t}_k$. The data are considered normal if $t_k$ falls within the control region, and a fault is declared if $t_k$ falls outside of it.

Figure 3.2: Control region of $t_k$ indicated by $\varphi_v$

In traditional PCA, the control region of the combined index is centered at the origin of the coordinates determined by the PCA decomposition; it is constant and does not move with time. Therefore, a large control region is required to keep a low rate of false alarms for the dynamic relationships, and this can lead to a high missed alarm rate, or Type II error. In DiPCA, the control region of $v_k$ is centered around $\hat{t}_k$, so the control region moves with time. For the same false alarm rate, this provides a more compact control region than traditional PCA methods, which leads to a lower missed alarm rate as well.

2. Monitoring of static relationships. After the dynamic portion is removed from $x_k$, there is essentially no autocorrelation left in the prediction error $e_k$. Therefore, $e_k$ can be treated as the static portion of $x_k$, and the traditional PCA-based process monitoring strategy can be applied to it. The $T^2_r$ and $Q_r$ indices for $e_k$ are defined analogously to Equations (3.1) and (3.2):
\[
T^2_r = t_{r,k}^T \Lambda_r^{-1} t_{r,k} = e_k^T P_r \Lambda_r^{-1} P_r^T e_k, \qquad
Q_r = \|e_{r,k}\|^2 = e_k^T (I - P_r P_r^T) e_k
\]
where $\Lambda_r = \frac{1}{N-1} T_r^T T_r$. The control limits for $T^2_r$ and $Q_r$ are determined as in Section 3.2.2. Table 3.1 summarizes the three fault detection indices developed for the DiPCA model and the corresponding control limits; a sketch of their computation follows the table.

Table 3.1: Fault detection indices for DiPCA

Statistic | Description | Calculation | Control limit
$\varphi_v$ | Monitoring dynamic relationships | $v_k^T \Phi_v v_k$ | $g_v \chi^2_{h_v,\alpha}$
$T^2_r$ | Monitoring static relationships | $t_{r,k}^T \Lambda_r^{-1} t_{r,k}$ | $\chi^2_{A,\alpha}$
$Q_r$ | Monitoring static residuals | $\|e_{r,k}\|^2$ | $g \chi^2_{h,\alpha}$
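A minimal sketch of these index computations for the newest sample of a data window is given below (Python/NumPy; the function and argument names are illustrative, and the model matrices are assumed to come from a trained DiPCA model).

```python
import numpy as np

def dipca_indices(x_win, R, P, Theta, P_r, Phi_v, lam_r):
    """phi_v, T^2_r and Q_r for the newest sample in a window.

    x_win : (s+1, m) array of scaled samples [x_{k-s}, ..., x_k]
    R, P  : DiPCA weight and loading matrices (m x l)
    Theta : list of s (l x l) VAR coefficient matrices [Theta_1, ..., Theta_s]
    P_r   : static PCA loadings of the prediction errors (m x l_r)
    Phi_v : weighting matrix of the combined index phi_v
    lam_r : variances of the static scores t_{r,k}, length l_r
    """
    t = x_win @ R                        # dynamic scores t_{k-s}, ..., t_k
    t_hat = sum(Th @ t[-1 - i] for i, Th in enumerate(Theta, start=1))
    v = t[-1] - t_hat                    # dynamic residual v_k
    e = x_win[-1] - P @ t_hat            # one-step prediction error e_k
    t_r = P_r.T @ e                      # static scores t_{r,k}
    e_r = e - P_r @ t_r                  # static residual e_{r,k}
    phi_v = v @ Phi_v @ v                # dynamic-relationship index
    T2_r = t_r @ (t_r / lam_r)           # t_r^T Lambda_r^{-1} t_r
    Q_r = e_r @ e_r                      # static-residual index
    return phi_v, T2_r, Q_r
```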
The data is considered normal if t k fall within the range of the control region, and a fault is declared if t k fall outside of the control region. In tradition PCA, the control region of the combined index is centered at the origin of the coordinates determined by PCA decomposition, and it is constant and does not move with time. Therefore, a large control region is required to keep a low rate of false alarms for the dy- namic relationships, and this can lead to a high missed alarm rate or Type II error. In DiPCA, the control region of v k is centered around ^ t k , which indicates that the control region moves with time. Therefore, for the same false alarm rates, it provides a more compact control re- gion than traditional PCA methods, which leads to a lower missed alarm rate as well. 2. Monitoring of static relationships After the dynamic portion is removed from x k , there is essentially no auto-correlations left in the prediction error e k . Therefore, e k can be treated as the static portion of x k , and the traditional PCA based pro- cess monitoring strategy can be applied to e k . T 2 r andQ r index for e k 63 are defined as Equation 3.1 and Equation 3.2 T 2 r = t T k;r r 1 t r;k = e T k P r 1 r P r T e k Q r =jje r;k jj 2 = e T k (I P r P T r )e k where r = 1 N1 T T r T r . Control limits forT 2 r andQ r can be determined as . Table summarizes the three fault indices developed for DiPCA model and the corresponding control limits. Table 3.1: Fault detection indices for DiPCA Statistics Description Calculation Control limits ' v Monitoring dynamic relationships v T k v k g v 2 hv; T 2 r Monitoring static re- lationships t T r;k 1 r t r;k 2 A; Q r Monitoring static residuals e 2 r;k g 2 h; 3.4 Case Studies Three case studies are presented in this section. The three datasets used in this section are the same as the three datasets used in Section 2.5. In the first case study on simulation data, three scenarios of faults are con- structed to test the proposed fault detection indices. In the second study, moving control regions are illustrated for a dataset from a real industrial 64 boiler process. In the third case study, the fault detection strategy is tested on TEP dataset with 15 types of faults. 3.4.1 Simulation Data The settings of this simulation is the same as the settings in Section 2.5.1, where t k is generated from a VAR(1) process, and x k is generated from a latent variable model as 8 < : t k = c + At k1 + v k x k = Pt k + e k The values of vectors and matrices in this model are the same as the ones in Section 2.5.1. Therefore, the DiPCA model built in Section 2.5.1 can be used as the model for normal data. Three fault scenarios are considered to test the DiPCA based fault detection indices, where 1000 test points are used in all scenarios. Since there is no residual subspace for v k ,' reduces toT 2 v . 1. t k := t k + [0 5 0] T ; k> 500. In this case, a bias fault is added to the second dynamic latent variable. Therefore, it is clear that this fault affects the dynamic relationships. In addition, the abnormal dynamic relationships cannot be completely removed by DiPCA dynamic whitening filter. Therefore, it affects the monitoring of principal static relationships as well. Figure 3.3 shows the fault detection results by three fault detection indices respectively. From Figure 3.3, it can be seen that this fault can be detected by bothT 2 v 65 andT 2 r indices. 
However, since this fault does not affect the residual part of the static relationships, it is not detected by the $Q_r$ index.

Figure 3.3: Fault detection results of fault 1 on simulation data

2. For $k > 500$,
\[
x_k := x_k + 5 \times [0.0455\ \ 0.3877\ \ 0.8365\ \ 0.2911\ \ 0.2513]^T
\]
In this case, the fault is added in the residual part of the static relationships. Therefore, this fault is detected only by the $Q_r$ index; the other indices remain unaffected, as shown in Figure 3.4.

Figure 3.4: Fault detection results of fault 2 on simulation data

3. $A := 1.2\,A$, with the other matrices in the model unchanged. In this case, the spectral radius of $A$ becomes greater than 1, which indicates that the VAR model is unstable; therefore, unstable dynamics are involved in the data. A new DiPCA model for the data with unstable dynamics is trained using the first 500 data points, and 100 data points are used as the test dataset to check the false alarm rates of the fault detection indices. The results are shown in Figure 3.5. In Figure 3.5, $T^2_d$ is a fault detection index computed directly on $t_k$, which is not adopted as a fault detection index for the DiPCA model. The reason is that when the dynamic latent variables $t_k$ contain unstable dynamics, monitoring $t_k$ leads to a high false alarm rate, as Figure 3.5 shows: although the testing data are generated from the same model as the training data, the values of the $T^2_d$ index are above its control limit for all 100 data points. This implies that using $T^2_d$ results in a high false alarm rate when the data contain unstable dynamics, so the $T^2_d$ index should not be used for fault detection. Compared to $T^2_d$, the index $T^2_v$ for monitoring $v_k$ does not give false alarms at all time steps. Therefore, monitoring $v_k$ is more appropriate for the DiPCA model.

Figure 3.5: Fault detection results of faulty data with unstable dynamics

3.4.2 Data from an Industrial Boiler

In this section, control regions are plotted based on the DiPCA model developed in Section 2.5.2 for the four-sensor dataset collected from a real industrial boiler process. Figure 3.6 shows the control regions for the dynamic principal components at six consecutive time steps, from left to right. In Figure 3.6, the red points represent the predicted dynamic principal components $\hat{t}_k$, the black stars represent the current dynamic principal components $t_k$, and the blue ellipses indicate the control regions. In process monitoring, a sample is considered normal if $t_k$ is inside the control region centered around $\hat{t}_k$, and faulty if $t_k$ is outside the control region. We can see from Figure 3.6 that all $t_k$'s are within the control region when no fault occurs. In addition, the control region varies with time; contrast this with static PCA, where the control region does not change over time. Furthermore, both $\hat{t}_k$ and $t_k$ at the sixth time step lie outside the fault detection region of the first time step, yet this does not give false alarms, since the control regions are time adaptive. This implies that DiPCA control regions are more compact compared to the control regions of static PCA.
If the control regions of DiPCA and PCA are determined at the same false alarm rate, DiPCA-based fault detection will have a lower missed detection rate.

Figure 3.6: Control region for dynamic components at different time stamps

3.4.3 TEP Data

TEP is popularly used as a benchmark to evaluate the performance of process monitoring strategies [86]. In addition to the normal dataset used in Section 2.5.3 to build the model for the normal operating condition, 15 datasets with various disturbances are provided for testing the effectiveness of fault detection indices. Each dataset has 960 sample points, and the disturbances are introduced after 160 samples. Table 3.2 gives a detailed description of these 15 disturbances. The control limits for $\varphi_v$, $T^2_r$, and $Q_r$ are derived based on the DiPCA model built in Section 2.5.3. The false alarm rates of the $\varphi_v$, $T^2_r$, and $Q_r$ indices are 5.54%, 6.58%, and 0.82%, respectively. Table 3.3 shows the fault detection rates of the three fault detection indices for the 15 known faults.

Table 3.2: Description of 15 disturbances

Variable number | Process variable | Type
IDV(1) | A/C ratio, B composition constant (stream 4) | Step
IDV(2) | B composition, A/C ratio constant (stream 4) | Step
IDV(3) | D feed temperature (stream 2) | Step
IDV(4) | Reactor cooling water inlet temperature | Step
IDV(5) | Condenser cooling water inlet temperature | Step
IDV(6) | A feed loss (stream 1) | Step
IDV(7) | C header pressure loss, reduced availability (stream 4) | Step
IDV(8) | A, B, C feed composition (stream 4) | Random variation
IDV(9) | D feed temperature (stream 2) | Random variation
IDV(10) | C feed temperature (stream 4) | Random variation
IDV(11) | Reactor cooling water inlet temperature | Random variation
IDV(12) | Condenser cooling water inlet temperature | Random variation
IDV(13) | Reaction kinetics | Slow drift
IDV(14) | Reactor cooling water valve | Sticking
IDV(15) | Condenser cooling water valve | Sticking

Table 3.3: Fault detection rates of three DiPCA fault detection indices on TEP

Fault | $\varphi_v$ | $T^2_r$ | $Q_r$
IDV(1) | 100 | 99.50 | 100
IDV(2) | 99.00 | 98.62 | 97.87
IDV(3) | 6.65 | 7.53 | 12.55
IDV(4) | 97.49 | 99.50 | 27.73
IDV(5) | 22.08 | 22.33 | 97.74
IDV(6) | 100 | 99.37 | 100
IDV(7) | 83.56 | 100 | 88.33
IDV(8) | 95.86 | 94.10 | 95.98
IDV(9) | 7.53 | 8.41 | 13.17
IDV(10) | 15.18 | 13.93 | 77.16
IDV(11) | 76.66 | 88.83 | 42.53
IDV(12) | 95.23 | 95.61 | 99.00
IDV(13) | 94.86 | 92.35 | 96.74
IDV(14) | 100 | 100 | 100
IDV(15) | 7.15 | 8.28 | 13.43

The results show that most of the faults are detected successfully by the proposed fault detection indices, except for faults IDV(3), IDV(9), and IDV(15). As has been discussed in the literature, these three faults are hard to detect for all the popular process monitoring methods. For the case of IDV(4), Table 3.3 shows that it is detected successfully by the $\varphi_v$ and $T^2_r$ indices, but not by the $Q_r$ index. Figure 3.7 shows the plots of the fault detection indices for fault IDV(4). In fault IDV(4), there is a step increase in the reactor cooling water inlet temperature. This change causes an increase in the cooling water flow XMV(10) to compensate for the temperature increment, as well as increasing oscillations in other variables.

Figure 3.7: Fault detection results of IDV(4)

Figure 3.8 shows the plots of XMV(10) in both the training dataset and the IDV(4) testing dataset, where the vertical line indicates the separation of the training and testing datasets.
To the left of the vertical line is the plot of XMV(10) in the training dataset, and to the right is the plot of XMV(10) in the IDV(4) testing dataset. It can be seen that there is an obvious increase in XMV(10) after fault IDV(4) is introduced.

Figure 3.8: XMV(10) in training dataset and the IDV(4) testing dataset

Figure 3.9 shows the plots of XMV(9) in both the training dataset and the IDV(4) dataset. The frequency of the oscillations in XMV(9) increased after the fault was introduced, and this increase persists. This indicates a change of the dynamic relationships and explains why the $\varphi_v$ and $T^2_r$ indices detect this fault successfully.

Figure 3.9: XMV(9) in training dataset and the IDV(4) testing dataset

For the case of IDV(5), Table 3.3 shows that it is detected successfully by the $Q_r$ index, but not by $\varphi_v$ or $T^2_r$. Figure 3.10 shows the plots of the fault detection indices for fault IDV(5). In fault IDV(5), there is a step increase in the condenser cooling water inlet temperature. This change causes an increase in the condenser cooling water flow, XMV(11). Figure 3.11 shows the plots of XMV(11) in both the training dataset and the IDV(5) dataset. It can be seen that there is a gradual increase in XMV(11) after the fault is introduced.

Figure 3.10: Fault detection results of IDV(5)
Figure 3.11: XMV(11) in training dataset and the IDV(5) testing dataset

Figure 3.12 shows the plots of XMV(4) in the training dataset and the IDV(5) dataset. It can be seen from Figure 3.12 that fault IDV(5) causes a short period of oscillation. The dynamic relationships differ from the normal dynamic relationships during the oscillation period and recover after the oscillation ends. Therefore, the $\varphi_v$ index only detects abnormal dynamics during the short oscillation period, which explains its low detection rate.

Figure 3.12: XMV(4) in training dataset and the IDV(5) testing dataset

3.5 Summary

In this chapter, a DiPCA-based process monitoring strategy is proposed, and three fault detection indices and the corresponding control limits are derived. The task of monitoring dynamic data consists of the monitoring of dynamic relationships and the monitoring of static relationships. After DiPCA modeling, the principal dynamics are removed and the prediction errors are essentially white; therefore, it is appropriate to apply PCA-based monitoring to the prediction errors. The monitoring of dynamic relationships is conducted through the error term in the VAR model for the dynamic principal components. The control regions for the dynamic principal components center around their predictions, and these control regions move with time; therefore, they are more compact than those of traditional PCA. Case studies on simulation data, boiler process data, and TEP data demonstrate the effectiveness of the proposed fault detection indices.

Chapter 4

DiPLS Modeling

4.1 Introduction

In this chapter, a dynamic inner PLS (DiPLS) algorithm is proposed. The proposed algorithm provides an explicit inner model and outer model with a consistent dynamic model structure. An iterative method, derived by applying the Lagrange multiplier technique, is proposed to optimize the objective function.
The resulting dynamic model makes the output dynamically related to the latent variables, while the latent variables are a projection of the input variables onto a lower-dimensional subspace. The explicit and consistent representation makes the results easy to interpret.

The remaining sections of this chapter are organized as follows. Section 2 reviews the traditional PLS algorithm and derives the objective function of DiPLS. Section 3 presents the proposed DiPLS algorithm. Section 4 discusses several examples to show the effectiveness of the algorithm. Section 5 gives conclusions and discussions.

4.2 PLS and DiPLS Objective Functions

A complete form of PLS was first given by Wold [82] to perform regression with inter-related input variables, which are common in routine operation data. The PLS algorithm performs regression with inter-related variables by projecting to a lower-dimensional latent space one dimension at a time; this resembles a version of conjugate gradient methods for the linear regression problem. This approach not only avoids the direct inversion of a potentially ill-conditioned matrix in ordinary least squares, it also provides a way to trade off between model prediction variance and bias by selecting an appropriate number of latent variables through cross-validation.

Given the input matrix $X$ and output matrix $Y$, the objective of PLS is
\[
\max\ q^T Y^T X w \quad \text{s.t.}\ \|w\| = 1,\ \|q\| = 1 \qquad (4.1)
\]
where $w$ and $q$ are the input and output weights, respectively. It is clear from this objective that only static relations between the input and output are extracted by PLS; when dynamic relationships exist between the input and output data, PLS leaves the dynamics unmodeled.

To build a dynamic PLS model with consistent inner and outer relationships, both the inner model and the outer model should aim to extract the same dynamic inner relation, such as
\[
u_k = \beta_0 t_k + \beta_1 t_{k-1} + \cdots + \beta_s t_{k-s} + r_k
\]
with the latent variables related to the data as
\[
u_k = y_k^T q, \qquad t_k = x_k^T w
\]
where $x_k$ and $y_k$ are the input and output vectors at time $k$. For each factor, the inner model prediction should be
\[
\hat{u}_k = \beta_0 x_k^T w + \beta_1 x_{k-1}^T w + \cdots + \beta_s x_{k-s}^T w
= [x_k^T\ x_{k-1}^T\ \cdots\ x_{k-s}^T] (\beta \otimes w)
\]
where $\beta = (\beta_0\ \beta_1\ \cdots\ \beta_s)^T$ and $\otimes$ is the Kronecker product. The outer model that is consistent with the above inner dynamic model should maximize the covariance between $u_k$ and $\hat{u}_k$, that is, maximize
\[
\frac{1}{N} \sum_{k=s}^{N+s} q^T y_k\, [x_k^T\ x_{k-1}^T\ \cdots\ x_{k-s}^T] (\beta \otimes w) \qquad (4.2)
\]
This objective leads to the dynamic inner PLS (DiPLS) algorithm derived in the next section.

4.3 Dynamic-Inner PLS Algorithm

4.3.1 Objective

Let $x_k$ and $y_k$ be the input and output vectors at time $k$, $k = 0, 1, \ldots, N+s$, so that $N+s+1$ samples of input and output are collected to form the matrices
\[
X = [x_0\ x_1\ \cdots\ x_{s+N}]^T, \qquad Y = [y_0\ y_1\ \cdots\ y_{s+N}]^T
\]
Define
\[
X_i = [x_i\ x_{i+1}\ \cdots\ x_{i+N}]^T, \quad i = 0, 1, 2, \ldots, s
\]
\[
Z_s = [X_s\ X_{s-1}\ \cdots\ X_0], \qquad Y_s = [y_s\ y_{s+1}\ \cdots\ y_{s+N}]^T
\]
The objective of DiPLS that is consistent with (4.2) is formulated as
\[
\max\ q^T Y_s^T Z_s (\beta \otimes w) \quad \text{s.t.}\ \|w\| = 1,\ \|q\| = 1,\ \|\beta\| = 1 \qquad (4.3)
\]
where $s$ is the impulse response order of the model. The dimension of $w$ equals the number of input variables, while the dimension of $\beta$ equals the order of the impulse response. If $s = 0$, $Y_s$ is related to $X_s$ only, and DiPLS reduces to static PLS. A sketch of the construction of these lagged matrices is given below.
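The lagged matrices $X_i$, $Z_s$, and $Y_s$ can be formed from the raw data matrices as follows. This is a minimal sketch in Python/NumPy with illustrative names.

```python
import numpy as np

def lagged_matrices(X, Y, s):
    """Form X_i (i = 0..s), Z_s = [X_s, X_{s-1}, ..., X_0], and Y_s.

    X : (N+s+1, m) input matrix, rows x_0 .. x_{s+N}
    Y : (N+s+1, p) output matrix, rows y_0 .. y_{s+N}
    """
    n_rows = X.shape[0] - s                              # N + 1 rows per block
    X_blocks = [X[i:i + n_rows] for i in range(s + 1)]   # X_0, ..., X_s
    Z_s = np.hstack(X_blocks[::-1])                      # [X_s, ..., X_0]
    Y_s = Y[s:]                                          # y_s, ..., y_{s+N}
    return X_blocks, Z_s, Y_s
```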
4.3.2 Outer Modeling

Lagrange multipliers are used to solve the optimization in (4.3). Define
\[
J = q^T Y_s^T Z_s (\beta \otimes w)
+ \tfrac{1}{2}\lambda_q (1 - q^T q)
+ \tfrac{1}{2}\lambda_\beta (1 - \beta^T \beta)
+ \tfrac{1}{2}\lambda_w (1 - w^T w) \qquad (4.4)
\]
where
\[
(\beta \otimes w) = (\beta \otimes I) w = (I \otimes w) \beta
\]
Taking derivatives with respect to $q$, $w$, and $\beta$ and setting the results to zero leads to
\[
\begin{aligned}
\frac{\partial J}{\partial q} &= Y_s^T Z_s (\beta \otimes w) - \lambda_q q = 0 \\
\frac{\partial J}{\partial w} &= (\beta \otimes I)^T Z_s^T Y_s q - \lambda_w w = 0 \\
\frac{\partial J}{\partial \beta} &= (I \otimes w)^T Z_s^T Y_s q - \lambda_\beta \beta = 0
\end{aligned} \qquad (4.5)
\]
Since the vectors $q$, $w$, and $\beta$ have unit norm, pre-multiplying the above relations by $q^T$, $w^T$, and $\beta^T$ leads to
\[
\lambda_q = q^T Y_s^T Z_s (\beta \otimes w), \qquad
\lambda_w = w^T (\beta \otimes I)^T Z_s^T Y_s q, \qquad
\lambda_\beta = \beta^T (I \otimes w)^T Z_s^T Y_s q
\]
Rearranging these results, we have $J = \lambda_q = \lambda_w = \lambda_\beta$. The vectors $q$, $w$, $\beta$ can be solved with the following iterative method:

1. Initialize two of the three vectors $w$, $q$, $\beta$.
2. Iterate the following three relations until convergence:
\[
q = Y_s^T Z_s (\beta \otimes w), \qquad q := q / \|q\| \qquad (4.6)
\]
\[
w = (\beta \otimes I)^T Z_s^T Y_s q, \qquad w := w / \|w\| \qquad (4.7)
\]
\[
\beta = (I \otimes w)^T Z_s^T Y_s q, \qquad \beta := \beta / \|\beta\| \qquad (4.8)
\]
3. Calculate $J = q^T Y_s^T Z_s (\beta \otimes w)$, the value of the objective function in (4.3).

Sometimes the iteration converges to a local optimum, which leads to a smaller $J$. To avoid local optima, initialize the vectors $w$, $q$, $\beta$ randomly in Step 1 and perform multiple trials, keeping the $w$, $q$, $\beta$ that maximize the value of $J$.

4.3.3 Inner Modeling and Deflation

After the outer model is obtained, the latent scores of the input and output can be calculated as
\[
t = [t_0\ t_1\ \cdots\ t_{s+N}]^T = X w, \qquad
u = [u_0\ u_1\ \cdots\ u_{s+N}]^T = Y q \qquad (4.9)
\]
Similar to $X_i$ and $Y_s$, define $t_i$ and $u_s$ as
\[
t_i = [t_i\ t_{i+1}\ \cdots\ t_{i+N}]^T, \qquad
u_s = [u_s\ u_{s+1}\ \cdots\ u_{s+N}]^T
\]
It is obvious that
\[
t_i = X_i w, \quad i = 0, 1, \ldots, s, \qquad u_s = Y_s q \qquad (4.10)
\]
Since the objective in (4.2) aims to extract a dynamic relationship between inputs and outputs, the input scores and output scores calculated from the outer model are maximally dynamically related, and dynamic inner models should be used to capture these dynamic co-variations.

The inner model describing the dynamic relationship between the input scores and output scores should therefore be built between $u_s$ and $t_s, t_{s-1}, \ldots, t_0$ as
\[
u_s = \alpha_0 t_s + \alpha_1 t_{s-1} + \cdots + \alpha_s t_0 + r_s \qquad (4.11)
\]
where $r_s$ is the residual of the regression. Ordinary least squares can be applied to solve for $\alpha_0, \alpha_1, \ldots, \alpha_s$. After the $\alpha$'s are solved, (4.11) can be used to predict $u_s$ from $t_s, t_{s-1}, \ldots, t_0$ as
\[
\hat{u}_s = \alpha_0 t_s + \alpha_1 t_{s-1} + \cdots + \alpha_s t_0 \qquad (4.12)
\]
After $t$ and $\hat{u}_s$ are calculated, the loading vector $p$ for $X$ can be derived as
\[
p = X^T t / t^T t
\]
Deflation can then be performed as
\[
X := X - t p^T, \qquad Y_s := Y_s - \hat{u}_s q^T \qquad (4.13)
\]
After the residuals of $X$ and $Y$ are calculated, subsequent factors can be obtained from these residuals by repeating the same procedure.

4.3.4 Procedure of DiPLS Modeling

In the DiPLS outer modeling, the Kronecker expressions can be simplified using (4.10) as follows:
\[
w = (\beta \otimes I)^T Z_s^T Y_s q = \sum_{i=0}^{s} \beta_i X_{s-i}^T Y_s q = \sum_{i=0}^{s} \beta_i X_{s-i}^T u_s
\]
\[
q = Y_s^T Z_s (\beta \otimes w) = Y_s^T \sum_{i=0}^{s} \beta_i X_{s-i} w = Y_s^T \sum_{i=0}^{s} \beta_i t_{s-i}
\]
\[
\beta = [\beta_0\ \beta_1\ \cdots\ \beta_s]^T = [t_s\ t_{s-1}\ \cdots\ t_0]^T u_s
\]
The procedure of DiPLS modeling is summarized in Table 4.1.

Remark 4.1. In the case where it is desirable to include auto-regression of the output, the objective of DiPLS can be modified as
\[
\max\ q^T (\gamma_0 Y_s^T + \gamma_1 Y_{s-1}^T + \cdots + \gamma_f Y_{s-f}^T)
(\beta_0 X_s + \beta_1 X_{s-1} + \cdots + \beta_s X_0) w
\]
\[
\text{s.t.}\ \|w\| = 1,\ \|q\| = 1,\ \|\beta\| = 1,\ \|\gamma\| = 1 \qquad (4.14)
\]
where the $\gamma$'s are the weights of the different lags of $Y$, and $f$ is the order of the output. After (4.14) is solved, the inner model can be built as an ARX model of $u_s$,
\[
u_s = \varphi_1 u_{s-1} + \cdots + \varphi_f u_{s-f} + \alpha_0 t_s + \cdots + \alpha_s t_0 + r_s \qquad (4.15)
\]
Then the prediction of the output score $\hat{u}_s$ can be calculated and used to predict the output.
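A single DiPLS factor extraction, combining the simplified outer iteration above with the inner least squares of (4.11)-(4.12), can be sketched as follows. This is a minimal illustration in Python/NumPy under the definitions of Section 4.3.1, with illustrative names; deflation (4.13) and the loop over further factors are omitted.

```python
import numpy as np

def dipls_factor(X_blocks, Y_s, s, n_iter=500, tol=1e-10):
    """Extract one DiPLS factor (outer iteration + inner model).

    X_blocks : list [X_0, ..., X_s], each (N+1, m)
    Y_s      : (N+1, p) output block
    """
    beta = np.zeros(s + 1); beta[0] = 1.0        # init as in Table 4.1, step 1
    u_s = Y_s[:, 0].copy()                       # some column of Y_s
    w = np.zeros(X_blocks[0].shape[1])
    for _ in range(n_iter):
        # w = sum_i beta_i X_{s-i}^T u_s, normalized
        w_new = sum(b * (Xi.T @ u_s) for b, Xi in zip(beta, X_blocks[::-1]))
        w_new /= np.linalg.norm(w_new)
        T = np.column_stack([Xi @ w_new for Xi in X_blocks[::-1]])  # [t_s..t_0]
        q = Y_s.T @ (T @ beta); q /= np.linalg.norm(q)
        u_s = Y_s @ q
        beta = T.T @ u_s; beta /= np.linalg.norm(beta)
        if np.linalg.norm(w_new - w) < tol:
            w = w_new
            break
        w = w_new
    alpha, *_ = np.linalg.lstsq(T, u_s, rcond=None)   # inner model (4.11)
    u_hat = T @ alpha                                  # prediction (4.12)
    return w, q, beta, alpha, u_hat
```

Subsequent factors follow by deflating $X$ and $Y_s$ as in (4.13) and repeating the call on the residuals.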
Table 4.1: DiPLS Algorithm

1. Scale $X$ and $Y$ to zero mean and unit variance. Initialize $\beta$ with $[1, 0, \ldots, 0]^T$ and $u_s$ as some column of $Y_s$.
2. Outer modeling. Iterate the following relations until convergence is achieved:
\[
w = \sum_{i=0}^{s} \beta_i X_{s-i}^T u_s, \qquad w := w/\|w\|
\]
\[
t = X w, \ \text{and form}\ t_{s-i}\ \text{from}\ t\ \text{for}\ i = 0, 1, \ldots, s
\]
\[
q = Y_s^T \sum_{i=0}^{s} \beta_i t_{s-i}, \qquad q := q/\|q\|
\]
\[
u_s = Y_s q
\]
\[
\beta = [\beta_0\ \beta_1\ \cdots\ \beta_s]^T = [t_s\ t_{s-1}\ \cdots\ t_0]^T u_s, \qquad \beta := \beta/\|\beta\|
\]
3. Inner modeling. Build a linear model between $t_s, t_{s-1}, \ldots, t_0$ and $u_s$ by least squares,
\[
u_s = \alpha_0 t_s + \alpha_1 t_{s-1} + \cdots + \alpha_s t_0 + r_s
\]
and calculate the predicted $\hat{u}_s$ as $\hat{u}_s = \alpha_0 t_s + \alpha_1 t_{s-1} + \cdots + \alpha_s t_0$.
4. Deflation. Deflate $X$ and $Y_s$ as
\[
X := X - t p^T, \qquad p = X^T t / t^T t
\]
\[
Y_s := Y_s - \hat{u}_s q^T
\]
5. Return to Step 2 until enough latent variables are extracted.

When there is only one output, the traditional PLS is also known as PLS1. In PLS1, the loading vector $q = 1$ and the output score $u = y$; therefore, the iteration procedure reduces to a one-step calculation of the loading vector $w$ and the input score $t$. We define DiPLS in the case of only one output to be DiPLS1. Similar to PLS1, $q = 1$ and $u_s = y_s$. However, since $\beta$ and $w$ are inter-related, the iteration procedure is still necessary. Using $q = 1$ and $u_s = y_s$, the iteration procedure in Step 2 of Table 4.1 simplifies to
\[
w = \sum_{i=0}^{s} \beta_i X_{s-i}^T y_s, \qquad w := w/\|w\|
\]
\[
t = X w, \ \text{and form}\ t_{s-i}\ \text{from}\ t\ \text{for}\ i = 0, 1, \ldots, s
\]
\[
\beta = [\beta_0\ \beta_1\ \cdots\ \beta_s]^T = [t_s\ t_{s-1}\ \cdots\ t_0]^T y_s, \qquad \beta := \beta/\|\beta\|
\]
Once the iteration converges, the portion of the output explained by the DiPLS factor is
\[
\hat{y}_s = \hat{u}_s = \sum_{i=0}^{s} \alpha_i t_{s-i}
\]
and the residual of $y_s$ is the portion not explained, that is, $y_s := y_s - \hat{u}_s$.

4.3.5 Determination of Model Parameters

In DiPLS modeling, there are two parameters to be determined: the number of lags $s$ and the number of latent variables. The lag number $s$ is also the impulse response order of the data. From the simulation results in Section 4, we can see that the prediction is not sensitive to $s$: when more lags than the dynamic order of the system are included in the augmented input, DiPLS tends to diminish the coefficients corresponding to the excess lags. Compared to the number of lags, the number of latent variables usually has more impact on the prediction results and therefore needs to be determined carefully. Cross-validation is a popular method in PLS for selecting the number of latent variables so as to minimize the prediction error, and a similar procedure can be applied to DiPLS to obtain the best prediction results. A sketch of such a selection loop is given below.
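A simple grid search of this kind might look as follows. This is an illustrative sketch only; `fit_dipls` and `predict_dipls` stand in for the modeling and prediction procedures of this chapter and are assumed rather than defined here.

```python
import numpy as np
# fit_dipls(X, Y, s, n_lv) and predict_dipls(model, X) are assumed to
# implement the DiPLS procedure of Table 4.1 and its output prediction.

def select_parameters(X_tr, Y_tr, X_dev, Y_dev, max_s=5, max_lv=10):
    """Pick (s, number of LVs) minimizing prediction MSE on a development set."""
    best = (None, None, np.inf)
    for s in range(max_s + 1):
        for n_lv in range(1, max_lv + 1):
            model = fit_dipls(X_tr, Y_tr, s, n_lv)
            err = np.mean((Y_dev - predict_dipls(model, X_dev)) ** 2)
            if err < best[2]:
                best = (s, n_lv, err)
    return best  # (s, n_lv, mse)
```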
4.4 DiPLS Geometric Properties and Relations

4.4.1 DiPLS Geometric Properties

The DiPLS algorithm is different from the more traditional dynamic PLS algorithm, which simply augments the input data with time-lagged values. Therefore, it is important to understand the geometric properties of the DiPLS projections and how the data space is partitioned when a dynamic inner model as well as a dynamic outer model is employed. While the DiPLS objective (4.3) maximizes the dynamic covariance between a latent variable of $Y$ and a latent variable of $X$, the actual DiPLS algorithm in Table 4.1 has important geometric properties embedded in it, such as orthogonality, to ensure numerical stability and robustness. To explore the DiPLS geometric properties when extracting $j$ latent variables (LVs), we use a subscript to denote the iteration from one LV to another:
\[
X_{j+1} = X_j - t_j p_j^T \quad \text{with} \quad p_j = X_j^T t_j / t_j^T t_j \qquad (4.16)
\]
\[
Y_{s,j+1} = Y_{s,j} - \hat{u}_j q_j^T \qquad (4.17)
\]
where the subscript $j$ denotes the quantities of the $j$th LV. From (4.16) and the relations in Table 4.1, we can obtain the following lemma.

Lemma 4.1. Suppose $i < j$. We have the following relationships among the residual matrices and loading vectors:
1. $X_j = H X_{i+1}$
2. $X_j = X_{i+1} Z$
3. $X_j w_i = 0$ for all $i < j$
4. $X_j^T t_i = 0$ for all $i < j$
5. $w_i^T p_i = 1$
where $H$ and $Z$ are matrices formed by the DiPLS latent vectors and loadings.

Proof.
1. From (4.16) we have
\[
X_j = X_{j-1} - t_{j-1} p_{j-1}^T
= X_{j-1} - t_{j-1} t_{j-1}^T X_{j-1} / t_{j-1}^T t_{j-1}
= (I - t_{j-1} t_{j-1}^T / t_{j-1}^T t_{j-1}) X_{j-1} \qquad (4.18)
\]
It is clear that this is a recursive relation for $X_j$. Iterating this relation until $X_{i+1}$ appears, Relation 1 is proven with
\[
H = (I - t_{j-1} t_{j-1}^T / t_{j-1}^T t_{j-1}) \cdots (I - t_{i+1} t_{i+1}^T / t_{i+1}^T t_{i+1})
\]
2. If $t_{j-1}$ in (4.16) is replaced by $t_{j-1} = X_{j-1} w_{j-1}$, we have
\[
X_j = X_{j-1} - t_{j-1} p_{j-1}^T = X_{j-1} - X_{j-1} w_{j-1} p_{j-1}^T = X_{j-1} (I - w_{j-1} p_{j-1}^T)
\]
Again, this is a recursive relation for $X_j$; iterating it until $X_{i+1}$ appears yields Relation 2.
3. From Relation 1 and (4.18) we have
\[
X_j w_i = H X_{i+1} w_i = H (I - t_i t_i^T / t_i^T t_i) X_i w_i = H (I - t_i t_i^T / t_i^T t_i) t_i = 0
\]
which proves Relation 3.
4. From Relation 2 and (4.18) again, we have
\[
t_i^T X_j = t_i^T X_{i+1} Z = t_i^T (I - t_i t_i^T / t_i^T t_i) X_i Z = 0
\]
using the same reasoning.
5. Using the expression for $p_i$, Relation 5 follows directly:
\[
w_i^T p_i = w_i^T X_i^T t_i / t_i^T t_i = t_i^T t_i / t_i^T t_i = 1
\]

With the above lemma, we give the following theorem to summarize some orthogonality properties of the DiPLS latent scores and loadings.

Theorem 4.2. For the DiPLS algorithm in Table 4.1, the following relations hold:
1. $w_i^T w_j = 0$ for all $i \neq j$
2. $t_i^T t_j = 0$ for all $i \neq j$
3. $w_i^T p_j = 0$ for all $i < j$

Proof. To show Relations 1 and 2 for $i \neq j$, it suffices to show them for the case $i < j$, since the case $i > j$ follows by symmetry. Using the expression for $w_j$ in Table 4.1, we have
\[
w_j = c \sum_{d=0}^{s} \beta_{d,j} X_{s-d,j}^T u_{s,j}
\]
where $c$ is a proportionality constant. Since $X_{s-d,j}$ is a sub-matrix of $X_j$, $X_j w_i = 0$ implies $X_{s-d,j} w_i = 0$, which further implies $w_i^T w_j = 0$; this is immediate from Relation 3 of Lemma 4.1. Furthermore, using Relation 4 of Lemma 4.1,
\[
t_i^T t_j = t_i^T X_j w_j = 0
\]
Using Relation 3 of Lemma 4.1 again,
\[
w_i^T p_j = w_i^T X_j^T t_j / t_j^T t_j = 0
\]

4.4.2 DiPLS Model Relations

Assuming the number of latent variables chosen in the DiPLS model is $l$, and denoting
\[
T = [t_1\ t_2\ \cdots\ t_l], \qquad \hat{U}_s = [\hat{u}_{s,1}\ \hat{u}_{s,2}\ \cdots\ \hat{u}_{s,l}]
\]
\[
W = [w_1\ w_2\ \cdots\ w_l], \qquad P = [p_1\ p_2\ \cdots\ p_l], \qquad Q = [q_1\ q_2\ \cdots\ q_l]
\]
we have the following relations by iterating (4.16) and (4.17):
\[
X_{l+1} = X - \sum_{j=1}^{l} t_j p_j^T = X - T P^T \qquad (4.19)
\]
\[
Y_{s,l+1} = Y_s - \sum_{j=1}^{l} \hat{u}_{s,j} q_j^T = Y_s - \hat{U}_s Q^T \qquad (4.20)
\]
or
\[
X = T P^T + X_{l+1} \qquad (4.21)
\]
\[
Y_s = \hat{U}_s Q^T + Y_{s,l+1} \qquad (4.22)
\]
The DiPLS model estimates for $X$ and $Y$ are
\[
\hat{X} = T P^T \qquad (4.23)
\]
\[
\hat{Y}_s = \hat{U}_s Q^T \qquad (4.24)
\]
Since the scores of the DiPLS model are related to the residuals $X_j$ by $w_j$, it is desirable to express the scores in terms of the original data for the convenience of model calculations. Post-multiplying (4.21) by $W$ and using Relation 3 of Lemma 4.1, we obtain
\[
X W = T P^T W + X_{l+1} W = T P^T W
\]
or
\[
T = X R \qquad (4.25)
\]
where
\[
R = W (P^T W)^{-1} \qquad (4.26)
\]
This relation is the same as the PLS $R$ matrix given in [50], [23]. The DiPLS projection of $X$ can be derived from (4.23) as follows.
\[
\hat{X} = X R P^T \qquad (4.27)
\]
For a given input vector $x_k$, which acts as a row of $X$, the DiPLS scores, input estimate, and input residual are, respectively,
\[
t_k = R^T x_k \qquad (4.28)
\]
\[
\hat{x}_k = P R^T x_k = P t_k \qquad (4.29)
\]
\[
\tilde{x}_k = (I - P R^T) x_k \qquad (4.30)
\]
where $\tilde{x}_k$ is the residual of the DiPLS projection. From (4.26) it is easy to see that
\[
P^T R = P^T W (P^T W)^{-1} = I \qquad (4.31)
\]
Therefore,
\[
(P R^T)(P R^T) = P R^T \qquad (4.32)
\]
which is idempotent. Since $P R^T$ is not symmetric in general, $P R^T$ is an oblique projector, similar to the PLS projections shown in [38]:
\[
P | R^{\perp} = P R^T \qquad (4.33)
\]
\[
R^{\perp} | P = I - P R^T \qquad (4.34)
\]
where $P | R^{\perp}$ is a projector onto the subspace $\mathrm{Span}\{P\}$ along the subspace $\mathrm{Span}\{R\}^{\perp}$. Although the DiPLS projections share the same expressions as the PLS projections, the DiPLS partitions of the data space are radically different from the PLS partitions.

The DiPLS output estimate is less conventional, but can be derived from (4.24) by making use of the $j$th inner model in Table 4.1,
\[
\hat{u}_{s,j} = \sum_{i=0}^{s} \alpha_{i,j}\, t_{s-i,j} = b_j(q^{-1})\, t_{s,j}
\]
where $b_j(q^{-1}) = \sum_{i=0}^{s} \alpha_{i,j} q^{-i}$ is the transfer function of the $j$th inner model and $q^{-1}$ is the backward shift operator. Denoting further
\[
B(q^{-1}) = \mathrm{diag}\{b_1(q^{-1}),\ b_2(q^{-1}),\ \ldots,\ b_l(q^{-1})\}
\]
Equation (4.24) can be expressed as
\[
\hat{Y}_s = \hat{U}_s Q^T = T_s B(q^{-1}) Q^T = X_s R B(q^{-1}) Q^T \qquad (4.35)
\]
where $T_s$ is defined similarly to $X_s$. The estimate $\hat{y}_k$, which takes a row of $\hat{Y}_s$, for a given input vector $x_k$ or score $t_k$ is
\[
\hat{y}_k = Q B(q^{-1}) t_k = Q B(q^{-1}) R^T x_k \qquad (4.36)
\]
Note that (4.29) and (4.36) jointly give the DiPLS latent variable model. The estimate of the output is related to the input by a transfer matrix, where both current and past data are needed to estimate the output.

4.4.3 Exhausting DiPLS Residuals

A major attractive feature of PLS is its ability to deal with highly collinear input data in $X$. The objective of PLS is to extract latent variables that are most relevant to explaining the output variations while effectively exploring the variance in the input space. In the case where the rank of $X$ is less than the number of input variables, it is known that the number of PLS factors does not go beyond the rank of $X$ [56]; in other words, all variations in $X$ are represented with a number of LVs up to the rank of $X$. This feature, however, is not available for traditional dynamic PLS, which simply augments lagged inputs to form an extended input matrix, since in that case the extended input matrix usually has a much larger dimension than the original data matrix. Fortunately, this feature is preserved in DiPLS, since dynamic inner relations are built into the model. We summarize this feature of DiPLS in the following theorem.

Theorem 4.3. Assuming $\mathrm{rank}(X) = r < m$, where $m$ is the number of columns of $X$, then after $r$ DiPLS factors are extracted, the input residuals satisfy $X_{r+j} = 0$ for $j > 0$.

Proof. Based on Relation 1 of Lemma 4.1, it is sufficient to show that $X_{r+1} = 0$. After $r$ DiPLS factors are extracted, all latent scores are collected in $T_r = [t_1\ t_2\ \cdots\ t_r]$. By iterating the DiPLS outer model relations, we have
\[
X_{r+1} = X_r - t_r p_r^T
= (I - t_r t_r^T / t_r^T t_r) X_r
= \cdots
= (I - t_r t_r^T / t_r^T t_r) \cdots (I - t_1 t_1^T / t_1^T t_1) X
\]
Using the orthogonality of the $t_j$ in Theorem 4.2, it is straightforward to show that the above relation is equivalent to
\[
X_{r+1} = \left(I - T_r (T_r^T T_r)^{-1} T_r^T\right) X
\]
which implies that $X_{r+1}$ is the projection of $X$ onto $\mathrm{span}(T_r)^{\perp}$. Since $T_r = X R$ and $\mathrm{rank}(T_r) = \mathrm{rank}(X) = r$, the columns of $T_r$ form an orthogonal basis of $\mathrm{span}(X)$; as a consequence, $\mathrm{span}(X) = \mathrm{span}(T_r)$. Therefore, the projection of $\mathrm{span}(X)$ onto $\mathrm{span}(T_r)^{\perp}$ is nil, which implies $X_{r+1} = 0$.

This theorem has several implications in practice. First, DiPLS needs no more than $m$ factors to exhaust all variability in the data. This is a drastic improvement over conventional dynamic PLS, where the input matrix is augmented with lagged data and a large number of LVs is usually required to explore the variability in the data. Second, when the input data are not fully excited, which is typical of routinely collected operational data, DiPLS needs only up to $\mathrm{rank}(X)$ factors to extract all variability in the data. Since the number of LVs is closely related to and upper-bounded by the rank of $X$, the extracted latent factors are fewer than those extracted by traditional DPLS and are easy to interpret.

4.4.4 Eigenvector and Singular Vector Interpretations

As in static PLS, where the loadings can be interpreted as eigenvectors or singular vectors of certain matrices, the DiPLS loadings also have eigenvector and singular vector interpretations. Focusing on the DiPLS algorithm in Table 4.1, denote
\[
X_{\beta} = \sum_{i=0}^{s} \beta_i X_{s-i}
\]
The outer model relations in Table 4.1 lead to
\[
w \propto X_{\beta}^T u_s = X_{\beta}^T Y_s q \qquad (4.37)
\]
\[
q \propto Y_s^T \sum_{i=0}^{s} \beta_i X_{s-i} w = Y_s^T X_{\beta} w \qquad (4.38)
\]
where "$\propto$" denotes that the quantities on both sides differ by a proportionality constant. The above relations can be rearranged as
\[
w \propto X_{\beta}^T Y_s Y_s^T X_{\beta} w, \qquad q \propto Y_s^T X_{\beta} X_{\beta}^T Y_s q
\]
which indicates that $w$ and $q$ are indeed eigenvectors of the respective matrices. Since the DiPLS objective aims to maximize the co-variation, these eigenvectors are the ones corresponding to the largest eigenvalues of the respective matrices. Consequently, from (4.37) it is straightforward to show that $w$ and $q$ are the left and right singular vectors of $X_{\beta}^T Y_s$ corresponding to the largest singular value. This interpretation is analogous to that given in [31] for static PLS. Nevertheless, we do not recommend using singular value decomposition (SVD) to calculate the DiPLS vectors, since the matrix $X_{\beta}$ varies with $\beta$. The DiPLS algorithm in Table 4.1 is recommended, as it converges faster than using SVD inside the outer model iteration.
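As an illustration of the latent variable model (4.29) and (4.36), the following sketch (Python/NumPy, with illustrative names) computes the DiPLS output estimate from a window of current and past inputs, given the model matrices $R$, $Q$ and the inner-model coefficients $\alpha_{i,j}$.

```python
import numpy as np

def dipls_predict(x_win, R, Q, alpha):
    """Output estimate y_hat_k = Q B(q^{-1}) R^T x_k, cf. (4.36).

    x_win : (s+1, m) window [x_{k-s}, ..., x_k] of scaled inputs
    R     : (m, l) score-computation matrix, R = W (P^T W)^{-1}
    Q     : (p, l) output loadings
    alpha : (s+1, l) inner-model coefficients, alpha[i, j] = alpha_{i,j}
    """
    T = x_win @ R                      # rows t_{k-s}, ..., t_k
    # b_j(q^{-1}) t_k = sum_i alpha_{i,j} t_{k-i} for each factor j
    t_filt = np.array([alpha[:, j] @ T[::-1, j] for j in range(R.shape[1])])
    return Q @ t_filt                  # y_hat_k
```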
4.5 Case Studies on Simulation Data

In this section, three sets of data are simulated, each corresponding to a scenario. In Scenario 1, both the input $X$ and the output $Y$ are generated from a static model. In Scenario 2, the input $X$ is generated from a dynamic model while the input-output relation is static. In Scenario 3, the input $X$ is generated from a static model while the output is dynamically related to the input. The advantages of DiPLS over traditional PLS are demonstrated in these three basic examples.

4.5.1 Scenario 1

$X$ and $Y$ are generated from a static process:
\[
x_k = P t_k + e_k, \qquad y_k = C x_k + v_k
\]
\[
P = \begin{pmatrix}
0.5765 & 0.2856 & 0.1614 \\
0.3660 & 0.0458 & 0.9060 \\
0.5889 & 0.4645 & 0.9942 \\
0.3572 & 0.3450 & 0.7396 \\
0.4036 & 0.6851 & 0.2262
\end{pmatrix}, \qquad
C = \begin{pmatrix} 0.7451 \\ 0.4928 \\ 0.7320 \\ 0.4738 \\ 0.5652 \end{pmatrix}^T
\]
where $e_k \in \mathbb{R}^5 \sim N(0, 0.5^2 I)$, $v_k \in \mathbb{R} \sim N(0, 0.5^2)$, and $t_k \in \mathbb{R}^3 \sim N(0, 2^2 I)$. 1000 data points are generated: the first 500 are used as the training dataset, the next 400 as the development dataset to select the parameters, and the last 100 as the test dataset to evaluate the prediction results.

The optimal parameters determined by cross-validation are $s = 0$ with 3 latent variables. $s = 0$ indicates that DiPLS reduces to traditional PLS, which is consistent with the static model of the inputs and outputs. Figure 4.1 shows the prediction results of DiPLS and PLS, from which we can see that DiPLS gives the same result as PLS; this is consistent with our analysis. If we increase $s$ to 1, the values of $\beta_0$ and $\beta_1$, the weights of time lags 0 and 1, are as listed in Table 4.2. We can tell from the table that $\beta_0^2 \gg \beta_1^2$ for each factor, which implies that the input data with lag 1 have little impact on the outer model building: DiPLS reduces to PLS even though excess time lags are included.

Figure 4.1: Prediction result of DiPLS and PLS for Scenario 1

Table 4.2: $\beta_0$, $\beta_1$ for each factor in Scenario 1

Value | factor 1 | factor 2 | factor 3
$\beta_0$ | 0.9998 | -0.9590 | -0.9978
$\beta_1$ | -0.0217 | 0.2833 | 0.0670

4.5.2 Scenario 2

$X$ is generated from a dynamic process, while $Y$ is generated from a static process:
\[
t_k = A_1 t_{k-1} + A_2 t_{k-2} + f_k, \qquad
x_k = P t_k + e_k, \qquad
y_k = C x_k + v_k
\]
where $f_k \in \mathbb{R}^3 \sim N(0, 0.5^2 I)$, $P$ and $C$ are the same as in Scenario 1, and
\[
A_1 = \begin{pmatrix}
0.6767 & 0.5809 & 0.9315 \\
1.2812 & 0.5343 & 1.6000 \\
1.5083 & 0.9991 & 0.7529
\end{pmatrix}, \qquad
A_2 = \begin{pmatrix}
0.7155 & 0.0652 & 1.1192 \\
1.1132 & 0.5371 & 0.1691 \\
0.5571 & 1.0748 & 0.2330
\end{pmatrix}
\]
We use the same numbers of data points for the training, development, and testing datasets as in Scenario 1. The optimal parameters determined by cross-validation are $s = 0$ with 3 factors. The reason that $s = 0$ is that DiPLS considers the covariance between the input and the output, not the covariance of the input or of the output alone; therefore, $s$ is determined only by the relationship between input and output. This example shows that, even though the input variables contain dynamics, the DiPLS model relation is static as long as the input-output relation is static. It also indicates that the dynamics left in the input residuals should be further modeled dynamically if so desired. The prediction results of DiPLS and PLS are shown in Figure 4.2, from which we can see that DiPLS gives the same result as PLS, consistent with the analysis. When $s$ is increased to 3, the values of $\beta_0$, $\beta_1$, $\beta_2$, $\beta_3$ are as listed in Table 4.3. We can tell from the table that $\beta_0^2$ is much larger than the squares of the other $\beta$'s; therefore, the input data with no lag dominate the DiPLS result, and DiPLS performs like a PLS model even though excessive lagged input data are included.

Figure 4.2: Prediction result of DiPLS and PLS for Scenario 2

Table 4.3: $\beta_0$, $\beta_1$, $\beta_2$, $\beta_3$ for each factor in Scenario 2

Value | factor 1 | factor 2 | factor 3
$\beta_0$ | 0.9940 | 0.9156 | -0.9060
$\beta_1$ | 0.0821 | 0.3233 | -0.1166
$\beta_2$ | 0.0691 | 0.0690 | 0.3554
$\beta_3$ | 0.0197 | -0.2290 | 0.1980

4.5.3 Scenario 3

$X$ is generated from a static process, and $Y$ is generated from a dynamic process:
\[
x_k = P t_k + e_k, \qquad y_k = C x_k + C_2 x_{k-1} + v_k
\]
where $P$ and $C$ are the same as in Scenario 1, and
\[
C_2 = (1.9939\ \ 0.7728\ \ 1.0146\ \ 1.1563\ \ 1.2307)
\]
The same numbers of data points as in Scenario 1 are used for the training, development, and testing datasets. The optimal parameters determined by cross-validation are $s = 1$ with 4 components. Since $y_k$ is related to both $x_k$ and $x_{k-1}$, $s = 1$ is consistent with the dynamic structure between $X$ and $Y$. The prediction results of DiPLS and PLS are shown in Figure 4.3.
From Figure 4.3, we can see that when there are dynamics between the inputs and outputs, DiPLS gives much better results than PLS. If $s$ is increased to 3, the values of $\beta_0$, $\beta_1$, $\beta_2$, $\beta_3$ are as listed in Table 4.4. The squares of $\beta_2$ and $\beta_3$ are much smaller than those of $\beta_0$ and $\beta_1$ in general; therefore, the input data with lags of 2 and 3 have little impact on the result, and the DiPLS models built with $s = 1$ and $s = 3$ are similar.

Figure 4.3: Prediction result of DiPLS and PLS for Scenario 3

Table 4.4: $\beta_0$, $\beta_1$, $\beta_2$, $\beta_3$ for each factor in Scenario 3

Value | factor 1 | factor 2 | factor 3 | factor 4
$\beta_0$ | 0.4251 | 0.1102 | 0.4528 | 0.9514
$\beta_1$ | 0.9009 | -0.9850 | 0.8808 | 0.2832
$\beta_2$ | -0.0707 | 0.1069 | -0.1062 | -0.0694
$\beta_3$ | -0.0517 | -0.0789 | -0.0888 | -0.0996

4.6 Case Study on the Tennessee Eastman Process

The Tennessee Eastman Process (TEP) was introduced in Chapter 2 to test the performance of the DiPCA algorithm. The process contains 12 manipulated variables and 41 measured variables; the measured variables consist of 22 process variables sampled every 3 minutes and 19 quality variables sampled with dead time and time delays. In this case study, the 22 process variables XMEAS(1-22) and 11 manipulated variables XMV(1-11) are used as the input, and XMEAS(38), which is calculated from the analyzer, is used as the output. The sampling interval for XMEAS(38) is 0.25 h, with a dead time of 0.25 h. 400 data points are used as the training dataset, 80 data points as the development dataset, and 960 data points as the testing dataset; the data are pre-shifted to compensate for the 0.25 h delay. The optimal parameters determined by cross-validation are $s = 5$ with 3 components.

The prediction results of DiPLS and PLS are shown in Figure 4.4. Note that for XMEAS(38) there is only one measured value every five data points, due to the lower sampling rate, and the intervening four values are artificial; therefore, only 1/5 of the 960 data points are compared. Figure 4.4 shows that DiPLS captures the trends in the data better than PLS. Table 4.5 shows the mean squared errors (MSE) of the PLS and DiPLS methods, from which it can be seen that DiPLS has a much smaller MSE than PLS. Therefore, DiPLS performs better than PLS in dynamic process modeling and can serve as a better inferential sensor for industrial processes.

Figure 4.4: Prediction result of DiPLS and PLS for TEP

Table 4.5: Mean squared errors of PLS and DiPLS for XMEAS(38)

 | PLS | DiPLS
MSE | 0.95 | 0.55

4.7 Summary

In this chapter, a dynamic inner PLS algorithm is proposed for dynamic process modeling. The proposed method gives an explicit representation of the model structure, and it provides consistent inner and outer models. For data from static processes, DiPLS reduces to traditional PLS and gives the same results as PLS. Cross-validation can be used to determine the optimal number of lags and the number of components. Case studies on simulation data and the Tennessee Eastman Process show the effectiveness of the proposed method.

Chapter 5

DiPLS-based Process Monitoring

5.1 Introduction

PLS has been popularly used as a prediction and process monitoring tool since its development [14, 76, 52, 57].
Compared to PCA, which provides a latent structure for one block of data, PLS provides a latent structure between two blocks of data, an input block and an output block. It therefore divides the input space into an output-relevant subspace and an output-irrelevant subspace. Based on this division, PLS-based process monitoring detects output-relevant faults and output-irrelevant faults separately: the monitoring of output-relevant faults is performed on the principal part of the input, and the monitoring of output-irrelevant faults is performed on the residual part of the input. However, in PLS modeling, the principal part of the input can still contain variations orthogonal to the output, which makes it less appropriate for monitoring output-relevant faults. Several revised PLS algorithms have been proposed to address this issue, such as total PLS [90] and concurrent PLS [64].

The drawback of these PLS-based process monitoring strategies is that they are not valid when there are dependencies between the output and lagged inputs. When the output is related to past inputs, the prediction error of the output can be further reduced by taking past inputs into account. Ignoring the past input information in the analysis leads to poor modeling of the data, and, although the false alarm rate can be managed, this leads to a high missed alarm rate.

In Chapter 4, the DiPLS modeling algorithm was presented for dynamic data modeling. In this chapter, a DiPLS-based process monitoring scheme is proposed. The proposed monitoring method detects output-relevant faults and output-irrelevant faults separately. By utilizing the past input information, the control region of the DiPLS monitoring method is more compact.

The remainder of this chapter is organized as follows. Section 2 reviews the traditional PLS-based process monitoring algorithms. Section 3 presents the proposed DiPLS-based fault detection indices. Case studies on TEP data are presented in Section 4 to show the effectiveness of the proposed monitoring algorithm. Section 5 gives conclusions.

5.2 PLS-based Process Monitoring

PLS divides the input space into the principal subspace and the residual subspace. The principal subspace contains variations related to the output, and the residual subspace contains variations unrelated to the output. PLS decomposes a given sample $x$ into an output-relevant part and an output-irrelevant part as
\[
x = \hat{x} + \tilde{x}, \qquad \hat{x} = P t = P R^T x, \qquad \tilde{x} = x - \hat{x} = (I - P R^T) x
\]
where $\hat{x}$ is the output-relevant part of $x$ and $\tilde{x}$ is the output-irrelevant part. Similar to PCA, the typical PLS process monitoring approach is to use a $T^2$ index to monitor the principal subspace, or quality-related faults, and a $Q$ index to monitor the residual subspace, or quality-irrelevant faults.

1. $T^2$ index. The $T^2$ index is defined as
\[
T^2 = t^T \Lambda^{-1} t = x^T R \Lambda^{-1} R^T x \qquad (5.1)
\]
where $\Lambda$ is defined as
\[
\Lambda = \frac{1}{N-1} T^T T
\]
$N$ is the number of samples, and $T$ is the score matrix from PLS modeling. Similar to PCA, it can be shown that if the data follow a multivariate normal distribution, $T^2$ is related to the F distribution as [72]
\[
\frac{N(N-l)}{l(N^2-1)} T^2 \sim F_{l,\,N-l}
\]
If $N$ is large, $T^2$ approximately follows a $\chi^2$ distribution, $T^2 \sim \chi^2_l$. The control limit for the $T^2$ index can be calculated at a given confidence level based on the F or $\chi^2$ distribution. The sample is considered normal if the $T^2$ index is below the control limit; otherwise, a fault is declared.
5.3 DiPLS-based Fault Detection Indices

For a given input vector $x_k$, DiPLS decomposes it as

$$x_k = \hat{x}_k + \tilde{x}_k = P t_k + \tilde{x}_k$$

The output vector $y_k$ can be predicted as

$$\hat{y}_k = Q B(q^{-1}) t_k$$

where $B(q^{-1})$ is a transfer function. Therefore, $t_k$ is a reduced dimension vector that helps to predict $y_k$, and the monitoring of quality related faults can be performed through $t_k$. The residual $\tilde{x}_k$ contains no information useful for predicting the output, so the monitoring of $\tilde{x}_k$ detects process faults that are quality irrelevant. Similar to PLS, the $T^2$ index and the $Q$ index are proposed to detect quality related faults and quality irrelevant faults, respectively.

1. $T^2$ index. The $T^2$ index is defined as

$$T^2 = t_k^T \Lambda^{-1} t_k = x_k^T R \Lambda^{-1} R^T x_k$$

This index has the same mathematical form as Eq. (5.1); therefore, the same control limit can be used.

2. $Q$ index. The $Q$ index is defined as

$$Q = \|\tilde{x}_k\|^2 = x_k^T (I - PR^T)^T (I - PR^T) x_k$$

This index has the same mathematical form as Eq. (5.2); therefore, the same control limit can be used.

Although the mathematical representations of the $T^2$ and $Q$ indices and the corresponding control limits are the same as in PLS, the different meanings of the input scores $t_k$ in PLS and DiPLS make the fault detection performance different. This is demonstrated in the next section.
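Because the DiPLS indices take the same mathematical forms as Eqs. (5.1) and (5.2), one computation serves both methods; only the model that produced $P$ and $R$ differs. Below is a minimal sketch of the $Q$ index and its Box-approximation control limit, assuming $P$ and $R$ come from the (Di)PLS outer model and $S$ is the sample covariance matrix of the training inputs; the names are illustrative.

import numpy as np
from scipy import stats

def q_index(x, P, R):
    """Q of one sample, Eq. (5.2): ||(I - P R^T) x||^2."""
    x_res = x - P @ (R.T @ x)        # residual part of the sample
    return float(x_res @ x_res)

def q_control_limit(S, P, R, alpha=0.99):
    """Box approximation: delta^2 = g * chi^2_{h, alpha}."""
    C = np.eye(S.shape[0]) - P @ R.T
    M = S @ C
    tr1 = np.trace(M)                # tr S(I - P R^T)
    tr2 = np.trace(M @ M)            # tr of [S(I - P R^T)] squared
    g, h = tr2 / tr1, tr1**2 / tr2
    return g * stats.chi2.ppf(alpha, h)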
5.4 Case Study on TEP

In this section, the proposed DiPLS fault detection indices are tested on TEP data with 15 known faults, using the DiPLS model built in Chapter 4. The fault detection results are shown in Table 5.1. Among the 15 known fault types, IDV(1), IDV(2), IDV(6), IDV(7), IDV(8), IDV(12), and IDV(13) are quality related faults. It can be seen from Table 5.1 that the fault detection indices of both PLS and DiPLS have high detection rates on these faults. Figure 5.1 shows the plots of XMEAS(38) in the training dataset and the IDV(8) dataset; the vertical line separates the training dataset (to its left) from the testing dataset (to its right). It is clear that, compared to XMEAS(38) in the training dataset, there is a significant change in XMEAS(38) after the fault is introduced. The change is not compensated and persists until the end.

Figure 5.1: XMEAS(38) in the training dataset and the IDV(8) testing dataset.

Table 5.1: Fault detection rates (%) of the DiPLS and PLS fault detection indices on TEP

Fault       $T^2$ (DiPLS)   $Q$ (DiPLS)   $T^2$ (PLS)   $Q$ (PLS)
IDV(1)         100             100           100           100
IDV(2)          98.73           99.36        100           100
IDV(6)         100             100           100           100
IDV(7)          98.09          100           100           100
IDV(8)          96.81           99.36        100            92.36
IDV(12)         96.18           98.09        100            94.27
IDV(13)         97.45           96.82         98.09         96.18
IDV(4)          16.56          100           100            69.43
IDV(5)          41.40           36.94         78.34         44.59
IDV(10)         61.78           43.95         98.73         37.58
IDV(11)         28.03           80.89         94.27         64.33
IDV(14)         12.10          100           100            96.18
IDV(3)          26.75           13.38         60.51         10.83
IDV(9)          27.39           10.19         47.77         14.01
IDV(15)         23.57           14.01         53.50         18.47

Figure 5.2 shows the fault detection results of the DiPLS indices for IDV(8), and Figure 5.3 shows those of the PLS indices. It can be seen from both figures that the PLS and DiPLS monitoring methods detect this fault successfully.

Figure 5.2: DiPLS fault detection results of IDV(8) ($T^2$ and $Q$ indices).

Figure 5.3: PLS fault detection results of IDV(8) ($T^2$ and $Q$ indices).

Among the 15 known fault types, IDV(4), IDV(5), IDV(10), IDV(11), and IDV(14) are quality irrelevant faults; therefore, the $T^2$ index should have low detection rates on them. Comparing the detection results of the $T^2$ indices of PLS and DiPLS in Table 5.1, it can be seen that the DiPLS based $T^2$ index has a lower detection rate than the PLS based $T^2$ index. Figure 5.4 shows the plots of XMEAS(38) in the training dataset and the IDV(14) dataset. It is obvious that there is no change in XMEAS(38) after the fault is introduced, which further confirms that IDV(14) is a quality irrelevant fault.

Figure 5.4: XMEAS(38) in the training dataset and the IDV(14) dataset.

Figure 5.5 and Figure 5.6 show the fault detection results of DiPLS and PLS, respectively. It can be seen from Figure 5.5 that the $T^2$ index of DiPLS does not flag this fault, which agrees with our analysis. However, the $T^2$ index of PLS has a high detection rate even though this fault is output irrelevant, which contradicts our analysis. Therefore, the DiPLS based $T^2$ index is more effective than the PLS based $T^2$ index. Figure 5.7 shows the plots of XMV(10) in the training dataset and the IDV(14) dataset. An obvious oscillation starts after the fault is introduced and lasts until the end, which indicates a process fault. From Figure 5.5 and Figure 5.6, it can be seen that this process fault is detected successfully by the $Q$ indices of both the DiPLS and PLS methods.

Figure 5.5: DiPLS fault detection results of IDV(14) ($T^2$ and $Q$ indices).

Figure 5.6: PLS fault detection results of IDV(14) ($T^2$ and $Q$ indices).

Figure 5.7: XMV(10) in the training dataset and the IDV(14) dataset.

Table 5.2 shows the false alarm rates obtained by applying the DiPLS and PLS fault detection indices to a fault free testing dataset, and Figure 5.8 and Figure 5.9 show the corresponding detection results on that dataset. It can be seen that the false alarm rates are significantly reduced by using the DiPLS based fault detection indices.

Table 5.2: False alarm rates (%) of the DiPLS and PLS fault detection indices on TEP

                    $T^2$ (DiPLS)   $Q$ (DiPLS)   $T^2$ (PLS)   $Q$ (PLS)
False alarm rate       17.83           7.64          54.78         19.11

Figure 5.8: DiPLS fault detection results on the fault free dataset.

Figure 5.9: PLS fault detection results on the fault free dataset.
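As a small illustration of how the percentages in Tables 5.1 and 5.2 are tallied, the sketch below flags each sample whose monitoring index exceeds its control limit; on a faulty test set the flagged fraction is the fault detection rate, and on a fault-free test set the same fraction is the false alarm rate. The names are illustrative.

import numpy as np

def alarm_rate(index_values, control_limit):
    """Percentage of samples whose monitoring index exceeds the limit."""
    return float(100.0 * np.mean(np.asarray(index_values) > control_limit))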
5.5 Summary

In this chapter, a DiPLS based process monitoring strategy is proposed, and two fault detection indices with their corresponding control limits are derived. The $T^2$ index monitors output (quality) related faults, while the $Q$ index monitors output (quality) irrelevant faults. The case study on TEP data showed that all quality related faults can be successfully detected by the DiPLS fault detection indices. The false alarm rates are also improved by DiPLS based fault detection compared to traditional PLS based fault detection. In addition, the $T^2$ index of DiPLS has lower detection rates on quality irrelevant faults than the $T^2$ index of PLS, which indicates a better separation of the input space.

Chapter 6

Conclusions

This dissertation presents a framework for dynamic latent structured data analysis. Data from industrial processes are often highly cross-correlated and auto-correlated. Therefore, it is reasonable to assume that the data are generated from a lower dimensional dynamic latent model. To obtain the dynamic latent model from the observed data, two important problems need to be solved: how to extract the dynamic latent variables from the observed data, and how to build a model for the extracted dynamic latent variables. In this framework, we propose to combine these two problems so that the extraction of the dynamic latent variables is consistent with the model built for them. Under this framework, the DiPCA and DiPLS algorithms are developed as extensions of traditional PCA and PLS for dynamic data modeling.

DiPCA extracts dynamic latent variables that capture the most auto-correlated dynamics in the data. The captured dynamic components form the most self-predictable subspace, while the remaining prediction errors are least predictable and can be treated using static PCA. In this case, the DiPCA model can be interpreted as a whitening filter with a reduced-dimensional latent structure, where the dimension of the dynamic latent space is generally much smaller than that of the original variable space. This feature is not available in other time series modeling or Kalman filter approaches. In addition, a DiPCA based process monitoring strategy is derived, in which the dynamic relationships, principal static relationships, and residuals are monitored separately. This feature is not available in monitoring methods based on other dynamic PCA algorithms.

While DiPCA explores the dynamic latent structure of one data block, DiPLS extracts the dynamic latent relationships between two data blocks: the input and output, or the process variables and quality variables. For data from dynamic processes, DiPLS provides an explicit representation of the latent structure and consistent inner and outer models. For data from static processes, DiPLS reduces to traditional PLS. In addition, a DiPLS based process monitoring strategy is derived, which monitors quality related faults and quality irrelevant faults separately. The effectiveness of the fault detection scheme is validated through a case study on Tennessee Eastman Process data.

Future research directions include DiPCA and DiPLS based fault diagnosis and reconstruction. For nonlinear processes, kernel DiPCA and kernel DiPLS could be interesting extensions. Another promising topic is how to further decompose the principal and residual subspaces obtained by DiPLS modeling; since large variations can still remain in the residual subspace, it could be interesting to apply DiPCA to the residual part.

Bibliography

[1] H. Akaike, "Information theory and an extension of the maximum likelihood principle," in Selected Papers of Hirotugu Akaike. Springer, 1998, pp. 199–213.
[2] G. Baffi, E. Martin, and A. Morris, "Non-linear projection to latent structures revisited (the neural network PLS algorithm)," Computers & Chemical Engineering, vol. 23, no. 9, pp. 1293–1307, 1999.
[3] B. R. Bakshi, "Multiscale PCA with application to multivariate statistical process monitoring," AIChE Journal, vol. 44, no. 7, pp. 1596–1610, 1998.
[4] C. M. Bishop, Pattern Recognition and Machine Learning. Springer, 2006.
[5] G. Box and T. Kramer, "Statistical process monitoring and feedback adjustment: a discussion," Technometrics, vol. 34, no. 3, pp. 251–267, 1992.
[6] G. E. Box, "Some theorems on quadratic forms applied in the study of analysis of variance problems, I. Effect of inequality of variance in the one-way classification," The Annals of Mathematical Statistics, vol. 25, no. 2, pp. 290–302, 1954.
[7] G. E. Box, G. M. Jenkins, and G. C. Reinsel, Time Series Analysis: Forecasting and Control. John Wiley & Sons, 2011, vol. 734.
[8] E. F. Camacho and C. B. Alba, Model Predictive Control. Springer Science & Business Media, 2013.
[9] R. B. Cattell, "The scree test for the number of factors," Multivariate Behavioral Research, vol. 1, no. 2, pp. 245–276, 1966.
[10] J. Chen and K.-C. Liu, "On-line batch process monitoring using dynamic PCA and dynamic PLS models," Chemical Engineering Science, vol. 57, no. 1, pp. 63–75, 2002.
[11] L. H. Chiang, R. D. Braatz, and E. L. Russell, Fault Detection and Diagnosis in Industrial Systems. Springer Science & Business Media, 2001.
[12] L. H. Chiang, M. E. Kotanchek, and A. K. Kordon, "Fault diagnosis based on Fisher discriminant analysis and support vector machines," Computers & Chemical Engineering, vol. 28, no. 8, pp. 1389–1401, 2004.
[13] S. W. Choi, C. Lee, J.-M. Lee, J. H. Park, and I.-B. Lee, "Fault detection and identification of nonlinear processes based on kernel PCA," Chemometrics and Intelligent Laboratory Systems, vol. 75, no. 1, pp. 55–67, 2005.
[14] B. S. Dayal and J. F. MacGregor, "Recursive exponentially weighted PLS and its applications to adaptive control and prediction," Journal of Process Control, vol. 7, no. 3, pp. 169–179, 1997.
[15] G. Diana and C. Tommasi, "Cross-validation methods in principal component analysis: a comparison," Statistical Methods and Applications, vol. 11, no. 1, pp. 71–82, 2002.
[16] C. Ding and X. He, "K-means clustering via principal component analysis," in Proceedings of the Twenty-First International Conference on Machine Learning. ACM, 2004, p. 29.
[17] J. J. Downs and E. F. Vogel, "A plant-wide industrial process control problem," Computers & Chemical Engineering, vol. 17, no. 3, pp. 245–255, 1993.
[18] L. Fortuna, S. Graziani, A. Rizzo, and M. G. Xibilia, Soft Sensors for Monitoring and Control of Industrial Processes. Springer Science & Business Media, 2007.
[19] Z. Ge and Z. Song, "Mixture Bayesian regularization method of PPCA for multimode process monitoring," AIChE Journal, vol. 56, no. 11, pp. 2838–2849, 2010.
[20] P. Geladi and B. R. Kowalski, "Partial least-squares regression: a tutorial," Analytica Chimica Acta, vol. 185, pp. 1–17, 1986.
[21] D. M. Hawkins, "The detection of errors in multivariate data using principal components," Journal of the American Statistical Association, vol. 69, no. 346, pp. 340–344, 1974.
[22] Q. P. He, S. J. Qin, and J. Wang, "A new fault diagnosis method using fault directions in Fisher discriminant analysis," AIChE Journal, vol. 51, no. 2, pp. 555–571, 2005.
[23] I. S. Helland, "On the structure of partial least squares regression," Communications in Statistics - Simulation and Computation, vol. 17, no. 2, pp. 581–607, 1988.
[24] A. Höskuldsson, "PLS regression methods," Journal of Chemometrics, vol. 2, no. 3, pp. 211–228, 1988.
[25] J. E. Jackson, A User's Guide to Principal Components. John Wiley & Sons, 2005, vol. 587.
[26] S. Joe Qin, "Statistical process monitoring: basics and beyond," Journal of Chemometrics, vol. 17, no. 8-9, pp. 480–502, 2003.
[27] P. Kadlec, B. Gabrys, and S. Strandt, "Data-driven soft sensors in the process industry," Computers & Chemical Engineering, vol. 33, no. 4, pp. 795–814, 2009.
[28] M. Kano and Y. Nakagawa, "Data-based process monitoring, process control, and quality improvement: recent developments and applications in steel industry," Computers & Chemical Engineering, vol. 32, no. 1, pp. 12–24, 2008.
[29] M. Kano, S. Tanaka, S. Hasebe, I. Hashimoto, and H. Ohno, "Monitoring independent components for fault detection," AIChE Journal, vol. 49, no. 4, pp. 969–976, 2003.
[30] M. H. Kaspar and W. H. Ray, "Dynamic PLS modelling for process control," Chemical Engineering Science, vol. 48, no. 20, pp. 3447–3461, 1993.
[31] M. H. Kaspar and W. H. Ray, "Partial least squares modelling as successive singular value decompositions," Computers & Chemical Engineering, vol. 17, no. 10, pp. 985–989, 1993.
[32] D. Kim and I.-B. Lee, "Process monitoring based on probabilistic PCA," Chemometrics and Intelligent Laboratory Systems, vol. 67, no. 2, pp. 109–123, 2003.
[33] J. V. Kresta, J. F. MacGregor, and T. E. Marlin, "Multivariate statistical monitoring of process operating performance," The Canadian Journal of Chemical Engineering, vol. 69, no. 1, pp. 35–47, 1991.
[34] W. Ku, R. H. Storer, and C. Georgakis, "Disturbance detection and isolation by dynamic principal component analysis," Chemometrics and Intelligent Laboratory Systems, vol. 30, no. 1, pp. 179–196, 1995.
[35] S. Lakshminarayanan, S. L. Shah, and K. Nandakumar, "Modeling and control of multivariable processes: dynamic PLS approach," AIChE Journal, vol. 43, no. 9, pp. 2307–2322, 1997.
[36] J.-M. Lee, C. Yoo, and I.-B. Lee, "Fault detection of batch processes using multiway kernel principal component analysis," Computers & Chemical Engineering, vol. 28, no. 9, pp. 1837–1847, 2004.
[37] J.-M. Lee, C. Yoo, and I.-B. Lee, "Statistical process monitoring with independent component analysis," Journal of Process Control, vol. 14, no. 5, pp. 467–485, 2004.
[38] G. Li, S. J. Qin, and D. Zhou, "Geometric properties of partial least squares for process monitoring," Automatica, vol. 46, pp. 204–210, 2010.
[39] G. Li, B. Liu, S. J. Qin, and D. Zhou, "Dynamic latent variable modeling for statistical process monitoring," in Proc. IFAC World Congress, Milan, Italy, 2011, pp. 12886–12891.
[40] G. Li, B. Liu, S. J. Qin, and D. Zhou, "Quality relevant data-driven modeling and monitoring of multivariate dynamic processes: the dynamic T-PLS approach," IEEE Transactions on Neural Networks, vol. 22, no. 12, pp. 2262–2271, 2011.
[41] G. Li, S. J. Qin, and D. Zhou, "A new method of dynamic latent-variable modeling for process monitoring," IEEE Transactions on Industrial Electronics, vol. 61, no. 11, pp. 6438–6445, 2014.
[42] W. Li, H. H. Yue, S. Valle-Cervantes, and S. J. Qin, "Recursive PCA for adaptive process monitoring," Journal of Process Control, vol. 10, no. 5, pp. 471–486, 2000.
[43] W. Lin, Y. Qian, and X. Li, "Nonlinear dynamic principal component analysis for on-line process monitoring and diagnosis," Computers & Chemical Engineering, vol. 24, no. 2, pp. 423–429, 2000.
[44] F. Lindgren, P. Geladi, and S. Wold, "The kernel algorithm for PLS," Journal of Chemometrics, vol. 7, no. 1, pp. 45–59, 1993.
[45] N. Lu, Y. Yao, F. Gao, and F. Wang, "Two-dimensional dynamic PCA for batch process monitoring," AIChE Journal, vol. 51, no. 12, pp. 3300–3304, 2005.
[46] J. MacGregor, "Multivariate statistical methods for monitoring large data sets from chemical processes," in AIChE Meeting, 1989.
[47] J. F. MacGregor, C. Jaeckle, C. Kiparissides, and M. Koutoudi, "Process monitoring and diagnosis by multiblock PLS methods," AIChE Journal, vol. 40, no. 5, pp. 826–838, 1994.
[48] K. Mardia, "Mahalanobis distances and angles," Multivariate Analysis (IV), pp. 495–511, 1977.
[49] T. E. Marlin, Process Control. McGraw-Hill, New York, 2000.
[50] H. Martens, T. Naes, and K. Norris, "Multivariate calibration by data compression," 1987.
[51] M. Misra, H. H. Yue, S. J. Qin, and C. Ling, "Multivariate process monitoring and fault diagnosis by multi-scale PCA," Computers & Chemical Engineering, vol. 26, no. 9, pp. 1281–1293, 2002.
[52] P. Nomikos and J. F. MacGregor, "Multi-way partial least squares in monitoring batch processes," Chemometrics and Intelligent Laboratory Systems, vol. 30, no. 1, pp. 97–108, 1995.
[53] P. Nomikos and J. F. MacGregor, "Multivariate SPC charts for monitoring batch processes," Technometrics, vol. 37, no. 1, pp. 41–59, 1995.
[54] Y. C. Pan, S. J. Qin, P. Nguyen, and M. Barham, "Hybrid inferential modeling for vapor pressure of hydrocarbon mixtures in oil production," Industrial & Engineering Chemistry Research, vol. 52, no. 35, pp. 12420–12425, 2013.
[55] K. Pearson, "On lines and planes of closest fit to systems of points in space," Philosophical Magazine, vol. 2, pp. 559–572, 1901.
[56] S. J. Qin, "Recursive PLS algorithms for adaptive data modeling," Computers & Chemical Engineering, vol. 22, no. 4, pp. 503–514, 1998.
[57] S. J. Qin, "Recursive PLS algorithms for adaptive data modeling," Computers & Chemical Engineering, vol. 22, no. 4, pp. 503–514, 1998.
[58] S. J. Qin, "Survey on data-driven industrial process monitoring and diagnosis," Annual Reviews in Control, vol. 36, no. 2, pp. 220–234, 2012.
[59] S. J. Qin, "Process data analytics in the era of big data," AIChE Journal, vol. 60, no. 9, pp. 3092–3100, 2014.
[60] S. J. Qin and T. A. Badgwell, "A survey of industrial model predictive control technology," Control Engineering Practice, vol. 11, no. 7, pp. 733–764, 2003.
[61] S. J. Qin and R. Dunia, "Determining the number of principal components for best reconstruction," Journal of Process Control, vol. 10, no. 2, pp. 245–250, 2000.
[62] S. J. Qin and T. J. McAvoy, "Nonlinear PLS modeling using neural networks," Computers & Chemical Engineering, vol. 16, no. 4, pp. 379–391, 1992.
[63] S. J. Qin and T. McAvoy, "Nonlinear FIR modeling via a neural net PLS approach," Computers & Chemical Engineering, vol. 20, no. 2, pp. 147–159, 1996.
[64] S. J. Qin and Y. Zheng, "Quality-relevant and process-relevant fault monitoring with concurrent projection to latent structures," AIChE Journal, vol. 59, no. 2, pp. 496–504, 2013.
[65] A. Raich and A. Cinar, "Statistical process monitoring and disturbance diagnosis in multivariable continuous processes," AIChE Journal, vol. 42, no. 4, pp. 995–1009, 1996.
[66] N. Ricker, "Use of biased least-squares estimators for parameters in discrete-time pulse-response models," no. 27, 1988.
[67] R. Rosipal and L. J. Trejo, "Kernel partial least squares regression in reproducing kernel Hilbert space," Journal of Machine Learning Research, vol. 2, pp. 97–123, 2001.
[68] E. L. Russell, L. H. Chiang, and R. D. Braatz, "Fault detection in industrial processes using canonical variate analysis and dynamic principal component analysis," Chemometrics and Intelligent Laboratory Systems, vol. 51, no. 1, pp. 81–93, 2000.
[69] B. Schölkopf, A. Smola, and K.-R. Müller, "Kernel principal component analysis," in International Conference on Artificial Neural Networks. Springer, 1997, pp. 583–588.
[70] J. H. Stock and M. W. Watson, "Forecasting using principal components from a large number of predictors," Journal of the American Statistical Association, vol. 97, no. 460, pp. 1167–1179, 2002.
[71] M. E. Tipping and C. M. Bishop, "Probabilistic principal component analysis," Journal of the Royal Statistical Society: Series B (Statistical Methodology), vol. 61, no. 3, pp. 611–622, 1999.
[72] N. D. Tracy, J. C. Young, and R. L. Mason, "Multivariate control charts for individual observations," Journal of Quality Technology, vol. 24, no. 2, pp. 88–95, 1992.
[73] J. Trygg and S. Wold, "Orthogonal projections to latent structures (O-PLS)," Journal of Chemometrics, vol. 16, no. 3, pp. 119–128, 2002.
[74] S. Valle, W. Li, and S. J. Qin, "Selection of the number of principal components: the variance of the reconstruction error criterion with a comparison to other methods," Industrial & Engineering Chemistry Research, vol. 38, no. 11, pp. 4389–4401, 1999.
[75] V. Venkatasubramanian, R. Rengaswamy, S. N. Kavuri, and K. Yin, "A review of process fault detection and diagnosis: Part III: Process history based methods," Computers & Chemical Engineering, vol. 27, no. 3, pp. 327–346, 2003.
[76] B. M. Wise and N. B. Gallagher, "The process chemometrics approach to process monitoring and fault detection," Journal of Process Control, vol. 6, no. 6, pp. 329–348, 1996.
[77] H. Wold, "Estimation of principal components and related models by iterative least squares," in Multivariate Analysis, P. R. Krishnaiah, Ed., 1966.
[78] H. Wold, Path Models with Latent Variables: The NIPALS Approach. Academic Press, 1975.
[79] H. Wold, "Partial least squares," in Encyclopedia of Statistical Sciences, 1985.
[80] S. Wold, "Cross-validatory estimation of the number of components in factor and principal components models," Technometrics, vol. 20, no. 4, pp. 397–405, 1978.
[81] S. Wold, K. Esbensen, and P. Geladi, "Principal component analysis," Chemometrics and Intelligent Laboratory Systems, vol. 2, no. 1-3, pp. 37–52, 1987.
[82] S. Wold, A. Ruhe, H. Wold, and W. Dunn, III, "The collinearity problem in linear regression. The partial least squares (PLS) approach to generalized inverses," SIAM Journal on Scientific and Statistical Computing, vol. 5, no. 3, pp. 735–743, 1984.
[83] S. Wold, M. Sjöström, and L. Eriksson, "PLS-regression: a basic tool of chemometrics," Chemometrics and Intelligent Laboratory Systems, vol. 58, no. 2, pp. 109–130, 2001.
[84] J. Yang, D. Zhang, A. F. Frangi, and J.-y. Yang, "Two-dimensional PCA: a new approach to appearance-based face representation and recognition," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 26, no. 1, pp. 131–137, 2004.
[85] K. Y. Yeung and W. L. Ruzzo, "Principal component analysis for clustering gene expression data," Bioinformatics, vol. 17, no. 9, pp. 763–774, 2001.
[86] S. Yin, S. X. Ding, A. Haghani, H. Hao, and P. Zhang, "A comparison study of basic data-driven fault diagnosis and process monitoring methods on the benchmark Tennessee Eastman process," Journal of Process Control, vol. 22, no. 9, pp. 1567–1581, 2012.
[87] S. Yin, S. X. Ding, X. Xie, and H. Luo, "A review on basic data-driven approaches for industrial process monitoring," IEEE Transactions on Industrial Electronics, vol. 61, no. 11, pp. 6418–6428, 2014.
[88] H. H. Yue and S. J. Qin, "Reconstruction-based fault identification using a combined index," Industrial & Engineering Chemistry Research, vol. 40, no. 20, pp. 4403–4414, 2001.
[89] Y. Zhang and S. J. Qin, "Improved nonlinear fault detection technique and statistical analysis," AIChE Journal, vol. 54, no. 12, pp. 3207–3220, 2008.
[90] D. Zhou, G. Li, and S. J. Qin, "Total projection to latent structures for process monitoring," AIChE Journal, vol. 56, no. 1, pp. 168–178, 2010.
[91] M. Zhu and A. Ghodsi, "Automatic dimensionality selection from the scree plot via the use of profile likelihood," Computational Statistics & Data Analysis, vol. 51, no. 2, pp. 918–930, 2006.
Abstract
Data collected from industrial processes are often of high dimension, highly cross-correlated, and auto-correlated. Therefore, such data can often be represented with a lower dimensional latent model, which extracts the major cross-correlations and/or auto-correlations and leaves only insignificant noise in the residuals. Principal component analysis (PCA) and partial least squares (PLS) are two latent modeling methods that have been widely used for static data modeling and process monitoring. However, it is not appropriate to directly apply PCA and PLS to cross-correlated data from a dynamic process, since PCA and PLS are only able to model static components of the data.

In this dissertation, a novel dynamic-inner PCA (DiPCA) algorithm and a novel dynamic-inner PLS (DiPLS) algorithm are proposed for dynamic data modeling. DiPCA extracts a set of dynamic latent variables that capture the most dynamic variations in the data. After the dynamic variations are extracted, the residuals are essentially uncorrelated in time and static PCA can be applied. Unlike existing dynamic PCA algorithms, DiPCA generates a subspace of principal time series that are most predictable from their past data.

DiPLS extracts two sets of latent variables: one set from the input data and another set from the output data. These latent variables are extracted such that the output latent variables are most predictable from the current and past input latent variables. In the case where no dynamic relationships exist between the input and the output, DiPLS reduces to PLS, as one would expect. Unlike existing dynamic PLS algorithms, DiPLS provides an explicit model between the output latent variables and the current and past input latent variables. In addition, the outer model that generates the latent variables in DiPLS is built under the assumption of a dynamic inner model structure, rather than the static inner model structure found in the literature, making the outer modeling compatible with the inner dynamic modeling.

For the purpose of process monitoring, DiPCA and DiPLS based fault detection indices are derived. DiPCA based monitoring focuses on process faults, and the derived fault detection indices monitor the dynamic relationships, principal static relationships, and residual innovations separately. In contrast, DiPLS based monitoring emphasizes quality related faults. The derived DiPLS based fault detection indices monitor the quality related faults and quality irrelevant faults separately, leading to greater interpretability of the detection results.

Case studies on simulation data, a real industrial boiler process dataset, and the Tennessee Eastman Process demonstrate the effectiveness of the proposed DiPCA and DiPLS modeling algorithms and their corresponding process monitoring strategies.