MULTI-POPULATION OPTIMAL CHANGE-POINT DETECTION

by Grigory Sokolov

A Dissertation Presented to the FACULTY OF THE GRADUATE SCHOOL, UNIVERSITY OF SOUTHERN CALIFORNIA, in Partial Fulfillment of the Requirements for the Degree DOCTOR OF PHILOSOPHY (APPLIED MATHEMATICS). May 2014.

Advisor: Alexander Tartakovsky. Co-advisor: Sergey Lototsky.

Copyright 2014 Grigory Sokolov

Acknowledgments

I would like to use this opportunity to express my special thanks to the people who made this thesis possible. Alexander Tartakovsky, my advisor and mentor: I thank him for the great understanding, patience and support he constantly provided to me. Sergey Lototsky, my co-advisor, who has always been there for me. Aleksey Polunchenko, my future post-doctoral mentor: his help has been truly invaluable. I am looking forward to joining his research team at Binghamton University this fall. Georgios Fellouris, who put trust in me and walked me through many a difficult problem; without him this work would have been impossible. Susan Friedlander, who has kindly been keeping an eye on me throughout these years.

Contents

Acknowledgments
Abstract
1 Overview
  1.1 Introduction
  1.2 Optimality criteria, CUSUM and the Generalized Shiryaev–Roberts procedure
  1.3 Performance evaluation
2 Multi-Population Change-Point Detection
  2.1 Problem formulation
  2.2 Single population detection rules and their asymptotic performance
  2.3 Multi-population detection rules
  2.4 Performance measures
3 Numerical Results
  3.1 Integral equations
  3.2 The numerical method and its accuracy analysis
  3.3 Extension to multiple populations
  3.4 Case study
4 Unstructured Decentralized Change-Point Detection
  4.1 Problem formulation
  4.2 The completely unstructured case
    4.2.1 A level-triggered communication scheme
    4.2.2 Quantization
  4.3 Performance evaluation and comparison
5 Applications
  5.1 Cybersecurity
  5.2 A case study
  5.3 Conclusion
Reference List

List of Tables

3.1 Convergence analysis for $\mathrm{ARL}(\mathcal{S}_A)$ assuming $\theta = 0.1$.
3.2 Convergence analysis for $\mathrm{ARL}(\mathcal{S}_A)$ assuming $\theta = 0.5$.
3.3 Results of accuracy and convergence analysis for $\mathrm{STADD}(\mathcal{S}_A)$ for $\theta = 0.1$.
3.4 Results of accuracy and convergence analysis for $\mathrm{STADD}(\mathcal{S}_A)$ for $\theta = 0.5$.
3.5 Convergence analysis for $\mathrm{ARL}(\mathcal{S}_M)$ assuming $\theta = (1.0, 1.4)$.
3.6 Convergence analysis for $\mathrm{ARL}(\mathcal{C}_M)$ assuming $\theta = (1.0, 1.4)$.
3.7 Performance comparison for the minimum of SR procedures.
3.8 Performance comparison for the minimum of CUSUM procedures.
4.1 Procedure comparison for frequent communications ( = 2).
4.2 Procedure comparison for infrequent communications ( = 10).

List of Figures

3.1 Lagrange linear interpolation polynomials.
3.2 Performance of the Shiryaev–Roberts minimum rule.
3.3 Performance of the CUSUM minimum rule.
4.1 Random sampling scheme.
4.2 Procedure comparison for frequent communications ( = 2).
4.3 Procedure comparison for infrequent communications ( = 10).
5.1 SYN flood attack: number of attempted connections.
5.2 SYN flood attack: connections birth rate pdf with a Gaussian fit.
5.3 SYN flood attack: long run of the SR procedure; logarithm of the SR statistic vs. time.
5.4 Detection of the SYN flood attack by the single- and multi-population procedures.

Abstract

The main aim of this work is to generalize the classical single-population sequential change-point detection problem to a more general multi-population case, and to design procedures that retain the optimality properties of their single-population counterparts. First, we propose an alternative performance criterion, better suited for multi-sensor applications. Second, we propose a numerical method to study the operating characteristics of a wide class of multi-population procedures. Specifically, the proposed method is based on the integral-equations approach and uses the collocation technique with the basis functions chosen so as to exploit a certain change-of-measure identity and the SR detection statistic's unique martingale property. Moreover, the design and analysis of the method are complete in that the question of the method's accuracy is properly addressed, providing error bounds and convergence rates. We then discuss the problem of sequential change-point detection in distributed multi-sensor networks. Specifically, we assume that the sensors need to communicate with the fusion center at a much smaller rate than the rate at which they collect their observations. Additionally, we assume that the sensors can transmit only a small number of bits per communication. We propose a decentralized sequential detection rule that requires minimal transmission activity from the sensors and is asymptotically optimal for any scenario regarding the unknown subset of affected sensors. We conclude with a case study: we consider the problem of efficient intrusion and traffic-anomaly detection in large-scale high-speed computer networks, and show that the proposed procedures are highly effective and advantageous over conventional ones.

Chapter 1

Overview

1.1 Introduction

Sequential change-point detection is concerned with the design and analysis of procedures for "online" detection of possible changes in the characteristics of a running process. Specifically, the process is assumed to be monitored through sequentially made observations, and should their behavior suggest the process may have statistically changed, the aim is to conclude so within the fewest observations possible, subject to a tolerable level of the risk of false detection; see Shiryaev (1978); Basseville and Nikiforov (1993); Poor and Hadjiliadis (2008).
The subject's areas of application are diverse and virtually unlimited, and include industrial quality and process control (Shewhart (1925, 1931); Kenett and Zacks (1998); Ryan (2011); Montgomery (2012)), biostatistics, economics, seismology (Basseville and Nikiforov (1993)), forensics, navigation (Basseville and Nikiforov (1993)), cybersecurity (Tartakovsky et al. (2005b, 2006b); Polunchenko et al. (2012); Tartakovsky et al. (2013)), communication systems (Basseville and Nikiforov (1993)), and many more. Assume that the observations are independent throughout the entire period of surveillance, with the pre- and post-change distributions fully specified (though not identical). The problem is well understood and has been solved under a variety of criteria. For a recent survey, see, e.g., Tartakovsky and Moustakides (2010) and Polunchenko and Tartakovsky (2012). We focus on the minimax approach, whose principal assumption is to view the change-point as unknown (but not random). The procedures of interest are Page's (1954) Cumulative Sum (CUSUM) chart, the Shiryaev–Roberts (SR) procedure (due to the independent work of Shiryaev (1961, 1963) and Roberts (1966)), and its generalization—the SR–$r$ procedure, proposed by Moustakides et al. (2011).

A sequential change-point detection procedure, a rule whereby one stops and declares that a change is in effect, is defined as a stopping time adapted to the observed data. We consider the basic discrete-time case Pollak and Tartakovsky (2009a); Shiryaev and Zryumov (2010), which assumes the observations are independent throughout the entire period of surveillance, with the pre- and post-change distributions fully specified (but not equal to one another). The change-point is treated as an unknown (but not random) nuisance parameter. The main aim of this work is to generalize the case of i.i.d. scalar observations to a more general multi-population case, and to design several procedures that retain the optimality properties of their single-population counterparts. Specifically, we propose an alternative performance criterion, better suited for multi-sensor applications (such as network security). We also discuss the problem of sequential change-point detection in distributed multi-sensor networks under bandwidth and energy constraints.

Particular emphasis in this work is placed on two related detection procedures: a) the original Shiryaev–Roberts (SR) procedure (due to the independent work of Shiryaev (1961, 1963) and that of Roberts (1966); see also Girschick and Rubin (1952)), and b) its recent generalization—the Shiryaev–Roberts–$r$ (SR–$r$) procedure introduced in Moustakides et al. (2011) as a version of the original SR procedure with a headstart, akin to Lucas and Crosier (1982). As the SR procedure is a special case of the SR–$r$ procedure (with no headstart, i.e., when $r = 0$), we will collectively refer to both as the Generalized SR (GSR) procedure, in analogy to the terminology used in Tartakovsky et al. (2012).

Our interest in the GSR procedure is due to three reasons. First, the GSR procedure is relatively "young" (the SR–$r$ procedure was proposed in Moustakides et al. (2011) in 2011), and has not yet been fully explored in the literature. Second, the SR procedure has already been proven to be multi-cyclic optimal. This was first established in Shiryaev (1961, 1963) in continuous time for the problem of detecting a shift in the drift of a Brownian motion; see also, e.g., Shiryaev (2002); Feinberg and Shiryaev (2006).
An analogous result in discrete time was later obtained in Pollak and Tartakovsky (2009a); Shiryaev and Zryumov (2010), and shortly after generalized in (Polunchenko and Tartakovsky, 2010, Lemma 1). Neither the famous Cumulative Sum (CUSUM) "inspection scheme" Page (1954) nor the popular Exponentially Weighted Moving Average (EWMA) chart Roberts (1959) possesses such a strong optimality property. This notwithstanding, there is currently a vacuum in the literature concerning numerical methodology to compute the performance of the GSR procedure in the multi-cyclic setup. As a matter of fact, to the best of our knowledge, only Moustakides et al. (2009); Tartakovsky et al. (2009); Moustakides et al. (2011) and Polunchenko et al. (2013b) address this question, and in particular, offer a comparative performance analysis of the CUSUM scheme, the EWMA chart and the GSR procedure in the multi-cyclic setup; a similar analysis in continuous time can be found, e.g., in Srivastava and Wu (1993). However, the question of the employed method's accuracy is only partially answered, with no error bounds or convergence rates supplied. This is common in the literature on the computational aspect of change-point detection: the accuracy question is dealt with in an ad hoc manner, if at all. Some headway to fill this gap was recently made in Polunchenko et al. (2013a, 2014b,a). The third, equally important reason to consider the GSR procedure is its asymptotic near-optimality in the minimax sense of Pollak (1985); see Tartakovsky et al. (2012) for the corresponding result established using the GSR procedure's exact multi-cyclic optimality. Furthermore, the GSR procedure has also been proven (Tartakovsky and Polunchenko (2010); Polunchenko and Tartakovsky (2010)) to be exactly Pollak-minimax optimal in two special cases. A practical implication of this is that the CUSUM chart is less minimax efficient than the GSR procedure, and the difference is especially pronounced when the change is faint; for a few particular scenarios the difference is quantified, e.g., in Moustakides et al. (2009); Tartakovsky et al. (2009); Moustakides et al. (2011).

To foster and facilitate further research on the GSR procedure, in this work we build on the work done previously in Moustakides et al. (2009); Tartakovsky et al. (2009); Moustakides et al. (2011); Polunchenko et al. (2013a, 2014b,a) and develop an efficient numerical method to compute the performance of the GSR procedure in the multi-population setup. Specifically, the proposed method is based on the integral-equations approach and uses the standard collocation framework (see, e.g., (Atkinson and Han, 2009, Section 12.1.1)) in combination with a certain change-of-measure identity and a certain martingale property specific to the GSR procedure's detection statistic. As a result, the proposed method's accuracy, robustness and efficiency improve noticeably, which becomes crucial as the number of populations increases. We also show that the method's rate of convergence is quadratic, and supply a tight upper bound on the method's error. Since the method by design is not restricted to a particular data distribution or to a specific value of the GSR detection statistic's headstart, it may help gain greater insight into the properties of the GSR procedure and aid a practitioner in setting up the GSR procedure as needed while fully utilizing its potential.
The remainder of the work is structured as follows. We first formally state the problem and introduce some existing single-population procedures in Section 1.2. Chapter 2 is devoted to the generalization to a multi-population case, with a special focus on the optimality criterion appropriate to the aforesaid multi-population setting. The numerical method and its accuracy analysis are presented in Chapter 3. Last, in Chapter 5 we consider the problem of efficient traffic-anomaly detection in large-scale high-speed computer networks. In collaboration with USC ISI, we show that the proposed procedures are highly effective and advantageous over conventional ones.

1.2 Optimality criteria, CUSUM and the Generalized Shiryaev–Roberts procedure

This section is intended to formally state the problem, set down the notation to be used throughout the work, introduce the CUSUM and GSR procedures and review their optimality properties.

We begin with stating the problem. Let $f(x)$ and $g(x)$ be the observations' pre- and post-change distribution densities, respectively; $g(x) \not\equiv f(x)$. Define the change-point, $0 \le \nu \le \infty$, as the unknown (but not random) serial index of the final pre-change observation (so it can potentially be infinite). That is, the probability density function (pdf) of $X_n$ is $f(x)$ for $1 \le n \le \nu$, and $g(x)$ for $n > \nu$. The notation $\nu = 0$ is to be understood as the case when the pdf of $X_n$ is $g(x)$ for all $n \ge 1$, i.e., the data $\{X_n\}_{n\ge 1}$ are affected by the change ab initio. Similarly, the notation $\nu = \infty$ is to mean that the pdf of $X_n$ is $f(x)$ for all $n \ge 1$.

Let $\mathrm{P}_k$ ($E_k$) be the probability measure (corresponding expectation) given a known change-point $\nu = k$, where $0 \le k \le \infty$. In particular, $\mathrm{P}_\infty$ ($E_\infty$) is the probability measure (corresponding expectation) assuming the observations' distribution is always $f(x)$ and never changes (i.e., $\nu = \infty$). Likewise, $\mathrm{P}_0$ ($E_0$) is the probability measure (corresponding expectation) assuming the observations' distribution is $g(x)$ from the start (i.e., $\nu = 0$). From now on, $T$ will denote the stopping time associated with a generic detection procedure.

Given this context, the standard way to gauge the false alarm risk is through Lorden's (1971) Average Run Length (ARL) to false alarm; it is defined as $\mathrm{ARL}(T) \triangleq E_\infty[T]$. To introduce the multi-cyclic change-point detection problem, let
$$\Delta(\gamma) \triangleq \{T \colon \mathrm{ARL}(T) \ge \gamma\}, \quad \gamma > 1,$$
denote the class of procedures, $T$, with the ARL to false alarm at least $\gamma > 1$, a pre-selected tolerance level. One of the standard performance measures is Pollak's minimax performance measure
$$\mathcal{J}(T) = \sup_{0\le\nu<\infty} E_\nu[T - \nu \mid T > \nu] \triangleq \mathrm{SADD}(T). \qquad (1.1)$$

Consider an alternative setup, where it is of utmost importance to detect the change as quickly as possible, even at the expense of raising many false alarms (using a repeated application of the same procedure) before the change occurs. Put otherwise, in exchange for the assurance that the change will be detected with maximal speed, one agrees to go through a "storm" of false alarms along the way (the false alarms ensue from repeatedly applying the same procedure, starting from scratch after each false alarm). Formally, let $T_1, T_2, \ldots$ be sequential independent repetitions of the same stopping time, $T$, and let $\mathcal{T}_j \triangleq T_1 + T_2 + \cdots + T_j$, $j \ge 1$, be the time of the $j$-th alarm. Define $I_\nu \triangleq \min\{j \ge 1 \colon \mathcal{T}_j > \nu\}$, so that $\mathcal{T}_{I_\nu}$ is the time of detection of a true change that occurs at time moment $\nu$ after $I_\nu - 1$ false alarms have been raised. One can then view the difference $\mathcal{T}_{I_\nu} - \nu$ ($> 0$) as the detection delay.
Let
$$\mathrm{STADD}(T) \triangleq \lim_{\nu\to\infty} E_\nu[\mathcal{T}_{I_\nu} - \nu] \qquad (1.2)$$
be the limiting value of the Average Detection Delay (ADD), referred to as the Stationary ADD (STADD). We hasten to note that the STADD and the Steady-State ADD (SSADD), or the Steady-State ARL, a detection delay measure popular in the areas of statistical process and quality control, are not the same thing; we will comment more on the difference later. The multi-cyclic change-point detection problem is: to find $T_{\mathrm{opt}} \in \Delta(\gamma)$ such that
$$\mathrm{STADD}(T_{\mathrm{opt}}) = \inf_{T\in\Delta(\gamma)} \mathrm{STADD}(T) \quad \text{for every } \gamma > 1. \qquad (1.3)$$
As can be seen from the description, the multi-cyclic formulation is instrumental in detecting a change that takes place in a distant future (i.e., $\nu$ is large) and is preceded by a stationary flow of false detections, each with a cost much smaller than that of missing the change by a single observation. By way of example, such scenarios are encountered, e.g., in cybersecurity Polunchenko et al. (2012); Tartakovsky et al. (2013) and in the economic design of control charts Duncan (1956); Montgomery (1980); Lorenzen and Vance (1986); Ho and Case (1994).

It is shown in Pollak and Tartakovsky (2009a); Shiryaev and Zryumov (2010) that the multi-cyclic change-point detection problem (1.3) is solved by the (original) Shiryaev–Roberts (SR) procedure Shiryaev (1961, 1963); Roberts (1966) introduced next; incidentally, the comparative performance analysis offered in Moustakides et al. (2009); Tartakovsky et al. (2009) demonstrates that both the CUSUM scheme and the EWMA chart are outperformed (in the multi-cyclic sense) by the SR procedure.

To introduce the SR procedure, we first construct the corresponding likelihood ratio (LR), since the SR procedure is LR-based. Let $\mathcal{H}_k \colon \nu = k$ for $0 \le k < \infty$ be the hypothesis that the change takes place at time moment $\nu = k$, and let $\mathcal{H}_\infty \colon \nu = \infty$ be the hypothesis that no change ever occurs. The joint distribution densities of the sample $X_{1:n} \triangleq (X_1,\ldots,X_n)$, $n \ge 1$, under each of these hypotheses are given by
$$p(X_{1:n} \mid \mathcal{H}_\infty) = \prod_{j=1}^{n} f(X_j) \quad\text{and}\quad p(X_{1:n} \mid \mathcal{H}_k) = \prod_{j=1}^{k} f(X_j) \prod_{j=k+1}^{n} g(X_j) \ \text{ for } k < n,$$
with $p(X_{1:n}\mid\mathcal{H}_k) = p(X_{1:n}\mid\mathcal{H}_\infty)$ for $k \ge n$. The corresponding LR therefore is
$$\Lambda_{1:n}^{\nu=k} \triangleq \frac{p(X_{1:n}\mid\mathcal{H}_k)}{p(X_{1:n}\mid\mathcal{H}_\infty)} = \prod_{j=k+1}^{n} \Lambda_j \quad\text{for } k < n,$$
where from now on $\Lambda_n \triangleq g(X_n)/f(X_n)$ is the "instantaneous" LR for the $n$-th observation, $X_n$.

We now make an observation that will play an important role in this work. Let $P_d(t) \triangleq \mathrm{P}_d(\Lambda_1 \le t)$, $t \ge 0$, $d \in \{0,\infty\}$, denote the cdf of the LR under measure $\mathrm{P}_d$, $d \in \{0,\infty\}$, respectively. As the LR is the Radon–Nikodým derivative of measure $\mathrm{P}_0$ with respect to measure $\mathrm{P}_\infty$, one can conclude that
$$dP_0(t) = t\, dP_\infty(t), \quad t \ge 0, \qquad (1.4)$$
cf. Polunchenko et al. (2013a, 2014b). It is assumed that measures $\mathrm{P}_0$ and $\mathrm{P}_\infty$ are mutually absolutely continuous. We will use this change-of-measure identity heavily in Section 1.3 to improve the accuracy, rate of convergence, and efficiency of our numerical method.

Formally, the original SR procedure Shiryaev (1961, 1963); Roberts (1966) and the CUSUM chart Page (1954) are defined as the stopping times
$$\mathcal{S}_A \triangleq \inf\{n \ge 1 \colon R_n \ge A\}, \qquad (1.5)$$
$$\mathcal{C}_A \triangleq \inf\{n \ge 1 \colon V_n \ge A\}, \qquad (1.6)$$
where $A > 0$ is a detection threshold used to control the false alarm risk, and
$$R_n \triangleq \sum_{k=0}^{n-1} \Lambda_{1:n}^{\nu=k} = \sum_{k=1}^{n} \prod_{i=k}^{n} \Lambda_i, \quad n \ge 1, \qquad (1.7)$$
$$V_n \triangleq \max_{0\le k\le n-1} \Lambda_{1:n}^{\nu=k} = \max_{1\le k\le n} \prod_{i=k}^{n} \Lambda_i, \quad n \ge 1, \qquad (1.8)$$
are the SR and CUSUM detection statistics; here and throughout the rest of the work, in every definition of a detection procedure we assume that $\inf\{\varnothing\} = \infty$.
Note the recursions
$$R_{n+1} = (1 + R_n)\Lambda_{n+1}, \quad n = 0,1,\ldots, \quad\text{with } R_0 = 0, \qquad (1.9)$$
$$V_{n+1} = \max(1, V_n)\Lambda_{n+1}, \quad n = 0,1,\ldots, \quad\text{with } V_0 = 1, \qquad (1.10)$$
and we stress that $R_0 = 0$, i.e., the SR statistic starts from zero. Observe now that $\{R_n - n\}_{n\ge0}$ is a zero-mean $\mathrm{P}_\infty$-martingale, i.e., $E_\infty[R_n - n] = 0$ for any $n \ge 0$. From this and the optional stopping theorem (see, e.g., (Poor and Hadjiliadis, 2008, Subsection 2.3.2) or (Shiryaev, 1995, Chapter VII)), one can conclude that $E_\infty[R_{\mathcal{S}_A} - \mathcal{S}_A] = 0$, whence $\mathrm{ARL}(\mathcal{S}_A) \triangleq E_\infty[\mathcal{S}_A] = E_\infty[R_{\mathcal{S}_A}] \ge A$. It is now easy to set the detection threshold, $A$, so as to ensure $\mathrm{ARL}(\mathcal{S}_A) \ge \gamma$ for any desired $\gamma > 1$. More specifically, it can be shown Pollak (1987) that $\mathrm{ARL}(\mathcal{S}_A) = (A/\zeta)[1 + o(1)]$ as $A\to\infty$, where $\zeta \in (0,1)$ is the limiting exponential overshoot, a model-dependent constant that can be computed using nonlinear renewal theory Siegmund (1985); Woodroofe (1982). For practical purposes, the approximation $\mathrm{ARL}(\mathcal{S}_A) \approx A/\zeta$ is known to be extremely accurate under broad conditions. More importantly, as shown in Pollak and Tartakovsky (2009a); Shiryaev and Zryumov (2010), the SR procedure is exactly STADD-optimal, i.e., formally, $\mathrm{STADD}(\mathcal{S}_A) = \inf_{T\in\Delta(\gamma)} \mathrm{STADD}(T)$ for every $\gamma > 1$, where $A > 0$ is the solution of the equation $\mathrm{ARL}(\mathcal{S}_A) = \gamma$.

This strong optimality property of the SR procedure (1.5)-(1.9) was recently generalized in (Polunchenko and Tartakovsky, 2010, Lemma 1), where the SR procedure was allowed to have a headstart. This version of the SR procedure is known as the Shiryaev–Roberts–$r$ (SR–$r$) procedure, and it was proposed in Moustakides et al. (2011). Specifically, the SR–$r$ procedure amounts to starting off the original SR procedure (1.5)-(1.9) at a fixed (but specially designed) $R_0^r = r$, $r \ge 0$, i.e., $r \ge 0$ is a headstart. This is similar to the idea proposed earlier in Lucas and Crosier (1982) for the CUSUM scheme. However, it turns out that, unlike for the CUSUM scheme, giving the SR procedure a headstart is practically "putting it on steroids": the gain in performance far exceeds that observed in Lucas and Crosier (1982) for the CUSUM scheme. Formally, the SR–$r$ procedure is defined as the stopping time
$$\mathcal{S}_A^r = \inf\{n \ge 1 \colon R_n^r \ge A\}, \qquad (1.11)$$
where again $A > 0$ is used to control the ARL to false alarm, and
$$R_{n+1}^r = (1 + R_n^r)\Lambda_{n+1}, \quad n = 0,1,\ldots, \quad\text{with } R_0^r = r \ge 0, \qquad (1.12)$$
and we remark that for $r = 0$ the SR–$r$ procedure becomes the original SR procedure (1.7)-(1.5). For this reason, from now on we will collectively refer to both procedures as the Generalized SR (GSR) procedure, following the terminology used in Tartakovsky et al. (2012).

Observe that $\{R_n^r - n - r\}_{n\ge0}$ is a zero-mean $\mathrm{P}_\infty$-martingale, i.e., $E_\infty[R_n^r - n - r] = 0$ for all $n \ge 0$ and all $r$. As a result, one can generalize Pollak (1987) to conclude that
$$\mathrm{ARL}(\mathcal{S}_A^r) \approx \frac{A}{\zeta} - r \quad\text{for sufficiently large } A > 0; \qquad (1.13)$$
here $\zeta \in (0,1)$ is again the limiting exponential overshoot. This approximation is also quite accurate under broad conditions. More importantly, it is shown in (Polunchenko and Tartakovsky, 2010, Lemma 1) that the SR–$r$ procedure minimizes the generalized STADD
$$\mathrm{STADD}(T) \triangleq \frac{r\,E_0[T] + \mathrm{IADD}(T)}{\mathrm{ARL}(T) + r} \qquad (1.14)$$
within the class $\Delta(\gamma)$; hereafter
$$\mathrm{IADD}(T) \triangleq \sum_{k=0}^{\infty} E_k[(T - k)^+] \qquad (1.15)$$
is the Integral Average Detection Delay (IADD) and $x^+ \triangleq \max\{0,x\}$. From now on we will consider only the generalized STADD. It can be shown (see, e.g., (Pollak and Tartakovsky, 2009a, Theorem 2)) that for $r = 0$ the generalized STADD coincides with the original STADD given by (1.2); see also, e.g., Shiryaev (1961, 1963, 1978). This is a consequence of the equivalence of the multi-cyclic formulation and the generalized Bayesian formulation.
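To make recursions (1.9)-(1.12) and the associated stopping rules concrete, here is a minimal Python sketch (not part of the original text); it assumes i.i.d. data and a user-supplied stream of instantaneous likelihood ratios $\Lambda_n$, and the function names (`run_sr_r`, `run_cusum`) are illustrative only. The Gaussian example at the bottom anticipates the model used in the case study of Section 3.4.

```python
import numpy as np

def run_sr_r(lr_stream, A, r=0.0):
    """SR-r recursion (1.12): R_{n+1} = (1 + R_n) * Lambda_{n+1}, R_0 = r.
    Returns the stopping time S^r_A = inf{n >= 1 : R_n >= A}, cf. (1.11)."""
    R = r
    for n, lam in enumerate(lr_stream, start=1):
        R = (1.0 + R) * lam
        if R >= A:
            return n
    return float("inf")          # no alarm within the supplied sample

def run_cusum(lr_stream, A):
    """CUSUM recursion (1.10): V_{n+1} = max(1, V_n) * Lambda_{n+1}, V_0 = 1.
    Returns the stopping time C_A = inf{n >= 1 : V_n >= A}, cf. (1.6)."""
    V = 1.0
    for n, lam in enumerate(lr_stream, start=1):
        V = max(1.0, V) * lam
        if V >= A:
            return n
    return float("inf")

# Illustrative run on purely pre-change N(0,1) data with a putative post-change mean theta,
# so that Lambda_n = exp(theta * X_n - theta^2 / 2) -- the Gaussian model of Section 3.4.
theta, A = 0.5, 100.0
rng = np.random.default_rng(1)
x = rng.normal(size=100_000)
lam = np.exp(theta * x - theta**2 / 2.0)
print(run_sr_r(lam, A), run_cusum(lam, A))   # two (false-alarm) stopping times
```

Repeating such runs and averaging the stopping times is the brute-force way to estimate the ARL to false alarm; the integral-equation machinery of Chapter 3 replaces this with a deterministic computation.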
Formally, from (Polunchenko and Tartakovsky, 2010, Lemma 1) we have that $\mathrm{STADD}(\mathcal{S}_A^r) = \inf_{T\in\Delta(\gamma)} \mathrm{STADD}(T)$ for every $\gamma > 1$, where $A$ and $r$ are such that $\mathrm{ARL}(\mathcal{S}_A^r) = \gamma$.

We conclude this section with a remark on the difference between the STADD given by (1.14) and the Steady-State ADD (SSADD); the latter is often called the Steady-State ARL, and is a control chart performance metric popular in the area of quality control as a metric less prone to the adverse "inertia effect" Yashchin (1987, 1993); Woodall and Adams (1998). Formally, $\mathrm{SSADD}(T) \triangleq \lim_{k\to\infty} \mathrm{ADD}_k(T)$ with $\mathrm{ADD}_k(T) \triangleq E_k[T - k \mid T > k]$, $k \ge 0$; cf. Knoth (2006). The principal difference between the STADD and the SSADD is that the SSADD assumes the procedure of choice, $T$, is applied only once, whereas $\mathrm{STADD}(T)$ assumes repetitive and independent application of $T$. Hence, the steady-state regime involved in $\mathrm{SSADD}(T)$ is different from the stationary regime involved in $\mathrm{STADD}(T)$: the former pertains to the detection statistic, while the latter pertains to the change-point.

1.3 Performance evaluation

We develop a numerical method to evaluate the performance of the GSR procedure (1.11)-(1.12) in the multi-cyclic setup (1.3). Specifically, we "gear" the method toward numerical computation of two antagonistic performance measures associated with the GSR stopping time $\mathcal{S}_A^r$: a) the usual "in-control" ARL to false alarm, i.e., $\mathrm{ARL}(\mathcal{S}_A^r) \triangleq E_\infty[\mathcal{S}_A^r]$, and b) $\mathrm{STADD}(\mathcal{S}_A^r)$, i.e., the Stationary Average Detection Delay (1.14). More concretely, we first derive an integral equation for each performance measure involved. Using the change-of-measure identity (1.4) we then show that the equation for $\mathrm{ARL}(\mathcal{S}_A^r)$ and that for the numerator of $\mathrm{STADD}(\mathcal{S}_A^r)$—see (1.14)—differ only by the right-hand side (which is completely known for either equation). As a result, both equations can be solved concurrently. Finally, we present our numerical method to (simultaneously) solve the obtained equations, and offer an analysis of the method's accuracy and rate of convergence. The proposed method builds on the one previously proposed in Tartakovsky et al. (2009); Moustakides et al. (2009, 2011) and recently extended in Polunchenko et al. (2014b); see also Polunchenko et al. (2013a).

Chapter 2

Multi-Population Change-Point Detection

2.1 Problem formulation

We observe sequentially $L$ populations, whose observations are defined on a probability space $(\Omega, \mathcal{F}, \mathrm{P})$. More specifically, $X_n^{(i)}$ is the $n$-th observation at population $i$ and $\{\mathcal{F}_n^{(i)}\}$ is the filtration restricted to population $i$, i.e., $\mathcal{F}_n^{(i)} = \sigma(X_k^{(i)},\ k \le n)$. At some unknown and unobserved time $\nu$, a change may occur at only one population, whose identity is not known in advance. Thus, the underlying probability measure $\mathrm{P}$ is parameterized by the time of the change and the identity of the population where the change occurs. Therefore, we will denote by $\mathrm{P}_\nu^{(i)}$ the probability measure when the change occurs at population $i$ at time $\nu$, in which case $X_n^{(j)} \sim f^{(j)}$ for every $n$ and $j \ne i$, whereas $X_n^{(i)} \sim f^{(i)}$ for $n \le \nu$ and $X_n^{(i)} \sim g^{(i)}$ for $n > \nu$. Notice that under $\mathrm{P}_\infty$ a change never occurs and for every $n$ it is $X_n^{(i)} \sim f^{(i)}$. On the other hand, under $\mathrm{P}_0^{(i)}$ for every $n$ it is $X_n^{(j)} \sim f^{(j)}$ when $j \ne i$ and $X_n^{(i)} \sim g^{(i)}$. We assume that all observations are temporally and spatially independent, so that all random variables $\{X_n^{(i)}\}_{n,i}$ are independent.
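For concreteness, the data model just described can be simulated as follows; this is a hedged sketch assuming unit-variance Gaussian populations (as in the case study of Chapter 3), and the helper `simulate_populations` is hypothetical rather than part of the thesis.

```python
import numpy as np

def simulate_populations(L, n_obs, nu, affected, theta, rng):
    """Section 2.1 data model with Gaussian streams: X^{(j)}_n ~ N(0,1) for all n
    when j != affected, while stream `affected` switches from N(0,1) to N(theta,1)
    after the change-point nu (observations independent across n and j)."""
    X = rng.normal(size=(n_obs, L))      # everything pre-change
    X[nu:, affected] += theta            # mean shift in exactly one population
    return X

rng = np.random.default_rng(0)
X = simulate_populations(L=3, n_obs=500, nu=200, affected=1, theta=0.5, rng=rng)
print(X.shape)    # (500, 3): row n is the vector X_n = (X^(1)_n, ..., X^(L)_n)
```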
The goal is to detect the change as soon as possible, without necessarily identifying the population where the change happened, based on sequential and simultaneous observations of the populations. In other words, the goal is to find an $\{\mathcal{F}_n\}$-stopping time $T$, where $\mathcal{F}_n = \sigma(X_k,\ k \le n)$ with $X_n = (X_n^{(1)},\ldots,X_n^{(L)})$. Before we formulate the change-detection problem in the above setup, we consider the classical case of the change-detection problem in only one population.

2.2 Single population detection rules and their asymptotic performance

We start by considering a single-population change-point detection problem, where we assume that all observations are independent, and define the log-likelihood ratio process
$$U_n = \sum_{k=1}^{n} \log\Lambda_k, \quad n \ge 1,$$
where $\Lambda_k$ is the instantaneous likelihood ratio as defined in Section 1.2. Define the Kullback–Leibler information number $I = E_0[U_1]$. We assume that $I > 0$, so that the change-detection problem is well-posed, i.e., the densities $f$ and $g$ are distinct almost everywhere. Moreover, we need to assume that $I < \infty$. In order to obtain the asymptotic performance of the detection rules, we need to introduce the one-sided SPRT
$$W(A) = \inf\{n \ge 1 \colon U_n \ge A\},$$
which is an auxiliary stopping time for the detection problem. Indeed, using nonlinear renewal theory it has been shown (see Tartakovsky et al. (2012)) that under the second moment condition $E_0|U_1|^2 < \infty$, for the SR–$r$ procedure (1.11) with any $r \ge 0$ and for CUSUM (1.6),
$$E_0[\mathcal{S}_r(A)] = \frac{\log A + \kappa - c_r}{I} + o(1), \qquad \mathrm{ADD}_\infty[\mathcal{S}_r(A)] = \frac{\log A + \kappa - c_\infty}{I} + o(1), \qquad E_0[\mathcal{C}(A)] = \frac{\log A + \kappa + \beta}{I} + o(1),$$
and, if $r$ is either fixed or $r(A)/A \to 0$ as $A\to\infty$,
$$E_\infty[\mathcal{S}_r(A)] = \frac{A(1+o(1))}{v}, \qquad E_\infty[\mathcal{C}(A)] = \frac{A(1+o(1))}{I v^2}.$$
Here
$$c_r = E_0\!\left[\log\!\Big(1 + r + \sum_{k=1}^{\infty} e^{-U_k}\Big)\right], \qquad c_\infty = E_0\!\left[\log\!\Big(1 + R_\infty + \sum_{k=1}^{\infty} e^{-U_k}\Big)\right], \qquad \beta = E_0\!\left[\inf_{k\ge 0} U_k\right],$$
$$\kappa = \lim_{A\to\infty} E_0\!\left[U_{W(A)} - A\right], \qquad v = \lim_{A\to\infty} E_0\!\left[e^{-(U_{W(A)} - A)}\right],$$
where $R_\infty$ is a random variable with the $\mathrm{P}_\infty$-limiting distribution of $R_n$. Note that $\kappa$ is the limiting average overshoot and $v$ is the limiting average exponential overshoot of the SPRT statistic. Thus, choosing $A$ so that $E_\infty[\mathcal{S}_r] = \gamma$ and $E_\infty[\mathcal{C}] = \gamma$, one obtains
$$E_0[\mathcal{S}_r] = \frac{\log\gamma + \log v + \kappa - c_r}{I} + o(1), \qquad (2.1)$$
$$\mathrm{ADD}_\infty[\mathcal{S}_r] = \frac{\log\gamma + \log v + \kappa - c_\infty}{I} + o(1), \qquad (2.2)$$
$$E_0[\mathcal{C}] = \frac{\log\gamma + 2\log v + \log I + \kappa + \beta}{I} + o(1). \qquad (2.3)$$
For CUSUM and the zero-initialized SR procedure, the worst-case scenario occurs when $\nu = 0$, that is,
$$\mathcal{J}[\mathcal{S}_0] = E_0[\mathcal{S}_0], \qquad \mathcal{J}[\mathcal{C}] = E_0[\mathcal{C}].$$
Both CUSUM and SR–$r$ are second-order optimal in the class of procedures $\Delta(\gamma) = \{T \colon E_\infty[T] \ge \gamma\}$, i.e., as $\gamma\to\infty$,
$$\inf_{T\in\Delta(\gamma)} \mathcal{J}[T] - \mathcal{J}[\mathcal{S}_r] = O(1), \qquad \inf_{T\in\Delta(\gamma)} \mathcal{J}[T] - \mathcal{J}[\mathcal{C}] = O(1).$$
For SR–$r$, if $r$ is chosen so that $\mathrm{SADD}(\mathcal{S}_r) = \mathrm{ADD}_\infty(\mathcal{S}_r)$ and $r = o(A)$, then it is third-order optimal:
$$\inf_{T\in\Delta(\gamma)} \mathcal{J}[T] - \mathcal{J}[\mathcal{S}_r] = o(1).$$

2.3 Multi-population detection rules

Given the optimality properties of the single-population (restricted to population $i$) rules $\mathcal{S}_{r_i}^{(i)}$ and $\mathcal{C}^{(i)}$, it is natural to consider the minimum or the mixture of the corresponding single-population rules, namely
$$\mathcal{S}_{r,M}(A_1,\ldots,A_L) = \min_j \mathcal{S}_{r_j}^{(j)}(A_j), \qquad \mathcal{C}_M(A_1,\ldots,A_L) = \min_j \mathcal{C}^{(j)}(A_j)$$
for the former, and
$$\bar{\mathcal{S}}_r(p,A) = \inf\Big\{n \ge 1 \colon \sum_{j=1}^{L} p_j R_{r_j,n}^{(j)} \ge A\Big\}, \qquad (2.5)$$
$$\bar{\mathcal{C}}(p,A) = \inf\Big\{n \ge 1 \colon \sum_{j=1}^{L} p_j V_n^{(j)} \ge A\Big\} \qquad (2.6)$$
for the latter, where $p = (p_1,\ldots,p_L)$, $p_i > 0$ for $i = 1,\ldots,L$ and $\sum_{j=1}^{L} p_j = 1$. Alternatively, with the per-population thresholds $A_j = A/p_j$, one can rewrite
$$\mathcal{S}_{r,M}(p,A) = \inf\Big\{n \ge 1 \colon \max_j p_j R_{r_j,n}^{(j)} \ge A\Big\}, \qquad (2.7)$$
$$\mathcal{C}_M(p,A) = \inf\Big\{n \ge 1 \colon \max_j p_j V_n^{(j)} \ge A\Big\}. \qquad (2.8)$$
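The following sketch implements the mixture rule (2.5) and the minimum rule in its equivalent form (2.7) on top of per-population SR–$r$ recursions; it is an illustration only (the function `multi_population_sr` and its interface are assumptions, and the CUSUM versions (2.6), (2.8) would be analogous with the $\max(1,\cdot)$ recursion).

```python
import numpy as np

def multi_population_sr(lr_matrix, p, A, r=None):
    """Run per-population SR-r recursions on an (n_obs x L) array of instantaneous
    likelihood ratios and return the stopping times of the mixture rule (2.5) and
    of the minimum rule in its form (2.7), both with weights p and threshold A."""
    n_obs, L = lr_matrix.shape
    R = np.zeros(L) if r is None else np.array(r, dtype=float)
    p = np.asarray(p, dtype=float)
    t_mix = t_min = None
    for n in range(n_obs):
        R = (1.0 + R) * lr_matrix[n]                 # component-wise SR-r updates
        if t_mix is None and p @ R >= A:             # sum_j p_j R^{(j)}_n >= A
            t_mix = n + 1
        if t_min is None and np.max(p * R) >= A:     # max_j p_j R^{(j)}_n >= A
            t_min = n + 1
        if t_mix is not None and t_min is not None:
            break
    return t_mix, t_min

# For the Gaussian streams of Section 2.1, one would take
# lr_matrix[n, j] = exp(theta_j * X[n, j] - theta_j**2 / 2).
```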
In order to determine the asymptotic performance of these rules, we need the following relationships, which state that, asymptotically as $\gamma\to\infty$, under $\mathrm{P}_0^{(i)}$ the delay of $\mathcal{S}_{r,M}$ and $\bar{\mathcal{S}}_r$ is essentially that of the single-population SR–$r$ rule at population $i$ with threshold $A/p_i$. The same holds for CUSUM.

Lemma 1. For any $r \ge 0$,
$$E_0^{(i)}[\bar{\mathcal{S}}_r(p,A)] = E_0^{(i)}[\mathcal{S}_{r,M}(p,A)] = \frac{\log(A/p_i) + \kappa^{(i)} - c_{r_i}^{(i)}}{I^{(i)}} + o(1), \qquad (2.9)$$
$$E_0^{(i)}[\bar{\mathcal{C}}(p,A)] = E_0^{(i)}[\mathcal{C}_M(p,A)] = \frac{\log(A/p_i) + \kappa^{(i)} + \beta^{(i)}}{I^{(i)}} + o(1), \qquad (2.10)$$
$$\mathrm{ADD}_\infty^{(i)}[\bar{\mathcal{S}}_r(p,A)] = \mathrm{ADD}_\infty^{(i)}[\mathcal{S}_{r,M}(p,A)] = \frac{\log(A/p_i) + \kappa^{(i)} - c_\infty^{(i)}}{I^{(i)}} + o(1). \qquad (2.11)$$

Proof. Recall the stopping time of the one-sided SPRT
$$W^{(i)}(A) = \inf\{n \ge 1 \colon U_n^{(i)} \ge A\}, \qquad U_n^{(i)} = \sum_{k=1}^{n} \log\Lambda_k^{(i)}.$$
For the proof we will need the limiting average overshoot
$$\kappa^{(i)} = \lim_{A\to\infty} E_0^{(i)}\big[U_{W^{(i)}(A)}^{(i)} - A\big].$$
Note that $\{U_n^{(i)},\ n \ge 1\}$ is a random walk with mean $E_0^{(i)}[U_1^{(i)}] = I^{(i)}$, and observe that the statistic $\log R_{r_i,n}^{(i)}$ can be represented in the form
$$\log R_{r_i,n}^{(i)} = U_n^{(i)} + G_{r_i,n}^{(i)}, \qquad\text{where}\qquad G_{r_i,n}^{(i)} = \log\!\Big(1 + r_i + \sum_{k=1}^{n-1} e^{-U_k^{(i)}}\Big).$$
The sequence $G_{r_i,n}^{(i)}$ is slowly changing under $\mathrm{P}_0^{(i)}$ and converges in $\mathrm{P}_0^{(i)}$-distribution to a random variable
$$G_{r_i}^{(i)} = \log\!\Big(1 + r_i + \sum_{k=1}^{\infty} e^{-U_k^{(i)}}\Big) \qquad\text{with expectation}\qquad c_{r_i}^{(i)} = E_0^{(i)}[G_{r_i}^{(i)}] = E_0^{(i)}\!\left[\log\!\Big(1 + r_i + \sum_{k=1}^{\infty} e^{-U_k^{(i)}}\Big)\right].$$
Then for the minimum of the SR–$r$ procedures we get
$$\mathcal{S}_{r,M}(p,A) = \inf\Big\{n \ge 1 \colon \max_j p_j R_{n,r_j}^{(j)} \ge A\Big\} = \inf\Big\{n \ge 1 \colon p_i R_{n,r_i}^{(i)}\Big(\max_j \frac{p_j R_{n,r_j}^{(j)}}{p_i R_{n,r_i}^{(i)}}\Big) \ge A\Big\}$$
$$= \inf\Big\{n \ge 1 \colon \log R_{n,r_i}^{(i)} + \max_j\Big\{\log\frac{p_j R_{n,r_j}^{(j)}}{p_i R_{n,r_i}^{(i)}}\Big\} \ge \log(A/p_i)\Big\} = \inf\Big\{n \ge 1 \colon U_n^{(i)} + G_{r_i,n}^{(i)} + \tilde{Y}_n^{(i)} \ge \log(A/p_i)\Big\},$$
where
$$\tilde{Y}_n^{(i)} = \max_j\Big\{\log\frac{p_j R_{n,r_j}^{(j)}}{p_i R_{n,r_i}^{(i)}}\Big\} = \max_{j\ne i}\Big\{0,\ \log\frac{p_j R_{n,r_j}^{(j)}}{p_i R_{n,r_i}^{(i)}}\Big\}.$$
Note that $\tilde{Y}_n^{(i)} \to 0$ as $n\to\infty$ $\mathrm{P}_0^{(i)}$-a.s., so the $\tilde{Y}_n^{(i)}$, $n \ge 1$, are also slowly changing under $\mathrm{P}_0^{(i)}$. Thus Woodroofe's nonlinear renewal theorem (see Woodroofe (1982)) applies and the result follows. The same arguments hold for CUSUM. Therefore
$$E_0^{(i)}[\mathcal{S}_{r,M}(p,A)] = \frac{\log(A/p_i) + \kappa^{(i)} - c_{r_i}^{(i)}}{I^{(i)}} + o(1), \qquad E_0^{(i)}[\mathcal{C}_M(p,A)] = \frac{\log(A/p_i) + \kappa^{(i)} + \beta^{(i)}}{I^{(i)}} + o(1).$$
For the mixture of the SR–$r$ procedures we get
$$\bar{\mathcal{S}}_r(p,A) = \inf\Big\{n \ge 1 \colon p_i R_{n,r_i}^{(i)} + \sum_{j\ne i} p_j R_{n,r_j}^{(j)} \ge A\Big\} = \inf\Big\{n \ge 1 \colon \log R_{n,r_i}^{(i)} + \log\Big(1 + \sum_{j\ne i}\frac{p_j R_{n,r_j}^{(j)}}{p_i R_{n,r_i}^{(i)}}\Big) \ge \log(A/p_i)\Big\}$$
$$= \inf\Big\{n \ge 1 \colon U_n^{(i)} + G_{r_i,n}^{(i)} + \hat{Y}_n^{(i)} \ge \log(A/p_i)\Big\}, \qquad\text{where}\qquad \hat{Y}_n^{(i)} = \log\Big(1 + \sum_{j\ne i}\frac{p_j R_{n,r_j}^{(j)}}{p_i R_{n,r_i}^{(i)}}\Big).$$
Since $\hat{Y}_n^{(i)} \to 0$ as $n\to\infty$ $\mathrm{P}_0^{(i)}$-a.s., the $\hat{Y}_n^{(i)}$, $n \ge 1$, are also slowly changing under $\mathrm{P}_0^{(i)}$, and the result follows. The same arguments hold for CUSUM. Therefore
$$E_0^{(i)}[\bar{\mathcal{S}}_r(p,A)] = \frac{\log(A/p_i) + \kappa^{(i)} - c_{r_i}^{(i)}}{I^{(i)}} + o(1), \qquad E_0^{(i)}[\bar{\mathcal{C}}(p,A)] = \frac{\log(A/p_i) + \kappa^{(i)} + \beta^{(i)}}{I^{(i)}} + o(1).$$

We also need to compute, at least approximately, the expectation of each of these rules under $\mathrm{P}_\infty$, which is the content of the following lemma.

Lemma 2. For $r$ either fixed or such that $r(A)/A \to 0$ as $A\to\infty$,
$$E_\infty[\bar{\mathcal{S}}_r(p,A)] = E_\infty[\mathcal{S}_{r,M}(p,A)] = \frac{A(1+o(1))}{\sum_{j=1}^{L} p_j v^{(j)}}, \qquad (2.12)$$
$$E_\infty[\bar{\mathcal{C}}(p,A)] = E_\infty[\mathcal{C}_M(p,A)] = \frac{A(1+o(1))}{\sum_{j=1}^{L} p_j I^{(j)} (v^{(j)})^2}. \qquad (2.13)$$

Proof. $\mathcal{S}_{r_i}^{(i)}(A)$ and $\mathcal{C}^{(i)}(A)$ are asymptotically exponentially distributed as $\gamma\to\infty$ ($A\to\infty$); see Pollak and Tartakovsky (2009b).
For the minimum of the stopping times, $\min_j \mathcal{S}_{r_j}^{(j)}(A/p_j)$ is asymptotically exponential with rate $\sum_j v^{(j)}/(A/p_j) = \sum_j p_j v^{(j)}/A$, so that
$$E_\infty\big[\min_j \mathcal{S}_{r_j}^{(j)}\big] = \frac{A(1+o(1))}{\sum_j p_j v^{(j)}};$$
similarly, $\min_j \mathcal{C}^{(j)}(A/p_j)$ is asymptotically exponential with rate $\sum_j p_j I^{(j)}(v^{(j)})^2/A$, so that
$$E_\infty\big[\min_j \mathcal{C}^{(j)}\big] = \frac{A(1+o(1))}{\sum_j p_j I^{(j)} (v^{(j)})^2}.$$
To prove a similar result for the mixture rules, introduce the auxiliary stopping time and statistic
$$\tau(p,A) = \inf\{n \ge 1 \colon \bar{\Lambda}_n(p) \ge A\}, \qquad \bar{\Lambda}_n(p) = \sum_{j=1}^{L} p_j e^{U_n^{(j)}},$$
and let $\bar{E}_0$ denote expectation with respect to the mixture measure $\sum_j p_j \mathrm{P}_0^{(j)}$. Then
$$\mathrm{P}_\infty\{\tau(p,A) < \infty\} = \bar{E}_0\Big[\big(\bar{\Lambda}_{\tau(p,A)}(p)\big)^{-1}\,\mathbb{1}_{\{\tau(p,A)<\infty\}}\Big] = \frac{1}{A}\,\bar{E}_0\Big[e^{-(\log\bar{\Lambda}_{\tau(p,A)}(p) - \log A)}\,\mathbb{1}_{\{\tau(p,A)<\infty\}}\Big] = \frac{1}{A}\sum_{j=1}^{L} p_j\,E_0^{(j)}\Big[e^{-(\log\bar{\Lambda}_{\tau(p,A)}(p) - \log A)}\,\mathbb{1}_{\{\tau(p,A)<\infty\}}\Big].$$
For any $1 \le i \le L$,
$$\tau(p,A) = \inf\{n \ge 1 \colon U_n^{(i)} + \log p_i + Y_n^{(i)} \ge \log A\}, \qquad\text{where}\qquad Y_n^{(i)} = \log\Big(1 + \sum_{j\ne i} (p_j/p_i)\, e^{U_n^{(j)} - U_n^{(i)}}\Big).$$
Since $Y_n^{(i)} \to 0$ as $n\to\infty$ $\mathrm{P}_0^{(i)}$-a.s., the $Y_n^{(i)}$, $n \ge 1$, are also slowly changing under $\mathrm{P}_0^{(i)}$, hence
$$E_0^{(i)}\Big[e^{-(\log\bar{\Lambda}_{\tau(p,A)}(p) - \log A)}\Big] \to v^{(i)}.$$
Also, since $U_n^{(i)}/n \to I^{(i)}$ $\mathrm{P}_0^{(i)}$-a.s., $\sup_n U_n^{(i)} = \infty$ w.p. 1, and hence $\mathrm{P}_0^{(i)}\{\tau(p,A) < \infty\} = 1$. Therefore
$$\mathrm{P}_\infty\{\tau(p,A) < \infty\} \to \frac{\sum_{j=1}^{L} p_j v^{(j)}}{A} \quad\text{as } A\to\infty.$$
Finally (see Tartakovsky and Veeravalli (2004); Pollak (1987)),
$$E_\infty[\bar{\mathcal{S}}_r(p,A)] \sim \frac{1}{\mathrm{P}_\infty\{\tau(p,A) < \infty\}} \quad\text{as } A\to\infty.$$
The result follows. A similar argument holds for CUSUM.

2.4 Performance measures

Theorem 1. Let $r$ be either fixed or such that $r(A)/A \to 0$ as $A\to\infty$, and such that $\mathrm{SADD}^{(i)}(\mathcal{S}_{r_i}) = \mathrm{ADD}_0$ or $\mathrm{ADD}_\infty$. Then for any choice of $p_1,\ldots,p_L$ and any $T^\star \in \{\mathcal{S}_{r,M}, \mathcal{C}_M, \bar{\mathcal{S}}_r, \bar{\mathcal{C}}\}$, $T^\star$ is second-order optimal with respect to $\mathcal{J}^{(i)}$ for any $i = 1,\ldots,L$:
$$\inf_{T\in\Delta(\gamma)} \mathcal{J}^{(i)}[T] - \mathcal{J}^{(i)}[T^\star] = O(1). \qquad (2.14)$$

Proof. The statement follows from (2.1) and (2.3) and Lemmas 1–2.

On the other hand, we want to introduce a performance measure for our multi-population procedures. It is reasonable to follow the minimax approach, hence to design the schemes so that their performance is the same no matter where and when the change occurs. One possible generalization would be to put
$$\mathcal{J}[T] = \max_j \mathcal{J}^{(j)}[T] = \max_j \mathrm{SADD}^{(j)}[T].$$
Then, to equalize performance across populations, from Lemmas 1 and 2,
$$\frac{\log\gamma - \log p_1}{I^{(1)}} + O(1) = \cdots = \frac{\log\gamma - \log p_L}{I^{(L)}} + O(1).$$
But unless $I^{(1)} = \cdots = I^{(L)}$, such a choice of $p$ does not exist. For this reason, we put
$$\mathcal{J}[T] = \max_j I^{(j)} \mathcal{J}^{(j)}[T] = \max_j I^{(j)} \mathrm{SADD}^{(j)}[T]. \qquad (2.15)$$
Each scheme should satisfy the false alarm constraint with equality, thus
$$E_\infty[\mathcal{S}_{r,M}] = E_\infty[\mathcal{C}_M] = E_\infty[\bar{\mathcal{S}}_r] = E_\infty[\bar{\mathcal{C}}] = \gamma. \qquad (2.16)$$
If $p$ is chosen according to (2.15), then from Lemma 1, keeping in mind that $\sum_j p_j = 1$, one arrives at
$$p_i = \frac{\exp\big(\kappa^{(i)} - c_{r_i}^{(i)}\big)}{\sum_j \exp\big(\kappa^{(j)} - c_{r_j}^{(j)}\big)} \quad\text{for SR–}r, \qquad (2.17)$$
$$p_i = \frac{\exp\big(\kappa^{(i)} + \beta^{(i)}\big)}{\sum_j \exp\big(\kappa^{(j)} + \beta^{(j)}\big)} \quad\text{for CUSUM}, \qquad (2.18)$$
for $1 \le i \le L$. To obtain $A$, use (2.16) and Lemma 2:
$$A = \gamma\,\frac{\sum_j \exp\big(\kappa^{(j)} - c_{r_j}^{(j)}\big)\, v^{(j)}}{\sum_j \exp\big(\kappa^{(j)} - c_{r_j}^{(j)}\big)} \quad\text{for SR–}r, \qquad (2.19)$$
$$A = \gamma\,\frac{\sum_j \exp\big(\kappa^{(j)} + \beta^{(j)}\big)\, I^{(j)} (v^{(j)})^2}{\sum_j \exp\big(\kappa^{(j)} + \beta^{(j)}\big)} \quad\text{for CUSUM}. \qquad (2.20)$$
Thus $\mathrm{ARL} = \gamma(1 + o(1))$ for each of the schemes, and for CUSUM and the zero-initialized SR–$r$ we get a unique solution for $p, A$. However, $r$ provides $L$ additional parameters in the SR–$r$ scheme, and there is still some flexibility in determining them.

Theorem 2. Let $r$ be either fixed or such that $r(A)/A \to 0$ as $A\to\infty$, and such that $\mathrm{SADD}^{(i)}(\mathcal{S}_{r_i}) = \mathrm{ADD}_0$ or $\mathrm{ADD}_\infty$. Then, if $p_1,\ldots,p_L$ are chosen according to (2.17) or (2.18), and $A$ according to (2.19) or (2.20), all of $T_S^\star \in \{\mathcal{S}_{r,M}, \bar{\mathcal{S}}_r\}$ and $T_C^\star \in \{\mathcal{C}_M, \bar{\mathcal{C}}\}$ are second-order uniformly optimal with respect to $\mathcal{J}$. Moreover,
$$\mathcal{J}[T_S^\star] = \log\gamma + \log\Big(\sum_j \exp\big(\kappa^{(j)} - c_{r_j}^{(j)}\big)\, v^{(j)}\Big) + o(1), \qquad (2.21)$$
$$\mathcal{J}[T_C^\star] = \log\gamma + \log\Big(\sum_j \exp\big(\kappa^{(j)} + \beta^{(j)}\big)\, I^{(j)} (v^{(j)})^2\Big) + o(1). \qquad (2.22)$$
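As an illustration of the design recipe (2.17)-(2.20), the sketch below computes the weights $p$ and the threshold $A$ for the SR–$r$ based rules from user-supplied per-population constants $\kappa^{(i)}$, $c^{(i)}_{r_i}$ and $v^{(i)}$; the numeric values in the usage line are made up purely for illustration, and the CUSUM design (2.18), (2.20) would replace $\kappa - c_r$ by $\kappa + \beta$ and $v^{(j)}$ by $I^{(j)}(v^{(j)})^2$.

```python
import numpy as np

def sr_design(gamma, kappa, c_r, v):
    """Weights (2.17) and threshold (2.19) for the SR-r based multi-population rules,
    given arrays of per-population constants kappa^(i), c^(i)_{r_i} and v^(i)."""
    w = np.exp(np.asarray(kappa) - np.asarray(c_r))   # unnormalized weights exp(kappa - c_r)
    p = w / w.sum()                                   # (2.17)
    A = gamma * p @ np.asarray(v)                     # (2.19): A = gamma * sum_j p_j v^(j)
    return p, A

p, A = sr_design(gamma=1e3, kappa=[0.2, 0.3], c_r=[0.5, 0.9], v=[0.8, 0.7])
print(p, A)      # the constants above are placeholders, not values from the thesis
```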
Chapter 3

Numerical Results

The goal of this chapter is twofold. First, we obtain performance characteristics for the previously suggested procedures with high accuracy, and provide the means to choose the design parameters of the stopping rules. Second, we see how fast the asymptotic expansions converge.

3.1 Integral equations

We begin with notation and assumptions. First recall $\Lambda_n \triangleq g(X_n)/f(X_n)$, i.e., the "instantaneous" LR for the $n$-th data point, $X_n$. For simplicity, $\Lambda_1$ will be assumed absolutely continuous, although with an additional effort the case of strictly non-arithmetic $\Lambda_1$ can be handled as well. Let $P_d(t) \triangleq \mathrm{P}_d(\Lambda_1 \le t)$, $d \in \{0,\infty\}$, $t \ge 0$, be the cdf of the LR under the measure $\mathrm{P}_d$, $d \in \{0,\infty\}$. Also, denote by
$$K_d(x,y) \triangleq \frac{\partial}{\partial y} \mathrm{P}_d(R_{n+1}^r \le y \mid R_n^r = x) = \frac{\partial}{\partial y} P_d\!\Big(\frac{y}{1+x}\Big), \quad d \in \{0,\infty\}, \qquad (3.1)$$
the transition probability density kernel of the (stationary) Markov process $\{R_n^r\}_{n\ge0}$.

We now note that from the change-of-measure identity $dP_0(t) = t\,dP_\infty(t)$, $t \ge 0$, mentioned earlier, and definition (3.1), one can readily deduce that $(1+x)K_0(x,y) = yK_\infty(x,y)$; cf. Polunchenko et al. (2013a, 2014b). This can be used, e.g., as a "shortcut" in deriving the formula for $K_0(x,y)$ from that for $K_\infty(x,y)$, or the other way around—whichever of the two is found first. More importantly, as will be shown in Theorem 3 below, using $(1+x)K_0(x,y) = yK_\infty(x,y)$, one can "tie" $\mathrm{ARL}(\mathcal{S}_A^{r=x})$ and $\mathrm{STADD}(\mathcal{S}_A^{r=x})$ to one another in such a way that both can be computed simultaneously, with $K_0(x,y)$ completely eliminated. This result will then be used to design our numerical method in the next subsection. Last but not least, as done in Polunchenko et al. (2013a, 2014b), we will also use this connection between $K_0(x,y)$ and $K_\infty(x,y)$ to improve the method's accuracy and rate of convergence; see Subsection 3.2 below.

We now state the first equation of interest. Let $R_0^{r=x} = x \ge 0$ be fixed. For notational brevity, from now on let $\ell(x,A) \triangleq \mathrm{ARL}(\mathcal{S}_A^{r=x}) \triangleq E_\infty[\mathcal{S}_A^{r=x}]$; we reiterate that this expectation is conditional on $R_0^{r=x} = x$. Using the fact that $\{R_n^{r=x}\}_{n\ge0}$ is Markovian, it can be shown that $\ell(x,A)$ is governed by the equation
$$\ell(x,A) = 1 + \int_0^A K_\infty(x,y)\,\ell(y,A)\,dy; \qquad (3.2)$$
cf. Moustakides et al. (2011). Next, introduce $\delta_k(x,A) \triangleq E_k[(\mathcal{S}_A^{r=x} - k)^+]$, $k \ge 0$. For $k = 0$ observe that
$$\delta_0(x,A) = 1 + \int_0^A K_0(x,y)\,\delta_0(y,A)\,dy, \qquad (3.3)$$
which is an exact "copy" of equation (3.2) except that $K_\infty(x,y)$ is replaced with $K_0(x,y)$; cf. Moustakides et al. (2011). For $k \ge 1$, since $\{R_n^{r=x}\}_{n\ge0}$ is Markovian, one can establish the recursion
$$\delta_{k+1}(x,A) = \int_0^A K_\infty(x,y)\,\delta_k(y,A)\,dy, \quad k \ge 0, \qquad (3.4)$$
with $\delta_0(x,A)$ first found from equation (3.3); cf. Moustakides et al. (2011). Using this recursion one can generate the entire functional sequence $\{\delta_k(x,A)\}_{k\ge0}$ by repetitive application of the linear integral operator
$$\mathcal{K}_\infty u \triangleq [\mathcal{K}_\infty u](x) \triangleq \int_0^A K_\infty(x,y)\,u(y)\,dy,$$
where $u(x)$ is assumed to be sufficiently smooth inside the interval $[0,A]$. Temporarily deferring a formal discussion of this operator's properties, note that, using this operator notation, recursion (3.4) can be rewritten as $\delta_{k+1} = \mathcal{K}_\infty\delta_k$, $k \ge 0$, or equivalently, as $\delta_k = \mathcal{K}_\infty^k\delta_0$, $k \ge 0$, where $\mathcal{K}_\infty^k u \triangleq \mathcal{K}_\infty\cdots\mathcal{K}_\infty u$ ($k$ times) for $k \ge 1$, and $\mathcal{K}_\infty^0$ is the identity operator, from now on denoted as $\mathcal{I}$, i.e., $\mathcal{K}_\infty^0 u = \mathcal{I}u \triangleq u$. Similarly, in operator form, equation (3.2) can be rewritten as $\ell = 1 + \mathcal{K}_\infty\ell$, and equation (3.3) as $\delta_0 = 1 + \mathcal{K}_0\delta_0$. We now note that the sequence $\{\delta_k(x,A)\}_{k\ge0}$ can be used to derive the equation for $\mathrm{IADD}(\mathcal{S}_A^{r=x})$ defined by (1.15). Let $\psi(x,A) \triangleq \mathrm{IADD}(\mathcal{S}_A^{r=x})$, and observe that
$$\psi \triangleq \sum_{k\ge0}\delta_k = \sum_{k\ge0}\mathcal{K}_\infty^k\delta_0 = \Big(\sum_{k\ge0}\mathcal{K}_\infty^k\Big)\delta_0 = (\mathcal{I} - \mathcal{K}_\infty)^{-1}\delta_0, \qquad (3.5)$$
whence
$$\psi(x,A) = \delta_0(x,A) + \int_0^A K_\infty(x,y)\,\psi(y,A)\,dy; \qquad (3.6)$$
cf. Moustakides et al. (2011). The implicit use of the geometric series convergence theorem in (3.5) is justified by the fact that the spectral radius of the operator $\mathcal{K}_\infty$ is strictly less than 1; see, e.g., Moustakides et al. (2011). At this point, with equations for $\ell(x,A)$, $\delta_0(x,A)$, and $\psi(x,A)$ obtained, one can compute $\mathrm{STADD}(\mathcal{S}_A^{r=x})$ through
$$\mathrm{STADD}(\mathcal{S}_A^{r=x}) = \frac{x\,\delta_0(x,A) + \psi(x,A)}{\ell(x,A) + x}, \qquad (3.7)$$
for any $x \ge 0$; in particular, for $x = 0$ this gives $\mathrm{STADD}(\mathcal{S}_A)$, i.e., the STADD for the original SR procedure (1.5)-(1.7). Thus, it may seem that the strategy to compute $\mathrm{STADD}(\mathcal{S}_A^{r=x})$ is to first compute $\delta_0(x,A)$ by solving equation (3.3), then use the obtained $\delta_0(x,A)$ to compute $\psi(x,A)$ by solving equation (3.6), independently solve equation (3.2) to get $\ell(x,A)$, and finally plug all of these into (3.7) to get $\mathrm{STADD}(\mathcal{S}_A^{r=x})$. Precisely this strategy was employed in Tartakovsky et al. (2009); Moustakides et al. (2009, 2011). However, we will now show that the computation of $\mathrm{STADD}(\mathcal{S}_A^{r=x})$ can be made much simpler. Let $\Psi(x,A) \triangleq x\,\delta_0(x,A) + \psi(x,A)$, so that $\mathrm{STADD}(\mathcal{S}_A^{r=x}) = \Psi(x,A)/[\ell(x,A) + x]$.

Theorem 3.
$$\Psi(x,A) = 1 + x + \int_0^A K_\infty(x,y)\,\Psi(y,A)\,dy. \qquad (3.8)$$

Proof. First, consider equation (3.3) and multiply it through by $(1+x)$ to obtain
$$(1+x)\,\delta_0(x,A) = 1 + x + \int_0^A (1+x)K_0(x,y)\,\delta_0(y,A)\,dy,$$
which, using the change-of-measure identity $(1+x)K_0(x,y) = yK_\infty(x,y)$, is equivalent to
$$(1+x)\,\delta_0(x,A) = 1 + x + \int_0^A K_\infty(x,y)\,y\,\delta_0(y,A)\,dy. \qquad (3.9)$$
Next, by adding $\int_0^A K_\infty(x,y)\,\psi(y,A)\,dy$ to both sides of (3.9), we obtain
$$(1+x)\,\delta_0(x,A) + \int_0^A K_\infty(x,y)\,\psi(y,A)\,dy = 1 + x + \int_0^A K_\infty(x,y)\,\psi(y,A)\,dy + \int_0^A K_\infty(x,y)\,y\,\delta_0(y,A)\,dy,$$
which after some algebra becomes
$$\Big[\delta_0(x,A) + \int_0^A K_\infty(x,y)\,\psi(y,A)\,dy\Big] + x\,\delta_0(x,A) = 1 + x + \int_0^A K_\infty(x,y)\big[\psi(y,A) + y\,\delta_0(y,A)\big]\,dy.$$
Finally, note that the expression in brackets on the left-hand side above is the right-hand side of equation (3.6), i.e., it is equal to $\psi(x,A)$. Hence, recalling that $\Psi(x,A) \triangleq \psi(x,A) + x\,\delta_0(x,A)$, we arrive at the desired equation for $\Psi(x,A)$, i.e., at equation (3.8).

Using Theorem 3, i.e., equation (3.8), one can compute $\ell(x,A)$ and $\Psi(x,A)$ simultaneously and without having to compute $\delta_0(x,A)$ and $\psi(x,A)$ at all. Specifically, since (3.2) and (3.8) can be rewritten, respectively, as $(\mathcal{I} - \mathcal{K}_\infty)\ell = 1$ and $(\mathcal{I} - \mathcal{K}_\infty)\Psi = 1 + x$ in operator form, one can see that both have the same integral operator on the left-hand side, and the right-hand side of either is completely known and does not require any preliminary evaluation. Thus, to evaluate $\mathrm{ARL}(\mathcal{S}_A^{r=x})$ and $\mathrm{STADD}(\mathcal{S}_A^{r=x})$ one is effectively to solve the two equations $(\mathcal{I} - \mathcal{K}_\infty)u = 1$ and $(\mathcal{I} - \mathcal{K}_\infty)u = 1 + x$, which can be done simultaneously. This is an improvement over the method proposed and used earlier in Tartakovsky et al. (2009); Moustakides et al. (2009, 2011). It is also an extension of the method proposed recently in Polunchenko et al. (2014a); see also, e.g., Polunchenko et al. (2013a, 2014b). Combined, equations (3.2) and (3.8) form a "complete package" to compute any of the desired performance characteristics of the GSR procedure. The question to be considered next is that of computing these characteristics in practice.
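Before turning to the numerical method of Section 3.2, a brute-force Monte Carlo estimate of $\mathrm{ARL}(\mathcal{S}_A^r) = E_\infty[\mathcal{S}_A^r]$ provides a useful sanity check for the integral-equation results; below is a minimal sketch for the Gaussian model used later in Section 3.4 (the function name and the parameter values are illustrative assumptions).

```python
import numpy as np

def mc_arl_sr(A, theta, r=0.0, n_runs=2000, seed=0):
    """Monte Carlo estimate of ARL(S^r_A) = E_inf[S^r_A] for the Gaussian model:
    pre-change N(0,1) observations, Lambda_n = exp(theta*X_n - theta^2/2)."""
    rng = np.random.default_rng(seed)
    times = np.empty(n_runs)
    for i in range(n_runs):
        R, n = r, 0
        while R < A:
            n += 1
            R = (1.0 + R) * np.exp(theta * rng.normal() - theta**2 / 2.0)
        times[i] = n
    return times.mean(), times.std(ddof=1) / np.sqrt(n_runs)

print(mc_arl_sr(A=100.0, theta=0.5))   # to be compared against the solution of (3.2)
```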
3.2 The numerical method and its accuracy analysis

We now turn to the question of solving the main equations—(3.2) and (3.8)—presented in the preceding subsection. To this end, following Tartakovsky et al. (2009); Moustakides et al. (2009, 2011); Polunchenko et al. (2013a, 2014b), observe first that both equations are renewal-type equations of the form
$$u(x) = \eta(x) + \int_0^A K_\infty(x,y)\,u(y)\,dy, \qquad (3.10)$$
where $\eta(x)$ is a given (known) function, $K_\infty(x,y)$ is as in (3.1), and $u(x)$ is the unknown; note that while $u(x)$ does depend on the upper limit of integration, $A > 0$, for notational simplicity we will no longer emphasize that, and use the notation $u(x)$ instead of $u(x,A)$. To see that equation (3.10) is an "umbrella" equation for equations (3.2) and (3.8), observe that, e.g., to obtain equation (3.2) for the ARL to false alarm, it suffices to set $\eta(x) \equiv 1$ for any $x\in\mathbb{R}$. Similarly, choosing $\eta(x) = 1 + x$ will yield equation (3.8). Thus, any method to solve (3.10) for a given $\eta(x)$ can be applied to solve (3.2) and (3.8) as well. The problem, however, is that (3.10) is a Fredholm integral equation of the second kind, and such equations seldom allow for an analytical solution. Hence, a numerical approach is in order, and the aim of this subsection is to present one.

We first set the underlying space for the problem. Let $\mathcal{X} = C[0,A]$ be the space of continuous functions over the interval $[0,A]$. Equip $\mathcal{X}$ with the usual uniform norm $\|u\|_\infty \triangleq \max_{x\in[0,A]}|u(x)|$. We will assume that $P_\infty(t)$ and the unknown function $u(x)$ are both continuous and well-behaved, i.e., both are differentiable as far as necessary. Under these assumptions $\mathcal{K}_\infty$ is a bounded linear operator from $\mathcal{X}$ into $\mathcal{X}$, equipped with the usual $L_\infty$-norm:
$$\|\mathcal{K}_\infty\|_\infty \triangleq \sup_{x\in[0,A]} \int_0^A |K_\infty(x,y)|\,dy.$$
It can be shown Moustakides et al. (2011) that $\|\mathcal{K}_\infty\|_\infty < 1$. Thus, one can apply the Fredholm alternative (Atkinson and Han, 2009, Theorem 2.8.10) to deduce that $(\mathcal{I} - \mathcal{K}_\infty)^{-1}$ is a bounded operator, and subsequently conclude that (3.10) does have a solution and it is unique for any given $\eta(x)$.

To solve (3.10) we propose to use the collocation method (Atkinson and Han, 2009, Section 12.1.1). The idea of this method is to first approximate the sought function, $u(x)$, as
$$u_N(x) = \sum_{j=1}^{N} u_{j,N}\,\varphi_j(x), \quad N \ge 1, \qquad (3.11)$$
where $\{u_{j,N}\}_{1\le j\le N}$ are constant coefficients to be determined and $\{\varphi_j(x)\}_{1\le j\le N}$ are suitably chosen (known) basis functions. For any such basis and any given $\{u_{j,N}\}_{1\le j\le N}$, substitution of $u_N(x)$ into the equation will yield a residual $r_N \triangleq u_N - \eta - \mathcal{K}_\infty u_N$. Unless the true solution $u(x)$ itself is a linear combination of the basis functions $\{\varphi_j(x)\}_{1\le j\le N}$, no choice of the coefficients $\{u_{j,N}\}_{1\le j\le N}$ will make the residual identically zero uniformly at all $x\in[0,A]$. However, by requiring $r_N(x)$ to be zero at some points $\{z_j\}_{1\le j\le N}$, where $z_j\in[0,A]$ for all $j = 1,2,\ldots,N$, one can achieve a certain level of proximity of the residual to zero. These points, $\{z_j\}_{1\le j\le N}$, are called the collocation nodes, and their choice is discussed below. As a result, we obtain the following system of $N$ algebraic equations for the coefficients $u_{j,N}$:
$$\boldsymbol{u}_N = \boldsymbol{\eta}_N + \mathbf{K}_\infty \boldsymbol{u}_N, \qquad (3.12)$$
where $\boldsymbol{u}_N \triangleq [u_{1,N},\ldots,u_{N,N}]^\top$, $\boldsymbol{\eta}_N \triangleq [\eta(z_1),\ldots,\eta(z_N)]^\top$, and $\mathbf{K}_\infty$ is an $N$-by-$N$ matrix whose $(i,j)$-th element is
$$(\mathbf{K}_\infty)_{i,j} \triangleq \int_0^A K_\infty(z_i,y)\,\varphi_j(y)\,dy, \quad 1 \le i,j \le N. \qquad (3.13)$$
For the system of linear equations (3.12) to have one and only one solution, the functions $\{\varphi_j(x)\}_{1\le j\le N}$ need to form a basis in the appropriate functional space, i.e., in particular, $\{\varphi_j(x)\}_{1\le j\le N}$ need to be linearly independent; the necessary and sufficient condition that $\{\varphi_j(x)\}_{1\le j\le N}$ are to satisfy is $\det[\varphi_j(z_i)] \ne 0$.
As $\{\varphi_j(x)\}_{1\le j\le N}$ is a basis, expansion (3.11) is equivalent to acting on the sought function, $u(x)$, by an interpolatory projection operator, $\mathcal{P}_N$, that projects $u(x)$ onto the span of $\{\varphi_j(x)\}_{1\le j\le N}$. This operator is defined as $\mathcal{P}_N u \triangleq \sum_{j=1}^{N} u_{j,N}\varphi_j(x)$, with $\|\mathcal{P}_N\|_\infty \triangleq \max_{0\le x\le A}\sum_{j=1}^{N}|\varphi_j(x)| \ge 1$. By design, the described method is most accurate at the collocation nodes, $\{z_j\}_{1\le j\le N}$, since it is at these points that the residual is zero. For an arbitrary point $x\notin\{z_j\}_{1\le j\le N}$, the unknown function, $u(x)$, can be evaluated as
$$\tilde{u}_N(x) = \eta(x) + \int_0^A K_\infty(x,y)\,u_N(y)\,dy = \eta(x) + \sum_{j=1}^{N} u_{j,N}\int_0^A K_\infty(x,y)\,\varphi_j(y)\,dy. \qquad (3.14)$$
This technique is known as the iterated projection solution; see, e.g., (Atkinson and Han, 2009, Section 12.3); note that $\tilde{u}_N(z_j) = u_N(z_j) = u_{j,N}$, $1 \le j \le N$.

We now consider the question of the method's accuracy and rate of convergence. To that end, it is apparent that the choice of $\{\varphi_j(x)\}_{1\le j\le N}$ must play a critical role. This is, in fact, the case, as may be concluded from, e.g., (Atkinson and Han, 2009, Theorem 12.1.12, p. 479). Specifically, using $\|u - \tilde{u}_N\|_\infty$ as a sensible measure of the method's error, and applying (Atkinson and Han, 2009, Formula 12.3.21, p. 499), we obtain
$$\|u - \tilde{u}_N\|_\infty \le \|(\mathcal{I}-\mathcal{K}_\infty)^{-1}\|_\infty\,\|\mathcal{K}_\infty(\mathcal{I}-\mathcal{P}_N)u\|_\infty \le \|(\mathcal{I}-\mathcal{K}_\infty)^{-1}\|_\infty\,\|\mathcal{K}_\infty\|_\infty\,\|(\mathcal{I}-\mathcal{P}_N)u\|_\infty, \qquad (3.15)$$
whence one can see that the method's error is determined by $\|(\mathcal{I}-\mathcal{K}_\infty)^{-1}\|_\infty$ and by $\|(\mathcal{I}-\mathcal{P}_N)u\|_\infty$; the latter is the interpolation error and can be found for each particular choice of $\mathcal{P}_N$, which requires choosing the basis $\{\varphi_j(x)\}_{1\le j\le N}$ and the collocation nodes $\{z_j\}_{1\le j\le N}$. The bigger problem, therefore, is to upper-bound $\|(\mathcal{I}-\mathcal{K}_\infty)^{-1}\|_\infty$. To that end, the standard result
$$\|(\mathcal{I}-\mathcal{K}_\infty)^{-1}\|_\infty \le \frac{1}{1 - \|\mathcal{K}_\infty\|_\infty}$$
is applicable, since $\|\mathcal{K}_\infty\|_\infty < 1$. However, it is well known that this is often a very crude inequality, and it may not be practical to use it in (3.15) to upper-bound $\|u - \tilde{u}_N\|_\infty$. Since in our particular case $K_\infty$ is the transition probability kernel of a stationary Markov process, a tighter (in fact, exact) upper bound on $\|(\mathcal{I}-\mathcal{K}_\infty)^{-1}\|_\infty$ is possible to obtain. We now state the corresponding result, first established in (Polunchenko et al., 2014b, Lemma 3.1); see also Polunchenko et al. (2013a).

Lemma 3. $\|(\mathcal{I}-\mathcal{K}_\infty)^{-1}\|_\infty = \|\ell\|_\infty$.

With this lemma one can upper-bound $\|u - \tilde{u}_N\|_\infty$ rather tightly. Specifically, from (3.15), $\|\mathcal{K}_\infty\|_\infty < 1$, and Lemma 3, we obtain $\|u - \tilde{u}_N\|_\infty < \|\ell\|_\infty\,\|(\mathcal{I}-\mathcal{P}_N)u\|_\infty$, where the inequality is strict because $\|\mathcal{K}_\infty\|_\infty$ is strictly less than 1. The only question now is the interpolation error $\|(\mathcal{I}-\mathcal{P}_N)u\|_\infty$, which is determined by the choice of $\mathcal{P}_N$. To that end, for reasons to be explained below, we propose to seek the solution, $u(x)$, within the piecewise linear polynomial space. Specifically, given a positive integer $N \ge 2$, let $\Pi_N\colon 0 \triangleq x_0 < x_1 < \cdots < x_{N-1} \triangleq A$ denote a partition of the interval $[0,A]$, and for $j = 1,\ldots,N-1$ set $I_j^N \triangleq (x_{j-1},x_j)$, $h_j \triangleq x_j - x_{j-1}$ ($> 0$), and $h \triangleq h(N) = \max_{1\le j\le N-1} h_j$; assume also that $h\to0$ as $N\to\infty$. Next, set $z_j = x_{j-1}$, $1 \le j \le N$, and choose as the basis $\{\varphi_j(x)\}_{1\le j\le N}$ the "hat" functions
$$\varphi_j(x) = \begin{cases} (x - x_{j-2})/h_{j-1}, & \text{if } x\in I_{j-1}^N,\ j > 1;\\ (x_j - x)/h_j, & \text{if } x\in I_j^N,\ j < N;\\ 0, & \text{otherwise}; \end{cases} \qquad (3.16)$$
where $1 \le j \le N$; cf. Polunchenko et al. (2013a, 2014b).
E.g., in a two-population case with a uniform partition this approach would result in a piecewise-defined surface of the form
$$\tilde{\varphi}(x,y) = \frac{1}{2}\begin{cases} -|y| + 1, & \text{if } |y| \ge |x|,\ |x|+|y| \ge 1;\\ -|x| + 1, & \text{if } |y| < |x|,\ |x|+|y| \ge 1;\\ -|x| - 2|y| + 2, & \text{if } |y| \ge |x|,\ |x|+|y| < 1;\\ -2|x| - |y| + 2, & \text{if } |y| < |x|,\ |x|+|y| < 1, \end{cases}$$
shifted and scaled accordingly, which is illustrated in Figure 3.1.

[Figure 3.1. Lagrange linear interpolation polynomials: (a) in 1D, (b) in 2D.]

For this choice of the functional basis $\{\varphi_j(x)\}_{1\le j\le N}$ it is known (Atkinson and Han, 2009, Formula 3.2.9, p. 124) that $\|(\mathcal{I}-\mathcal{P}_N)u\|_\infty \le \|u_{xx}\|_\infty h^2/8$, where $u_{xx} \triangleq \partial^2 u(x)/\partial x^2$. Hence, the method's rate of convergence is quadratic and $\|u - \tilde{u}_N\|_\infty < \|\ell\|_\infty\,\|u_{xx}\|_\infty\,h^2/8$. This result can now be "tailored" to the equations of interest, namely, to equations (3.2) and (3.8).

Theorem 4. Given $N \ge 2$ sufficiently large,
$$\|\ell - \tilde{\ell}_N\|_\infty < \|\ell\|_\infty\,\|\ell_{xx}\|_\infty\,\frac{h^2}{8},$$
where $\ell_{xx} \triangleq \partial^2\ell(x,A)/\partial x^2$; note that the inequality is strict.

Theorem 5. Given $N \ge 2$ sufficiently large,
$$\|\Psi - \tilde{\Psi}_N\|_\infty < \|\ell\|_\infty\,\|\Psi_{xx}\|_\infty\,\frac{h^2}{8},$$
where $\Psi_{xx} \triangleq \partial^2\Psi(x,A)/\partial x^2$; note that the inequality is strict.

The error bound given in Theorem 4 was first obtained in (Polunchenko et al., 2014b, Theorem 3.1). Note that it is proportional to the magnitude of the solution, i.e., to $\ell(x,A) \triangleq \mathrm{ARL}(\mathcal{S}_A^{r=x})$, which can be large. Worse yet, the error bound is also proportional to the square of the detection threshold lurking in the numerator of $h^2$ (for simplicity assume that $h = A/N$). Since $\ell(x,A) \approx A/\zeta - x$ with $\zeta\in(0,1)$, one can roughly set $A \approx \ell(x,A)$ and conclude that the error bound is roughly proportional to $\ell^3(x,A)$, i.e., to the magnitude of the solution cubed. This may seem to drastically offset the second power of $N$ buried in the denominator of $h^2$. However, as was already argued and confirmed experimentally in Polunchenko et al. (2013a, 2014b), this does not happen. The reason is the (almost) linearity of $\ell(x,A) \triangleq \mathrm{ARL}(\mathcal{S}_A^{r=x})$ with respect to the headstart $r = x$, as evident from the approximation (1.13). Specifically, $\ell(x,A) \approx A/\zeta - x$, and therefore
$$\frac{\partial^2}{\partial x^2}\ell(x,A) \approx 0, \quad\text{at least for } x\in[0,A].$$
This makes the error bound given in Theorem 4 extremely close to zero, even for relatively small $N$. Consequently, $\ell(x,A)$ can be computed rather accurately without requiring $N$ to be large. This is one of the reasons to use the above piecewise linear basis (3.16).

The error bound given in Theorem 5 is not as close to zero because, unlike $\ell(x,A)$, the function $\Psi(x,A)$ is not linear in $x$. Nevertheless, as shown in Polunchenko et al. (2014b), the method's accuracy and robustness for $\Psi(x,A)$ are substantially better than those of the method proposed in Tartakovsky et al. (2009); Moustakides et al. (2009, 2011).

There is one more role that the change-of-measure identity, $(1+x)K_0(x,y) = yK_\infty(x,y)$, plays in the proposed method: it is used to compute the matrix (3.13), which is needed to implement the method. Specifically, due to the change-of-measure identity, the integrals involved in (3.13) can be computed exactly: using (3.13) and (3.16), and recalling that $z_j = x_{j-1}$, $1 \le j \le N$, the corresponding formula is as follows:
$$(\mathbf{K}_\infty)_{i,j} = \int_0^A K_\infty(z_i,y)\,\varphi_j(y)\,dy = \frac{1}{h_{j-1}}\left[(1+z_i)\Big(P_0\Big(\tfrac{x_{j-1}}{1+z_i}\Big) - P_0\Big(\tfrac{x_{j-2}}{1+z_i}\Big)\Big) - x_{j-2}\Big(P_\infty\Big(\tfrac{x_{j-1}}{1+z_i}\Big) - P_\infty\Big(\tfrac{x_{j-2}}{1+z_i}\Big)\Big)\right]\mathbb{1}_{\{j>1\}}$$
$$+ \frac{1}{h_j}\left[x_j\Big(P_\infty\Big(\tfrac{x_j}{1+z_i}\Big) - P_\infty\Big(\tfrac{x_{j-1}}{1+z_i}\Big)\Big) - (1+z_i)\Big(P_0\Big(\tfrac{x_j}{1+z_i}\Big) - P_0\Big(\tfrac{x_{j-1}}{1+z_i}\Big)\Big)\right]\mathbb{1}_{\{j<N\}}, \qquad (3.17)$$
for $1 \le i,j \le N$; cf. Polunchenko et al. (2013a, 2014b).
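Putting the pieces together, the sketch below builds the collocation matrix with the exact entries (3.17) on a uniform partition, solves $(\mathcal{I}-\mathcal{K}_\infty)\ell = 1$ and $(\mathcal{I}-\mathcal{K}_\infty)\Psi = 1+x$ simultaneously, and recovers STADD via (3.7); the Gaussian cdfs $P_\infty(t) = \Phi(\log t/\theta + \theta/2)$ and $P_0(t) = \Phi(\log t/\theta - \theta/2)$ correspond to the model of Section 3.4. This is a hedged illustration rather than the author's own code, and the function names are assumptions.

```python
import numpy as np
from scipy.stats import norm

def collocation_matrix(A, N, P_inf, P_0):
    """Hat-function collocation matrix with the exact entries (3.17) on a uniform
    partition of [0, A]; P_inf and P_0 are the cdfs of Lambda_1 under P_infinity
    and P_0, related by the identity dP_0(t) = t dP_inf(t)."""
    x = np.linspace(0.0, A, N)                 # nodes x_0 = 0 < ... < x_{N-1} = A (also z_j)
    h = x[1] - x[0]
    K = np.zeros((N, N))
    for i in range(N):
        a = 1.0 + x[i]                         # 1 + z_i
        Pinf, P0 = P_inf(x / a), P_0(x / a)
        for j in range(N):
            if j >= 1:                         # "left" piece of the hat centred at x_j
                K[i, j] += (a * (P0[j] - P0[j - 1]) - x[j - 1] * (Pinf[j] - Pinf[j - 1])) / h
            if j <= N - 2:                     # "right" piece of the hat centred at x_j
                K[i, j] += (x[j + 1] * (Pinf[j + 1] - Pinf[j]) - a * (P0[j + 1] - P0[j])) / h
    return x, K

def gsr_performance(A, N, theta):
    """Solve (I - K)l = 1 and (I - K)Psi = 1 + x at the collocation nodes; return
    ARL(S^{r=x}_A) = l(x) and STADD(S^{r=x}_A) = Psi(x)/(l(x) + x), cf. (3.2), (3.7), (3.8)."""
    logt = lambda t: np.log(np.maximum(t, 1e-300))
    P_inf = lambda t: np.where(t > 0, norm.cdf(logt(t) / theta + theta / 2.0), 0.0)
    P_0 = lambda t: np.where(t > 0, norm.cdf(logt(t) / theta - theta / 2.0), 0.0)
    x, K = collocation_matrix(A, N, P_inf, P_0)
    M = np.eye(N) - K
    ell = np.linalg.solve(M, np.ones(N))       # ARL to false alarm as a function of the headstart
    Psi = np.linalg.solve(M, 1.0 + x)
    return x, ell, Psi / (ell + x)

x, arl, stadd = gsr_performance(A=94.34, N=256, theta=0.1)
print(arl[0], stadd[0])    # headstart r = 0; the ARL should come out close to 100 (cf. Table 3.1)
```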
To wrap this subsection, note that the proposed method is a numerical framework that can also be used to assess the accuracy of the popular Markov chain approach, introduced in Brook and Evans (1972), and later extended, e.g., in Woodall (1983). To this end, as noted in Champ and Rigdon (1991), the Markov chain approach is equiv- alent to the integral-equations approach if the integral is approximated via the product midpoint rule. This, in turn, is equivalent to choosing the basis functions,f j (x)g 16j6N , as piecewise constants on N+1 , i.e., j (x) = 1 l fx2I N+1 j g , and equating the residual to zero at the midpoints of the intervalsI N+1 j , i.e., settingz j = (x j1 +x j )=2, 16j6N. In this case the (i;j)-th element of the matrixK defined by (3.13) is (K 1 ) i;j =P 1 x j 1 +z i P 1 x j1 1 +z i ; 16i;j6N; cf. Tartakovsky et al. (2009); Moustakides et al. (2009). It can be shown (see, e.g., Kryloff and Bogol ˘ iubov (1929) or (Kantorovich and Krylov, 1958, pp. 130–135)) 37 that this approach exhibits a superconvergence effect: the rate is also quadratic, even though the interpolation is based on polynomials of degree zero (i.e., constants, or step functions). However, in spite of the superconvergence and the much simpler matrixK, the constant in front ofh 2 in the corresponding error bound is large (larger than that for the “hat” functions). As a result, the partition size required by this method ends up being substantial. In fact, this method was employed, e.g., in Tartakovsky et al. (2009); Mous- takides et al. (2009), to compare the CUSUM chart and the original SR procedure, and the partition size used consisted of thousands of points to ensure reasonable accuracy. In Section 3.4 we will offer the comparison of this method and the proposed method for `(x;A) and STADD(S r A ), and confirm that the new method is superior; cf. Polunchenko et al. (2013a, 2014b). 3.3 Extensiontomultiplepopulations In order to generalize the results presented above, first notice that our algorithms are a particular case of a general recursive setup. To obtain integral equations in question, define the following vectors V n = (V 1 n ;V 2 n ;:::;V L n ) > ; (x) = ( 1 (x 1 ); 2 (x 2 );:::; (L) (x (L) )) > ; and similarly n . Introduce recursively V x n+1 = (V x n ) n+1 ; V x 0 =x: (3.18) 38 Note that SR and CUSUM statistics fall into this category with(x) = 1 +x for SR and(x) = max (1;x) for CUSUM. To generalize the stopping times, introduce T x A = inffn> 1: d(V x n ;A)> 0g; whered is the decision function. Both minimum and mixture of SR and CUSUM rules are covered, with d(R n ;A) = max j R j n A j ; d(R n ;A) = L X j=1 p j R j n A for SR and similarly for CUSUM. From now on all integration is assumed to be over the setfy :d(y;A)6 0g. To facilitate the exploration, noting the Markov nature of (3.18) we shall assume that the n has a density. That is, that K(x;y) = @ L @y 1 @y 2 @y L P (V n+1 6yjV n =x) = @ L @y 1 @y 2 @y L P ((x) n+1 6y) is a well-defined mappingR L R L 7!R + . Let (Kf)(x) denote (Kf)(x), Z K(x;y)f(y)dy: 39 For k = 0; 1; 2;::: define k (x) P(T x > k). Observe that 0 (x) = 1 for any x2R L . Using the fact that each component of the vector seriesfV n g n>1 is a Markov process one readily obtains fork = 0; 1; 2;::: k+1 (x) = Z K(x;y) k (y)dy; 0 (x) = 1: Let `(x) = E[T x ]. Since E[T x ] = P k>0 P(T x > k), it follows that `(x) = P k>0 k (x). 
To obtain the equation for ℓ(x), sum over k:
\[
\sum_{k\ge 0}\rho_{k+1}(x) = \sum_{k\ge 0}\Bigl[\int K(x,y)\,\rho_k(y)\,dy\Bigr] = \int K(x,y)\Bigl[\sum_{k\ge 0}\rho_k(y)\Bigr]dy,
\]
whence
\[
\sum_{k\ge 0}\rho_k(x) = \rho_0(x) + \int K(x,y)\Bigl[\sum_{k\ge 0}\rho_k(y)\Bigr]dy,
\]
and therefore, recalling that ρ_0(x) = 1 for any x ∈ ℝ^L, one arrives at
\[
\ell(x) = 1 + \int K(x,y)\,\ell(y)\,dy.
\]
These equations can now be adapted to
\[
\ell_\infty(x) = 1 + \int K_\infty(x,y)\,\ell_\infty(y)\,dy,\qquad
\delta_0^i(x) = 1 + \int K_0^i(x,y)\,\delta_0^i(y)\,dy,
\]
where the sequence δ_k^i(x) ≜ E_k^i[(T^x − k)^+] can be obtained recursively as
\[
\delta_{k+1}^i(x) = \int K_\infty(x,y)\,\delta_k^i(y)\,dy.
\]
Finally, once the system is reduced to linear algebraic form, the matrix takes the form
\[
\mathbf{K} = \widehat{\mathbf{K}}_1 \otimes \widehat{\mathbf{K}}_2 \otimes \cdots \otimes \widehat{\mathbf{K}}_L,
\]
where ⊗ denotes the Kronecker product and the K̂_k are the single-population matrices (3.17). This formulation generalizes the results of Section 3.2 and allows for a precise numerical evaluation of the operating characteristics of a wide range of multi-population procedures. We now proceed to the next section, where we compare the asymptotic expansions obtained in (2.21) and (2.22) with the numerical results.

3.4 Case study

Consider first a single-population model where both the pre- and post-change observations are i.i.d. unit-variance Gaussian random variables with expected values zero and θ > 0, respectively. Formally,
\[
f(x) = \mathcal{N}(0,1)\quad\text{and}\quad g(x) = \mathcal{N}(\theta,1).
\]
Under these conditions L(X_n) = θX_n − θ²/2, I = θ²/2, and one obtains
\[
K_\infty(x,y) = \frac{1}{y\,\theta\sqrt{2\pi}}\,
\exp\Bigl\{-\frac{1}{2\theta^2}\Bigl(\log\frac{y}{1+x} + \frac{\theta^2}{2}\Bigr)^{\!2}\Bigr\}\,\mathbb{1}_{\{y/(1+x)>0\}}.
\]
As (1 + x)K_0(x, y) = yK_∞(x, y), the formula for K_0(x, y) follows at once. Consequently, one can use (3.17) to find the matrix required to implement the proposed method.

We now employ the proposed numerical method and its predecessor, offered and applied in Tartakovsky et al. (2009) and Moustakides et al. (2009, 2011), to evaluate the performance of the GSR procedure (1.11)–(1.12) in the Gaussian scenario. Our intent is to assess and compare the quality of each of the two methods. To test the accuracy, we perform the analysis for θ = 0.1, 0.5 and a wide range of values of the ARL to false alarm: γ = 10², 10³ and 10⁴. To measure the accuracy and the rate of convergence of either method we rely on the standard Richardson extrapolation technique: if u_{2N}, u_N and u_{N/2} are the solutions (of the corresponding integral equation) obtained with partition sizes 2N, N and N/2, respectively, then the rate of convergence, c, can be estimated from
\[
2^{-c} \approx \frac{\|u_{2N}-u_N\|_\infty}{\|u_N-u_{N/2}\|_\infty},
\qquad\text{so that}\qquad
c \approx \log_2\frac{\|u_N-u_{N/2}\|_\infty}{\|u_{2N}-u_N\|_\infty},
\]
and the actual error can be estimated as ‖u − u_N‖_∞ ≈ 2^{−c}‖u_N − u_{N/2}‖_∞.

The first set of numerical results from the convergence analysis is summarized in Tables 3.1 and 3.2. Each table corresponds to a specific value of θ and reports the estimated convergence rate for various values of the ARL to false alarm for two methods: the one proposed in this work and the one proposed by Moustakides et al. (2011). For the latter, the corresponding results are shown in brackets; in particular, [NaN] indicates that the method failed. We observe that, as expected, the rate of convergence of our method is quadratic. More importantly, it is attained almost right away, regardless of the magnitude of the change or the value of the ARL to false alarm. The reason is the use of linear interpolation to approximate the solution of the corresponding integral equation, which is almost linear with respect to the headstart. Hence, one would expect our numerical scheme to work well even for small N.
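The Richardson diagnostics above are easy to automate. The following is a minimal sketch (my own, not the thesis code) of the rate and error estimates applied to a scalar quantity computed on partitions of size N/2, N and 2N; the sample numbers are the ARL(S_A) values for γ = 10² and N = 8, 16, 32 reported in Table 3.1.

```python
# Sketch of the Richardson-extrapolation convergence diagnostics.
import math

def richardson_rate(u_half, u_full, u_double):
    """Estimate the rate c from 2^{-c} ~ |u_{2N} - u_N| / |u_N - u_{N/2}|."""
    num = abs(u_double - u_full)
    den = abs(u_full - u_half)
    c = -math.log2(num / den)
    err_at_N = 2.0 ** (-c) * den          # ||u - u_N|| ~ 2^{-c} ||u_N - u_{N/2}||
    return c, err_at_N

if __name__ == "__main__":
    # ARL(S_A) for gamma = 10^2 computed with N = 8, 16, 32 (Table 3.1)
    c, err = richardson_rate(u_half=100.72, u_full=100.39, u_double=100.31)
    print(f"estimated rate c ~ {c:.2f}, estimated error at N = 16 ~ {err:.3g}")
```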
The results presented in the tables 42 = 10 2 (A = 94:34) = 10 3 (A = 943:41) = 10 4 (A = 9434:08) N ARL(S A ) Rate ARL(S A ) Rate ARL(S A ) Rate 4 102.56 -7.4 1,022.0 -7.7 10,217 -7.7 [ 2:9 10 9 ] [8.4] [ 3:9 10 11 ] [2.5] [ 6:7 10 11 ] [-0.2] 8 100.72 2.2 1,004.7 2.2 10,044 2.2 [ 3:3 10 7 ] [13.2] [ 2:2 10 11 ] [0.8] [ 6:4 10 11 ] [-1.5] 16 100.39 2.0 1,001.4 2.0 10,011 2.0 [ 4:1 10 4 ] [7.5] [ 6:9 10 10 ] [2.0] [ 5:9 10 11 ] [0.2] 32 100.31 2.0 1,000.6 2.0 10,003 2.0 [128.35] [2.0] [ 8 10 9 ] [5.1] [ 4:9 10 11 ] [3.8] 64 100.29 2.0 1,000.4 2.0 10,001 2.0 [99.87] [-3.7] [ 1:8 10 8 ] [10.4] [ 3:2 10 11 ] [-1.5] 128 100.29 2.0 1,000.3 2.0 10,000 2.0 [100.24] [3.2] [ 4:3 10 5 ] [14.6] [ 1:3 10 11 ] [1.0] 256 100.28 2.0 1,000.3 2.0 10,000 2.0 [100.31] [2.3] [1,249.2] [3.1] [ 2 10 10 ] [3.8] 512 100.28 2.0 1,000.3 2.0 10,000 2.0 [100.28] [9.5] [1,000.1] [-5.6] [ 7:6 10 8 ] [8.3] 1024 100.28 2.0 1,000.3 2.0 10,000 2.0 [100.28] [-1.4] [1,000.1] [1.3] [ 3:8 10 6 ] [15.5] 2048 100.28 2.0 1,000.3 2.0 10,000 2.0 [100.28] [2.0] [1,000.2] [1.9] [12,725] [9.5] Table 3.1. Convergence analysis for ARL(S A ) assuming = 0:1. do confirm that. On the other hand, the method of Moustakides et al. (2011) can be seen to converge much slower, requiringN to be in the thousands or higher to deliver decent accuracy. Tables 3.3 and 3.4 present the convergence analysis of STADD. From the presented results one can conclude that: a) as expected, the convergence rate of the new method is, in fact, quadratic and b) it is achieved much quicker than for the predecessor method, for a broad range of values of the ARL to false alarm and change magnitudes. Hence, the method is not only more accurate, but is also more robust. This will, in fact, allow one to use the numerical scheme in practice for the multi-population case, even though the dimensions of the matrix grow exponentially in the number of populations. 43 = 10 2 (A = 74:76) = 10 3 (A = 747:62) = 10 4 (A = 7476:15) N ARL(S A ) Rate ARL(S A ) Rate ARL(S A ) Rate 4 102.85 2.2 1,024.8 2.2 10,244 2.2 [70.30] [0.2] [109.89] [-1.0] [115.50] [-1.1] 8 100.96 2.2 1,005.6 2.2 10,052 2.2 [96.63] [1.8] [225.94] [-0.8] [251.15] [-1.0] 16 100.57 2.0 1,001.7 2.0 10,013 2.0 [104.13] [2.9] [422.19] [-0.6] [523.63] [-1.0] 32 100.48 2.0 1,000.8 2.0 10,004 2.0 [101.46] [2.0] [692.12] [0.1] [1,053.3] [-0.9] 64 100.45 2.0 1,000.5 2.0 10,001 2.0 [100.53] [6.1] [942.84] [1.4] [2,037.3] [-0.8] 128 100.45 2.0 1,000.5 2.0 10,001 2.0 [100.45] [1.8] [1,039.6] [3.7] [3,721.6] [-0.5] 256 100.45 2.0 1,000.5 2.0 10,000 2.0 [100.45] [3.9] [1,016.1] [0.7] [6,184.9] [-0.1] 512 100.45 2.0 1,000.5 2.0 10,000 2.0 [100.44] [0.6] [1,001.1] [4.7] [8,824.8] [0.8] 1024 100.44 2.0 1,000.5 2.0 10,000 2.0 [100.44] [2.0] [1,000.4] [3.0] [10,291] [7.0] 2048 100.44 2.0 1,000.5 2.0 10,000 2.0 [100.44] [2.0] [1,000.5] [5.8] [10,257] [-2.7] Table 3.2. Convergence analysis for ARL(S A ) assuming = 0:5. 
= 10 2 (A = 94:34) = 10 3 (A = 943:41) = 10 4 (A = 9; 434:08) N STADD(S A ) Rate STADD(S A ) Rate STADD(S A ) Rate 4 19.577 0.7 28.500 -0.7 29.768 -0.9 [12.793] [1.0] [118.93] [1.0] [1,180.3] [1.0] 8 30.140 0.7 68.832 -0.3 79.312 -0.8 [6.8964] [1.0] [59.963] [1.0] [590.63] [1.0] 16 36.535 1.3 118.834 0.2 166.90 -0.5 [4.0398] [-3.3] [30.482] [1.0] [295.82] [1.0] 32 39.173 1.9 161.894 1.0 289.73 0.1 [31.825] [1.7] [15.741] [1.0] [148.41] [1.0] 64 39.897 2.0 184.161 1.7 407.65 0.7 [40.236] [6.5] [8.3714] [1.2] [74.704] [1.0] 128 40.078 2.0 191.094 1.9 479.69 1.4 [40.142] [2.4] [5.1205] [-5.5] [37.852] [1.0] 256 40.124 2.0 192.897 2.0 506.54 1.9 [40.125] [0.4] [155.33] [2.0] [19.43] [1.0] 512 40.135 2.0 193.352 2.0 513.93 2.0 [40.138] [4.4] [193.48] [10.1] [10.220] [1.5] 1024 40.138 2.0 193.466 2.0 515.82 2.0 [40.139] [1.7] [193.51] [2.8] [6.9412] [-6.9] 2048 40.139 2.0 193.495 2.0 516.29 2.0 [40.139] [2.0] [193.51] [-0.2] [406.31] [1.9] Table 3.3. Results of accuracy and convergence analysis for STADD(S A ) for = 0:1. 44 = 10 2 (A = 74:76) = 10 3 (A = 747:62) = 10 4 (A = 7; 476:15) N STADD(S A ) Rate STADD(S A ) Rate STADD(S A ) Rate 4 6.6326 0.1 7.3581 -0.5 7.4395 -0.6 [15.927] [1.1] [134.21] [0.7] [1,318.0] [0.7] 8 10.374 1.3 14.043 -0.1 14.582 -0.3 [12.588] [2.3] [83.783] [0.7] [799.79] [0.7] 16 11.927 1.9 21.062 0.7 23.508 -0.1 [11.903] [0.8] [53.684] [0.8] [480.68] [0.7] 32 12.344 2.0 25.409 1.6 32.767 0.3 [12.290] [1.2] [36.660] [1.0] [286.83] [0.8] 64 12.451 2.0 26.839 1.9 40.101 1.1 [12.455] [2.7] [28.386] [1.9] [171.62] [0.8] 128 12.477 2.0 27.222 2.0 43.541 1.8 [12.480] [2.5] [26.137] [1.7] [105.06] [0.9] 256 12.484 2.0 27.320 2.0 44.543 1.9 [12.485] [2.1] [26.839] [0.6] [68.252] [1.0] 512 12.486 2.0 27.344 2.0 44.805 2.0 [12.486] [2.0] [27.303] [3.5] [49.863] [1.5] 1024 12.486 2.0 27.350 2.0 44.871 2.0 [12.486] [2.0] [27.345] [2.9] [43.375] [4.5] 2048 12.486 2.0 27.352 2.0 44.888 2.0 [12.486] [2.0] [27.350] [2.0] [43.656] [-2.0] Table 3.4. Results of accuracy and convergence analysis for STADD(S A ) for = 0:5. Consider further a two-population Gaussian problem with post-change parameters (1:0; 1:4). The single-population model will serve as a benchmark against the multi- population models. Specifically, it is assumed that ARL =f10 2 ; 10 3 ; 10 4 g, i.e. high, moderate and low false alarm rate. Convergence analysis of ARL forS M andC M is presented in Tables 3.5 and 3.6, respectively. As can be seen from the tables, for theS M procedure the method hits the actual value of ARL almost immediately, even for high , which plays a crucial part in designing the method. On the other hand, the method, as expected, does not offer that much of an improvement for theC M procedure, which does not posses the martingale property of the GSR procedure. Tables 3.7 and 3.8 report the efficiency (ratio of performance measures) of the multi- population procedures as compared to the single-population rules tuned on the specific population, as well as how fast the asymptotic expansions suggested above start to work, for different values of ARL = . 
                      γ = 10²                     γ = 10³                      γ = 10⁴
                      (A₁ = 165, A₂ = 68)         (A₁ = 1745, A₂ = 716)        (A₁ = 16492, A₂ = 6763)
  N                   ARL(S_M)      Rate          ARL(S_M)      Rate           ARL(S_M)      Rate
  64                  103.448       -             1065.17       -              10039.2       -
                      [50.302]      [-]           [76.49]       [-]            [80.2]        [-]
  121                 103.349       -             1064.42       -              10030.7       -
                      [66.009]      [-]           [126.78]      [-]            [137.9]       [-]
  256                 103.304       1.14          1063.97       0.74           10024.6       0.48
                      [81.900]      [-0.02]       [227.07]      [-1.00]        [267.2]       [-1.17]
  529                 103.275       0.60          1063.31       -0.52          10014.7       -0.66
                      [91.793]      [0.64]        [379.61]      [-0.57]        [513.7]       [-0.88]
  1024                103.257       0.69          1063.02       1.19           10008.8       0.75
                      [96.845]      [0.97]        [557.72]      [-0.22]        [928.9]       [-0.75]
  2025                103.248       1.03          1062.92       1.59           10006.5       1.40
                      [99.790]      [0.80]        [741.38]      [-0.05]        [1675.7]      [-0.87]
  4096                103.242       0.58          1062.89       1.74           10005.8       1.72
                      [101.464]     [0.81]        [883.46]      [0.37]         [2926.4]      [-0.74]
  8281                103.240       1.56          1062.87       0.58           10005.6       1.78
                      [102.342]     [0.92]        [967.02]      [0.75]         [4661.9]      [-0.47]

Table 3.5. Convergence analysis for ARL(S_M) assuming θ = (1.0, 1.4).

Figure 3.2. Performance of the Shiryaev–Roberts minimum rule: ADD vs. log(ARL) for the first (θ = 1.0) and second (θ = 1.4) population, showing the multi-population performance, its asymptotic approximation, the single-channel performance, and the single-channel asymptotics.

Figures 3.2 and 3.3 suggest that the asymptotic expansions work quite well, with an error of less than 5% achieved at an ARL as low as 100. This allows one to use the proposed expansions to choose the design parameters at high false alarm levels, when the numerical methods are not available (e.g., for the CUSUM procedure), or as an initial guess for the Shiryaev–Roberts procedure, which may then be evaluated numerically with great precision.

                      γ = 10²                     γ = 10³                      γ = 10⁴
                      (A₁ = 48, A₂ = 29)          (A₁ = 511, A₂ = 306)         (A₁ = 4834, A₂ = 2894)
  N                   ARL(C_M)      Rate          ARL(C_M)      Rate           ARL(C_M)      Rate
  64                  65.031        -             427.95        -              3751.4        -
                      [73.318]      [-]           [80.68]       [-]            [80.7]        [-]
  121                 82.106        -             457.59        -              3778.7        -
                      [80.634]      [-]           [139.27]      [-]            [139.3]       [-]
  256                 91.493        0.86          523.97        -1.16          3850.7        -1.40
                      [87.983]      [-0.01]       [272.62]      [-1.19]        [272.7]       [-1.19]
  529                 93.239        2.28          641.45        -0.77          3997.1        -0.96
                      [92.675]      [0.61]        [534.80]      [-0.92]        [535.2]       [-0.92]
  1024                94.920        0.05          811.58        -0.53          4259.9        -0.84
                      [95.462]      [0.75]        [798.37]      [-0.01]        [1002.9]      [-0.83]
  2025                96.243        0.36          967.56        0.13           4765.5        -0.98
                      [95.717]      [3.56]        [859.51]      [2.18]         [1937.6]      [-1.03]
  4096                96.682        1.59          998.97        2.31           5702.9        -0.89
                      [96.651]      [-1.87]       [961.65]      [-0.74]        [3855.1]      [-1.04]
  8281                96.959        0.65          1034.54       -0.18          7249.0        -0.71
                      [96.824]      [2.39]        [1001.31]     [1.34]         [6946.1]      [-0.68]

Table 3.6. Convergence analysis for ARL(C_M) assuming θ = (1.0, 1.4).

          θ = 1.0               θ = 1.4               Perf. Ratio           Dev., %
  γ       SADD(1)   Single      SADD(2)   Single      θ = 1.0   θ = 1.4
  10²     8.62      6.69        4.41      4.05        1.29      1.09        2.2
  10³     16.34     11.15       8.46      6.35        1.47      1.33        0.4
  10⁴     17.89     15.73       9.10      8.69        1.14      1.05        <0.1

Table 3.7. Performance comparison for the minimum of SR procedures.

          θ = 1.0               θ = 1.4               Perf. Ratio           Dev., %
  γ       SADD(1)   Single      SADD(2)   Single      θ = 1.0   θ = 1.4
  10²     7.93      6.11        4.14      3.80        1.30      1.09        2.4
  10³     12.80     10.60       6.55      6.12        1.21      1.07        0.3
  10⁴     17.33     15.10       8.84      8.43        1.15      1.05        <0.1

Table 3.8. Performance comparison for the minimum of CUSUM procedures.
Figure 3.3. Performance of the CUSUM minimum rule: ADD vs. log(ARL) for the first (θ = 1.0) and second (θ = 1.4) population, showing the multi-population performance, its asymptotic approximation, the single-channel performance, and the single-channel asymptotics.

Chapter 4

Unstructured Decentralized Change-Point Detection

Suppose some area is monitored by an array of sensors that transmit their observations to a fusion center. The goal of the fusion center is to combine the received data in order to quickly detect a change in the environment, such as the presence of an intruder. However, the bandwidth constraints that characterize such sensor networks limit the amount of information that the fusion center can receive. Specifically, we assume that the sensors need to communicate with the fusion center at a much smaller rate than the rate at which they collect their observations. Additionally, we assume that the sensors can transmit only a small number of bits per communication. The sequential change detection problem under such constraints has been addressed in a number of papers, in which it is typically assumed that the change affects all sensors. An alternative formulation assumes that the change affects only one sensor, which however is not known in advance; this case has been studied in Chapter 2. This assumption gives rise to one-shot schemes, which require that each sensor communicate at most once with the fusion center. The goal of this chapter is to propose a decentralized sequential detection rule that requires minimal transmission activity from the sensors and is asymptotically optimal for any scenario regarding the unknown subset of affected sensors.

4.1 Problem formulation

Let X_1^k, X_2^k, … be the sequence of observations at the k-th sensor. For every time t ≥ 1 and k ∈ {1, …, K}, we set F_t^k := σ(X_1^k, …, X_t^k); that is, F_t^k represents the history of observations at sensor k up to time t. We assume that the sensor observations are independent both spatially and temporally. In other words, the filtrations {F_t^k}_{t≥1} and {F_t^j}_{t≥1} are independent for any k ≠ j, whereas the random variables X_t^k and X_s^k are independent for any k and t ≠ s. Furthermore, we set F_t := ∨_{k=1}^K F_t^k for every t, i.e., {F_t} is the global filtration in the network.

We assume that initially the situation is under control in all sensors, by which we mean that the observations of each sensor k are independent over time with common density f_k with respect to some dominating measure μ_k. However, at some unknown time ν ∈ [0, ∞], this common density changes for all sensors that belong to a subset 𝒩 ⊆ {1, …, K}, so that X_t^k ∼ f_k for t ≤ ν, and
\[
X_t^k \sim
\begin{cases}
g_k, & k \in \mathcal{N},\\
f_k, & k \notin \mathcal{N},
\end{cases}
\qquad t > \nu.
\]
We denote by P_∞ the distribution of (X_t^1, …, X_t^K)_{t≥1} when the change never occurs in any subset of sensors (ν = ∞), in which case X_t^k ∼ f_k for all t and k. On the other hand, we denote by P_ν^𝒩 the distribution of (X_t^1, …, X_t^K)_{t≥1} when the change occurs at time ν in subset 𝒩. When, in particular, ν = 0, X_t^k ∼ g_k for all t and k ∈ 𝒩. For each k, we assume that the densities f_k and g_k have the same support and that the Kullback–Leibler information number is finite, i.e.,
\[
I_k = \int \log\frac{g_k(x)}{f_k(x)}\,g_k(x)\,\mu_k(dx) < \infty.
\]
Furthermore, we set
\[
Z_t^k = Z_{t-1}^k + \log\frac{g_k}{f_k}(X_t^k),\qquad
Z_t^{\mathcal{N}} = \sum_{k\in\mathcal{N}} Z_t^k,\qquad t \ge 1,
\]
with Z_0^k = 0, i.e., Z_t^k is the log-likelihood ratio process between P_0^k and P_∞ given F_t^k and, due to the assumed independence across sensors, Z_t^𝒩 is the log-likelihood ratio process between P_0^𝒩 and P_∞ given F_t.
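To fix ideas, here is a minimal sketch of this observation model and of the local log-likelihood ratios Z_t^k in the Gaussian setting used in the numerical experiments of this chapter (f_k = N(0,1), g_k = N(θ_k,1)); all function and variable names are my own illustration.

```python
# Sketch: simulate K independent Gaussian sensors with a change at time nu
# in an affected subset, and accumulate the local LLRs Z_t^k.
import numpy as np

def simulate_llr(K, affected, theta, nu, horizon, rng=None):
    """Return a K x horizon array of cumulative local LLRs Z_t^k under P_nu^N."""
    rng = np.random.default_rng(rng)
    X = rng.standard_normal((K, horizon))
    for k in affected:                       # post-change mean theta_k after time nu
        X[k, nu:] += theta[k]
    increments = theta[:, None] * X - theta[:, None] ** 2 / 2.0   # log g_k/f_k (X_t^k)
    return np.cumsum(increments, axis=1)     # Z_t^k, t = 1..horizon

if __name__ == "__main__":
    K, horizon, nu = 5, 200, 50
    theta = np.full(K, 0.5)
    Z = simulate_llr(K, affected=[0, 2], theta=theta, nu=nu, horizon=horizon, rng=1)
    print(Z[:, -1])                          # affected sensors drift upward after nu
```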
A sequential change detection rule is defined, in general, as an F_t-adapted stopping time T, at which the fusion center raises an alarm declaring that the change has occurred. Since it is not possible to minimize the expected detection delay for every change point ν simultaneously, a standard approach in the quickest detection literature is to consider a minimax criterion of optimality instead. In this chapter we consider the minimax performance measure introduced by Lorden (1971), according to which the performance of a detection rule is measured by
\[
J_{\mathcal{N}}[T] = \sup_{\nu\ge 0}\ \operatorname{ess\,sup}\ E_\nu^{\mathcal{N}}\bigl[(T-\nu)^+ \mid \mathcal{F}_\nu\bigr],
\]
that is, the worst (with respect to the change point) conditional expected delay given the worst possible history of observations up to the change point. Specifically, Lorden considered the following constrained optimization problem:
\[
\text{find}\quad \inf_{T\in\mathcal{C}_\gamma} J_{\mathcal{N}}[T],\qquad\text{where}\quad
\mathcal{C}_\gamma = \{T \colon E_\infty[T] \ge \gamma\}, \tag{4.1}
\]
i.e., he suggested minimizing J_𝒩 among stopping times for which the rate of false alarms is smaller than a given bound 1/γ. Moreover, Lorden (1971) also showed that, as γ → ∞,
\[
\inf_{T\in\mathcal{C}_\gamma} J_{\mathcal{N}}[T] \sim \frac{\log\gamma}{I_{\mathcal{N}}},\qquad
I_{\mathcal{N}} = \sum_{k\in\mathcal{N}} I_k, \tag{4.2}
\]
where by x ∼ y we mean that x/y → 1, and that this asymptotic performance is attained by Page's CUSUM procedure,
\[
\mathcal{S}_{\mathcal{N}}(a) = \inf\bigl\{t\ge 1 \colon Y_t^{\mathcal{N}} \ge a\bigr\},\qquad\text{where}\quad
Y_t^{\mathcal{N}} = \max_{0\le s\le t}\bigl(Z_t^{\mathcal{N}} - Z_s^{\mathcal{N}}\bigr), \tag{4.3}
\]
when the threshold a is selected as log γ. The proof of the latter asymptotic optimality was based on the observation that the CUSUM test S_𝒩 results from repeated sampling of the one-sided sequential test T_𝒩(a) = inf{t ≥ 1 : Z_t^𝒩 ≥ a}. Later, Moustakides (1986) reduced the non-standard optimization problem (4.1) to a tractable optimal stopping problem and showed that the CUSUM test is optimal not only in an asymptotic sense, but for any given γ, when a is chosen so that E_∞[S_𝒩(a)] = γ.

The implementation of the CUSUM test in sensor networks is constrained by at least two important factors. The first is that it requires knowledge of the subset of affected sensors, which is typically not known in advance. The second factor is related to the nature of communication in sensor networks, which does not allow the sensors to transmit their exact values to the fusion center. Therefore, even if the subset of affected sensors, say 𝒩, were completely known in advance, the corresponding CUSUM test S_𝒩 may not be implementable in practice, either because the sensors must communicate with the fusion center at a given (low) rate, or because only a certain number of bits is permissible per transmission. Such communication constraints have stimulated the problem of decentralized sequential change detection, where one seeks a sequential detection rule with minimal communication requirements from the sensors. This line of research has focused on minimizing the performance loss inflicted by such communication constraints, and it is typically taken for granted that the subset of affected sensors is known in advance. The main goal of this work is to propose a decentralized sequential change detection rule with minimal communication requirements that is nevertheless characterized by theoretical optimality properties when there is uncertainty regarding the subset of affected sensors.
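For reference, the following is a minimal sketch of the centralized (oracle) CUSUM benchmark (4.3) for a known affected subset, using the standard recursion Y_t = max(0, Y_{t−1} + λ_t), where λ_t is the aggregate LLR increment over the subset. The Gaussian increments and all names below are my own illustration, not the thesis code.

```python
# Sketch: the oracle CUSUM stopping time S_N(a) via its one-step recursion.
import numpy as np

def cusum_stopping_time(increments, a):
    """increments: aggregate LLR increments sum_{k in N} log g_k/f_k (X_t^k), t = 1, 2, ..."""
    y = 0.0
    for t, lam in enumerate(increments, start=1):
        y = max(0.0, y + lam)                # equals Y_t^N = max_{0<=s<=t} (Z_t^N - Z_s^N)
        if y >= a:
            return t                         # first time the CUSUM statistic reaches a
    return None                              # no alarm within the available sample

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    theta, nu = 0.5, 300
    x = rng.standard_normal(1000)
    x[nu:] += theta                          # change at time nu in a single-sensor subset
    lam = theta * x - theta**2 / 2
    print("alarm at t =", cusum_stopping_time(lam, a=np.log(1e3)))   # a = log(gamma)
```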
4.2 The completely unstructured case

When there is absolutely no prior information regarding the subset of affected sensors, it is natural to generalize the definition of the detection rule in (4.3) by considering the minimum over all possible affected subsets, which leads to
\[
\widehat{\mathcal{S}} = \min_{\mathcal{N}} \mathcal{S}_{\mathcal{N}}(a_{\mathcal{N}})
= \inf\Bigl\{t\ge 1 \colon \max_{\mathcal{N}}\bigl(Y_t^{\mathcal{N}}/a_{\mathcal{N}}\bigr) \ge 1\Bigr\}.
\]
It is clear that this detection rule cannot be implemented under the communication constraints that apply in a sensor network framework. However, even if we ignore such constraints, it should be emphasized that one has to keep track, at every time t, of a number of statistics that grows exponentially in K. In order to overcome this problem, several alternatives have been considered in the literature. One was suggested by Mei (2011):
\[
T_1 = \inf\Bigl\{t\ge 1 \colon \sum_{k=1}^K Y_t^{(k)}\,\mathbb{1}_{\{Y_t^{(k)}\ge b^{(k)}\}} \ge a\Bigr\}. \tag{4.4}
\]
Another was suggested by Siegmund and Xie (2013), who focused on the case of Gaussian observations with unknown post-change mean; transferring these ideas into our setup leads to a sequential change detection rule of the form
\[
T_2 = \inf\Bigl\{t\ge 1 \colon \max_{0\le s\le t}\sum_{k=1}^K \bigl(Z_t^k - Z_s^k\bigr)^+ \ge a\Bigr\}, \tag{4.5}
\]
where in practice one has to consider a window-limited version that replaces the maximization over 0 ≤ s ≤ t with (t − m_1)^+ ≤ s ≤ t − m_0.

However, neither of these two schemes takes into account the communication constraints described above. In this work, we consider a decentralized detection rule of the form (4.5), where each log-likelihood ratio statistic Z^k has been replaced by an approximation Z̃^k that requires minimal communication from the sensors. We show that the resulting detection rule is asymptotically optimal for every possible scenario regarding the affected subset. Moreover, based on extensive numerical experiments, we show that it outperforms (4.4) even if the induced rate of communication is low and a small number of bits is transmitted by each sensor per communication.

In what follows, we assume that each Z_1^k has a finite second moment under P_0^k, i.e.,
\[
E_0^k\bigl[(Z_1^k)^2\bigr] = \int \Bigl(\log\frac{g_k(x)}{f_k(x)}\Bigr)^{\!2} g_k(x)\,\mu_k(dx) < \infty \tag{4.6}
\]
for every 1 ≤ k ≤ K. We propose a sequential detection rule of the form
\[
\widetilde{\mathcal{S}} = \inf\Bigl\{t\ge 1 \colon \max_{0\le s\le t}\sum_{k=1}^K \bigl(\widetilde{Z}_t^k - \widetilde{Z}_s^k\bigr)^+ \ge a\Bigr\}, \tag{4.7}
\]
where each Z̃_t^k is an approximation of Z_t^k that requires minimal communication from the sensors and nevertheless leads to an asymptotically optimal sequential detection rule. Our method for establishing this asymptotic optimality hinges on the observation that S̃ results from repeated sampling of the one-sided sequential test
\[
\widetilde{T} = \inf\Bigl\{t\ge 1 \colon \sum_{k=1}^K \bigl(\widetilde{Z}_t^k\bigr)^+ \ge a\Bigr\},
\]
and on Theorem 1, p. 1899, in Lorden (1971), which we reformulate below in our setup.

Theorem 6 (Lorden). If P_∞(T̃ < ∞) ≤ 1/γ for every γ and E_0^𝒩[T̃] ∼ (log γ)/I_𝒩 as γ → ∞, then E_∞[S̃(a_γ)] ≥ γ and S̃ is first-order asymptotically optimal, in the sense that for every subset 𝒩 we have J_𝒩[S̃] ∼ (log γ)/I_𝒩 as γ → ∞.

Based on this result, the asymptotic optimality of the above detection rule is guaranteed as long as the statistics Z̃^k, 1 ≤ k ≤ K, and the threshold a are selected so that P_∞(T̃ < ∞) ≤ 1/γ for every γ and E_0^𝒩[T̃] ∼ (log γ)/I_𝒩 as γ → ∞. The following result provides an upper bound on the induced false alarm rate, which is rather crude, yet sufficient for our purposes. In order to state it, we set
\[
F(x) := 1 - e^{-x}\sum_{j=0}^{K-1}\frac{x^j}{j!}, \tag{4.8}
\]
i.e., F(x) is the cdf of the Erlang distribution with parameters 1 and K.

Theorem 7.
Let M_t^k := max_{0≤s≤t} Z_s^k, and suppose that for some constants b_k ≥ 0, 1 ≤ k ≤ K, we have, for every t,
\[
\widetilde{Z}_t^k \le M_t^k + b_k\quad\text{for } 1\le k\le K.
\]
Then P_∞(T̃(a) < ∞) ≤ 1 − F(a − Σ_k b_k).

Proof. For any a and t ≥ 1 we have
\[
P_\infty\bigl(\widetilde{T}(a)\le t\bigr) = P_\infty\Bigl(\max_{0\le s\le t}\sum_{k=1}^K \bigl(\widetilde{Z}_s^k\bigr)^+ \ge a\Bigr)
\le P_\infty\Bigl(\sum_{k=1}^K M_t^k \ge a - \sum_{k=1}^K b_k\Bigr).
\]
Now, for any given k and t, it is clear that
\[
P_\infty\bigl(M_t^k \ge b\bigr) = P_\infty\bigl(T_k(b)\le t\bigr) \le P_\infty\bigl(T_k(b)<\infty\bigr),
\]
where T_k(b) := inf{t ≥ 1 : Z_t^k ≥ b}, and from Wald's likelihood ratio identity it follows that
\[
P_\infty\bigl(T_k(b)<\infty\bigr) = E_0^k\Bigl[e^{-Z^k_{T_k(b)}}\Bigr] \le e^{-b},
\]
where E_0^k denotes expectation with respect to P_0^k. The last two relationships imply that, for any given k and t, P_∞(M_t^k ≥ b) ≤ e^{−b}, which means that the random variable M_t^k is stochastically dominated by an exponential random variable with rate 1. Since, due to the assumed independence across sensors, M_t^1, …, M_t^K are independent, the sum Σ_{k=1}^K M_t^k is stochastically dominated by an Erlang random variable with parameters 1 and K, i.e.,
\[
P_\infty\Bigl(\sum_{k=1}^K M_t^k \ge a - \sum_{k=1}^K b_k\Bigr) \le 1 - F\Bigl(a - \sum_{k=1}^K b_k\Bigr),
\]
where F(x) is defined in (4.8).

The asymptotic optimality of the sequential detection rule (4.5) is now a direct consequence of the two previous theorems.

Corollary 1. If a_γ is selected so that 1 − F(a_γ) = 1/γ, then T_2, as defined in (4.5), is asymptotically optimal.

Proof. It is clear that (4.5) is of the form (4.7) with Z̃_t = Z_t. Therefore, Theorem 7 with b_k = 0 implies that E_∞[T_2] ≥ γ. Moreover, it is clear that, in this case, the one-sided sequential test T̃ coincides with the one-sided SPRT, which is asymptotically optimal; cf. Wald (1947).

4.2.1 A level-triggered communication scheme

In order to construct a decentralized detection rule that demands minimal transmission activity from the sensors, we assume that each sensor k communicates with the fusion center only at a sequence of random times (τ_n^k). Specifically, similarly to Fellouris and Moustakides (2013), we set
\[
\tau_n^k = \inf\Bigl\{t > \tau_{n-1}^k \colon Z_t^k - Z^k_{\tau_{n-1}^k} \notin \bigl(-\underline{\Delta}_k,\ \overline{\Delta}_k\bigr)\Bigr\},
\]
where Δ̄_k and Δ̲_k are positive constants which clearly determine the rate of communication at sensor k. That is, we suggest that at each time t sensor k communicate with the fusion center only if the current value of the local log-likelihood ratio Z_t^k has either increased by Δ̄_k or decreased by Δ̲_k relative to its value at the most recent communication time, τ^k(t) := τ^k_{m_t^k}, where m_t^k := max{n ≥ 1 : τ_n^k ≤ t} is the number of transmitted messages up to time t. Expressing Z^k_{τ^k(t)} as a telescopic sum of the form
\[
Z^k_{\tau^k(t)} = \sum_{n=1}^{m_t^k}\bigl(Z^k_{\tau_n^k} - Z^k_{\tau_{n-1}^k}\bigr) = \ell_1^k + \cdots + \ell^k_{m_t^k} \tag{4.9}
\]
shows that at each time τ_n^k sensor k simply needs to transmit the value of ℓ_n^k := Z^k_{τ_n^k} − Z^k_{τ_{n−1}^k}, the log-likelihood ratio accumulated between τ_{n−1}^k and τ_n^k. This communication scheme is illustrated in Figure 4.1.

Figure 4.1. The level-triggered (random) sampling scheme: a transmission is triggered whenever the local log-likelihood ratio leaves the band determined by Δ̄_k and Δ̲_k; the overshoots over the band are also indicated.

Corollary 2. If a_γ is selected so that 1 − F(a_γ) = 1/γ and Z̃ is defined according to (4.9), then S̃ is asymptotically optimal, even if Δ_k := max{Δ̄_k, Δ̲_k} → ∞ so that Δ_k = o(log γ).

Proof. It suffices to observe that Z^k_{τ^k(t)} ≤ max_{0≤s≤t} Z_s^k and that |Z^k_{τ^k(t)} − Z_t^k| ≤ max{Δ̄_k, Δ̲_k}.

4.2.2 Quantization

Selecting Z̃_t according to (4.9) requires that each sensor transmit the exact value ℓ_n^k at each time τ_n^k. If we further assume that each sensor k can transmit only d bits per communication, where d may not necessarily be large, we also need to design an appropriate quantization rule.
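Before turning to quantization, here is a minimal sketch of the level-triggered sampling of Section 4.2.1 with exact-value messages; the thresholds, the Gaussian data and all names are my own illustration. It also shows the telescoping identity (4.9): the sum of the received messages equals the local LLR at the last transmission time.

```python
# Sketch: level-triggered transmissions from one sensor (exact-value messages).
import numpy as np

def level_triggered_messages(Z_k, delta_up, delta_down):
    """Z_k: cumulative local LLR Z_t^k, t = 1..T.  Returns transmission times and increments l_n^k."""
    times, msgs, last = [], [], 0.0
    for t, z in enumerate(Z_k, start=1):
        if z - last >= delta_up or z - last <= -delta_down:
            times.append(t)
            msgs.append(z - last)            # l_n^k = Z^k_{tau_n} - Z^k_{tau_{n-1}}
            last = z
    return times, msgs

if __name__ == "__main__":
    rng = np.random.default_rng(3)
    theta, nu = 0.5, 200
    x = rng.standard_normal(400); x[nu:] += theta
    Z = np.cumsum(theta * x - theta**2 / 2)          # local LLR of one sensor
    times, msgs = level_triggered_messages(Z, delta_up=1.0, delta_down=1.0)
    # fusion-center value at the latest transmission time, cf. (4.9):
    print(len(times), "messages;  Z_tilde =", sum(msgs), " vs  Z =", Z[times[-1] - 1])
```

The quantization of these messages, needed when only d bits can be transmitted, is taken up next.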
Indeed, similarly to Fellouris and Moustakides (2013) we suggest that sensork transmit at time k n the following message to the fusion center z k n = 8 > < > : j; if k j1 ` k n k < k j j; if k j1 <` k n + k k j 58 forj = 1;:::;d, where k and k are the percentiles of the overshoot of` k n over k and k , respectively. In this case, we select ~ Z k t as follows: ~ Z k t = ~ ` k 1 + ~ ` k 2 + + ~ ` k mt ; (4.10) where ~ ` k n = 8 > < > : k +R k j ifz k n =j k +R k j ifz k n =j ; j = 1;:::;d and R k j = log E 0 h e (` k n k k j1 ) jz k n =j i ; R k j = log E 1 h e ` k n + k + k j1 jz k n =j i : (4.11) In order to show that the resulting detection rule is asymptotically optimal, we need to introduce some notation. To this end, we set k n the random, non-negative overshoot associated with then th transmission from sensork wheneverz k n > 0. Moreover, we set C := max k2N sup k >0 E k 0 [ k 1 ]; which, because of condition (4.6), is a finite quantity for any givenf k ;k2Ng and an O(1) term as k !1 for everyk2N . We also let k !1 so that k =o(j logj) for every 1kK, so that each sensor does not communicate with the fusion center very frequently. In what follows, we denote byO( max ) a term that is bounded above when divided by max as min !1, where min := min 1kK k ; max := max 1kK k : 59 Theorem8. For any > 1 ifa is selected so that 1F (aR) = 1 ; R = K X k=1 max 16j6d fR k j ;R k j g; and ~ Z according to (4.10)–(4.11), then E 1 [ ~ S] and ~ S is asymptotically optimal as long as min !1 so that max =o(log ) for every 1kK. 4.3 Performanceevaluationandcomparison We consider the case of 20 sensors, 8 of which are affected, with both pre- and post- change distributions normal with unit variance, with pre-change mean 0 and post-change mean 0:5. Using Monte-Carlo simulations, we compare T 1 as defined in (4.4) to the decentralized procedure ~ S as defined in (4.7) and (4.10). In what follows windows size for ~ S is set to m 1 = 50, and number of bits to 4. Comparison results when expected time to transmission = 2 are shown in Figure 4.2 and summarized in Table 4.1. For frequent transmissions, ~ S enjoys 25% increase in performance compared toT 1 . 10 2 10 3 10 4 0 5 10 15 20 25 ARL ADD 0 T1 ˜ S Figure 4.2. Procedure comparison for frequent communications ( = 2). 60 = 500 = 5000 a ARL ADD 0 a ARL ADD 0 T 1 21:4 507 15:3 26:05 5020 19:9 ~ T 20:7 507 15:6 25:35 4990 20:3 ~ S 8:82 498 11:0 11:65 5022 14:7 Table 4.1. Procedure comparison for frequent communications ( = 2). Simulation results suggest that for as few as 4 bits per message, ~ S procedures behaves consistently better thanT 1 . When the expected time to transmission is further increased to = 10, the differ- ence between the procedures becomes smaller, as can be seen in Figure 4.3 and Table 4.2, even though the latter uses as few as 4 bits per transmission. For infrequent trans- missions ~ S does not offer such a significant advantage (only about 5%). 10 2 10 3 10 4 0 5 10 15 20 25 ARL ADD 0 T1 ˜ S Figure 4.3. Procedure comparison for infrequent communications ( = 10). 61 = 500 = 5000 a ARL ADD 0 a ARL ADD 0 T 1 18:0 507 15:3 24:0 4950 19:9 ~ T 14:0 499 16:5 19:4 5080 21:1 ~ S 7:82 501 14:2 10:84 5000 17:8 Table 4.2. Procedure comparison for infrequent communications ( = 10). To conclude, we suggest using the ~ S as defined in (4.7) and (4.10), which consis- tently outperformsT 1 for a broad range ofp and. 62 Chapter5 Applications We now consider the problem of efficient anomaly detection in computer network traf- fic. 
A multi-cyclic setting of quickest change detection is a natural fit for this problem. We propose a score-based multi-cyclic detection algorithm in two flavours: the classical single-population version and a multi-population alternative. The algorithm is based on the Shiryaev–Roberts procedure, which, apart from being easy to implement and computationally inexpensive, has appealing optimality properties; specifically, it is exactly optimal in the multi-cyclic setting. It is therefore expected that an intrusion detection algorithm based on the Shiryaev–Roberts procedure will perform well in practice, which is confirmed experimentally on real traces.

5.1 Cybersecurity

Designing automated and efficient techniques for rapid detection of computer network anomalies (e.g., due to intrusions) is an important problem in cybersecurity (Ellis and Speed, 2001). Many existing anomaly-based Intrusion Detection Systems (IDSs) operate by applying the machinery of statistics to comb through the passing traffic looking for a deviation from the traffic's normal profile (Thatte et al., 2008, 2011; Tartakovsky et al., 2005a, 2006a,c). By way of example, the Sequential Probability Ratio Test (SPRT) of Wald (1947), the Cumulative Sum (CUSUM) chart of Page (1954), and the Exponentially Weighted Moving Average (EWMA) inspection scheme of Roberts (1959) are the de facto "workhorses" of the community. Though practically unknown in the cybersecurity community, the SR procedure has recently been proposed for anomaly detection as well (Tartakovsky et al., 2013).

As network anomalies typically take place at unknown points in time and entail changes in the traffic's statistical properties, it is intuitively appealing to formulate the problem of computer network anomaly detection as one of quickest change-point detection: detect changes in the statistical profile of network traffic as rapidly as possible, while maintaining a tolerable level of the risk of making a false detection. In practice, it is common that neither the pre- nor the post-anomaly distribution is known. As a result, the traffic's pre- and post-anomaly profile is poorly understood, and one can no longer rely on the likelihood ratios; hence, an alternative approach is required. Let us first consider the typical behavior of the SR statistic. Under the normal regime, the detection statistic has a negative drift, i.e., E_∞[L_n] < 0, and a positive drift in an anomaly situation, i.e., E_ν[L_n] > 0 for n > ν. Consider now the following score-based modification of the SR procedure,
\[
\widetilde{R}_n = \bigl(1 + \widetilde{R}_{n-1}\bigr)e^{S_n},\qquad n\ge 1,\qquad \widetilde{R}_0 = 0,
\]
with the corresponding stopping time
\[
\widetilde{\mathcal{S}}_A = \min\bigl\{n\ge 1 \colon \widetilde{R}_n \ge A\bigr\},
\]
where A > 0 is an a priori chosen detection threshold and S_n(X_1, …, X_n) are the selected score functions. Clearly, so long as
\[
E_\infty\bigl[S_n(X_1,\dots,X_n)\bigr] < 0\qquad\text{and}\qquad E_\nu\bigl[S_n(X_1,\dots,X_n)\bigr] > 0
\]
for all ν ≥ 0, the SR detection procedure designed with such score functions in place of the likelihood ratio will work, though it will no longer be optimal. The score functions S_n can be chosen in a number of ways, and each particular choice depends crucially on the expected type of change. In the applications of interest, the detection problem can usually be reduced to detecting changes in mean values along with variances (mean and variance shifts). Furthermore, define a two-population alternative
\[
\widetilde{\mathcal{S}}_{M,A} = \min\bigl\{\widetilde{\mathcal{S}}^{(1)}_{A_1},\ \widetilde{\mathcal{S}}^{(2)}_{A_2}\bigr\},
\]
where S̃^{(i)}_{A_i}, i = 1, 2, are independent score-based procedures running in parallel in the two populations.
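The two building blocks above are easy to implement. The following is a minimal sketch of the score-based SR recursion and of the two-population minimum rule; the score sequences, thresholds and random data in the demo are illustrative placeholders, not the thesis configuration.

```python
# Sketch: score-based SR procedure and the minimum rule over two populations.
import numpy as np

def score_sr_stopping_time(scores, A):
    """Run R~_n = (1 + R~_{n-1}) * exp(S_n), R~_0 = 0, and stop when R~_n >= A."""
    r = 0.0
    for n, s in enumerate(scores, start=1):
        r = (1.0 + r) * np.exp(s)
        if r >= A:
            return n
    return None

def minimum_rule(scores_by_population, thresholds):
    """S~_{M,A} = min_i S~^(i)_{A_i}: the earliest alarm among the per-population procedures."""
    alarms = [score_sr_stopping_time(s, a) for s, a in zip(scores_by_population, thresholds)]
    alarms = [t for t in alarms if t is not None]
    return min(alarms) if alarms else None

if __name__ == "__main__":
    rng = np.random.default_rng(4)
    s1 = np.r_[rng.normal(-0.5, 1.0, 300), rng.normal(0.5, 1.0, 100)]   # anomaly in population 1 after n = 300
    s2 = rng.normal(-0.5, 1.0, 400)                                     # unaffected population
    print("alarm at n =", minimum_rule([s1, s2], thresholds=[3000.0, 3000.0]))
```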
Let μ_∞ = E_∞[X_n], σ_∞² = Var_∞[X_n], and μ = E_0[X_n], σ² = Var_0[X_n] denote the pre- and post-anomaly mean values and variances, respectively. Write Y_n = (X_n − μ_∞)/σ_∞ for the centered and scaled observation at time n. In real-world applications the pre-change parameters μ_∞ and σ_∞² are estimated from training data and periodically re-estimated due to the non-stationarity of network traffic. We suggest a score S_n of the linear-quadratic form
\[
S_n(Y_n) = C_1 Y_n + C_2 Y_n^2 - C_3, \tag{5.1}
\]
where C_1, C_2 and C_3 are positive design numbers, assuming for concreteness that the change leads to an increase in both mean and variance. In the case where the variance either does not change or changes relatively insignificantly compared to the change in the mean, the coefficient C_2 may be set to zero. In the opposite case, where the mean changes only slightly compared to the variance, we take C_1 = 0. The first case appears to be typical for many cybersecurity applications, for example for ICMP and UDP Denial-of-Service (DoS) attacks (see Tartakovsky et al. (2006c,a), where the linear score-based CUSUM has been proposed). However, in certain cases both the mean and the variance change quite significantly. Note that the score given by (5.1) with
\[
C_1 = \delta q^2,\qquad C_2 = (1-q^2)/2,\qquad C_3 = \delta^2 q^2/2 - \log q, \tag{5.2}
\]
where q = σ_∞/σ and δ = (μ − μ_∞)/σ_∞, is optimal if the pre- and post-change distributions are Gaussian with known putative values μ and σ². This is true because in the latter case S_n is the log-likelihood ratio. If one believes in the Gaussian model (which sometimes is the case), then selecting q = q_0 and δ = δ_0 with some design values q_0 and δ_0 provides reasonable operating characteristics for q < q_0 and δ > δ_0, and optimal characteristics for q = q_0 and δ = δ_0. However, it is important to emphasize that the proposed score-based SR procedure does not assume that the observations have Gaussian pre- and post-change distributions.

5.2 A case study

We now present the results of testing the proposed detection algorithms on a real Distributed DoS (DDoS) attack, namely, a SYN flood attack. The aim of this attack is to congest the victim's link with a series of SYN requests so as to have the victim's machine exhaust all of its resources and stop responding to legitimate traffic. This kind of attack clearly creates a volume-type anomaly in the victim's traffic flow. The data is courtesy of the Los Angeles Network Data Exchange and Repository (LANDER) project (see http://www.isi.edu/ant/lander). Specifically, the trace is a subset of flow data captured by Merit Network Inc. (see http://www.merit.edu). The attack is on a University of Michigan IRC server. It starts at roughly 550 seconds into the trace and has a duration of 10 minutes. Figure 5.1(a) shows the aggregate number of attempted connections, or the connections birth rate, as a function of time. In order to implement a multi-population change-point detection procedure we partition the system into two bins based on the target IP; the birth rate in each population is shown in Figure 5.1(b).

Figure 5.1. SYN flood attack: number of attempted connections, (a) aggregate and (b) split into two populations.

We used the number of connections during 20 msec batches as the observations X_n.
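Before turning to the estimated traffic parameters, here is a minimal sketch of the score (5.1) with the Gaussian-optimal coefficients (5.2). The demo values q = 0.9 and δ = 0.6 are the design values quoted later in this section; the pre-change mean and standard deviation in the call to lq_score are placeholders, and all function names are mine.

```python
# Sketch: linear-quadratic score (5.1) with coefficients (5.2).
import numpy as np

def lq_score_coefficients(q, delta):
    """C1, C2, C3 from (5.2): q = sigma_inf / sigma, delta = (mu - mu_inf) / sigma_inf."""
    C1 = delta * q**2
    C2 = (1.0 - q**2) / 2.0
    C3 = delta**2 * q**2 / 2.0 - np.log(q)
    return C1, C2, C3

def lq_score(x, mu_inf, sigma_inf, C1, C2, C3):
    """S_n(Y_n) = C1*Y_n + C2*Y_n^2 - C3 applied to centered/scaled observations."""
    y = (np.asarray(x, dtype=float) - mu_inf) / sigma_inf
    return C1 * y + C2 * y**2 - C3

if __name__ == "__main__":
    C1, C2, C3 = lq_score_coefficients(q=0.9, delta=0.6)
    print("C1, C2, C3 =", C1, C2, C3)
    print(lq_score([0.0, 1.0, 3.0], mu_inf=0.0, sigma_inf=1.0, C1=C1, C2=C2, C3=C3))
```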
We estimated the connections birth rate average and variance for legitimate traffic and for attack traffic; in both cases, we used the usual sample mean to estimate the average and the usual sample variance to estimate the variance. For legitimate traffic, the average is about μ_∞ = 66.9 and 40.7 connections per 20 msec for the first and second population, respectively, and the standard deviation is in the neighborhood of σ_∞ = 17.4 and 9.7 connections per 20 msec. For attack traffic, the corresponding numbers are μ = 68.7 and 51.0, and σ = 15.0 and 13.8. We can now see the effect of the attack: it leads to a considerable increase in the mean and the standard deviation of the connections birth rate in the affected population.

We now perform a basic statistical analysis of the connections birth rate distribution. Figure 5.2 shows the empirical densities of the aggregate connections birth rate for legitimate and attack traffic. It so happens that, for the given data, attack traffic appears to resemble a Gaussian process, whereas for legitimate traffic the distribution is not as close to Gaussian.

Figure 5.2. SYN flood attack: connections birth rate pdf with a Gaussian fit, (a) legitimate (pre-attack) traffic and (b) attack traffic.

We have implemented the score-based single-population and multi-population SR procedures with the linear-quadratic score (5.1). When choosing the design parameters we assume the Gaussian model for pre-attack traffic, which agrees with the conclusions drawn above from the basic statistical analysis of the data. Thus, the parameters C_1, C_2, and C_3 are chosen according to formulas (5.2) with q_0 = q ≈ 0.9 and, to allow for detection of fainter attacks, δ_0 ≈ 0.6 (versus the estimated attack value δ ≈ 1.1). We set the detection thresholds so as to ensure the same level of the ARL, at approximately 500 samples (i.e., 10 sec), for both the single- and the multi-population procedure. The thresholds are estimated using Monte Carlo simulations assuming the empirical pre-change distribution learned from the data. Specifically, we took 10⁵ samples from the empirical pre-change distribution and simulated the behavior of the respective detection statistics and procedures while adjusting the thresholds until observing the desired ARL.

The detection process is illustrated in Figures 5.3 and 5.4.

Figure 5.3. SYN flood attack: a long run of the SR procedure; logarithm of the SR statistic vs. time.

Figure 5.4. Detection of the SYN flood attack by the single- and multi-population procedures: (a) by the SR procedure; (b) by the CUSUM procedure.

Figure 5.3 shows a relatively long run (taking into account the 20 msec sampling rate) of the SR statistic with several false alarms and then the true detection of the attack with a very small detection delay (at the expense of raising many false alarms along the way). Recall that the whole idea of this work is to set the detection thresholds low enough to detect attacks very quickly, with minimal delays, which unavoidably leads to multiple false alarms before the attack starts. These false alarms should be filtered by a specially designed algorithm, as has been suggested in, e.g., Pollak and Tartakovsky (2009a).
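The threshold-calibration step described above is straightforward to reproduce. The following is a minimal sketch of it: draw pre-change samples (here from a synthetic placeholder standing in for the empirical pre-change distribution learned from the data), run the score-based SR statistic, and search over the threshold until the estimated ARL is near the target. The score coefficients are those computed in the previous sketch for q = 0.9, δ = 0.6; all numbers and names are illustrative.

```python
# Sketch: Monte Carlo calibration of the detection threshold A for a target ARL.
import numpy as np

def run_length(scores, A):
    """First n with R~_n >= A for the score-based SR recursion (len(scores) if no alarm)."""
    r = 0.0
    for n, s in enumerate(scores, start=1):
        r = (1.0 + r) * np.exp(s)
        if r >= A:
            return n
    return len(scores)           # truncated run; large-A estimates are therefore conservative

def estimate_arl(training, score_fn, A, runs=50, horizon=5000, rng=None):
    rng = np.random.default_rng(rng)
    rl = [run_length(score_fn(rng.choice(training, size=horizon, replace=True)), A)
          for _ in range(runs)]
    return float(np.mean(rl))

if __name__ == "__main__":
    rng = np.random.default_rng(5)
    training = rng.normal(66.9, 17.4, 10_000)    # placeholder for the learned pre-change sample
    C1, C2, C3 = 0.486, 0.095, 0.251             # (5.2) with q0 = 0.9, delta0 = 0.6 (previous sketch)
    score = lambda x: C1 * (x - 66.9) / 17.4 + C2 * ((x - 66.9) / 17.4) ** 2 - C3
    for A in (50.0, 200.0, 800.0):               # simple grid search over the threshold
        print("A =", A, " estimated ARL ~", estimate_arl(training, score, A, rng=6))
```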
Figure 5.4(a) shows the behavior of the logarithm of the SR statistic shortly before the attack and from the moment the attack starts until its detection, which happens when the statistic crosses the threshold. Figure 5.4(b) shows the same for the multi-population SR statistic. We see that both procedures successfully detect the attack with very small delays, though at the expense of raising false alarms along the way, as shown in Figure 5.3 and discussed above. For both procedures we observed approximately 4 false alarms per 1,000 samples. The detection delay for the repeated single-population SR procedure is roughly 0.103 seconds (or 5 samples), and for the multi-population procedure the delay is about 0.041 seconds (or 2 samples). Thus, the multi-population SR procedure appears to be the better alternative.

5.3 Conclusion

We addressed the problem of rapid anomaly detection in computer network traffic. Approaching the problem statistically, namely, as one of sequential change-point detection, we proposed a new anomaly detection method. The method is based on the multi-cyclic (repeated) Shiryaev–Roberts detection procedure in which the likelihood ratio is replaced with the linear-quadratic score. This is done because in real-world network security applications both pre-attack and post-attack distributions differ from hypothesized distributions such as Gaussian or Poisson. Like many change-point detection schemes, our method is computationally inexpensive and easy to implement. However, what distinguishes the SR procedure is its exact multi-cyclic optimality in the simple change detection problem where the densities are known, a property that such techniques as the SPRT, the CUSUM chart, or the EWMA scheme lack. We further enhanced this approach and demonstrated that the score-based multi-population SR algorithm outperformed the standard SR procedure in the multi-cyclic setting.

Reference List

Kendall Atkinson and Weimin Han. Theoretical Numerical Analysis: A Functional Analysis Framework, volume 39 of Texts in Applied Mathematics. Springer, third edition, 2009. ISBN 978-1-4419-0457-7. doi: 10.1007/978-1-4419-0458-4.

Michèle Basseville and Igor V. Nikiforov. Detection of Abrupt Changes: Theory and Application. Prentice Hall, Englewood Cliffs, 1993. ISBN 0-13-126780-9.

D. Brook and D. A. Evans. An approach to the probability distribution of CUSUM run length. Biometrika, 59(3):539–549, 1972. doi: 10.1093/biomet/59.3.539.

Charles W. Champ and Steven E. Rigdon. A comparison of the Markov chain and the integral equation approaches for evaluating the run length distribution of quality control charts. Communications in Statistics – Simulation and Computation, 20(1):191–204, 1991. doi: 10.1080/03610919108812948.

Acheson J. Duncan. The economic design of x̄ charts used to maintain current control of a process. Journal of the American Statistical Association, 51(274):228–242, June 1956. doi: 10.1080/01621459.1956.10501322.

Juanita Ellis and Tim Speed. The Internet Security Guidebook: From Planning to Deployment. Academic Press, 2001.

Eugene A. Feinberg and Albert N. Shiryaev. Quickest detection of drift change for Brownian motion in generalized Bayesian and minimax settings. Statistics & Decisions, 24(4):445–470, 2006. doi: 10.1524/stnd.2006.24.4.445.

G. Fellouris and G. V. Moustakides. Decentralized sequential change detection. Submitted, 2013.

Meyer Abraham Girschick and Herman Rubin. A Bayes approach to a quality control model.
Annals of Mathematical Statistics, 23(1):114–125, 1952. doi: 10.1214/aoms/ 1177729489. Chuanching Ho and Kenneth Case. Economic design of control charts: A literature review for 1981-1991. Journal of Quality Technology, 26(1):39–53, 1994. 71 Leonid V . Kantorovich and Vladimir I. Krylov. Approximate Methods of Higher Analy- sis. Interscience Publishers, third edition, 1958. Translated from Russian into English by Curtis D. Benster. Ron S. Kenett and Shelemyahu Zacks. Modern Industrial Statistics: Design and Control of Quality and Reliability. Duxbury Press, first edition, 1998. Sven Knoth. Frontiers in Statistical Quality Control 8, chapter The Art of Evaluating Monitoring Schemes – How to Measure the Performance of Control Charts?, pages 74–99. Physica-Verlag HD, 2006. doi: 10.1007/3-7908-1687-6 5. N. Kryloff and N. Bogol ˘ iubov. La solution approch´ ee de probl` eme de Dirichlet. In Comptes Rendus de l’Acad´ emie des Sciences de l’URSS, volume 11, pages 283–288, May 1929. Gary Lorden. Procedures for reacting to a change in distribution. Annals of Mathemat- ical Statistics, 42(6):1897–1908, 1971. doi: 10.1214/aoms/1177693055. Thomas J. Lorenzen and Lonnie C. Vance. The economic design of control charts: A unified approach. Technometrics, 28(1):3–10, February 1986. James M. Lucas and Ronald B. Crosier. Fast initial response for CUSUM quality-control schemes: Give your CUSUM a head start. Technometrics, 24(3):199–205, August 1982. doi: 10.2307/1271440. Y . Mei. Quickest detection in censoring sensor networks. In IEEE International Sym- posium on Information Theory, pages 2148–2152, 2011. Douglas C. Montgomery. The economic design of control charets: a review and litera- ture survey. Journal of Quality Technology, 12(2):75–87, April 1980. Douglas C. Montgomery. Introduction to Statistical Quality Control. Wiley, seventh edition, 2012. George V . Moustakides. Optimal stopping times for detecting changes in distributions. Annals of Statistics, 14(4):1379–1387, 1986. George V . Moustakides, Aleksey S. Polunchenko, and Alexander G. Tartakovsky. Numerical comparison of CUSUM and Shiryaev–Roberts procedures for detecting changes in distributions. Communications in Statistics – Theory and Methods, 38 (16):3225–3239, 2009. doi: 10.1080/03610920902947774. George V . Moustakides, Aleksey S. Polunchenko, and Alexander G. Tartakovsky. A numerical approach to performance analysis of quickest change-point detection pro- cedures. Statistica Sinica, 21(2):571–596, April 2011. 72 Ewan S. Page. Continuous inspection schemes. Biometrika, 41(1):100–115, 1954. doi: 10.1093/biomet/41.1-2.100. Moshe Pollak. Optimal detection of a change in distribution. Annals of Statistics, 13(1): 206–227, 1985. doi: 10.1214/aos/1176346587. Moshe Pollak. Average run lengths of an optimal method of detecting a change in dis- tribution. Annals of Statistics, 15(2):749–779, 1987. doi: 10.1214/aos/1176350373. Moshe Pollak and Alexander G. Tartakovsky. Optimality properties of the Shiryaev– Roberts procedure. Statistica Sinica, 19(4):1729–1739, 2009a. Moshe Pollak and Alexander G. Tartakovsky. Asymptotic exponentiality of the distri- bution of first exit times for a class of markov processes with applications to quickest change detection. Theory of Probability and Its Applications, 53(3):430–442, 2009b. Aleksey S. Polunchenko and Alexander G. Tartakovsky. On optimality of the Shiryaev– Roberts procedure for detecting a change in distribution. Annals of Statistics, 38(6): 3445–3457, 2010. doi: 10.1214/09-AOS775. Aleksey S. 
Polunchenko and Alexander G. Tartakovsky. State-of-the-art in sequential change-point detection. Methodology and Computing in Applied Probability, 14(3): 649–684, 2012. doi: 10.1007/s11009-011-9256-5. Aleksey S. Polunchenko, Alexander G. Tartakovsky, and Nitis Mukhopadhyay. Nearly optimal change-point detection with an application to cybersecurity. Sequential Anal- ysis, 31(3):409–435, 2012. doi: 10.1080/07474946.2012.694351. Aleksey S. Polunchenko, Grigory Sokolov, and Wenyu Du. On efficient and reliable performance evaluation of the Generalized Shiryaev–Roberts change-point detection procedure. In Proceedings of the 56-th Moscow Institute of Physics and Technology Annual Scientific Conference, Moscow, Russia, November 2013a. Aleksey S. Polunchenko, Grigory Sokolov, and Alexander G. Tartakovsky. Optimal design and analysis of the Exponentially Weighted Moving Average chart for expo- nential data. Sri Lankan Journal of Applied Statistics, 2013b. (accepted). Aleksey S. Polunchenko, Grigory Sokolov, and Wenyu Du. Efficient performance eval- uation of the generalized shiryaevroberts detection procedure in a multi-cyclic setup. Applied Stochastic Models in Business and Industry, 2014a. (accepted). Aleksey S. Polunchenko, Grigory Sokolov, and Wenyu Du. An accurate method for determining the pre change Run Length distribution of the Generalized Shiryaev– Roberts detection procedure. Sequential Analysis, 33(1):1–23, 2014b. doi: 10.1080/ 07474946.2014.856642. 73 H. Vincent Poor and Olympia Hadjiliadis. Quickest Detection. Cambridge University Press, 2008. S.W. Roberts. Control chart tests based on geometric moving averages. Technometrics, 1(3):239–250, August 1959. doi: 10.1080/00401706.1959.10489860. S.W. Roberts. A comparison of some control chart procedures. Technometrics, 8(3): 411–430, August 1966. Thomas P. Ryan. Statistical Methods for Quality Improvement. Wiley, third edition, 2011. doi: 10.1002/9781118058114. Walter A. Shewhart. The application of statistics as an aid in maintaining quality of a manufactured product. Journal of the American Statistical Association, 20(152): 546–548, 1925. doi: 10.1080/01621459.1925.10502930. Walter Andrew Shewhart. Economic Control of Quality of Manufactured Product. Bell Telephone Laboratories series. D. Van Nostrand Company, Inc., Princeton, NJ, 1931. Albert N. Shiryaev. The problem of the most rapid detection of a disturbance in a stationary process. Soviet Math. Dokl., 2:795–799, 1961. Translation from Dokl. Akad. Nauk SSSR 138:1039–1042, 1961. Albert N. Shiryaev. On optimum methods in quickest detection problems. Theory of Probability and Its Applications, 8(1):22–46, January 1963. doi: 10.1137/1108002. Albert N. Shiryaev. Optimal Stopping Rules. Springer-Verlag, New York, 1978. Albert N. Shiryaev. Probability, volume 95 of Graduate Texts in Mathematics. Springer- Verlag, New York, second edition, 1995. Albert N. Shiryaev. Mathematical Finance – Bachelier Congress 2000, chapter Quick- est Detection Problems in the Technical Analysis of the Financial Data, pages 487–521. Springer Finance. Springer Berlin Heidelberg, 2002. doi: 10.1007/ 978-3-662-12429-1 22. Albert N. Shiryaev and Pavel Y . Zryumov. Optimality and Risk – Modern Trends in Mathematical Finance, chapter On the Linear and Nonlinear Generalized Bayesian Disorder Problem (Discrete Time Case), pages 227–236. Springer Berlin Heidelberg, 2010. doi: 10.1007/978-3-642-02608-9 12. D. Siegmund and Y . Xie. Sequential multi-sensor change-point detection. 
The Annals of Statistics, 41(2):670–692, 2013. David Siegmund. Sequential Analysis: Tests and Confidence Intervals. Springer Series in Statistics. Springer-Verlag, New York, 1985. ISBN 0387961348. doi: 10.1007/ 978-1-4757-1862-1. 74 M.S. Srivastava and Yanhong Wu. Comparison of EWMA, CUSUM and Shiryayev– Roberts procedures for detecting a shift in the mean. Annals of Statistics, 21(2): 645–670, 1993. doi: 10.1214/aos/1176349142. Alexander G. Tartakovsky and George V . Moustakides. State-of-the-art in Bayesian changepoint detection. Sequential Analysis, 29(2):125–145, April 2010. doi: 10. 1080/07474941003740997. Alexander G. Tartakovsky and Aleksey S. Polunchenko. Minimax optimality of the Shiryaev–Roberts procedure. In Proceedings of the 5th International Workshop on Applied Probability, Universidad Carlos III of Madrid, Spain, July 2010. Alexander G. Tartakovsky and Venugopal V . Veeravalli. Change-point Detection in Multichannel and Distributed Systems with Applications, chapter 17, pages 339–370. Marcel Dekker, Inc., New York, 2004. Alexander G. Tartakovsky, Boris L. Rozovskii, and Khoshboo Shah. A nonparametric multichart CUSUM test for rapid intrusion detection. In Proceedings of the 2005 Joint Statistical Meetings, Minneapolis, MN, August 2005a. Alexander G. Tartakovsky, Boris L. Rozovskii, and Khoshboo Shah. A nonparametric multichart CUSUM test for rapid intrusion detection. In Proceedings of the 2005 Joint Statistical Meetings, Minneapolis, MN, August 2005b. Alexander G. Tartakovsky, Boris L. Rozovskii, Rudolf B. Blaˇ zek, and Hongjoong Kim. Detection of intrusions in information systems by sequential changepoint methods (with discussion). Statistical Methodology, 3(3):252–340, 2006a. doi: 10.1016/j. stamet.2005.05.003. Alexander G. Tartakovsky, Boris L. Rozovskii, Rudolf B. Blaˇ zek, and Hongjoong Kim. Detection of intrusions in information systems by sequential changepoint methods (with discussion). Statistical Methodology, 3(3):252–340, 2006b. doi: 10.1016/j. stamet.2005.05.003. Alexander G. Tartakovsky, Boris L. Rozovskii, Rudolf B. Blaˇ zek, and Hongjoong Kim. A novel approach to detection of intrusions in computer networks via adaptive sequential and batch-sequential change-point detection methods. IEEE Transactions on Signal Processing, 54(9):3372–3382, 2006c. doi: 10.1109/TSP.2006.879308. Alexander G. Tartakovsky, Aleksey S. Polunchenko, and George V . Moustakides. Design and comparison of Shiryaev–Roberts- and CUSUM-type change-point detec- tion procedures. In Proceedings of the 2nd International Workshop in Sequential Methodologies, University of Technology of Troyes, Troyes, France, June 2009. 75 Alexander G. Tartakovsky, Moshe Pollak, and Aleksey S. Polunchenko. Third-order asymptotic optimality of the Generalized Shiryaev–Roberts changepoint detection procedures. Theory of Probability and Its Applications, 56(3):457–484, 2012. doi: 10.1137/S0040585X97985534. Alexander G. Tartakovsky, Aleksey S. Polunchenko, and Grigory Sokolov. Efficient computer network anomaly detection by changepoint detection methods. IEEE Journal of Selected Topics in Signal Processing, 7(1):4–11, February 2013. doi: 10.1109/JSTSP.2012.2233713. Gautam Thatte, Urbashi Mitra, and John Heidemann. Detection of low-rate attacks in computer networks. In Proceedings of the 11th IEEE Global Internet Symposium, pages 1–6, Phoenix, AZ, April 2008. doi: 10.1109/INFOCOM.2008.4544638. Gautam Thatte, Urbashi Mitra, and John Heidemann. Parametric methods for anomaly detection in aggregate traffic. 
IEEE/ACM Transactions on Networking, 19(2):512– 525, April 2011. doi: 10.1109/TNET.2010.2070845. Abraham Wald. Sequential Analysis. J. Wiley & Sons, Inc., New York, 1947. William H. Woodall. The distribution of the run length of one-sided CUSUM procedures for continuous random variables. Technometrics, 25(3):295–301, August 1983. doi: 10.1080/00401706.1983.10487883. William H. Woodall and Benjamin M. Adams. Handbook of Statistical Methods for Engineers and Scientists, chapter Statistical Process Control. McGraw-Hill Publish- ing, Inc., New York, NY , second edition, 1998. Michael Woodroofe. Nonlinear Renewal Theory in Sequential Analysis. Society for Industrial and Applied Mathematics, Philadelphia, PA, 1982. ISBN 0-89871-180-0. Emmanuel Yashchin. Some aspects of the theory of statistical control schemes. IBM Journal of Research and Development, 31(2):199–205, March 1987. doi: 10.1147/rd. 312.0199. Emmanuel Yashchin. Statistical control schemes: Methods, applications and general- izations. International Statistical Review, 61(1):41–66, 1993. 76
Abstract

The main aim of this work is to generalize the classical single-population sequential change-point detection to a more general multi-population case, and to design procedures that retain the optimality properties of their single-population counterparts. Specifically, we propose an alternative performance criterion, better suited for multi-sensor applications.

Second, we propose a numerical method to study the operating characteristics of a wide class of multi-population procedures. The proposed method is based on the integral-equations approach and uses the collocation technique with the basis functions chosen so as to exploit a certain change-of-measure identity and the SR detection statistic's unique martingale property. Moreover, the design and analysis of the method are complete in that the question of the method's accuracy is properly addressed, providing error bounds and convergence rates.

We then discuss the problem of sequential change-point detection in distributed multi-sensor networks. Specifically, we assume that the sensors need to communicate with the fusion center at a much smaller rate than the rate at which they collect their observations. Additionally, we assume that the sensors can transmit only a small number of bits per communication. We propose a decentralized sequential detection rule that requires minimal transmission activity from the sensors and is asymptotically optimal for any scenario regarding the unknown subset of affected sensors.

We conclude with a case study: we consider the problem of efficient intrusion or traffic anomaly detection in large-scale high-speed computer networks, and show that the proposed procedures are highly effective and advantageous over conventional ones.