USC Computer Science Technical Reports, no. 855 (2005)
Change Detection in Time Series Data Using Wavelet Footprints

Mehdi Sharifzadeh, Farnaz Azmoodeh and Cyrus Shahabi
Computer Science Department, University of Southern California
Los Angeles, CA 90089-0781
[sharifza, azmoodeh, shahabi]@usc.edu

ABSTRACT

Detecting changes in time series data is an important data analysis task with applications in various scientific domains. In this paper, we propose a novel approach to the problem of change detection in time series data which finds both the amplitude and the degree of changes. Our approach is based on wavelet footprints, proposed originally by the signal processing community for signal compression. We, however, exploit the properties of footprints to efficiently capture discontinuities in a signal. We show that transforming time series data using a footprint basis up to degree D generates nonzero coefficients only at the change points of degree up to D. Exploiting this property, we propose a novel change detection query processing scheme which employs footprint-transformed data to identify change points, their amplitudes, and their degrees of change efficiently and accurately. We also present two methods for exact and approximate transformation of the data. Our analytical and empirical results with both synthetic and real-world data show that our approach outperforms the best known change detection approach in terms of both performance and accuracy. Furthermore, unlike the state-of-the-art approaches, our query response time is independent of the number of change points in the data and of the user-defined change threshold.

1. INTRODUCTION

Time series data are generated, maintained, and processed within a broad range of application domains in different fields such as economics, meteorology, or sociology. Moreover, recent advances in the manufacturing of modern sensory devices have led several applications to utilize these sensors towards a better understanding of the physical world.
These sensors, when deployed in an environment, generate large amounts of measurement data streams which can be stored as time series data. Mining such time series data becomes vital as the applications demand an understanding of the underlying processes/phenomena that generate the data. There has been an explosion of interest within the data mining community in indexing, segmenting, clustering, and classifying time series [13, 14, 15, 16]. A specific interesting mining task is to detect change points in a given time series [21, 12, 23, 7, 8]. These are the time positions in the original data where the local trend in the data values has changed. They may indicate the points in time when external events have caused the underlying process to behave differently. The problem of detecting change in time series has mostly been studied in the class of segmentation problems [14, 6], where each portion of the data is modelled by a known function. Subsequently, change points are defined as the points in the data where two adjacent segments of the time series are connected. However, there are real-world applications in which only the position of the change is required and not the fitting functions.

For the past year, we have been working with ChevronTexaco on mining real data generated during oil well tests. This is a real-world petroleum engineering application studied within the USC Center of Excellence for Research and Academic Training on Interactive Smart Oilfield Technologies (CiSoft). Petroleum engineers deploy sensors in oil wells to monitor different characteristics of the underlying reservoir. Here, the underneath pressure values measured by sensors form a time series. When the second derivative in the pressure vs. time plot becomes fixed (i.e., a radial flow event in their terminology), they estimate the "permeability" of the reservoir [11]. At the same time, they would like to know if the first derivative is changing.
To us, the points where the second derivative becomes fixed are the positions in the pressure time series where a change of degree 1 or 2 occurs. In this paper we focus on identifying both the change points and their degrees in time series data. While the definition of change is highly application-specific, we focus on points where discontinuities occur in the data or in any of its ith derivatives. Moreover, we consider the notion of degree of change as the degree of the changing derivative at the change point. This general definition of change has been broadly used in many scientific application areas such as petroleum engineering [11]. However, its significance has been ignored within the data mining community.

We propose a novel, efficient approach to find change points in time series data. Our approach utilizes wavelet footprints, a new family of wavelets recently introduced by the signal processing community for signal compression and denoising [5]. While footprints are defined to address a different problem in a different context, we exploit their interesting properties that make them a powerful data analysis tool for our change detection problem. Our contribution starts with employing the idea of wavelet footprints in the context of a data mining problem. This is yet another example of adapting signal processing techniques for the purpose of data mining, which started with Vitter et al. proposing the use of wavelets in answering OLAP range-sum queries [19], and Chakrabarti et al. using multi-dimensional wavelets for general approximate query processing [1].

We show that footprints efficiently capture discontinuities of any degree in the time series data by gathering the change information in the corresponding coefficients. Motivated by this property, we make the following additional contributions:

• We propose two database-friendly methods to transform the time series data using footprints up to degree D.
These methods enable us to detect all the change points of degree 0 to degree D, and their corresponding amplitudes and degrees of change. While our lazy method only approximates the changes, our absolute method computes the exact changes together with their degrees and amplitudes. To the best of our knowledge, this is the first change detection approach that captures all the above parameters at the same time.

• Once we transform the data using footprints, our methods can work with any user-defined threshold value. That is, there is no need to rerun our algorithms each time the user-defined threshold value changes; we answer any new query via a single scan over the transformed data to return the coefficients greater than the user threshold. This is a considerable improvement over the best change detection algorithms, which are highly dependent on this threshold value.

• Both analytically and empirically, we show that our query processing schemes significantly outperform the state-of-the-art change detection methods in terms of performance. Employing either of our transformation methods, our query response time is independent of the number of change points in the data. Meanwhile, both methods demonstrate a dramatic increase in accuracy.

The remainder of the paper is organized as follows. Section 2 reviews the current data mining research on change detection in time series data. Section 3 provides the background on linear algebra and wavelet theory. In Section 4, we describe the idea of using footprints to capture discontinuities in piecewise linear time series. Section 5 generalizes the concept of footprints of degree zero to provide a formal definition for wavelet footprints of arbitrary degrees. We propose our change detection approach and the lazy and absolute methods for footprint transformation in Section 6. In Section 7, we show how our footprint-based approach can be incorporated within systems where time series data is stored in the wavelet domain.
Section 8 includes our experimental results, and Section 9 discusses the conclusion and our future plans.

2. RELATED WORK

Change detection in time series has recently received considerable attention in the field of data mining. Change detection has also been studied for a long time in the statistics literature, where the main purpose is to first find the number of change points and then identify the stationary model that fits the dataset based on the number of change points.

In the data mining literature, change detection has mainly been studied in time series segmentation problems. Most of these studies use linear interpolation to approximate the signal with a series of best fitting lines and return the endpoints of the segments as change points in the time series. However, there are many examples of real-world time series for which fitting a linear model is inappropriate. For example, Puttagunta et al. [21] use incremental LSR to detect the change points and outlier points with the assumption that the data can be fit with linear models. Also, Keogh et al. [12] use probabilistic methods to fit the data with linear segments in order to find patterns in time series.

Yamanishi et al. [23] reduce the problem of change point detection in time series to that of outlier detection from time series of moving-averaged scores. Although they consider a wide group of models for fitting the data, their approach is time intensive and hardly scales to high-volume datasets. Ge et al. [7] extend hidden semi-Markov models for change detection. Their solution is applicable to different data distributions using different regression functions; however, it is not scalable to large datasets due to its time complexity.

Guralnik et al. [8] suggest using a maximum likelihood technique to find the best function to fit the data in each segment. Their method is mainly based on the trade-off between the data fit quality and the number of estimated changes.
They also consider a wider group of curve fitting functions; however, they do not consider the possible disagreement among different human observers on the actual change points. Also, their approach lacks flexibility in the sense that they have to rerun the algorithm for different change thresholds asserted by the user.

The method described in this paper is most similar to the work done by Guralnik et al. in [8]; however, we return all the possible change points for several polynomial curve fitting functions, so the users then have the flexibility to focus only on the interesting change points; for example, only the change points found by using quadratic and linear models. Moreover, after we find the change points once, there is no need to rerun the algorithm for different change thresholds asserted by the user.

3. PRELIMINARIES

We consider a time series X_n of size n as a vector (x_1, ..., x_n) where each x_i is a real number (i.e., x_i ∈ R). Given F, a class of functions (e.g., polynomials), one can find the piecewise segmented function X : [1, n] → R that models the time series X_n as follows:

    X(t) = P_1(t) + e_1(t),          1 ≤ t ≤ θ_1
           P_2(t) + e_2(t),          θ_1 < t ≤ θ_2
           ...
           P_{K+1}(t) + e_{K+1}(t),  θ_K < t ≤ n        (1)

Each function P_i is a member of class F that best fits the corresponding segment of the data in X_n, and each e_i(t) is the amount of error introduced when fitting the data with F. Our ultimate goal is to identify θ_1, ..., θ_K where the P_i's are not known a priori. We refer to these points as the change points in the data, where discontinuities occur in the data or its derivatives. We use change point and discontinuity interchangeably in this paper. For example, Figure 1 illustrates a time series with one change point when modelled with quadratic polynomials. In this case P_1(t) is t^2 + 10, P_2(t) is 6t^2 − 1990, and θ_1 is 20.

Throughout the paper, F is the class of polynomial functions of maximum degree D.
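As an illustration, the example of Figure 1 can be generated with a minimal sketch; the segment polynomials and θ_1 = 20 are taken from the text above, while the series length n = 40 is our own arbitrary choice:

```python
def piecewise_series(n=40, theta=20):
    """Generate the example time series of Figure 1:
    P1(t) = t^2 + 10 for 1 <= t <= theta, P2(t) = 6t^2 - 1990 afterwards."""
    series = []
    for t in range(1, n + 1):
        if t <= theta:
            series.append(t * t + 10)        # P1(t)
        else:
            series.append(6 * t * t - 1990)  # P2(t)
    return series

x = piecewise_series()
# theta_1 = 20 is a change point of degrees 0 and 2: the constant and
# quadratic coefficients jump between segments, the linear one stays 0.
```

Here the fitting errors e_i(t) are zero, so the discontinuity at θ_1 is exact.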
That is, each P_i(t) in Equation 1 can be represented as

    P_i(t) = p_{i,D} t^D + p_{i,D−1} t^{D−1} + ... + p_{i,2} t^2 + p_{i,1} t + p_{i,0}

We call a change point θ_i a change point of degree j if the corresponding coefficients p_{i,j} and p_{i+1,j} differ in the polynomial representations of P_i(t) and P_{i+1}(t). Notice that θ_i is a change point of all degrees j where we have p_{i,j} ≠ p_{i+1,j}. For example, θ_1 = 20 is a change point of degrees 0 and 2 in Figure 1. This is because p_{1,2} = 1 and p_{1,0} = 10 while p_{2,2} = 6 and p_{2,0} = −1990.

3.1 Linear Algebra

In this section, we present some background linear algebraic definitions. We use these definitions in Section 4 when we discuss transforming time series with wavelet footprints.

Definition 1. A finite basis B for a vector space R^d is a set of vectors B_i ∈ R^d (i.e., B = {B_1, B_2, ..., B_n} where n = d) such that any vector V ∈ R^d can be written as a linear combination of the B_i's, i.e., V = Σ_{i=1}^{n} c_i B_i.

Note that given a basis B, the set of coefficients c_i is unique for a vector V. However, if the number of vectors in B is greater than d (i.e., n > d), then the vector V can be represented as a linear combination of the B_i's in an infinite number of ways. We call such a basis B where n > d an overcomplete basis for R^d.

Definition 2. Suppose B = {B_1, B_2, ..., B_n} is a finite basis for vector space R^d and there exists a basis B̃ = {B̃_1, B̃_2, ..., B̃_n} such that

    ⟨B_i, B̃_j⟩ = 1 if i = j, and 0 if i ≠ j        (2)

where ⟨X, Y⟩ denotes the inner product of vectors X and Y. The "unique" basis B̃ is known as the dual basis of B.

Definition 3. A basis B = {B_1, B_2, ..., B_n} is a biorthogonal basis if we have ⟨B_i, B̃_j⟩ = 0 for any B_i ≠ B_j and ⟨B_i, B̃_j⟩ = 1 otherwise.

Definition 4. A basis B = {B_1, B_2, ..., B_n} is an orthogonal basis if for any B_i ≠ B_j, we have ⟨B_i, B_j⟩ = 0. According to Definition 2, an orthogonal basis is the dual basis of itself (i.e., it is self-dual).

To find each coefficient c_i where 1 ≤ i ≤ n for a vector V given a basis B (as in Definition 1), we simply compute ⟨V, B̃_i⟩.
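Definitions 1 and 2 can be made concrete with a small sketch of our own (not from the paper): for a basis arranged as the rows of an invertible matrix B, the dual basis is given by the rows of (B⁻¹)ᵀ.

```python
# Dual basis of a (non-orthogonal) basis of R^2, per Definition 2:
# the rows of inv(B).T satisfy <B_i, Bdual_j> = 1 if i == j, else 0.
def inv2(m):
    """Inverse of a 2x2 matrix given as [[a, b], [c, d]]."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

def transpose(m):
    return [list(row) for row in zip(*m)]

def dot(u, v):
    return sum(x * y for x, y in zip(u, v))

B = [[1.0, 1.0], [0.0, 1.0]]    # a non-orthogonal basis of R^2
B_dual = transpose(inv2(B))      # rows are the dual basis vectors

# Coefficients of V in basis B are <V, Bdual_i> (Definitions 1 and 2).
V = [3.0, 5.0]
coeffs = [dot(V, bd) for bd in B_dual]   # -> [3.0, 2.0]
```

Indeed, 3·B_1 + 2·B_2 = (3, 5) reconstructs V, as Definition 1 requires.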
For orthogonal bases, due to their self-duality, c_i is computed by the inner product of V with the basis itself, i.e., ⟨V, B_i⟩. The basic idea of compression is to find the basis B and then, for each given vector V, only store the coefficients c_i. The main question is what is the best basis for a given application and dataset, such that several of the c_i's become zero or take negligible values. In our case, wavelet footprints result in c_i's that take nonzero values only if a change occurs in the vector V. The value of c_i then corresponds to the amount of change.

Figure 1: Time series with one quadratic change point

3.2 Wavelets

We develop the background on the wavelet transformation using an example. We use Haar wavelets to transform our example time series into the wavelet domain. Although Haar wavelets are the simplest and oldest members of the wavelet family, they capture the essential elements of the wavelet theory. Consider the time series X_8 = (0, 0, 0, 0, 0, 1, 1, 1). The transformation process starts by computing the pairwise averages and differences of the data. These computed values are multiplied by a normalization factor at each level (√2 for Haar) to produce two vectors of summary coefficients H_1 = (0, 0, 0.7, 1.4) and detail coefficients G_1 = (0, 0, −0.7, 0), respectively. This process continues by applying the same computation on the vector of summary coefficients to get H_2 = (0, 1.5) and G_2 = (0, −0.5). Finally, from H_2 we get H_3 = (1.06) and G_3 = (−1.06). The last summary coefficient (i.e., the single element of H_3) followed by all n − 1 (7) detail coefficients computed during log n (3) iterations form the transformed data. The total number of iterations needed to get the last summary vector of size one (e.g., H_3) is called the level of decomposition. Figure 2 shows the complete transformation on X_8.
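The level-by-level computation above can be sketched directly; normalized pairwise averages and differences amount to (a + b)/√2 and (a − b)/√2:

```python
import math

def haar_step(v):
    """One Haar level: normalized pairwise averages (summary)
    and pairwise differences (detail)."""
    s2 = math.sqrt(2.0)
    summary = [(v[i] + v[i + 1]) / s2 for i in range(0, len(v), 2)]
    detail  = [(v[i] - v[i + 1]) / s2 for i in range(0, len(v), 2)]
    return summary, detail

X8 = [0, 0, 0, 0, 0, 1, 1, 1]
H1, G1 = haar_step(X8)   # H1 ~ (0, 0, 0.7, 1.4), G1 ~ (0, 0, -0.7, 0)
H2, G2 = haar_step(H1)   # H2 ~ (0, 1.5),         G2 ~ (0, -0.5)
H3, G3 = haar_step(H2)   # H3 ~ (1.06,),          G3 ~ (-1.06,)
transformed = H3 + G3 + G2 + G1   # summary first, then coarse-to-fine details
```

Three levels (log 8 = 3) reproduce the coefficients quoted in the text.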
Note that both the transformation from the original representation to the wavelet representation and the re-transformation from the wavelet representation back to the original can be performed very efficiently in linear time.

We can conceptualize the process of wavelet transformation as the projection of the time series vector of size n onto n different vectors ψ_i, termed the wavelet basis vectors. Assume that |X| represents the length of a vector X. Then the wavelet transformation¹ of X_n is X̂_n = (x̂_1, ..., x̂_n) where x̂_i = ⟨X_n, ψ_i⟩ · (1/|ψ_i|), and the term 1/|ψ_i| is the normalization factor (notice that the Haar wavelet basis is orthogonal, and hence self-dual). Moreover, the time series X_n can be represented as the linear combination:

    X_n = Σ_{i=1}^{n} x̂_i ψ_i        (3)

¹ Throughout the paper, we assume that the size of the time series we work with is always a power of 2. This can be achieved in practice by padding the time series with zeroes.

Figure 2: Wavelet example (traditional view of wavelets)

Figure 3 shows the Haar wavelet basis vectors of size 8 as different rows of an 8×8 matrix:

     1  1  1  1  1  1  1  1
     1  1  1  1 −1 −1 −1 −1
     1  1 −1 −1  0  0  0  0
     0  0  0  0  1  1 −1 −1
     1 −1  0  0  0  0  0  0
     0  0  1 −1  0  0  0  0
     0  0  0  0  1 −1  0  0
     0  0  0  0  0  0  1 −1

Figure 3: Haar wavelet basis of size 8

In general, we identify the Haar wavelet basis vectors of size n as ψ_i where 1 ≤ i ≤ n. The first vector ψ_1 consists of n 1's. The remaining n − 1 vectors, corresponding to the detail coefficients, are defined as follows:

    ψ_{2^j + k + 1}(l) =  1   if k·N/2^j ≤ l ≤ k·N/2^j + N/2^{j+1} − 1
                         −1   if k·N/2^j + N/2^{j+1} ≤ l ≤ k·N/2^j + N/2^j − 1
                          0   otherwise        (4)

where 0 ≤ j ≤ log n (j is the level of decomposition), k = 0, ..., 2^j − 1, and 1 ≤ l ≤ n. We now define the term support interval that we will use throughout the paper.

Definition 5. Let X̂_n = (x̂_1, ..., x̂_n) be the wavelet transformation of the time series X_n.
The support interval of a wavelet coefficient x̂_i is the range of indices j ∈ [1, n] such that x̂_i is derived from the x_j's, i.e., the value of x̂_i is calculated from the x_j's. For example, the support interval of the first coefficient x̂_1 is the entire time series (i.e., [1, n]), while that of the last coefficient x̂_n is the last two elements of X_n (i.e., [n − 1, n]). We use Sup(x̂_i) to denote the support interval of coefficient x̂_i. Similarly, we use Sup⁻¹(j) to refer to the set of all wavelet coefficients which are derived from x_j, i.e., all x̂_i's such that x_j ∈ Sup(x̂_i).

4. FOOTPRINTS AT A GLANCE

Wavelets have been widely used in different data mining applications due to their power in capturing the trend of the data as well as their approximation property [17]. However, wavelets in their general form do not efficiently model discontinuities in the data. To illustrate the problem, consider the example of Figure 2. Although there exists only one discontinuity point, at the fifth position of our example time series X_8, we get three nonzero coefficients (other than the average) in the final transformed vector X̂_8. The reason is that there is a great amount of overlap among the support intervals of different coefficients at different levels. Figure 4 shows how the wavelet transform scatters the effect of a single discontinuity point among wavelet coefficients of different levels.

Figure 4: The top line shows the original data and the following lines are the wavelet representations at different levels. Notice how nonzero coefficients are scattered among different levels due to the overlapping support intervals.

Therefore, to benefit from the approximation power of wavelets and at the same time efficiently model the change points in the underlying data, a new form of basis is required. Dragotti et al. [3] introduce a new basis which removes the overlap among the support intervals of corresponding wavelet coefficients at different levels.
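For Haar coefficients ordered as above (summary first, then coarse-to-fine details), the support interval of Definition 5 has a closed form. A sketch, using our own 1-based index arithmetic (consistent with the two examples in the text, but not spelled out in the paper):

```python
import math

def support_interval(i, n):
    """Sup(x_hat_i) for the Haar transform of a length-n series (1-based i,
    n a power of 2). Coefficient 1 is the summary; details follow by level."""
    if i == 1:
        return (1, n)                      # summary coefficient covers everything
    j = math.floor(math.log2(i - 1))       # decomposition level of detail i
    k = (i - 1) - 2 ** j                   # position within that level
    block = n // (2 ** j)                  # support length at level j
    return (k * block + 1, (k + 1) * block)
```

For n = 8 this gives Sup(x̂_1) = [1, 8] and Sup(x̂_8) = [7, 8], matching the examples above; Sup⁻¹(j) could be obtained by scanning all i and testing membership.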
They call this basis wavelet footprints, or footprints for short, since footprints are basically the traces left by the discontinuities (i.e., singularities). We now explain the idea behind the footprints, assuming, for simplicity, piecewise constant data for now. Consider X_n with only one discontinuity at position θ:

    X_n(i) = a for 1 ≤ i ≤ θ, and b for θ < i ≤ n.

[…] is n × (D + 1), and hence the resulting basis is overcomplete.

• The footprints efficiently model discontinuities in time series data; a piecewise constant time series with K discontinuities can be represented with only K footprints together with the summary coefficients. Footprints contain the following information about the discontinuity point generating them:
1. The amplitude of the discontinuity.
2. The characteristics of the two polynomials right before and after the discontinuity point.
Also, in the piecewise polynomial case, a time series with K discontinuities of maximum degree D can be represented with the summary coefficients and K × (D + 1) footprint coefficients.

6. CHANGE DETECTION WITH FOOTPRINTS

We showed that nonzero coefficients in footprint-transformed time series data are representatives of the change points in the data. Therefore, a novel change detection approach emerges by employing footprints. Throughout this section, we assume that we have pre-computed the biorthogonal footprint basis F_D (and F̃_D) in terms of the ith discontinuity vectors and the summary vector of degree up to D, as we showed in Section 5. Note that this is a one-time process, independent of both the data and the queries. We would like to answer two major categories of change detection queries. These queries can be issued on a single time series or on a database of time series data:

• Q_d: Return change points of all degrees.
• Q_da: Return change points of all degrees, together with their corresponding degrees and change amplitudes.

Similar to any general SQL query, the user can enforce restrictions on the degree of a change point or on its change amplitude.
For example, the user can ask for change points of degree d where the change amplitude is greater than a threshold T. Our approach stores the time series data in the wavelet domain. That is, instead of the original data, its wavelet transformation is stored in the database. We use the footprint basis F_D as our wavelet basis. Our approach then answers change detection queries by returning the nonzero coefficients stored in its database. Figures 6a and 6b illustrate the process flow of our approach. We describe each part in detail.

6.1 Insert/Update

Upon receiving new data (i.e., a time series), we transform it using F_D and then store it in the database. Since transforming data using footprints may be a time-consuming task, we propose a lazy and an absolute method for the transformation in Sections 6.3.1 and 6.3.2, respectively. The lazy-transformed data does not contain exact information about the change amplitude and hence cannot be used to answer Q_da queries. However, the absolute-transformed data can be processed to answer both Q_d and Q_da queries. To update the transformed data, approaches such as Shift-Split [9] can be used to update the data stored in the wavelet domain efficiently.

Figure 6: a) Query processing in the wavelet domain and b) Ad hoc query processing

6.2 Query Processing

On receiving a change detection query on a specific portion of the data, we retrieve the nonzero coefficients corresponding to that portion of the data from the database. For each nonzero coefficient corresponding to footprint vector f_i^(d), we return a change of degree d at point i.
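The query step is a single filtered scan over the stored coefficients. A sketch under our own assumptions (coefficients stored as hypothetical (position, degree, value) triples; this storage layout is not specified in the paper):

```python
def change_points(coeffs, min_amplitude=0.0, degree=None):
    """Single O(n) scan over footprint-transformed coefficients.
    coeffs: list of (position, degree, value) triples. No re-transformation
    is needed when the user changes the threshold or the degree filter."""
    hits = []
    for pos, d, value in coeffs:
        if degree is not None and d != degree:
            continue                      # user restricted the degree
        if abs(value) > min_amplitude:    # user restricted the amplitude
            hits.append((pos, d, value))
    return hits

stored = [(20, 0, -2000.0), (20, 2, 5.0), (33, 1, 0.01)]
big_changes = change_points(stored, min_amplitude=1.0)  # drops the tiny degree-1 hit
quadratic = change_points(stored, degree=2)             # degree-2 changes only
```

Rerunning with a different threshold or degree touches only this scan, never the transformation.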
If the user is interested in changes greater than a given threshold, we return the coefficients greater than that threshold. The time complexity of our approach to change detection is O(n) for each time series of size n, since we only need a single scan over the data. The pre-transformation of the data trivially eliminates the need to restart the entire process whenever the user specifies a new degree and/or threshold for the change values. This makes our approach faster and more practical than other change detection approaches, whose entire algorithms are highly dependent on either the threshold value or the degree.

6.3 Footprint Transformation

The challenge here is to transform the data using the footprint basis. We propose two different methods for footprint transformation. The first, lazy method is mainly based on approximating the footprint coefficients by projecting the time series onto the dual basis of the footprint basis. This method is highly efficient in terms of performance, but it is not tolerant to the existence of high-amplitude noise. Also, the coefficients returned by this method are not the exact footprint coefficients, due to the overlap among the footprint basis vectors of different degrees. The second, absolute method is based on a greedy iterative algorithm termed matching pursuit [18], which is a proven approach in signal processing for representing signals in terms of an overcomplete basis. The outputs of both methods enable us to answer change detection queries by retrieving the nonzero coefficients and reporting their positions as change points. The absolute method has the extra advantage that it also returns the amplitude of change, since its coefficients are the exact footprint coefficients. Because of possible noise in the data, both methods may employ thresholds to select the nonzero coefficients.

6.3.1 The Lazy Footprint Transformation

We assume that F_D, the footprint basis of degree up to D, and its dual basis F̃_D are pre-computed. Note that the computation of the vectors of F_D is completely data-independent.
We would like to find the change points in a given time series X_n. The lazy method approximates the coefficients of X_n by simply computing α_i^(d) = ⟨X_n, f̃_i^(d)⟩ for all f_i^(d)'s in the basis. During query processing, it returns i as a change point if α_i^(d) is greater than the user-defined threshold in each footprint basis of degree d (see Section 6.2). The universal threshold u = σ√(2 ln N) suggested in [4] is an appropriate candidate for the threshold value.

Since our basis is overcomplete, the dependency among footprint vectors can result in false hits and false negatives. To reduce the effect of this dependency, we make the footprint vectors corresponding to each discontinuity point k locally orthogonal by applying orthogonalization techniques such as Gram-Schmidt [2] to the f_k^(d)'s. Notice that the coefficients computed by the lazy method are not the exact footprint coefficients, due to the overcompleteness of the basis. They only approximate the discontinuity points. This is the reason that the lazy-transformed data cannot be used to answer Q_da queries. However, in Section 8 we show that the lazy transformation performs very effectively for detecting the change points. For each time series of size n, the time complexity of the lazy transformation is O(n²). The reason is that this method requires projecting X_n onto each of the vectors of the footprint basis F_D, and the number of these vectors is O(n) where n is the size of the time series data.

6.3.2 The Absolute Footprint Transformation

As mentioned in Section 5.1, the footprint vectors constitute an overcomplete basis. This overcomplete basis gives us more power and flexibility in modelling changes in the data. As a drawback, transformation of the time series X_n (i.e., computing the coefficients α_i^(d)) becomes a more challenging task. Here, in order to compute the exact values of the α_i^(d)'s, we use the matching pursuit technique to find the nonzero α_i^(d) coefficients.
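The core of the technique is textbook matching pursuit [18]: greedily pick the dictionary atom with the largest projection, subtract its contribution, and repeat. A generic sketch (over an arbitrary unit-norm dictionary, not the footprint-specific variant, which subtracts all D + 1 footprints of the chosen position at once):

```python
def matching_pursuit(x, atoms, tol=1e-9, max_iters=100):
    """Greedy matching pursuit: atoms is a list of unit-norm vectors.
    Returns {atom_index: coefficient} for the selected components."""
    def dot(u, v):
        return sum(a * b for a, b in zip(u, v))

    residual = list(x)
    coeffs = {}
    for _ in range(max_iters):
        projections = [dot(residual, a) for a in atoms]
        j = max(range(len(atoms)), key=lambda i: abs(projections[i]))
        if abs(projections[j]) < tol:
            break                                     # residual is (near) zero
        coeffs[j] = coeffs.get(j, 0.0) + projections[j]
        residual = [r - projections[j] * a
                    for r, a in zip(residual, atoms[j])]
    return coeffs

# Orthonormal toy dictionary: pursuit recovers the exact coefficients.
e1, e2 = [1.0, 0.0], [0.0, 1.0]
c = matching_pursuit([3.0, -4.0], [e1, e2])
```

For an overcomplete dictionary the loop may revisit atoms, which is why a termination bound matters.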
The following iterative procedure computes the coefficients for time series X_n:

1. Find the projection of X_n on all vectors f_i^(d) in the footprint basis, i.e., compute all the coefficients α_i such that α_i = ⟨X_n, f_i^(d)⟩.
2. Let α_j = max(α_i); this is the coefficient corresponding to the highest-energy component of X_n.
3. Let X'_n = X_n − Σ_{d=0}^{D} ⟨X_n, f_j^(d)⟩ · f_j^(d), where f_j^(d) is the footprint corresponding to α_j.
4. If all the values in X'_n are zero (or less than a predefined threshold in the case of noisy data), exit the procedure.
5. Let X_n = X'_n and go to 1.

The set of coefficients selected by step 2 of the above procedure forms the nonzero coefficients of the transformation of time series X_n using the footprint basis of degree d. Therefore, their corresponding positions are change points of degree d in the time series X_n. Notice that the coefficients computed by the absolute approach are the exact coefficients of the footprint transformation. We can modify matching pursuit such that the algorithm terminates after a maximum of K² iterations, where K is the number of change points in X_n. Hence, the overall time complexity of the absolute transformation becomes O(K²·n²).

An important advantage of the absolute method over the lazy method is that with the former we get a) all the exact change points of degrees 0 to D, and b) the exact amplitude of the change at each change point. This enables us to answer both Q_d and Q_da queries, which focus on the amplitude and degree of change at each point. For example, the absolute method can return the change points with value higher than 10 and change of degree 3 in a given time series.

6.4 Ad Hoc Query Processing

If the data cannot be stored in the wavelet domain, we must transform it in real time when we receive a query. We choose between the lazy and absolute methods based on the type of query.
The time complexity of this ad hoc change detection approach is equal to the time complexity of the transformation using footprints, which is O(n²) (see Sections 6.3.1 and 6.3.2).

7. CUSTOMIZING FOOTPRINTS FOR WAVELET-BASED APPLICATIONS

In the previous sections, we developed a novel approach for change detection based on the transformation of the data using wavelet footprints. In this section we show that our approach can be incorporated within systems where the time series data is maintained in the wavelet domain (e.g., ProDA [10]). That is, instead of the original data, its wavelet transformation is stored.

An example of an approach dealing with the data directly in the wavelet domain is ProPolyne, introduced in [22]. ProPolyne is a wavelet-based technique for answering polynomial range-aggregate queries. It uses the transformed data in the wavelet domain to generate the result. We show that ProPolyne's approach to answering polynomial range-aggregate queries is still feasible when the data is transformed using a footprint wavelet basis instead of general wavelet bases such as Haar.

With ProPolyne, a polynomial range-aggregate query (e.g., SUM, AVERAGE, or VARIANCE) is represented as a query vector Q_n. Then, the answer to the query is ⟨X_n, Q_n⟩. The wavelet bases used by ProPolyne each constitute an orthogonal basis. Thus, according to Parseval's theorem, they preserve the energy of the data after the transformation and hence we have:

    ⟨X_n, Q_n⟩ = ⟨X̂_n, Q̂_n⟩        (14)

Therefore, ProPolyne evaluates the inner product of X̂_n and Q̂_n as the answer to the query Q_n. Now, we show that if we transform the data using the footprint basis F_D and the query using the dual basis of F_D, we are able to answer the polynomial range-aggregate queries proposed by ProPolyne. Therefore, ProPolyne can transform data using the footprint basis and still benefit from its unique properties. It is easy to see that Equation 14 does not hold for wavelet footprints, since the footprint basis is not an orthogonal basis.
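For an orthonormal basis, Equation 14 is just Parseval's theorem and can be checked numerically. A toy sketch with the smallest normalized Haar pair (our own example, not ProPolyne's actual query machinery):

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

s2 = math.sqrt(2.0)
# Orthonormal (normalized Haar) basis of R^2.
basis = [[1 / s2, 1 / s2], [1 / s2, -1 / s2]]

X = [3.0, 1.0]                        # the "data" vector
Q = [2.0, 5.0]                        # e.g., a range-aggregate query vector
X_hat = [dot(X, b) for b in basis]    # transform data
Q_hat = [dot(Q, b) for b in basis]    # transform query
# Parseval / Equation 14: <X, Q> == <X_hat, Q_hat> (both equal 11 here)
```

For a non-orthogonal basis the right-hand side must instead pair the basis transform with the dual-basis transform, which is exactly the extension developed next.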
Here, we extend Equation 14 to hold for biorthogonal bases. Assume that X̂_n and X̃_n are the transformations of X_n using the footprint wavelet basis and its dual basis, respectively. Now, according to Definitions 3 and 4 and Equation 14, it is easy to see that

    ⟨X_n, Q_n⟩ = ⟨X̂_n, Q̃_n⟩        (15)
    ⟨X_n, Q_n⟩ = ⟨X̃_n, Q̂_n⟩        (16)

In practice, we use Equation 15, where the data is transformed to the wavelet domain in advance and the dual of the query is computed on the fly to perform the dot product at query time.

8. EXPERIMENTAL RESULTS

We conducted several experiments to evaluate the performance of our proposed approach for change detection in time series data. We compared the query response time of our ad hoc query processing approach described in Section 6.4 with the maximum likelihood-based algorithm proposed by Guralnik et al. [8]. We chose their approach for comparison because it is the fastest change detection algorithm that considers different degrees of change. Throughout this section, we refer to their method as the Likelihood method. We studied how the size of the time series (n) and the total number of its change points (K) affect the performance of each method.

We also evaluated our lazy and absolute methods by investigating the effect of the following parameters on their accuracy: 1) the minimum distance between two consecutive change points (MinDist), 2) the maximum degree of change points in the data (MaxDeg), 3) the maximum degree D of the footprint basis (MaxDegF), and 4) the amount of noise in the data (Noise). We report the accuracy of each method in terms of the average number of missed change points (false negatives, AFN) and detected spurious change points (false hits, AFH).

We used both synthetic and real-world datasets. We generated a synthetic dataset D3 of 80 time series, each with a size in the range of 100 to 5,000. Each time series of the dataset D3 is a concatenation of several segments, each modelled by a polynomial of degree up to 3.
The average number of change points in each time series is 10. Our real-world datasets include oil and pressure time series for different oil wells in California. The experiments were performed using MATLAB on a DELL Precision 470 with a Xeon 3.2 GHz processor and 2 GB of RAM. Notice that for the absolute method, we used a modified version of matching pursuit introduced in [3], which is guaranteed to terminate after K² iterations, where K is the number of change points. For the Likelihood method, we use the threshold value with which the method computes its most accurate result. Sections 8.1 and 8.2 focus on our synthetic dataset, as we already know the exact characteristics of the changes in its time series; this enables us to measure the accuracy of our approach. Section 8.3 discusses our experiments with the real-world data.

8.1 Performance

In our first set of experiments, we compared the performance of our ad hoc change detection query processing, using both the lazy and absolute methods, with the Likelihood method. As the Likelihood method uses the original time series data as input, we must compare it only with our ad hoc approach, where the footprint transformation takes place at query time. That is, the CPU times reported for the lazy and absolute methods include both the time for the footprint transformation of the data and that for detecting the change points. We used footprint bases of up to degree 3 to transform the data in the lazy and absolute methods (i.e., f^(0), f^(1), f^(2), and f^(3)); that is, MaxDeg = MaxDegF. Also, we used polynomials of up to degree 3 for finding the change points in the Likelihood method.

We varied the size of the time series data from 100 to 5,000 and measured the CPU cost of each method. Figure 7 depicts the performance of our lazy and absolute methods as compared to that of the Likelihood method.
In the figure, Lazy(i) denotes the measurements of the lazy method in which the threshold value is i×u, where u is the universal threshold and i is simply a factor multiplied by u. As shown in the figure, our lazy and absolute methods outperform the Likelihood method by a factor of 2 to 8 as the size of the data increases from 100 to 5,000. It also shows that the lazy method is faster than the absolute method because it trades accuracy for performance.

Note that we showed in Sections 6.3.1 and 6.3.2 that the time complexity of our methods is O(n²) if we transform the time series at query time. Moreover, the time complexity of the Likelihood method is also O(n²) [8]. However, as our first experiment shows, our methods perform better than the Likelihood method in practice in terms of CPU cost.

The theoretical time complexities of the absolute and Likelihood methods depend on K, the number of change points in the data. The lazy method, on the other hand, is a series of simple projections which are independent of the characteristics of the data. To study the effect of K on the performance of each method, we varied K from 0 to 42 on time series of size 2,048 and measured the CPU cost. Figure 8 illustrates that while the performance of both our methods remains almost fixed for different numbers of change points, the CPU cost of the Likelihood method increases dramatically as the number of change points in the data grows. The intuition here is that the number of iterations the Likelihood method needs to fit functions to the segments of the data equals the number of change points, and each iteration performs expensive computations, whereas the computation time of our lazy method depends only on the size of the data (n). As shown in the figure, the CPU cost of the lazy method is fixed for different values of K. The absolute method runs slightly slower as K increases, but its performance degradation is negligible.
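The text does not restate the formula for the universal threshold u. One plausible reading is the standard Donoho-Johnstone universal threshold from wavelet denoising, σ̂·sqrt(2 ln n) with σ̂ estimated via the median absolute deviation; under that assumption, Lazy(i) thresholding can be sketched as:

```python
import math, statistics

def universal_threshold(coeffs):
    # Donoho-Johnstone universal threshold: sigma * sqrt(2 ln n), with sigma
    # estimated robustly from the median absolute deviation of the coefficients.
    n = len(coeffs)
    sigma = statistics.median(abs(c) for c in coeffs) / 0.6745
    return sigma * math.sqrt(2 * math.log(n))

def lazy_detect(coeffs, factor=1.5):
    """Report positions whose footprint coefficient exceeds factor * u (Lazy(factor))."""
    u = universal_threshold(coeffs)
    return [t for t, c in enumerate(coeffs) if abs(c) > factor * u]

coeffs = [0.01] * 99 + [10.0]   # one large coefficient at position 99
print(lazy_detect(coeffs, factor=1.5))  # [99]
```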
The figure does not show the CPU cost of the Likelihood method for K > 25, but the method computes 42 change points in 1.7 seconds, almost 68 times slower than our methods. Notice that even when there is no change in the data (i.e., K = 0), our methods perform at least twice as fast as the Likelihood method. The reason is that the Likelihood method still needs to apply one regression iteration to the data. To conclude, we showed that unlike the Likelihood method, both our methods are scalable with respect to the number of change points in the data.

Figure 7: Query cost vs. size of data (n)

Figure 8: Query cost vs. number of change points (K)

8.2 Accuracy

Our second set of experiments was aimed at evaluating the accuracy of each method in terms of the number of missed change points and detected spurious change points (i.e., precision and recall). First, we illustrate how each method performs on an example time series. Figure 9 shows a small time series of size 200 generated with polynomial segments of maximum degree 2. It also shows the true change points and those detected by each method. We used footprint bases of up to degree 2 to transform the data in the lazy and absolute methods (i.e., f^(0), f^(1), and f^(2)). Also, for the Likelihood method, we used polynomials of up to degree 2 for finding the change points.

There are ten actual change points, as shown in Figure 9, and the minimum distance between each two change points is 20. The Likelihood and Lazy(1.5) methods both miss the change point at t = 120. The Likelihood method also detects two false hits, at points t = 51 and t = 90, and for the change that occurs at point t = 100, it detects two change points, at t = 98 and t = 101. The lazy method returns no false hits. The absolute method returns all 10 actual change points at their exact positions without any false hits (i.e., 100% precision and recall).

Figure 9: Detected change points by the absolute, lazy and Likelihood methods. The vertical lines show the actual change points in the data.

Notice that using our footprint-based approach, we also acquire valuable information about the degree and amplitude of each change point. For example, at point t = 40, we have a discontinuity caused by a quadratic segment following a constant segment; hence, we get large coefficients for f^(0)_40 and f^(2)_40. Also, at point t = 140, we have a discontinuity caused by a constant segment following a linear segment; hence, we get large coefficients for f^(0)_140 and f^(1)_140, and zero for f^(2)_140.

We repeated the previous experiment on all time series of the dataset D3 for which K = 10. We varied the minimum distance between each two consecutive change points in the data (MinDist) from 5 to 50. Table 1 shows the average number of false negatives (AFN) for the lazy, absolute, and Likelihood methods. Column F3 shows the results for the experiment where we used footprints of degrees 0, 1, 2, and 3 for the lazy and absolute methods; that is, we used F_3 = {f^(0), f^(1), f^(2), f^(3)}. Note that in this case MaxDegF is identical to MaxDeg. Likewise, column F2 shows the case where we used footprints of degrees 0, 1, and 2. For this case, MaxDegF < MaxDeg; that is, we used a footprint basis of a degree less than the degree of change points in the data. The Likelihood method also used polynomial functions of degrees up to 2 (resp. 3) for case F2 (resp. F3).

8.2.1 The effect of MinDist

As Table 1 depicts, both the lazy and absolute methods always outperform the Likelihood approach in terms of accuracy, with absolute being the superior method. For small values of MinDist, the accuracy of all methods degrades dramatically. That is, if the change points are too close to each other, all methods are unable to detect some of the true change points. However, even for close change points, the absolute method misses only one of 10 change points on average.
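The AFN and AFH scores used throughout this section can be computed by matching each detection to the nearest true change point within a small tolerance. The matching rule and the tolerance below are assumptions made for illustration; the text does not spell out its exact scoring procedure.

```python
def afn_afh(true_cps, detected_cps, tol=2):
    """False negatives (missed true change points) and false hits
    (detections not within `tol` positions of any true change point)."""
    matched, false_hits = set(), 0
    for d in detected_cps:
        near = [t for t in true_cps if abs(t - d) <= tol]
        if near:
            matched.add(min(near, key=lambda t: abs(t - d)))
        else:
            false_hits += 1
    return len(true_cps) - len(matched), false_hits

# Hypothetical example: one detection drifts to t=90 (a false hit)
# and the true change point at t=120 is never reported.
print(afn_afh([20, 60, 120], [21, 58, 90]))  # (1, 1)
```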
This shows that the absolute method is resilient to the closeness of change points. As the distances between change points increase, the accuracy of all methods improves. When MinDist is 20, the absolute method correctly detects all change points when MaxDeg = MaxDegF (i.e., case F3). Note that in this experiment, the absolute method detected no false change points (i.e., AFH = 0), whereas AFH = 0.5 for the Likelihood method and AFH = 0.4 for our lazy method across all values of MinDist. This shows that while all methods perform accurately for large enough MinDist values, our absolute method can effectively detect even very close change points.

8.2.2 The effect of MaxDeg and MaxDegF

In this section, we investigate the accuracies of all three methods for the cases where the degree of the footprints in our methods, or that of the polynomials in the Likelihood method (MaxDegF), is less than, equal to, or greater than the maximum degree of change in the data (MaxDeg). We focus on the results shown in Table 1 for MinDist = 20. It is clear that when MaxDegF is less than MaxDeg (i.e., case F2), all methods miss more true change points as compared to case F3, where footprints and polynomials of the appropriate degrees have been used (i.e., MaxDegF = MaxDeg). However, the average number of false hits of each method does not change.

We repeated our experiments on only those time series of D3 which consist of polynomial segments of degree up to 2. We used the footprint basis F3 and polynomials of degree up to 3 in the methods. While we do not report the detailed results for this case, where MaxDegF > MaxDeg, no method generated false hits. This shows that using a footprint basis of a degree greater than the maximum degree of change in the data does not lead our methods to detect false change points. Therefore, if we guarantee that the degree D of our footprint basis is large enough, our approach is able to accurately detect all the change points in the data.
8.2.3 The effect of Noise

In our third set of experiments, we studied the effect of noise on the accuracy of the change detection. Here, we fixed the minimum distance between change points to 10, the threshold value to 1.5×u, and F_D = F_2. Using polynomials of degree up to 2, we generated two noisy datasets: we added noise with a standard deviation of about 1/15 to 1/30 of the average of the values in the time series to generate the dataset Noisy(1), and of about 1/150 to 1/300 to generate Noisy(0.1). The number of change points in each time series is 10. Table 2 shows the accuracy results of applying all three methods to both datasets.

Trivially, the presence of noise reduces the accuracy of all methods. Our lazy method, which detects the change points based on approximations of the footprint transformation of the data, starts missing some of the true change points in the presence of noise. On the other hand, the Likelihood method generates more false hits in this case. However, our absolute method still generates the most accurate results for different amounts of noise in the data.

8.3 Experiments with Real-World Datasets

Finally, the last set of experiments focuses on real-world time series data. We tested our methods on different time series generated within the oil industry. Here, we report the results on three time series, OIL1, OIL2, and GAS, obtained from the Petroleum Technology Transfer Council³.
³ http://www.westcoastpttc.org/

Table 1: Accuracy results (AFN) of all methods for cases F3 and F2

                        AFN (F3)                AFN (F2)
  Method      MinDist:  5    10   20   50       5    10   20   50
  Lazy(1.5)             3    1.9  0.5  0.3     3.5   2.9  1.1  0.9
  Lazy(1)              2.1   0.9  0.2  0.2     3.1   2.7  0.6  0.4
  Absolute             0.9   0.2   0    0       1    0.5  0.1  0.1
  Likelihood           4.5   1.8  0.2  0.2     4.1   3.0   1   0.8

Table 2: Accuracy results of all methods for datasets Noisy(1) and Noisy(0.1)

                 Noisy(1)        Noisy(0.1)
  Method        AFN    AFH      AFN    AFH
  Lazy(1)        5     1.2      3.5     1
  Absolute       2     0.9      1.5    0.8
  Likelihood    2.8     3       2.6     3

These time series are collected from wells in active oil fields in California. OIL1 and OIL2 include oil production during 1985-1995 and 1974-2002, respectively. GAS includes the gas production rate measured over a 2,300-day period, sampled once every 15 days.

Unlike synthetic time series, here we do not know where the exact change points are. Therefore, we evaluated our methods visually based on the positions of their detected change points. Figure 10 depicts the change points detected in time series OIL1 by the absolute, lazy and Likelihood methods. While both our methods perform perfectly on this data, the Likelihood method misses 60% of the change points. For example, there are local minimum and maximum points at t = 28 and t = 57, respectively; the Likelihood method misses these points. Moreover, it returns several false points such as t = 29 and t = 47. Figures 11 and 12 show similar results for time series OIL2 and GAS, respectively.

Notice that our absolute method does not identify any change at points such as t = 235 in Figure 11.
The reason here is that the segment corresponding to the range [228, 240] can be perfectly modelled by a polynomial of degree 3; therefore, t = 235 is not a discontinuity of degree 3 in OIL2.

Figure 13 illustrates the change points detected in the GAS data by each of the three methods when they use higher threshold values as compared to those used in Figure 12. Comparing Figures 12 and 13 shows that the former detects all small changes while the latter identifies only major changes in the data. Notice that once our methods have detected changes using a given threshold value, changes above a different threshold can be identified simply by scanning over all the coefficients. With the Likelihood method, however, we need to rerun the whole process whenever the user changes her threshold value. For example, we ran the Likelihood method separately to produce the results of each of Figures 12 and 13, whereas we ran our approach only once to generate the results of Figure 12 and merely rechecked all the coefficients for the different thresholds of Figure 13.

Figure 10: Detected change points by the absolute, lazy and Likelihood methods on OIL1

Figure 11: Detected change points by the absolute, lazy and Likelihood methods on OIL2

Figure 12: Detected change points by the absolute, lazy and Likelihood methods on GAS

9. CONCLUSIONS AND FUTURE WORK

We studied the problem of detecting changes in time series data. Using a real-world motivating application, we showed that different scientific data analysis tasks might have different definitions of change. Therefore, we formally defined the degree of change for change points in time series data. Our definition is closely related to the difference between the two polynomial functions fitting two adjacent segments of the data. We then described our novel wavelet-based approach, which employs wavelet footprints for defining discontinuities of different degrees.

First, we provided preliminary background on footprints and showed their property of capturing discontinuities in time series data.
We showed that the footprint transformation of the time series data using footprints of degree d contains nonzero coefficients only at the change points of degree d. Subsequently, we proposed our approach for change detection in time series data, describing its data transformation and query processing modules. We proposed the lazy and absolute methods to transform the data using the footprint basis. Our lazy method approximates the coefficients, based on which change points can be identified efficiently, while our absolute method computes the exact coefficients and benefits from the ability to provide the amplitude of the change at each change point. We also showed that our approach can be efficiently incorporated within systems such as ProDA [10], where the time series data is stored in the wavelet domain.

Finally, we compared the performance and accuracy of our footprint-based approach with the maximum likelihood method [8] through exhaustive sets of experiments with both synthetic and real-world data. Our empirical results showed the following:

Figure 13: Detected change points by the absolute, lazy and Likelihood methods on GAS, with threshold values 1.5×u for the absolute and lazy methods and 0.2×u for the Likelihood method

• Our ad hoc query processing approach, with both the lazy and absolute methods, outperforms the maximum likelihood method, especially for large time series data. Furthermore, while the performance of the maximum likelihood method degrades dramatically with an increasing number of change points in the data, both of our methods are scalable with respect to this factor. This is a significant improvement over the best known method, especially for real-world applications where the characteristics of the data are not known a priori.

• Our change detection approach is highly accurate. Even the lazy method, which approximates the change points, performs as accurately as the maximum likelihood method. The absolute method detects exactly all discontinuities and computes their amplitudes and degrees of change when the data is noiseless.
Even with noise in the data, it is considerably more precise than the other methods.

In this paper, for the first time, we exploited the interesting characteristics of footprints for change detection in time series data. Motivated by our results, we plan to develop a footprint-based tool for real-time change detection on data streams.

10. ACKNOWLEDGEMENT

We would like to acknowledge Dr. Antonio Ortega and his student En-Shuo Tsau for their valuable help in our understanding of the theory of wavelet footprints. This research has been funded in part by NSF grants EEC-9529152 (IMSC ERC) and IIS-0238560 (PECASE), unrestricted cash gifts from Microsoft, an on-going collaboration under NASA's GENESIS-II REASON project, and partly by the Center of Excellence for Research and Academic Training on Interactive Smart Oilfield Technologies (CiSoft); CiSoft is a joint University of Southern California - ChevronTexaco initiative.

11. REFERENCES

[1] K. Chakrabarti, M. N. Garofalakis, R. Rastogi, and K. Shim. Approximate query processing using wavelets. In VLDB 2000: Proceedings of the 26th International Conference on Very Large Data Bases, pages 111–122. Morgan Kaufmann, 2000.
[2] H. Cohen. A Course in Computational Algebraic Number Theory. Springer-Verlag, New York, 1993.
[3] P. L. Dragotti and M. Vetterli. Wavelet transform footprints: Catching singularities for compression and denoising. In ICIP, 2000.
[4] P. L. Dragotti and M. Vetterli. Deconvolution with wavelet footprints for ill-posed inverse problems. In IEEE Conference on Acoustics, Speech and Signal Processing, volume 2, pages 1257–1260, Orlando, Florida, USA, May 2002.
[5] P. L. Dragotti and M. Vetterli. Wavelet footprints: Theory, algorithms and applications. IEEE Transactions on Signal Processing, 51(5):1306–1323, May 2003.
[6] L. Firoiu and P. R. Cohen.
Segmenting time series with a hybrid neural networks - hidden Markov model. In Eighteenth National Conference on Artificial Intelligence, pages 247–252. American Association for Artificial Intelligence, 2002.
[7] X. Ge. Segmental Semi-Markov Models and Applications to Sequence Analysis. PhD thesis, 2002. Chair: Padhraic Smyth.
[8] V. Guralnik and J. Srivastava. Event detection from time series data. In KDD '99: Proceedings of the Fifth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 33–42. ACM Press, 1999.
[9] M. Jahangiri, D. Sacharidis, and C. Shahabi. SHIFT-SPLIT: I/O efficient maintenance of wavelet-transformed multidimensional data. In Proceedings of the 24th ACM SIGMOD International Conference on Management of Data, 2005.
[10] M. Jahangiri and C. Shahabi. ProDA: A suite of web services for progressive data analysis. In Proceedings of the 24th ACM SIGMOD International Conference on Management of Data, 2005. (Demonstration.)
[11] R. Jr. Advances in Well Test Analysis, volume 5. Society of Petroleum Engineers, 1977.
[12] E. Keogh and P. Smyth. A probabilistic approach to fast pattern matching in time series databases. In Third International Conference on Knowledge Discovery and Data Mining, pages 24–30, Newport Beach, CA, USA, 1997. AAAI Press.
[13] E. J. Keogh, K. Chakrabarti, S. Mehrotra, and M. J. Pazzani. Locally adaptive dimensionality reduction for indexing large time series databases. In SIGMOD Conference, 2001.
[14] E. J. Keogh, S. Chu, D. Hart, and M. J. Pazzani. An online algorithm for segmenting time series. In ICDM '01: Proceedings of the 2001 IEEE International Conference on Data Mining, pages 289–296. IEEE Computer Society, 2001.
[15] J. Lin, E. Keogh, and W. Truppel. Clustering of streaming time series is meaningless.
In DMKD '03: Proceedings of the 8th ACM SIGMOD Workshop on Research Issues in Data Mining and Knowledge Discovery, pages 56–65. ACM Press, 2003.
[16] J. Lin, M. Vlachos, E. J. Keogh, and D. Gunopulos. Iterative incremental clustering of time series. In EDBT '04: Proceedings of the 9th International Conference on Extending Database Technology, pages 106–122, 2004.
[17] S. Mallat and W. L. Hwang. Singularity detection and processing with wavelets. IEEE Transactions on Information Theory, 38:617–643, 1992.
[18] S. Mallat and Z. Zhang. Matching pursuits with time-frequency dictionaries. IEEE Transactions on Signal Processing, 41(12):3397–3415, 1993.
[19] Y. Matias, J. S. Vitter, and M. Wang. Wavelet-based histograms for selectivity estimation. In SIGMOD '98: Proceedings of the 1998 ACM SIGMOD International Conference on Management of Data, pages 448–459. ACM Press, 1998.
[20] A. V. Oppenheim and R. W. Schafer. Digital Signal Processing. Prentice Hall, Englewood Cliffs, NJ, USA, 1975.
[21] V. Puttagunta and K. Kalpakis. Adaptive methods for activity monitoring of streaming data. In ICMLA, pages 197–203, 2002.
[22] R. R. Schmidt and C. Shahabi. ProPolyne: A fast wavelet-based algorithm for progressive evaluation of polynomial range-sum queries. In EDBT '02: Proceedings of the 8th International Conference on Extending Database Technology, pages 664–681. Springer-Verlag, 2002.
[23] K. Yamanishi and J. Takeuchi. A unifying framework for detecting outliers and change points from non-stationary time series data. In KDD '02: Proceedings of the Eighth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 676–681. ACM Press, 2002.