USC Computer Science Technical Reports, no. 754 (2002)
ProPolyne: A Fast Wavelet-based Algorithm for Progressive Evaluation of Polynomial Range-Sum Queries *

Rolfe R. Schmidt and Cyrus Shahabi
University of Southern California, Los Angeles CA 90089-0781, USA
{rrs, shahabi}@usc.edu

Abstract. Many range aggregate queries can be efficiently derived from a class of fundamental queries: the polynomial range-sums. After demonstrating how any range-sum can be evaluated exactly in the wavelet domain, we introduce a novel pre-aggregation method called ProPolyne to evaluate arbitrary polynomial range-sums progressively. At each step of the computation, ProPolyne makes the best possible wavelet approximation of the submitted query. The result is a data-independent approximate query answering technique which uses data structures that can be maintained efficiently and exactly. ProPolyne's performance as an exact algorithm is comparable to the best known MOLAP techniques. Experimental results show that this approach of approximating queries rather than compressing data produces consistent and superior approximate results when compared to typical wavelet-based data compression techniques.

1 Introduction

Range aggregate queries are a fundamental part of modern data analysis applications. Many recently proposed techniques can be used to evaluate a variety of aggregation operators, from simple COUNT and SUM queries to statistics such as VARIANCE and COVARIANCE. Most of these methods attempt to provide efficient query answering at a reasonable update cost for a single aggregation operation. An interesting exception is [20], which points out that all second order statistical aggregation functions (including hypothesis testing, principal component analysis, and ANOVA) can be derived from SUM queries of second order polynomials in the measure attributes. These SUM queries can be efficiently supported using any proposed OLAP method.
Higher order statistics can similarly be reduced to sums of higher order polynomials. The power of these observations leads us to introduce the class of polynomial range-sum aggregates in Section 3.

* This research has been funded in part by NSF grants EEC-9529152 (IMSC ERC) and ITR-0082826, NASA/JPL contract nr. 961518, DARPA and USAF under agreement nr. F30602-99-1-0524, and unrestricted cash/equipment gifts from NCR, IBM, Intel and SUN.

There is a generic way to support polynomial range-sums of any degree using existing OLAP techniques: simply treat each independent monomial up to the required degree as a separate measure and build a new datacube for each of these. This requires the measure attributes to be specified when the database is populated, and has unwieldy storage and maintenance cost for higher order polynomials.

We propose a novel MOLAP technique that can support any polynomial range-sum query (up to a degree specified when the database is populated) using a single set of precomputed aggregates. This extra power comes with little extra cost: the query, update, and storage costs are comparable to the best known MOLAP techniques (see Table 1). We achieve this by observing that polynomial range-sums can be translated and evaluated in the wavelet domain. When the wavelets are chosen to satisfy an appropriate moment condition, most of the query wavelet coefficients vanish, making the query easier to evaluate. We make this observation practical by introducing the lazy wavelet transform, an algorithm that translates polynomial range-sums to the wavelet domain in poly-logarithmic time (Section 6).

Wavelets are often thought of as a data approximation tool, and have been used this way for approximate range query answering [7, 21]. The efficacy of this approach is highly data dependent; it only works when the data have a concise wavelet approximation.
Furthermore, the wavelet approximation is difficult to maintain. To avoid these problems, we use wavelets to approximate incoming queries rather than the underlying data. By using our exact polynomial range-sum technique, but using the largest query wavelet coefficients first, we are able to obtain accurate, data-independent query approximations after a small number of I/Os. This approach naturally leads to a progressive algorithm.

We bring these ideas together by introducing ProPolyne (Progressive Polynomial Range-Sum Evaluator, Section 7), a polynomial range-sum evaluation method which

1. Treats all dimensions, including measure dimensions, symmetrically and supports range-sum queries where the measure is any polynomial in the data dimensions. All computation is performed entirely in the wavelet domain (see Section 5).
2. Uses the lazy wavelet transform to achieve query and update cost comparable to the best known exact techniques (see Table 1 and Section 6).
3. By using the most important query wavelet coefficients first, provides excellent approximate results with very little I/O and computational overhead, reaching low relative error far more quickly than analogous data compression methods (see Section 7).
4. Provides efficiently computable guaranteed error bounds on all approximate query results (see Section 8).

Experimental results on several empirical datasets show that the approximate results produced by ProPolyne are very accurate long before the exact query evaluation is complete (Section 9). These experiments also show that the performance of wavelet based data approximation methods varies wildly with the dataset, while query approximation based ProPolyne delivers consistent, and consistently better, results.

2 Related Work

Extensive research has been done to find efficient ways to evaluate range aggregates.
The prefix-sum method [11] publicized the fact that careful pre-aggregation can be used to evaluate range aggregate queries in time independent of the range size. This led to a number of new techniques that provide similar benefits with different query/update cost tradeoffs [6, 5, 16, 3]. Iterative Data Cubes [17] generalize all of these techniques, and allow different methods to be used on each dimension.

It is pointed out in [20] that statistical analysis of summary data can lead to significant errors in applications such as regression rule mining. In order to provide efficient OLAP-style range statistics, Multivariate Aggregate Views are proposed. These provide a ROLAP technique to support polynomial range-sums of degree 2. ProPolyne takes a MOLAP approach to this problem, and is able to use one set of precomputed aggregates to support all polynomial range-sums up to an arbitrary degree.

Approximate query answering can be used to deliver fast query results. Use of synopsis data structures such as histograms has been analyzed thoroughly [13, 7]. References [19, 9, 18] create synopses that estimate the data frequency distribution. The flexible measures supported by this approach inspire its use in this paper.

Online aggregation, or progressive query answering, has also been proposed [10, 15, 23, 12]. These methods, whether based on sampling, R-trees, or multiresolution analysis, share a common strategy: answer queries quickly using a low resolution view of the data, and progressively refine the answer while building a sharper view. ProPolyne is fundamentally different. It makes optimal progressive estimates of the query, not the data. Furthermore, the progressive evaluation comes for free since ProPolyne is an excellent exact algorithm.

Recently wavelets have emerged as a powerful tool for approximate answering of aggregate [7, 21, 23] and relational algebra [2] queries.
Streaming algorithms for approximate population of a wavelet database are also available [8]. Most of the wavelet query evaluation work has focused on using wavelets to compress the underlying data, reducing the size of the problem. ProPolyne can use compressed data, but is designed to work as an exact algorithm with uncompressed data. ProPolyne produces approximate query results by compressing queries, not data.

3 Range Queries

In this section we introduce the class of polynomial range-sum queries, and show how we use the data frequency distribution to recast these queries as vector queries. Throughout this paper we work with a finite instance $I$ of a schema $F$ (the 'fact table'). We assume that each attribute of $F$ is numeric with domain of size $N = 2^j$, and that $F$ has $d$ attributes. We study the following:

Definition 1. Given a range $R \subset \mathrm{Dom}(F)$ and a measure function $f : \mathrm{Dom}(F) \to \mathbb{R}$, the range-sum query $Q(R, f)|_I$ is equal to

$$Q(R, f)|_I = \sum_{\bar{x} \in I \cap R} f(\bar{x})$$

The sum counts multiplicities in the multiset $I$. If $f$ is a polynomial of degree $\delta$ we call this a polynomial range-sum of degree $\delta$.

Example 1. Consider a table of employee ages and salaries, taken from the example in [19], with entries [age, salary]: {[25, 50K], [28, 55K], [30, 58K], [50, 100K], [55, 130K], [57, 120K]}. Choose the range $R$ to be the set of all points where $25 \le \text{age} \le 40$ and $55000 \le \text{salary} \le 150000$. By choosing $f(\bar{x}) \equiv \mathbf{1}(\bar{x}) = 1$ the range-sum query returns the COUNT of the tuples that lie in $R$:

$$Q(R, \mathbf{1}, I) = \sum_{\bar{x} \in R \cap I} \mathbf{1}(\bar{x}) = \mathbf{1}(28, 55\text{K}) + \mathbf{1}(30, 58\text{K}) = 2$$

Choosing $f(\bar{x}) \equiv \text{salary}(\bar{x})$ the range-sum query computes the SUM of salary for tuples in the set $R$:

$$Q(R, \text{salary}, I) = \sum_{\bar{x} \in R \cap I} \text{salary}(\bar{x}) = f(28, 55\text{K}) + f(30, 58\text{K}) = 113\text{K}$$

An AVERAGE query for a measure function given a range is the ratio of the SUM query with the COUNT query for that range.
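The COUNT, SUM, and AVERAGE queries above follow directly from Definition 1. The Python sketch below reproduces them by brute force on the employee table; the names `TABLE`, `in_range`, and `range_sum` are hypothetical helpers introduced here for illustration, and the linear scan is only meant to pin down the semantics of a range-sum, not to suggest how ProPolyne evaluates one.

```python
# Employee table from Example 1, stored as (age, salary) tuples.
TABLE = [(25, 50_000), (28, 55_000), (30, 58_000),
         (50, 100_000), (55, 130_000), (57, 120_000)]


def in_range(row):
    """Characteristic function of R: 25 <= age <= 40 and 55000 <= salary <= 150000."""
    age, salary = row
    return 25 <= age <= 40 and 55_000 <= salary <= 150_000


def range_sum(instance, in_r, f):
    """Definition 1: sum the measure f over the tuples of the instance that lie in R."""
    return sum(f(x) for x in instance if in_r(x))


count = range_sum(TABLE, in_range, lambda x: 1)       # measure f = 1
total = range_sum(TABLE, in_range, lambda x: x[1])    # measure f = salary
average = total / count                               # AVERAGE = SUM / COUNT

print(count, total, average)
```

Only the measure function changes between the three aggregates; the range and the instance stay fixed, which is the property the rest of the section builds on.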
Finally, taking $f(\bar{x}) \equiv \text{salary}(\bar{x}) \times \text{age}(\bar{x})$ the range-sum query computes the total product of salary and age for tuples in the set $R$:

$$Q(R, \text{salary} \times \text{age}, I) = \sum_{\bar{x} \in R \cap I} \text{salary}(\bar{x}) \times \text{age}(\bar{x}) = f(28, 55\text{K}) + f(30, 58\text{K}) = 3280\text{K}$$

This is an important component for the computation of covariance in the range $R$. In particular

$$\mathrm{Cov}(\text{age}, \text{salary}) = \frac{Q(R, \text{salary} \times \text{age}, I)\, Q(R, \mathbf{1}, I) - Q(R, \text{age}, I)\, Q(R, \text{salary}, I)}{(Q(R, \mathbf{1}, I))^2}$$

The variance, kurtosis, or any other statistics in a range can be computed similarly.

Definition 2. The data frequency distribution of $I$ is the function $\Delta_I : \mathrm{Dom}(F) \to \mathbb{Z}$ that maps a point $\bar{x}$ to the number of times it occurs in $I$. To emphasize the fact that a query is an operator on the data frequency distribution, we write $Q(R, f)|_I = Q(R, f, \Delta_I)$. When $I$ is clear from context we omit subscripts and simply write $\Delta$.

Example 2. For the table of Example 1, we have $\Delta(25, 50\text{K}) = \Delta(28, 55\text{K}) = \dots = \Delta(57, 120\text{K}) = 1$ and $\Delta(\bar{x}) = 0$ otherwise.

Now we can rewrite the basic definition of the range-sum as

$$Q(R, f, \Delta_I) = \sum_{\bar{x} \in \mathrm{Dom}(F)} f(\bar{x})\, \chi_R(\bar{x})\, \Delta_I(\bar{x}) \qquad (1)$$

where $\chi_R$ is the characteristic function of the set $R$: $\chi_R(\bar{x}) = 1$ when $\bar{x} \in R$ and is zero otherwise. Equation 1 can be thought of as a vector dot product, where any function $g : \mathrm{Dom}(F) \to \mathbb{R}$ is a vector and for any $\bar{x} \in \mathrm{Dom}(F)$, $g(\bar{x})$ is the $\bar{x}$-coordinate of $g$. We denote this scalar product by $\langle g, h \rangle = \sum_{\bar{x} \in \mathrm{Dom}(F)} g(\bar{x}) h(\bar{x})$, allowing us to write

$$Q(R, f, \Delta_I) = \langle f \chi_R, \Delta_I \rangle \qquad (2)$$

defining a range-sum query as a vector query: the scalar product of a function completely defined by the database instance ($\Delta_I$) and another completely defined by the query ($f \chi_R$). We refer to the function $f \chi_R$ as the query function.

4 Wavelets

Before proceeding, we need some basic results about the fast wavelet transform. We refer the reader to [22] for a treatment similar to the one presented here.
Readers familiar with Haar wavelets will see that the wavelet construction used in this paper is a generalization of the Haar transform which maintains many of the essential properties of the transform while providing more room for wavelet design.

We use wavelets arising from pairs of orthogonal convolution-decimation operators. One of these operators, $H$, computes a local average of an array at every other point to produce an array of summary coefficients. The other operator, $G$, measures how much values in the array vary inside each of the summarized blocks to compute an array of detail coefficients. Specifically, we have filters $h$ and $g$ such that for an array $a$ of length $2q$

$$Ha[i] = \sum_{j=0}^{2q-1} h[(2i - j) \bmod 2q]\, a[j] \quad \text{and} \quad Ga[i] = \sum_{j=0}^{2q-1} g[(2i - j) \bmod 2q]\, a[j]$$

In order to ensure that $H$ and $G$ act as "summary" and "detail" filters, we also require that $\sum h[j] = \sqrt{2}$, $\sum g[j] = 0$, $\sum h[j]^2 = \sum g[j]^2 = 1$, and $g[j] = (-1)^j h[1 - j]$. These conditions imply that splitting an array into summaries and details preserves scalar products:

$$\sum_{j=0}^{2q-1} a[j]\, b[j] = \sum_{i=0}^{q-1} Ha[i]\, Hb[i] + \sum_{i=0}^{q-1} Ga[i]\, Gb[i] \qquad (3)$$

Example 3. The Haar wavelet summary filter $h$ is defined by $h[0] = h[1] = 1/\sqrt{2}$, and $h[i] = 0$ otherwise. The Haar wavelet detail filter has $g[0] = 1/\sqrt{2}$, $g[1] = -1/\sqrt{2}$, and $g[i] = 0$ otherwise. The convolution-decimation operators $H$ and $G$ corresponding to these filters are orthogonal.

To obtain the discrete wavelet transform, we continue this process recursively, at each step splitting the summary array from the previous step into a summary of summaries array and a details of summaries array. In time $\Theta(N)$ we have computed the discrete wavelet transform.

Definition 3. Given orthogonal convolution-decimation operators $H$ and $G$, and an array $a$ of length $2^j$, the discrete wavelet transform [DWT] of $a$ is the array $\hat{a}$ where $\hat{a}[2^{j - j^*} + k] = G H^{j^* - 1} a[k]$ for $1 \le j^* \le j$ and $0 \le k < 2^{j - j^*}$.
Define $\hat{a}[0] = H^j a[0]$. We often refer to the elements of $\hat{a}$ as the wavelet coefficients of $a$. We refer to the arrays $H^{j^*} a$ and $G H^{j^* - 1} a$ respectively as the summary coefficients and detail coefficients at level $j^*$.

The following consequence of Equation 3 is of fundamental importance.

Lemma 1. If $\hat{a}$ is the DWT of $a$ and $\hat{b}$ is the DWT of $b$ then $\sum_{\eta} \hat{a}[\eta]\, \hat{b}[\eta] = \sum_{i} a[i]\, b[i]$.

To define wavelet transformations on multidimensional arrays we use the tensor product construction, which has appeared recently in the database literature as the foundation of Iterative Data Cubes [17]. The idea of this construction is simple: given a library of one dimensional transforms, we can build multidimensional transforms by applying one dimensional transforms in each dimension separately. Specifically, we note that since the one dimensional DWT is linear, it is represented by a matrix $[W_{\eta, i}]$ such that for any array $a$ of length $N$: $\hat{a}[\eta] = \sum_{i=0}^{N-1} W_{\eta, i}\, a[i]$. Given a multidimensional array $a[i_0, \dots, i_{d-1}]$, performing this transform in each dimension yields the multivariate DWT of $a$

$$\hat{a}[\eta_0, \dots, \eta_{d-1}] = \sum_{i_0, \dots, i_{d-1} = 0}^{N-1} W_{\eta_0, i_0} W_{\eta_1, i_1} \cdots W_{\eta_{d-1}, i_{d-1}}\, a[i_0, \dots, i_{d-1}]$$

Using the fast wavelet transform for each of these one dimensional matrix multiplications allows us to compute this sum in $\Theta(\ell N^d)$ for $d$-dimensional data. Repeated application of Lemma 1 yields

$$\sum_{\eta_0, \dots, \eta_{d-1} = 0}^{N-1} \hat{a}[\eta_0, \dots, \eta_{d-1}]\, \hat{b}[\eta_0, \dots, \eta_{d-1}] = \sum_{i_0, \dots, i_{d-1}} a[i_0, \dots, i_{d-1}]\, b[i_0, \dots, i_{d-1}] \qquad (4)$$

5 Naive Polynomial Range-Sum Evaluation Using Wavelets

Now we show how any range-sum query can be evaluated in the wavelet domain. This discussion allows us to see how data can be preprocessed using the wavelet transform, stored, and accessed for query evaluation. We also see that this method imposes no storage cost for dense data, and increases the storage requirements by a factor of $O(\log^d N)$ in the worst case.
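The scalar-product preservation of Lemma 1 and Equation 4, on which this evaluation strategy rests, can be checked numerically. The sketch below uses the Haar filters of Example 3 and the tensor product construction in two dimensions. The function names, the pairing of entries $(2i, 2i+1)$ in each step, and the random test arrays are assumptions made for illustration; this is a plain full transform written for clarity, not an optimized implementation.

```python
import math
import random


def haar_step(a):
    """One convolution-decimation step: (summary, detail), each half as long as a."""
    r = 1 / math.sqrt(2)
    half = len(a) // 2
    summary = [(a[2 * i] + a[2 * i + 1]) * r for i in range(half)]
    detail = [(a[2 * i] - a[2 * i + 1]) * r for i in range(half)]
    return summary, detail


def haar_dwt(a):
    """Full Haar DWT of an array whose length is a power of two.

    Layout follows Definition 3: index 0 holds the final summary, followed by
    the detail coefficients from the coarsest level to the finest.
    """
    summary, details = list(a), []
    while len(summary) > 1:
        summary, d = haar_step(summary)
        details = d + details          # coarser details go in front
    return summary + details


def haar_dwt_2d(m):
    """Tensor product construction: transform every row, then every column."""
    rows = [haar_dwt(row) for row in m]
    cols = [haar_dwt(list(col)) for col in zip(*rows)]
    return [list(r) for r in zip(*cols)]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def dot_2d(a, b):
    return sum(dot(ra, rb) for ra, rb in zip(a, b))


random.seed(0)
a = [random.random() for _ in range(16)]
b = [random.random() for _ in range(16)]
gap_1d = abs(dot(haar_dwt(a), haar_dwt(b)) - dot(a, b))        # Lemma 1

m1 = [[random.random() for _ in range(8)] for _ in range(8)]
m2 = [[random.random() for _ in range(8)] for _ in range(8)]
gap_2d = abs(dot_2d(haar_dwt_2d(m1), haar_dwt_2d(m2)) - dot_2d(m1, m2))  # Equation 4

print(gap_1d < 1e-9, gap_2d < 1e-9)
```

Both gaps are at the level of floating-point rounding, which is what allows a scalar product of functions on $\mathrm{Dom}(F)$ to be replaced by a scalar product of their transforms.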
We fix orthogonal wavelet filters $h$ and $g$ of length $\ell$. Combining Equations 2 and 4, and interpreting functions on $\mathrm{Dom}(F)$ as $d$-dimensional arrays, we obtain a new formula for range-sums

$$Q(R, f, \Delta) = \langle \widehat{f \chi_R}, \hat{\Delta} \rangle = \sum_{\eta_0, \dots, \eta_{d-1} = 0}^{N-1} \widehat{f \chi_R}(\eta_0, \dots, \eta_{d-1})\, \hat{\Delta}(\eta_0, \dots, \eta_{d-1}) \qquad (5)$$

giving a technique for evaluation of range-sums with arbitrary measures entirely in the wavelet domain: given a dataset with data frequency distribution $\Delta$ we preprocess it as follows.

Wavelet Preprocessing
1. Prepare a [sparse] array representation of $\Delta$. This requires time proportional to $|I|$, the size of the dataset.
2. Use the multidimensional fast wavelet transform to compute the [sparse] array representation of $\hat{\Delta}$. If the data are sparse, this requires time $O(|I|\, \ell^d \log^d N)$. If the data are dense, this requires time $N^d = O(|I|)$.
3. Store the array representation of $\hat{\Delta}$. If the data are sparse, use a hash-index. If the data are dense, use array-based storage. In either case we have essentially constant time access to any particular transform value.

For dense data, this preprocessing step introduces no storage overhead. The worst possible storage overhead arises when the dataset has only one record. $O(\ell^d \log^d N)$ nonzero wavelet transform values need to be stored in this case.

With our storage and access methods established, we now discuss query evaluation. In order to use Equation 5 to evaluate a general range-sum with query function $f \chi_R$ using the stored wavelet transform data, we proceed as follows.

Naive Wavelet Query Evaluation
1. Compute the wavelet transformation $\widehat{f \chi_R}$. Using the fast wavelet transform requires time $O(\ell \log^d N)$. Initialize sum $\leftarrow 0$.
2. For each entry $\eta = (\eta_0, \dots, \eta_{d-1})$ in the array representation of $\widehat{f \chi_R}$, retrieve $\hat{\Delta}(\eta)$ from storage and set sum $\leftarrow$ sum $+ \widehat{f \chi_R}(\eta)\, \hat{\Delta}(\eta)$. For general query functions, there are $O(N^d)$ items retrieved from storage.
When complete, sum is the query result.

This is a correct method for evaluating range-sums in the wavelet domain, but it is not an efficient method. In Section 6 we show that for polynomial range-sums it is possible to improve both the query transformation cost and the I/O cost dramatically.

6 Fast Range-Sum Evaluation Using Wavelets

With this background established, we present a first version of ProPolyne. For polynomial range-sums of degree less than $\delta$ in each attribute, this algorithm has time complexity $O((2\ell \log N)^d)$ where $\ell = 2\delta + 1$. The data structure used for storage can be updated in time $O((\ell \log N)^d)$. To our knowledge there are no existing COUNT or SUM evaluation algorithms that provide faster query evaluation without having slower update cost. The algorithm requires preprocessing the data as described in Section 5. If the storage cost is prohibitive, it is possible to store a wavelet synopsis of the data and use this algorithm to evaluate approximate query results [21] or use this technique for dense clusters [11]. In Section 7 we show how ProPolyne can be refined to deliver good progressive estimates of the final query result by retrieving data in an optimal order.

At the heart of our query evaluation technique is a fast algorithm for computing wavelet transforms of query functions, and an observation that these transforms are very sparse. This allows us to evaluate queries using Equation 5 quickly.

6.1 Intuition

We can see why query functions can be transformed quickly and have sparse transforms by looking at a very simple example: consider the problem of transforming the indicator function $\chi_R$ of the interval $R = [5, 12]$ on the domain of integers from 0 to 15. We use the Haar filters defined in Example 3. Before computing the first recursive step of the wavelet transformation we can already say that
1.
The detail coefficients are all zero unless the detail filter overlaps the boundary of $R$: $G\chi_R(i) = \chi_R(2i + 1) - \chi_R(2i) = 0$ if $i \notin \{2, 6\}$
2. The summary coefficients are all zero except when the summary filter overlaps $R$: $H\chi_R(i) = 0$ when $i \notin [2, 6]$
3. The summary coefficients are constant when the summary filter is contained in $R$: $H\chi_R(i) = \sqrt{2}$ when 2 [...]

[...] $u^s_{j+1} + 2\ell - 1$ then the induction hypothesis implies that $S(x, j)$ is zero. Also, if $l^s_{j+1} + \ell - 1 \le 2x \le u^s_{j+1}$ then

$$S(x, j) = \sum_{k=0}^{\ell - 1} h_k\, p_{j+1}(2x - k) \equiv p_j(x)$$

where we can compute the coefficients of $p_j(x) \equiv \sum_{k=0}^{\ell-1} h_k\, p_{j+1}(2x - k)$ in time $O(\delta \ell)$. Choosing $l^s_j = \lceil (l^s_{j+1} + \ell - 1)/2 \rceil$ and $u^s_j = \lfloor u^s_{j+1}/2 \rfloor$ we find that the lemma holds for level $j$. Notice that it may be the case that $l^s_j \ge u^s_j$, giving us effectively one short interval of nonzero coefficients. This completes the induction step and the proof is complete. □

Given this simple structure of the summary coefficients it is now very easy to say without computation that most of the detail coefficients are zero, and that the location of the few nonzero terms is easy to compute.

Lemma 3. The detail coefficients of the range function in Lemma 2 at level $j < 0$ are supported in two (not necessarily disjoint) intervals $[l^d_j, l^d_j + \ell]$ and $[u^d_j, u^d_j + \ell]$ where $l^d_j = \lfloor (l^s_{j+1} - 2\ell + 2)/2 \rfloor$ and $u^d_j = \lceil (u^s_{j+1} - \ell + 2)/2 \rceil$.

Proof: This is very similar to the proof of Lemma 2, only the moment condition for the filter $G$ now makes the terms between $l^d_j + \ell$ and $u^d_j$ vanish. The fact that we chose $G$ to be supported on the interval $[2 - \ell, 1]$ also affects the bookkeeping that leads to the final form of the results. □

Proof of Theorem 1: With the lemmas above we see that each of the $\log N$ recursive steps of the wavelet transform can be carried out in time $O(\ell)$ because there are only $O(\ell)$ "interesting" coefficients to compute. This gives a total time complexity of $O(\ell \log N)$.
Lemma 3 implies that at each of the $\log N$ resolution levels at most $2\ell - 2$ nonzero detail coefficients are added, completing the proof. □

Note that we can use Daubechies' construction of compactly supported orthogonal wavelets to produce wavelets satisfying $M(\delta)$ that have filter length $\ell = 2\delta + 2$. Using these filters it is possible to transform polynomial query functions of degree less than $\delta$ in time $\Theta(\delta \log N)$ and the result has less than $(4\delta + 2) \log N$ nonzero coefficients.

6.3 Polynomial Range-Sum Evaluation

In Section 5 we discussed how wavelet data can be preprocessed, stored, and accessed. Now we show how using the lazy wavelet transform can dramatically speed up query evaluation in the wavelet domain. First we define a special type of polynomial range query that is particularly easy to work with.

Definition 5. A polynomial range-sum with measure function $f : \mathrm{Dom}(F) \to \mathbb{R}$ is said to satisfy condition $S(\delta)$ if $f(x_0, \dots, x_{d-1}) = \prod_{i=0}^{d-1} p_i(x_i)$ and each $p_i$ has degree less than or equal to $\delta$, and $R$ is a hyper-rectangle.

All of the queries in Example 1 satisfy this condition: COUNT satisfies $S(0)$; SUM, AVERAGE, and COVARIANCE are computed with range-sums satisfying $S(1)$; and VARIANCE is computed with range-sums satisfying $S(2)$. As Example 1 makes clear, the class of polynomial range-sums is rich, even restricted to satisfy condition $S(\delta)$.

Consider a polynomial range-sum with query function $f \chi_R = \prod_{j=0}^{d-1} p_j(i_j)\, \chi_{R_j}(i_j)$. By Equation 4, $\widehat{f \chi_R} = \widehat{\prod p_j \chi_{R_j}} = \prod \widehat{p_j \chi_{R_j}}$. If $f \chi_R$ satisfies $S(\delta)$, then using Daubechies' filters of length $2\delta + 2$, we can compute (a data structure representing) $\widehat{f \chi_R}$ in time $\Theta(d\, \delta \log N)$, and $\widehat{f \chi_R}$ has $O((4\delta + 2)^d \log^d N)$ nonzero coefficients. Thus we have the following query evaluation technique, which we present for queries satisfying $S(\delta)$.
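Before the algorithm itself, the two facts it relies on (sparse one-dimensional transforms and the product structure of $\widehat{f \chi_R}$) can be illustrated for the simplest case $\delta = 0$, where the filter of length $2\delta + 2 = 2$ is the Haar filter and the query is a COUNT over a rectangle. The sketch below is an assumption-laden illustration: it computes the full Haar transform rather than the lazy wavelet transform, and the helper names, the intervals, and the domain size are chosen here for the example only.

```python
import math


def haar_dwt(a):
    """Full Haar DWT: [final summary, coarsest details, ..., finest details]."""
    r = 1 / math.sqrt(2)
    summary, details = list(a), []
    while len(summary) > 1:
        pairs = [(summary[2 * i], summary[2 * i + 1])
                 for i in range(len(summary) // 2)]
        details = [(x - y) * r for x, y in pairs] + details
        summary = [(x + y) * r for x, y in pairs]
    return summary + details


def haar_dwt_2d(m):
    """Tensor product transform: every row, then every column."""
    rows = [haar_dwt(row) for row in m]
    cols = [haar_dwt(list(col)) for col in zip(*rows)]
    return [list(r) for r in zip(*cols)]


def indicator(lo, hi, n):
    """Characteristic function of the interval [lo, hi] on 0..n-1."""
    return [1.0 if lo <= i <= hi else 0.0 for i in range(n)]


def nonzero(coeffs, tol=1e-12):
    return [k for k, v in enumerate(coeffs) if abs(v) > tol]


N = 16
chi_x = indicator(5, 12, N)      # the interval used in Section 6.1
chi_y = indicator(3, 9, N)       # a second, arbitrary interval
hat_x, hat_y = haar_dwt(chi_x), haar_dwt(chi_y)

# Sparsity: at most 2 detail coefficients per level, plus the final summary.
bound = 2 * int(math.log2(N)) + 1
print(len(nonzero(hat_x)), len(nonzero(hat_y)), bound)

# Product structure: the 2-D transform of chi_x(i) * chi_y(j) equals the
# outer product of the two 1-D transforms.
query = [[cx * cy for cy in chi_y] for cx in chi_x]
hat_query = haar_dwt_2d(query)
max_gap = max(abs(hat_query[i][j] - hat_x[i] * hat_y[j])
              for i in range(N) for j in range(N))
print(max_gap < 1e-9)
```

Because of the product structure, only the $d$ short lists of nonzero one-dimensional coefficients need to be materialized; the multidimensional coefficients can be formed on demand as products while iterating over their Cartesian product.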
General multivariate polynomials of degree $\delta$ can always be split into a sum of polynomials satisfying $S(\delta)$, each of which is transformed separately.

Fast Wavelet Query Evaluation
1. Use the lazy wavelet transform to compute the $d$ one-dimensional wavelet transforms $\widehat{p_j \chi_{R_j}}$. Initialize sum $\leftarrow 0$.
2. Iterate over the Cartesian product of the nonzero coefficients of the $\widehat{p_j \chi_{R_j}}$. For each $\eta = (\eta_0, \dots, \eta_{d-1})$ in this set, retrieve $\hat{\Delta}(\eta)$ from storage and set sum $\leftarrow$ sum $+ \hat{\Delta}(\eta) \prod \widehat{p_j \chi_{R_j}}(\eta_j)$. When complete, sum is the query result.

Notice that this algorithm is dominated by the I/O phase (step 2) for $d > 1$. The online preprocessing phase (step 1) takes time and space $\Theta(d\, \delta \log N)$.

The lazy wavelet transform also gives us a way to perform fast updates of the database. To insert a record $\bar{i} = (i_0, \dots, i_{d-1})$, let $\chi_{\bar{i}}$ denote the function equal to 1 at $\bar{i}$ and zero elsewhere. Then the updated data frequency distribution is $\Delta_{new} = \Delta + \chi_{\bar{i}}$. By the linearity of the wavelet transform, we have $\hat{\Delta}_{new} = \hat{\Delta} + \hat{\chi}_{\bar{i}}$. Thus to perform an update, we simply compute $\hat{\chi}_{\bar{i}}$ and add the results to storage. Furthermore, $\chi_{\bar{i}}$ can be thought of as a query function satisfying $S(0)$, so we can transform it just as we transform other query functions. A careful look reveals that in this case the "interesting intervals" overlap completely, and $\hat{\chi}_{\bar{i}}$ can be computed in time $\Theta((2\delta + 1)^d \log^d N)$.

Wavelet Update
1. To insert $\bar{i} = (i_0, \dots, i_{d-1})$ use the lazy wavelet transform to compute $\hat{\chi}_{i_j}$ for $0 \le j < d$.
2. For each $\eta = (\eta_0, \dots, \eta_{d-1})$ in the Cartesian product of the nonzero entries of $\hat{\chi}_{i_j}$, set $\hat{\Delta}(\eta) \leftarrow \hat{\Delta}(\eta) + \prod \hat{\chi}_{i_j}(\eta_j)$. For hash table access, this may require an insertion.

Theorem 2.
Using Daubechies' wavelets with filter length $2\delta + 2$, the Wavelet Preprocessing strategy to store data, Wavelet Update to insert new records, and Fast Wavelet Query Evaluation to evaluate queries, it is possible to evaluate any polynomial range-sum satisfying $S(\delta)$ in time $O((4\delta + 2)^d \log^d N)$. It is possible to insert one record in time $O((2\delta + 1)^d \log^d N)$.

7 Progressive Polynomial Range-Sum Evaluation

Note that in Equation 6 some of these coefficients are larger than others. For larger ranges and higher dimensions, this phenomenon becomes dramatic: most of the information is contained in a handful of coefficients. Intuitively, if we use the largest coefficients first we should obtain an accurate estimate of the query result before the computation is complete. This is the basic idea behind ProPolyne, but before proceeding we pause to ask precisely why we think that evaluating queries using the 'big' coefficients first gives better approximate answers. Once we can state precisely how this evaluation order is better, we prove that ProPolyne uses the best possible evaluation order.

Definition 6. A progressive evaluation plan for a sum $S = \sum_{0 \le i < N} a[i]$ is a permutation $\sigma$ of the integers 0 to $N - 1$. The estimate of $S$ at the $j$th progressive step of this plan is $\sum_{0 \le i < j} a[\sigma(i)]$.

7.1 Query Best-B ProPolyne

How do we determine whether one progressive evaluation plan produces better results than another? When evaluating range aggregate queries using Equation 5, we must choose a data independent progressive evaluation plan for a sum of the form $\sum \widehat{f \chi_R}(i)\, \hat{\Delta}(i)$. We cannot look at $\hat{\Delta}$ until executing our plan, but we want to minimize the average square error observed at each progressive step when operating on a random dataset.

Definition 7. Let $\tilde{Q}_1(\Delta)$ and $\tilde{Q}_2(\Delta)$ be approximations of the query $Q(R, f, \Delta)$.
We say $\tilde{Q}_1$ dominates $\tilde{Q}_2$ if

$$E[(\tilde{Q}_1(\Delta) - Q(R, f, \Delta))^2] \le E[(\tilde{Q}_2(\Delta) - Q(R, f, \Delta))^2]$$

where $\Delta$ is randomly selected from the set $\{\Delta \mid \sum \Delta^2(i) = 1\}$. We say that one progressive query plan dominates another if the estimate of the first plan dominates the estimate of the second at every progressive step.

We want to find a progressive evaluation plan that dominates all others. It is not obvious that this is possible, but the following result shows that the 'biggest coefficient' plan suggested at the beginning of the section is in fact the best.

Theorem 3. Let $y$ be a vector randomly selected from the set $\{y \mid \sum_{i=0}^{N-1} y_i^2 = 1\} = S^{N-1}$ with uniform distribution, let $I \subset [0, N-1]$ be a set of size $B$, and for a vector $x$ let $\tilde{x}_I$ denote $x$ with all coordinates except those in $I$ set to zero. Denote the set of the $B$ biggest (largest magnitude) coordinates of $x$ by $I^*$. Then for any choice of $I$ we have

$$E_y[\langle x - \tilde{x}_{I^*}, y \rangle^2] \le E_y[\langle x - \tilde{x}_I, y \rangle^2]$$

In other words, approximating $x$ with its biggest $B$ terms gives us the best $B$ term approximation of $\langle x, y \rangle$: an approximation that dominates all others.

Proof: For any $I$, $y$ we have $\langle x - \tilde{x}_I, y \rangle^2 = \sum_{i, j \notin I} y_i (x_i x_j) y_j = y^T R y$, where $R = (x - \tilde{x}_I)(x - \tilde{x}_I)^T$ is a symmetric matrix. The average error can be obtained by integrating over the sphere of all $y$'s

$$E_y[\langle x - \tilde{x}_I, y \rangle^2] = \int_{S^{N-1}} y^T R y = \int_{S^{N-1}} y^T D y = \sum_i \lambda_i \int_{S^{N-1}} y_i^2 = \frac{1}{N} \mathrm{trace}(R)$$

where $D$ is the diagonalization of $R$ and $\lambda_i$ are the eigenvalues. This chain of equalities is possible because $R$ is symmetric, hence is diagonalized by a unitary transformation that preserves the uniform distribution. But the trace of $R$ is just the sum of squares of the coordinates of $x - \tilde{x}_I$, so $E_y[\langle x - \tilde{x}_I, y \rangle^2] = \frac{1}{N} \sum_{i \notin I} x_i^2$, which is clearly minimized by taking $I = I^*$. □

Corollary 1. The progressive query plan obtained by using the largest query coefficients first dominates all other plans.
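Theorem 3 and Corollary 1 can be explored with a small experiment. The sketch below is a hedged illustration under stated assumptions: the vectors `x` and `y` are random stand-ins for query and data wavelet coefficients rather than real transforms, the helper names are invented here, and the priority queue is just one convenient way to visit coefficients in decreasing magnitude. It builds the progressive estimates, compares the expected error $\frac{1}{N}\sum_{i \notin I} x_i^2$ of the biggest-$B$ choice against random index sets of the same size, and checks that formula by Monte Carlo sampling on the unit sphere.

```python
import heapq
import math
import random


def progressive_estimates(query_coeffs, data_coeffs):
    """Yield partial sums of sum_i q[i] * d[i], visiting the largest |q[i]| first."""
    heap = [(-abs(q), i) for i, q in enumerate(query_coeffs) if q != 0.0]
    heapq.heapify(heap)
    partial = 0.0
    while heap:
        _, i = heapq.heappop(heap)
        partial += query_coeffs[i] * data_coeffs[i]
        yield partial


def expected_sq_error(x, kept):
    """Expected squared error from Theorem 3 when only the indices in kept are used."""
    kept = set(kept)
    return sum(v * v for i, v in enumerate(x) if i not in kept) / len(x)


def biggest(x, b):
    """Indices of the b largest-magnitude coordinates of x."""
    return sorted(range(len(x)), key=lambda i: -abs(x[i]))[:b]


def random_unit_vector(n, rng):
    """Uniform sample from the unit sphere via normalized Gaussians."""
    g = [rng.gauss(0.0, 1.0) for _ in range(n)]
    norm = math.sqrt(sum(v * v for v in g))
    return [v / norm for v in g]


rng = random.Random(0)
N, B = 32, 5
x = [rng.uniform(-1.0, 1.0) for _ in range(N)]    # stand-in query coefficients
y = random_unit_vector(N, rng)                    # stand-in data coefficients

estimates = list(progressive_estimates(x, y))
exact = sum(a * b for a, b in zip(x, y))

best_err = expected_sq_error(x, biggest(x, B))
other_errs = [expected_sq_error(x, rng.sample(range(N), B)) for _ in range(200)]

# Monte Carlo estimate of E_y[<x - x_I*, y>^2] for comparison with best_err.
top = set(biggest(x, B))
residual = [0.0 if i in top else v for i, v in enumerate(x)]
samples = 5000
mc_err = sum(
    sum(r * s for r, s in zip(residual, random_unit_vector(N, rng))) ** 2
    for _ in range(samples)
) / samples

print(best_err <= min(other_errs), abs(estimates[-1] - exact) < 1e-9)
```

The last progressive estimate coincides with the exact scalar product, so stopping early trades accuracy for I/O without changing the final answer.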
This evaluation plan is the foundation for our progressive algorithm, ProPolyne. Because after $B$ progressive steps ProPolyne provides the best-$B$ wavelet approximation of a query, we refer to this technique as query best-B ProPolyne. We implement this progressive query plan by first evaluating the query function transformation using Theorem 1, then building a heap from the resulting set of nonzero wavelet coefficients. We compute the sum by repeatedly extracting the top element from this heap; the partial sums provide accurate progressive query estimates. As described in Section 8, this analytical framework also provides efficiently computable guaranteed error bounds at each progressive step.

7.2 Data Best-B ProPolyne

Previous uses of wavelets for approximate query evaluation have focused on data approximation, using wavelets to produce a precomputed synopsis data structure. The reader may note that the roles of the data and the query in the results of Section 7.1 were entirely symmetric. If we can say "the biggest-$B$ approximation of a query is the best-$B$ approximation for random data", we can just as easily say "the biggest-$B$ approximation of the data is the best-$B$ approximation for random queries". Hence we can obtain a different sort of progressive query evaluation by sorting the data wavelet coefficients offline, then retrieving them in decreasing order of magnitude. This is the spirit of the approximate query evaluation algorithms of [21], where it is shown that this gives reasonable estimates quickly. The technique presented here has the extra benefit of treating measure dimensions symmetrically, supporting general polynomial range-sums. We call this technique data best-B ProPolyne.

Unfortunately, query workloads are very far from being randomly distributed on the unit sphere. In practice, the best ordering would be weighted by the expected query workload.
In any case, this technique only works well if the data are well approximated by the chosen wavelets. This stands in contrast to query best-B ProPolyne, where the query functions are always well approximated by wavelets. In Section 9 we see that the performance of data best-B ProPolyne varies dramatically with the dataset, and is consistently less accurate than query best-B ProPolyne.

7.3 Evaluation of Fixed-Measure SUM Queries

In practice, there are situations where the measures and aggregate functions are known at the time the database is built. When this is the case, ProPolyne can be optimized. In particular, it can be adapted to operate on the wavelet transform of the measure function rather than the frequency distribution. We call this adaptation fixed-measure ProPolyne, or ProPolyne-FM. One notable optimization that ProPolyne-FM allows is the use of Haar wavelets in all dimensions. The fact that one-dimensional Haar wavelets do not overlap brings query evaluation cost down to $(2j)^{d-1}$ and update cost to $j^{d-1}$ for a table with $d - 1$ $j$-bit attributes and one measure attribute. We note that this cost is identical to that of the Space-efficient Dynamic Data Cube [16]. ProPolyne-FM serves another useful role: it solves the same problem as other pre-aggregation methods, so it is directly comparable.

The process of turning ProPolyne-FM into a data or query best-B progressive algorithm is identical to the process for unrestricted ProPolyne. It happens that when using Haar wavelets, the data best-B version of ProPolyne-FM is simply a progressive implementation of the compact data cube presented in [21]. The query best-B version of ProPolyne-FM is novel, and we see in Section 9 that the approximate results it produces are significantly more accurate than those produced by the compact data cube.
For the remainder of the paper, ProPolyne-FM refers to the query best-B version.

8 Guaranteed Error Bounds

An important component of any approximation technique is its ability to provide meaningful information about the distribution of errors. In this section we present a family of analytic absolute bounds for the error of estimates made using the algorithms of Section 7. These bounds can be maintained efficiently and made available throughout progressive query evaluation.

ProPolyne evaluates queries by selecting the most important wavelets for the query and evaluating Equation 5 using these terms first. The error resulting from using only the best B coefficients is straightforward, if not efficient, to compute: simply compute the scalar product using the least important $2^j - B$ coefficients. We produce a tractable bound on this value by using Hölder's inequality in each resolution level. Specifically, if $\mathcal{B}$ denotes the set of indices of the most important B wavelets, $\tilde{Q}_{\mathcal{B}}(R, f, \Delta)$ is the approximate range-sum obtained using only wavelets in $\mathcal{B}$, and $\Lambda$ denotes the set of wavelet resolution levels, then we have the following error bound:

$$\left| Q(R, f, \Delta) - \tilde{Q}_{\mathcal{B}}(R, f, \Delta) \right| \le \sum_{\lambda \in \Lambda} \left[ \sum_{\xi \in \lambda \setminus \mathcal{B}} \left| \widehat{f \chi_R}(\xi) \right|^{p_\lambda} \right]^{1/p_\lambda} \left[ \sum_{\xi \in \lambda \setminus \mathcal{B}} \left| \hat{\Delta}(\xi) \right|^{p_\lambda/(p_\lambda - 1)} \right]^{(p_\lambda - 1)/p_\lambda} \tag{7}$$

where $1 \le p_\lambda \le \infty$ for each $\lambda \in \Lambda$.

It is possible to achieve most of the benefit of the bound in (7) more cheaply by identifying a small set of resolution levels that contain most of the energy for the data density distribution or for an expected query workload. All other resolution levels are grouped into one large level and estimate (7) is maintained for the reduced set.

9 Experimental Results

In this section we present results from our experiments with ProPolyne and related algorithms in order to provide the reader with an overview of how these techniques perform on real-world data.
ProPolyne-FM's worst-case performance as an exact algorithm (which is very similar to the performance of ProPolyne) is compared with related exact pre-aggregation methods in Table 1. Our focus in this section is on the accuracy of ProPolyne's progressive estimates. Our experiments show that wavelet-based query approximation delivers consistently accurate results, even on datasets that are poorly approximated by wavelets. Not only is the performance consistent, it is consistently better than data approximation. By directly comparing our methods with the wavelet-based data compression method proposed by Vitter and Wang [22], we see that ProPolyne based on query approximation delivers significantly more accurate results after retrieving the same number of values from persistent storage.

Algorithm                 Query Cost        Update Cost     Storage Cost
ProPolyne-FM              $(2 \log N)^d$    $\log^d N$      $\min\{|I| \log^d N, N^d\}$
SDDC [16]                 $(2 \log N)^d$    $\log^d N$      $N^d$
Prefix-Sum [11]           $2^d$             $N^d$           $N^d$
Relative Prefix-Sum [6]   $2^{2d}$          $N^{d/2+1}$     $N^d$

Table 1. Query/update/storage tradeoff for several exact SUM algorithms

9.1 Experimental Setup

We report results from experiments on three datasets. PETROL is a set of petroleum sales volume data with 56504 tuples, sparseness of 0.16%, and five dimensions: location, product, year, month, and volume (thousands of gallons). PETROL is our example of a dataset for which traditional data approximation works well [1].

GPS¹ is a set of sensor readings from a group of GPS ground stations located throughout California. We use a projection of the available data to produce a dataset with 3358 tuples, sparseness of 0.01%, and four dimensions: latitude, longitude, time, and height velocity. The presence of a tuple (lat, long, t, v) means that a sensor observed that the ground at coordinates (lat, long) was moving upward with a velocity of v at time t.
GPS is our example of a dataset for which traditional data approximation works poorly.

TEMPERATURE is a dataset holding the temperatures at points all over the globe and at 20 different altitudes on March 1, 2001. It has 207360 tuples, sparseness of 1.24%, and four dimensions: latitude, longitude, altitude, and temperature. The TEMPERATURE dataset is considerably larger than the GPS and PETROL datasets, and we use it to emphasize the fact that as datasets get larger, the benefit of using ProPolyne increases.

For all tests, 250 range queries were generated randomly from the set of all possible ranges with the uniform distribution. If a generated range selects fewer than 100 tuples from the dataset, it is discarded and a new range is generated.

All graphs display the progressive accuracy improvement of various approximation techniques for queries on a single dataset. The horizontal axis always displays the number of values retrieved by an algorithm on a logarithmic scale. The vertical axis of each graph displays the median relative error for a set of generated queries. Relative error is used so that queries returning large results do not dominate the statistics. Median error is used rather than mean error in order to avoid the noise caused by the one-sided fat tail of observed relative error. The results using mean error are qualitatively similar, but are not as smooth.

9.2 Performance for Fixed Measure Range-sums

Figure 1 compares the performance of ProPolyne-FM with a progressive version of the compact data cube (CDC) [22] on the PETROL and GPS datasets. Other techniques, including [9], have been compared favorably to the CDC. We do not directly compare ProPolyne to these because they have no progressive analog. We see that CDC works very well on the PETROL dataset, producing a median relative error under 10% after using fewer than 100 wavelet coefficients.
Still, ProPolyne-FM works better than CDC from the beginning, and this difference only grows as evaluation progresses. The difference between the performance of the two techniques on the GPS dataset is striking: CDC must use more than five times as many wavelet coefficients as there were tuples in the original table before providing a median relative error of 10%. ProPolyne-FM reaches this level of accuracy after retrieving fewer than 300 wavelet coefficients.

[Figure 1. Panels: (a) PETROL (mean selectivity: 22.3%); (b) GPS (mean selectivity: 20.4%). Horizontal axis: Number of Values Retrieved (log scale). Vertical axis: Median Relative Error (% on log scale). Series: ProPolyne-FM, CDC.]
Fig. 1. Progressive accuracy for Compact Data Cube (CDC) [22] and ProPolyne-FM.

¹ We would like to thank our colleagues at JPL, Brian Wilson and George Hajj, for providing us with the GPS and TEMPERATURE datasets.

9.3 Performance for General Range-sums

Figure 2 compares the performance of data best-B ProPolyne and query best-B ProPolyne on the PETROL and GPS datasets. Unlike the previous section, the queries for these tests slice in all dimensions, including the measure dimension. Data best-B ProPolyne can be thought of as an extension of CDC that supports this richer query set. As in the previous section, the method based on query approximation consistently and significantly outperforms the analogous data compression method. By the time query best-B ProPolyne has achieved a median error of 10%, data best-B ProPolyne still has a median error of near 100% for the PETROL dataset. The data best-B ProPolyne error for the GPS dataset at this point is enormous.
Notice also that the data best-B results exhibit accuracy "cliffs" where the progression reaches a set of coefficients that are particularly important for the given query workload. This hints that query workload information is critical to improving the ordering of data best-B coefficients.

Finally, Figure 3 illustrates the progressive accuracy of query best-B ProPolyne on the TEMPERATURE dataset. Figure 3(a) displays relative error for AVERAGE queries on randomly generated ranges of different sizes. We define the size of a range to be the product of the lengths of its sides. Larger ranges have better approximate results, suggesting that a basis other than wavelets may provide better approximation of query workloads with small ranges.

[Figure 2. Panels: (a) PETROL (mean selectivity: 18.2%); (b) GPS (mean selectivity: 17.5%). Horizontal axis: Number of Values Retrieved (log scale). Vertical axis: Median Relative Error (% on log scale). Series: Query Best-B ProPolyne, Data Best-B ProPolyne.]
Fig. 2. Progressive query accuracy for data best-B and query best-B ProPolyne.

[Figure 3. Panels: (a) Relative error vs. range size after retrieving 500 values (horizontal axis: Window Size Category; vertical axis: Median Relative Error); (b) Second order statistics (mean selectivity 16.5%) (horizontal axis: Number of Values Retrieved (log scale); vertical axis: Median Relative Error (% on log scale); series: COUNT, SUM, AVERAGE, COVARIANCE).]
Fig. 3. Query best-B ProPolyne on the TEMPERATURE dataset. For (a), range size categories are as follows.
category 1: size < 5000, category 2: 5000 ≤ size < 10000, category 3: 10000 ≤ size < 20000, category 4: 20000 ≤ size < 40000, category 5: 40000 ≤ size < 80000, category 6: 80000 ≤ size.

Figure 3(b) displays progressive relative error for COUNT, SUM, AVERAGE, and COVARIANCE queries on the TEMPERATURE dataset. Here we note that COUNT, SUM, and AVERAGE all obtain excellent accuracy after retrieving a very small number of wavelet coefficients. AVERAGE is significantly more accurate early in the computation, obtaining a median relative error below 10% using just 16 data wavelet coefficients. COUNT and SUM both achieve this level of accuracy using close to 100 data wavelet coefficients. COVARIANCE stands out in that its accuracy does not improve significantly until near the end of the computation. We emphasize that we obtain exact results for COVARIANCE just as quickly as for other query types. This slow convergence is largely due to the fact that we compute the covariance by subtracting two large approximate numbers to obtain a relatively small number.

10 Conclusions and Future Plans

In this paper we present ProPolyne, a novel MOLAP pre-aggregation strategy which can be used to support conventional queries such as COUNT and SUM alongside more complicated polynomial range-sums. ProPolyne is the first pre-aggregation strategy that does not require measures to be specified at the time of database population. Instead, measures are treated as functions of the attributes which can be specified at query time. This approach leads naturally to a new data-independent progressive and approximate query answering technique which delivers excellent results when compared to other proposed data compression methods. ProPolyne delivers all of these features with provably poly-logarithmic worst-case query and update cost, and with storage cost comparable to or better than other pre-aggregation methods.
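Because measures are supplied at query time as polynomials in the attributes, the statistics used in the experiments reduce to a handful of range-sums. The following sketch is a brute-force illustration over a tiny list of hypothetical tuples, not a wavelet-based evaluation. It assumes the standard population covariance formula, which this section does not spell out, and it perturbs one range-sum slightly to show the cancellation effect that slows COVARIANCE convergence. All names and values are made up for the example.

```python
def range_sum(tuples, in_range, poly):
    """Brute-force polynomial range-sum: add poly(t) over every tuple t in the range."""
    return sum(poly(t) for t in tuples if in_range(t))


def covariance(count, sum_x, sum_y, sum_xy):
    """Population covariance from four range-sums (assumed formula, see the lead-in)."""
    return sum_xy / count - (sum_x / count) * (sum_y / count)


# Hypothetical (x, y) tuples; the last one falls outside the query range.
tuples = [(100, 200), (101, 202), (102, 201), (103, 203), (500, 900)]


def in_range(t):
    """Range predicate on the first attribute."""
    return 100 <= t[0] <= 103


count = range_sum(tuples, in_range, lambda t: 1)              # COUNT: polynomial 1
sum_x = range_sum(tuples, in_range, lambda t: t[0])           # SUM(x): degree one
sum_y = range_sum(tuples, in_range, lambda t: t[1])           # SUM(y): degree one
sum_xy = range_sum(tuples, in_range, lambda t: t[0] * t[1])   # SUM(x*y): degree two

average_x = sum_x / count
cov_exact = covariance(count, sum_x, sum_y, sum_xy)

# Perturb one range-sum by 0.1% to mimic an intermediate progressive estimate.
rel_err_sum = 0.001
cov_approx = covariance(count, sum_x, sum_y, sum_xy * (1 + rel_err_sum))
rel_err_cov = abs(cov_approx - cov_exact) / abs(cov_exact)

print("AVERAGE(x) =", average_x)
print("COVARIANCE =", cov_exact)
print("relative error in SUM(x*y):", rel_err_sum)
print("relative error in COVARIANCE:", rel_err_cov)
```

In this toy case the two terms being subtracted are each around 20000 while their difference is 1, so a small relative error in one range-sum becomes a very large relative error in the covariance.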
We intend to extend this work in several ways. Preliminary experiments indicate that using synopsis information about query workloads or data distributions can dramatically improve sort orders for both query best-B and data best-B techniques. Dimensionality reduction techniques can improve I/O complexity at the expense of some accuracy in the final results. As presented here, ProPolyne requires random access to stored data; we will explore clustering strategies which take advantage of ProPolyne's unique access patterns. Finally, we wish to explore the limits of linear algebraic query approximation for approximate query answering. This includes finding complexity lower bounds, investigating more complex queries (e.g. OLAP drill-down, relational algebra), and making an efficient adaptive choice of the best basis for evaluating incoming queries.

References

1. J. L. Ambite, C. Shahabi, R. R. Schmidt, and A. Philpot. Fast approximate evaluation of OLAP queries for integrated statistical data. In Nat'l Conf. for Digital Government Research, Los Angeles, May 2001.
2. K. Chakrabarti, M. N. Garofalakis, R. Rastogi, and K. Shim. Approximate query processing using wavelets. In Proc. VLDB, pages 111-122, 2000.
3. C.-Y. Chan and Y. E. Ioannidis. Hierarchical cubes for range-sum queries. In Proc. VLDB, pages 675-686, 1999.
4. I. Daubechies. Orthonormal bases of compactly supported wavelets. Comm. Pure and Appl. Math., 41:909-996, 1988.
5. S. Geffner, D. Agrawal, and A. E. Abbadi. The dynamic data cube. In Proc. EDBT, pages 237-253, 2000.
6. S. Geffner, D. Agrawal, A. E. Abbadi, and T. Smith. Relative prefix sums: An efficient approach for querying dynamic OLAP data cubes. In Proc. ICDE, pages 328-335, 1999.
7. A. C. Gilbert, Y. Kotidis, S. Muthukrishnan, and M. J. Strauss. Optimal and approximate computation of summary statistics for range aggregates. In Proc. ACM PODS, pages 228-237, 2001.
8. A. C. Gilbert, Y. Kotidis, S. Muthukrishnan, and M. J. Strauss. Surfing wavelets on streams: One-pass summaries for approximate aggregate queries. In Proc. VLDB, 2001.
9. D. Gunopulos, G. Kollios, V. J. Tsotras, and C. Domeniconi. Approximating multi-dimensional aggregate range queries over real attributes. In Proc. ACM SIGMOD, pages 463-474, 2000.
10. J. M. Hellerstein, P. J. Haas, and H. Wang. Online aggregation. In Proc. ACM SIGMOD, pages 171-182, 1997.
11. C. Ho, R. Agrawal, N. Megiddo, and R. Srikant. Range queries in OLAP data cubes. In Proc. ACM SIGMOD, pages 73-88, 1997.
12. I. Lazaridis and S. Mehrotra. Progressive approximate aggregate queries with a multi-resolution tree structure. In Proc. ACM SIGMOD, pages 401-412, 2001.
13. V. Poosala and V. Ganti. Fast approximate answers to aggregate queries on a data cube. In Proc. SSDBM, pages 24-33, 1999.
14. W. Press, S. Teukolsky, W. Vetterling, and B. Flannery. Numerical Recipes in C. Cambridge Univ. Press, 1992.
15. M. Riedewald, D. Agrawal, and A. E. Abbadi. pCube: Update-efficient online aggregation with progressive feedback. In Proc. SSDBM, pages 95-108, 2000.
16. M. Riedewald, D. Agrawal, and A. E. Abbadi. Space-efficient datacubes for dynamic environments. In Proc. of Conf. on Data Warehousing and Knowledge Discovery (DaWaK), pages 24-33, 2000.
17. M. Riedewald, D. Agrawal, and A. E. Abbadi. Flexible data cubes for online aggregation. In Proc. ICDT, pages 159-173, 2001.
18. R. R. Schmidt and C. Shahabi. ProPolyne: A fast wavelet-based technique for fast evaluation of polynomial range-sum queries. In Proc. EDBT, Springer, 2002.
19. R. R. Schmidt and C. Shahabi. Wavelet based density estimators for modeling OLAP data sets. In SIAM Workshop on Mining Scientific Datasets, Chicago, April 2001. Available at http://infolab.usc.edu/publication.html.
20. J. Shanmugasundaram, U. Fayyad, and P. Bradley. Compressed data cubes for OLAP aggregate query approximation on continuous dimensions. In Proc. SIGKDD, August 1999.
21. S.-C. Shao. Multivariate and multidimensional OLAP. In Proc. EDBT, pages 120-134, 1998.
22. J. S. Vitter and M. Wang. Approximate computation of multidimensional aggregates of sparse data using wavelets. In Proc. ACM SIGMOD, pages 193-204, 1999.
23. M. V. Wickerhauser. Adapted Wavelet Analysis: From Theory to Software. IEEE Press, 1994.
24. Y.-L. Wu, D. Agrawal, and A. E. Abbadi. Using wavelet decomposition to support progressive and approximate range-sum queries over data cubes. In Proc. CIKM, pages 414-421, 2000.
Description
Rolfe R. Schmidt and Cyrus Shahabi. "ProPolyne: A fast wavelet-based algorithm for progressive evaluation of polynomial range-sum queries." Computer Science Technical Reports (Los Angeles, California, USA: University of Southern California. Department of Computer Science) no. 754 (2002).
Asset Metadata
Creator: Schmidt, Rolfe R. (author), Shahabi, Cyrus (author)
Core Title: USC Computer Science Technical Reports, no. 754 (2002)
Alternative Title: ProPolyne: A fast wavelet-based algorithm for progressive evaluation of polynomial range-sum queries (title)
Publisher: Department of Computer Science, USC Viterbi School of Engineering, University of Southern California, 3650 McClintock Avenue, Los Angeles, California, 90089, USA (publisher)
Tag: OAI-PMH Harvest
Format: 21 pages (extent), technical reports (aat)
Language: English
Unique identifier: UC16269661
Identifier: 02-754 ProPolyne A Fast Wavelet-based Algorithm for Progressive Evaluation of Polynomial Range-Sum Queries (filename)
Legacy Identifier: usc-cstr-02-754
Rights: Department of Computer Science (University of Southern California) and the author(s).
Internet Media Type: application/pdf
Copyright: In copyright - Non-commercial use permitted (https://rightsstatements.org/vocab/InC-NC/1.0/)
Source: 20180426-rozan-cstechreports-shoaf (batch), Computer Science Technical Report Archive (collection), University of Southern California. Department of Computer Science. Technical Reports (series)
Access Conditions: The author(s) retain rights to their work according to U.S. copyright law. Electronic access is being provided by the USC Libraries, but does not grant the reader permission to use the work if the desired use is covered by copyright. It is the author, as rights holder, who must provide use permission if such use is covered by copyright.
Repository Name: USC Viterbi School of Engineering Department of Computer Science
Repository Location: Department of Computer Science. USC Viterbi School of Engineering. Los Angeles, CA, 90089
Repository Email: csdept@usc.edu