USC Computer Science Technical Reports, no. 744 (2001)
POLAP: A Fast Wavelet-based Technique for Progressive Evaluation of OLAP Queries
Rolfe R. Schmidt, Cyrus Shahabi
Computer Science Department
University of Southern California
Los Angeles, CA 90089-0781
[rolfesch, shahabi]@usc.edu
Abstract
Range sum queries are one of the most basic types of decision support query, but even the best proposed techniques for their evaluation scale poorly with the dimension of the data domain. We propose a new type of algorithm that produces exact range sum query results as efficiently as the best known methods, but also produces accurate estimates of the query result long before the exact computation is complete. This combination allows us to deliver progressive query evaluation: we provide a quick approximate query response that becomes increasingly accurate as the computation progresses.
POLAP, our algorithm for range sum query evaluation, operates on the wavelet coefficients of a data set's density function, which we show can be computed and updated efficiently. We prove that for a $d$-dimensional data domain of size $N$ at most $O(((\log N)/d)^d)$ of these coefficients are relevant for a particular query and that they can be efficiently listed in order of importance. We demonstrate with an implementation that accessing the terms in this order yields excellent approximate results after a small proportion of the total required I/O.
1 Introduction
Range-sum query evaluation is one of the most basic problems of OLAP. These queries lie at the heart of commercial products being used by customers who are collecting ever larger databases. Much research has been done to improve our ability to evaluate these queries, but as businesses and researchers collect more data and ask more questions, our best techniques will quickly be stretched to their limit.
* This research has been funded in part by NSF grants EEC-9529152 (IMSC ERC) and ITR-0082826, NASA/JPL contract nr. 961518, DARPA and USAF under agreement nr. F30602-99-1-0524, and unrestricted cash/equipment gifts from NCR, IBM, Intel and SUN.
Research on improving response time for range-sum queries has moved in two directions: improving exact query evaluation algorithms and finding fast approximate algorithms. For exact queries, techniques like the Dynamic Data Cube [6] can be used to compute exact query results in $O(((\log N)/d)^d)$ time for a $d$-dimensional space of size $N$, using data structures that can be updated efficiently. Work on approximate OLAP has focused on compressing the underlying data and using a simple algorithm to evaluate queries on the smaller dataset. While these techniques deliver useful results, they will face formidable competition from the exact algorithms as they scale. Typical compression-based techniques have complexity linear in the size of the compressed database. This implies that a compression ratio of $O(N^{-1}((\log N)/d)^d)$ is needed to match the response time of the exact methods. Achieving such dramatic compression without degrading query accuracy is a daunting challenge. With that said, the exact algorithms still have exponential complexity in $d$, and it is not difficult to imagine data domains where this is unacceptably slow.
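To put the compression-ratio requirement in perspective, the short sketch below evaluates $((\log_2 N)/d)^d / N$ for a few domain sizes. It ignores all constant factors and assumes base-2 logarithms, so the numbers only indicate orders of magnitude; the function name is our own.

```python
def required_compression_ratio(log2_n: int, d: int) -> float:
    """Compressed size / original size that a method linear in the compressed
    size would need in order to match an O(((log N)/d)**d) exact method.
    Constant factors are ignored; the domain has N = 2**log2_n cells."""
    n = 2 ** log2_n
    return (log2_n / d) ** d / n


for log2_n, d in [(20, 2), (32, 4), (48, 6)]:
    ratio = required_compression_ratio(log2_n, d)
    print(f"N = 2**{log2_n}, d = {d}: ratio ~ {ratio:.2e}")
```

For example, four dimensions with 256 values each ($N = 2^{32}$) already call for a ratio of $8^4 / 2^{32} = 2^{-20}$.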
Progressive query evaluation has been proposed as a compromise between exact and approximate methods [17, 11]. As computation progresses these techniques produce an increasingly accurate estimate of the final exact answer. This enables interactive feedback and gives a user a good approximation of the features of the dataset long before the exact computations are complete. Progressive query evaluation provides an innovative way to provide OLAP on massive multidimensional datasets where interactive exact OLAP may be impossible. In order to be viable, a progressive algorithm must provide query estimates of constant accuracy with complexity at least as good as exact algorithms. It is unclear whether the proposed progressive techniques meet this standard.
1.1 Our Contributions
In this paper we present Progressive OLAP, or POLAP, a new progressive algorithm for range-sum query evaluation. POLAP can be used as an exact algorithm, and as such it has query I/O complexity of $O(((\log N)/d)^d)$. One variation of the algorithm has storage complexity of $O(\log N)$ during computation, allowing much larger problems to be executed in memory. Furthermore, the cost of updating the database using our technique is $O(((\log N)/d)^d)$. To our knowledge there are no existing algorithms that have lower query complexity without having higher update cost.
POLAP can also be used to produce accurate approximate query results by terminating the exact computation before it is complete. We demonstrate with an implementation that these approximate query results become meaningful after a small fraction of the I/O is complete. Used as a progressive algorithm, POLAP can provide users with these increasingly accurate query estimates during the process of exact query evaluation.
We achieve this by approximating the query rather than the data. Individual range-sum queries can be associated with functions on the data domain. Computing the wavelet transform of these functions gives us a decomposition of the query into a sum of sub-queries that can be evaluated in constant time. In Theorem 1 we show that this decomposition yields at most $O(((\log N)/d)^d)$ sub-queries. In Theorem 2 we show that our algorithm evaluates these sub-queries efficiently.
We obtain progressive OLAP by performing this decomposition, then evaluating the most important sub-queries first. The resulting evaluation order will depend on the dimensions of the range: small ranges will tend to extract high resolution information first, while large ranges depend more on low resolution information. This adaptive strategy quickly produces an accurate estimate of the final answer for ranges of all sizes. As a typical example, in one test of 1000 queries on a database of GPS ground station data, we observed an average relative error of 12.9% after only 10% of the sub-queries were evaluated. After 20% of the sub-queries for the same run were evaluated, the average relative error was 8.1%. The results are dramatically better for sums over large ranges, and for uniformly distributed data.
We also note that our framework allows us to ignore distinctions between measure and non-measure attributes. We treat all dimensions symmetrically and allow measures to be defined at query time.
1.2 Outline
The remainder of this paper is organized as follows. In Section 2 we provide a brief overview of recent research on range-sum query evaluation, and follow this with some basic definitions and notation in Section 3. Section 4 describes the essence of how POLAP works before going into several technical sections about discrete wavelet transforms and how they can be computed efficiently for our particular problem. On this foundation, Section 7 describes POLAP as an exact, approximate, and progressive range-sum query evaluation algorithm. Section 8 presents results from our implementation of POLAP, providing concrete evidence that POLAP estimates converge to exact results very quickly. Section 9 concludes this paper.
2 Related Work
The OLAP literature covers a wide and growing variety of queries, including hypothetical queries [1], iceberg queries [5], and data cube queries that simultaneously compute many related aggregations [8, 12]. In this paper we restrict our attention to range-sum queries, which are perhaps the most basic aggregate query.
The prefix-sum technique for range-sum evaluation [9] relies on the storage of a partial sum cube, from which range-sum queries can be evaluated with $2^d$ additions and subtractions. While there are difficulties with this technique, this work made it clear that it was possible to achieve fast, storage-efficient query evaluation algorithms. Update inefficiencies of the prefix-sum technique are addressed by the relative prefix-sum and the Dynamic Data Cube [7, 6].
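To make the $2^d$-term evaluation concrete, here is a minimal two-dimensional sketch of the prefix-sum idea. It is our own illustration, not the implementation of [9]: with a cumulative-sum array `P`, any range sum follows from $2^2 = 4$ entries by inclusion-exclusion. The names `prefix_sums` and `range_sum` and the toy data are hypothetical.

```python
from typing import List


def prefix_sums(a: List[List[int]]) -> List[List[int]]:
    """P[i][j] = sum of a[x][y] for x < i and y < j (one row/column of zero padding)."""
    rows, cols = len(a), len(a[0])
    p = [[0] * (cols + 1) for _ in range(rows + 1)]
    for i in range(rows):
        for j in range(cols):
            p[i + 1][j + 1] = a[i][j] + p[i][j + 1] + p[i + 1][j] - p[i][j]
    return p


def range_sum(p, l0, h0, l1, h1):
    """Sum of a over [l0, h0] x [l1, h1] (inclusive) from 2**2 = 4 lookups."""
    return p[h0 + 1][h1 + 1] - p[l0][h1 + 1] - p[h0 + 1][l1] + p[l0][l1]


data = [[3, 0, 1, 4],
        [2, 5, 0, 1],
        [0, 1, 2, 2],
        [4, 0, 3, 1]]
P = prefix_sums(data)
print(range_sum(P, 1, 2, 1, 3))  # rows 1-2, columns 1-3: 5+0+1+1+2+2 = 11
```

The price of the constant-time query is paid at update time: changing one cell of `data` can change $O(N)$ entries of `P`, which is the inefficiency that the relative prefix-sum and the Dynamic Data Cube address.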
Another thread of research has developed approximate OLAP techniques. Nonparametric wavelet-based data compression has been explored in [16, 15], where it is also noted that constant range functions have a sparse wavelet representation. Parametric modeling is proposed in [14, 2]. All of these techniques follow the pattern of reducing the size of the data, then evaluating queries on the smaller dataset.
These two threads are brought together by progressive query evaluation techniques. The pCube [11] uses an R-tree-like structure to provide progressive query evaluation for a large class of aggregate functions. Another technique based on wavelet summary coefficients [17] estimates query results by looking at low resolution averages and refining the result as needed. Both of these techniques function by building hierarchies of approximations of the data. Query results are estimated quickly using low resolution approximations of the data, and recursively refined using higher resolution information. These methods have slower range-sum query complexity than the exact methods mentioned above. To our knowledge, the complexity of finding an approximation of constant accuracy using these techniques has not been investigated.
3 Range Queries
In order to clarify the presentation that follows we will restrict our attention to data that lie in a $d$-dimensional data cube $D$ where each dimension has $2^j$ points. The entire domain has $N = 2^{dj}$ points.
Definition 1 A Range in $D$ is a rectangular region of the form $R = \prod_{i=0}^{d-1} [l_i, h_i]$ where $l_i \le h_i$.
An additive range query is usually defined as the sum of some measure attribute over the points in the database that lie in the range. It is noted in [14] that these sums can be written as integrals over the range of the product of a measure function and the data density function. This is the formulation we will use to make the following definition.
Definition 2 Given a function $f: D \to \mathbb{C}$ (the measure attribute), a range $R \subseteq D$, and a database of points in $D$ with density function $\Delta: D \to \mathbb{R}^+$, the range query $Q(R, f, \Delta)$ is defined as
$$Q(R, f, \Delta) = \sum_{x \in D} f(x)\,\chi_R(x)\,\Delta(x) \qquad (1)$$
where $\chi_R$ denotes the characteristic function of the set $R$. The function $f\chi_R$ is referred to as the range function for the query.
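Definition 2 can be checked directly on a toy example. The sketch below assumes a small two-dimensional grid and a hypothetical list of records; it builds the density $\Delta$ as a count per cell and evaluates $Q(R, f, \Delta)$ as the sum of $f \cdot \chi_R \cdot \Delta$. With the measure $f(x) = x_1$ (the second coordinate) the result is the ordinary SUM over the records in $R$, and with $f = 1$ it is a COUNT.

```python
from collections import Counter
from itertools import product

SIZE = 8  # toy domain: each of the d = 2 dimensions has 2**3 points

# Hypothetical records, each a point (x0, x1) in D.
records = [(1, 2), (1, 2), (3, 5), (6, 1), (2, 7), (4, 4)]

# Data density: Delta(x) = number of records located at x.
delta = Counter(records)


def chi(rng, x):
    """Characteristic function of the range R = prod [l_i, h_i]."""
    return 1 if all(lo <= xi <= hi for xi, (lo, hi) in zip(x, rng)) else 0


def range_query(rng, f, delta):
    """Q(R, f, Delta) = sum over x in D of f(x) * chi_R(x) * Delta(x)."""
    return sum(f(x) * chi(rng, x) * delta[x] for x in product(range(SIZE), repeat=2))


def second_coordinate(x):
    """Measure attribute: the value of the second coordinate."""
    return x[1]


R = [(1, 4), (2, 6)]  # [1, 4] x [2, 6]
print(range_query(R, second_coordinate, delta))  # 2 + 2 + 5 + 4 = 13
print(range_query(R, lambda x: 1, delta))        # COUNT: 4 records in R
```

Because the measure is just a function supplied with the query, the same density answers SUM, COUNT, or any other polynomial measure without re-loading the data.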
4 The Core of POLAP
Definition 2 gives us an association between a range-sum query and a range function. The fundamental idea behind POLAP is that we can express these range functions as a sum of basic component functions. Each of the component functions can be thought of as a sub-query, and we choose a decomposition and a data storage format so that each of these sub-queries can be computed in constant time.
Once the query is decomposed, we sort the sub-queries in order of importance for the final result. Evaluating the most important sub-queries first gives a progressively more accurate estimate of the eventual exact result. This allows the algorithm to be used as an exact, approximate, or progressive range-sum query evaluator.
While simple at a high level, there are some fundamental challenges to this approach. In order to be viable, we need to find a good decomposition of range functions. The decomposition must have a uniformly small number of terms for all range functions, so that the query complexity does not depend on the nature of the range itself. We also must have a way to evaluate the sub-queries quickly.
This section provides an overview of how we use wavelets to achieve these goals. The first issue we want to explain is why a wavelet decomposition of a range function gives us sub-queries that can be evaluated easily. The wavelet transforms we use decompose a function into a sum of orthogonal wavelets
$$f(x)\,\chi_R(x) = \sum_{j,k} \hat\psi_{j,k}(f\chi_R)\,\psi_{j,k}(x)$$
[Figure 1: a range R with wavelet supports labeled Region I, Region II, Region III, and Region IV (corner coefficient).]
Figure 1: Various ways wavelets overlap a range R. Only those wavelets that overlap a corner, such as the one in Region IV, contribute to range-sum query results.
where we use the notation of Section 5 for wavelet coefficients. This allows us to rewrite the sum in Definition 2 as
$$Q(R, f, \Delta) = \sum_{j,k} \hat\psi_{j,k}(f\chi_R)\,\hat\psi_{j,k}(\Delta) \qquad (2)$$
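Equation 2 is Parseval's identity for an orthonormal basis. The one-dimensional sketch below checks it numerically, assuming the orthonormal Haar transform (chosen only because it is short; as noted after Theorem 1, Haar wavelets are only adequate for constant measures) and a made-up density on 16 cells. The helper name `haar` is our own.

```python
import math


def haar(signal):
    """Full orthonormal Haar decomposition of a list whose length is a power of 2.
    Returns [coarsest average] followed by details from coarse to fine."""
    s, details = list(signal), []
    while len(s) > 1:
        half = len(s) // 2
        avg = [(s[2 * i] + s[2 * i + 1]) / math.sqrt(2) for i in range(half)]
        dif = [(s[2 * i] - s[2 * i + 1]) / math.sqrt(2) for i in range(half)]
        s, details = avg, dif + details
    return s + details


density = [0, 2, 1, 0, 3, 0, 0, 1, 4, 0, 2, 2, 0, 1, 0, 0]  # toy Delta on 16 cells
l, h = 3, 11                                                 # the range [3, 11]
range_fn = [1.0 if l <= x <= h else 0.0 for x in range(16)]  # COUNT query: f = 1

direct = sum(r * d for r, d in zip(range_fn, density))
wavelet = sum(a * b for a, b in zip(haar(range_fn), haar(density)))
print(direct, round(wavelet, 9))  # both 12, up to rounding
```

In POLAP the right-hand factor, the transform of the density, is computed once at load time and stored; only the transform of the range function is computed per query.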
In Section 6 we show that the decomposition of range functions in a basis of wavelets can be performed very efficiently. If we store the values $\hat\psi_{j,k}(\Delta)$, we can evaluate the sub-queries $\hat\psi_{j,k}(f\chi_R)\,\hat\psi_{j,k}(\Delta)$ as quickly as we can compute the wavelet transform of $f\chi_R$.
This is useless if there are too many sub-queries in our decomposition. For the remainder of this section we explain why most of the wavelet coefficients of typical range functions $f\chi_R$ are zero. In Section 6 we will show that the few remaining coefficients can be computed quickly. The fact that wavelets decompose range-sum queries into a sum of a small number of easy-to-compute sub-queries gives us exactly what we need to carry out our high level progressive query evaluation strategy.
To see why most wavelet coefficients of range functions turn out to be zero, we first need to make a few statements about what kind of wavelets we are using. We can choose an orthonormal basis of wavelets $\psi_{j,k}$ such that $\psi_{j,k}$ is zero outside of the rectangle $\prod_{i=0}^{d-1} 2^{-j_i}[k_i, k_i + \ell]$ for some integer $\ell$. We can also require that the wavelets satisfy a moment condition of order $k > 0$, that is, for any polynomial $p$ in one variable of degree less than $k$ we have
$$\sum_{x \in D} g(x)\,p(x_i)\,\psi_{j,k}(x) = 0 \qquad (3)$$
for any function $g$ that does not depend on the coordinate $x_i$.
Now we can look at the wavelet coefficients of the query function $f\chi_R$. Throughout this discussion, we will refer to Figure 1. First of all, note that if the wavelet is zero outside of a rectangle disjoint from $R$, like Region I of Figure 1, then the function $\chi_R\,\psi_{j,k}$ will always be zero. This means the corresponding wavelet coefficients will always be zero. For a small range, this may have eliminated most of the coefficients, but for a large range, there will be many wavelets supported entirely inside of $R$, in rectangles like Region II. In this situation, $\chi_R\,\psi_{j,k} = \psi_{j,k}$. If we restrict $f$ to be a polynomial of degree less than $k$ (in practice it is usually constant or linear), then using the moment condition we see that
$$\hat\psi_{j,k}(f\chi_R) = \sum_{x \in D} f(x)\,\psi_{j,k}(x) = 0$$
At this point we only need to consider wavelets that overlap the boundary of $R$ when computing the sum in Equation 2, but many of these turn out to be zero as well. Consider wavelets supported in a rectangle like Region III of Figure 1, where there is at least one dimension $i$ such that $2^{-j_i}[k_i, k_i + \ell] \subseteq [l_i, h_i]$. This allows us to write
$$\hat\psi_{j,k}(f\chi_R) = \sum_{x \in D} g(x)\,p(x_i)\,\psi_{j,k}(x) = 0$$
where $g$ is a function that does not depend on $x_i$, and $p$ is a polynomial of degree less than $k$. The moment condition in Equation 3 immediately implies that this sum must be zero.
Now we have eliminated all of the wavelets from consideration except for those supported in rectangles like Region IV of Figure 1 that overlap the corners of $R$. For the setup we have described, each dimension has $(\log N)/d$ resolution levels and at each level only $O(\ell)$ wavelets overlap a given coordinate, so $O((\ell d^{-1} \log N)^d)$ wavelets overlap any given point in $D$. Since $R$ has $2^d$ corners, this means that the sum in Equation 2 will involve only $O((2\ell d^{-1} \log N)^d)$ of these corner coefficients. This argument gives us the following theorem, more details about which may be found in [13].
Theorem 1 If the measure attribute $f$ is a polynomial and the wavelets have vanishing moments up to order $\deg f$, then the sum for the range query (2) has at most $O((2\ell d^{-1} \log N)^d)$ nonzero terms.
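The corner argument can be checked on a small example. The sketch below assumes a $2^5 \times 2^5$ domain, a constant measure (so Haar wavelets, with $\ell = 2$, are admissible), and hypothetical helpers `haar_with_support` and `touches_corner`. It uses the fact that the indicator of $R$ is separable, so each tensor-product coefficient is a product of one-dimensional coefficients. It then verifies that every nonzero coefficient belongs to a basis function whose support contains a corner of $R$, and compares the count with $(2J+1)^2$: at most two nonzero Haar details per level in each dimension plus the coarsest average.

```python
import math
from itertools import product


def haar_with_support(signal):
    """Orthonormal Haar transform of a length-2**J list. Returns
    (coefficient, (lo, hi)) pairs, where [lo, hi] is the support of the
    corresponding basis function in the original index space."""
    s = [(v, (i, i)) for i, v in enumerate(signal)]
    out = []
    while len(s) > 1:
        nxt = []
        for i in range(0, len(s), 2):
            (a, (lo, _)), (b, (_, hi)) = s[i], s[i + 1]
            nxt.append(((a + b) / math.sqrt(2), (lo, hi)))
            out.append(((a - b) / math.sqrt(2), (lo, hi)))
        s = nxt
    return s + out


J = 5
n = 2 ** J
(l0, h0), (l1, h1) = (3, 20), (7, 29)  # the range R = [3, 20] x [7, 29]
corners = list(product((l0, h0), (l1, h1)))

# The indicator of R is separable, so each tensor-product coefficient is a
# product of one-dimensional coefficients of the two interval indicators.
c0 = haar_with_support([1.0 if l0 <= x <= h0 else 0.0 for x in range(n)])
c1 = haar_with_support([1.0 if l1 <= x <= h1 else 0.0 for x in range(n)])
terms = [(a * b, sa, sb) for (a, sa), (b, sb) in product(c0, c1) if abs(a * b) > 1e-12]


def touches_corner(sa, sb):
    """True if the support rectangle sa x sb contains a corner of R."""
    return any(sa[0] <= x <= sa[1] and sb[0] <= y <= sb[1] for x, y in corners)


print(len(terms), (2 * J + 1) ** 2, n * n)
print(all(touches_corner(sa, sb) for _, sa, sb in terms))
```

Only a small fraction of the $n^2$ coefficients survive, and every survivor is a "Region IV" corner coefficient in the sense of Figure 1.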
It is important to note that in order for this result to apply to linear measure functions, which are typical in practice, Haar wavelets cannot be used because they do not satisfy the required moment condition: they are orthogonal to constants but not to linear functions. Now we turn to the problem of computing the range function wavelet coefficients. First we will need to introduce some notation and concepts from the theory of wavelets and multiresolution analysis.
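The remark about Haar wavelets can be checked numerically. The sketch below builds the detail filter with the convention of Section 5, $G_k = (-1)^k H_{1-k}$ for $k \in [2-\ell, 1]$, and computes the discrete moments $\sum_k G_k k^n$. We assume the standard Daubechies 4-tap filter (D4, $\ell = 4$) as an example of a filter with two vanishing moments and compare it with Haar ($\ell = 2$), which has only one. These filters are our choice for illustration; the text does not prescribe a particular one.

```python
import math


def detail_filter(h):
    """G_k = (-1)**k * H_{1-k} for k in [2 - ell, 1], returned as {k: G_k}."""
    ell = len(h)
    return {k: (1 if k % 2 == 0 else -1) * h[1 - k] for k in range(2 - ell, 2)}


def moment(g, n):
    """Discrete moment sum_k G_k * k**n."""
    return sum(gk * k ** n for k, gk in g.items())


r3, r2 = math.sqrt(3), math.sqrt(2)
haar = [1 / r2, 1 / r2]
d4 = [(1 + r3) / (4 * r2), (3 + r3) / (4 * r2),
      (3 - r3) / (4 * r2), (1 - r3) / (4 * r2)]

for name, h in (("Haar", haar), ("D4", d4)):
    g = detail_filter(h)
    print(name, [round(moment(g, n), 12) for n in range(3)])
```

Haar's detail filter kills constants (moment 0 vanishes) but not linear terms (moment 1 is $-1/\sqrt{2}$), while D4's kills both, which is what a linear measure requires.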
5 Wavelets
The purpose of this section is to introduce notation. Readers unfamiliar with wavelets, multiresolution analysis, and Mallat's algorithm are referred to one of the many excellent sources including [3, 4] and [10] for a concise computational description.
We use compactly supported orthogonal discrete wavelets arising from a one-dimensional multiresolution analysis. We denote the summary filter for this multiresolution analysis by $H = (H_0, \ldots, H_{\ell-1})$. For the detail filter we choose the usual offset $G_k = (-1)^k H_{1-k}$. Notice that $G$ is supported in the interval $[2-\ell, 1]$ and $H$ is supported in $[0, \ell-1]$.
We give a brief sketch of Mallat's algorithm in order to introduce notation that will be used in the sequel. Given the filters $H$ and $G$, Mallat's algorithm repeatedly executes partial wavelet transforms that proceed as follows. It takes as input a function on an interval $[0, 2^j]$. At each level, it convolves the detail and summary filters with the input and samples every other point of the results. The resulting detail coefficients are stored as the wavelet coefficients for this level, and the summary coefficients are used as the input for the computation at the next level. When discussing the summary and detail coefficients in the context of this algorithm, we will denote the summary coefficient at position $x$ of level $j$ by $S(x, j)$. Similarly, we denote the detail coefficient at position $x$ of level $j$ by $D(x, j)$.
The basic inductive step of Mallat's algorithm can then be expressed by the two formulae
$$S(x, j) = \sum_{k=0}^{\ell-1} H_k\, S(2x - k, j + 1)$$
$$D(x, j) = \sum_{k=2-\ell}^{1} G_k\, S(2x - k, j + 1)$$
Following [10] we stop this recursion when the size of the input is less than 4.
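The two formulae translate directly into code. The sketch below performs Mallat's algorithm with the indexing used above. We assume periodic extension at the boundary (the text does not specify boundary handling) and the Daubechies D4 filter as an example, and we check that the transform is orthogonal by verifying that it preserves the sum of squares.

```python
import math


def mallat_step(s, h):
    """One level, with the indexing of the text:
    S(x, j) = sum_{k=0}^{ell-1} H_k S(2x - k, j + 1)
    D(x, j) = sum_{k=2-ell}^{1} G_k S(2x - k, j + 1),  G_k = (-1)**k H_{1-k}.
    Input indices are taken modulo len(s) (an assumed periodic extension)."""
    n, ell = len(s), len(h)
    g = {k: (1 if k % 2 == 0 else -1) * h[1 - k] for k in range(2 - ell, 2)}
    summary = [sum(h[k] * s[(2 * x - k) % n] for k in range(ell)) for x in range(n // 2)]
    detail = [sum(g[k] * s[(2 * x - k) % n] for k in g) for x in range(n // 2)]
    return summary, detail


def mallat(s, h, stop=4):
    """Repeat the step, stopping when the input is shorter than `stop`."""
    details = []
    while len(s) >= stop:
        s, d = mallat_step(s, h)
        details.append(d)
    return s, details


r3, r2 = math.sqrt(3), math.sqrt(2)
d4 = [(1 + r3) / (4 * r2), (3 + r3) / (4 * r2), (3 - r3) / (4 * r2), (1 - r3) / (4 * r2)]
x = [4.0, 1.0, 0.0, 2.0, 5.0, 3.0, 1.0, 1.0, 0.0, 2.0, 2.0, 6.0, 1.0, 0.0, 3.0, 4.0]
summary, details = mallat(x, d4)
energy_in = sum(v * v for v in x)
energy_out = sum(v * v for v in summary) + sum(v * v for d in details for v in d)
print(len(summary), [len(d) for d in details], abs(energy_in - energy_out) < 1e-9)
```

Each level touches every input value, so a full transform costs time proportional to the input size; Section 6 shows how range functions avoid this cost.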
Given appropriate restrictions on $H$, the transformation performed by this algorithm can be made unitary. This means that the standard basis for the transform domain (the set of vectors that are equal to 1 for a particular level and offset, zero everywhere else) corresponds to an orthonormal basis in $L^2(D)$. The functions in this basis are our wavelets. We use the notation $\psi_{j,k}$ to denote the $k$th wavelet at level $j$. It will be important to us that $\psi_{j,k}$ is zero outside of the interval $2^{-j}[k, k + \ell]$. We can also require that $H$ (and hence $\psi_{j,k}$) satisfy a moment condition. For the filter $H$ the condition is
$$\sum_{k=0}^{\ell-1} H_k\, k^n = 0 \qquad (4)$$
when $0 < n < N$ for some $N$ chosen in advance. This condition will imply that the resulting discrete wavelets will be orthogonal to polynomials of degree less than $N$, a fact that leads to the sparseness of range functions observed in the discussion of Theorem 1.
To construct multivariate discrete wavelets, we use the tensor product basis. In particular, given $j = (j_0, \ldots, j_{d-1})$ and $k = (k_0, \ldots, k_{d-1})$, we define
$$\psi_{j,k}(x) = \prod_{i=0}^{d-1} \psi_{j_i,k_i}(x_i)$$
This construction, along with the univariate moment condition discussed above, yields the awkward but useful moment condition in Equation 3.
We use the notation $\hat\psi_{j,k}(f) = \langle f, \psi_{j,k} \rangle$, which gives us the form of our wavelet-domain range-query formula in Equation 2. In order to evaluate this sum we need to compute the range function coefficients $\hat\psi_{j,k}(f\chi_R)$ and retrieve the values $\hat\psi_{j,k}(\Delta)$. We now turn our attention to the computation of the range function coefficients.
6 Fast Transforms of Range Functions
At the core of our query evaluation algorithm is an optimization of Mallat's algorithm for our restricted class of range functions. This will allow us to compute the nonzero wavelet coefficients of a range function in time proportional to the time it takes to iterate over these coefficients. We will present this optimization in one dimension, and then show that by performing the one-dimensional algorithm independently on each dimension, we can create a data structure in time $O(\log N)$ that can be used to iterate over all of the nonzero corner wavelet coefficients of a $d$-dimensional range, computing each coefficient as needed as part of the iteration step.
6.1 One Dimension
We restrict our attention to one-dimensional range functions of the form $p(x)\,\chi_{[l,h]}$ on $[0, 2^j]$ where $p$ is a polynomial of degree less than $k$. We will show that for these functions, Mallat's algorithm can be executed in $O(j)$ steps rather than the usual $O(2^j)$. This is because we know the result of most of the convolutions in the algorithm in advance. At each level, we only need to concern ourselves with the detail and summary coefficients that arise from convolutions involving the edge terms of the range.
In the discussion of Theorem 1 we observed that for each level, at most $2\ell$ detail coefficients need to be computed. In fact, the same is true for the summary coefficients, thanks to the following lemma.
Lemma 1 For a range function $p\,\chi_{[l,h]}$ on $[0, 2^j]$, the summary coefficients $S(x, j)$ at level $j \le 0$ will be zero outside an interval $[l^s_j - \ell, h^s_j + \ell]$ and will be equal to $p(2^{-j}x)$ in the interval $[l^s_j, h^s_j]$.
Proof. This is trivial for $j = 0$, where the summary coefficients are just the original range function. We can take $l^s_0 = l$ and $h^s_0 = h$. Inductively assume that the result holds for $j + 1 \le 0$. By definition
$$S(x, j) = \sum_{k=0}^{\ell-1} H_k\, S(2x - k, j + 1)$$
If $2x < l^s_{j+1} - \ell$ or if $2x > h^s_{j+1} + 2\ell - 1$ then the induction hypothesis implies that $S(x, j)$ is zero. Also, if $l^s_{j+1} + \ell - 1 \le 2x \le h^s_{j+1}$ then
$$S(x, j) = \sum_{k=0}^{\ell-1} H_k\, p\big(2^{-(j+1)}(2x - k)\big) = p(2^{-j}x)$$
by the moment condition on the filter $H$. Choosing $l^s_j = \lceil (l^s_{j+1} + \ell - 1)/2 \rceil$ and $h^s_j = \lfloor h^s_{j+1}/2 \rfloor$ we find that the lemma holds for level $j$. Notice that it may be the case that $l^s_j > h^s_j$, giving us effectively one short interval of nonzero coefficients. This completes the induction step and the proof. □
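The recursion in the proof is easy to check by brute force. The sketch below assumes a constant polynomial $p = 1$ and filters rescaled so that $\sum_k H_k = 1$ (the normalization implicit in the claim that the summary coefficients equal $p$). It treats the input as zero outside $[l, h]$, runs the summary recursion on every position, and verifies both claims of Lemma 1 at each level for Haar ($\ell = 2$) and for a rescaled Daubechies D4 filter ($\ell = 4$). The function names are hypothetical.

```python
import math


def summary_level(s, filt):
    """One summary step S(x, j) = sum_k H_k S(2x - k, j + 1) on the integers.
    `s` maps positions to nonzero values; every other position is zero."""
    ell = len(filt)
    lo, hi = min(s), max(s)
    out = {}
    for x in range((lo - 1) // 2 - 1, (hi + ell) // 2 + 2):
        v = sum(filt[k] * s.get(2 * x - k, 0.0) for k in range(ell))
        if abs(v) > 1e-12:
            out[x] = v
    return out


def check_lemma1(filt, lo, hi, levels=6):
    """Verify Lemma 1 for p = 1 on [lo, hi], assuming sum(filt) == 1."""
    ell = len(filt)
    s = {x: 1.0 for x in range(lo, hi + 1)}
    ls, hs = lo, hi
    for _ in range(levels):
        s = summary_level(s, filt)
        ls, hs = -(-(ls + ell - 1) // 2), hs // 2  # the ceiling and floor from the proof
        for x in s:                                # must vanish outside this interval
            if not (ls - ell <= x <= hs + ell):
                return False
        for x in range(ls, hs + 1):                # must equal p = 1 inside [ls, hs]
            if abs(s.get(x, 0.0) - 1.0) > 1e-9:
                return False
    return True


r3 = math.sqrt(3)
haar_n = [0.5, 0.5]                                              # Haar, ell = 2
d4_n = [(1 + r3) / 8, (3 + r3) / 8, (3 - r3) / 8, (1 - r3) / 8]  # D4, ell = 4
print(check_lemma1(haar_n, 5, 40), check_lemma1(d4_n, 5, 40))
```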
Notice that the proof of this lemma provides an explicit way to compute the bounds of the intervals recursively. Given this simple structure of the summary coefficients it is now very easy to compute the needed detail coefficients at any level.
Lemma 2 The detail coefficients of the range function in Lemma 1 at level $j < 0$ are supported in two (not necessarily disjoint) intervals $[l^d_j, l^d_j + \ell]$ and $[h^d_j, h^d_j + \ell]$ where $l^d_j = \lfloor (l^s_{j+1} - 2\ell + 2)/2 \rfloor$ and $h^d_j = \lceil (h^s_{j+1} - \ell + 2)/2 \rceil$.
Proof. This is very similar to the proof of Lemma 1, only the moment condition for the filter $G$ now makes the terms between $l^d_j + \ell$ and $h^d_j$ vanish. The fact that we chose $G$ to be supported on the interval $[2-\ell, 1]$ also affects the bookkeeping that leads to the final form of the results. □
Put together, these two lemmas reduce the complexity of Mallat's algorithm from $O(2^j)$ to $O(j)$ for fixed $\ell$. In practice we have found that the choice of appropriate data structures to represent these special functions not only provides memory efficiency, but also obviates the complexities of the calculus of shifts used in this section.
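The following sketch puts the two lemmas to work. At each level it stores summary values only within $\ell$ positions of the two edges (Lemma 1 supplies everything else) and evaluates detail coefficients only in the two intervals of Lemma 2. Each level therefore costs $O(\ell^2)$ operations, independent of the width of the range. The sketch is a simplified illustration under the same assumptions as the Lemma 1 check ($p = 1$, $\sum_k H_k = 1$, zero outside the range), and the function names are hypothetical. A dense reference transform is run only to confirm that no nonzero detail coefficient is missed.

```python
import math


def ceil_half(a):
    """Ceiling of a / 2 for integers (also correct for negative a)."""
    return -(-a // 2)


def detail_filter(filt):
    """G_k = (-1)**k H_{1-k} for k in [2 - ell, 1]."""
    ell = len(filt)
    return {k: (1 if k % 2 == 0 else -1) * filt[1 - k] for k in range(2 - ell, 2)}


def sparse_details(filt, lo, hi, levels):
    """Detail coefficients of the indicator of [lo, hi], level by level, using
    only O(ell) stored summary values per level (Lemmas 1 and 2, p = 1)."""
    ell, g = len(filt), detail_filter(filt)
    ls, hs, edge = lo, hi, {}
    out = []
    for _ in range(levels):
        def get(x, ls=ls, hs=hs, edge=edge):
            # Lemma 1: explicit edge values, 1 inside [ls, hs], 0 elsewhere.
            return edge.get(x, 1.0 if ls <= x <= hs else 0.0)

        ld, hd = (ls - 2 * ell + 2) // 2, ceil_half(hs - ell + 2)  # Lemma 2
        xs = sorted(set(range(ld, ld + ell + 1)) | set(range(hd, hd + ell + 1)))
        out.append({x: sum(g[k] * get(2 * x - k) for k in g) for x in xs})

        ls, hs = ceil_half(ls + ell - 1), hs // 2                   # Lemma 1
        if ls <= hs:
            span = list(range(ls - ell, ls)) + list(range(hs + 1, hs + ell + 1))
        else:
            span = list(range(ls - ell, hs + ell + 1))
        edge = {x: sum(filt[k] * get(2 * x - k) for k in range(ell)) for x in span}
    return out


def dense_details(filt, lo, hi, levels):
    """Reference: run the recursion on every position that could be nonzero."""
    ell, g = len(filt), detail_filter(filt)
    s = {x: 1.0 for x in range(lo, hi + 1)}
    out = []
    for _ in range(levels):
        xs = range((min(s) - 2 * ell) // 2, (max(s) + 2 * ell) // 2 + 1)
        out.append({x: sum(g[k] * s.get(2 * x - k, 0.0) for k in g) for x in xs})
        s = {x: sum(filt[k] * s.get(2 * x - k, 0.0) for k in range(ell)) for x in xs}
    return out


def agree(filt, lo, hi, levels=6):
    """True if the sparse computation finds exactly the dense nonzero details."""
    for sparse, dense in zip(sparse_details(filt, lo, hi, levels),
                             dense_details(filt, lo, hi, levels)):
        if any(abs(v) > 1e-12 and x not in sparse for x, v in dense.items()):
            return False
        if any(abs(v - dense.get(x, 0.0)) > 1e-9 for x, v in sparse.items()):
            return False
    return True


r3 = math.sqrt(3)
haar_n = [0.5, 0.5]
d4_n = [(1 + r3) / 8, (3 + r3) / 8, (3 - r3) / 8, (1 - r3) / 8]
print(agree(haar_n, 5, 40), agree(d4_n, 5, 40), agree(d4_n, 3, 1000, levels=9))
```

The width of the range, `hi - lo`, appears only in the dense reference; the sparse loop does the same small amount of work at every level.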
6.2 Higher Dimensions
We now look at multivariate range functions of the form $f\chi_R$ where $R = \prod_{i=0}^{d-1} [l_i, h_i]$ and $f(x) = \prod_{i=0}^{d-1} p_i(x_i)$, where each $p_i$ is a univariate polynomial of degree less than $k$. We will generally refer to multivariate functions that can be written as a product of univariate functions as separable. Queries that arise in practice are often simpler than this (they are usually constant or linear in one variable), so we will defer the problem of efficiently decomposing general polynomials into a sum of separable polynomials. We simply note that each term of a polynomial is separable, so a decomposition is possible. In the sequel we describe in detail an algorithm for query evaluation with separable polynomial measure functions. To evaluate a query for a nonseparable polynomial, one can decompose it into a sum of separable terms, evaluate each of these queries, and sum the results.
Given a separable range function, we can compute its multidimensional wavelet transform easily. Since
$$\hat\psi_{j,k}(f\chi_R) = \prod_{i=0}^{d-1} \langle p_i\,\chi_{[l_i,h_i]}, \psi_{j_i,k_i} \rangle$$
we have an expression for any multidimensional wavelet coefficient as the product of one-dimensional range wavelet coefficients that can be evaluated using the optimized Mallat algorithm.
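The separability used here can be verified numerically. The sketch below transforms the full two-dimensional array of a separable range function along rows and then along columns, which is the tensor-product transform, and compares the result with the outer product of the two one-dimensional transforms. For brevity it assumes the orthonormal Haar transform and two arbitrarily chosen linear factors; only the separable structure matters for this check.

```python
import math


def haar(v):
    """Full orthonormal Haar transform (coarsest average first, then details)."""
    s, details = list(v), []
    while len(s) > 1:
        half = len(s) // 2
        avg = [(s[2 * i] + s[2 * i + 1]) / math.sqrt(2) for i in range(half)]
        dif = [(s[2 * i] - s[2 * i + 1]) / math.sqrt(2) for i in range(half)]
        s, details = avg, dif + details
    return s + details


n = 16
(l0, h0), (l1, h1) = (2, 9), (5, 13)
u = [(1.0 + 0.5 * x) if l0 <= x <= h0 else 0.0 for x in range(n)]  # p_0 times indicator
w = [(3.0 - 0.1 * x) if l1 <= x <= h1 else 0.0 for x in range(n)]  # p_1 times indicator

# Tensor-product transform of the full 2-D array: every row, then every column.
grid = [[u[a] * w[b] for b in range(n)] for a in range(n)]
rows = [haar(r) for r in grid]
cols = [haar([rows[a][b] for a in range(n)]) for b in range(n)]
tensor = [[cols[b][a] for b in range(n)] for a in range(n)]  # back to [a][b] order

# Separable shortcut: outer product of the two 1-D transforms.
hu, hw = haar(u), haar(w)
outer = [[hu[a] * hw[b] for b in range(n)] for a in range(n)]
print(max(abs(tensor[a][b] - outer[a][b]) for a in range(n) for b in range(n)))
```

The full transform needs the whole $n \times n$ array, while the shortcut needs only two length-$n$ transforms, and in POLAP only the few nonzero one-dimensional coefficients.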
7 Query Evaluation
The algorithms presented in this paper are based on Equation 2. In order to use this formula, we must compute and store the wavelet coefficients of the data density function, and have some mechanism for their retrieval. In this paper we will assume that this can be done through a POLAP database.
Definition 3 A POLAP Database is an object that provides means to load and update multidimensional data, and to retrieve specific wavelet coefficients of the density function of the loaded data.
A straightforward array-based or sparse implementation of a POLAP database has retrieval cost proportional to $d$ and update cost of $O((d^{-1} \log N)^d)$. Such a database can be populated in time $O(N)$ using Mallat's algorithm. For sparse data other techniques may be preferable. For algorithms that use a POLAP database as their only means of data retrieval, we use the number of coefficients retrieved as a proxy for the I/O cost.
In Section 6 we saw that the wavelet transform of a range function could be computed quickly by computing $d$ one-dimensional wavelet transforms using the optimized Mallat algorithm. We will use this fact so extensively that we introduce the following definition.
Definition 4 The POLAP Query Cube for a range function $f\chi_R$, where $f = \prod p_i$ is a separable polynomial and $R = \prod [l_i, h_i]$, is an object that contains the $d$ sets of nonzero wavelet coefficients of the one-dimensional range functions $p_i\,\chi_{[l_i,h_i]}$ and has a mechanism to produce an iterator which visits every element in the Cartesian product of these sets. At each point, the iterator can provide the product of the one-dimensional wavelet coefficients as well as the index of the contributing coefficients.
7.1 Exact Query Evaluation
With this terminology established, we can now state an exact query evaluation algorithm. Our computation proceeds by iterating over a POLAP query cube and accumulating the partial sum for Equation 2. For reasons that will become clear later, we call this partial sum the progressive estimate.
1. Initialize the progressive estimate to be zero.
2. Using the optimized Mallat algorithm, build a POLAP query cube for the range function.
3. Iterate over the POLAP query cube. At each iteration step, use the iterator's index to retrieve the associated wavelet coefficient from the database, multiply it by the wavelet coefficient of the iterator, and add the result to the progressive estimate.
4. When there are no more terms, the computation is complete and the progressive estimate is exact.
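Steps 1 to 4 can be sketched end to end in two dimensions. In this toy version all names are hypothetical: the "POLAP database" is a dictionary holding every tensor-product Haar coefficient of a random density, the query cube is a generator over the Cartesian product of the two one-dimensional coefficient lists of a COUNT query, and the number of database lookups stands in for I/O. Haar wavelets suffice here only because the measure is constant, and the query cube uses a full one-dimensional transform instead of the sparse transform of Section 6.

```python
import math
import random
from itertools import product


def haar(v):
    """Full orthonormal 1-D Haar transform of a length-2**J list."""
    s, details = list(v), []
    while len(s) > 1:
        half = len(s) // 2
        avg = [(s[2 * i] + s[2 * i + 1]) / math.sqrt(2) for i in range(half)]
        dif = [(s[2 * i] - s[2 * i + 1]) / math.sqrt(2) for i in range(half)]
        s, details = avg, dif + details
    return s + details


class ToyPolapDatabase:
    """Hypothetical in-memory stand-in for a POLAP database: stores every
    tensor-product Haar coefficient of a 2-D density and counts retrievals."""

    def __init__(self, density):
        n = len(density)
        rows = [haar(r) for r in density]
        cols = [haar([rows[a][b] for a in range(n)]) for b in range(n)]
        self.coef = {(a, b): cols[b][a] for a in range(n) for b in range(n)}
        self.reads = 0

    def get(self, index):
        self.reads += 1
        return self.coef[index]


def query_cube(n, ranges):
    """Iterate over the Cartesian product of the nonzero 1-D coefficients of
    each interval indicator, yielding (multi-index, product of coefficients)."""
    per_dim = []
    for lo, hi in ranges:
        c = haar([1.0 if lo <= x <= hi else 0.0 for x in range(n)])
        per_dim.append([(i, v) for i, v in enumerate(c) if v != 0.0])
    for combo in product(*per_dim):
        yield tuple(i for i, _ in combo), math.prod(v for _, v in combo)


def exact_query(db, n, ranges):
    estimate = 0.0                              # step 1
    for index, q in query_cube(n, ranges):      # steps 2 and 3
        estimate += q * db.get(index)
    return estimate                             # step 4: the estimate is exact


random.seed(1)
n = 32
density = [[random.randint(0, 3) for _ in range(n)] for _ in range(n)]
db = ToyPolapDatabase(density)
R = [(5, 22), (9, 30)]
result = exact_query(db, n, R)
brute = sum(density[a][b] for a in range(5, 23) for b in range(9, 31))
print(round(result, 6), brute, db.reads, n * n)
```

The query touches at most $(2 \cdot 5 + 1)^2 = 121$ of the 1024 stored coefficients, and the generator keeps only the two one-dimensional lists in memory, in the spirit of the $O(\log N)$ space remark below.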
A few important comments are in order. First of all, the time spent evaluating wavelet transforms is $O(d^{-1} \log N)$ for each dimension, as is the space requirement for storing these coefficients. This gives us a time and space complexity of $O(\log N)$ in step 2. Depending on how one constructs the iterator for the POLAP query cube, it is possible to perform the rest of the algorithm without changing the space complexity. However, there will be $O(((\log N)/d)^d)$ terms in the iteration, making the iteration dominate the computational complexity of the algorithm. Since each iteration step requires one access of the stored data, this gives us our I/O complexity.
Combining this with the discussion after Definition 3, we obtain the following
Theorem 2 The exact range query evaluation algorithm presented in steps 1-4 has computational and I/O complexity of $O((\ell d^{-1} \log N)^d)$ for both queries and updates. It has load complexity of $O(N)$.
It is striking that the query and update results are identical to those for the Dynamic Data Cube [6]. So far we have tacitly assumed that some decision would be made about how to iterate over the POLAP query cube. For the exact algorithm presented here, naive iteration works perfectly well. In the next section we show that if we choose our iteration strategy appropriately, the progressive estimate will provide us with an accurate approximation of the exact result, and that we can compute explicit bounds for the maximum error of this estimate. In Section 8 we will see that when the algorithm is used to evaluate queries on real and synthetic data sets, these estimates are very accurate after a fraction of the total required I/O.
7.2 Approximate Query Evaluation
While the algorithm presented above performs well, there will always be problems for which it is too slow to be used interactively. A theme in recent OLAP literature [16, 15, 14, 2] has been that for these very large problems, the database can be compressed, and approximate query results can be delivered quickly on the smaller approximate database. The principle that approximate query evaluation leads to a smaller problem is important, but in our situation it is more fruitful to approximate the query than the database. Every wavelet coefficient of the range function that we use requires one access to the database. By ignoring coefficients of the range function, we reduce the I/O requirements at the expense of obtaining an approximate query result.
In order to choose which wavelet coefficients to ignore, one needs to decide how to measure the distance between a range function and its approximation. Because the wavelets we are using form an orthonormal basis for $L^2(D)$ and range-sum queries are an inner product in this space, it is natural to use the $L^2$ metric as a starting point. It would be interesting to determine whether, given a penalty function for query errors and some assumptions about the database, $L^2$ is the best choice of metric. Answering that question is beyond the scope of this paper.
Given the wavelet coefficients of a range function, it is trivial to determine the best $L^2$ approximation given by a fixed number of wavelet coefficients. If we denote the set of nonzero wavelet coefficients for $f\chi_R$ by $\Xi$, and we choose to discard all coefficients in $\Xi_{dis} \subseteq \Xi$, then the squared $L^2$ size of our error is given by
$$\| App(f\chi_R) - f\chi_R \|_2^2 = \sum_{(j,k) \in \Xi_{dis}} |\hat\psi_{j,k}(f\chi_R)|^2$$
where $App(f\chi_R)$ denotes our approximation. In order to minimize this error, we simply discard the terms where the magnitude of the wavelet coefficients $\langle f\chi_R, \psi_{j,k} \rangle$ is smallest. We denote the approximate query obtained by using $App(f\chi_R)$ in place of $f\chi_R$ by
$$Q_{App}(R, f, \Delta) = \sum_{(j,k) \in \Xi \setminus \Xi_{dis}} \hat\psi_{j,k}(f\chi_R)\,\hat\psi_{j,k}(\Delta)$$
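The error identity can be checked directly in one dimension. The sketch below keeps the $K$ largest-magnitude coefficients of a range function with a linear measure, reconstructs the approximation with the inverse transform, and confirms that the squared $L^2$ error equals the sum of the squared discarded coefficients. It assumes the orthonormal Haar transform and uses hypothetical helper names. Haar is not sparse for linear measures (see the remark after Theorem 1), but the identity needs only orthonormality.

```python
import math


def haar(v):
    """Orthonormal Haar transform: [coarsest average] + details, coarse to fine."""
    s, details = list(v), []
    while len(s) > 1:
        half = len(s) // 2
        avg = [(s[2 * i] + s[2 * i + 1]) / math.sqrt(2) for i in range(half)]
        dif = [(s[2 * i] - s[2 * i + 1]) / math.sqrt(2) for i in range(half)]
        s, details = avg, dif + details
    return s + details


def ihaar(c):
    """Inverse of haar()."""
    s, pos = list(c[:1]), 1
    while pos < len(c):
        d = c[pos:pos + len(s)]
        pos += len(s)
        s = [v for a, b in zip(s, d)
             for v in ((a + b) / math.sqrt(2), (a - b) / math.sqrt(2))]
    return s


n, (l, h), K = 64, (11, 47), 12
f_range = [1.0 + 0.25 * x if l <= x <= h else 0.0 for x in range(n)]  # linear measure on [l, h]
coef = haar(f_range)
order = sorted(range(n), key=lambda i: abs(coef[i]), reverse=True)
kept = set(order[:K])
approx = ihaar([c if i in kept else 0.0 for i, c in enumerate(coef)])

err2 = sum((a - b) ** 2 for a, b in zip(approx, f_range))
discarded2 = sum(coef[i] ** 2 for i in order[K:])
print(f"{err2:.6f} {discarded2:.6f}")
```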
By this analysis, we see that in order to approximate $f\chi_R$ as closely as possible with a fixed number of wavelets, we simply need to pick the wavelet coefficients of $f\chi_R$ with the largest magnitude. In order to take advantage of this, we introduce an augmented POLAP Query Cube.
Definition 5. An Ordered POLAP Query Cube is a POLAP Query Cube for which the iterator visits wavelet indices in decreasing order of wavelet coefficient magnitude.
The naive way to construct an Ordered POLAP Query Cube is to construct an unordered one, compute all of the coefficients, sort them, then iterate over this list. This strategy has a drastic impact on our memory requirements (increasing from $O(\log N)$ to $O((d^{-1}\log N)^d)$), and worsens our computational complexity by a factor of $\log\log N$. Despite these problems, we use it because it highlights the potential of progressive OLAP without introducing unneeded complications. We are investigating ways to move this computational cost into the iteration steps, and to produce an ordered or approximately ordered iterator that requires no extra memory. Even when we use naive sorting to construct our Ordered POLAP Query Cube, this requires no access to the POLAP Database. In principle this imposes no I/O cost.
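A minimal sketch of the naive construction follows. It assumes a 1-D orthonormal Haar transform and represents the Ordered POLAP Query Cube as a plain sorted list of (index, coefficient) pairs; `ordered_query_cube` is a hypothetical name. It reads only the range function and never the data, consistent with the observation that ordering imposes no I/O cost. For a Haar wavelet, the coefficient of an interval's indicator is zero unless the wavelet's support straddles an endpoint, so a 1-D range on a domain of size N has at most $2\log_2 N + 1$ nonzero coefficients.

```python
import math


def haar(values):
    """Orthonormal 1-D Haar transform; len(values) must be a power of two."""
    out = [float(v) for v in values]
    n = len(out)
    while n > 1:
        half = n // 2
        avg = [(out[2 * i] + out[2 * i + 1]) / math.sqrt(2) for i in range(half)]
        det = [(out[2 * i] - out[2 * i + 1]) / math.sqrt(2) for i in range(half)]
        out[:n] = avg + det
        n = half
    return out


def ordered_query_cube(f_R, tol=1e-12):
    """Naive Ordered POLAP Query Cube: compute every wavelet coefficient of the
    range function, drop the (numerically) zero ones, and sort the rest by
    decreasing magnitude. No database access is needed."""
    coeffs = haar(f_R)
    nonzero = [(i, c) for i, c in enumerate(coeffs) if abs(c) > tol]
    return sorted(nonzero, key=lambda item: abs(item[1]), reverse=True)


N = 1024
f_R = [1.0 if 100 <= x < 700 else 0.0 for x in range(N)]  # hypothetical range
cube = ordered_query_cube(f_R)
print(len(cube), "nonzero coefficients out of", N)
for index, coeff in cube[:3]:
    print(index, round(coeff, 4))
```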
With this definition, our approximate range-sum evaluation algorithm is simple:
1. Initialize the progressive estimate to be zero.
2. Construct an Ordered POLAP Query Cube from the range function.
3. Iterate over the first K terms of the query cube. At each iteration step, use the iterator's index to retrieve the associated wavelet coefficient from the database, multiply it by the wavelet coefficient of the iterator, and add the result to the progressive estimate.
4. After K steps, the progressive estimate is the approximate query result.
K can be chosen to achieve an expected level of error, although work needs to be done to justify this type of choice. It may also be chosen to achieve a desired response time.
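The four steps can be sketched as follows, again assuming a 1-D Haar transform. A Python list `db` holding the data density's wavelet coefficients stands in for the POLAP Database (an assumption; the report's implementation is in C++ with Db4 wavelets), and each lookup `db[index]` plays the role of one I/O. `approximate_range_sum` is a hypothetical name.

```python
import math
import random


def haar(values):
    """Orthonormal 1-D Haar transform; len(values) must be a power of two."""
    out = [float(v) for v in values]
    n = len(out)
    while n > 1:
        half = n // 2
        avg = [(out[2 * i] + out[2 * i + 1]) / math.sqrt(2) for i in range(half)]
        det = [(out[2 * i] - out[2 * i + 1]) / math.sqrt(2) for i in range(half)]
        out[:n] = avg + det
        n = half
    return out


def ordered_query_cube(f_R, tol=1e-12):
    """Nonzero wavelet coefficients of f_R sorted by decreasing magnitude."""
    coeffs = haar(f_R)
    nonzero = [(i, c) for i, c in enumerate(coeffs) if abs(c) > tol]
    return sorted(nonzero, key=lambda item: abs(item[1]), reverse=True)


def approximate_range_sum(f_R, db, K):
    estimate = 0.0                      # Step 1: start from zero.
    cube = ordered_query_cube(f_R)      # Step 2: build the ordered query cube.
    for index, coeff in cube[:K]:       # Step 3: visit only the first K terms,
        estimate += coeff * db[index]   #   one database access per term.
    return estimate                     # Step 4: the approximate result.


random.seed(1)
N = 256
delta = [random.randint(0, 9) for _ in range(N)]          # hypothetical data density
db = haar(delta)                                          # stand-in POLAP Database
f_R = [1.0 if 37 <= x < 201 else 0.0 for x in range(N)]   # hypothetical range
exact = sum(delta[37:201])
for K in (1, 4, 8, len(ordered_query_cube(f_R))):
    print(K, round(approximate_range_sum(f_R, db, K), 2), exact)
```

Choosing K equal to the length of the query cube recovers the exact answer, which is the observation Section 7.3 builds on.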
Our $L^2$ analysis of the query approximation also gives us estimates for query errors. First, note that we have approximated the range function by projecting it onto a subspace spanned by its most important wavelet components. The error of this approximation is the inner product of the projections of the data density function and the range function onto the orthogonal complement of this subspace. Denoting this projection by $P$ and the magnitude of approximation error by $E(R, f, \Delta)$, the Cauchy-Schwarz inequality gives our first error estimate

$$E(R, f, \Delta) \le \|f_R - \mathrm{App}(f_R)\|_2 \, \|P(\Delta)\|_2 \tag{5}$$
This estimate involves the global size of $\Delta$ and does not take into account the local nature of a range-sum query. A localized estimate can be obtained by noticing that the $L^2$ norm of $\Delta$ restricted to $R$ will always be at most $\|\Delta\|_\infty \sqrt{\mathrm{Vol}(R)}$, giving the estimate

$$E(R, f, \Delta) \le \|f_R - \mathrm{App}(f_R)\|_2 \, \|\Delta\|_\infty \sqrt{\mathrm{Vol}(R)} \tag{6}$$
This estimate ignores the local density of $\Delta$, and is also rough. Using our implementation, we will see in Section 8 that these error estimates, while correct, appear far too conservative. The query errors we have observed on both real and synthetic data are much smaller than either of these estimates would indicate.
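As a numerical sanity check of Equation 5, the sketch below (1-D Haar, hypothetical random data, illustrative variable names) keeps the K largest coefficients of the range function, computes the true approximation error, and compares it with the bound $\|f_R - \mathrm{App}(f_R)\|_2\,\|P(\Delta)\|_2$, where $P$ keeps the coefficients of $\Delta$ at every index that was not retained. By orthonormality, $\|f_R - \mathrm{App}(f_R)\|_2$ is simply the norm of the discarded range-function coefficients.

```python
import math
import random


def haar(values):
    """Orthonormal 1-D Haar transform; len(values) must be a power of two."""
    out = [float(v) for v in values]
    n = len(out)
    while n > 1:
        half = n // 2
        avg = [(out[2 * i] + out[2 * i + 1]) / math.sqrt(2) for i in range(half)]
        det = [(out[2 * i] - out[2 * i + 1]) / math.sqrt(2) for i in range(half)]
        out[:n] = avg + det
        n = half
    return out


random.seed(7)
N, K = 256, 6
delta = [random.randint(0, 9) for _ in range(N)]          # hypothetical data density
f_R = [1.0 if 50 <= x < 190 else 0.0 for x in range(N)]   # hypothetical range

f_hat, d_hat = haar(f_R), haar(delta)
order = sorted(range(N), key=lambda i: abs(f_hat[i]), reverse=True)
kept, dropped = set(order[:K]), order[K:]

exact = sum(delta[50:190])
approx = sum(f_hat[i] * d_hat[i] for i in kept)
error = abs(exact - approx)

residual_norm = math.sqrt(sum(f_hat[i] ** 2 for i in dropped))   # ||f_R - App(f_R)||_2
projected_norm = math.sqrt(sum(d_hat[i] ** 2 for i in dropped))  # ||P(Delta)||_2
bound5 = residual_norm * projected_norm
print(round(error, 3), "<=", round(bound5, 3))
```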
7.3 Progressive Query Evaluation
In Sections 7.1 and 7.2 we proposed exact and approximate range-sum query evaluation algorithms based on the sum in Equation 2. These two algorithms are related at a fundamental level. At each step in the exact query evaluation algorithm, the progressive estimate reflects a current approximation of the final answer. In the approximate algorithm, we developed the principle that the wavelet coefficients of larger magnitude should be used before those of smaller magnitude. We can bring the two algorithms together by using our approximation principle to determine the order in which we iterate over components in the exact algorithm. This will allow the exact algorithm to yield good and improving approximate results throughout the I/O intensive phase of computation.
POLAP, or Progressive OLAP, is our integration of these algorithms. It involves the following four steps:
1. Initialize the progressive estimate to be zero.
2. Build an Ordered POLAP Query Cube for the range function.
3. Iterate over the query cube. At each iteration step, use the iterator's index to retrieve the associated wavelet coefficient from the database, multiply it by the wavelet coefficient of the iterator, and add the result to the progressive estimate. Make this estimate available to the user.
4. After the iteration is complete, the progressive estimate is the exact query result.
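POLAP differs from the approximate algorithm only in running the iteration to completion while publishing each intermediate estimate. A Python generator is a natural way to sketch this; as before, the 1-D Haar transform and the list-backed stand-in database are assumptions, and `polap` is a hypothetical name. A caller can display each yielded estimate and simply stop consuming the generator to abandon the query.

```python
import math
import random


def haar(values):
    """Orthonormal 1-D Haar transform; len(values) must be a power of two."""
    out = [float(v) for v in values]
    n = len(out)
    while n > 1:
        half = n // 2
        avg = [(out[2 * i] + out[2 * i + 1]) / math.sqrt(2) for i in range(half)]
        det = [(out[2 * i] - out[2 * i + 1]) / math.sqrt(2) for i in range(half)]
        out[:n] = avg + det
        n = half
    return out


def polap(f_R, db, tol=1e-12):
    """Yield the progressive estimate after each database access."""
    coeffs = haar(f_R)
    cube = sorted(                      # step 2: naive Ordered POLAP Query Cube
        ((i, c) for i, c in enumerate(coeffs) if abs(c) > tol),
        key=lambda item: abs(item[1]),
        reverse=True,
    )
    estimate = 0.0                      # step 1
    for index, coeff in cube:           # step 3: iterate over the whole cube
        estimate += coeff * db[index]   # one access to the stand-in database
        yield estimate                  # make the estimate available to the user


random.seed(3)
N = 512
delta = [random.choice([0, 0, 0, 1, 5]) for _ in range(N)]   # hypothetical sparse data
db = haar(delta)
f_R = [1.0 if 120 <= x < 400 else 0.0 for x in range(N)]     # hypothetical range
exact = sum(delta[120:400])

estimates = list(polap(f_R, db))
for step, est in enumerate(estimates, start=1):
    print(f"after {step:2d} accesses: {est:8.2f} (exact {exact})")
```

The last yielded value corresponds to step 4: once the iteration completes, the estimate equals the exact range-sum.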
This algorithm opens the possibility for an entirely new type of functionality for OLAP client tools. An OLAP user can issue a complicated query that requires the computation of many range-sums and receive a quick, blurry response that progressively becomes sharper. If the results have stabilized or appear uninteresting, the remainder of the computation can be abandoned. As another practical example, a query optimizer could use this to get the best selectivity estimate possible given a dynamic time constraint.
8 Performance Evaluation
We have implemented POLAP in C++ using Db4 wavelets. Our implementation was tested on both real and synthetic data. In this section we present some of our observations. While these results appear to compare favorably with other approximate OLAP algorithms [16, 14], we do not attempt a direct comparison. The purpose of this demonstration is to show how quickly POLAP converges to the exact result, and how this convergence is affected by the sparseness of the data, the size of the query relative to the domain, and the size of the domain itself. Throughout this section, we use the number of accesses to the POLAP database as a proxy for the number of I/Os.
8.1 Experimental Setup
8.1.1 Synthetic Data
In order to test the effects of changing data density and domain size on our algorithm's rate of convergence, we generated uniformly distributed random data. This simple choice was deliberate: we wanted the density to be the only variable for the test. Tests have been performed on data generated using other distributions, and the results are consistent with those reported here.
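One way to generate such data is sketched below. The report does not specify its generator, so the details are assumptions: a fixed fraction of the cells of a d-dimensional domain, chosen uniformly without replacement, each receive a record count of 1. `uniform_sparse_cube` is a hypothetical name, and the 4 x 32 domain mirrors the one used for Figure 4.

```python
import random


def uniform_sparse_cube(dims, density, seed=0):
    """Return {cell: 1} for round(density * #cells) uniformly chosen distinct cells.

    dims: size of each dimension, e.g. (32, 32, 32, 32).
    density: fraction of nonzero cells, e.g. 1e-5 for 0.001%.
    """
    rng = random.Random(seed)
    total = 1
    for n in dims:
        total *= n
    count = round(density * total)
    cube = {}
    for flat in rng.sample(range(total), count):  # distinct cells, uniform
        cell = []
        for n in reversed(dims):                   # unflatten in row-major order
            flat, coord = divmod(flat, n)
            cell.append(coord)
        cube[tuple(reversed(cell))] = 1
    return cube


data = uniform_sparse_cube((32, 32, 32, 32), density=0.001)  # 0.1% of 2^20 cells
print(len(data), "nonzero cells")
```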
8.1.2 Real Data
Our real data come from 22 GPS ground stations located throughout California that regularly record their latitude, longitude, height, and velocity. The ground stations are fixed, but these observations are interesting because the ground underneath them is moving. Data from observations over a period of 256 days beginning March 1, 1999 were placed in a four dimensional space consisting of latitude, longitude, time, and height velocity. In this space, the data are distributed sporadically in the location dimensions, but are concentrated densely on a few fibers (corresponding to ground stations) in the time dimension. The continuous dimensions were discretized, yielding a data cube with $2^{22}$ points and sparseness of 0.075%.
Range-sum queries on this data cube can be used to understand how the earth is moving throughout California, and how this is changing throughout time. For example, range-sums can be used to answer the following questions:
- Given a geographic range (perhaps specified through a map-based user interface), what is the average vertical velocity?
- What percentage of ground stations in Southern California were moving up in July 1999?
- Of the ones that were moving up, what was the average upward velocity?
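Each of these questions reduces to a ratio of range-sums $\sum_{x \in R} f(x)\,\Delta(x)$ with different measure functions $f$. The sketch below uses a tiny hypothetical set of discretized observations, mapping (latitude, longitude, day, height-velocity) cells to record counts, and evaluates the sums directly rather than through wavelets. All names and values are illustrative, the velocity coordinate holds a signed value for readability, and counting observations is only a proxy for counting ground stations.

```python
# Hypothetical observations: (lat, lon, day, velocity) cell -> record count.
delta = {
    (3, 4, 10, 2): 1,
    (3, 5, 10, -1): 2,
    (7, 1, 12, 4): 1,
    (3, 4, 40, -3): 1,
    (9, 9, 15, 5): 3,
}


def range_sum(delta, box, f):
    """Sum f(x) * delta(x) over cells x inside box = [(lo, hi), ...] (inclusive)."""
    return sum(
        f(x) * count
        for x, count in delta.items()
        if all(lo <= c <= hi for c, (lo, hi) in zip(x, box))
    )


def vertical_velocity(x):
    return x[3]  # measure function: the velocity coordinate


def one(x):
    return 1     # measure function for COUNT


# Geographic range: lat 0..5, lon 0..6, any day, any velocity.
geo = [(0, 5), (0, 6), (0, 255), (-100, 100)]
avg_velocity = range_sum(delta, geo, vertical_velocity) / range_sum(delta, geo, one)

# Same range, velocity dimension restricted to upward motion.
up = geo[:3] + [(1, 100)]
frac_up = range_sum(delta, up, one) / range_sum(delta, geo, one)
avg_up = range_sum(delta, up, vertical_velocity) / range_sum(delta, up, one)
print(avg_velocity, frac_up, avg_up)
```

Since every numerator and denominator is itself a range-sum, each can be evaluated progressively with POLAP.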
8.1.3 Error Measures
We report query error using both average relative error and a normalized absolute error. We attempted to use the absolute error normalization of [16], which divides each error by the largest value of the partial sum cube, but found that our normalized errors looked
exceptionally good. Hence we propose a more conservative measure: given a class of queries, we divide the average absolute error by the average magnitude of all nonzero query results. This measure provides us with an estimate of how well we can distinguish queries that will progress to zero from those that have a nonzero exact answer.
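Both measures can be computed from paired exact and estimated results for a class of queries. In the sketch below, `error_measures` is a hypothetical name, and averaging the relative error only over queries with a nonzero exact answer is an assumption, since the report does not say how zero answers enter that average.

```python
def error_measures(exact, estimates):
    """Return (average relative error, normalized absolute error) for one query class.

    Relative error is averaged over queries with a nonzero exact answer (assumed).
    The normalized absolute error divides the mean absolute error over all queries
    by the mean magnitude of the nonzero exact answers.
    """
    abs_errors = [abs(e - x) for x, e in zip(exact, estimates)]
    nonzero = [(x, err) for x, err in zip(exact, abs_errors) if x != 0]
    avg_relative = sum(err / abs(x) for x, err in nonzero) / len(nonzero)
    avg_nonzero_magnitude = sum(abs(x) for x, _ in nonzero) / len(nonzero)
    normalized_absolute = (sum(abs_errors) / len(abs_errors)) / avg_nonzero_magnitude
    return avg_relative, normalized_absolute


rel, norm_abs = error_measures(exact=[10, 0, 20], estimates=[11, 1, 18])
print(rel, norm_abs)
```

Note that the query whose exact answer is zero still contributes its absolute error to the numerator of the normalized measure, which is what makes it sensitive to estimates that fail to progress to zero.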
8.1.4 Query Generation
We generate queries for three types of range: small, medium, and large. Our small ranges have edges of a small constant size in each dimension, and this size does not change throughout our experiments. The medium ranges have side lengths proportional to the square root of the corresponding dimension size. The large ranges have side lengths equal to one-half of the corresponding dimension size. These range sizes were chosen so that the wavelet transform of the small ranges would fall primarily in the high resolution levels, the wavelet transform of the large ranges would fall primarily in the low resolution levels, and the wavelet transform of the medium ranges would be concentrated at the middle resolution levels.
For each dataset tested (including the GPS dataset), 1000 ranges of each range size were generated uniformly throughout the data domain. Error statistics were accumulated over these query evaluations to produce the results reported here.
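A sketch of this query generator follows. The small side length of 2, the proportionality constant of 1 for medium ranges, and placing each range by drawing its lower corner uniformly among positions where it fits are all assumptions; the report only fixes the three scaling rules. `side_length` and `random_ranges` are hypothetical names.

```python
import math
import random


def side_length(kind, n, small_side=2):
    """Side length of a range along a dimension of size n (small_side is assumed)."""
    if kind == "small":
        return small_side                  # constant, independent of n
    if kind == "medium":
        return max(1, round(math.sqrt(n)))  # proportional to sqrt(n), constant 1 assumed
    if kind == "large":
        return n // 2                      # one-half of the dimension size
    raise ValueError(kind)


def random_ranges(dims, kind, count, seed=0):
    """Place `count` boxes of the given kind uniformly at random inside the domain.

    Each box is a list of half-open (lo, hi) intervals, one per dimension.
    """
    rng = random.Random(seed)
    boxes = []
    for _ in range(count):
        box = []
        for n in dims:
            side = side_length(kind, n)
            lo = rng.randrange(0, n - side + 1)  # keep the box inside [0, n)
            box.append((lo, lo + side))
        boxes.append(box)
    return boxes


dims = (32, 32, 32, 32)
queries = {kind: random_ranges(dims, kind, 1000) for kind in ("small", "medium", "large")}
print(queries["medium"][0])
```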
8.2 Results
Figures 2 and 3 demonstrate the rate at which POLAP converges to the exact result for the GPS data set. Figure 2 shows the normalized absolute error of the POLAP query result plotted versus the percentage of total I/O completed. Each curve in the graph represents a different range size; results are displayed for small, medium, and large ranges. Here the results are striking: after 10% of the I/O is complete, the average absolute error for each range type is less than 5% of the average nonzero query result. We also observe that POLAP converges more quickly for large ranges than it does for small ranges. There is an interesting anomaly worth noting: for small ranges, the error actually gets worse during the first 5% of the I/O. This is due to the fact that most of the actual query results were zero, so the initial progressive estimate was equal to the exact answer. From that point, any successive changes would only worsen the estimate.
Figure 3 is similar to Figure 2, but reports average relative error instead of absolute error. These results are affected by the fact that many small and medium ranges on the GPS dataset were empty, but located near dense fibers of data. The approximated range functions overlap these fibers, keeping the estimated query results away from zero. This particularly affects the results of the medium sized ranges. Despite these effects, we still see reasonable relative errors after 20% of the I/O is complete, and very small error after 50%.
[Figure 2: Absolute Error Vs. I/O for GPS Data. x-axis: Percentage of I/O Completed; y-axis: Normalized Absolute Error; one curve each for small, medium, and large ranges.]
[Figure 3: Relative Error Vs. I/O for GPS Data. x-axis: Percentage of I/O Completed; y-axis: Average Relative Error; one curve each for small, medium, and large ranges.]
We conducted analogous experiments with our synthetic data, varying domain dimension from 3 to 6, domain size from $2^{15}$ to $2^{26}$, and density from 0.001% to 70%. The trends of the results were consistent with Figures 2 and 3, but often showed much faster convergence.
In Figure 4 we examine the effect of changing data density on the accuracy of our progressive estimates. Uniformly distributed data were generated in a domain consisting of 4 dimensions of 32 elements. The density of the data varied from 10% to 0.001%, and the relative error of the progressive estimates was recorded after 20% of the I/O was complete. Data density is depicted on the horizontal axis. The vertical axis represents average observed relative error. A log scale is used for both of these axes. Our primary observation is that increasing the density appears to reduce the observed errors for large ranges, but dramatically increase the relative errors for small ranges. We believe this is due to the fact that for sparse data, the high resolution wavelet coefficients contain most of the information about the dataset. For dense data, the low resolution wavelet coefficients will be more important. The wavelet transform of a small range is concentrated in the high resolution levels, so our execution of these queries will pick up the most important information about the sparse datasets first. It only looks at the low resolution levels later, thus missing important information about dense datasets.
[Figure 4: Relative Error Vs. Data Density. x-axis: Logarithm of density percentage (base 10); y-axis: Logarithm of relative error (base 10); one curve each for small, medium, and large ranges.]
Through all of these tests, we computed the minimum of the error bounds of Equations 5 and 6. For small ranges at 20% progressive evaluation, the average value of this forecast was 22 times larger than the average absolute error. For medium ranges, this ratio was 170, and for large ranges the ratio was approximately 450. There appears to be ample room for improvement of these error estimates.
9 Conclusions and Future Work
In this paper we presented Progressive OLAP, an algorithm that provides a new way to bring OLAP to users. Existing range-sum algorithms face formidable challenges as datasets grow. We believe that POLAP and similar algorithms will play a fundamental role in overcoming these challenges.
This paper is a beginning, and we have a great deal of work to do. One priority is to establish better techniques for building Ordered POLAP Query Cubes, and to compare the effectiveness of efficient almost-ordered implementations with the ideal results presented here. We also plan to compare the effectiveness of density-function-based approaches with that of techniques based on lower dimensional wavelet transforms of specific measure functions. Besides improving POLAP for range-sum queries, we would like to investigate other query types that may be amenable to progressive evaluation. Finally, we need to find better error measures that correspond directly to user interests. Large relative error may be acceptable if approximate results from a batch of simultaneous queries still accurately capture data features the user is investigating. These error measures need to be used to evaluate existing approximate methods, and to drive the design of new algorithms.
10 Acknowledgments
We would like to acknowledge Seokkyung Chung for his help in preparing this experimental setup. We would also like to thank our colleagues at JPL, Brian Wilson and George Hajj, for providing us with the GPS data.
References
[1] A. Balmin, T. Papadimitriou, and Y. Papakonstantinou. Hypothetical queries in an OLAP environment. In VLDB 2000, Proc. of 26th Int'l Conf. on Very Large Data Bases, pages 220-231. Morgan Kaufmann, 2000.
[2] D. Barbara and X. Wu. Data cubes in dynamic environments. IEEE Data Engineering Bulletin, 22(4):15-21, 1999.
[3] I. Daubechies. Ten Lectures on Wavelets, volume 61 of CBMS-NSF Lecture Notes. SIAM, 1992.
[4] R. A. DeVore and B. J. Lucier. Wavelets. Acta Numerica, 1:1-56, 1992.
[5] M. Fang, N. Shivakumar, H. Garcia-Molina, R. Motwani, and J. D. Ullman. Computing iceberg queries efficiently. In VLDB'98, Proc. of 24th Int'l Conf. on Very Large Data Bases, pages 299-310. Morgan Kaufmann, 1998.
[6] S. Geffner, D. Agrawal, and A. E. Abbadi. The dynamic data cube. In EDBT 2000, 6th Int'l Conf. on Extending Database Technology, volume 1777 of Lecture Notes in Computer Science, pages 237-253. Springer, 2000.
[7] S. Geffner, D. Agrawal, A. E. Abbadi, and T. Smith. Relative prefix sums: An efficient approach for querying dynamic OLAP data cubes. In Proc. of the 15th Int'l Conf. on Data Engineering, pages 328-335. IEEE Computer Society, 1999.
[8] J. Gray, A. Bosworth, A. Layman, and H. Pirahesh. Data cube: A relational aggregation operator generalizing group-by, cross-tab, and sub-total. In Proc. of the 12th Int'l Conf. on Data Engineering, pages 152-159, 1996.
[9] C. Ho, R. Agrawal, N. Megiddo, and R. Srikant. Range queries in OLAP data cubes. In SIGMOD 1997, Proceedings ACM SIGMOD Int'l Conf. on Management of Data, pages 73-88. ACM Press, 1997.
[10] W. Press, S. Teukolsky, W. Vetterling, and B. Flannery. Numerical Recipes in C. Cambridge Univ. Press, 1992.
[11] M. Riedewald, D. Agrawal, and A. E. Abbadi. pCube: Update-efficient online aggregation with progressive feedback. In Proc. of the 12th Int'l Conf. on Scientific and Statistical Database Management (SSDBM'00), pages 95-108. IEEE, 2000.
[12] K. A. Ross and D. Srivastava. Fast computation of sparse data cubes. In VLDB'97, Proc. of 23rd Int'l Conf. on Very Large Data Bases, pages 116-125. Morgan Kaufmann, 1997.
[13] R. R. Schmidt and C. Shahabi. Wavelet based density estimators for modeling OLAP data sets. In Third Workshop on Mining Scientific Datasets, in conjunction with First SIAM Int'l Conference on Data Mining, 2001.
[14] J. Shanmugasundaram, U. Fayyad, and P. Bradley. Compressed data cubes for OLAP aggregate query approximation on continuous dimensions. In Fifth ACM SIGKDD Int'l Conf. on Knowledge Discovery and Data Mining, August 1999.
[15] J. S. Vitter and M. Wang. Approximate computation of multidimensional aggregates of sparse data using wavelets. In SIGMOD 1999, Proc. of the ACM SIGMOD Int'l Conf. on the Management of Data, pages 193-204. ACM Press, 1999.
[16] J. S. Vitter, M. Wang, and B. R. Iyer. Data cube approximation and histograms via wavelets. In Proc. of the 7th Int'l Conf. on Information and Knowledge Management, pages 96-104. ACM, 1998.
[17] Y.-L. Wu, D. Agrawal, and A. E. Abbadi. Using wavelet decomposition to support progressive and approximate range-sum queries over data cubes. In Proc. of the 9th Int'l Conf. on Information and Knowledge Management, pages 414-421. ACM, 2000.
Rolfe R. Schmidt, Cyrus Shahabi. "POLAP: A fast wavelet-based technique for progressive evaluation of OLAP queries." Computer Science Technical Reports (Los Angeles, California, USA: University of Southern California. Department of Computer Science) no. 744 (2001).