USC Computer Science Technical Reports, no. 908 (2009)
C-SKY: Caching Skylines for Efficient Skyline
Computations with Partially-Ordered Domains
Yu-Ling Hsueh†, Roger Zimmermann‡, Wei-Shinn Ku§
† Dept. of Computer Science, University of Southern California, USA
‡ Computer Science Department, National University of Singapore, Singapore
§ Dept. of Computer Science and Software Engineering, Auburn University, USA
{hsueh@usc.edu, rogerz@comp.nus.edu.sg, weishinn@auburn.edu}
Abstract— The results of skyline queries performed on data
sets with partially-ordered domains vary depending on users’
preference profiles specified for the partially-ordered domains.
Existing work has addressed the issue of handling each individual
query with some efficiency. However, processing large volumes
of such queries for online applications with low response time
is still very challenging. In this paper, we introduce a novel
approach, termed C-SKY, to reduce the latency by caching
query results with their unique user preferences. Of paramount
importance in this case is that cached queries with compatible
preference profiles need to be utilized. For this purpose, we
introduce a similarity measure that establishes how related a
new query is to each of the previously cached queries and
profiles. The similarity measure allows the cached entries to be
effectively ordered according to descending values and hence
query processing can start with the most promising candidates.
If a new query is only partially answerable from the cache, the
proposed method pursues a second optimization step. The query
processor utilizes the partial result sets and augments them by
performing less expensive constraint skyline queries guided by
constraint violations between different query preference profiles.
Furthermore, to lower the space overhead, we propose a cache
management scheme where only the most popular preferences are
preserved. Extensive experiments are presented to demonstrate
the performance and utility of our novel approach.
I. INTRODUCTION
Skyline query computations are important for multi-criteria
decision making applications and they have been studied
intensively in the context of spatio-temporal databases. Skyline
queries have been defined as retrieving in multi-dimensional
space a set of points, which are not dominated by any other
points. An object $p$ dominates $p'$ if $p$ is at least as favorable
as $p'$ in all dimensions and more favorable in at least one
dimension. In many applications, some
data dimensions – for example in the form of hierarchies,
intervals and preferences – are partially-ordered (PO), where
some data lack preferences (e.g., non-specified data nodes). In
Figure 1 (a), domains $d_1$ and $d_2$ are totally-ordered whereas
domain $d_3$, with the options (or values) of A, B, C, and D,
is partially-ordered. On each partially-ordered domain, every
user (i.e., a query) may declare a user preference profile
which describes the preference order among the options.
Figure 1 (b) shows the user preference profiles $g_1$ and $g_2$
of the corresponding queries $q_1$ and $q_2$. Such an ordering
may, for example, represent the preferences that a frequent
traveller has with regard to flying with different airlines.
We use a directed acyclic graph (DAG) to represent a user
preference profile and each node in the graph denotes a value
in a partially-ordered domain. Profile $g_1$ declares a preference
on all the options in the domain, while the profile $g_2$ specifies
an ordering only on options A and B (i.e., the associated query
has no preference on options C and D). Nodes C and D
are unspecified nodes. With different user preference profiles,
the results of the skyline queries are different. For preference
profile $g_1$, point $p_5$ dominates $p_8$, because the dimensional data
in both the totally-ordered and the partially-ordered domains
for $p_8$ are equal or worse than for $p_5$ (recall that query $q_1$
prefers A to C). Note that $p_6$ cannot dominate $p_9$ (although
all the TO attributes of $p_6$ are better than the TO attributes of
$p_9$), since the profile has an equivalent preference between B
and D (i.e., B and D are sibling nodes). For $g_2$, since the user
does not specify any preferences for the nodes C and D, the
query processor allows dominance between two such data tuples
only when they have the same PO value. For example, $p_{10}$
dominates $p_8$, and $p_9$ dominates $p_7$. Therefore, the rest of the
non-dominated data tuples with PO values equal to C or D must
be preserved as skyline points.
id    d1    d2    d3
p1    1.8   0.3   A
p2    2.0   0.3   A
p3    1.8   0.3   B
p4    1.2   1.0   B
p5    1.4   1.0   A
p6    1.0   1.0   B
p7    2.0   1.0   D
p8    1.8   1.0   C
p9    1.5   1.0   D
p10   0.8   1.0   C
(a) Sample data set.

User Preference Profile    Skyline Results
g1                         p1, p5, p6, p10
g2                         p1, p5, p6, p9, p10
(b) DAGs and query results.
Fig. 1. Partially-ordered skyline query example.
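The dominance rules of this example can be checked with a short Python sketch. It rests on assumptions that the text does not state outright: smaller values in d1 and d2 are taken to be better, dominance means "no worse in every dimension and strictly better in at least one", and g2 is modeled as the single relation "A is preferred to B". The names `DATA`, `G2_PREFERS`, `dominates`, and `skyline` are illustrative only. Under these assumptions the sketch reproduces the g2 row of Fig. 1 (b).

```python
# Sample data set from Fig. 1 (a): (d1, d2, d3).
# Assumption: smaller d1/d2 values are more favorable.
DATA = {
    "p1": (1.8, 0.3, "A"), "p2": (2.0, 0.3, "A"), "p3": (1.8, 0.3, "B"),
    "p4": (1.2, 1.0, "B"), "p5": (1.4, 1.0, "A"), "p6": (1.0, 1.0, "B"),
    "p7": (2.0, 1.0, "D"), "p8": (1.8, 1.0, "C"), "p9": (1.5, 1.0, "D"),
    "p10": (0.8, 1.0, "C"),
}

# Assumed profile g2: the only declared preference is "A over B".
# The set holds (better, worse) pairs and is already transitively closed.
G2_PREFERS = {("A", "B")}


def dominates(p, q, prefers):
    """True if p is no worse than q everywhere and strictly better somewhere."""
    *p_to, p_po = p
    *q_to, q_po = q
    # Totally-ordered dimensions: p must not be worse in any of them.
    if not all(x <= y for x, y in zip(p_to, q_to)):
        return False
    # Partially-ordered dimension: values must be equal or p's preferred.
    # Unspecified values (C, D) are therefore comparable only to themselves.
    if p_po != q_po and (p_po, q_po) not in prefers:
        return False
    strictly_better_to = any(x < y for x, y in zip(p_to, q_to))
    return strictly_better_to or (p_po, q_po) in prefers


def skyline(data, prefers):
    """Names of all tuples that no other tuple dominates."""
    result = [
        name for name, q in data.items()
        if not any(dominates(p, q, prefers) for p in data.values())
    ]
    return sorted(result, key=lambda name: int(name[1:]))


print(skyline(DATA, G2_PREFERS))  # ['p1', 'p5', 'p6', 'p9', 'p10']
```

The quadratic scan is chosen for readability; it is not the index-based evaluation that the paper builds on.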
The traditional methods to execute queries over totally-
ordered domains cannot efficiently handle data sets with
partially-ordered domains. Current solutions ([2], [13]) convert
each partially-ordered domain data column into integer inter-
vals that enable the traditional index-based skyline algorithms
(e.g., BBS) to handle such queries. The TSS [13] method
enhances the pruning ability and progressiveness of this idea
further by applying topological sorts on the user preference
profiles.
Skyline query computations with partially-ordered domains
are computationally very complex in higher dimensions. The
cost of the query evaluation process increases as either the
number of options for a partially-ordered domain or the number
of partially-ordered domains increases. Therefore, existing
systems are often unable to provide up-to-date query results with
quick response times. To address this challenge we propose a
novel approach termed Caching Skylines for Efficient Skyline
Computations (C-SKY, for short). The main contribution of C-
SKY is that it caches previous queries with both their results
and user preference profiles such that the query processor
can rapidly retrieve a skyline result set for a new query
from a set of existing candidate queries with compatible user
preference profiles. One of the innovations of the approach
lies in our proposed similarity function that measures the
degree of closeness between two user preference profiles.
Since the query processor directly accesses a relatively small
candidate result set to retrieve the skyline points for a new
query, the response time of the skyline computation can be
greatly reduced. To respect the generally limited cache space,
we also propose a novel cache management approach that
preserves only the most popular user profiles and reduces the
number of false hits.
The remainder of this paper is organized as follows. The
concept of similarity measure and the design of the similarity
function are described in Sections III-A and III-B, respec-
tively. The method for the candidate cached query selection
is described in Section III-C, and the details of handling
unanswerable queries are outlined in Section III-D. Section IV
presents and details our cache management design. Finally,
we describe the overall C-SKY algorithm in Section V and we
verify the performance of our techniques in Section VI.
II. RELATED WORK
Numerous secondary storage based algorithms for comput-
ing skylines have been proposed before. Borzsonyi et al. [1]
introduced the non-progressive Block-Nested-Loop (BNL) and
Divide-and-Conquer (D&C) algorithms. The BNL approach
recursively compares each data point p with the current set
of candidate skyline points, which might be dominated later.
BNL does not require data indexing and sorting; however,
its performance is influenced by the main memory size. The
D&C technique divides the dataset into several partitions and
computes the partial skyline of the points in every partition. By
merging the partial skylines, the final skyline can be obtained.
Both algorithms may incur many iterations and are inadequate
for on-line processing. Tan et al. [15] presented two progres-
sive skyline processing algorithms: the bitmap approach and
the index method. Bitmap encodes dimensional values of data
points into bit strings to speed up the dominance comparisons.
The index method classifies a set of d-dimensional points into
d lists, which are sorted in increasing order of the minimum
coordinate value. The index scans the lists synchronously
from the first entry to the last. With pruning strategies, the
search space can be reduced. The nearest neighbor (NN)
method [5] indexes the dataset with an R-tree and utilizes
a nearest neighbor search to find the skyline results. The
approach repeats the query-and-divide procedure and inserts
the new partitions that are not dominated by any skyline
point into a to-do list. The algorithm terminates when the
to-do list is empty. A special method is applied to remove
duplicates retrieved from overlapping partitions. The branch
and bound skyline (BBS) algorithm [12] traverses an R-tree
to find the set of skyline points. BBS recursively performs
a nearest neighbor search to compute intermediate/leaf nodes
which are not dominated by the currently discovered skyline
points. Because BBS traverses R-tree nodes based on their
mindist from the origin, each retrieved point is guaranteed
to be a skyline point and can be returned to users immedi-
ately. Additionally, many of the recent techniques are aimed
at continuous skyline support for moving objects and data
streams. Lin et al. [8] utilize n-of-N skyline queries with the
most recent n of N elements to support on-line computation
against sliding windows over a rapid data stream. Morse et
al. [10] illustrated a scalable LookOut algorithm for updating
a continuous time-interval skyline efficiently. Sharifzadeh and
Shahabi [14] introduced the concept of Spatial Skyline Queries
(SSQ). Given a set of data points P and a set of query points
Q, SSQ retrieves those points of P which are not dominated
by any other point in P while considering their derived spatial
attributes with respect to query points in Q. For moving query
points, a continuous skyline query processing strategy has
been presented with a kinetic-based data structure [4]. A suite
of novel skyline algorithms based on a Z-order curve [3]
has also been proposed [6]. Among the solutions, ZUpdate
facilitates incremental skyline result maintenance by utilizing
the properties of a Z-order curve. Other related techniques can
be found in the literature [9], [11], [16], [17]. However, all
the aforementioned studies differ from the main goal of this
research which is to support efficient evaluation for skyline
queries with partially-ordered domains.
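As a point of reference for the totally-ordered case, the Block-Nested-Loop idea surveyed above can be sketched in a few lines of Python. This is a simplified in-memory version under stated assumptions: the window of candidates is unbounded (the original algorithm spills to temporary files when memory is exhausted), smaller values are better, and the function names `dominates` and `bnl_skyline` are illustrative.

```python
def dominates(p, q):
    """p dominates q: no worse in all dimensions, strictly better in one."""
    return (all(a <= b for a, b in zip(p, q))
            and any(a < b for a, b in zip(p, q)))


def bnl_skyline(points):
    """Simplified BNL: compare each point with a window of candidates."""
    window = []
    for p in points:
        # Discard p if a current candidate already dominates it.
        if any(dominates(w, p) for w in window):
            continue
        # Candidates may be dominated later: drop those that p dominates.
        window = [w for w in window if not dominates(p, w)]
        window.append(p)
    return window


# Illustrative two-dimensional input.
points = [(1.8, 0.3), (2.0, 0.3), (1.2, 1.0), (1.0, 1.0), (0.8, 1.0)]
print(bnl_skyline(points))  # [(1.8, 0.3), (0.8, 1.0)]
```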
Two groups of prior methods are the most relevant to our
work [2], [13]. Chan et al. [2] presented three algorithms
for evaluating skyline queries with partially-ordered attributes.
Their solution is to transform each partially-ordered attribute
into a two-integer domain which allows users to utilize
index-based algorithms to compute skyline queries in the
transformed space. However, all the techniques proposed by
Chan et al. have limited progressiveness and pruning abilities.
Sacharidis et al. designed a topological sort based mechanism
named Topologically-sorted Skylines (TSS) [13] which is both
progressive and exact. TSS introduces a novel dominance
check function which eliminates false hits and misses. In
addition, TSS is able to handle dynamic skyline queries.
Nevertheless, the research did not consider the utilization of
previously cached query results to further improve the query
evaluation performance.
III. CACHING SKYLINES FOR EFFICIENT SKYLINE COMPUTATIONS
Consider a partially-ordered domain where a user q (note
that we will use the terms user and query interchangeably
in this study) declares specific preferences for some data
dimensions. Skyline query results vary with different user
preferences (as is illustrated in the examples of Figure 1) and
the computation is very costly. Our conjecture is that query
results that were previously obtained with a user preference
profile similar to the profile of the query currently under
consideration may contribute useful candidate result points.
[Figure 2: (a) DAG $g$ with the artificial root r and the option nodes A, B, C, and D; (b) the transitive closure $g^+$ of $g$, listed as one sub-list per node (r, A*, B*, C, D).]
Fig. 2. Sample user preference DAG and its transitive closure.
We will first introduce some terminology to help formally
describe the problem. A user preference profile (denoted by
$g = (V, E)$) can be represented by a directed acyclic graph
(DAG), which consists of a set $V$ of option nodes and a set $E$
of edges. The node set $V$ includes a unique artificial root node
$r$ with no predecessor together with the actual entry node(s)
as successor(s). A primitive relation (or preference) in $g$ from
node $v_i$ to node $v_j$ is denoted by $v_i \rightarrow v_j$, and the edge
$e \in E$ (with a solid arrow) is directly connected from $v_i$ to $v_j$.
A transitive relation is denoted by $v_i \dashrightarrow v_j$, where the edge
(with a dashed arrow) does not exist in $g$, and hence there is at
least one additional node between nodes $v_i$ and $v_j$. When $v_i$
and $v_j$ have equivalent preferences, such a relation is denoted
by $v_i \leftrightarrow v_j$, which indicates that the user does not prefer one
over the other. To enable a quantitative comparison between
two DAGs, we define a numeric similarity measure as the
aggregate contribution of the preference relations that compare
pairs of nodes from both DAGs. We adopt an adjacency list
to represent a DAG $g$ and then compute $g$'s transitive closure
$g^+ = (V, E^+)$ consisting of all the primitive and transitive
relations in $g$, such that $E^+$ contains a pair $(v_i, v_j)$ of nodes
in $V$ whenever there exists a non-null path (either $v_i \rightarrow v_j$
or $v_i \dashrightarrow v_j$) from $v_i$ to $v_j$. A
transitive closure list contains at most $n$ sub-lists ($n$ equals
the maximum number of options allowed for a user preference
profile), each of which starts with an intermediate node in
the DAG. An actual entry node of a DAG is marked with
an asterisk ($*$) to distinguish it from other intermediate and
leaf nodes. Figures 2 (a) and (b) illustrate a DAG $g$ and its
corresponding transitive closure. The artificial root node is $r$,
whereas $A^*$ and $B^*$, each marked with an asterisk, are the actual
entry nodes.
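The adjacency-list representation and its transitive closure can be sketched as follows. The DAG used here is hypothetical (it is not claimed to be the DAG of Figure 2), and the names `transitive_closure` and `split_relations` are illustrative. The sketch separates primitive relations (edges of the DAG) from transitive relations (reachable only through an intermediate node).

```python
def transitive_closure(adj):
    """Map each node to the set of nodes reachable via a non-null path."""
    closure = {}

    def reach(node):
        # Memoized depth-first traversal; valid because the graph is acyclic.
        if node not in closure:
            closure[node] = set()
            for succ in adj.get(node, ()):
                closure[node].add(succ)
                closure[node] |= reach(succ)
        return closure[node]

    for node in adj:
        reach(node)
    return closure


def split_relations(adj, closure):
    """Return (primitive, transitive) relation sets as (v_i, v_j) pairs."""
    primitive = {(u, v) for u, succs in adj.items() for v in succs}
    all_pairs = {(u, v) for u, reachable in closure.items() for v in reachable}
    return primitive, all_pairs - primitive


# Hypothetical profile: r is the artificial root, A and B are entry nodes.
adj = {"r": ["A", "B"], "A": ["C"], "B": ["D"], "C": ["D"], "D": []}
closure = transitive_closure(adj)
primitive, transitive = split_relations(adj, closure)
print(sorted(closure["A"]))  # ['C', 'D']
print(sorted(transitive))    # [('A', 'D'), ('r', 'C'), ('r', 'D')]
```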
Figure 3 shows the overall framework of the C-SKY system.
The query processor initially computes the skyline query
results for a query request (A) and caches the result set with
the associated preference profiles (C). When a new query
request q enters the system, Task (B) performs the similarity
measure by computing the similarity values between the new
query q and the cached queries. Upon its completion, Task (B)
forwards a sorted list of candidate queries to Task (D), which
in turn selects a set of candidates from the list. Next, Task
(F) computes q’s skyline results based on the result sets of
these candidate queries. If the new query q is not answerable
from the cache, the Data Restoration component accesses the
whole data set (E) to perform less expensive constraint queries
to restore all of the possibly missing answer points. Finally, the
query processor evaluates the result based on the preference
profiles of the new query to refine the final answer. Since
cache space is limited, Task (G) purges the cache by preserving
the most popular preferences; i.e., the strategy is to eliminate
queries with the least recently used preference profiles from
the cache.
[Figure 3: The C-SKY query processor. (A) incoming skyline query requests, (B) similarity measure, (C) cached query set, (D) candidate cached query selection and data restoration, (E) data set (accessed via I/O), (F) query evaluation producing the skyline query results, (G) query set maintenance (cache updates).]
Fig. 3. C-SKY system framework.
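The replacement strategy attributed to Task (G), evicting the queries whose preference profiles were least recently used, can be illustrated with a minimal sketch. Everything beyond that one sentence is an assumption made for illustration: the capacity is counted in cached tuples, a profile is keyed by a frozenset of its (better, worse) relations, and the class name `ProfileCache` and its methods are hypothetical rather than part of C-SKY.

```python
from collections import OrderedDict


class ProfileCache:
    """LRU cache from a preference profile to its skyline result set."""

    def __init__(self, capacity):
        self.capacity = capacity      # assumed unit: number of cached tuples
        self.entries = OrderedDict()  # profile -> result list, oldest first
        self.used = 0

    def get(self, profile):
        if profile not in self.entries:
            return None
        self.entries.move_to_end(profile)  # mark as most recently used
        return self.entries[profile]

    def put(self, profile, result):
        if len(result) > self.capacity:
            return False  # a single result may not exceed the whole cache
        if profile in self.entries:
            self.used -= len(self.entries.pop(profile))
        # Evict least recently used profiles until the new result fits.
        while self.used + len(result) > self.capacity:
            _, evicted = self.entries.popitem(last=False)
            self.used -= len(evicted)
        self.entries[profile] = result
        self.used += len(result)
        return True


cache = ProfileCache(capacity=5)
g1 = frozenset({("A", "B"), ("A", "C")})
g2 = frozenset({("A", "B")})
g3 = frozenset({("B", "A")})
cache.put(g1, ["p1", "p5", "p6"])
cache.put(g2, ["p9", "p10"])
cache.get(g1)                  # g1 becomes the most recently used profile
cache.put(g3, ["p3", "p4"])    # evicts g2, the least recently used one
print(cache.get(g2))           # None
```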
Before we describe each task in detail we explain the main
intuition behind our work. The key concept of the C-SKY
algorithm involves preference filtering. Let $\bar{V}_k$ be the unspecified
preference nodes for a partially-ordered domain in dimension
$k$ ($PO_k$), where $\bar{V}_k \subseteq \mathcal{V}_k$ of $g_k$ is used by a query q. $\mathcal{V}_k$ is a
node set of all the possible node values allowed in the system
for the $PO_k$ dimension. $\mathcal{V}_k$ consists of the unspecified node
set $\bar{V}_k$ and the specified node set $V_k$ indicated in a user profile
DAG. We define a data tuple set $T(P, \bar{V})$, where each tuple
contains at least one unspecified preference node in one of the
PO dimensions. $T(P, \bar{V})$ is a potential skyline point set for
q. For the data tuples with unspecified preference values, a
data point $p_i = (TO_1, \ldots, TO_d, PO_1, \ldots, PO_n)$ can dominate
$p_j = (TO_1, \ldots, TO_d, PO_1, \ldots, PO_n)$ if $p_i.TO_k$ [...]

    [...] 0) then
        R = SkylineQuery(T(q'.result, g.V), g)
    else
        (vioEdges, D) = FindCandidateResultSet(G, $\delta$)
        if (vioEdges is empty) then
            R = SkylineQuery(D, g)    /* q is answerable */
        else
            perform constraint queries to restore the eliminated
            data tuples $\mathcal{R}$ by each violated relation in vioEdges;
            R = SkylineQuery(D $\cup$ $\mathcal{R}$, g)
        end if
    end if
end if
return R    /* the skyline points */
[...] which handles partially-ordered domains. Unlike C-SKY, TSS
consults the entire data set whenever it executes a new skyline
query request. C-SKY adopts TSS as the baseline algorithm to
evaluate the skyline results for partially-ordered domains and
adds its own caching mechanisms. Therefore, the CPU execution
time for the first query is identical to the TSS approach.
Subsequently, as the cache takes effect, performance gains are
achieved. We utilize R-trees as the underlying structure for
indexing the data and skyline points. Specifically, we use the
Spatial Index Library [7] for the R-tree index. A page size
of 4 KBytes is deployed, resulting in node capacities between
78 ($d = 6$) and 204 ($d = 2$). The skyline result points are
indexed by a main-memory R-tree to improve the performance
of the dominance checks. Our data set for a totally-ordered
domain is in the range of $[0, 1000)$ and we generated up to
100,000 normally distributed data points with dimensions in the
range of 2 to 4. For a partially-ordered domain, we generate
a PO value for each data dimension from 2 to 10, which is
the maximal number of distinct options for a user preference
profile in the system. The height of a DAG is the maximum
length of any path in the graph. The lattice node size for a
DAG is determined by a height from $2^2$ to $2^{10}$ and a density
ratio of 0.6. We set the threshold $\delta$ as a percentage of the data
set. $\delta$ is used for the query selection operation, which avoids
caching a query with a result set size larger than $\delta$. The cache
size $\xi$, on the other hand, is the percentage of the maximum
result size for all the queries (equals $\delta \times$ the number of queries).
Experiments are conducted with a Pentium 3.20 GHz CPU
and 1 GByte of memory. The query results are evaluated
with an event-driven approach. The main measurements in the
following simulations are the CPU time (for evaluating each
skyline query) and the I/O cost. Our experiments use several
metrics to compare these algorithms. Table II summarizes the
default parameter settings used in the following simulations.
Parameter                      Default   Range
Data cardinality (P)           20,000    20,000, 40,000, 60,000, 80,000, 100,000
Query cardinality (Q)          100       1 to 100
Number of TO domains (|TO|)    2         2, 3, 4
Number of PO domains (|PO|)    2         1, 2
DAG height (h)                 6         2, 4, 6, 8, 10
DAG density (d)                0.6       -
Cache threshold ($\delta$)           0.8%      0.05%, 0.1%, 0.2%, 0.4%, 0.8%, 1.6%
Cache size ($\xi$)                3%        1%, 2%, 3%, 4%, 5%
TABLE II
SIMULATION PARAMETERS FOR THE C-SKY APPROACH.
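To make the two cache parameters concrete, the following sketch evaluates them for the default settings of Table II. It encodes one reading of the definitions above, which is an assumption: the threshold is taken as a percentage of the data cardinality, and the cache size as a percentage of the threshold multiplied by the number of queries, both measured in tuples. The function names are illustrative.

```python
def per_query_threshold(delta_pct, data_cardinality):
    """Largest result set (in tuples) that may still be cached."""
    return delta_pct / 100.0 * data_cardinality


def cache_capacity(xi_pct, delta_pct, data_cardinality, num_queries):
    """Cache size as a share of the maximum total result size."""
    max_total = per_query_threshold(delta_pct, data_cardinality) * num_queries
    return xi_pct / 100.0 * max_total


def is_cacheable(result_size, delta_pct, data_cardinality):
    """Query selection: skip results larger than the threshold."""
    return result_size <= per_query_threshold(delta_pct, data_cardinality)


# Defaults of Table II: threshold 0.8%, cache size 3%, 20,000 points, 100 queries.
threshold = per_query_threshold(0.8, 20_000)
capacity = cache_capacity(3, 0.8, 20_000, 100)
print(round(threshold), round(capacity))  # 160 480
```

Under this reading, a query result is cached only if it holds at most about 160 tuples, and the cache as a whole holds about 480 tuples.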
A. Cache Threshold ($\delta$)
First, we measured the CPU execution time by varying the
cache threshold size. The choice of a threshold size is critical
for the performance of the system. If the threshold is too small,
only queries with small result sets are cached. Such queries
with small result sets often have intricate user preference
profiles. The system might then be busy performing constraint
skyline queries to restore missing data tuples due to a large
number of violations. Furthermore, fewer queries qualify to be
cached in the system, which results in fewer chances to utilize
cached queries with compatible user profiles. Therefore,
the performance degrades. If the threshold is too high, more
queries with a large result set are cached. Consequently, more
cache space is occupied and the cache is more likely to be
full. Hence, the system has to perform cache replacement
operations more often. However, since the C-SKY algorithm
involves preference filtering to facilitate the retrieval of a small
candidate result set, a cached query with a large result set does
not necessarily reduce the gain for the skyline evaluation time.
A small candidate result set leads to an efficient skyline
evaluation, which is the most time-consuming portion of the
overall processing. Figures 6 (a) and (b) show the CPU
overhead and I/O cost as a function of
the threshold size ranging from 0.05% to 1.6% of the entire
data set. When the threshold size is set to 0.05% or below,
the performance of C-SKY is degraded in terms of the CPU
execution time and I/O cost. A threshold size > 0.8% provides
a better performance in terms of the CPU time and I/O cost.
Therefore, we chose 0.8% as the default threshold size for the
rest of our experiments.
[Figure 6: (a) CPU time (sec) and (b) I/O cost versus the threshold (0.05 to 1.6), each comparing CSKY and TSS.]
Fig. 6. Performance as a function of the cache threshold $\delta$.
B. Data Cardinality
Figures 7 (a) and (b) show the CPU execution time and I/O
cost as a function of the number of data points, respectively.
Overall, the CPU overhead increases with the number of data
points. C-SKY achieves a significant reduction in terms of the
CPU time compared with TSS. This is indicative of how C-SKY
takes advantage of the results of a set of cached queries with
compatible user preference profiles. Since the TSS approach
considers the entire data set when evaluating the skyline result
for each new query, the CPU overhead is significant with a
large data set, especially as a result of the R-tree constructions.
In C-SKY, since the system only has to construct R-trees on
a small candidate result set, the overall CPU time is reduced.
The experimental results confirm the benefits of the C-SKY
approach that adopts caching and therefore achieves better
CPU performance and lower I/O cost than the TSS technique.
C. Query Cardinality
Next, we report on the impact of the query cardinality on
the performance of the two approaches. Figures 8 (a) and
(b) show the CPU overhead and I/O cost versus the query
cardinality as it ranges from 1 to 100, respectively. When
starting the system, the CPU overheads of both approaches
for evaluating the first skyline query are identical. As time
progresses, the C-SKY system caches more queries and hence
the algorithm can utilize and retrieve a candidate result set
that is a subset of the entire data set. The CPU performance
is improved as more relevant queries are accessed by new
queries. However, as the number of queries increases (the
cache is likely full), the improvement of the C-SKY approach
slows as the system handles more cached queries and more
similarity comparisons are performed. For cache management,
since the cache is more likely full, replacement operations are
executed more frequently. However, overall we can see that
C-SKY still outperforms TSS in terms of the CPU time and
I/O cost.
D. User Profile Cardinality
In this experiment we investigate the effect of the DAG
height associated with the PO domains. In Figures 9 (a) and
(b), we vary the DAG height from 2 to 10. Both algorithms
incur an increasing CPU load and I/O cost as the DAG height
increases. When the total number of lattice nodes of a DAG
increases, C-SKY mainly suffers from higher computation
costs of the similarity measurements, since the system has
to check a large number of lattice nodes (or relations) for
similarity comparisons. Furthermore, t-dominance operations
are performed intensively, because the query processor might
access intricate user profiles composed of more lattice nodes.
Consequently, the skyline result sets are often large, so
the performance of the dominance checks is degraded. The
performance of the TSS approach remains relatively stable,
albeit at a worse level than C-SKY.
E. Dimensionality
Next, we investigate the impact of the dimensionality on the
performance of the TSS and C-SKY techniques. Figures 10 (a)
and (b) illustrate the CPU overhead and I/O cost versus the
TO and PO dimensionality in pairs of (size of TO, size of
PO), ranging from 2 to 4 for the TO domains and 1 to 2 for the
PO domains, respectively. When the dimensionality increases,
the performance of all methods is degraded because the R-trees
fail to filter out irrelevant data entries in higher dimensions.
The system possibly outputs more skyline points that in turn
incur more dominance checks. From all the figures we can see
that C-SKY outperforms TSS only slightly when the dimensionality
is high. Because the skyline result sets are often large and
many dominance checks are required, the cached queries
cannot contribute much in this case.
F. Cache Size
We investigate the effect of the cache size in Figure 11. The
size of the cache is important in terms of its overall impact
on improving the performance of the system. If the cache size
is too small, the system suffers from more disk I/Os because
fewer useful queries are cached. On the other hand, if the cache
size is too large, the C-SKY algorithm must process a large
set of relevant cached queries with respect to each new query.
Specifically, many similarity measurement operations need to
be executed to retrieve the candidate data set. Figure 11
illustrates the tradeoff: a cache size between 3% and 4%
seems to result in optimal performance.
[Figure 7: (a) CPU time (sec) and (b) I/O cost versus the number of data points P (20,000 to 100,000), each comparing CSKY and TSS.]
Fig. 7. Performance as a function of data cardinality.
[Figure 8: (a) CPU time (sec) and (b) I/O cost versus the number of queries Q (1 to 100), each comparing CSKY and TSS.]
Fig. 8. Performance as a function of query cardinality.
[Figure 9: (a) CPU time (sec) and (b) I/O cost versus the DAG height h (2 to 10), each comparing CSKY and TSS.]
Fig. 9. Performance as a function of the DAG height.
VII. CONCLUSIONS
We have introduced a novel approach, termed C-SKY, to
process skyline queries with partially-ordered domains by
caching the query results with their unique user preference pro-
files. The query response time of a new query is significantly
reduced by retrieving its result from the cached result sets with
compatible specifications. Our similarity measure enables the
query processor to find the minimum set among the candidate
results. In case a query result cannot be fully computed from
the cache, we propose the use of less expensive constraint
skyline queries to restore missing data tuples. Finally, to lower
the space overhead, we propose a cache management scheme
where only the most popular specifications are preserved. Our
experimental evaluation demonstrates that C-SKY outperforms
the TSS baseline in terms of both CPU time and I/O cost.
VIII. ACKNOWLEDGMENTS
This research has been funded in part by NSF grant IIS-0534761,
NUS AcRF grant WBS R-252-050-280-101/133 and
equipment gifts from the Intel Corporation, Hewlett-Packard,
Sun Microsystems and Raptor Networks Technology. We also
acknowledge the support of the NUS Interactive and Digital
Media Institute (IDMI).
[Figure 10: (a) CPU time (sec) and (b) I/O cost versus the dimensionality (|TO| : |PO|) for the pairs 2,1; 3,1; 2,2; 3,2; 4,1; 4,2, each comparing CSKY and TSS.]
Fig. 10. Performance as a function of the dimensionality.
[Figure 11: (a) CPU time (sec) and (b) I/O cost versus the cache size (1 to 5), each comparing CSKY and TSS.]
Fig. 11. Performance as a function of the cache size.
REFERENCES
[1] S. Börzsönyi, D. Kossmann, and K. Stocker. The Skyline Operator. In
Proceedings of the 17th International Conference on Data Engineering
(ICDE), Heidelberg, Germany, pages 421–430, 2001.
[2] C. Y. Chan, P.-K. Eng, and K.-L. Tan. Stratified computation of skylines
with partially-ordered domains. In SIGMOD Conference, pages 203–214,
2005.
[3] V. Gaede and O. Günther. Multidimensional Access Methods. ACM
Comput. Surv., 30(2):170–231, 1998.
[4] Z. Huang, H. Lu, B. C. Ooi, and A. K. H. Tung. Continuous
Skyline Queries for Moving Objects. IEEE Trans. Knowl. Data Eng.,
18(12):1645–1658, 2006.
[5] D. Kossmann, F. Ramsak, and S. Rost. Shooting Stars in the Sky:
An Online Algorithm for Skyline Queries. In Proceedings of 28th
International Conference on Very Large Data Bases (VLDB), Hong
Kong, China, pages 275–286, 2002.
[6] K. C. K. Lee, B. Zheng, H. Li, and W.-C. Lee. Approaching the Skyline
in Z Order. In Proceedings of the 33rd International Conference on Very
Large Data Bases (VLDB), University of Vienna, Austria, pages 279–
290, 2007.
[7] Spatial Index Library. http://www.research.att.com/~marioh/spatialindex/index.html.
[8] X. Lin, Y. Yuan, W. Wang, and H. Lu. Stabbing the Sky: Efficient
Skyline Computation over Sliding Windows. In Proceedings of the 21st
International Conference on Data Engineering (ICDE), Tokyo, Japan,
pages 502–513, 2005.
[9] X. Lin, Y. Yuan, Q. Zhang, and Y. Zhang. Selecting Stars: The k
Most Representative Skyline Operator. In Proceedings of the 23rd
International Conference on Data Engineering (ICDE), Istanbul, Turkey,
pages 86–95, 2007.
[10] M. D. Morse, J. M. Patel, and W. I. Grosky. Efficient Continuous Skyline
Computation. In Proceedings of the 22nd International Conference on
Data Engineering (ICDE), Atlanta, GA, USA, page 108, 2006.
[11] M. D. Morse, J. M. Patel, and H. V. Jagadish. Efficient Skyline
Computation over Low-Cardinality Domains. In Proceedings of the 33rd
International Conference on Very Large Data Bases (VLDB), University
of Vienna, Austria, pages 267–278, 2007.
[12] D. Papadias, Y. Tao, G. Fu, and B. Seeger. An Optimal and Progressive
Algorithm for Skyline Queries. In Proceedings of the 2003 ACM
SIGMOD international conference on Management of data, pages 467–478,
New York, NY, USA, 2003.
[13] D. Sacharidis, S. Papadopoulos, and D. Papadias. Topologically sorted
skylines for partially ordered domains. In Proceedings of the 25th In-
ternational Conference on Data Engineering (ICDE), Shanghai, China,
2009.
[14] M. Sharifzadeh and C. Shahabi. The spatial skyline queries. In
Proceedings of the 32nd International Conference on Very Large Data
Bases (VLDB), Seoul, Korea, pages 751–762, 2006.
[15] K.-L. Tan, P.-K. Eng, and B. C. Ooi. Efficient progressive skyline
computation. In VLDB, pages 301–310, 2001.
[16] L. Tian, L. Wang, P. Zou, Y. Jia, and A. Li. Continuous Monitoring of
Skyline Query over Highly Dynamic Moving Objects. In Sixth ACM
International Workshop on Data Engineering for Wireless and Mobile
Access (MobiDE), Beijing, China, pages 59–66, 2007.
[17] P. Wu, D. Agrawal, Ö. Egecioglu, and A. E. Abbadi. Deltasky: Optimal
Maintenance of Skyline Deletions without Exclusive Dominance Region
Generation. In Proceedings of the 23rd International Conference on
Data Engineering (ICDE), The Marmara Hotel, Istanbul, Turkey, pages
486–495, 2007.
Description
Yu-Ling Hsueh, Roger Zimmermann, Wei-Shinn Ku. "C-SKY: Caching skylines for efficient skyline computations with partially-ordered domains." Computer Science Technical Reports (Los Angeles, California, USA: University of Southern California. Department of Computer Science) no. 908 (2009).
Rights
Department of Computer Science (University of Southern California) and the author(s).
Copyright
In copyright - Non-commercial use permitted (https://rightsstatements.org/vocab/InC-NC/1.0/)
Access Conditions
The author(s) retain rights to their work according to U.S. copyright law. Electronic access is being provided by the USC Libraries, but does not grant the reader permission to use the work if the desired use is covered by copyright. It is the author, as rights holder, who must provide use permission if such use is covered by copyright.