Efficient updates for continuous queries over moving objects
(USC Thesis Other)
EFFICIENT UPDATES FOR CONTINUOUS QUERIES OVER MOVING OBJECTS

by
Yu-Ling Hsueh

A Dissertation Presented to the
FACULTY OF THE GRADUATE SCHOOL
UNIVERSITY OF SOUTHERN CALIFORNIA
In Partial Fulfillment of the Requirements for the Degree
DOCTOR OF PHILOSOPHY
(COMPUTER SCIENCE)

August 2009

Copyright 2009 Yu-Ling Hsueh

Dedication

This dissertation is dedicated to my parents, Fu-An Hsueh and Yu-Hua Kuo Hsueh, who are always there to support me with their love and endless encouragement.

Acknowledgements

I would like to express the deepest appreciation to my advisor Dr. Roger Zimmermann for always encouraging and inspiring me to complete this research. Without his persistent guidance and financial support, my dissertation would not have been possible. I am grateful to my committee members, Dr. Cyrus Shahabi and Dr. C.-C. Jay Kuo, who gave me valuable comments and feedback. In addition, I thank my family, friends and colleagues who provided me with equally important assistance throughout the dissertation process.

Table of Contents

Abstract
Chapter 1: Introduction
  1.1 Research Aim and Methodology
  1.2 Contributions
  1.3 Outline
Chapter 2: Background and Related Work
  2.1 Location-based Services
  2.2 Spatial Index Design
  2.3 Location Updates for Continuous Queries
    2.3.1 Object Movement Prediction
    2.3.2 Periodic (Time-based) Updates
    2.3.3 Safe-region Updates
    2.3.4 Threshold-based Updates
  2.4 Query Result Updates for Continuous Queries
    2.4.1 Continuous Spatial Queries
    2.4.2 Skyline Queries
  2.5 Query Approximation
Chapter 3: Efficient Location Updates for Continuous Queries over Moving Objects
  3.1 System Overview and Assumptions
  3.2 Trajectory Movement Model for Moving Objects
    3.2.1 Adaptive Safe Region Computation
    3.2.2 Query Evaluation with Location Probes
      3.2.2.1 Query Result Updates for Range Queries
      3.2.2.2 Query Result Updates for c-kNN Queries
    3.2.3 Experimental Evaluation
      3.2.3.1 Simulation Steps
      3.2.3.2 Number of Extra NNs
      3.2.3.3 Cardinality
      3.2.3.4 Query Coverage
      3.2.3.5 Mobility
    3.2.4 Summary
  3.3 Arbitrary Movement Model for Moving Objects
    3.3.1 Partition-based Lazy Location Updates
      3.3.1.1 Data Structures
      3.3.1.2 LIT Details
      3.3.1.3 Mobile-Side Processing
      3.3.1.4 Server-Side Processing
      3.3.1.5 Spatial Data Compression for Mobile-side LITs
    3.3.2 Experimental Evaluation
      3.3.2.1 System Implementation
      3.3.2.2 Simulation Steps
      3.3.2.3 LIT Size
      3.3.2.4 Query Coverage
      3.3.2.5 Mobility
    3.3.3 A Message-Efficient Prototype for Location-Based Applications
      3.3.3.1 System Architecture
      3.3.3.2 Server and Mobile Task Modules
      3.3.3.3 System Demonstration
    3.3.4 Summary
Chapter 4: Efficient Query Result Updates for Skyline Computations
  4.1 System Overview and Assumptions
  4.2 Skylines with Totally-ordered Domains
    4.2.1 Efficient Updates for Continuous Skyline Computations
      4.2.1.1 Second Skyline Computation
      4.2.1.2 Description of the ESC Algorithm
    4.2.2 Experimental Evaluation
      4.2.2.1 System Implementation
      4.2.2.2 Simulation Steps
      4.2.2.3 Update Ratio
      4.2.2.4 Dimensionality
      4.2.2.5 Cardinality
    4.2.3 Summary
  4.3 Skylines with Partially-ordered Domains
    4.3.1 Caching Support for Skyline Query Processing with Partially-Ordered Domains
      4.3.1.1 User Preference Profile Similarity Measure
      4.3.1.2 Similarity Functions
      4.3.1.3 Cached Query Selection
      4.3.1.4 Unanswerable Queries
    4.3.2 Cache Management and Replacement
      4.3.2.1 Query Indexing
      4.3.2.2 Cache Replacement
    4.3.3 Description of the C-SKY Algorithm
    4.3.4 Experimental Evaluation
      4.3.4.1 Cache Threshold (δ)
      4.3.4.2 Data Cardinality
      4.3.4.3 Query Cardinality
      4.3.4.4 User Profile Cardinality
      4.3.4.5 Dimensionality
      4.3.4.6 Cache Size
    4.3.5 Summary
Chapter 5: Approximate Continuous K-Nearest Neighbor Queries for Moving Objects
  5.1 System Overview and Assumptions
  5.2 Approximation Model
      5.2.0.1 Split Point Computations
  5.3 Experimental Evaluation
  5.4 Summary
Chapter 6: Conclusions and Future Work
  6.1 Summary
  6.2 Future Work
    6.2.1 Supporting Road Network Data Objects
    6.2.2 Handling Large Queries
    6.2.3 The Impact of Communication Delay
References

List of Tables

3.1 Symbols and functions for the ASR approach.
3.2 Simulation parameters for the ASR approach.
3.3 Simulation parameters for the PLU approach.
4.1 Symbols and functions for the ESC approach.
4.2 Simulation parameters for the ESC approach.
4.3 Symbols and functions for the C-SKY approach.
4.4 Simulation parameters for the C-SKY approach.
5.1 Segment-based location table.

List of Figures

1.1 Taxonomy of the proposed approaches in this dissertation.
1.2 The ASR approach.
1.3 System framework of the PLU approach.
3.1 System architecture overview of the ASR approach.
3.2 Traditional safe region vs. adaptive safe region.
3.3 An adaptive safe region.
3.4 Query result updates in the ASR approach.
3.5 The order checks of a c-kNN query.
3.6 Extra NNs vs. communication cost.
3.7 Object and query cardinality.
3.8 Effect of query coverage with k and q_len.
3.9 Object and query mobility.
3.10 Illustration of concepts for the PLU approach.
3.11 LIT data structures.
3.12 The object grid and a server-side LIT example.
3.13 The object grid and mobile-side LITs.
3.14 A query insertion example.
3.15 PLU system flow chart.
3.16 Performance vs. LIT size.
3.17 Effect of query coverage with Q and q_len.
3.18 Performance vs. object mobility.
3.19 Traditional safe regions and LITs.
3.20 PLUS system infrastructure.
3.21 PLUS main interface.
3.22 The data compression module.
3.23 PLUS-client interface.
4.1 A skyline query example.
4.2 Partially-ordered skyline query example.
4.3 S1 and S2 sets.
4.4 Dominance set vs. EDR set.
4.5 Traditional EDR vs. AEDR.
4.6 ESC system framework.
4.7 Performance vs. update ratio (P = 100k, d = 5).
4.8 Performance vs. dimensionality (P = 100k, f_update = 10%).
4.9 Performance vs. cardinality (d = 5, f_update = 10%).
4.10 Sample user preference DAG and its transitive closure.
4.11 C-SKY system framework.
4.12 Example DAGs and query indexing structure.
4.13 Query indexing.
4.14 Cache structure.
4.15 Performance as a function of the cache threshold δ.
4.16 Performance as a function of data cardinality.
4.17 Performance as a function of query cardinality.
4.18 Performance as a function of the DAG height.
4.19 Performance as a function of the dimensionality.
4.20 Performance as a function of the DAG density.
5.1 The data set and the query object.
5.2 The increment points and the candidate set.
5.3 P = {q_{s_i}, q_{s_{i+1}}, q_{s_{i+2}}}.
5.4 Performance vs. number of nodes.

Abstract

As a result of recent technological advances, mobile devices with significant computational abilities, gigabytes of storage, and wireless communication capabilities have become widely available. In addition, positioning chips are embedded in more and more of these mobile devices. The combination of Global Positioning System (GPS), 2G, 3G and in the future 4G cellular communication technologies provides a compelling environment to provide mobile users with various location-based services. These services correspond to continuous spatial queries that are posted within an environment of moving objects and produce as their results a time-varying set of objects. In the most ambitious case both queries and data objects are dynamic, making it very challenging to find an efficient query evaluation strategy.
Furthermore, monitoring moving objects to maintain the correctness of the query answers often incurs frequent location updates from these moving objects. To address these two challenges we group our work into three main topics, namely (i) efficient location updates, (ii) efficient query result updates, and (iii) query approximation. We now give an overview of each group in turn.

Efficient Location Updates: The significant overhead related to frequent location updates from moving objects often results in poor performance. As most of the location updates do not affect the query results, the network bandwidth and the battery life of moving objects are wasted. Existing solutions propose lazy updates, but such techniques generally avoid only a small fraction of all unnecessary location updates because of their basic approach (e.g., safe regions, time or distance thresholds). Furthermore, most prior work focuses on a simplified scenario where queries are either static or rarely change their positions. Two novel efficient location update strategies are proposed in this dissertation. The first strategy, for a trajectory movement environment, is the Adaptive Safe Region (ASR) technique that retrieves an adjustable safe region which is continuously reconciled with the surrounding dynamic queries. The communication overhead is reduced in a highly dynamic environment where both queries and data objects change their positions frequently. In addition, we design a framework that supports multiple query types (e.g., range and c-kNN queries). In this framework, our query re-evaluation algorithms take advantage of ASRs and issue location probes only to the affected data objects, without flooding the system with many unnecessary location update requests.
The second strategy, for an arbitrary movement environment, is the Partition-based Lazy Update (PLU, for short) algorithm that elevates this idea further by adopting Location Information Tables (LITs), which (a) allow each moving object to estimate possible query movements and issue a location update only when it may affect any query results and (b) enable smart server probing that results in fewer messages. We first define the data structure of a LIT, which is essentially packed with a set of surrounding query locations across the terrain, and discuss the mobile-side and server-side processes that correspond to the utilization of LITs. In addition, we further apply three lossless compression methods that condense a LIT to reduce the data stream size.

Efficient Query Result Updates: In the second part of the dissertation, we focus on the problem of maintaining skyline queries efficiently over dynamic objects with d dimensions for totally-ordered and partially-ordered domains. Skyline queries are an important new search capability for multi-dimensional databases. For the totally-ordered domain skyline queries, we propose the ESC algorithm, an Efficient update approach for Skyline Computations, which creates a pre-computed second skyline set that facilitates an efficient and incremental skyline update strategy and results in a quicker response time. With the knowledge of the second skyline set, ESC makes it possible (a) to efficiently find the substitute skyline points from the second skyline set only when removing or updating a skyline point (which is called a first skyline point) and (b) to delegate the most time-consuming skyline update computation to another independent procedure, which is executed after the complete updated query result is reported. The basic idea of the traditional branch-and-bound skyline (BBS) algorithm is leveraged for our novel design of a two-threaded approach.
The first skyline can be replenished quickly from a small set of second skylines, which enables a fast query response time, while decoupling the computationally complex maintenance of the second skyline. Furthermore, we propose the Approximate Exclusive Data Region algorithm (AEDR) to reduce the computational complexity of determining a candidate set for second skyline updates.

For the skyline queries with partially-ordered domains, we introduce a novel approach, termed C-SKY, to reduce the latency by caching query results with their unique user preferences. The results of skyline queries performed on data sets with partially-ordered domains vary depending on users' preference profiles specified for the partially-ordered domains. Existing work has addressed the issue of handling each individual query with some efficiency. However, processing large volumes of such queries for online applications with low response time is still very challenging. Of paramount importance in this case is that cached queries with compatible preference profiles need to be utilized. For this purpose, we introduce a similarity measure that establishes how related a new query is to each of the previously cached queries and profiles. The similarity measure allows the cached entries to be effectively ordered according to descending values and hence query processing can start with the most promising candidates. If a new query is only partially answerable from the cache, the proposed method pursues a second optimization step. The query processor utilizes the partial result sets and augments them by performing less expensive constraint skyline queries guided by constraint violations between different query preference profiles. Furthermore, to lower the space overhead, we propose a cache management scheme where only the most popular preferences are preserved. Extensive experiments are presented to demonstrate the performance and utility of our novel approach.
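The first/second skyline idea summarized above can be illustrated with a minimal sketch. It is not the ESC algorithm itself: it assumes the usual dominance relation (no worse in every dimension and strictly better in at least one, with smaller values preferred) and assumes that the second skyline is the skyline of the points left after removing the first skyline. The function names are hypothetical, and the later maintenance of the second skyline is not covered.

```python
def dominates(p, q):
    """True if p is no worse than q everywhere and strictly better somewhere
    (assumption: smaller values are more favorable)."""
    return all(a <= b for a, b in zip(p, q)) and any(a < b for a, b in zip(p, q))


def skyline(points):
    """Naive quadratic skyline: keep points not dominated by any other point."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


def first_and_second_skyline(points):
    """Return (S1, S2), where S2 is the skyline of the points outside S1."""
    s1 = skyline(points)
    rest = [p for p in points if p not in s1]
    return s1, skyline(rest)


def remove_first_skyline_point(s1, s2, removed):
    """Replenish the first skyline from S1 and S2 only, without a full rescan."""
    candidates = [p for p in s1 if p != removed] + s2
    return skyline(candidates)


points = [(1, 9), (3, 3), (9, 1), (4, 4), (3, 8), (8, 8)]
s1, s2 = first_and_second_skyline(points)
updated = remove_first_skyline_point(s1, s2, (3, 3))
```

In this sketch, deleting the skyline point (3, 3) only requires examining the remaining first skyline points and the two second skyline points, rather than the whole data set.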
Query Approximation: In existing methods, the cost of retrieving the exact c-kNN data set is expensive, particularly in highly dynamic spatio-temporal applications. The cost includes the location updates of the moving objects when the velocities change over time and the number of continuous kNN queries posed by the moving object to the server. In some applications (e.g., finding my nearest taxis while I am moving), obtaining the perfect result set is not necessary. For such applications, we introduce an AC-kNN technique that approximates the results of the classic c-kNN algorithm, but with efficient updates and while still retaining a competitive accuracy.

Chapter 1: Introduction

The impressive advancement of mobile communication technologies, such as IEEE 802.11 and cellular networks, together with ever more capable handheld devices has sparked intense interest in location-aware services. The advancement of mobile technologies such as IEEE 802.11x networks, cellular communications and GPS sensors enables a server to track the positions of moving objects and provide various location-based services. The efficient evaluation of continuous spatial queries is a fundamental capability needed in many practical applications. An example range query launched from a fire engine while battling flames might be to "continuously locate other fire engines within two miles of my current location." Since all units (i.e., users) are constantly moving, frequent location updates often result in high server re-indexing costs and immense communication overhead. With the mobility introduced by portable and handheld devices, the performance bottleneck for continuous spatial query processing is often concentrated in the handling of the frequent location updates at the server and the utilization of the communication channel between the moving client objects (also called mobiles) and the server.
Wireless bandwidth is generally still much more scarce than wired bandwidth and, adding to the challenge, the movement dynamics of such an environment require frequent mobile-server message exchanges that contain location information for the database engine to maintain an up-to-date view of the world. Towards an efficient continuous query computation the following challenges must be addressed: an effective query result update mechanism that provides a fast response time when reporting the current query results, and an efficient strategy to minimize the number of wireless location update messages.

Many existing techniques [JLO04, MXA04, MHP05, TPS03, XMA05] have proposed continuous monitoring approaches without considering the cost of the communication overhead involved. Some prior work [HXL05, MPBT05, PXK+02] has provided significant insight into these issues by assuming a set of computationally capable moving objects that cache query-aware information (e.g., thresholds or safe regions) and locally determine a mobile-initiated location update. In the simplest case, whenever an object moves it sends its new location to the server. Obviously this can be very wasteful, for example if the moving object is located in an area where it does not affect any query results. Making informed decisions about when to communicate update messages becomes a key design issue to improve scalability. The message count can be reduced through the following optimizations. The mobile client may be equipped with computation capabilities to maintain a safe region [PXK+02] with the purpose that movements within the safe region will not affect any query results (hence no location updates must be sent to the server). Safe regions are bounded by the nearest query rectangles around a mobile client and must be recomputed when certain events take place, such as when a new query is inserted or a moving object moves beyond its safe region boundary.
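The mobile-side decision described above can be illustrated with a minimal sketch. It assumes an axis-aligned rectangular safe region, which is only one of the simple shapes used in prior work, and the names `Rect` and `needs_location_update` are hypothetical rather than taken from any of the cited systems.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rect:
    """Axis-aligned rectangle, a hypothetical stand-in for a cached safe region."""
    x_min: float
    y_min: float
    x_max: float
    y_max: float

    def contains(self, x: float, y: float) -> bool:
        return self.x_min <= x <= self.x_max and self.y_min <= y <= self.y_max


def needs_location_update(safe_region: Rect, x: float, y: float) -> bool:
    """A mobile-initiated update is needed only once the object leaves the region."""
    return not safe_region.contains(x, y)


region = Rect(0.0, 0.0, 10.0, 10.0)
positions = [(1.0, 1.0), (5.0, 9.0), (11.0, 4.0)]
updates = [needs_location_update(region, x, y) for x, y in positions]
```

Under these assumptions only the third position lies outside the region, so only that movement would cause a message to be sent to the server.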
In some cases (e.g., query insertion) a moving object is initially unaware of the event and the server must probe its current location. However, the focus of these solutions is mainly on static queries or simple types of queries (e.g., range queries). Because of the usually simple shape of safe regions (e.g., rectangles or spheres) they can only help to avoid a fraction of unnecessary location updates. If query movements are frequent, such systems suffer from repeated location detections to resolve location ambiguity (incurred on the objects that might become result points) and numerous down-link messages sent to refresh the query-aware information on those mobile objects.

As much existing work has already targeted solutions for other types of spatial queries, such as range and k nearest neighbor queries, continuous skyline queries have been studied intensively in the context of spatio-temporal databases. Skyline query computations are important for multi-criteria decision making applications. Skyline queries have been defined as retrieving a set of points which are not dominated by any other points. An object p dominates p' if p has more favorable values than p' in all dimensions. Some of the prior work on skyline queries assumed that data objects are static [PTFS03, PJET05]. Other approaches assumed that the skyline computation involved only a partial set of dynamic dimensions [HLOT06]. Existing work [LZLL07, PTFS05, WAEA07] generally computes a number of data point subsets, each of which is exclusively dominated by one skyline point. Therefore, when a skyline point moves or is deleted, only its exclusively dominated subset must be scanned. The determination of such an exclusive data set is very computationally complex in higher dimensions and it incurs a serious burden for the system in a highly dynamic environment. Therefore, these systems are often unable to
Figure 1.1 illustrates the taxonomy of the main approaches proposed in this dissertation. We focus on the sce- nario where both queries and data points are dynamic. There are two types of updates involved in the processes, namely (i) data set updates and (ii) query result updates. For the data set updates, we present two approaches: ASR and PLU for di®erent movement models. For the query result updates, two incremental approaches, ESC and C-SKY, are proposed to retrieve exact query answers. We also introduce the AC-kNN approach to perform an approximate continuous nearest neighbor search. Temporal/Spatial Queries Continuous Queries Snap-shut Queries Static Queries + Static Data Set Static Queries + Dynamic Data Set Dynamic Queries + Static Data Set Dynamic Queries + Dynamic Data Set Data Set Updates Query Result Updates Traditional Data Updates Lazy Updates ASR Approach for Trajectory movement PLU Approach for Arbitrary movement Exact Query Answer Evaluation Approximate Query Answer Evaluation AC-kNN Approach Non-incremental Approach Incremental Approach ESC Approach for Totally-Ordered Domains Cache-based C-SKY Approach for Partially- Ordered Domains Figure 1.1: Taxonomy of the proposed approaches in this dissertation. 4 1.1 Research Aim and Methodology The goal of this research is to introduce e±cient update solutions by focusing on reduc- ing the number of location updates (as a result, the communication cost is reduced) and minimizing the response time of query evaluation in two di®erent movement environ- ments. First, we introduce two methodologies that provide the e±cient location updates in a trajectory movement model and an arbitrary movement model for moving objects, respectively. Second, we focus on the problem of reducing the query response time for skyline queries, since much of the prior work has already targeted on the solutions for other types of spatial queries, such as range and k nearest neighbor queries. 
Last, we propose a query approximation approach to return a set of approximate query results with a short query response time.

To start with, we propose a framework to support multiple types of dynamic, continuous queries in a trajectory movement model for moving objects. Our goal is to minimize the communication overhead in a highly dynamic environment where both queries and objects change their locations frequently. When a new query enters the system we leverage the trajectory information that it can provide by registering its starting and destination points as a movement segment for continuous monitoring. For example, a policeman might request the following query: "send me the top 5 police cars on the road as I am moving from point A to point B." For simplicity, we assume a straight movement segment between two points. This assumption can be easily extended to a more realistic scenario which may approximate a curved road segment with several straight-line sub-segments. We propose an adaptive safe region that reconciles the surrounding queries based on their movement trajectories such that the system can avoid unnecessary location probes to the objects in the vicinity (i.e., the ones which overlap with the current query region). The basic concept of a safe region is that a moving object moving within the given safe region does not affect any query results. Therefore, a location update is necessary only when a moving object moves out of its safe region. Furthermore, our incremental result update mechanisms allow a query to issue location probes only to a minimum area where the query answers are guaranteed to be fulfilled. In particular, to lower the amortized communication cost for c-kNN queries, we obtain extra nearest neighbors (n more NNs) which are buffered and reused later to update the query results. Thus, the number of location updates incurred from the query region expansion due to query movement is reduced. An example is shown in Figure 1.2 (a).
The ASR of p_3 is determined based on the closest query q_1, since p_3 has a high probability of being covered by the query region of q_1 when q_1 moves in the future. The safe region of p_3 is adjusted to a reasonable size according to the trajectory information of q_1. The safe region for p_8 is simply set to the maximum non-overlapping area with the query region of q_2, because q_2 (due to its opposing moving direction) will never cover p_8. We buffer one extra NN for q_2 (a c-3NN query). When q_2 moves to q'_2, and since the number of NNs is equal to 3, the query region remains unchanged. In the traditional approach (as shown in Figure 1.2 (b)), the query region is expanded to cover p_5 (the first closest object outside the query region) such that the additional location probes to p_7, p_9, and p_10 are issued. Therefore, our approach reduces the number of query expansions to find sufficient NNs and the number of location probes.

[Figure 1.2: The ASR approach. (a) An example of ASRs. (b) A query expansion.]

The second methodology, for an arbitrary movement model, is the partition-based lazy update approach that significantly reduces unnecessary location updates by maintaining a Location Information Table (LIT) on each moving object. A LIT represents the detailed query boundaries and distances across the terrain locally around the object. We build a mobile-side LIT from m × m tiles where each tile corresponds to a LIT cell that stores the distance to the closest query boundary. The representation is easy to maintain and can be built on top of existing indexing methods for tracking moving objects, such as grids or R-trees. The server maintains its own LIT which is of size n × n (with n ≥ m) and represents the complete service space at the time instance of a specific event.
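To make the LIT description above more concrete, the following is a minimal sketch of how an m × m mobile-side table could be filled. It assumes rectangular query regions and measures each cell's distance from the tile center; both choices, and the names `distance_to_rect_boundary` and `build_lit`, are illustrative assumptions rather than the data structure defined later in this dissertation.

```python
import math


def distance_to_rect_boundary(px, py, rect):
    """Distance from a point to the boundary of an axis-aligned rectangle
    given as (x_min, y_min, x_max, y_max)."""
    x_min, y_min, x_max, y_max = rect
    dx = max(x_min - px, 0.0, px - x_max)
    dy = max(y_min - py, 0.0, py - y_max)
    if dx > 0.0 or dy > 0.0:
        return math.hypot(dx, dy)  # point lies outside the rectangle
    # Point lies inside: the nearest edge gives the boundary distance.
    return min(px - x_min, x_max - px, py - y_min, y_max - py)


def build_lit(origin_x, origin_y, tile_size, m, query_rects):
    """Build an m x m table; cell [row][col] holds the distance from the tile
    center to the closest query boundary (infinity if there are no queries)."""
    lit = []
    for row in range(m):
        cells = []
        for col in range(m):
            cx = origin_x + (col + 0.5) * tile_size
            cy = origin_y + (row + 0.5) * tile_size
            cells.append(min(
                (distance_to_rect_boundary(cx, cy, r) for r in query_rects),
                default=math.inf,
            ))
        lit.append(cells)
    return lit


# One hypothetical range query to the right of a 2 x 2 table of 10-unit tiles.
lit = build_lit(0.0, 0.0, 10.0, 2, [(20.0, 0.0, 30.0, 10.0)])
```

A table of this kind lets an object look up how far the nearest query boundary is from its current tile without contacting the server; how that distance is combined with query movement estimates is left out of the sketch.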
Significantly, a LIT encapsulates surrounding query information in much more detail, yet compactly, as compared with geometric safe regions. This concept (plus other enhancements) allows our Partition-based Lazy Update (PLU) technique to reduce unnecessary update messages and perform significantly better than traditional techniques. Figure 1.3 illustrates the system framework consisting of a centralized server with a global server-side LIT, and a set of moving objects caching a mobile-side LIT in their own local memory.

[Figure 1.3: System framework of the PLU approach.]

To reduce the response time for continuous skyline queries, we first propose the ESC approach (Efficient Updates for Continuous Skyline Computations) over dynamic objects for totally-ordered domains, where objects with d dynamic dimensions move in an unrestricted manner. Each dimension represents a spatial or non-spatial value. The ESC algorithm efficiently manages the query results by delegating the time-consuming skyline update computations to another independent procedure, which is processed after the query processor reports the latest skyline query results. The key idea is to maintain a second skyline (or S2) set, which is a skyline candidate set pre-computed when a traditional skyline (which we refer to as the first skyline, S1) point requests an update. With the knowledge of the second skyline set, the skyline query result can be updated within a limited search space and the expensive computations (e.g., searching for new second skylines to substitute a promoted second skyline point) can be decoupled from the first skyline update computations. For the skyline queries with partially-ordered domains, we propose a Caching Skylines approach (C-SKY, for short) for efficient skyline computations.
C-SKY caches the query results with their specified user preference profiles and retrieves a result set for a new query from a set of candidate queries with compatible user profiles. We propose a novel similarity function to measure the degree of similarity of two user profiles by returning a score that represents the aggregate contribution of all pairs of preferences from the two compared user preference profiles. Such a similarity measurement enables a quantitative comparison, from which the query processor performs the query result retrieval, starting from the cached query with the highest similarity value with respect to the new query. Since the query processor directly accesses a relatively small set of candidate points to retrieve the result for a new query, the response time of the skyline computations can be greatly reduced. Due to the limited cache space, we also propose a novel cache management approach that retains only the most popular relationship pairs and reduces the number of false hits.

For query approximation, we present an approach, termed the approximate continuous k nearest neighbor query (AC-kNN) algorithm, to maintain a kNN result set with efficient updates for moving objects. The algorithm is based on the observation that maintaining approximate continuous kNN queries can greatly reduce the computational cost. By defining split points on the query trajectories, moving objects need only update their locations on a per-segment basis, regardless of changes in their velocities. In some domains, the exact continuous kNN result is not required and it is unnecessary to sacrifice disk access performance.

1.2 Contributions

The main contributions of this dissertation are message-efficient, real-time-response solutions for continuous spatial queries in a dynamic environment where both query and data objects are moving. Specifically, the contributions of the proposed research are as follows:

1.
We propose a framework that supports multiple types of dynamic queries over moving objects and minimizes the communication overhead in highly dynamic environments.
2. The concepts of Adaptive Safe Regions and Location Information Tables are introduced for different movement models to enhance traditional safe regions. The numerous downlink location probes that result from frequent query movements are reduced.
3. Our efficient location update approaches (ASR and PLU) send downlink query-aware information only to the small set of objects affected by a query movement or by the insertion of a new query.
4. With a realistic maximum speed assumption, our algorithms can estimate query movements and issue a location update only when it may affect a query result. Furthermore, various cases such as query insertions and deletions are handled in our current work.
5. We propose two incremental skyline query update approaches that achieve a faster response time for totally-ordered domains and partially-ordered domains, respectively.
6. To update the results of continuous skyline queries, ESC adopts a second skyline set that delegates the most complex computations to a separate procedure executed after the updates are completed. We also design an approximate exclusive data region that has a low amortized cost for the exclusive data evaluation in high-dimensional and dynamic data environments.
7. The novel utilization of cached queries with user specifications enables the query processor to quickly return the skyline query result without scanning the entire data set. An unanswerable query is converted to a less expensive constrained skyline query guided by the constraints that are missing in the cache.
8. In the C-SKY algorithm, to lower the space overhead, we propose a cache management scheme in which only the most popular specifications are preserved.
9.
The AC-kNN algorithm returns approximate result sets for continuous queries. The query evaluation time is greatly reduced while the system retains competitive accuracy. By defining split points on its trajectory, a moving object only needs to send a segment-based location update to the server when it reaches each split point.

1.3 Outline

The remainder of this dissertation is organized as follows. Chapter 2 describes the background and related work. Chapter 3 provides the details of the ASR algorithm in a trajectory movement environment and the PLU algorithm in an arbitrary movement environment. Chapter 4 introduces a novel ESC algorithm for efficient skyline computations with totally-ordered domains and presents another, cache-based algorithm (C-SKY) for skyline computations with partially-ordered domains. Chapter 5 presents the details of the approximation algorithm for continuous k nearest neighbor query processing. We extensively verify the performance of our techniques and summarize the design in each chapter, and finally conclude our research and propose future work in Chapter 6.

Chapter 2
Background and Related Work

We survey the background knowledge about location-based services and their applications. We also describe the overall concept of current spatial index design, which is essential for an efficient query processor when considering a very large data set. The existing work on efficient location updates for spatial continuous queries is categorized into four groups, each of which is summarized in detail. For query result updates, we discuss recent studies on general continuous spatial query evaluation. In this dissertation, we mainly focus on skyline queries. The related work regarding skyline query computations for totally-ordered and partially-ordered domains is described in the following sections.
2.1 Location-based Services

Location-based services (LBS) are information services based on the geographical locations of wireless carriers, exploiting existing techniques for determining where a user is located. One such technique is the Global Positioning System (GPS), based on a collection of 24 Navstar satellites that are used by a land-based GPS receiver embedded in a mobile client device to determine its location. Another approach is radiolocation and trilateration based on the signal strength of the closest cell-phone towers. This technique is used in E911, one of the most widely used location-based services in the United States. Mobile carriers are required by the Wireless Communications and Public Safety Act of 1999 to identify a caller's telephone number to emergency dispatchers.

LBS revenue is forecast to bring telecommunications carriers and wireless service providers an annual global total of $13.3 billion by 2013, up from an estimated $515 million during 2007. More companies today focus on developing preliminary LBS and location-based wireless advertising test markets. Examples of location-based services include:

• Finding the nearest ATM machine or all the restaurants within five miles.
• Requesting real-time traffic information around the mobile user's road segment.
• Receiving advertisements and sales offers.

Privacy concerns [BD03] have been raised about being tracked by the service provider or by third-party service companies. The privacy issue has often been addressed in terms of how sensitive information is kept secure in the application. The main aspect of the privacy concern here is that the mobile user's position is a specific attribute of identity. The location of a specific mobile user may be regarded as private, since an LBS may cause a real-time privacy intrusion or permit some types of illegal interception of data.
Since the privacy issue has already been investigated extensively, we do not focus on this subject in this dissertation.

2.2 Spatial Index Design

Two main challenges in supporting both a large number of objects and continuous queries in a highly dynamic environment are to (a) reduce the object tracking and query evaluation costs and (b) minimize the communication costs of location updates from objects and queries. To address these two issues, the design of an efficient index structure that can manage the locations of moving objects has been intensely studied. Spatial indexes are leveraged in spatial databases to optimize query processing. The most efficient and popular disk-based spatial index structure is the R-tree [BKSS90, Gut84, SRF87], which has been widely used to index points, trajectories and shapes that are grouped by minimum bounding rectangles (MBRs). The closest data points, partial segments of trajectories or shapes are clustered in the same MBR. Following the same principle, the remaining tree nodes are grouped together recursively up to the top level. Other similar spatial indexes such as Quad-trees and kd-trees are also widely used in spatial databases. Each internal Quad-tree node has up to four child nodes, and the trees are constructed to partition a two-dimensional space by recursively subdividing it into four quadrants. A kd-tree, or k-dimensional tree, organizes data points by splitting planes that are perpendicular to one of the coordinate axes. Every kd-tree node stores a point. This differs from R-trees or Quad-trees, in which leaves are the only nodes that contain data points. There are numerous other disk-based indexes (e.g., B-trees or Octrees). However, these index structures are only efficient for low-dimensional and static data sets. To facilitate predictive spatial queries, time-parameterized R-trees [SJLL00] and TPR*-trees [TPS03] were proposed to address the challenges posed by continuous object movements.
These techniques are natural extensions of R*-trees that augment the indexed objects and the MBRs with velocity vectors to efficiently index the current and anticipated future locations of moving objects. To accommodate a dynamic data set without causing expensive tree reconstruction, update algorithms are provided. Nevertheless, these index structures cannot cope with continuous queries efficiently and cannot support arbitrary object movements.

As an alternative, grid-based indices [MA05, XMA06, YPK05] have attracted much attention because of their simplicity and scalability in handling position updates; both are desirable features in a highly dynamic environment. Assume that all the objects exist in the unit square region $[0,1)^2$. A grid of cells with side length $\delta$ is used to index a set of data objects that are assumed to fit in main memory. Therefore, a grid is a two-dimensional array of size $[1/\delta] \times [1/\delta]$. Each cell is designated by its row and column indexes and stores a list of the objects residing in its region. Some methods have been proposed to process dynamic continuous queries over moving objects by utilizing a grid index. For instance, MobiEyes [GL04] introduced a distributed infrastructure to process dynamic range queries, where the server acts as a mediator to coordinate query processing on both the server and the moving object sides. Under a uniform distribution, a grid index processes spatial queries efficiently. In some realistic scenarios, however, the data objects might be skewed, leading to expensive query answering and data indexing. To alleviate the performance degradation, a hierarchical object-indexing structure is introduced in [YPK05], based on the idea that a cell with dense objects is split into finer sub-cells. The hierarchical index structure contains leaf cells and indexing cells pointing to sub-grids that are built on top of the grid cells.

2.3 Location Updates for Continuous Queries

Cai et al.
[CHC04] proposed the Monitoring Query Management (MQM) approach to leverage the computational capabilities of moving objects for efficient processing of continuous range queries. SINA [MXA04] has been introduced as a centralized solution to process continuous range and k nearest neighbor (kNN) queries over moving objects. Yu et al. [YPK05] proposed an algorithm that computes the query results by defining a search region based on the maximum distance between the query point and the current locations of the previous kNNs. However, the algorithm results in high re-computation costs when the query point is highly dynamic. Similarly, Xiong et al. [XMA05] suggested the SEA-CNN framework, which uses the concept of shared execution. SEA-CNN continuously maintains the search radius of the query point to avoid rebuilding the query result once the query point changes its location. As an enhancement, Mouratidis et al. [MHP05] presented a technique called CPM that defines a conceptual partitioning of the space by organizing grid cells into rectangles. Location updates are handled only when objects fall into the vicinity of queries, hence improving system throughput.

The challenge of frequent updates issued by moving objects has been addressed, and existing work has proposed different strategies to reduce such updates. These approaches can be classified into the following categories.

2.3.1 Object Movement Prediction

Predicting the movement of objects (i.e., their motion functions or trajectories) has been used with R-tree-based structures (e.g., the TPR-tree [SJLL00] and its variants [TPS03]) and B-tree-based structures (e.g., the B$^x$-tree [JLO04]). The most common motion function is a linear function, which describes an object's movement by $f(t) = X_{ref} + (t_{cur} - t_{ref})V$, where $X_{ref}$ is the reference position (the most recently updated position at the server), $t_{ref}$ is the time of that update, $t_{cur}$ is the current time, and $V$ is a velocity vector.
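As a small illustration of the linear motion function above, the following Python sketch extrapolates an object's position from its last reported position and velocity. The function name `predict_position` and the tuple representation of positions and velocities are assumptions made for this example only; they do not come from the cited index structures.

```python
from typing import Sequence, Tuple


def predict_position(x_ref: Sequence[float], t_ref: float,
                     velocity: Sequence[float], t_cur: float) -> Tuple[float, ...]:
    """Evaluate X_ref + (t_cur - t_ref) * V coordinate by coordinate.

    x_ref    : position most recently reported to the server
    t_ref    : time of that report
    velocity : velocity vector reported with the position
    t_cur    : time at which the position is estimated
    """
    dt = t_cur - t_ref
    return tuple(x + dt * v for x, v in zip(x_ref, velocity))


# An object reported at (1, 2) at time 10 with velocity (0.5, -1).
estimate = predict_position((1.0, 2.0), 10.0, (0.5, -1.0), 14.0)
```

Under this model the server needs a new report only when the velocity parameters change, which is the update condition discussed in the remainder of this section.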
However, the linear motion function severely limits the applicability, since in practice an object may have drastically different motion patterns. Tao et al. [TFPL04] introduced a general framework for monitoring and indexing moving objects. A recursive motion function is proposed to support non-linear motion patterns. However, this method incurs extensive location updates due to the arbitrary movements of the moving objects. These techniques require location updates from the objects whenever the parameters of the motion function (e.g., moving direction or speed) change.

2.3.2 Periodic (Time-based) Updates

In order to handle arbitrary object movements, periodic (time-based) position updates are widely used [MA05, MXA04, MHP05, YPK05]. However, with such a paradigm, tree-based indices suffer from excessive node reconstructions when tracking object locations. Cheng et al. [CyLPL07] proposed a time-based location update mechanism with low communication costs to improve the temporal data inconsistency for the objects relevant to queries. Data objects with significance to the correctness of query results are required to send location updates more frequently. The main drawback of these methods is that an object will repeatedly send location updates to the server when it is enclosed by a query, which consumes a large amount of bandwidth when the query density is high.

2.3.3 Safe-region Updates

A number of pioneering techniques have been designed for the processing of continuous queries over moving objects. Prabhakar et al. [PXK+02] first proposed two elementary techniques called Query Indexing and Velocity Constrained Indexing (VCI), and also introduced the important concept of safe regions. Subsequently, Hu et al. [HXL05] proposed a generic framework to handle continuous queries by leveraging the concept of safe regions, through which the location updates from mobile clients can be further reduced.
However, these methods only address part of the mobility challenge, since they are based on the assumption that queries are static. Nowadays, an extensive number of spatial applications require the capability to process moving objects in conjunction with dynamic continuous queries.

2.3.4 Threshold-based Updates

A threshold-based algorithm is presented in [MPBT05], which assumes that moving objects have some computational capabilities and aims to minimize the network cost when handling c-kNN queries. A threshold is transmitted to each moving object, and when its moving distance exceeds the threshold, the moving object issues an update. However, the system suffers from many downlink message transmissions for refreshing the thresholds of the entire moving object population due to frequent query movements. Cheng et al. [CyLPL07] proposed a time-based location update mechanism to improve the temporal data inconsistency for the objects relevant to queries. Data objects with significance to the correctness of query results are required to send location updates more frequently. The main drawback of this method is that an object will repeatedly send location updates to the server when it is enclosed by a query region.

In contrast, our proposed techniques for efficient location updates aim to reduce the communication cost of dynamic queries over moving objects. Based on the nature of the object movement, we first present the ASR approach, which utilizes adaptive safe regions to reduce the downlink messages of location probes due to query movements for a trajectory movement model. To further reduce the downlink messages, the ASR approach only probes the set of objects that might become part of the query results. Additionally, ASR allows for decoupled, query-aware information that is maintained locally by each moving object until its movement might affect the query results. Inspired by some of the prior techniques, we propose another strategy, termed the PLU approach, for an arbitrary movement model.
Its main contribution lies in the development of "smarter" safe regions represented via location information tables that enable enhanced (i.e., more independent) mobile-side decision making for location updates. These two techniques do not deteriorate when faced with high mobility rates, as demonstrated by our simulation results, and surpass the aforementioned solutions with higher scalability and lower communication cost.

2.4 Query Result Updates for Continuous Queries

2.4.1 Continuous Spatial Queries

Continuous monitoring of queries over moving objects has become an important research topic, because it supports various useful mobile applications. The MQM solution [CHC04] leverages the computational capabilities of moving objects for efficient processing of continuous range queries. Hu et al. [HXL05] proposed a generic framework to handle continuous queries with safe regions, through which the location updates from mobile clients are further reduced. However, these methods only address part of the mobility challenge, since they are based on the assumption that queries are static, which is not always true in real-world applications. By employing grid indices, a number of methods (e.g., LUGrid [XMA06]) have been proposed to process dynamic continuous queries over moving objects. MobiEyes [GL04] presented a distributed infrastructure to process dynamic range queries, where the server acts as a mediator to coordinate query processing on both the server and the moving objects. Mokbel et al. proposed the SINA algorithm [MXA04] for evaluating a set of concurrent continuous spatio-temporal queries. With the incremental evaluation paradigm, SINA updates query results by computing and sending only increments of the previously reported answer. Yu et al. [YPK05] proposed an approach that continuously monitors a kNN query by defining a search region based on the maximum distance between the query point and the locations of the current kNN query results.
Moving object data are assumed to fit in main memory and are indexed with a regular grid. However, the algorithm suffers from high re-evaluation cost when the query point is highly mobile. The SEA-CNN framework designed by Xiong et al. [XMA05] is based on the concepts of incremental evaluation and shared execution. Moving objects are stored in secondary memory and indexed with a regular grid. SEA-CNN continuously maintains the search radius of the query point to avoid recomputing the query results once the query point changes its location. The conceptual partitioning (CPM) method [MHP05] assumes the same system architecture and indexing structures as SEA-CNN. CPM defines a conceptual partitioning of the space by organizing grid cells into larger rectangles. Location updates are handled only when objects move into the vicinity of queries, and hence system throughput is improved.

2.4.2 Skyline Queries

Borzsonyi et al. [BKS01] proposed the straightforward non-progressive Block-Nested-Loop (BNL) and Divide-and-Conquer (DC) algorithms. The BNL approach repeatedly compares each data point with the current set of candidate skyline points, which might themselves be dominated later. BNL does not require data indexing or sorting. The DC approach divides the search space and evaluates the skyline points of its sub-regions separately, followed by merge operations to evaluate the final skyline points. Both algorithms may incur many iterations and are inadequate for on-line processing. Tan et al. [TEO01] presented two progressive processing algorithms: the bitmap approach and the index method. Bitmap encodes the dimensional values of data points into bit strings to speed up the dominance comparisons. The index method classifies a set of $d$-dimensional points into $d$ lists, which are sorted in increasing order of the minimum coordinate. The index method scans the lists synchronously from the first entry to the last one. With the pruning strategies, the search space is reduced.
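To make the BNL idea described above concrete, here is a minimal in-memory Python sketch. It assumes that smaller values are preferred in every dimension and that all points fit in memory, so it omits the block-wise, multi-pass handling of the original disk-based algorithm; the names `dominates` and `bnl_skyline` are illustrative only.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if a is no worse than b in every dimension and strictly
    better in at least one (assumption: smaller is better)."""
    return (all(x <= y for x, y in zip(a, b))
            and any(x < y for x, y in zip(a, b)))


def bnl_skyline(points: List[Point]) -> List[Point]:
    """Single pass over the data with a window of skyline candidates."""
    window: List[Point] = []
    for p in points:
        # Discard p if some current candidate dominates it.
        if any(dominates(w, p) for w in window):
            continue
        # Otherwise p evicts every candidate it dominates, then joins the window.
        window = [w for w in window if not dominates(p, w)]
        window.append(p)
    return window


skyline = bnl_skyline([(1, 5), (2, 2), (5, 1), (3, 3), (4, 4)])
```

Every incoming point is compared against the whole window, which illustrates why the approach can require many dominance checks on large inputs.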
The nearest neighbor (NN) method [KRR02] indexes the data set with an R-tree. NN utilizes nearest neighbor queries to find the skyline results. The approach repeats the query-and-divide procedure and inserts the new partitions that are not dominated by some skyline point into a to-do list. The algorithm terminates when the to-do list is empty. The branch and bound skyline (BBS) algorithm [PTFS03] traverses an R-tree to find the skyline points. Although BBS outperforms the NN approach, its performance can deteriorate due to many unnecessary dominance checks. Finally, many of the recent techniques aim at continuous skyline support for moving objects and data streams. Lin et al. [LYWL05] present n-of-N skyline queries against the most recent n of N elements to support on-line computation against sliding windows over a rapid data stream. Morse et al. [MPG06] propose a scalable LookOut algorithm for updating the continuous time-interval skyline efficiently. Sharifzadeh et al. [SS06] introduce the concept of Spatial Skyline Queries (SSQ). Given a set of data points $P$ and a set of query points $Q$, SSQ retrieves those points of $P$ which are not dominated by any other point in $P$, considering their derived spatial attributes with respect to the query points in $Q$. For moving query points, a continuous skyline query processing strategy is presented in [HLOT06] with a kinetic-based data structure. However, prompt query response is not considered in the design. A suite of novel skyline algorithms based on a Z-order curve [GG98] is proposed in [LZLL07]. Among the solutions, ZUpdate facilitates incremental skyline result maintenance by utilizing the properties of a Z-order curve. Other related techniques can be found in the literature [CET05, LYZZ07, MPJ07, TWZ+07, WAEA07]. However, all the aforementioned studies differ from the main goal of this research, which is to support frequent skyline data object updates efficiently while providing a quick response.

Chan et al.
[CET05] presented three algorithms for evaluating skyline queries with partially-ordered attributes. Their solution is to transform each partially-ordered attribute into a two-integer domain, which allows users to utilize index-based algorithms to compute skyline queries in the transformed space. However, all the techniques proposed by Chan et al. have limited progressiveness and pruning abilities. Sacharidis et al. designed a topological-sort-based mechanism named Topologically-sorted Skylines (TSS) [SPP09], which is both progressive and exact. TSS introduces a novel dominance check function which eliminates false hits and misses. Nevertheless, that research did not consider the utilization of previously cached query results to further improve the query evaluation performance.

2.5 Query Approximation

Arya et al. [AM93] proposed the sparse neighborhood graph, RNG, to process approximate nearest neighbor queries. RNG sets the error variance constant $\epsilon$ and the angular diameter $\delta$ to tune the size of the divided neighboring areas around one data point. This algorithm significantly reduces the time and space requirements for processing data objects. Ferhatosmanoglu et al. [FTAA00] utilized the VA+-file to solve approximate nearest neighbor queries. This algorithm uses bit vectors to represent a division of the data space. In its first phase it calculates and compares the distances of these spatial cells; in its second phase it computes and checks the real distances between the data points of nearby cells. Berrani et al. [BAG03] proposed another algorithm to solve approximate k nearest neighbor queries. First, it clusters the data points in space and represents these clusters as spatial spheres. Then it sets the error variance constant $\epsilon$ to tune approximate spheres for the original clusters. Data points located in the approximate clustering sphere are candidate results for the approximate kNN queries. However, all these algorithms are aimed at high-dimensional and fixed data points.
Chapter 3
Efficient Location Updates for Continuous Queries over Moving Objects

3.1 System Overview and Assumptions

To enable a focused discussion, we make some explicit assumptions. The communication between the centralized server and the mobile units is through cellular phone or WiMAX networks. A centralized server is assumed in the environment to process continuous queries. We assume an ideal network environment, that is, no communication delay between the server and the moving objects. The mobile units, such as vehicles or handheld devices (e.g., cell phones), consist of a set of dynamic query objects $Q$ and a set of moving objects $P$. Both queries and moving objects are identified by a unique identifier to distinguish their types. The mobile units are able to provide the server with their positions from a GPS chip built into the devices, and we assume that each mobile unit has enough computational capability and memory to carry out the required tasks. We assume no power constraints and a virtually unlimited lifetime of the devices. A main-memory grid $G$ is used as the underlying structure to index moving objects because of its simplicity and ease of maintenance in a highly dynamic environment. For high performance, an event-driven approach is adopted to evaluate continuous queries. To maintain the correctness of the query results, the server monitors the registered query objects. Thus, the server can evaluate the queries based on their new locations. Two novel efficient location update strategies (ASR and PLU) are proposed. In the ASR approach, we assume a trajectory movement model in the environment, where we can utilize the trajectory information to facilitate the design of the adaptive safe regions. In an arbitrary movement environment, since moving objects can freely move to any position, the ASR approach can no longer efficiently handle the query requests. We therefore propose the PLU algorithm, which encloses relevant query-aware information for data objects, in contrast to the traditional safe region design.
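The main-memory grid $G$ assumed above can be sketched in Python as follows. This is only an illustrative sketch under stated assumptions: the service space is taken to be the unit square $[0,1)^2$ described in Section 2.2, the number of cells per side is taken as the ceiling of $1/\delta$, and the class and method names (`ObjectGrid`, `cell_of_point`, `update`) are hypothetical rather than part of the system described here.

```python
import math
from collections import defaultdict
from typing import Dict, Hashable, Optional, Set, Tuple

Cell = Tuple[int, int]  # (row, column)


class ObjectGrid:
    """Main-memory grid over [0,1)^2 with square cells of side delta."""

    def __init__(self, delta: float) -> None:
        self.delta = delta
        self.n = math.ceil(1.0 / delta)                  # cells per side (assumed ceiling)
        self.cells: Dict[Cell, Set[Hashable]] = defaultdict(set)
        self.where: Dict[Hashable, Cell] = {}            # object id -> current cell

    def cell_of_point(self, x: float, y: float) -> Cell:
        # Clamp so that rounding near the upper border stays inside the grid.
        col = min(int(x / self.delta), self.n - 1)
        row = min(int(y / self.delta), self.n - 1)
        return (row, col)

    def update(self, obj_id: Hashable, x: float, y: float) -> Cell:
        """Insert obj_id or move it to the cell containing (x, y)."""
        new_cell = self.cell_of_point(x, y)
        old_cell: Optional[Cell] = self.where.get(obj_id)
        if old_cell != new_cell:
            if old_cell is not None:
                self.cells[old_cell].discard(obj_id)
            self.cells[new_cell].add(obj_id)
            self.where[obj_id] = new_cell
        return new_cell


grid = ObjectGrid(0.25)
first_cell = grid.update("p1", 0.6, 0.1)
second_cell = grid.update("p1", 0.8, 0.9)
```

A location update touches at most two cells, which is the ease-of-maintenance property that motivates the choice of a grid over a tree-based index in a highly dynamic setting.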
3.2 Trajectory Movement Model for Moving Objects

Each query object registers its movement trajectory with the server by uploading its starting and ending points (denoted by $\overrightarrow{q_j} = [q_j^s, q_j^e]$). Furthermore, all the data objects can move in a non-restricted fashion that allows them to move arbitrarily. The query processor evaluates the queries based on the query types in an event-triggered manner. The locations of all queries are monitored by the server. The location updates of a query result point (result point for short) and a non-result point (data point for short) are handled with two different mechanisms. An adaptive safe region (ASR) is computed for each data point. A mobile-initiated voluntary location update is issued when any data point moves out of its safe region. An example ($p_8$) is shown in Figure 1.2 (a). To capture the possible movement of a result point, we use a moving region (MR) whose boundary increases by the maximum moving distance per time unit. For the result points, location updates are requested only through server-initiated location probes, which are triggered when the moving regions of the result points overlap with some query regions.

[Figure 3.1: System architecture overview of the ASR approach.]

Figure 3.1 shows the system framework. When a request arrives from a data point (A) or from a query point (B) (e.g., a location update, insertion or deletion), the ASR query processor checks whether the point is part of a query result in modules (C) and (D). To incrementally update a query result, prior query results (E) are considered. For a c-kNN query, an NN order check (F) is performed during the query evaluation process.
While there are fewer than $k$ NNs in the result set, a query region expansion (G) is executed. Some server-initiated location probes might be needed to resolve location ambiguities. The points in the result set are monitored (H) through a passive mechanism; this result set is different from the non-result points, which voluntarily issue location updates that are determined locally by the objects. Finally, an updated data point is assigned a new ASR based on the current query information in module (I). Detailed descriptions of the functionality of each component are given in the following sections. Table 3.1 summarizes the symbols and functions we use throughout the following sections.

Symbol                  Description
$Q$                     A set of query objects
$P$                     A set of moving objects
$G$                     A $w \times w$ object grid where objects are hashed to the grid cells based on their locations
$\delta$                Maximum speed for any object
$p_i.ASR$               Adaptive safe region of object $p_i$
$p_i.MR$                Moving region of object $p_i$
$q_j.QR$                Query region of query $q_j$ (the radius is denoted by $q_j.QR.radius$)
$\overrightarrow{q_j}$  Movement trajectory of $q_j$
$q_j^s$                 Starting point of the movement trajectory for $q_j$
$q_j^e$                 Ending point of the movement trajectory for $q_j$

Table 3.1: Symbols and functions for the ASR approach.

3.2.1 Adaptive Safe Region Computation

Existing work adopts safe regions to reduce unnecessary location updates, such that the communication cost between the server and the moving objects is reduced. A safe region in a traditional system is simply an area of maximal size around an object such that no query regions overlap it. Figure 3.2 (a) shows an example of two such safe region types (a safe sphere and a safe rectangle) for object $p_1$. However, this approach suffers from many location updates as a result of frequent query movements. When a query moves, the server initiates location probes to the data objects whose safe regions overlap with the query region to ensure the correctness of the query answers. In this thesis, we propose
In this thesis, we propose 28 a novel approach to retrieve an adaptive safe region (ASR), which is often smaller than a maximum non-overlapping region and yet is very e®ective in reducing the amortized communicationcostinahighlydynamicmobileenvironment. Thekeyobservationliesin the consideration of some important factors (e.g., the velocity or orientation of the query objects) to reconcile the size of the safe regions. Figure 3.2 (b) illustrates the concept of an ASR. The on-demand location probes are issued as soon as any surrounding queries (q 1 , q 2 , or q 3 ) move. In this example, the distance z is the ASR radius of p 1 , because in the worst case, after both q 3 and p 1 move by distance z and p 1 moves directly toward q 3 , p 1 may become a result point of q 3 . The following lemma establishes the ASR radius based on this observation. x 1 p 2 q y c-kNN range 3 q range 1 q z safe sphere safe rectangle x 2 q y range 3 q range z 2 r 2 r y z 3 r 3 r adaptive safe region c-kNN 1 q 1 r 1 r x 1 p (a) A traditional safe region. (b) An adaptive safe region. Figure 3.2: Traditional safe region v.s. adaptive safe region. Lemma 3.1. p i :ASR:radius=min(CDist(p i ;q j )¡q j :QR:radius);8q j 2Q, where CDist(p i ;q j )= 8 > > < > > : p i f 0 if µ j · ¼ 2 and 9f 0 , or p i q s j if µ j > ¼ 2 or@f 0 29 AsanillustrationofLemma3.1(andtoexplainthesymbolnotation),considerFigure3.3, where the set of queries Q = fq j ;q k g are visited for retrieving the adaptive safe region (the dashed circle) of the data point p i . We measure the Euclidian distance between a query and a data point (CDist in Lemma 3.1) and then deduct the query range. Lemma3.1capturestwocasesofCDist. The¯rstcase(CDist(p i ;q j ))computesadistance p i f 0 = q s j f in the worst-case scenario where both p i and q j move toward each other (under the constraint of the maximum speed). 
$f'$ represents the border point (on the border of $q_j.QR$ when $q_j$ arrives at $f$ on its movement segment), after which $p_i$ would possibly enter the query region of $q_j$. $f$ is the closest point to $q_j^s$ on the trajectory of $q_j$ that satisfies the condition that the distance from $p_i$ to $f$ is equal to $\overline{p_i f'} + \overline{f'f}$, where $\overline{f'f} = q_j.QR.radius = r_j$. Let $\overline{p_i f'} = x$ for short. We can obtain the points $f$ and $f'$ by computing $x$ first, which is considered the safe distance for $p_i$ with respect to $q_j$. $x$ can be easily computed with the trajectory information of $q_j$ by solving the quadratic equation $(x + r_j)^2 = h^2 + (\overline{q_j^s m} - x)^2$, where $h$ is the height of triangle $\triangle p_i q_j^s m$. Expanding both sides cancels the $x^2$ terms and gives $x = \frac{h^2 + \overline{q_j^s m}^2 - r_j^2}{2(r_j + \overline{q_j^s m})}$. $f$ on $\overrightarrow{q_j}$ exists only when $\theta_j$ ($\angle p_i q_j^s q_j^e$) is less than or equal to $\frac{\pi}{2}$ and $(\overline{p_i q_j^e} - q_j.QR.radius) < \overline{q_j^s q_j^e}$ (triangle inequality). If the first case is not satisfied, we consider the second case ($CDist(p_i, q_k)$), which finds the maximum non-overlapping area with $q_j.QR$. Since $\theta > \frac{\pi}{2}$ in the second case, the query range of $q_j$ can never cover $p_i$ due to the opposing movement of $q_j$. In this example, the safe distance $x$ (with respect to $q_j$) is smaller than $y$ (with respect to $q_k$), so $x$ is chosen as the radius of the adaptive safe region of $p_i$. In our system, since a c-kNN query can be considered an order-sensitive range query, we use the same principle to compute safe regions for each data object with respect to range queries and c-kNN queries. In case of a query insertion or a query region expansion of a c-kNN query, the adaptive safe regions of the affected data objects must be reassigned according to the current queries to avoid any missing location updates.

Figure 3.3: An adaptive safe region.

3.2.2 Query Evaluation with Location Probes

The initial query results of the range and c-kNN queries are obtained using CPM [HXL05], and later the query results are updated in an event-driven fashion.
Such events include the insertion or update of a query. In the following sections, we propose our incremental query re-evaluation algorithms for both range and c-kNN queries. While updating the query answers, on-demand server-initiated location probes are issued whenever any location ambiguity exists. Specifically, the cost of updating c-kNN queries is usually higher than that of updating range queries. The reason is that a c-kNN search is an order-sensitive query. The system executes more location updates to ensure the order of the result points. Furthermore, to make sure that at least $k$ result points are found for a c-kNN query, the query region often needs to be enlarged in a situation where both query and data objects are moving, which leads to more location probes. In our approach, the strategy to handle such increasing unnecessary location updates incurred from a c-kNN query is that the query processor computes $(k+n)$ NNs for a c-kNN query instead of evaluating exactly $k$ NNs. This approach helps to reduce the number of future query region expansions needed to retrieve sufficient NNs for the queries. Since a c-kNN query is treated as an order-sensitive range query, we adopt the same principle that is used for a range query to find the new answer set in the current query regions first. A query region is expanded only when there are fewer than $k$ NNs in the result set. Finally, an order-checking procedure is performed to examine the order of the result points and determine the necessary location probes.

3.2.2.1 Query Result Updates for Range Queries

The query processor re-evaluates the range queries based on their current positions by the same principles as evaluating the initial query results. The traditional approach adopts the query region itself as the safe region for all the result points in the region to reduce the number of location updates.
However, the approach incurs more network messages when a range query changes its position frequently, because the system needs to inform the result points of the new position of the query region to avoid missing location updates. An alternative approach basically monitors the entire set of result points to obtain the new correct results. However, such an approach is not scalable when there are large numbers of range queries. We use a moving region (MR) for each result point to estimate the possible movement at the server side. The query processor sends on-demand location probes to those result points that might move out of the current query regions. An MR is indexed on the grid, and its boundary increases at each time step by the maximum moving distance until the result point is probed by the server. Since the number of result points is relatively small, indexing MRs does not significantly increase the overall server workload. In Figure 3.4 (a), when $q_1$ moves to $q'_1$, the query processor checks $p_1$ and $p_5$, since their MRs intersect with $q'_1.QR$.

Figure 3.4: Query result updates in the ASR approach. (a) Result updates of a range query. (b) Result updates of a c-kNN query.

For a data point, in addition to its adaptive safe region, we also consider the current possible moving boundary to serve as an additional indicator for the server to determine a necessary location probe. Continuing the example in Figure 3.4 (a), the gray circle surrounding $p_4$ is its ASR, and the dashed circles represent the possible moving boundaries (the radius is equal to the maximum moving distance since the last update of $p_4$) for different time steps. $p_4$ is checked because $p_4.ASR$ overlaps with $q'_1.QR$. However, the server does not need to issue a location probe since the current moving boundary does not overlap with $q'_1.QR$.
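The probe decision for a data point such as $p_4$ can be sketched as follows. This is a minimal illustration that follows the distance tests of Algorithm 1; the function name, the string return values, and the scalar inputs are assumptions chosen for this example.

```python
def classify_data_point(dist, asr_radius, max_speed, elapsed, query_radius):
    """Classify a non-result data point against an updated query region.

    dist         -- distance from the last reported position to the query point
    asr_radius   -- radius of the object's adaptive safe region
    max_speed    -- maximum object speed (delta in the text)
    elapsed      -- time since the object's last update (Delta t in the text)
    query_radius -- radius of the updated query region
    """
    # The object cannot have left its ASR without reporting, and it cannot have
    # moved farther than max_speed * elapsed, so the smaller bound applies.
    r = min(asr_radius, max_speed * elapsed)
    if dist - r >= query_radius:
        return "outside"   # cannot reach the query region: no probe needed
    if dist + r < query_radius:
        return "inside"    # certainly within the query region: no probe needed
    return "probe"         # ambiguous: the server must probe the object

# The ASR alone would overlap the query region (10 - 5 < 6), but the moving
# boundary (radius 2) does not, so no probe is issued.
decision = classify_data_point(dist=10, asr_radius=5, max_speed=1, elapsed=2, query_radius=6)
```

Taking the minimum of the two bounds is what lets the server skip probes for objects whose ASR overlaps a query region shortly after they reported their position.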
$p'_6$ is a newly updated data point ($p_6$ moves out of its ASR). The system also needs to check whether its current position is in the query region of $q'_1$. Algorithm 1 shows the pseudocode of the range query evaluation, where $q'_j$ is the updated query of $q_j$. Lines 1-7 remove previous result points that are not in the current query region $q'_j.QR$. Lines 2 and 4 compute the mindist and maxdist between a query point and a result point, respectively. If a result point with an MR is completely contained in the query range, no location probe is needed. In Line 10, if $p_i$ is a data point, the server uses the radius of its ASR or the maximum moving distance since the last update, whichever is less, to estimate its possible moving distance.

Algorithm 1 RangeQuery-Update($q'_j$)
1: for (each $d \in q_j.RangeNN$) do
2:   if ($(dist(d, q'_j) - d.MR.radius) > q'_j.QR.radius$) then
3:     remove $d$
4:   else if ($(dist(d, q'_j) + d.MR.radius) > q'_j.QR.radius$) then
5:     probe $d$ and remove $d$ if its current position is outside of $q'_j.QR$
6:   end if
7: end for
8: for (each $c \in G$, which overlaps with $q'_j.QR$) do
9:   for (each object $p_i$ which resides in $c$ or whose (1) ASR, or (2) MR overlaps with it) do
10:    let $r = p_i.MR.radius$, if $p_i$ is a result point; else let $r = \min(p_i.ASR.radius, \delta \Delta t)$
11:    if ($(dist(p_i, q'_j) - r) < q'_j.QR.radius$) then
12:      if ($(dist(p_i, q'_j) + r) < q'_j.QR.radius$), insert $p_i$ into $q'_j.RangeNN$
13:      else probe the position of $p_i$ and insert $p_i$ into $q'_j.RangeNN$, if $p_i$ is within $q'_j.QR$
14:    end if
15:  end for
16: end for

3.2.2.2 Query Result Updates for c-kNN Queries

A c-kNN query is more complicated since it is order-sensitive. An intuitive solution enlarges a query region so that it covers at least all the previous result points (the first $k$ NNs) to retrieve new result points.
This approach greatly increases the number of location updates since such an expansion (the query region is expanded by the moving distance of the query and result points) often results in more location probes, even though in reality only a small fraction of queries and data objects move. Therefore, in our design for the c-kNN queries, we propose a server-initiated update strategy with an event-triggered update mechanism. Furthermore, the query processor retrieves $(n+k)$ NNs to avoid immediate and successive query region expansions. We relax the definition of the query region, that is, a query region does not necessarily include exactly $k$ NNs. The query region remains unchanged until a c-kNN query no longer contains enough NNs in the region. We summarize the following steps to update a c-kNN query result incrementally:

Step 1: Assume that $q'_j$ is a c-kNN query after it moves from position $q_j$. Initially, set $q'_j.QR.radius = q_j.QR.radius$. Perform a range query update (as described in the previous section) to update the result points in $q'_j.QR$. If the number of NNs in $q'_j.QR$ is equal to or larger than $k$, proceed to Step 3. Otherwise, continue to Step 2.

Step 2: Expand $q'_j.QR$ until there are $(k+n)$ NNs. Update $q'_j.QR.radius$ to the distance between $q'_j$ and the $(k+n)$-th NN.

Step 3: Sort the result points and issue the necessary location probes.

Step 1 ensures that $q'_j.QR$ covers at least $k$ result points. Note that during the process, some discarded objects that are not in $q'_j.QR$ might be useful in Step 2, because these objects are often very close to $q'_j.QR$ and might already have been probed by the server. Finding new NNs from these points first in Step 2 helps the query processor to avoid expanding the query region to a farther level of cells. In Step 2, while expanding the query region to cover $(k+n)$ result points, a location update is required from any data object $p_i$ whose safe region overlaps with the query region of $q'_j$.
A new ASR is computed for the updated $p_i$, if $p_i$ is still a data object. We use the same approach (query region expansion) to handle a query insertion. In Step 3, sorting the result points does not require the current positions of all the result points. The processor performs an OrderCheck procedure that examines the possible actual moving distance of two consecutive NNs to determine the order of the NNs, and issues a location probe only if there is a location ambiguity.

Figure 3.4 (b) shows a query region expansion where $k = 3$ and $n = 1$. In Step 1, since $p_1$ and $p_5$ (probed during the process) are not in $q'_1.QR$, they are removed from the answer set and inserted into a buffer for "recycling" later. Step 2 is performed since there are only two result points in $q'_1.QR$. The query processor checks the data points in the buffer first, so the first two objects (sorted by the mindist to $q'_1$) are considered. The new $q'_1.QR.radius$ (the blue area) is set to the distance between $q'_1$ and $p_1$ to include at least 4 $(k+n)$ objects. $p_4$ is checked later since its safe region overlaps with $q'_1.QR$. Algorithm 2 shows the detailed process of a c-kNN query update. In Line 2, the RangeQuery-Update procedure inserts the discarded objects into buffer $B$ sorted by mindist in ascending order. Line 4 computes the number ($v$) of NNs missing in the current query region. Line 12 executes CPM to further expand the query region by checking the surrounding cells only when the buffer is empty. The OrderCheck procedure in Line 16 is performed after sufficient NNs are found. In the OrderCheck procedure, to determine a necessary location probe for kNN result points, we observe the following lemma. A proof of correctness is presented subsequently.

Lemma 3.2. Let $q'_j$ be the last reported position of the query object $q_j$, and let $\ell = \delta \Delta t$ be the maximum moving distance since the last update of $q_j$, where $\delta$ is the maximum speed and $\Delta t$ is the time period from the last update time to the current time.
$\forall i = 1$ to $k$, a result point $p_i$ (the $i$-th result point sorted by the mindist to $q'_j$) needs to issue a location update when the following condition is satisfied:

$$\ell \ge \left(mindist(q'_j, p_{i+1}) - mindist(q'_j, p_i)\right) \times \frac{1}{2}$$

Proof: The proof is straightforward, since when the order of $p_i$ and $p_{i+1}$ changes, $mindist(p_i, q'_j) \ge mindist(p_{i+1}, q'_j)$. When considering the worst case that $p_i$ moves in an opposing direction from $q'_j$ and $p_{i+1}$ moves toward $q'_j$ directly, the following inequality holds true:

$$mindist(p_i, q'_j) + \ell \ge mindist(p_{i+1}, q'_j) - \ell$$

Therefore, we conclude that the order of $p_i$ and $p_{i+1}$ may change when $\ell \ge (mindist(q'_j, p_{i+1}) - mindist(q'_j, p_i)) \times \frac{1}{2}$. It is then necessary for the server to probe the locations of both $p_i$ and $p_{i+1}$.

Algorithm 2 c-kNN-Update($q'_j$)
1: let $B = \emptyset$ be a buffer
2: perform RangeQuery-Update($q'_j$), which finds new NNs in the current query region and inserts discarded objects into $B$, if any
3: if ($q'_j.KNN.size < k$) then
4:   $v = k + n - q'_j.KNN.size$
5:   while ($v > 0$) do
6:     if ($B.size > 0$) then
7:       set $q'_j.QR.radius = dist(q'_j, V)$, where $V$ is the $v$-th NN in $B$, if $B.size \ge v$. Otherwise, set $dist(q'_j, L)$, where $L$ is the last object in $B$.
8:       empty $B$
9:       perform RangeQuery-Update($q'_j$) that inserts un-visited, discarded objects into $B$, if any
10:      $v = k + n - q'_j.KNN.size$
11:    else
12:      perform CPM($q'_j$) that checks the objects in the surrounding cells of $q'_j.QR$, until $(k+n)$ objects are fulfilled, and terminate the loop.
13:    end if
14:  end while
15: end if
16: sort $q'_j.KNN$ by performing OrderCheck($q'_j.KNN$) that issues necessary location probes.

Figure 3.5: The order checks of a c-kNN query.

In Figure 3.5, the result set of $q'_1$ is $\{p_2, p_1, p_3\}$, sorted by the distance between $q'_1$ and their positions at the server since the last updates. The OrderCheck procedure first checks $p_2$ and $p_1$.
Since $dist(q'_1, p_2) + r_2 > dist(q'_1, p_1) - r_1$, the order of $p_2$ and $p_1$ might need to be switched. The system needs to probe $p_2$ and $p_1$. After the location probes, the order of the NNs becomes $\{p'_1, p'_2, p_3\}$. Thus, the procedure checks the next pair, $p'_2$ and $p_3$. Since $dist(q'_1, p'_2) < dist(q'_1, p_3) - r_3$, a location probe of $p_3$ is not necessary.

3.2.3 Experimental Evaluation

We evaluated the performance of the proposed framework that utilizes ASRs and compared it with the traditional safe region approach [HXL05, PXK+02] and a periodic update approach (PER). The periodic technique functions as a baseline algorithm where each object issues a location update (only uplink messages are issued in this approach) every time it moves to a new position. We extended the safe region approach (SR*) to handle dynamic range and c-kNN queries where the result points are monitored the same way as in ASR. We preserve the traditional safe region calculations (maximum non-overlapping area) for the SR* approach. The simulation steps and the detailed simulation results are described in the following sections.

3.2.3.1 Simulation Steps

We use a main memory grid as the underlying index structure for all three approaches. Our data sets are generated on a terrain service space of $[0, 1024]^2$. We assume a maximum speed for any moving object in the range of $[0.48, 1.25]$. The mobility (the percentage of objects that move from time step to time step) for the objects is set in a range from 10% to 50%. The length $q_{len}$ of a range query is set in the range of $[1, 10]$, and $k$ for a kNN query is set from 5 up to 20. In the simulations, the main measurement is the cost of the communication overhead, which includes uplink messages (e.g., a mobile-initiated location update) and downlink messages (e.g., a server-initiated location probe). The communication cost is measured by assuming that an uplink message ($c_{up} = 2$) costs twice as much as a downlink message ($c_{down} = 1$).
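The cost model above amounts to a weighted message count, which can be written as the following small sketch. The function and constant names are chosen for this illustration, and the message counts in the example are made up rather than taken from the experiments.

```python
C_UP = 2    # cost of one uplink message (e.g., a mobile-initiated location update)
C_DOWN = 1  # cost of one downlink message (e.g., a server-initiated location probe)

def communication_cost(uplink_messages, downlink_messages):
    """Weighted communication cost for the given numbers of messages."""
    return C_UP * uplink_messages + C_DOWN * downlink_messages

# Illustrative counts only: 300 uplink and 150 downlink messages.
cost = communication_cost(uplink_messages=300, downlink_messages=150)
```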
Table 3.2 summarizes the default parameter settings in the following simulations.

Table 3.2: Simulation parameters for the ASR approach.

| Parameter | Default | Range |
|---|---|---|
| Number of objects ($P$) | 100K | 50K, 100K, 150K, 200K |
| Number of queries ($Q$) | 100 | 50, 100, 150, 200 |
| Mobility rate | 50% | 10%, 20%, 30%, 40%, 50% |
| Number of NNs ($k$) | 10 | 5, 10, 15, 20 |
| Query length for range queries ($q_{len}$) | 5 | 1, 5, 10 |

3.2.3.2 Number of Extra NNs

First, we test the efficiency of using extra NNs ($n$) for c-kNN queries by varying $n$, since this factor greatly affects the number of downlink messages. The choice of the number of extra NNs is a trade-off. If $n$ is too large, the query processor evaluates more NNs for a query and the system is more likely to issue more location probes, since a larger query region might overlap with more data objects. If $n$ is too small, there are more query expansions, which might also cause location probes. Figure 3.6 shows the overall communication cost (measured in thousands of messages) as a function of the number of extra NNs ranging from 0 to 20. When $n$ is set to more than 10, the performance of ASR is degraded in terms of the communication cost. Therefore, we chose $n = 10$ for the rest of our experiments as this setting results in reduced communication cost.

Figure 3.6: Extra NNs vs. communication cost. (Axes: number of extra NNs ($n$) vs. communication cost (K); series: ASR.)

3.2.3.3 Cardinality

We examined the effect of the query and object cardinality assuming that all query and object sets move with a mobility rate of 50%. Figure 3.7 (a) shows the communication overhead of ASR, SR* and PER with respect to the object cardinality. ASR outperforms SR* and PER. The difference increases as the number of objects grows. Since an ASR reconciles the surrounding moving queries, a query movement does not incur many unnecessary location probes from the surrounding objects.
SR*, on the other hand, triggers many location probes from the objects whose safe regions overlap with a query region once the query changes its position. As the density of objects increases, there are more objects in the vicinity of a query region. Hence SR* incurs an increasing number of location updates as the cardinality increases. Figure 3.7 (b) shows the impact of the number of queries. Our algorithm achieves about a 50% reduction compared with SR* and a 90% reduction compared with PER.

Figure 3.7: Object and query cardinality. (a) $P$ vs. communication cost. (b) $Q$ vs. communication cost. (Axes: number of objects or number of queries vs. communication cost (K); series: ASR, SR*, PER.)

3.2.3.4 Query Coverage

The query coverage varies with the number of NNs (for kNN queries) and the query length (for range queries). Figure 3.8 (a) shows the communication cost as a function of the number of NNs and Figure 3.8 (b) illustrates the effect of the query length. Overall, the communication cost increases as a function of $k$ and $q_{len}$. However, since ASR and PER utilize the OrderCheck procedure to reduce the number of location probes from the objects which do not violate the order of result sets, the communication overhead remains stable when $k$ increases. This confirms the feasibility of the OrderCheck procedure as well as the c-kNN update mechanisms of our approach. The PER approach basically monitors all the moving objects. Therefore, the value of $k$ is irrelevant to the communication cost; however, PER is not scalable when there is high query coverage.

Figure 3.8: Effect of query coverage with $k$ and $q_{len}$. (a) $k$ vs. communication cost. (b) $q_{len}$ vs. communication cost. (Axes: $k$ of c-kNN queries or length of range queries vs. communication cost (K); series: ASR, SR*, PER.)
3.2.3.5 Mobility

Finally, we evaluated the impact of the mobility rate. Figures 3.9 (a) and (b) show the communication cost as a function of the object and query mobility, respectively. The ASR approach achieves a reduced location update rate compared to the other two approaches for all mobility rates. PER and SR* have worse performance in terms of communication cost when the mobility rate is high. The degradation is caused by the location probes due to query movements.

Figure 3.9: Object and query mobility. (a) Object mobility vs. communication cost. (b) Query mobility vs. communication cost. (Axes: mobility of objects or mobility of queries vs. communication cost (K); series: ASR, SR*, PER.)

3.2.4 Summary

We have designed an ASR-based framework for trajectory movement environments. The novel concept of an adaptive safe region is introduced to provide a mobile object with a reasonable-sized safe region that adapts to the surrounding queries. Hence, the communication overhead resulting from the query movements is greatly reduced. To further decrease the network traffic caused by c-kNN query region expansions to cover sufficient NNs for the result sets, our approach caches extra NNs. An incremental result update mechanism that checks only the set of affected points to refresh the query answers is presented. Experimental results demonstrate that our approach scales better than existing techniques in terms of the communication cost, and the outcome confirms the feasibility of the ASR approach. However, the ASR algorithm cannot handle the query requests efficiently in a highly dynamic and arbitrary environment.

3.3 Arbitrary Movement Model for Moving Objects

To describe what motivates our work, let us first illustrate how the best current techniques operate, with Figure 3.10 serving as an example. The gray areas represent the safe regions of two moving objects $p_1$ and $p_2$.
A traditional safe region is either a rectangle or a sphere, which is determined by the set of surrounding queries [PXK+02]. When an object moves outside of its safe region, it incurs a location update. From the example we can observe that, as $p_1$ or $p_2$ moves out of its safe region (in the direction of the arrow), it issues an unnecessary update because of the limited safe region information. Furthermore, the safe region of a moving object is determined based on its current location. When a query moves to a new location or a new query is inserted, the server triggers a location probe to the affected moving objects and re-calculates new safe regions for them. When receiving the location probes (downstream) from the server, the moving objects need to send their locations (upstream) back to the server. Once the server completes the safe region computations, it sends the safe regions (downstream) to those moving objects. Hence a total of three network messages are sent back and forth between the server and each mobile client. As illustrated, the safe region approach incurs significant network traffic in this scenario.

In contrast, we propose a partition-based technique by defining a grid-like LIT (also shown in Figure 3.10), which provides a moving object with a detailed view of the surrounding query locations across the terrain. As an additional advantage, a LIT is determined without referring to the locations of moving objects. If a query is inserted, the server can send the new LIT with the added query information (downstream) to the affected moving objects directly, and only a fraction of the mobile clients that receive the updated LIT must issue location updates (upstream) back to the server (namely, if they are part of the new query result). Therefore, the number of network messages is reduced to at most two. The overall PLU process is discussed in detail in the subsequent chapters.
Figure 3.10: Illustration of concepts for the PLU approach.

3.3.1 Partition-based Lazy Location Updates

3.3.1.1 Data Structures

We use the following data structures in the system, also shown in Figures 3.11 (a) and (b).

Figure 3.11: LIT data structures. (a) Object grid and server-side LIT. (b) Levels.

• Object Grid ($G$): A $w \times w$ object grid $G$ is used to index the moving objects. We adopt a grid structure that is similar to the one proposed in [MHP05]. During an execution interval, each cell $c(i, j)$ maintains a moving object list denoted by $M(i, j)$.

• $LIT_{serv}$: A server-side LIT consists of $n \times n$ cells of uniform side length $\delta$. Each $LIT_{serv}$ cell maintains a pair of values $\langle TBound, LIT_{serv}.value \rangle$, where $TBound$ holds the actual coordinates of the terrain boundary that the $LIT_{serv}$ cell corresponds to (Figure 3.11 (a) shows a boundary matching example) and $LIT_{serv}.value$ stores an integer value to indicate the query boundaries. Locating a $LIT_{serv}$ cell for a moving object $p$ with coordinates $(x, y)$ can be done by calculating the index $LIT_{serv}(i, j)$, where $i = \lfloor x / \delta \rfloor$ and $j = \lfloor y / \delta \rfloor$.

• $LIT_{mob}$: A mobile-side LIT is an $m \times m$ subset table extracted from the latest server-side LIT, with $m \le n$. When a mobile object $p$ obtains its mobile-side LIT, which is extracted from $LIT_{serv}$, we say that $p$ references $LIT_{serv}$. Let $(i, j)$ be the index of the cell in which $p$ resides on the server-side LIT. The mobile-side LIT of $p$ is extracted from the area $[i - \ell \ldots i + \ell,\ j - \ell \ldots j + \ell]$ on the server-side LIT, where $\ell$ is defined as the level of the mobile-side LIT. Therefore, the size $m$ of a mobile-side LIT is $2\ell + 1$.
  Figure 3.11 (b) shows a two-level ($\ell = 2$), $5 \times 5$ $LIT_{mob}$ for objects in cell $(2, 2)$; the levels are also illustrated in different gray scales.

• Moving Object Set ($P$): $P$ is a collection of mobile units, each of which is represented by a tuple $\langle ID, LocXY, updateTime, LIT.value \rangle$, where $ID$ is the object identifier, $LocXY$ is the latest reported position, $updateTime$ is the location update time, and $LIT.value$ is the LIT value retrieved from the server-side LIT table created at the time when the object issues a location update.

• Query List ($Q$): Queries are organized via an in-memory sequential list $Q$. Each query entry is of the form $\langle ID, LocXY, InsTime, ANS \rangle$, where $InsTime$ is the query insertion time and $ANS$ is the set of query results.

The five data structures presented above support the operation of our PLU technique as follows. Initially, startup mobile-side LITs are generated by the server and sent to each registered moving object. With the possession of an LIT, each moving object can locally determine a location update and send it only when the new location may affect a query result. When an object issues a location update, it receives the latest mobile LIT in response, with the newly reported positions of the query boundaries. An $LIT_{mob}$ assigned to a moving object is in general a subset table of the server-side LIT, due to the memory limitations of moving objects and to reduce communication costs.

3.3.1.2 LIT Details

We first describe the server-side LIT details. A $LIT_{serv}$ is generated initially at the server and updated when one of the following two events happens: (1) an existing query changes its location or (2) a new query is registered with the system. The general attributes described in this section for the server-side LIT are also applicable to the mobile-side LITs extracted from it. A mobile-side LIT simply inherits all the attributes and query boundary information from the server-side LIT.
However, each moving object maintains (i.e., updates) the mobile-side LIT locally after receiving it from the server, based on a specific event. A $LIT.value$ for $LIT_{serv}(i, j)$ stores an integer number that represents a safe distance. The safe distance for $LIT_{serv}(i, j)$ is defined as the minimal linear distance in cells from the $LIT_{serv}(i, j)$ cell to the nearest query boundary. We distinguish two cases when assigning a value to $LIT_{serv}(i, j)$: $LIT.value \ge 0$, if $LIT_{serv}(i, j)$ does not overlap a query boundary; and $LIT.value = -1$, if $LIT_{serv}(i, j)$ is covered by a query boundary. Figure 3.12 (a) shows an object grid with a set of registered queries and moving objects on the terrain at time $t_0$. The corresponding server-side LIT created at $t_0$ is illustrated in Figure 3.12 (b).

Figure 3.12: The object grid and a server-side LIT example. (a) Object grid at $t_0$. (b) $LIT_{serv}$ at $t_0$.

In this example we assume that the server-side LIT size is the same as the object grid. The LIT values of the cells that overlap the boundaries of queries $q_1$ and $q_2$ are set to -1. We define two types of cell zones: a border zone (LIT value = -1) and a zero zone (LIT value = 0). A border zone consists of cells that overlap with the boundaries of some queries. A zero zone is essentially a prediction zone which might be covered by nearby moving queries as time proceeds. Since a zero zone has a safe distance equal to zero, it is more likely to be covered by a moving query, say $q_1$, soon. Both border and zero zones are important indicators for a moving object to decide on a location update. In order to predict the moving query locations, each moving object updates its local LIT and marks the new prediction cells as zero zones. The detailed update mechanism for mobile-side LITs will be described later.
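The value assignment described above can be sketched as follows, mirroring the level-by-level search of Algorithm 3. This is a minimal illustration under stated assumptions: the border cells are given directly as a set of indices (the geometric test of which cells a query boundary covers is omitted), a level is taken to be the square ring of cells at Chebyshev distance $l$ from a cell, and the names `create_lit` and `cells_at_level` are hypothetical.

```python
BORDER = -1  # LIT value of a cell that is covered by a query boundary

def cells_at_level(n, i, j, level):
    """Cells of an n x n table at Chebyshev distance `level` from cell (i, j)."""
    return [(a, b)
            for a in range(max(0, i - level), min(n, i + level + 1))
            for b in range(max(0, j - level), min(n, j + level + 1))
            if max(abs(a - i), abs(b - j)) == level]

def create_lit(n, border_cells):
    """Build an n x n LIT from the set of cells overlapping a query boundary."""
    lit = [[BORDER if (i, j) in border_cells else 0 for j in range(n)]
           for i in range(n)]
    for i in range(n):
        for j in range(n):
            if lit[i][j] == BORDER:
                continue
            level = 1
            while True:
                ring = cells_at_level(n, i, j, level)
                # Stop when the ring leaves the table or touches a border cell.
                if not ring or any(cell in border_cells for cell in ring):
                    break
                lit[i][j] += 1
                level += 1
    return lit

# A single border cell in the middle of a 5 x 5 table.
lit = create_lit(5, {(2, 2)})
```

In this example the eight cells around the border cell receive the value 0 (a zero zone), and the outer ring receives the value 1, because a border cell first appears at level 2.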
Algorithm 3 presents the pseudocode for generating a server-side LIT. Lines 2-4 first assign -1 to the cells which are covered by a query boundary, and lines 5-11 calculate the LIT value for the rest of the cells. To compute a LIT value for $LIT_{serv}(i, j)$, the algorithm checks its surrounding cells level by level by calling GetCellsAtLevel($LIT_{serv}(i, j)$, $l$), where $l$ is a level number. The procedure terminates the loop when a cell with a LIT value equal to -1 appears. For example, in Figure 3.12 (b), we obtain 1 as the LIT value for $LIT(5, 5)$, since -1s appear at level 2, where the loop is terminated.

Algorithm 3 CreateLIT($G$, $Q$)
1: Let LIT be an $n \times n$ table and initialize the value of each cell to $\infty$
2: for (every $q \in Q$) do
3:   Set the value equal to -1 for each LIT cell that is covered by the query boundary of $q$
4: end for
5: for (every cell $LIT_{serv}(i, j)$, which has the LIT value $\ne$ -1) do
6:   Let $LIT_{serv}(i, j).value = 0$ and $l = 1$
7:   while (GetCellsAtLevel($LIT_{serv}(i, j)$, $l$) returns $C$ AND $C \ne \emptyset$) do
8:     if (any cell in $C$ has LIT value equal to -1) then break the loop
9:     else Increment $LIT_{serv}(i, j).value$ and $l$ by one
10:  end while
11: end for

3.3.1.3 Mobile-Side Processing

Each moving object independently performs the following two major tasks to achieve the desired location update traffic reduction: progressive revision of the mobile-side LIT and determination of when to send location updates. Each time a moving object transmits its location to the server, an up-to-date mobile-side LIT will be sent to the moving object. However, since we consider dynamic queries in this approach, the LITs are subject to change whenever the queries change their locations during the course of the execution. Instead of sending a new mobile-side LIT with the latest query locations to each moving object repeatedly, we propose a periodic LIT update method to independently adjust the mobile-side LIT to reflect all the possible query movements while ensuring the correctness of the query results.
We first discuss how a moving object updates its local LIT and then describe the mechanisms for triggering a location update based on the mobile-side LIT.

Mobile-side LIT Updates: Since the maximum speed $\lambda$ for mobile units is limited, we can estimate the possible query locations in the mobile-side LIT. Continuing the example shown in Figure 3.12, each moving object $p$ is given a mobile-side LIT by the server as shown in Figure 3.12 (b). Without loss of generality we assume that the sizes of the object grid, the server-side LIT and the mobile-side LIT are the same in this example. Figure 3.13 (a) illustrates the current locations of the mobile units at time $t_1$, and Figure 3.13 (b) shows the mobile-side LIT updated by $p$ at $t_1$. One can observe that in the worst case, by considering that a query may move to its surrounding cells in any direction, the area between the two dashed rectangles shows all the possible coverage of the query boundary with such movements. Since a border zone may overlap more than one query boundary anywhere within the zone, the two solid rectangles represent the outermost query boundaries of the zone. For simplicity, we draw two dashed rectangles inwardly (shrunk) and outwardly (expanded) by extending the solid rectangle by the length of the maximum moving distance ($= \lambda \times \Delta t$) for every time instance. The cells that are newly covered by the area between the dashed rectangles become zero zones. As a final step, the LIT values of the remaining cells need to be updated by decrementing the LIT values by one when the surrounding cells become new zero zones.

Location Update Check: The event-driven procedure for deciding on a location update is performed by the moving object only when it moves to a new location. We continue with the example of Figure 3.13 (a), which shows the new locations of queries and moving objects at time $t_1$.
Referring to the mobile-side LIT in Figure 3.13 (b), p_2 in LIT_mob(3, 3) steps into a zero zone in LIT_mob(4, 2), so p_2 might overlap with a query at this moment. p_4 was in a border zone and has changed its location since its latest update to the server, so it may exit or enter a query boundary. Therefore, both p_2 and p_4 have to issue a location update at t_1.

The location update checking procedure works as follows. Let LIT_mob(i, j) be the cell in which an object resided at its last update to the server, and let LIT_mob(i', j') (≠ LIT_mob(i, j)) be the cell where the object is currently located, some time after that update. To determine whether p must issue a location update, we perform the following steps:

Step 1: Check if p resided in a border zone LIT_mob(i, j) at its last update to the server. If this is the case, p issues a location update and exits the checking steps. Otherwise, continue to Step 2.

Step 2: Check if p has stepped into a border zone or zero zone LIT_mob(i', j'). If this is the case, p issues a location update and exits the checking steps. Otherwise, continue to Step 3.

Step 3: Check if p has stepped into a cell LIT_mob(i', j') that is outside of the mobile-side LIT boundary. If this is the case, p issues a location update and terminates the checking steps.

In Step 1, we monitor the objects that are in the border zones. Since these moving objects are close to some query boundaries, they are more likely to alter query results. If a moving object moves into a border or zero zone as described in Step 2, it may enter a range query. Therefore, it has to report its current location to the server. In Step 3, when the moving object moves out of its local LIT boundary, it issues a location update.
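The three checking steps above can be sketched in Python as follows. This is only an illustration under stated assumptions: the mobile-side LIT is held as a list of rows with local indices, -1 marks a border zone and 0 a zero zone, and the function name needs_location_update and the sample table are hypothetical. The out-of-table test (Step 3) is evaluated before Step 2 purely to avoid indexing outside the table; since each step ends in a location update, the decision is unchanged.

```python
BORDER = -1  # border zone: cell covered by a query boundary
ZERO = 0     # zero zone: cell that a moving query may already cover


def needs_location_update(lit, last_cell, current_cell):
    """Return True if the object should report its location to the server.

    lit          -- m x m mobile-side LIT as a list of rows (local indices, assumed)
    last_cell    -- (i, j): cell occupied at the last update to the server
    current_cell -- (i', j'): cell occupied now
    """
    if current_cell == last_cell:
        # The check is defined only for a change of cell.
        return False
    m = len(lit)
    i, j = last_cell
    ci, cj = current_cell
    # Step 1: the object was in a border zone at its last update.
    if lit[i][j] == BORDER:
        return True
    # Step 3 (tested early): the object left the mobile-side LIT boundary.
    if not (0 <= ci < m and 0 <= cj < m):
        return True
    # Step 2: the object stepped into a border zone or a zero zone.
    return lit[ci][cj] in (BORDER, ZERO)


# Hypothetical 5 x 5 table: one border cell in the centre, surrounded by
# a ring of zero zones and an outer ring of cells with LIT value 1.
example_lit = [
    [1, 1, 1, 1, 1],
    [1, 0, 0, 0, 1],
    [1, 0, -1, 0, 1],
    [1, 0, 0, 0, 1],
    [1, 1, 1, 1, 1],
]
```

An object that moves between two cells with positive LIT values inside the table stays silent, which is where the reduction in update traffic comes from.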
Figure 3.13: The object grid and mobile-side LITs. (a) Moving objects at t_1. (b) Mobile-side LIT at t_1. (c) One-level LIT_mob.

Figure 3.13 (c) shows a one-level 3 × 3 LIT for p_3. After some time instances, at t_3, p_3 moves out of the LIT boundary to LIT_mob(4, 3), and therefore it must issue a location update at t_3.

The above procedure is called push mode, since the location updates are issued by the moving objects. We also propose a pull mode, which is executed by the server to request a location update from the mobile clients. To motivate the pull mode, we first describe the following scenario. In Figure 3.13 (a), p_7 was located in a zero zone LIT_mob(4, 4) at t_0 and moves to LIT_mob(5, 3) at t_1. At this time p_7 does not issue a location update, since LIT_mob(5, 3) is neither a border nor a zero zone. However, the latest reported location of p_7 at the server is still in LIT_mob(4, 4), and p_7 might be evaluated as a query result because LIT_mob(4, 4) is a zero zone, indicating that a nearby moving query boundary might cover it. Therefore, p_7 would need to inform the server about its current location for further validation. Handling this in push mode incurs many unnecessary location updates from the moving objects as time proceeds (many zero zones appear due to the mobile-side LIT updates). We propose a server-side detection procedure, ObjProbe, to handle this case. Since the server has the latest locations of the queries, it is able to determine whether LIT_mob(4, 4) is actually covered by some query. If this is true, the server sends the latest LIT to p and requests its current location. In this example, since the actual new position of q_1 at t_1 does not cover LIT_mob(4, 4), the server does not need to send the latest LIT to p_7.
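Both the push and the pull mode rely on the LIT values that the server computes with Algorithm 3 (CreateLIT). As a rough illustration, the level-by-level computation can be sketched in Python as below. The sketch assumes that the set of cells covered by query boundaries has already been determined and is passed in as `covered`, and that a level is the ring of in-grid cells at Chebyshev distance l from the cell; the names create_lit and cells_at_level are hypothetical stand-ins for CreateLIT and GetCellsAtLevel.

```python
import math


def cells_at_level(n, i, j, level):
    """In-grid cells whose Chebyshev distance to (i, j) equals `level`."""
    ring = []
    for a in range(i - level, i + level + 1):
        for b in range(j - level, j + level + 1):
            on_ring = max(abs(a - i), abs(b - j)) == level
            if on_ring and 0 <= a < n and 0 <= b < n:
                ring.append((a, b))
    return ring


def create_lit(n, covered):
    """Build an n x n LIT; `covered` holds the (i, j) cells under a query boundary."""
    lit = [[math.inf] * n for _ in range(n)]      # line 1 of Algorithm 3
    for i, j in covered:                          # lines 2-4
        lit[i][j] = -1
    for i in range(n):                            # lines 5-11
        for j in range(n):
            if lit[i][j] == -1:
                continue
            value, level = 0, 1
            while True:
                ring = cells_at_level(n, i, j, level)
                if not ring:                      # ring lies entirely outside the grid
                    break
                if any(lit[a][b] == -1 for a, b in ring):
                    break                         # a border cell appears at this level
                value += 1
                level += 1
            lit[i][j] = value
    return lit
```

With a single covered cell in the centre of a 5 × 5 grid, the eight neighbouring cells receive value 0 and the outer ring receives value 1, which mirrors the LIT(5, 5) example given for Algorithm 3, where -1 first appears at level 2.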
3.3.1.4 Server-Side Processing

We implement an event-driven approach to handle the requests from mobile objects. There are two major server-side procedures: ObjProbe, which was discussed in the previous section for handling query updates, and QurInsert for processing query insertions. We focus on the QurInsert procedure in this section.

When a new query q is inserted, instead of informing the entire population of registered moving objects (which lack this new query boundary information) of the change, the server performs the QurInsert procedure to determine a set of moving objects O that may enter the new query boundary. It then sends the latest mobile-side LIT to these objects only. First, QurInsert checks each moving object p in C, the set of LIT cells in which objects have a mobile-side LIT that overlaps with R (the set of LIT cells covered by the new query boundary). Let c_r ∈ R be the LIT cell nearest to p. The procedure computes the minimum distance x between p and c_r and the distance y between c_r and the border zone nearest to R (denoted by c_i). If x > y, the server does not have to inform p of the query insertion. This is because, through the mobile-side LIT updates on p, the area covered by R will become zero zones before p moves into that area. Therefore, the query insertion will not cause any missed location updates. To estimate the distance y for object p on the server side, the procedure checks the LIT value of c_r, because it represents the nearest distance (in cells) to the border zone. Since the server does not retain the previous server-side LITs in order to achieve memory efficiency, we use the following lemma to estimate a LIT value for c_r based on the p.LIT.value stored on the server.

Lemma 3.3. Let (i, j) and (i', j') be the cell indexes of the cell c_p where p resides and of c_r, respectively, and let k = max(|i − i'|, |j − j'|) be the distance in cells between c_p and c_r.
Since LIT values are correlated, the LIT value for c_r can be set as follows:

\[
\mathrm{LIT}(i', j').\mathrm{value} =
\begin{cases}
\mathrm{LIT}(i, j).\mathrm{value} - k & (1),\ \text{the best case, or} \\
\mathrm{LIT}(i, j).\mathrm{value} & (2),\ \text{or} \\
\mathrm{LIT}(i, j).\mathrm{value} + k & (3),\ \text{the worst case}
\end{cases}
\]

We consider the worst case to ensure the correctness of the query results; therefore, we use Equation (3) to estimate the LIT value of c_r. Finally, we obtain y by multiplying the LIT value of c_r by δ to convert the LIT value into an actual distance. When q later moves to a new location, the system needs to perform the QurInsert procedure again to inform the moving objects that have neither been informed of the query insertion by QurInsert previously nor been assigned a new LIT with q's location. The pseudo code of the QurInsert procedure is shown in Algorithm 4.

Algorithm 4 QurInsert(q_i)
1: O = ∅
2: Let R be the set of LIT cells covered by the range of query q_i
3: Let C be the set of LIT cells where the objects' mobile-side LITs overlap with R
4: for (each object p ∈ C that has not been notified of q_i yet) do
5:   Let c_r ∈ R be the closest cell to p
6:   Let v be the LIT value of c_r estimated by Equation (3)
7:   Let x = mindist(p, c_r)
8:   Let y = v × δ
9:   if (x ≤ y) then
10:     Insert p into O
11:   end if
12: end for
13: Return O

Consider the following example. A new query q registers with the system at t_1. Assume that each moving object is assigned a 3 × 3 (level ℓ = 1) mobile-side LIT. Figure 3.14 shows an object p ∈ C with its LIT_mob. The new query q covers the gray area R. Since a one-level mobile-side LIT is assigned to each moving object, the area C is one level larger than R. Assume that p.LIT.value = 1, so the estimated LIT value of the closest cell c_r at (2, 3) is 2. Therefore, the border zone on p.LIT_mob could be two cells away from c_r. Since x < y, p may reach c_r before c_r becomes a zero zone through p's mobile-side LIT updates. Therefore, the server needs to inform p of the new insertion.
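A minimal Python sketch of Algorithm 4 with the worst-case estimate of Equation (3) is given below. Several details are assumptions made for illustration and are not specified in the text: cell (i, j) is taken to cover the square [i·δ, (i+1)·δ] × [j·δ, (j+1)·δ], mindist is the Euclidean distance from the object to that square, and the MovingObject fields and function names are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class MovingObject:
    x: float                 # last reported position
    y: float
    lit_value: int           # p.LIT.value stored on the server
    notified: bool = False   # already informed of this query?


def mindist(px, py, cell, delta):
    """Euclidean distance from point (px, py) to the square of `cell` (assumed layout)."""
    i, j = cell
    dx = max(i * delta - px, 0.0, px - (i + 1) * delta)
    dy = max(j * delta - py, 0.0, py - (j + 1) * delta)
    return math.hypot(dx, dy)


def qur_insert(candidates, query_cells, delta):
    """Return the objects that must receive the new mobile-side LIT.

    candidates  -- objects whose mobile-side LIT overlaps R (the set C)
    query_cells -- R, the LIT cells covered by the new query
    delta       -- side length of one LIT cell
    """
    affected = []
    for p in candidates:
        if p.notified:
            continue
        # Line 5: closest cell of R to p.
        c_r = min(query_cells, key=lambda c: mindist(p.x, p.y, c, delta))
        # Lemma 3.3: cell distance k between p's cell and c_r.
        ci, cj = int(p.x // delta), int(p.y // delta)
        k = max(abs(ci - c_r[0]), abs(cj - c_r[1]))
        v = p.lit_value + k            # Equation (3), the worst case
        x = mindist(p.x, p.y, c_r, delta)
        y = v * delta
        if x <= y:                     # p may reach c_r before it turns into a zero zone
            affected.append(p)
    return affected
```

Using the worst case makes the test conservative: an object is skipped only when even the largest plausible LIT value for c_r leaves it too far away to reach the new query region before its own LIT updates flag that region.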
Figure 3.14: A query insertion example.

The pseudo code of Algorithm 5 represents the complete server-side event-driven system procedure. In Line 7, the QurInsert procedure determines a set of objects that may be affected by a newly inserted query. Line 9 sends mobile-side LITs directly to the objects in O (downstream). Because a mobile-side LIT is extracted from a server-side LIT, which is obtained based on the query locations only, the system can send the latest mobile-side LITs to O without probing their locations first. The mobile-side LITs here are extracted based on the objects' latest reported locations on the server. Therefore, each object o ∈ O can decide on a location update based on the new mobile-side LIT. Any object o in O issues a location update (upstream) only when it is part of the result of the new query. Likewise, ObjProbe in Line 13 obtains a set of objects whose last reported locations at the server are covered by existing queries. Line 15 sends the latest mobile-side LIT to those objects and at the same time requests their current locations (downstream). The moving objects then send their current locations back to the server (upstream). By applying these techniques, our goal of reducing the overall network traffic is achieved.

3.3.1.5 Spatial Data Compression for Mobile-side LITs

Each moving object obtains an m × m mobile-side LIT. While a large value for m provides more location information to a moving object, the data stream for the mobile-side LIT then needs to be broken up into more packets, which adversely affects performance. We assume the common Internet maximum transmission unit (MTU) of 1500 bytes as the largest packet payload size. In some prior techniques, such as the safe region approach, essentially only a pair of x- and y-coordinates is sent to each moving object. Hence such information can be easily packed into a single packet.
Our goal is to use existing data compression techniques to condense a LIT enough to fit into one packet. We apply three lossless data compression methods: delta encoding, run-length encoding (RLE) and Huffman encoding.

Algorithm 5 Server-side Main Procedure
1: while (there is a request from a mobile unit r) do
2:   Let B be a buffer
3:   if (the request is a moving object insertion) then
4:     Insert r into B
5:   else if (the request is a query insertion) then
6:     Insert r into the query index
7:     Call QurInsert(r) that returns a set of objects O
8:     Perform CreateLIT(G, Q) to generate a new server-side LIT
9:     Send each o ∈ O the latest mobile-side LIT
10:   else if (the request is a moving object or query deletion) then
11:     Remove r from the system
12:   else if (the request is a query update) then
13:     Call ObjProbe(r) that returns a set of objects Z
14:     Perform CreateLIT(G, Q) to generate a new server-side LIT
15:     Send each z ∈ Z the latest mobile-side LIT and request z's current location
16:     Insert Z into B
17:     if (r is an inserted query during the execution) then
18:       Call QurInsert(r) that returns a set of objects O'
19:       Send each o ∈ O' the latest LIT
20:     end if
21:   else if (the request is a moving object update) then
22:     Insert r into B
23:   end if
24: end while
25: Update/insert b's location and cell index, ∀b ∈ B
26: Call RangeQuery(Q) to retrieve query results
27: For each b ∈ B, send the latest mobile-side LIT to b if it does not have the latest LIT

First, we de-correlate the LIT values by subtracting pairs of adjacent LIT numbers. This is effective since the LIT values are highly correlated, that is, spatially redundant. As a result, the difference values are repetitive and smaller than the original LIT values. Second, RLE is utilized to take advantage of the large amount of spatial redundancy in a LIT. An extra marker bit is used to distinguish run counts from number values.
For both steps, we use a Hilbert curve as the data scanning path along which we subtract pairs of LIT numbers and count repeated numbers, respectively. Finally, we perform Huffman encoding, which is based on the frequency of occurrence of a data item and uses fewer bits to encode the data that occur more frequently. Huffman encoding following RLE is a natural choice, since the result of RLE is a set of symbols representing run lengths that occur with varying frequencies. Overall, our experimental results show that the combination of these methods can reduce the size of a LIT by up to 79%. To pack a mobile-side LIT within a 1500-byte packet, we suggest a maximum table size of 64 × 64 for a mobile-side LIT.

3.3.2 Experimental Evaluation

3.3.2.1 System Implementation

In this section, we describe the PLU algorithm and the aspects of server-side and mobile-side processing. Figure 3.15 illustrates the detailed system flow chart. Any updates from query or data objects (A) are handled differently by the ObjUpdate Operation based on the event types. If the request is a query update (B) or a new query registration (C), an ObjProbe Operation or a QurInsert Operation is performed, respectively, to determine a set of candidate data objects that may become new query answer points. After the data objects are re-indexed (D), the system proceeds to (E), which triggers the LIT Generation and Update Operation. A new mobile-side LIT with an update flag is extracted, compressed and sent to each candidate object (F). The system then performs a Query Evaluation (G) to compute new query results after receiving the location updates from the probed objects. If the request is a data object update, a new, compressed LIT is sent directly to the object after (object) indexing. On the mobile side, a mobile unit first retrieves its current location from its GPS tracker (H).
Together with the Query Object Movement Prediction module, which estimates possible surrounding query movements (I), the mobile unit determines whether a location update is necessary through the Location Update Check procedure. In the following sections, we describe the details of LITs and the aspects of the server and mobile task modules in the PLU system.

Figure 3.15: PLU system flow chart.

3.3.2.2 Simulation Steps

We evaluate the performance of the PLU algorithm by comparing it with the safe region approach [HXL05, PXK+02] and the traditional periodic update approach (PER). For a fair comparison, we extended the safe region approach to handle dynamic queries. To process a query insertion, the safe region approach probes the set of objects whose safe regions overlap with the boundary of the new query. When a query changes its location, it is treated as a query insertion. We implement the extended safe region approach with safe rectangles (SR*-Rec) and safe spheres (SR*-SP).

We use a main-memory 100 × 100 grid as the underlying index structure for all the approaches. Our data sets are generated on a terrain service space of [0, 1024] based on the random walk mobility model [McD99]. Each object moves with a constant velocity until an expiration time. The velocity is then replaced by a new velocity with a new expiration time. For the range query, the query boundary is a square and the side length q_len is in the range of [1, 10]. The number of queries is up to 1k.
The maximum moving distance per time step for any moving object is in the range [0.48, 1.25], which corresponds to a speed of 35 to 90 miles per hour. The mobility f_move (the percentage of objects that move within a time step) is set in a range from 10% to 100%. We select an optimal size n for the server-side LIT from [64, 512] per side and choose the level ℓ from [1, 10] for a mobile-side LIT.

The main measurement in the following simulations is the number of network messages sent between the server and the moving objects. At the server side, we count the number of messages (downstream) from probing an object's location and sending a LIT to an object; at the mobile client side, we count the number of messages (upstream) from issuing a location update to the server. The communication cost is measured by assuming that an upstream message (c_up = 1) is twice as costly as a downstream message (c_down = 0.5). Experiments are conducted with a Pentium 2.4 GHz CPU and 1 GByte of memory. The query results are evaluated in an event-driven approach. Our experiments use several metrics to compare these algorithms. Table 3.3 summarizes the default parameter settings in the following simulations.

Parameter   Default   Range
P           100k      -
Q           1000      300, 500, 700, 1000
f_move      50%       10%, 30%, 50%, 70%, 100%
λ           1.25      0.48 (35 mph) - 1.25 (90 mph)
q_len       5         1, 5, 10
n           256       64, 128, 256, 512
ℓ           5         1, 5, 10

Table 3.3: Simulation parameters for the PLU approach.

3.3.2.3 LIT Size

First, we measure the overall number of network messages of the PLU algorithm, including the upstream and downstream directions, by varying the server-side LIT size. The choice of the server-side LIT size is a trade-off between the number of network messages and the server performance. Figures 3.16 (a) and (b) show the number of overall network messages and the CPU overhead, respectively, versus LIT sizes ranging from 64 × 64 to 512 × 512.
When the LIT size is set to more than 512 per side, the performance of PLU is degraded in terms of the number of network messages and CPU time because it incurs more LIT value calculations for all the LIT cells. The LIT size 256 × 256 constitutes a good tradeoff between the number of network messages and CPU time. Therefore, 256 × 256 is chosen as the server-side LIT size for the rest of our experiments.

Next we examine the size of a mobile-side LIT. Figure 3.16 (c) measures the effect of varying the size of the mobile-side LIT from level 1 (3 × 3) to level 10 (21 × 21) in terms of network messages. The size of a mobile-side LIT significantly affects the number of network messages. When a mobile-side LIT is small, a moving object issues more network messages because it has more chance to move out of the LIT boundary. When a mobile-side LIT is large, it also incurs more network messages from the query insertion process, since the procedure needs to check more objects from a larger area where the moving objects have mobile-side LITs overlapping with the new query boundary. We choose ℓ = 5 as the mobile-side LIT size for the remaining experiments, because it achieves better performance in terms of network messages.

Figure 3.16: Performance vs. LIT size. (a) Server-side LIT size (n): network messages (k) vs. number of cells per axis. (b) Server-side LIT size (n): CPU time (sec) vs. number of cells per axis. (c) Mobile-side LIT size (ℓ): network messages (k) vs. LIT level.

3.3.2.4 Query Coverage

The query coverage on the terrain is a crucial factor in the performance of continuous query algorithms. The query coverage varies with the number and side length of the queries. Figure 3.17 (a) shows the network messages as a function of the number of queries, and Figure 3.17 (b) illustrates the corresponding communication cost.
Overall, the number of network messages and the communication cost increase with the number of queries, because a moving object is more likely to move into a query boundary. PLU achieves a significant reduction in the number of updates compared to the other techniques. For the PER approach, since the server does not perform any computations regarding the location update reduction, we only count the number of network messages sent from the mobile clients. The PER approach is independent of the query coverage, because its number of updates depends only on the mobility. Therefore, its network messages remain the same in this simulation.

In Figure 3.17 (c), we evaluate the side length of queries with the values 1, 5 and 10. When the side length of the queries increases, SR*-Rec and SR*-SP incur more updates, because they perform server-side probes to those objects whose safe regions overlap with the queries. The server therefore needs to probe more moving objects when queries change to new locations or when new queries are inserted. When q_len = 10, SR*-SP has the same number of network messages as the baseline PER approach. The simulation results confirm the importance of adopting the PLU approach, which significantly reduces the network messages and hence decreases the communication cost.

Figure 3.17: Effect of query coverage with Q and q_len. (a) Q vs. network messages. (b) Q vs. communication cost. (c) q_len with Q = 1000.
3.3.2.5 Mobility

Finally, we evaluate the impact of the mobility rate. Figure 3.18 (a) shows the number of network messages as a function of the object mobility, and the communication cost is shown in Figure 3.18 (b). The PLU approach achieves a higher location update reduction than the other three approaches for all mobility rates. Figure 3.18 (c) illustrates the CPU time versus the object mobility. Although PLU applies more server-side procedures (e.g., ObjProbe and QurInsert) to reduce the network messages, it still has a CPU performance competitive with SR*-SP. However, SR*-Rec has the worst performance in terms of network messages, communication cost and CPU overhead. The degradation is caused by the expensive calculation of safe rectangles. SR*-Rec in general computes larger safe regions for moving objects than SR*-SP, so SR*-Rec incurs many server-side probes to the moving objects when queries change their locations.

Figure 3.18: Performance vs. object mobility. (a) f_move vs. network messages. (b) f_move vs. communication cost. (c) f_move vs. CPU time.

3.3.3 A Message-Efficient Prototype for Location-Based Applications

The PLUS system is designed to efficiently track moving object locations on a road network and execute continuous spatial queries in support of location-based services. PLUS implements the novel lazy position update (PLU) mechanism that significantly reduces the communication overhead and server indexing load related to frequent location updates in moving object and moving query scenarios. PLUS maintains a Location Information Table (LIT) on each mobile device. A LIT is a grid data structure where each cell stores a value that represents its distance to the closest query boundary.
Figure 3.19 shows an example of a mobile-side LIT cached in the local memory of a mobile client, where the cell elements with zero value represent the area overlapping with the three example query boundaries.

Figure 3.19: Traditional safe regions and LITs.

In the example shown in Figure 3.19, the rectangles represent a set of moving queries and the gray area represents the traditional safe region of moving object p_1. As p_1 moves out of its safe region (in the direction of the arrow) to a new location that is irrelevant to the query results, it issues an unnecessary update because of the limited safe region information. Furthermore, the safe region of a moving object is determined based on its current location. In some cases the server must probe a client object about its current location. In response to such a probe (which consists of one downstream message), the client replies with its current position (one upstream message) and the server determines a new safe region, which it sends back to the client (one downstream message). Hence a total of three network messages are sent back and forth between the server and each mobile client. In particular, the PLUS system possesses the following distinguishing characteristics:

• Partition-based lazy update continuous query processing. PLUS performs and visualizes a novel partition-based lazy update continuous query algorithm with a large number of mobile users.

• Scalability. PLUS mobile users utilize Location Information Tables to index queries to reduce update messages and hence improve system scalability. The PLUS demonstration system computes and displays comparative results of the LIT-based approach and traditional safe region techniques [HXL05, PXK+02] to illustrate performance benefits.
• Realistic movement on road networks. The movement of mobile users in PLUS is based on underlying real-world road networks from the TIGER/Line data set. Mobile users in PLUS automatically travel on road segments, and the velocity of the movement is determined by the speed limit of each road segment.

3.3.3.1 System Architecture

Figure 3.20 illustrates the infrastructure of the PLUS system. We assume that the communication between the centralized server and the mobile units is through cellular or WiMAX networks. The mobile units, such as vehicles or hand-held devices (e.g., cell phones and PDAs), are able to provide the server with their positions from a built-in GPS locator. PLUS consists of two major components: the Server Task Module and the Mobile Task Module. The server module supports an event-driven mechanism to handle all the object requests, such as object location updates or new query insertions. Furthermore, the module reduces the data size of a LIT before transmitting it to a moving object in order to fit the information into a single network packet. For the mobile units, we assume that each device has enough computational capability and memory space to carry out the required tasks. On top of an underlying road network, a mobile unit can move arbitrarily without exceeding a predetermined maximum speed limit. For demonstration purposes, a mobile-side application is provided to emulate a real mobile client. To observe the scalability of the system, a set of virtual mobile units is generated. Therefore, the query evaluation is performed on both the virtual and the real mobile clients.

Figure 3.20: PLUS system infrastructure.

3.3.3.2 Server and Mobile Task Modules

The Server Task Module supports an event-driven mechanism to handle a request from a mobile unit, which it dispatches to a specific operation based on the event type.
There are four operations implemented in the system: Object Update (object location update event), Object Position Probing (query update event), Query Insertion (query insertion event), and Data Compression.

• An Object Update (ObjUpdate) Operation re-indexes the location of a data or query object on the server based on its transmitted coordinates.

• An Object Position Probing (ObjProbe) Operation triggers smart on-demand location probes to reduce the number of location update messages when the request is a query update. The operation determines a set of objects and sends them the latest LIT, encoded with an update flag to notify the mobile objects to respond with their current locations.

• A Query Insertion (QurInsert) Operation determines a set of affected objects without causing any missed location updates when a new query is inserted. A new LIT with a "delay" update flag is then sent to each affected object, which replies to the server with its current position only when it falls into the query boundary. As a result, only a fraction of the affected objects must issue location updates.

• A Data Compression Operation compresses a mobile-side LIT to reduce the data stream size. While a mobile-side LIT provides more detailed query boundary information than a safe region, the data transmission of a potentially large LIT needs to be broken into more packets, which may adversely affect performance. In the PLUS demonstration system, we assume the common Internet maximum transmission unit (MTU) of 1500 bytes as the largest packet payload size. We apply three consecutive lossless data compression methods: delta encoding, run-length encoding (RLE) and Huffman encoding. First, we de-correlate the LIT values by subtracting pairs of adjacent LIT numbers. Second, RLE is utilized to take advantage of the large amount of spatial redundancy in a LIT, and we use a Hilbert curve as the data scanning path along which we count repeated numbers.
Finally, we perform Huffman encoding, which is based on the frequency of occurrence of a data item and uses fewer bits to encode the data that occur more frequently.

To complete the request, the process invokes the mobile-side LIT extraction. The compressed LIT, combined with an update flag, is sent to the set of moving objects determined by the previous operation. Finally, the query evaluation procedure is executed to retrieve the query answers, and the result is sent back to the query objects.

The Mobile Task Module includes two major functions: (1) query object movement prediction, and (2) location update check to determine whether a location update is necessary. The first function predicts the possible movements of queries by updating the LIT cells to prediction zones, which might be covered by nearby moving queries as time proceeds. Next, each mobile unit decides on a location update by referring to its current location detected by the GPS location tracker and the revised mobile-side LIT. If an object steps into a query boundary zone or a prediction zone indexed in its LIT, it issues a location update.

3.3.3.3 System Demonstration

Figure 3.21: PLUS main interface.

Figure 3.21 shows the PLUS server-side interface. We also implemented a well-known safe region update scheme [PXK+02] using a sphere shape (SR*-SP for short) and a periodic scheme (PERIODIC). The top panel of the interface visualizes the server view of the object movements on the real-world road segments in the Los Angeles area. In the first tabbed panel (the lower part of the frame), the user can compare the location update frequency of PLUS, SR*-SP, and PERIODIC by observing a dynamic line chart that shows the number of location updates for the three approaches over time. The CPU performance corresponding to the three approaches is shown in the second tabbed panel. Mobile units are distinguished based on their types (a query or a moving object) and status (e.g., issuing location updates) by marking them in different colors.
In addition, a query is shown with a rectangle as the query boundary. For example, query no. 37 contains one answer point (no. 30). PLUS is controlled via user-selectable parameters (e.g., maximum speed, number of query and data objects). In particular, a user can specify the number of mobile-side and server-side LITs to explore the impact of the LIT size. Figure 3.22 illustrates the interface of the spatial data compression module (using a Hilbert curve as the data scanning path) for a LIT. The user can upload an existing LIT and execute the data compression methods selected from the provided options. The resulting compression reduction is shown in the status area.

Figure 3.22: The data compression module.

We also provide a mobile-side interface (PLUS-client), shown in Figure 3.23, for a mobile user to interact with PLUS. The PLUS-client is a web-based application where the geographic region generated from the TIGER/Line data set is the same as the road network imported into the server-side PLUS. The mobile client can specify the starting and ending points of a path, and we then adopt the A* search algorithm to find the travel routes for the client. The average moving speed, query range and trajectory information are sent to PLUS. The rectangle in the figure represents the query boundary, and the points of interest shown in the rectangle are the query answer points sent back from PLUS.

Figure 3.23: PLUS-client interface.

3.3.4 Summary

We have designed a partition-based lazy update approach for highly dynamic environments where mobile units (including queries) may freely change their locations. The novel concept of a Location Information Table is introduced to provide a mobile object with information about queries, hence enabling it to estimate query movements and transmit a location update to the server only when it affects the query results. To further reduce network traffic, the server uses smart on-demand location probes.
Finally, the proposed mechanism efficiently determines the set of objects that are affected by a query insertion, improving scalability. The data structure for a LIT is larger than that for a safe region, so we present a spatial data compression method to reduce its size and fit it into an Internet packet. Experimental results demonstrate that our approach scales better than existing techniques, as it reduces the update message traffic by 10% to 28% compared with the extended safe sphere method, and by 10% to 70% compared with the periodic approach. These results are even more noteworthy as we compare our technique with an enhanced version of the safe region approach that we improved to handle moving queries.

Chapter 4
Efficient Query Result Updates for Skyline Computations

4.1 System Overview and Assumptions

The formal definition of skyline points in a totally-ordered (TO) $d$-dimensional space is a distinct object set $P$, where any two objects $p_i = (x_1, \ldots, x_d)$ and $p_j = (y_1, \ldots, y_d)$ in the set satisfy the condition that if $x_k < y_k$ for any $k$, there exists at least one dimension $m \le d$ that satisfies $x_m > y_m$. We say $p_i$ dominates $p_j$ ($p_i \vdash p_j$ for short), iff $x_k < y_k$, $\forall k$ ($1 \le k \le d$).

Figure 4.1: A skyline query example (axes: price and distance; points $p_1$ to $p_7$).

For example, Figure 4.1 shows a skyline in a two-dimensional data set (distance and price attributes). The skyline consists of $p_1$, $p_2$, $p_4$, and $p_7$, none of which is dominated by any other point. $p_3$ is not a skyline point because both its price and distance are worse than those of $p_2$. Therefore, $p_2$ dominates $p_3$.

In many applications, some data dimensions (for example, in the form of hierarchies, intervals, and preferences) are partially-ordered (PO), where some data lack preferences (e.g., non-specified data nodes). In Figure 4.2 (a), domains $d_1$ and $d_2$ are totally-ordered, whereas domain $d_3$, with the options (or values) A, B, C, and D, is partially-ordered.
On each partially-ordered domain, every user (i.e., a query) may declare a user preference profile, which describes the preference order among the options. Figure 4.2 (b) shows the user preference profiles $g_1$ and $g_2$ of the corresponding queries $q_1$ and $q_2$. Such an ordering may, for example, represent the preferences that a frequent traveller has with regard to flying with different airlines. We use a directed acyclic graph (DAG) to represent a user preference profile, and each node in the graph denotes a value in a partially-ordered domain. Profile $g_1$ declares a preference on all the options in the domain, while profile $g_2$ specifies an ordering only on options A and B (i.e., the associated query has no preference on options C and D). Nodes C and D are unspecified nodes. With different user preference profiles, the results of the skyline queries are different. For preference profile $g_1$, point $p_5$ dominates $p_8$, because the dimensional data in both the totally-ordered and the partially-ordered domains for $p_8$ are equal to or worse than those for $p_5$ (recall that query $q_1$ prefers A to C). Note that $p_6$ cannot dominate $p_9$ (although all the TO attributes of $p_6$ are better than the TO attributes of $p_9$), since the profile has an equivalent preference between B and D (i.e., B and D are sibling nodes). For $g_2$, since the user does not specify any preferences for the nodes C and D, the query processor allows dominance only between two data tuples with the same PO values. For example, $p_{10}$ dominates $p_8$, and $p_9$ dominates $p_7$. Therefore, the rest of the non-dominated data tuples with PO values equal to C or D must be preserved as skyline points.

(a) Sample data set.

id     d1    d2    d3
p1     1.8   0.3   A
p2     2.0   0.3   A
p3     1.8   0.3   B
p4     1.2   1.0   B
p5     1.4   1.0   A
p6     1.0   1.0   B
p7     2.0   1.0   D
p8     1.8   1.0   C
p9     1.5   1.0   D
p10    0.8   1.0   C

(b) DAGs and query results.

User Preference Profile    Skyline Results
g1                         p1, p5, p6, p10
g2                         p1, p5, p6, p9, p10

Figure 4.2: Partially-ordered skyline query example.
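The effect of profile $g_2$ on the sample data set can be reproduced with a brute-force check. The following is a minimal sketch, not the query processor described in this chapter. It assumes the common convention that a tuple dominates another when it is no worse in every attribute and strictly better in at least one, that smaller values of $d_1$ and $d_2$ are better, and that the single preference in $g_2$ points from A to B (the direction is an assumption; the names `G2_PREFERS`, `po_no_worse`, and `skyline` are hypothetical).

```python
# Sample data set from Figure 4.2 (a): d1 and d2 are totally ordered
# (smaller is assumed to be better); d3 is the partially-ordered option.
DATA = {
    "p1": (1.8, 0.3, "A"), "p2": (2.0, 0.3, "A"), "p3": (1.8, 0.3, "B"),
    "p4": (1.2, 1.0, "B"), "p5": (1.4, 1.0, "A"), "p6": (1.0, 1.0, "B"),
    "p7": (2.0, 1.0, "D"), "p8": (1.8, 1.0, "C"), "p9": (1.5, 1.0, "D"),
    "p10": (0.8, 1.0, "C"),
}

# Hypothetical encoding of profile g2 as a set of (better, worse) pairs.
# Assumption: A is preferred to B; C and D are unspecified nodes.
G2_PREFERS = {("A", "B")}


def po_no_worse(a, b, prefers):
    """True if option a equals b or is preferred to b.

    `prefers` is assumed to be transitively closed already.
    """
    return a == b or (a, b) in prefers


def dominates(p, q, prefers):
    """p dominates q: no worse everywhere, strictly better somewhere."""
    *p_to, p_po = p
    *q_to, q_po = q
    if not po_no_worse(p_po, q_po, prefers):
        return False  # options are incomparable, or q's option is better
    if any(x > y for x, y in zip(p_to, q_to)):
        return False  # p is worse in some totally-ordered attribute
    return any(x < y for x, y in zip(p_to, q_to)) or p_po != q_po


def skyline(data, prefers):
    """Brute-force skyline: ids of tuples not dominated by any other tuple."""
    ids = [
        pid for pid, p in data.items()
        if not any(dominates(q, p, prefers)
                   for qid, q in data.items() if qid != pid)
    ]
    return sorted(ids, key=lambda pid: int(pid[1:]))


print(skyline(DATA, G2_PREFERS))  # ['p1', 'p5', 'p6', 'p9', 'p10']
```

Under these assumptions the output matches the $g_2$ row of Figure 4.2 (b). Tuples whose option is C or D can only be dominated by tuples with the same option, which is why $p_9$ and $p_{10}$ survive.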
The traditional methods to execute queries over totally-ordered domains cannot efficiently handle data sets with partially-ordered domains. Current solutions [CET05, SPP09] convert each partially-ordered domain data column into integer intervals that enable the traditional index-based skyline algorithms (e.g., BBS) to handle such queries. The TSS method [SPP09] further enhances the pruning ability and progressiveness of this idea by applying topological sorts on the user preference profiles.

Skyline query computations with partially-ordered domains are computationally very complex in higher dimensions. The cost of the query evaluation process increases as either the number of options for a partially-ordered domain or the number of partially-ordered domains increases. Therefore, existing systems are often unable to provide up-to-date query results with quick response times. To address this challenge, we propose a novel approach termed Caching Skylines for Efficient Skyline Computations (C-SKY, for short). The main contribution of C-SKY is that it caches previous queries with both their results and user preference profiles, such that the query processor can rapidly retrieve a skyline result set for a new query from a set of existing candidate queries with compatible user preference profiles. One of the innovations of the approach lies in our proposed similarity function, which measures the degree of closeness between two user preference profiles. Since the query processor directly accesses a relatively small candidate result set to retrieve the skyline points for a new query, the response time of the skyline computation can be greatly reduced. To respect the generally limited cache space, we also propose a novel cache management approach that only retains the most popular user profiles and reduces the number of false hits. In this dissertation, two novel efficient skyline update approaches (ESC and C-SKY) are proposed for totally-ordered domains and partially-ordered domains in Section 4.2 and Section 4.3, respectively.
4.2 Skylines with Totally-ordered Domains

The general setup of the problem consists of a set of dynamic query and data objects with $d$ dimensions. Moving objects can freely maneuver in an unrestricted and unpredictable fashion, meaning that their parameters $x_k$ may arbitrarily change their values. The main challenge of a continuous skyline query is to avoid unnecessary dominance checking on irrelevant data points for skyline query result updates. After observing the BBS algorithm [PTFS03], we deduced that when evaluating the skyline query result, a set of second skyline (S2) points can always be obtained with little extra work while retrieving the first skyline (S1) points. We refer to the traditional skyline query result as the first skyline, consisting of $S1 = \{s_1^1, \ldots, s_1^m\}$. The second skyline $S2 = \{s_2^1, \ldots, s_2^k\}$ is defined as follows:

Definition 4.1. A data point $p$ is a second skyline point iff $p \in (P - S1)$ and $\nexists p' \in (P - S1 - p)$, $p' \vdash p$.

Informally, all S2 points are dominated by S1, and the rest of the data points $(P - S1 - S2)$ are dominated by both S1 and S2. When a S1 point $s_1^i$ is removed or at least one value of its dimensions changes, the S2 points are naturally considered as new S1 point candidates to "substitute" $s_1^i$. The features of a S2 set are as follows: (1) it is a pre-computed set that covers all the new S1 candidate points, and (2) S2 is a relatively small data set. Therefore, with the knowledge of S2, the query processor can efficiently update the query result and provide a quicker response time to the query requester. An example is shown in Figure 4.3. If the S1 point $s_1^2$ moves to Region I, the search space for ESC to update the query result only involves the S1 and the S2 sets. In this case, $s_1^2$ remains a S1 point, but it dominates $s_1^1$. ESC needs to remove $s_1^1$ from the S1 set, and $s_1^1$ becomes a new S2 point, since no existing S2 point can dominate it. Due to the movement of $s_1^2$, ESC searches for new S1 points from the S2 set.
Since $s_2^2$ (an exclusive data point) is left un-dominated, $s_2^2$ becomes a new S1 point and is removed from the S2 set.

Figure 4.3: S1 and S2 sets.

The ESC algorithm delegates the necessary S2 maintenance (a procedure independent of S1 updates) to the query processor after S1 updates are completed. For example, new S2 points must be retrieved to substitute $s_2^2$. To avoid scanning all the data points in Region III for new S2 points, we propose an approximate exclusive data region (AEDR) computation in contrast to a traditional exclusive data region (EDR) computation. Based on our observation and analysis, we provide the lemmas for incrementally updating the skyline query results in the following sections. Table 4.1 summarizes the symbols and functions we use throughout the following sections.

Table 4.1: Symbols and functions for the ESC approach.

Symbol       Description
P            Number of data objects
d            Number of dimensions
S1           First skyline point set (traditional skyline query result set)
S2           Second skyline point set
DataRtree    Disk-based R-tree for indexing P
S1Rtree      Main-memory R-tree for indexing S1 points
S2Rtree      Main-memory R-tree for indexing S2 points
EDR(p)       A set of data points in the exclusive data region
AEDR(p)      A set of data points in the approximate exclusive data region
W(p)         A set of skyline points in the dominance area of p
p.DomArea    The dominance area of p

4.2.1 Efficient Updates for Continuous Skyline Computations

4.2.1.1 Second Skyline Computation

The existing work [PTFS05, WAEA07] performs time-consuming exclusive data point computations for the skyline query result updates. In Figure 4.4, the gray areas represent the traditional EDRs that contain exclusive data points. An EDR is usually not pre-computed because of the complexity of the calculation. In contrast, since the S2 points
(new S1 candidates) can be easily computed before any S1 point issues an update, the query processor is able to satisfy a query request with the latest query result and with a quicker response time. To further reduce the search space of visiting S2 points to update the skyline query result, we introduce and define a dominance set for each S1 point $s_1^i$. A dominance set (denoted by $D(s_1^i)$) contains a group of S2 points which are dominated by $s_1^i$ and which can substitute a removed or moving $s_1^i$ when the dominance relationship has changed. For example, in Figure 4.4 the dominance set of $s_1^2$ includes $s_2^2$. If $s_1^2$ is removed, ESC only checks the S2 points in $D(s_1^2)$, instead of the entire S2 set. In this example, $s_2^2$ becomes a new S1 point, so it is removed from S2. We formally define a dominance set and establish Lemma 4.1, which states that a dominance set must contain all the necessary S1 candidate points, as follows:

Definition 4.2. (Dominance Set: $D(s_1^i)$) A dominance set of a skyline point $s_1^i$ (denoted by $D(s_1^i) = \{s_2^r, \ldots, s_2^v\}$) is a S2 subset where $\forall s_2^w \in D(s_1^i)$, $s_1^i \vdash s_2^w$, and $0 \le (s_2^w.mindist - s_1^i.mindist) \le (s_2^w.mindist - s_1^t.mindist)$, $\forall s_1^t \in (S1 - s_1^i)$. Each $D(s_1^i)$ is exclusive from any other dominance set; therefore, $S2 = D(S1)$, where $D(S1) = D(s_1^1) + \ldots + D(s_1^m)$ and $m$ is the size of S1.

Figure 4.4: Dominance set vs. EDR set.

Lemma 4.1. Given a dominance set $D(s_1^i)$, let $A$ be the skyline points extracted from $EDR(s_1^i)$. $D(s_1^i)$ must contain $A$ ($A$ is a subset of $D(s_1^i)$).

Proof: (By contradiction) Let $p \in A$ be a point not included in $D(s_1^i)$. This is a contradiction, since $p$ is only dominated by $s_1^i$. Therefore, it must be in $D(s_1^i)$. It follows that $D(s_1^i)$ must contain all points in $A$.

In Figure 4.4, $D(s_1^2) = \{s_2^1, s_2^2\}$ contains two S2 points and is a superset of $A = \{s_2^2\}$.
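Definitions 4.1 and 4.2 can be illustrated with a small brute-force sketch. It is not the BBS-based procedure of ESC and uses no R-tree. It assumes the common convention that a point dominates another when it is no worse in every dimension and strictly better in at least one, and it takes `mindist` to be the sum of the coordinates (the distance to the origin used by BBS when the query point is the origin). The sample points and all function names are hypothetical.

```python
def dominates(a, b):
    """a dominates b: no worse in every dimension, strictly better in one."""
    return (all(x <= y for x, y in zip(a, b))
            and any(x < y for x, y in zip(a, b)))


def skyline(points):
    """Brute-force skyline: points not dominated by any other point."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


def mindist(p):
    """Assumed BBS-style mindist: L1 distance of p to the origin."""
    return sum(p)


def first_and_second_skyline(points):
    """S1 is the skyline of P; S2 is the skyline of P - S1 (Definition 4.1)."""
    s1 = skyline(points)
    s2 = skyline([p for p in points if p not in s1])
    return s1, s2


def dominance_sets(s1, s2):
    """Assign each S2 point to exactly one dominating S1 point.

    Among the S1 points that dominate it, the owner is the one with the
    minimal (s2.mindist - s1.mindist) value, following Definition 4.2.
    """
    sets = {s: [] for s in s1}
    for p in s2:
        dominators = [s for s in s1 if dominates(s, p)]
        owner = min(dominators, key=lambda s: mindist(p) - mindist(s))
        sets[owner].append(p)
    return sets


POINTS = [(1, 9), (2, 7), (4, 4), (7, 2), (9, 1),
          (3, 8), (5, 7), (8, 3), (8, 8)]
S1, S2 = first_and_second_skyline(POINTS)
D = dominance_sets(S1, S2)
print(S1)  # [(1, 9), (2, 7), (4, 4), (7, 2), (9, 1)]
print(S2)  # [(3, 8), (5, 7), (8, 3)]
print(D[(2, 7)])  # [(3, 8), (5, 7)]

# If the S1 point (2, 7) is removed, only its dominance set is inspected.
removed = (2, 7)
remaining = [s for s in S1 if s != removed]
promoted = [p for p in D[removed]
            if not any(dominates(s, p) for s in remaining)]
print(promoted)  # [(3, 8)]
```

In this example (5, 7) is dominated by both (2, 7) and (4, 4), so it is a non-exclusive S2 point. When (2, 7) disappears, (3, 8) is promoted to S1, while (5, 7) stays in S2 because (4, 4) still dominates it.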
One can observe that some non-exclusive S2 points (e.g., $s_2^1$ and $s_2^4$) can be assigned to different dominance sets. Intuitively, the S1 point with the minimal mindist to the query point (which has the largest dominance area) may dominate the most S2 points. Thus, it might produce a load imbalance problem, because the query processor needs to perform many dominance checks when a skyline point with a short mindist moves. To ensure that each dominance set contains evenly distributed S2 points, the ESC algorithm inserts a non-exclusive S2 point $s_2^w$ into $D(s_1^j)$, where $s_1^j$ has the minimal value of $(s_2^w.mindist - s_1^j.mindist)$ among all other S1 points.

In our algorithm, we utilize the BBS approach to initially compute the skyline query results. Along with the query evaluation, the S2 points and the dominance set of each S1 point are computed during the execution of the modified BBS dominance-checking procedure, which runs a window query to determine a set of candidate skyline points. Let $e$ be the next discarded entry during the dominance-checking procedure ($e$ is dominated by some S1 point). The algorithm then proceeds to insert $e$ into a dominance set and examines whether $e$ is a S2 point. Let $H = \{s_1^i \ldots s_1^k\}$ be a heap that represents the set of the existing skyline points whose entries intersect with $e$. Since BBS always visits entries in ascending order of their mindist, we have $\forall s \in H$, $s.mindist < e.mindist$. With $H$ sorted by mindist in descending order, $\exists s_1^j \in H$, $s_1^j \vdash e$, and the value of $(e.mindist - s_1^j.mindist) > 0$ is minimal among all other S1 points. Next, Lemma 4.2 is provided to prove the correctness of the S2 extraction.

Lemma 4.2. Given a point $p$ which is dominated by $S1' = \{s_1^i \ldots s_1^j\}$, where $S1' \subset S1$. If $\forall s_2^t \in D(S1')$, $s_2^t \nvdash p$, then $p$ must be a S2 point.

Proof: Since $p$ is not dominated by $(S1 - S1')$, $p$ can never be dominated by any S2 point in $D(S1 - S1')$ either, by transitivity.
Therefore, if $p$ is not dominated by any S2 point in $D(S1')$, $p$ is guaranteed to be a final S2 point.

The pseudo code is shown in Algorithm 6, where the additional conditions (Lines 10-16 and 19-27) are inserted into the dominance-checking code for retrieving S2 points and determining the dominance sets. Line 4 sorts the heap in descending order of mindist, such that the skyline points with larger mindist are examined first. Line 12 obtains the dominating skyline point $e_r$ for $p$, which is inserted into $D(e_r)$ later. Based on Lemma 4.2, Lines 13-15 check whether $p$ is a S2 point. Lines 20-23 ensure that each S2 point is a data point. If $e$ is an intermediate node, BBS is performed to retrieve local skyline points from the entry. Lines 23 and 25 insert the final S2 points $O'$ into S2 and update the S2 set by deleting those S2 points that are dominated by $O'$. To find such a set, the algorithm performs S2Rtree.W($O'$), which is a window query that finds the S2 points in the dominance areas of $O'$.

4.2.1.2 Description of the ESC Algorithm

The main procedures of the ESC algorithm include S1Evaluation for the S1 updates and S2Evaluation for the S2 set maintenance. ESC delegates most of the expensive computations that are irrelevant to the S1 query results to S2Evaluation. To improve the performance of S2Evaluation, we introduce the concept of an approximate exclusive data region (AEDR), which helps to reduce the amortized cost of the S2 updates. When $d = 2$, the traditional EDR is a regular rectangle. However, an EDR has an irregular shape in higher dimensions. For example, in Figure 4.5 (a), $s_2^i$ is a skyline point to delete. The EDR becomes irregular after the areas that overlap with the dominance areas of $s_2^k$ and $s_2^v$ are removed. Based on this observation, we can obtain a regular shaped EDR only when we consider the skyline points which have a value $x_i$ larger than that of $s_2^i$ in only one dimension.
Because these points are completely "outside" of the EDR, they can trim the entire areas that represent the upper dimensional value $x_i$.

Algorithm 6 ESC dominance-check(p)
 1: insert all entries of the root R in the heap
 2: isDominated = false, $e_r = \emptyset$
 3: while heap not empty do
 4:   remove top heap entry e   // the heap is sorted in descending order of mindist
 5:   if (e is an intermediate entry) then
 6:     for (each child $e_i$ of e) do
 7:       if ($e_i$ intersects with p) then insert $e_i$ into heap
 8:     end for
 9:   else
10:     if ($e \vdash p$) then
11:       isDominated = true
12:       let $e_r = e$, if $e_r$ is empty   // $e_r$: the first S1 point dominating p
13:       for (each S2 skyline point $s_2^i \in D(e)$) do
14:         if ($s_2^i \vdash p$) then set p as a regular data point and return isDominated
15:       end for
16:     end if
17:   end if
18: end while
19: if (isDominated) then
20:   if (p is an intermediate entry) then
21:     perform DataRtree.BBS(p), which returns a skyline point set O
22:     let $O' \subseteq O$ be the data set that is not dominated by S2
23:     $S2 = S2 + O' -$ S2Rtree.W($O'$) and insert $O'$ into $D(e_r)$
24:   else
25:     $S2 = S2 + p -$ S2Rtree.W(p) and insert p into $D(e_r)$
26:   end if
27: end if
28: return isDominated

Definition 4.3. (AEDR) Let $s_2^i = (x_1, x_2, \ldots, x_d)$ and $s_2^j = (y_1, y_2, \ldots, y_d)$. $AEDR(s_2^i) = s_2^i.DomArea - (s_2^i.DomArea \cap s_2^j.DomArea)$, $\forall s_2^j \in (S2 - s_2^i)$ for which there exists exactly one $k$ with $x_k < y_k$, $1 \le k \le d$.

For example, in Figure 4.5 (b), $s_2^i$ is the skyline point to delete, and the solid rectangular box is an AEDR, which is a regular shape resulting from trimming the overlapping dominance areas of $s_2^i$ and $s_2^j$. ESC utilizes the AEDR to search for the new S2 points by traversing the R-tree. Each MBR $e$ extracted from the heap is checked for intersection with the AEDR. If it intersects, ESC checks whether $e$ is dominated by the existing S2 points.

Figure 4.5: Traditional EDR vs. AEDR. (a) 3-d EDR example (point labels $s_2^k$, $s_2^i$, $s_2^v$; coordinate labels (12, 7, 9), (16, 3, 18), (21, 1, 13)). (b) AEDR example (point labels $s_2^j$, $s_2^i$; coordinate labels (12, 3, 9), (4, 1, 17)).
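One way to read Definition 4.3 is that every qualifying S2 point clips exactly one face of the dominance area of the deleted point, so the AEDR stays an axis-aligned box. The sketch below follows that reading and is only an illustration, not the ESC implementation. The function name `aedr_box`, the finite `universe_max` bound, and the assignment of the Figure 4.5 coordinates to particular points are assumptions.

```python
def aedr_box(s_i, other_s2_points, universe_max):
    """Return (lower, upper) corners of an approximate exclusive data region.

    The dominance area of s_i is the box [s_i, universe_max] in every
    dimension. Only S2 points that are larger than s_i in exactly one
    dimension k are used for trimming; each of them lowers the upper
    bound of the box in dimension k to its own coordinate.
    """
    lower = list(s_i)
    upper = [universe_max] * len(s_i)
    for s_j in other_s2_points:
        larger_dims = [k for k, (x, y) in enumerate(zip(s_i, s_j)) if x < y]
        if len(larger_dims) == 1:
            k = larger_dims[0]
            upper[k] = min(upper[k], s_j[k])
    return lower, upper


# Coordinates borrowed from Figure 4.5 (b); which label belongs to which
# coordinate is assumed here.
s_i = (12, 3, 9)
s_j = (4, 1, 17)
print(aedr_box(s_i, [s_j], universe_max=100))
# ([12, 3, 9], [100, 100, 17])

# A point that is larger in two dimensions would make the region irregular,
# so it is ignored and the box is left untrimmed.
print(aedr_box((12, 7, 9), [(16, 3, 18)], universe_max=100))
# ([12, 7, 9], [100, 100, 100])
```

Because points that are larger in two or more dimensions are skipped, the box can be larger than the true EDR. This is consistent with the text above, where each MBR that intersects the AEDR is still checked against the existing S2 points.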
When a S1 point $p$ is newly inserted into the system or when it moves, ESC needs to re-group a new dominance set for $p$. A simple solution is to check every S2 point which currently belongs to a dominance set of some S1 point and migrate the S2 point to the dominance set of $p$ if necessary. Instead, we provide FindDomSet (the pseudo code is presented in Algorithm 7), which applies the following lemma. The lemma presents a heuristic to avoid checking the entire S2 set.

Lemma 4.3. Given a new S1 point $s_1^k$, it suffices to re-group the points in $D(s_1^i)$ only for those $s_1^i \in (S1 - s_1^k)$ with $s_1^i.mindist \le s_1^k.mindist$.

Proof: Proof by definition. Let $s_1^w$ be a S1 point with $s_1^w.mindist > s_1^k.mindist$. $\forall p \in D(s_1^w)$, the value of $(p.mindist - s_1^w.mindist)$ must be smaller than the value of $(p.mindist - s_1^k.mindist)$. Therefore, $p$ must remain in the same dominance set of $s_1^w$. Hence, it is not necessary to re-group these points in $D(s_1^w)$.

Algorithm 7 FindDomSet($s_1^k$)
1: for (each $p \in D(s_1^i)$, where $s_1^i \in (S1 - s_1^k)$ and $s_1^i.mindist < s_1^k.mindist$) do
2:   if ($s_1^k \vdash p$) then
3:     $D(s_1^i)$.remove(p)
4:     $D(s_1^k)$.insert(p)
5:   end if
6: end for

The ESC algorithm is implemented in an event-driven fashion to handle the skyline query updates. The main procedures include S1Evaluation (Algorithm 8) and S2Evaluation (Algorithm 9). When the query processor receives a request (from a point in S1, S2, or a regular data point), it first performs S1Evaluation to examine whether the request affects the S1 set (the query result) and outputs the updated S1 points if the set has been modified. Then S2Evaluation processes the rest of the non-S1-related computations. In the S1Evaluation procedure, Line 6 performs the S1Rtree.dominace-descending function, where the dominance checks access the S1Rtree in descending order of the mindist of the entries. We use the same principle as in the ESC dominance-check algorithm (discussed in Section 4.2.1.1) to find the dominating S1 point $s_1^k$ (Line 7) for a request point $p$.
If $p$ becomes a new S2 point evaluated by S2Evaluation, $p$ is inserted into $D(s_1^k)$. Lines 9-10 update the S1 set if $p$ is a new S1 point and delete the set $I$, which contains the existing S1 points dominated by $p$. $I$ is obtained by executing a window query S1Rtree.W(p), using the dominance area of $p$ as the range on the S1Rtree. Line 11 inserts the new S1 point $p$ into $\widetilde{S1}$, and S1Evaluation will later pass this set to S2Evaluation, where FindDomSet($\widetilde{S1}$) is performed to find a S2 set for $D(p)$. Since all the points in $I$ become new S2 points (inserted into $\widetilde{S2}$ in Line 12), the S2 set is updated later in S2Evaluation by adding the $\widetilde{S2}$ set.

Algorithm 8 S1Evaluation(p)
 1: let $\widetilde{S1} = \emptyset$ be a new S1 point set
 2: let $\widetilde{S2} = \emptyset$ be a new S2 point set
 3: let $\overline{S2} = \emptyset$ be the existing S2 points to remove
 4: let $p'$ be the last-updated point of p
 5: $S1 = S1 - p'$, if p was a S1 point
 6: isDomByS1 = S1Rtree.dominace-descending(p)
 7: let $s_1^k$ be the S1 point with the minimal $(p.mindist - s_1^k.mindist)$ value among all other S1 points
 8: if (isDomByS1 == false) then
 9:   I = S1Rtree.W(p)
10:   $S1 = S1 + p - I$
11:   $\widetilde{S1}$.insert(p)
12:   $\widetilde{S2}$.insert(I)
13:   $D(p)$.insert(i), $\forall i \in I$
14: end if
15: if (p was a S1 point) then
16:   for (each $o \in D(p')$) do
17:     if (S1Rtree.dominace-descending(o) == false) then
18:       $S1 = S1 + o$
19:       $D(p)$.remove(o)
20:       $\widetilde{S1}$.insert(o)
21:       $\overline{S2}$.insert(o)
22:     end if
23:   end for
24: end if
25: output the updated S1 set and continue the S2Evaluation(p, isDomByS1, $s_1^k$, $\widetilde{S1}$, $\widetilde{S2}$, $\overline{S2}$) procedure

Lines 15-24 check whether the S2 points in $D(p')$ are still dominated by $p$ after $p$ moves or is removed from the system. In Line 18, since $o$ (a new S1 point after $p$ moves) can never dominate any S1 point, $o$ is added to the S1 set directly. This is because $o$ is an exclusive data point, and therefore $o$ must not dominate any existing S1 points. S2Evaluation is a more expensive procedure than S1Evaluation, because it involves AEDR computations to find a set of new S2 points to substitute a moving or removed S2 point.
Lines 6-7 are processed if $p$ is a new S2 point. The insertion of $p$ may dominate some existing S2 points; therefore, Line 6 finds the dominated S2 points (S2Rtree.W(p)) and removes them from the S2 set. Similarly, in Line 10, since each point in $\widetilde{S2}$ was originally a S1 point, the $D(\widetilde{S2})$ set is directly removed from the S2 set without performing a window query to look for the dominated points. The deletion of the S2 point set $\overline{S2}$ is executed in Lines 11-12, and $A$ contains the substitute S2 points after $\overline{S2}$ is removed from the S2 set. Finally, FindDomSet is performed to find a group of S2 points for each point in $\widetilde{S1}$.

Algorithm 9 S2Evaluation(p, isDomByS1, $s_1^k$, $\widetilde{S1}$, $\widetilde{S2}$, $\overline{S2}$)
 1: let $p'$ be the last-updated point of p
 2: $\overline{S2}$.insert($p'$), if p was a S2 point
 3: if (isDomByS1 == true) then
 4:   isDomByS2 = S2Rtree.dominace(p)
 5:   if (isDomByS2 == false) then
 6:     $S2 = S2 + p -$ S2Rtree.W(p)
 7:     $D(s_1^k)$.insert(p) and $D(s_1^{k'})$.remove(p), where $s_1^{k'}$ ($\ne s_1^k$) was the dominating point of p
 8:   end if
 9: end if
10: $S2 = S2 + \widetilde{S2} - D(\widetilde{S2})$
11: A = DataRtree-AEDR($\overline{S2}$), where A is a regular data set and is not dominated by S2 points
12: $S2 = S2 - \overline{S2} + A$   // A substitutes $\overline{S2}$
13: FindDomSet($\widetilde{S1}$)

4.2.2 Experimental Evaluation

4.2.2.1 System Implementation

Figure 4.6 shows the framework of the ESC system. The query processor initially computes the first and second skyline points. Any updates (A) performed on the data set are also submitted to the query processor. First, Task (B) examines whether the update request (e.g., inserting or removing a data point) affects the first skyline set. If the request point becomes a new S1 point, Task (B) inserts the new S1 point into the current S1 set and removes the current skyline points that are dominated by the new S1 point. These discarded S1 points (new S2 points) are processed by Task (D) later to update the S2 set. If an update request stems from a removed or moving S1 point, some exclusive points are left un-dominated.
The query processor searches for new substitute S1 points only from the S2 set. The query results (C) are output as soon as Task (B) is completed. The processing time of the sequence of Tasks (A)(B)(C) is the system response time to a skyline query update. Task (D) maintains the S2 points when any S2 point is inserted or removed. Task (D) involves the expensive computation of determining exclusive data points, in which it searches for new or substitute S2 points from the rest of the data set. To enhance Task (D), we also propose an approximate exclusive data region computation with a lower amortized cost than existing techniques [PTFS05, WAEA07].

Figure 4.6: ESC system framework.

4.2.2.2 Simulation Steps

We evaluated the performance of the ESC algorithm by comparing it with the well-known BBS approach [PTFS05] and the DeltaSky algorithm [WAEA07]. For the EDR computations in BBS, we adopt ABBS (Adaptive Branch-and-Bound Search) [WAEA07] to avoid complex irregular-shaped EDR computations. ABBS traverses the R-tree and determines whether an intermediate MBR $e_i$ intersects with the dominance area of a skyline point to delete. If this is true, it further checks whether any existing skyline point dominates $e_i$. All of these algorithms utilize R-trees as the underlying structure for indexing the data and skyline points. We use the Spatial Index Library [Lib] for the R-tree index. A page size of 4 KBytes is deployed, resulting in node capacities between 94 ($d = 5$) and 204 ($d = 2$). The S1 and S2 sets are indexed by a main-memory R-tree to improve the performance of the dominance checks. Our data sets are generated on a terrain service space of $[0, 1000]^2$ with the random walk mobility model [McD99]. Each object moves with a constant velocity until an expiration time. The velocity is then replaced by a new velocity with a new expiration time.
We generated from 100,000 to 1,000,000 normally distributed data points with dimensions in the range of 2 to 5. The object update ratio is set in a range from 1% to 10%. Experiments are conducted with a Pentium 3.20 GHz CPU and 1 GByte of memory. The query results are evaluated in an event-driven approach. Therefore, the query processor calls different procedures based on each specific event type. The main measurements in the following simulations are the response CPU time (from receiving a data update request to the S1 update completion time, or the evaluation time of S1Evaluation) and the overall CPU time (the evaluation time of S1Evaluation plus S2Evaluation). For ABBS and DeltaSky, the overall CPU time also represents the response time. Our experiments use several metrics to compare these algorithms. Table 4.2 summarizes the default parameter settings in the following simulations.

Table 4.2: Simulation parameters for the ESC approach.

Parameter     Default    Range
P             100,000    100,000, 500,000, 1,000,000
d             5          2, 3, 4, 5
f_update      10%        1%, 5%, 10%

4.2.2.3 Update Ratio

Figure 4.7: Performance vs. update ratio ($P$ = 100k, $d$ = 5). Panels: (a) response CPU time (sec), (b) overall CPU time (sec), (c) I/O cost. Methods compared: ESC, DeltaSky, ABBS.

First, we evaluated the impact of the update ratio. Figures 4.7 (a) and (b) show the response time and overall CPU time as a function of the update ratio, respectively, and Figure 4.7 (c) illustrates the I/O cost for the three methods. We fix the data cardinality at 100,000 and the dimensionality at 5. The ESC approach achieves a better performance than ABBS and DeltaSky for all update rates. The degradation of DeltaSky is caused by the expensive Maximum Coverage computations, which scan over the projection lists, and by the increase in the number of skyline points, which incurs bigger projection lists.
ESC also outperforms both methods in terms of the overall CPU time, since the amortized cost of the AEDR computations and exclusive data evaluation is lower than that of the other two methods.

4.2.2.4 Dimensionality

Next we report on the impact of the dimensionality on the performance of all three methods. Figures 4.8 (a), (b) and (c) show the CPU overheads and I/O cost vs. the dimensionality, ranging from $d = 2$ to 5, respectively. When $d$ increases, the performance of all methods is degraded, because the exclusive data point computations are complex and R-trees fail to filter out irrelevant data entries in higher dimensions. From all the figures we can see that ESC outperforms ABBS and DeltaSky in terms of the CPU time and I/O cost.

Figure 4.8: Performance vs. dimensionality ($P$ = 100k, $f_{update}$ = 10%). Panels: (a) response CPU time (sec), (b) overall CPU time (sec), (c) I/O cost. Methods compared: ESC, DeltaSky, ABBS.

4.2.2.5 Cardinality

Figures 4.9 (a) and (b) show the response and overall CPU time as a function of the number of data points, respectively, and Figure 4.9 (c) illustrates the corresponding I/O cost. Overall, the CPU overheads increase as a function of the number of data points. ESC achieves a significant reduction in terms of the response CPU time compared to ABBS and DeltaSky. ESC takes advantage of the pre-computed S2 points retrieved by the latest S2Evaluation procedure and quickly locates relevant new S1 candidates for substituting a removed or moving S1 point. As we can see from the experimental results, the adoption of AEDR helps ESC to achieve better overall CPU performance and an I/O cost that is competitive with DeltaSky.
Figure 4.9: Performance vs. cardinality ($d$ = 5, $f_{update}$ = 10%). Panels: (a) response CPU time (sec), (b) overall CPU time (sec), (c) I/O cost. Methods compared: ESC, DeltaSky, ABBS.

4.2.3 Summary

We propose an incremental skyline update approach. Our ESC algorithm achieves a faster response time and better overall CPU performance. With the adoption of the pre-computed S2 sets, ESC can efficiently update the skyline query results and delegate the most complex computations to a separate procedure that executes after the updates of the query results are completed. An approximate exclusive data region (AEDR) is proposed, and our experiments confirm the feasibility of AEDR, which has a low amortized cost of the exclusive data evaluation in high dimensional and dynamic data environments. The S1Evaluation procedure first examines all the incoming data requests and updates the S1 result if necessary, and the S2Evaluation procedure integrates our lemmas and heuristics to achieve a low CPU overhead and reduced I/O cost.

4.3 Skylines with Partially-ordered Domains

4.3.1 Caching Support for Skyline Query Processing with Partially-Ordered Domains

Consider a partially-ordered domain where a user $q$ (note that we will use the terms user and query interchangeably in this study) declares specific preferences for some data dimensions. Skyline query results vary with different user preferences (as is illustrated in the examples of Figure 4.2), and the computation is very costly. Our conjecture is that query results that were previously obtained with a user preference profile similar to the profile of the query currently under consideration may contribute useful candidate result points. We will first introduce some terminology to help formally describe the problem.
A user preference profile (denoted by $g = (V, E)$) can be represented by a directed acyclic graph (DAG), which consists of a set $V$ of option nodes and a set $E$ of edges. The node set $V$ includes a unique artificial root node $r$ with no predecessor, together with the actual entry node(s) as its successor(s). A primitive relation (or preference) in $g$ from node $v_i$ to node $v_j$ is denoted by $v_i \rightarrow v_j$, and the edge $e \in E$ (drawn with a solid arrow) directly connects $v_i$ to $v_j$. A transitive relation is denoted by $v_i \dashrightarrow v_j$, where the edge (drawn with a dashed arrow) does not exist in $g$, and hence there is at least one additional node between nodes $v_i$ and $v_j$. When $v_i$ and $v_j$ have equivalent preferences, such a relation is denoted by $v_i \leftrightarrow v_j$, which indicates that the user does not prefer one over the other.

Figure 4.10: Sample user preference DAG and its transitive closure. (a) DAG $g$. (b) Transitive closure $g^+$ of $g$.

To enable a quantitative comparison between two DAGs, we define a numeric similarity measure as the aggregate contribution of the preference relations that compare pairs of nodes from both DAGs. We adopt an adjacency list to represent a DAG $g$ and then compute $g$'s transitive closure $g^+ = (V, E^+)$, consisting of all the primitive and transitive relations in $g$, such that for all $v_i$ and $v_j$ in $V$, $E^+$ contains an edge from $v_i$ to $v_j$ whenever there exists a non-null path (either $v_i \rightarrow v_j$ or $v_i \dashrightarrow v_j$) from $v_i$ to $v_j$ in $g$. A transitive closure list contains at most $n$ sub-lists ($n$ equals the maximum number of options allowed for a user preference profile), each of which starts with an intermediate node in the DAG. An actual entry node of a DAG is marked with an asterisk ($*$) to distinguish it from other intermediate and leaf nodes. Figures 4.10 (a) and (b) illustrate a DAG $g$ and its corresponding transitive closure. The artificial root node is $r$, whereas $A^*$ and $B^*$, each marked with an asterisk, are the actual entry nodes. Figure 4.11 shows the overall framework of the C-SKY system.
The query processor initially computes the skyline query results for a query request (A) and caches the result set with the associated preference profiles (C). When a new query request $q$ enters the system, Task (B) performs the similarity measure by computing the similarity values between the new query $q$ and the cached queries. Upon its completion, Task (B) forwards a sorted list of candidate queries to Task (D), which in turn selects a set of candidates from the list. Next, Task (F) computes $q$'s skyline results based on the result sets of these candidate queries. If the new query $q$ is not answerable from the cache, the Data Restoration component accesses the whole data set (E) to perform less expensive constraint queries to restore all of the possibly missing answer points. Finally, the query processor evaluates the result based on the preference profiles of the new query to refine the final answer. Since cache space is limited, Task (G) purges the cache by preserving the most popular preferences; i.e., the strategy is to eliminate queries with the least recently used preference profiles from the cache.

[Figure 4.11: C-SKY system framework.]

Before we describe each task in detail, we explain the main intuition behind our work. The key concept of the C-SKY algorithm involves preference filtering. Let $\bar{V}_k$ be the unspecified preference nodes for a partially-ordered domain in dimension $k$ ($PO_k$), where $\bar{V}_k \subseteq \mathcal{V}_k$ and $g_k$ is the profile used by a query $q$. $\mathcal{V}_k$ is the node set of all the possible node values allowed in the system for the $PO_k$ dimension. $\mathcal{V}_k$ consists of the unspecified node set $\bar{V}_k$ and the specified node set $V_k$ indicated in a user profile DAG.
We define a data tuple set $T(P, \bar{V})$, where each tuple contains at least one unspecified preference node in one of the PO dimensions. $T(P, \bar{V})$ is a potential skyline point set for $q$. For the data tuples with unspecified preference values, a data point $p_i = (TO_1, \ldots, TO_d, PO_1, \ldots, PO_n)$ can dominate $p_j = (TO_1, \ldots, TO_d, PO_1, \ldots, PO_n)$ if $p_i.TO_k < p_j.TO_k$ and $p_i.PO_{k'} = p_j.PO_{k'}$, where $k = 1 \ldots d$ and $k' = 1 \ldots n$. The query processor directly retrieves the actual skyline points by using the traditional skyline computation without considering the user preference profiles (as the preference example $g_2$ shows in Figures 4.2 (a) and (b)). The following two equations describe the key intuition in this approach.

$$P = T(P, \bar{V}) + T(P, V) \tag{4.1}$$

$$P' = T(P, \bar{V}) + T(D, V) \tag{4.2}$$

To retrieve a complete result set for $q$, the query processor focuses on the data tuples $T(P, V) = P - T(P, \bar{V})$, where $P$ is the entire data set in the system (Eqn. 4.1). In this study, the C-SKY algorithm handles a small data set $P' \subseteq P$, where $D$ is a candidate result set obtained from the results of the cached queries (Eqn. 4.2). Therefore, the preference filtering operation enables C-SKY to (a) efficiently retrieve the skyline points with partially-ordered domains by reducing the search space to the specified data tuples only, and (b) ease the complexity of the similarity measure, which targets the specified nodes only for the similarity comparisons. The details of each task in Figure 4.11 are described in the following sections. Table 4.3 summarizes the symbols and functions we use throughout the following sections.

Table 4.3: Symbols and functions for the C-SKY approach.

| Symbol | Description |
|---|---|
| $q$ | A user query with a user profile set $g = \{g_1, g_2, \ldots, g_n\}$, where $n$ is the number of the partially-ordered domains. |
| $P$ | The entire data set in the system. |
| $D$ | The candidate result set obtained from the results of the cached queries. |
| $\mathcal{V}_k = V_k + \bar{V}_k$ | The maximum option nodes (or values) of the partially-ordered domain $PO_k$. |
| $V_k$ ($\bar{V}_k$) | The specified (unspecified) option nodes of the partially-ordered domain $PO_k$. |
| $g_k = (V_k, E_k)$ | A user preference DAG comprising a set $V_k$ of the specified option nodes with a set $E_k$ of edges between nodes for the partially-ordered dimension $k$. |
| $g = \{g_1, g_2, \ldots, g_n\}$ | The user preference profile set for all partially-ordered domains. |
| $G_k$ | The cached preference DAGs maintained by the system for the partially-ordered dimension $k$. |
| $\hat{G}_k$ | The cached preference DAGs sorted in descending order of the similarity values for the partially-ordered dimension $k$. |
| $T(P, \bar{V})$ | The data set with at least one of the unspecified preference nodes in $\bar{V}$. |
| $A \to B$ | A primitive preference where node $A$ is directly connected to $B$. |
| $A \dashrightarrow B$ | A transitive preference where node $A$ is indirectly connected to $B$. |
| $A \leftrightarrow B$ | A user has an equivalent preference for $A$ and $B$. |
| $p \vdash p'$ | $p$ dominates $p'$. |
| $S(g_k, g'_k)$ | Similarity function that returns a numeric value measuring the similarity between $g_k$ and $g'_k$ in the partially-ordered domain $PO_k$. |

4.3.1.1 User Preference Profile Similarity Measure

To enable a quantitative comparison among preferences, we define a similarity function that returns the aggregate contribution of all preference pairs between two compared DAGs. The similarity function $S(g, g')$ measures the correlation between $g$ (associated with a new query $q$) and $g'$ (associated with a cached query) and returns a similarity value ranging from a negative value up to 1; e.g., when $g$ and $g'$ are identical, the similarity value equals 1. $S(g, G)$ returns the sorted query list $\hat{G}$ in descending order of the numeric similarity values for $g$ with respect to all DAGs in $G$. In the best case, the system will find a perfectly similar preference DAG $g^* \in G$. In that scenario, the corresponding query associated with $g^*$ contains the complete skyline results for $q$. Therefore, the system does not have to search further to retrieve the final query result for $q$.
A perfectly similar preference DAG has the following properties.

Property 4.1. (Violation-Free): A perfectly similar preference DAG $g^*$ has no violations that contradict the preferences in $g$.

A violation exists when two preferences conflict. We will give a more formal definition below. Consider the examples in Figure 4.12, where $g$ is the preference DAG of a new query and $g_1$ is the DAG of a cached query. In DAG $g$, $g.A \to g.B$, while $g_1.A$ and $g_1.B$ hold an equivalent preference (i.e., $g_1.A$ is a sibling node of $g_1.B$). Therefore, there is no preference violation and, importantly, we can observe that the structure of a perfectly similar preference DAG does not have to be identical to $g$. A violation refers to a preference conflict that occurs when two corresponding relation pairs contradict each other. The formal definition of a violation can be stated as follows:

Definition 4.4. (Violation): Given $g$ and $g'$, a pair of preference nodes are in violation with each other if either $g.v_i \to g.v_j$ or $g.v_i \dashrightarrow g.v_j$ holds, and at the same time either $g'.v_j \to g'.v_i$ or $g'.v_j \dashrightarrow g'.v_i$ exists, $\forall i, j < n$. Additionally, if $g.v_i \leftrightarrow g.v_j$ is true, a violation occurs when $g'$ imposes a strict preference between $v_i$ and $v_j$ (for instance, $g'.v_i \to g'.v_j$).

[Figure 4.12: Example DAGs and query indexing structure. (a) $g$. (b) $g_1$: $S(g, g_1) = -1.5$. (c) $g_2$: $S(g, g_2) = -2.33$. (d) $g_3$: $S(g, g_3) = -3.0$.]

From this definition we can clearly observe that if $g'.v_i \to g'.v_j$ (or $g'.v_i \dashrightarrow g'.v_j$), a match occurs with $g.v_i \to g.v_j$. A match holds when both $g$ and $g'$ have an identical relation from $v_i$ to $v_j$ (either a primitive or a transitive relation), or when one has a primitive and the other has a transitive relation. A special case of a match exists when $g$ has a primitive or transitive relation from $v_i$ to $v_j$ and $g'$ has an equivalent preference for $v_i$ and $v_j$. We can restate Definition 4.4 and say that a violation condition does not hold if a match is true. Since $g'.v_j \to g'.v_i$ or $g'.v_j \dashrightarrow g'.v_i$ contradicts $g.v_i \to g.v_j$ or $g.v_i \dashrightarrow g.v_j$, there is a violation. This is illustrated in the examples (a) and (b) in Figure 4.12, where $g_1.B \to g_1.C$ violates $g.B \leftrightarrow g.C$.

Property 4.2. (Inclusion): The query result of a perfectly similar preference DAG $g^*$ with respect to $g$ is a minimal super query result set of query $q$ using $g$.

Let $R^*$ be the query result of $q^*$ using $g^*$. Since a perfectly similar preference profile is free of violations, $R^*$ must be an inclusive super result set of $q$ using $g$. For example, in Figure 4.2, the skyline result of the query $q_2$ using $g_2$ is a super result set of the query $q_1$ using $g_1$, since $g_2$ is a user profile that is compatible with $g_1$. We can retrieve a complete result set for $q_1$ from the result set of $q_2$. We earlier described the definition of a violation, and we now provide proof for its properties. Given $g$ and $g'$, a violation occurs when a more constrained preference is used in $g'$ compared to the corresponding preference used in $g$. The query results of $q'$ using $g'$ may then exclude some data points which would likely be result points of $q$ using $g$. Therefore, $R^*$ must contain a complete answer set for the corresponding query $q$, because $g^*$ is free of violations. Furthermore, the preferences of $g$ and its perfectly similar preference DAG $g^*$ must possess the closest degree of similarity, such that $R^*$ does not contain many irrelevant results, which would affect the performance of the query evaluation. An optimal perfectly similar preference DAG $g^*$ of $g$ must share identical preferences. However, in reality, such an optimal perfectly similar preference DAG rarely occurs; hence the system needs to choose an existing available DAG with a high degree of similarity to be as close as possible to an optimal selection. To enable quantitative measurements for the degree of similarity, we define a similarity function in the following section.
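The match and violation rules of Definition 4.4 can be illustrated with a small sketch. It is not the thesis implementation: the function names `transitive_closure` and `classify` are hypothetical, the artificial root node is left out, both nodes of a pair are assumed to be specified in both DAGs, and the edge sets below are assumed values chosen to agree with the textual description of $g$ and $g_1$ rather than read off Figure 4.12.

```python
def transitive_closure(edges):
    """Return every pair (u, v) such that v is reachable from u via one or more edges."""
    closure = set(edges)
    changed = True
    while changed:
        changed = False
        for (a, b) in list(closure):
            for (c, d) in list(closure):
                if b == c and (a, d) not in closure:
                    closure.add((a, d))
                    changed = True
    return closure


def classify(g_plus, gp_plus, vi, vj):
    """Classify the node pair (vi, vj) as 'match' or 'violation'.

    g_plus and gp_plus are the transitive closures of the new profile g and
    of the cached profile g'. A pair with no relation in either direction is
    treated as an equivalent preference (an assumption of this sketch).
    """
    if (vi, vj) in g_plus:
        # g prefers vi over vj: only the reversed relation in g' conflicts.
        return "violation" if (vj, vi) in gp_plus else "match"
    if (vj, vi) in g_plus:
        return "violation" if (vi, vj) in gp_plus else "match"
    # g treats vi and vj as equivalent: any strict relation in g' conflicts.
    if (vi, vj) in gp_plus or (vj, vi) in gp_plus:
        return "violation"
    return "match"


# Assumed edge sets: g has A -> B and A -> C (B and C equivalent);
# g1 has A -> C and B -> C (A and B equivalent).
g_plus = transitive_closure({("A", "B"), ("A", "C")})
g1_plus = transitive_closure({("A", "C"), ("B", "C")})

print(classify(g_plus, g1_plus, "A", "B"))  # match (g1 holds A and B equivalent)
print(classify(g_plus, g1_plus, "B", "C"))  # violation (g1.B -> g1.C against g.B <-> g.C)
```

Working on the closures rather than on the raw edge lists is what lets a primitive relation in one DAG match a transitive relation in the other.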
4.3.1.2 Similarity Functions

For similarity comparisons, two states, match and violation, are expressed by $Q_{g,g'}(v_i, v_j)$, which compares the two corresponding pairs of relations between $v_i$ and $v_j$ from both $g$ and $g'$, where $g$ is the user preference profile of a new query $q$ and $g'$ is the user preference profile of a cached query $q'$. Given $g$ and $g'$ (in transitive closure form), the function $Q_{g,g'}(v_i, v_j)$ returns the matching contribution, which is either a match or a violation, each of which provides a different contribution to the similarity. The similarity function $S(g, g')$ returns a real number that aggregates the matching contributions and is computed as shown in Equation 4.3.

$$S(g, g') = \frac{\sum Q_{g,g'}(v_i, v_j)}{|E^+_g|}, \qquad Q_{g,g'}(v_i, v_j) = \begin{cases} 1 & \text{(i) a match, or} \\ -|E^+_g| & \text{(ii) a violation} \end{cases} \tag{4.3}$$

Here $|E^+_g|$ denotes the total number of edges of the transitive closure $g^+$. For all valid relations $(v_i, v_j)$ in $g'$, $v_i$ is a specified, intermediate node in $g'$, and $v_j$ is a specified node in $g$. Since the unspecified data tuples are processed separately in another procedure from the regular skyline computations for the specified data set, the similarity measurement ignores the relation $(v_i, v_j)$ if $v_j$ is an unspecified node in $g$. However, if $v_i$ is an unspecified node in $g$, the measurement still counts such a relation as if it were a violation with $g$. $Q_{g,g'}(v_i, v_j)$ returns a similarity value of 1 for a match (case (i)). A violation incurs $-|E^+_g|$ as a penalty (case (ii)), which reduces the accumulation. The maximum similarity value is 1, in the case when there are matches for all the comparisons. For example, in Figure 4.12 (c), $Q_{g,g_2}(A, C)$ returns 1 as a match even though $g_2.A \dashrightarrow g_2.C$ while $g.A \to g.C$. For a violation in case (ii), we deduct a maximum similarity value of $|E^+_g|$ to cause the current summation to become negative. Consequently, user preference profiles that are free of violations are always ranked higher than user preference profiles with violations. For example, $g_1.D \to g_1.C$ violates $g$ because $C$ is a specified node in $g$ and there is no relation among $(g.D, g.C)$ defined in $g$. Therefore, the similarity value is negative if at least one violation occurs. The relation of $(g_1.B, g_1.D)$ is ignored during the measurement process since $D$ is not a specified node in $g$. Since there is one match ($Q_{g,g_1}(A, C)$) and two violations ($Q_{g,g_1}(B, C)$ and $Q_{g,g_1}(D, C)$), the similarity value equals $-1.5$. The overall similarity algorithm is outlined in Algorithm 10.

Algorithm 10 $S(g, g')$
  let $V$ and $V'$ be the specified node sets of $g_k$ (used by a new query) and $g'_k$ (used by a cached query), respectively
  let $n$ be the number of the partially-ordered domains
  FinalScores = 0
  for $k = 1$ to $n$ do
    scores = 0
    for (every $v_i \in V'$) do
      for (every node $v_j \in V$, $v_i \neq v_j$) do
        if ($g'_k$.hasEdge($v_i$, $v_j$)) then /* a valid relation in $g'_k$ */
          if ($g_k$.hasEdge($v_i$, $v_j$)) then /* a match */
            scores += 1
          else /* a violation */
            scores += $-|E^+_{g_k}|$
          end if
        end if
      end for
    end for
    FinalScores += scores / $|E^+_{g_k}|$
  end for
  return FinalScores /* return final scores */

4.3.1.3 Cached Query Selection

A perfectly similar preference profile can rarely be found among the cached queries, especially as the maximum number of options allowed per user preference profile increases and the users are more likely to specify very different preference profiles. For example, if the query processor accesses the top cached query and it produces a negative score (i.e., indicating a preference violation), this would imply that the system cannot retrieve a complete result set for the new query from any single existing cached query (since all of them will have equal or worse negative scores). To address this challenge, we introduce a novel approach in this study to select a minimum set of queries $G'$, from which the query processor can find a complete set of skyline results by combining the results of each query in $G'$.

Lemma 4.4.
Given are a new query $q$ and two cached queries $q_i$ and $q_j$, both with the user preference profiles $g_k$, where $k = 1 \ldots n$ for all partially-ordered domains. If all preference relations in $g_k$ of $q$, $\forall k = 1 \ldots n$, are compatible with the relations defined in the corresponding PO dimension of either $q_i$ or $q_j$, the results of $q$ are completely retrievable from the candidate result set $D = \{q_i.result \cup q_j.result\}$. The query list (in this case $q_i$ and $q_j$) that produces the candidate result set $D$ is referred to as a complementary query list.

Proof: By definition. Since a complete result set for $q$ is computed, guided by the preference relations in $g$, and since the cached queries $q_i$ and $q_j$ consist of all the relations declared in $g$, the result set union of $q_i$ and $q_j$ must contain the complete set of skyline result points.

To find the complementary query list for a new query $q$, we start from the cached query with the highest similarity value. However, in the worst case, such an operation is still expensive when none of the high-ranked cached queries have compatible relations with $q$. Therefore, a heuristic threshold parameter ($\delta$) is introduced to stop the operation of continuously merging query results. Furthermore, since the top-ranked cached query $q_{base}$ has the fewest violations with the new query, we adopt it as a baseline query when searching for violated relations $E$ with respect to the new query. We then select a cached query $q'$ for the complementary query list if $q'$ has compatible relations with a corresponding violated edge in $E$ in at least one partially-ordered domain. For a violated relation in $E$, we relax the definition of a relation to $(v_i \dashrightarrow v_j)$, which is a broader notation that covers the relation $(v_i \to v_j)$. We establish Lemma 4.5 as follows:

Lemma 4.5. Given is $e = (v_i \dashrightarrow v_j)$, a violated relation in $E$ for the partially-ordered domain $PO_k$ with the baseline query, that is, the top-ranked cached query. Let $q'$ be a selected query for the complementary query list.
Furthermore, $g'_k$ has no violated relations $e' = (v_s \dashrightarrow v_j)$ ($v_s$ is any specified node, including $v_i$, in $g'_k$). In other words, $g'_k$ has compatible relations with regard to $e$ in dimension $PO_k$. Then the data tuples $T(q'.result, v_j)$ retrieved from the result set of $q'$ must contain the candidate skyline result points for the new query.

Proof: By definition. Let $P_1$ and $P_2$ be the data tuples for which the dimensional data value in $PO_k$ is equal to $v_i$ and $v_j$, respectively. Let all the $TO_w$ attribute values of $P_1$, $\forall w = 1 \ldots d$, be better than the $TO_w$ attribute values of $P_2$, and let the remaining $PO_c$ attribute values of $P_1$, $\forall c = 1 \ldots n, c \neq k$, be better than the corresponding $PO_c$ attribute values of $P_2$. Assume that both the user profile $g_k$ of the new query $q$ and $g'_k$ of the cached query $q'$ in $PO_k$ contain a relation edge $(v_t \dashrightarrow v_j)$, where $v_t \neq v_s$. Since $g'_k$ has compatible relations in $PO_k$ (no violated $(v_s \dashrightarrow v_j)$ relations), $P_1$ cannot dominate $P_2$. Furthermore, the $P_2$ data tuples eliminated due to the $(v_t \dashrightarrow v_j)$ relation for $g'_k$ must also be eliminated for $g$. Therefore, all the $P_2$ tuples that are considered as skyline points for the new query must be preserved in the results of $g'_k$.

To obtain a candidate result set for the new query $q$, we first insert the result tuples with node values defined in the user preference profiles of $q$. For each selected query for the complementary query list, we only combine the missing tuples that might be eliminated by the violated relations. For example, suppose $q'$ is violation free with respect to the violated relation $(v_i \dashrightarrow v_j)$. We then insert only $T(q'.result, v_j)$ into the candidate result set, since these are the tuples of $q'.result$ that may be missing from the result of the new query. For example, in Figure 4.12, $g_1$ is the baseline cached query, since it has the highest similarity value. The algorithm starts to search for the violated edges, if any exist.
Profile $g_1$ has two violated edges, $(g_1.B \to g_1.C)$ and $(g_1.D \to g_1.C)$, with respect to $g$. Since the result set of $g_1$ does not contain a complete result set for $q$, the system searches for the next cached query. In profile $g_2$, the user does not specify any preference over $C$ except for a compatible relation ($g_2.A \to g_2.C$ matches $g.A \to g.C$). The complementary query list of $g$ must contain both $g_1$ and $g_2$, since the violations ($g_1.B \to g_1.C$ and $g_1.D \to g_1.C$) do not hold in $g_2$. Therefore, the skyline query result can be found from the results of the corresponding queries using $g_1$ and $g_2$ without accessing the entire data set.

The algorithm for finding a candidate result set for $g$ is outlined in Algorithm 11. In Line 2, $D$ is a candidate result set. Initially, the result tuples with specified node values in $V$ of $g$ used by the new query are inserted into $D$. In Line 3, vioEdges is a container which stores the violated edges (with respect to $g$) returned by the findViolatedEdges function. Line 3 checks the baseline query ($g_1$, the first query with the highest similarity value in list $G$) for any violated relations. Line 5 checks whether the current candidate result set is larger than a threshold ($\delta$) or whether the vioEdges set is empty. Line 6 performs the removeViolatedEdges function, which deletes the violated relation(s) $E$ in vioEdges if $g_i$ has compatible relations with regard to $E$. The set $s$ contains the result of the corresponding $q_i$ using $g_i$. In Lines 5-12, if vioEdges is not empty after checking the user preference profiles in the cache, $q$ is not answerable by the currently selected cached queries. In this case, the query processor performs constraint queries to restore the missing data points. In Lines 8-9, by using the preference filtering function, only the relevant missing tuples (with the corresponding PO attributes equal to any node value in $N$) of the new query result are inserted into the candidate result set.
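The selection loop described above can be sketched as follows, for a single partially-ordered domain. This is an illustration under stated assumptions, not the thesis code: the helper names are hypothetical, a violated edge is taken to be a cached relation that points at a node specified in $g$ but is absent from $g^+$ (the test used in Algorithm 10), a candidate profile counts as compatible when all of its relations into $v_j$ also appear in $g^+$, $\delta$ is treated as a tuple count, and the profiles and result tuples are invented sample values.

```python
def find_violated_edges(g_plus, g_nodes, cached_plus):
    """Cached relations that target a node specified in g but are absent from g+."""
    return {(vi, vj) for (vi, vj) in cached_plus
            if vj in g_nodes and (vi, vj) not in g_plus}


def remove_violated_edges(vio_edges, g_plus, cached_plus):
    """Violated edges (vi, vj) for which the candidate profile is compatible,
    i.e. every relation into vj in the candidate also exists in g+."""
    removable = set()
    for (vi, vj) in vio_edges:
        incoming = {(vs, t) for (vs, t) in cached_plus if t == vj}
        if all(edge in g_plus for edge in incoming):
            removable.add((vi, vj))
    return removable


def select_tuples(result, nodes):
    """T(result, nodes): the tuples (id, po_value) whose PO value lies in nodes."""
    return {t for t in result if t[1] in nodes}


def find_candidate_result_set(g_plus, g_nodes, cached, delta):
    """cached: list of (closure, result) pairs sorted by descending similarity."""
    base_plus, base_result = cached[0]
    candidates = select_tuples(base_result, g_nodes)
    vio_edges = find_violated_edges(g_plus, g_nodes, base_plus)
    i = 1
    while vio_edges and len(candidates) < delta and i < len(cached):
        cand_plus, cand_result = cached[i]
        removed = remove_violated_edges(vio_edges, g_plus, cand_plus)
        if removed:
            vio_edges -= removed
            targets = {vj for (_, vj) in removed}
            candidates |= select_tuples(cand_result, targets)
        i += 1
    return vio_edges, candidates


# Hypothetical sample profiles (transitive closures) and cached results.
g_plus, g_nodes = {("A", "B"), ("A", "C")}, {"A", "B", "C"}
g1_plus = {("A", "C"), ("B", "C"), ("B", "D"), ("D", "C")}
g2_plus = {("A", "C")}
q1_result = {("p1", "A"), ("p2", "B")}
q2_result = {("p1", "A"), ("p5", "C"), ("p8", "E")}

vio, D = find_candidate_result_set(
    g_plus, g_nodes, [(g1_plus, q1_result), (g2_plus, q2_result)], delta=10)
print(vio)        # set(): both violated edges of g1 are covered by g2
print(sorted(D))  # [('p1', 'A'), ('p2', 'B'), ('p5', 'C')]
```

If `vio` came back non-empty, the query would be unanswerable from the selected cached queries and the remaining edges would have to be handled by constraint queries.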
The details of handling unanswerable queries are discussed in the next section.

Algorithm 11 FindCandidateResultSet($G$, $\delta$)
1: let $Q = \{q_1, q_2, \ldots, q_n\}$ be the sorted cached query list and $G = \{g_1, g_2, \ldots, g_n\}$ be their corresponding user profile list, sorted in descending order of the similarity value with respect to $g$
2: let $D = T(q_1.result, g.V_1)$ be the initial candidate data set
3: vioEdges = findViolatedEdges($g$, $g_1$)
4: $i = 2$ /* index of the user profiles in $G$ */
5: while (vioEdges $\neq \emptyset$ AND $|D| < \delta$ AND $i \le n$) do
6:   ($s$, $E$) = removeViolatedEdges(vioEdges, $g_i$)
7:   if ($s$ is not empty) then
8:     let $N$ be the node set of $v_j$ from each relation pair $(v_i, v_j)$ in $E$
9:     insert $T(s, N)$ into $D$
10:   end if
11:   $i = i + 1$
12: end while
13: return (vioEdges, $D$)

4.3.1.4 Unanswerable Queries

A new query $q$ is termed unanswerable if the selected cached queries do not contain a complete result set for $q$. This may occur when all the relations of the cached user preference profiles violate the relations specified in $q$. However, even in this case some optimization can be achieved. Instead of accessing the entire data set to retrieve the skyline results, C-SKY performs less expensive constraint queries to restore the missing data tuples which were eliminated because of the violated relations of the cached queries. Let SkylineQuery be a function that embodies the non-caching algorithm TSS [SPP09] to evaluate a skyline query. In Section 4.3.1.3, we describe the removeViolatedEdges function, which removes the violated relations when at least one of the cached queries (other than the baseline query) has compatible relations in the corresponding PO domains. When the function terminates and there are still unremovable violated relations (when vioEdges is not an empty set in Algorithm 11, Line 13), the following operations are necessary to restore the missing data tuples which might have been eliminated by such violated relations.
The first step is to create a new DAG $g_k$ with only the violated relations $e$ in the $PO_k$ domain, if $e$ originally violated $g_k$ in the $PO_k$ domain. For the rest of the PO domains, we use the same user profiles defined in the selected query (the one whose relations $e$ are in violation with the new query), such that SkylineQuery can restore only the data tuples which were eliminated through the violated relations of their own partially-ordered domain. The following lemma and proof of correctness underlie the process.

Lemma 4.6. Given is $e = (v_i \dashrightarrow v_j)$, an unremovable violated relation in $E$ for the partially-ordered domain $PO_k$. Create a new $g = \{g_1, g_2, \ldots, g_n\}$ and let $g_k$ be one of the user profiles with only one relation $e$, where $1 \le k \le n$. Let $P_1$ and $P_2$ be the data tuples in which the dimensional data value in $PO_k$ is equal to $v_i$ and $v_j$, respectively. Let $P'_2 \subseteq P_2$ be the data tuples eliminated during the skyline evaluation. We can conclude that $P'_2$ must contain all the missing data due to the violated relation $e$.

Proof: By definition. Since the set $P'_2$ contains the data tuples eliminated based on the new user profile $g$, all the attributes of each data tuple in $P'_2$ must be worse than all the attributes of at least one data tuple in $P_1$. For each non-dominated tuple in $P_2 - P'_2$, there must exist at least one PO or TO attribute that is not worse than the corresponding attribute of all $P_1$ tuples. The data tuples $(P_2 - P'_2)$ are originally preserved in the corresponding query results which are inserted into the candidate result set. Therefore, $P'_2$ must be exactly the data set eliminated due to $e = (v_i \dashrightarrow v_j)$.

We summarize the steps below to perform such constraint queries for each violated relation $(v_i \to v_j)$ of a partially-ordered domain $PO_k$.

Step 1: Let $g_i$ contain relation $(v_i \to v_i)$, $g_j$ contain relation $(v_j \to v_j)$, and $g_{ij}$ contain relation $(v_i \to v_j)$ in the $PO_k$ domain, respectively.
Step 2: Let $S_i$ = SkylineQuery($T_i$, $g_i$), where $T_i$ = Select ALL from dataTable where $PO_k = v_i$.

Step 3: Let $S_j$ = SkylineQuery($T_j$, $g_j$), where $T_j$ = Select ALL from dataTable where $PO_k = v_j$.

Step 4: Let $S_{ij}$ = SkylineQuery($T_{ij}$, $g_{ij}$), where $T_{ij} = S_i \cup S_j$.

Step 5: Return $T_{ij} - S_{ij}$.

Step 2 (and also Step 3) finds a skyline result set, which excludes the data tuples eliminated by the constraint $v_i \to v_i$ (respectively $v_j \to v_j$). Thus, if a data tuple $p_1$ dominates $p_2$ in all totally-ordered domains ($p_1.TO_x < p_2.TO_x$, $\forall x = 1 \ldots d$) and in the partially-ordered domains ($p_1.PO_y$ is preferable to $p_2.PO_y$, $\forall y = 1 \ldots n$, except for $k$, $1 \le k \le n$), we can eliminate $p_2$, since $p_1.PO_k$ equals $p_2.PO_k$. Next, in Step 4, $S_{ij}$ is the skyline point set retrieved from $S_i \cup S_j$. $S_{ij}$ only contains data tuples where $PO_k$ equals $v_i$ or $v_j$ and which are not eliminated in Steps 2 and 3. Finally, Step 5 retrieves the data tuples that are only eliminated due to the $v_i \to v_j$ constraint.

Let us continue the example in Figure 4.12 and consider the data set sample in Figure 4.2 (a). Assume that $g_2$ and $g_3$ do not exist in the cache. Since the only cached user preference profile has two violated relations ($B \to C$ and $B \to D$), the query processor cannot retrieve a complete skyline result for $q$ using $g$. For example, for the violated edge $B \to D$, we first perform SkylineQuery($T_B$, $B \to B$), where $T_B = \{p_3, p_4, p_6\}$ and $S_B = \{p_3, p_6\}$. The approach performs the second SkylineQuery($T_D$, $D \to D$) operation, where $T_D = \{p_7, p_9\}$ and $S_D = \{p_9\}$. SkylineQuery($T_{BD}$, $B \to D$) returns $S_{BD} = \{p_3, p_6\}$, where $T_{BD} = S_B \cup S_D = \{p_3, p_6, p_9\}$. Now, the missing data eliminated by the $B \to D$ constraint is $T_{BD} - S_{BD} = \{p_9\}$. Considering the data sets $S_i$ and $S_j$ in Steps 2 and 3, we can observe that if a cached query $q'$ with $v_i$ as the entry node is defined in the $PO_k$ domain, $T(q'.result, v_i)$ must be an identical set to $S_i$.
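Steps 1 to 5 and the worked example above can be sketched with a brute-force skyline in place of TSS. Everything here is an assumption made for illustration: the function names are hypothetical, there is a single partially-ordered attribute, smaller totally-ordered values are taken to be better, dominance is the usual "no worse everywhere, strictly better somewhere" test, and the attribute values are invented so that the set memberships mirror the $B \to D$ example (they are not the values of Figure 4.2).

```python
def dominates(p, q, prefers):
    """p, q are (id, to_values, po_value); prefers is a set of (better, worse) PO pairs."""
    _, p_to, p_po = p
    _, q_to, q_po = q
    to_no_worse = all(a <= b for a, b in zip(p_to, q_to))
    to_better = any(a < b for a, b in zip(p_to, q_to))
    po_better = (p_po, q_po) in prefers
    po_no_worse = p_po == q_po or po_better
    return to_no_worse and po_no_worse and (to_better or po_better)


def skyline(tuples, prefers):
    """Brute-force skyline: keep every tuple that no other tuple dominates."""
    return {q for q in tuples if not any(dominates(p, q, prefers) for p in tuples)}


def restore_eliminated(data, vi, vj):
    """Tuples removed solely by the constraint vi -> vj (Steps 2 to 5)."""
    t_i = {t for t in data if t[2] == vi}
    t_j = {t for t in data if t[2] == vj}
    s_i = skyline(t_i, set())            # Step 2: only vi tuples compete
    s_j = skyline(t_j, set())            # Step 3: only vj tuples compete
    t_ij = s_i | s_j
    s_ij = skyline(t_ij, {(vi, vj)})     # Step 4: apply vi -> vj
    return t_ij - s_ij                   # Step 5


# Hypothetical sample data: (id, (TO_1, TO_2), PO value).
data = {
    ("p1", (0, 0), "A"),
    ("p3", (1, 5), "B"), ("p4", (3, 6), "B"), ("p6", (4, 2), "B"),
    ("p7", (6, 7), "D"), ("p9", (2, 6), "D"),
}

restored = restore_eliminated(data, "B", "D")
print(sorted(t[0] for t in restored))  # ['p9']
```

Because Step 4 only sees $S_i \cup S_j$, the final constraint query runs over a handful of tuples rather than the whole table, which is where the saving over a full skyline evaluation comes from.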
Because $v_i$ as an entry node cannot be dominated by any other node, the result tuples of $q'$ must contain all the data tuples with the attribute value equal to $v_i$ in $PO_k$. Hence, the performance of the data restoration process through constraint skyline query computations is improved as compared to using the whole data set.

4.3.2 Cache Management and Replacement

Naturally, the number of cached queries progressively increases after system startup until the cache is full. Since the cache space is limited, efficient cache management and replacement strategies are needed to enable long-term operations. In Section 4.3.1.3, a threshold $\delta$ is used to avoid caching a query with a large result set, which might not help the query processor to reduce the complexity of the skyline computations (e.g., the time reduction of the skyline evaluation might not be significant when compared with the computation time of a non-cached skyline evaluation). Furthermore, using a threshold excludes such unhelpful queries from being cached, which results in more free space for other queries. As the number of cached queries increases, the query processor requires more time to find a relevant set of cached queries with respect to the new query for similarity measurements and query selection. Accessing all the cached queries is time consuming. We provide a query indexing structure to effectively locate a set of cached queries relevant for the new query. The goal of the replacement algorithm is to keep track of usage information in order to improve the "hit rate." For this we adopt the concept of the least frequently used (LFU) cache replacement algorithm. The details regarding the query indexing and replacement algorithms are described in the next Sections 4.3.2.1 and 4.3.2.2.

4.3.2.1 Query Indexing

We propose a query indexing structure for the query processor to effectively locate a set of relevant cached queries. The C-SKY algorithm uses this set to perform similarity measurement and query selection operations to determine a candidate result set for the query processor.
It is advantageous to determine a high-level similarity to the new query before performing actual similarity measurements and query selection on the cached queries, because these two operations are time-consuming. For example, a PO domain with $V = 10$ distinct nodes has at most $2^{10} = 1024$ lattices corresponding to all possible nodes. We use the entry node(s) of a cached query $q'$ as the key(s) and store each group (at most $h$ groups, where $h$ is the maximum size of a PO domain), because if a cached query $q'$ and the new query $q$ have the same entry node sets, the violations are likely to be fewest. Therefore, the relevance of two queries is approximately measured by the entry node(s).

[Figure 4.13: Query indexing.]

For example, in Figures 4.12 (b), (c), and (d), the keys of $g_1$ used by $q_1$ are $A$ and $B$; the keys of $g_2$ used by $q_2$ are $A$, $D$, and $E$; and the key of $g_3$ used by $q_3$ is $C$. When a new query $q$ using $g$ is requested, the system finds $q_1$ and $q_2$ as the relevant queries to $q$. Hence, $q_3$ is ignored since it is less relevant to $q$. Figure 4.13 shows the indexing structure. An additional table is used and each key points to the corresponding cached queries with the key as the entry node.

4.3.2.2 Cache Replacement

In C-SKY, we preserve the queries with the most popular user preference profiles since they are more likely to be used later. We assign each cached query a counter, which describes how often the cached query is utilized. If a cached query contributes its total (or partial) results during an execution run, we increment the counter by one. Once the cache is full, the system must discard some cached queries to free up space for newly arriving queries. We adopt a similar concept to the least recently used (LRU) algorithm. The least used query (or queries) are replaced by the new query. Figure 4.14 illustrates the cache structure. Each cached query, which is directly mapped by the corresponding indexes in the main memory, is assigned a counter.
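The entry-node index and the usage counters described above can be combined in one small structure. The sketch below is only an illustration: the class `QueryCache` and its methods are hypothetical names, a cached query is treated as relevant when it shares at least one entry node with the new query, capacity is counted in queries rather than bytes, and ties between equally used queries are broken by insertion order. The entry node $A$ for the new query is also an assumption.

```python
class QueryCache:
    """Cached queries indexed by entry node, evicted by lowest usage counter."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = {}   # query id -> {"keys": set, "result": object, "count": int}
        self.index = {}     # entry node -> set of query ids

    def relevant(self, entry_nodes):
        """Ids of cached queries sharing at least one entry node with the new query."""
        ids = set()
        for node in entry_nodes:
            ids |= self.index.get(node, set())
        return ids

    def record_use(self, qid):
        """Increment the counter when a cached query contributes results."""
        self.entries[qid]["count"] += 1

    def insert(self, qid, entry_nodes, result):
        """Cache a query, evicting the least used one if the cache is full."""
        evicted = None
        if len(self.entries) >= self.capacity:
            evicted = min(self.entries, key=lambda k: self.entries[k]["count"])
            self._remove(evicted)
        self.entries[qid] = {"keys": set(entry_nodes), "result": result, "count": 0}
        for node in entry_nodes:
            self.index.setdefault(node, set()).add(qid)
        return evicted

    def _remove(self, qid):
        """Drop a query from both the index and the cached entries."""
        for node in self.entries[qid]["keys"]:
            self.index[node].discard(qid)
            if not self.index[node]:
                del self.index[node]
        del self.entries[qid]


cache = QueryCache(capacity=3)
cache.insert("q1", {"A", "B"}, result=[])
cache.insert("q2", {"A", "D", "E"}, result=[])
cache.insert("q3", {"C"}, result=[])

hits = cache.relevant({"A"})
print(sorted(hits))                       # ['q1', 'q2']
for qid in hits:
    cache.record_use(qid)
print(cache.insert("q4", {"B"}, result=[]))  # q3 (never used, so it is evicted)
```

Removing the evicted query from the index as well as from the entries keeps later lookups from returning ids that no longer have a cached result.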
The index of a replaced query is removed from both the main memory and the cache memory.

[Figure 4.14: Cache structure.]

4.3.3 Description of the C-SKY Algorithm

Finally, the overall algorithm of C-SKY is illustrated in Algorithm 12. Lines 4-5 are executed when the first skyline query is performed or when no relevant cached queries are selected. In Lines 8-9, the highest-ranked cached query has no violations with the new query and contains a complete result set for the new query. Hence, the system directly performs the skyline query based on this result set. Otherwise, the algorithm performs the alternate list of operations (e.g., FindCandidateResultSet) to find a candidate result set for the new query.

Algorithm 12 CSKY($q$)
1: let $g = \{g_1, g_2, \ldots, g_n\}$ be the user preference profiles used by a new query $q$
2: let $G$ be a set of cached queries with the same entry nodes as $g$ for all PO domains
3: let $R$ be a container for the result set
4: if ($G$ is empty) then
5:   return SkylineQuery($T(P, g.V)$, $g$), where $P$ is the entire data set and $V$ is the specified node set of $g$
6: else
7:   let $\hat{G}$ be a sorted query list in descending order of $S(g, g_i)$, $\forall g_i$ in $G$
8:   if (the top element $g' \in \hat{G}$ used by $q'$ has a similarity value $> 0$) then
9:     $R$ = SkylineQuery($T(q'.result, g.V)$, $g$)
10:   else
11:     (vioEdges, $D$) = FindCandidateResultSet($G$, $\delta$)
12:     if (vioEdges is empty) then
13:       $R$ = SkylineQuery($D$, $g$) /* q is answerable */
14:     else
15:       perform constraint queries to restore the eliminated data tuples $R$ by each violated relation in vioEdges
16:       $R$ = SkylineQuery($D \cup R$, $g$)
17:     end if
18:   end if
19: end if
20: return $R$ /* the skyline points */

4.3.4 Experimental Evaluation

We evaluated the performance of the C-SKY algorithm by comparing it with the state-of-the-art TSS approach [SPP09], which handles partially-ordered domains. Unlike C-SKY, TSS consults the entire data set whenever it executes a new skyline query request.
C-SKY adopts TSS as the baseline algorithm to evaluate the skyline results for partially-ordered domains and adds its own caching mechanisms. Therefore, the CPU execution time for the ¯rst query is identical to the TSS approach. Subsequently, as the cache takes e®ect, performance gains are achieved. We utilize R-trees as the underlying structure for indexing the data and skyline points. Speci¯cally, we use the Spatial Index Library [Lib] for the R-tree index. A page size of 4 KBytes is deployed, resulting in node capacities between 78 (d = 6) and 204 (d = 2). The skyline result points are indexed by a main- memory R-tree to improve the performance of the dominance checks. Our data set for the a totally-ordered domain is in the range of [0;1000) and we generated up to 100,000 normal distributed data points with dimensions in the range of 2 to 4. For a partially- ordered domains, we generate a PO value for each data dimension from 2 to 10, which is the maximal number of distinct options for a user preference pro¯le in the system. The height of a DAG is the maximum length of any path in the graph. The lattice node size for a DAG is determined by a height from 2 2 to 2 10 and a density ratio 0:6. We set the threshold ± as a percentage of the data set. ± is used for the query selection operation, which avoids caching a query with a result set size larger than ±. The cache size », on the other hand, is the percentage of the maximum result size for all the queries (equals ±£the number of queries). Experiments are conducted with a Pentium 3.20 GHz CPU 117 Parameter Default Range Data cardinality (P) 20,000 20,000, 40,000, 60,000, 80,000, 100,000 Query cardinality (Q) 100 1 to 100 Number of TO domains (jTOj) 2 2, 3, 4 Number of PO domains (jPOj) 2 1, 2 DAG height (h) 6 2, 4, 6, 8, 10 DAG density (d) 0.6 - Cache threshold (±) 0.8% 0.05%, 0.1%, 0.2%, 0.4%, 0.8%, 1.6% Cache size (») 3% 1%, 2%, 3%, 4%, 5% Table 4.4: Simulation parameters for the C-SKY approach. 
and 1 GByte of memory. The query results are evaluated with an event-driven approach. The main measurements in the following simulations are the CPU time (for evaluating each skyline query) and the I/O cost. Table 4.4 summarizes the default parameter settings used in the following simulations.

4.3.4.1 Cache Threshold ($\delta$)

First, we measured the CPU execution time by varying the cache threshold size. The choice of a threshold size is critical for the performance of the system. If the threshold is too small, only queries with small result sets are cached. Such queries often have intricate user preference profiles, so the system might be kept busy performing constraint skyline queries to restore missing data tuples due to a large number of violations. Furthermore, fewer queries qualify to be cached, which results in fewer chances to utilize cached queries with compatible user profiles. Therefore, the performance degrades. If the threshold is too high, more queries with a large result set are cached. Consequently, more cache space is occupied and the cache is more likely to be full. Hence, the system has to perform cache replacement operations more often. However, since the C-SKY algorithm involves preference filtering to facilitate the retrieval of a small candidate result set, a cached query with a large result set does not necessarily reduce the gain in skyline evaluation time. A small candidate result set leads to an efficient skyline evaluation, which is the most time-consuming portion of the overall processing. Figures 4.15 (a) and (b) show the CPU overhead and I/O cost as a function of the threshold size, ranging from 0.05% to 1.6% of the entire data set. When the threshold size is set to 0.05% or below, the performance of C-SKY is degraded in terms of the CPU execution time and I/O cost.
A threshold size > 0.8% provides a better performance in terms of the CPU time and I/O cost. Therefore, we chose 0.8% as the default threshold size for the rest of our experiments.

Figure 4.15: Performance as a function of the cache threshold $\delta$. (a) CPU time (sec) and (b) I/O cost versus the threshold, for CSKY and TSS.

4.3.4.2 Data Cardinality

Figures 4.16 (a) and (b) show the CPU execution time and I/O cost as a function of the number of data points, respectively. Overall, the CPU overhead increases with the number of data points. C-SKY achieves a significant reduction in terms of the CPU time compared with TSS. This is indicative of how C-SKY takes advantage of the results of a set of cached queries with compatible user preference profiles. Since the TSS approach considers the entire data set when evaluating the skyline result for each new query, the CPU overhead is significant with a large data set, especially as a result of the R-tree constructions. In C-SKY, since the system only has to construct R-trees on a small candidate result set, the overall CPU time is reduced. The experimental results confirm the benefits of the C-SKY approach, which adopts caching and therefore achieves better CPU performance and lower I/O cost than the TSS technique.

Figure 4.16: Performance as a function of data cardinality. (a) CPU time (sec) and (b) I/O cost versus the number of data points (P), for CSKY and TSS.

4.3.4.3 Query Cardinality

Next, we report on the impact of the query cardinality on the performance of the two approaches. Figures 4.17 (a) and (b) show the CPU overhead and I/O cost versus the query cardinality as it ranges from 1 to 100, respectively.
When the system starts, the CPU overheads of both approaches for evaluating the first skyline query are identical. As time progresses, the C-SKY system caches more queries, and hence the algorithm can retrieve and use a candidate result set that is a subset of the entire data set. The CPU performance improves as more relevant queries are accessed by new queries. However, as the number of queries increases (and the cache is likely full), the improvement of the C-SKY approach slows because the system handles more cached queries and performs more similarity comparisons. For cache management, since the cache is more likely full, replacement operations are executed more frequently. Overall, however, C-SKY still outperforms TSS in terms of the CPU time and I/O cost.

Figure 4.17: Performance as a function of query cardinality. (a) CPU time (sec) and (b) I/O cost versus the number of queries (Q), for CSKY and TSS.

4.3.4.4 User Profile Cardinality

In this experiment we investigate the effect of the DAG height associated with the PO domains. In Figures 4.18 (a) and (b), we vary the DAG height from 2 to 10. Both algorithms incur an increasing CPU load and I/O cost as the DAG height increases. When the total number of lattice nodes of a DAG increases, C-SKY mainly suffers from higher computation costs of the similarity measurements, since the system has to check a large number of lattice nodes (or relations) for similarity comparisons. Furthermore, t-dominance operations are performed intensively, because the query processor might access intricate user profiles composed of more lattice nodes. Consequently, the set of skyline result points is often large, so the performance of the dominance checks is degraded. The performance of the TSS approach remains relatively stable, albeit at a worse level than C-SKY.
Figure 4.18: Performance as a function of the DAG height. (a) CPU time (sec) and (b) I/O cost versus the DAG height (h), for CSKY and TSS.

4.3.4.5 Dimensionality

Next, we investigate the impact of the dimensionality on the performance of the TSS and C-SKY techniques. Figures 4.19 (a) and (b) illustrate the CPU overhead and I/O cost versus the TO and PO dimensionality in pairs of (size of TO, size of PO), ranging from 2 to 4 for the TO domains and 1 to 2 for the PO domains, respectively. When the dimensionality increases, the performance of all methods is degraded because the R-trees fail to filter out irrelevant data entries in higher dimensions. The system possibly outputs more skyline points, which in turn incur more dominance checks. From the figures we can see that C-SKY outperforms TSS only slightly when the dimensionality is high. Because the skyline result sets are often large and a significant number of dominance checks are required, the cached queries cannot contribute much in this case.

Figure 4.19: Performance as a function of the dimensionality. (a) CPU time (sec) and (b) I/O cost versus the dimensionality (|TO| : |PO|), for CSKY and TSS.

4.3.4.6 Cache Size

We investigate the effect of the cache size in Figure 4.20. The size of the cache is important in terms of its overall impact on improving the performance of the system. If the cache size is too small, the system suffers from more disk I/Os because fewer useful queries are cached. On the other hand, if the cache size is too large, the C-SKY algorithm must process a large set of relevant cached queries with respect to each new query. Specifically, many similarity measurement operations need to be executed to retrieve the candidate data set. Figure 4.20 illustrates the tradeoff: a cache size between 3% and 4% seems to result in optimal performance.
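The interplay between the cache threshold $\delta$ and the cache size $\xi$ discussed above can be illustrated with a minimal sketch. It is not the thesis implementation: the class and method names (`SkylineCache`, `admit`, `lookup`) are hypothetical, capacity is counted in result tuples, and the replacement policy (evict the least frequently used entry) is only an assumption based on the statement that the most popular specifications are preserved.

```python
from dataclasses import dataclass


@dataclass
class CachedQuery:
    profile_id: str   # hypothetical key identifying a user preference profile
    result: list      # cached skyline tuples for that profile
    hits: int = 0     # popularity counter (assumed replacement criterion)


class SkylineCache:
    def __init__(self, data_size, num_queries, delta, xi):
        # delta: largest cacheable result, as a fraction of the data set
        self.max_result = round(delta * data_size)
        # xi: capacity, as a fraction of (max_result * number of queries)
        self.capacity = round(xi * self.max_result * num_queries)
        self.entries = {}
        self.used = 0  # number of tuples currently held

    def lookup(self, profile_id):
        entry = self.entries.get(profile_id)
        if entry is None:
            return None
        entry.hits += 1
        return entry.result

    def admit(self, profile_id, result):
        size = len(result)
        # Query selection: results larger than the threshold are never cached.
        if size > self.max_result or size > self.capacity:
            return False
        # Re-admitting a profile replaces its previous entry.
        old = self.entries.pop(profile_id, None)
        if old is not None:
            self.used -= len(old.result)
        # Cache replacement: evict the least popular entries until it fits.
        while self.used + size > self.capacity:
            victim = min(self.entries.values(), key=lambda e: e.hits)
            self.used -= len(victim.result)
            del self.entries[victim.profile_id]
        self.entries[profile_id] = CachedQuery(profile_id, list(result))
        self.used += size
        return True
```

With this layout, a small `delta` rejects most results at admission time, while a small `xi` forces frequent evictions, which mirrors the two failure modes described in Sections 4.3.4.1 and 4.3.4.6.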
Figure 4.20: Performance as a function of the cache size. (a) CPU time (sec) and (b) I/O cost versus the cache size, for CSKY and TSS.

4.3.5 Summary

We have introduced a novel approach, termed C-SKY, to process skyline queries with partially-ordered domains by caching the query results with their unique user preference profiles. The query response time of a new query is significantly reduced by retrieving its result from the cached result sets with compatible specifications. Our similarity measure enables the query processor to find the minimum set among the candidate results. In case a query result cannot be fully computed from the cache, we propose the use of less expensive constraint skyline queries to restore missing data tuples. Finally, to lower the space overhead, we propose a cache management scheme where only the most popular specifications are preserved. Our experimental evaluation demonstrates that C-SKY improves on existing methods.

Chapter 5
Approximate Continuous K-Nearest Neighbor Queries for Moving Objects

5.1 System Overview and Assumptions

We assume a set of moving objects including the query object $q = [s_q, e_q]$ and data objects $o \in O$, $o = [s_o, e_o]$, each moving on a pre-defined path with velocity vector $\vec{v}$. For simplicity, we assume the trajectories of moving objects are straight lines; the approach can be extended to polylines or curved lines, which reflect reality more closely. The AC-kNN algorithm computes a set of split points for each moving object in advance. Later, during the movement of the query object, the AC-kNN algorithm returns the approximate result of the kNN query whenever the object passes a split point on its path. The split point set for an object $o$ is denoted $SL_o = \{o_{s_i}, o_{s_{i+1}}, \ldots, o_{s_{i+n}}\}$, where $o_{s_i} = s_o$ and $o_{s_{i+n}} = e_o$.
In addition, since moving objects may be located in different spatial regions, we need to convert data trajectories into a transformed space, the relative distance space, where we compute the Euclidean distance between all points of a data trajectory and the segment of the query trajectory corresponding to the data trajectory. As an example consider Figure 5.1 (a), with a set of data objects $\{a, b, c\}$ and $q$ as the query object in the original space.

Figure 5.1: The data set and the query object. (a) Original space. (b) Relative distance space. (c) Split points.

Figure 5.1 (b) shows its relative distance space and Figure 5.1 (c) illustrates the four sets of split points, one for each moving object: $SL_q = \{q_{s_i}, q_{s_{i+1}}, \ldots, q_{s_{i+6}}\}$ for $q$, $SL_a = \{a_{s_j}, a_{s_{j+0.5}}, a_{s_{j+1}}, a_{s_{j+2}}\}$ for $a$, $SL_b = \{b_{s_k}, b_{s_{k+1}}, b_{s_{k+1.5}}, b_{s_{k+2}}\}$ for $b$, and $SL_c = \{c_{s_l}, c_{s_{l+1}}\}$ for $c$, all of which divide the moving trajectories into segments. For a data object $b$, the split points are where $b$ sends its segment-based updates. When $b$ passes $b_{s_{k+1}}$, $b$ is in the segment $[b_{s_{k+1}}, b_{s_{k+1.5}}]$. For $q$, the split points are not only the indicators of the location update for $q$ but also where the AC-kNN result set needs to be updated. At each split point, the query object receives a new kNN set for the $[q_{s_{i+t}}, q_{s_{i+t+1}}]$ interval and the location of $q$ is updated to $[q_{s_{i+t}}, q_{s_{i+t+1}}]$. The AC-kNN algorithm computes the k nearest neighbor set based on this segment-based location information instead of specific location points in order to reduce the disk access rate. We consider a 2-D environment and the Euclidean distance to measure the AC-kNN in this thesis.
The objective of an AC-kNN query is to retrieve the approximate k nearest neighbors of $q$ continuously during its movement to its destination. We define a disk access as an update request, which is triggered by one of the two following events: (1) a moving object updates its location, velocity, etc., to the server; (2) a query object requests result updates to maintain the AC-kNN set during its movement.

5.2 Approximation Model

5.2.0.1 Split Point Computations

In many existing systems (e.g., [LYH04, ISS03]), these requests take place whenever a moving object changes its velocity over time. In our AC-kNN algorithm, however, a moving object only sends an update request when it reaches one of its split points. For simplicity, we use an R*-tree [BKSS90] to index each moving object trajectory by a single bounding rectangle, instead of partitioning a trajectory into several bounding rectangles [AGI05].

Figure 5.2: The increment points and the candidate set. (Increment points $r_1$, $r_2$, $r_3$, $r_4$ on the query trajectory.)

We observe the following lemmas when generating the split points and designing the AC-kNN algorithm.

Lemma 5.1. All the AC-kNN results are obtained from the candidate trajectory set. The candidate trajectory set is defined as the union of the trajectories of the kNN retrieved from each $r \in \{r_1 = s_q, r_2, \ldots, r_n = e_q\}$, where $r_i$ is an increment point on the query trajectory, reached from $r_{i-1}$ by the length $M$, which is the minimum distance to cover all trajectories of the kNN of $r_{i-1}$.

By utilizing the candidate trajectory set, we are able to prune irrelevant data objects instead of considering all data objects as part of the AC-kNN result. In other words, only the trajectories in the candidate set are considered when computing the AC-kNN query. Since we use a bounding rectangle to index a trajectory, the point-NN queries invoked here can find the k nearest trajectories. For instance, in Figure 5.2 where $k = 1$, the increment points are $r_1$, $r_2$, $r_3$, $r_4$, whose 1-NN are $\{a\}$, $\{b\}$, $\{c\}$, $\{d\}$, respectively. Therefore, the candidate set is $\{a, b, c, d\}$.

Lemma 5.2.
The split points consist of (1) the starting/ending points of the trajectories of moving objects, (2) intersection points between two data trajectories, and (3) intersection points between data trajectories and the query trajectory. The data trajectories are the candidate trajectories defined in Lemma 5.1.

We have defined the split points as the points where a moving object needs to update its location on a segment basis or the points where the query object needs to send an AC-kNN update request to the server in order to maintain the AC-kNN set. Since we only consider objects on candidate trajectories as part of the AC-kNN result set when performing the AC-kNN query, non-candidate data objects do not need to be partitioned into segments because their updates are not relevant. A split point is used because it is where a kNN set changes, which is called an event. Our choice of split points is based on the following observations. A starting point of a trajectory is where the moving object starts moving, so from there on it should be considered as part of the AC-kNN result. Similarly, an ending point of a trajectory is where the moving object stops moving. Therefore, an ending point should be checked to see whether the object is excluded from the current AC-kNN result. The intersection point of two candidate trajectories, or of a candidate trajectory and the query trajectory, is considered since it is where an order event occurs; that is, one or more of the data objects in the kNN set may move farther from or closer to the query object.

The process of generating split points for the query object and data objects is divided into three parts:

1. Relative distance space transformation: Transform all the candidate trajectories into a relative distance space with respect to the query trajectory. Figure 5.1 (a) shows the original space of the trajectories for the data set $\{a, b, c\}$ and the query trajectory $q$. Figure 5.1 (b) shows its relative distance space.

2. Split point collection: Collect all points from all moving objects using the definition of Lemma 5.2. For example, in Figure 5.1 (b), based on this principle, we can retrieve the points $P = \{a_{s_j}, b_{s_k}, a_{s_{j+1}} (= b_{s_{k+1}}), a_{s_{j+2}}, b_{s_{k+2}}, c_{s_l}, c_{s_{l+1}}\}$.

3. Split point insertion:
   a. Draw a line $l$ passing through each $s$, $s \in P$, and perpendicular to the segment of $q$.
   b. Store all the intersection points between $l$ and the trajectories.

The complete split point sets are shown in Figure 5.1 (c). Taking Figure 5.1 (b) and Figure 5.1 (c) as an example, $a_{s_{j+0.5}}$ is stored because the line $l$ passing through the point $b_{s_k}$ intersects the trajectory of $a$ at $a_{s_{j+0.5}}$. In this way, while the query object is moving within $[q_{s_{i+1}}, q_{s_{i+2}}]$, it is able to know whether $a$ is moving within $[a_{s_{j+0.5}}, a_{s_{j+1}}]$ and whether $b$ is moving within $[b_{s_k}, b_{s_{k+1}}]$. This segment-based location information can greatly reduce the disk accesses and is relied upon by the AC-kNN algorithm to compute the approximate query result. The best way to represent the segment location information is a matrix, which we call the Segment-based Location Table (SLT). Table 5.1 shows an example of the SLT converted from Figure 5.1 (c). The value of $(a, q_{s_{i+1}})$ is 1 in the SLT. It represents that $a$ is within $[a_{s_{j+0.5}}, a_{s_{j+1}}]$, but not within $[a_{s_j}, a_{s_{j+0.5}}]$ or $[a_{s_{j+1}}, a_{s_{j+2}}]$, whose values are 0. When the value is $-1$, it represents that the segment is invalid. The SLT is updated whenever a moving object moves away from its current segment. For example, when $a$ moves from $[a_{s_{j+0.5}}, a_{s_{j+1}}]$ to $[a_{s_{j+1}}, a_{s_{j+2}}]$, $(a, q_{s_{i+1}})$ is updated to 0 and $(a, q_{s_{i+2}})$ is updated to 1.

Table 5.1: Segment-based location table.

q    $q_{s_i}$   $q_{s_{i+1}}$   $q_{s_{i+2}}$   $q_{s_{i+3}}$   $q_{s_{i+4}}$   $q_{s_{i+5}}$   $q_{s_{i+6}}$
a    0           1               0               -1              -1              -1              -1
b    -1          1               0               0               -1              -1              -1
c    -1          -1              -1              -1              -1              1               -1

The processing of an AC-kNN search first finds a candidate trajectory set, whose moving objects are the only ones considered when computing AC-kNN queries.
Second, split points are generated where events/order events take place in the relative distance space. The split points are converted into an SLT, which stores the segment-based location information for each moving object so that specific locations are not required, which reduces the number of updates. An update of the SLT is required only when a moving object moves away from its current segment. The query object sends an AC-kNN query each time it passes a split point on the query trajectory until it reaches its ending point. However, due to the use of the segment-based location information, the current locations of the moving objects are unknown, and the Euclidean distance used as the metric to find the kNN set cannot be determined exactly. We introduce a heuristic for solving this problem.

Heuristic 5.1. Given two segments $p = [s_p, e_p]$ and $q = [s_q, e_q]$, an average distance between them is estimated by the following equation:

$$\mathrm{avedist}(p, q) = \mathrm{dist}(\mathrm{midPoint}(s_p, e_p),\ \mathrm{midPoint}(s_q, e_q)) \qquad (5.1)$$

The AC-kNN algorithm adopts an incremental approach that first looks for the kNN among the nearest segments. If fewer than k neighbors are found, the algorithm considers the nearby segments until all the kNN are found. The full AC-kNN algorithm is shown as Algorithm 13.

Algorithm 13 AC-kNN Algorithm
Input: the current split point $q_{s_i}$ of $q$, an SLT, and $k$
Output: the AC-kNN set
1: Let $C$ be the set of segments whose value in the SLT is 1, retrieved from the elements $(q_{s_i}, O)$ of the SLT, for each $o \in O$, where $O$ is the set of all data objects.
2: Let $r = \max(\mathrm{avedist}([q_{s_i}, q_{s_{i+1}}], c))$, for each $c \in C$
3: Find the center point $q' = \mathrm{midPoint}(q_{s_i}, q_{s_{i+1}})$ and use it to find a split point set $P$ covered by $|q' - r|$ to $|q' + r|$, such that $P = \{q_{s_{i-n}}, q_{s_{i-n+1}}, \ldots, q_{s_{i+m}}\}$
4: Find all segments $S$ with value 1 in the SLT, for each $p \in P$
5: Find the first $k$ objects in increasing order of $\mathrm{avedist}([q_{s_i}, q_{s_{i+1}}], s)$ for each $s \in S$ and insert them into the AC-kNN set
6: if (Number of AC-kNN < $k$) then
7:     Expand $P$ by adding the next split point after $q_{s_{i+m}}$ and the previous split point before $q_{s_{i-n}}$ to $P$, then go to Line 4.
8: else
9:     Return the AC-kNN set.
10: end if

Consider the example in Figure 5.1 (c), where $k = 3$, while $q$ is on $[q_{s_{i+1}}, q_{s_{i+2}}]$. Initially, the algorithm checks the SLT (Table 5.1) and gets $C = \{[a_{s_{j+0.5}}, a_{s_{j+1}}], [b_{s_k}, b_{s_{k+1}}]\}$. The maximum distance to the query trajectory is $r = \mathrm{avedist}([q_{s_{i+1}}, q_{s_{i+2}}], [a_{s_{j+0.5}}, a_{s_{j+1}}])$. Therefore, as shown in Figure 5.3, the split point set is $P = \{q_{s_i}, q_{s_{i+1}}, q_{s_{i+2}}\}$. The algorithm then checks the SLT; the segments with value 1 are $[a_{s_{j+0.5}}, a_{s_{j+1}}]$ and $[b_{s_k}, b_{s_{k+1}}]$. Since $[b_{s_k}, b_{s_{k+1}}]$ has a shorter avedist than $[a_{s_{j+0.5}}, a_{s_{j+1}}]$, $b$ is added to the AC-kNN set as the first nearest neighbor of $q$; $a$ is then added as the second nearest neighbor of $q$. In order to find the third nearest neighbor of $q$, the algorithm expands $P$ to contain more split points and thereby retrieve more moving objects. Finally, $c$ is added to the AC-kNN set.

Figure 5.3: $P = \{q_{s_i}, q_{s_{i+1}}, q_{s_{i+2}}\}$.

5.3 Experimental Evaluation

We use a timepiece-wise approach as the baseline. This is a brute-force approach which compares the current locations of all moving objects to find the k nearest neighbors of the query object at each time unit. It returns the most correct kNN set.
We then compare the accuracy and disk accesses of the Beach Line (BL) algorithm with those of the AC-kNN algorithm, respectively. Experiments are conducted with a Pentium 4, 3 GHz CPU and 1 GByte of memory. We set up an experimental environment similar to the one used for BL. A two-dimensional $1000 \times 1000$ world at the scale of [1:0.02] miles is constructed, where objects are uniformly distributed and the speed vector of each object is randomly generated, ranging from 0 to 5. We use the R*-tree library implemented in Java by Papadias [ioRt]. The BL algorithm in our experiment uses four bounding rectangles to decompose each curve and the LFA is set to two time units. The definition of the disk access rate has been discussed in Section 5.1. For the accuracy of the kNN sets, we use a spatial accuracy metric, the Average Distance to Nearest Predicted location (ADNP) [HS03], which is defined as:

$$\mathrm{ADNP} = \frac{1}{|S|} \sum_{s \in S} \mathrm{dist}(s, NP(s)) \qquad (5.2)$$

Using ADNP, if the correct kNN result set is $\{a, b, c\}$ and the AC-kNN algorithm finds $\{a, c, e\}$, ADNP is calculated as $(\mathrm{dist}(a, a) + \mathrm{dist}(b, c) + \mathrm{dist}(c, e))/3$. Note that the accuracy is 100% when ADNP is 0. We use up to 50,000 moving objects and $k = 30$ for the experiment. In Figure 5.4 (a), we compare the ADNPs of the AC-kNN algorithm with those of the BL algorithm as a function of the number of moving objects. The figure shows that both the AC-kNN algorithm and the BL algorithm scale well, and that overall AC-kNN has ADNP rates competitive with those of the BL algorithm. The result in Figure 5.4 (b) shows that the AC-kNN algorithm outperforms the BL algorithm in terms of disk accesses. This demonstrates that our approximate approach can greatly reduce disk accesses.

Figure 5.4: Performance vs. number of nodes. (a) ADNP (miles) and (b) disk accesses versus the number of nodes (10k to 50k), for BL and AC-kNN.
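The two distance measures used in this chapter, avedist from Equation (5.1) and ADNP from Equation (5.2), can be sketched in a few lines. This is only an illustrative sketch: points are assumed to be 2-D `(x, y)` tuples and segments are `(start, end)` pairs. For ADNP, the sketch pairs the correct and the reported neighbors by rank, which is an assumption chosen to reproduce the worked example above ($\{a, b, c\}$ against $\{a, c, e\}$) and may differ from the exact matching used in [HS03].

```python
import math


def dist(p, q):
    """Euclidean distance between two 2-D points."""
    return math.hypot(p[0] - q[0], p[1] - q[1])


def mid_point(s, e):
    """Midpoint of the segment [s, e]."""
    return ((s[0] + e[0]) / 2, (s[1] + e[1]) / 2)


def avedist(seg_p, seg_q):
    """Equation (5.1): distance between the midpoints of two segments."""
    return dist(mid_point(*seg_p), mid_point(*seg_q))


def adnp(correct, reported):
    """Equation (5.2), with rank-wise pairing (an assumption of this sketch).

    correct:  locations of the exact kNN, ordered by rank
    reported: locations of the approximate kNN, ordered by rank
    """
    if not correct or len(correct) != len(reported):
        raise ValueError("both result sets must be non-empty and of equal size")
    total = sum(dist(c, r) for c, r in zip(correct, reported))
    return total / len(correct)


# Worked example mirroring the text: correct {a, b, c}, reported {a, c, e}.
a, b, c, e = (0, 0), (1, 0), (2, 0), (5, 0)
example = adnp([a, b, c], [a, c, e])  # (dist(a,a) + dist(b,c) + dist(c,e)) / 3
```

An ADNP of 0 corresponds to a reported set that coincides with the exact kNN set, which matches the remark that the accuracy is 100% in that case.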
5.4 Summary

We introduce an algorithm to perform an approximate continuous nearest neighbor search, while significantly reducing the disk access rate compared with existing algorithms that retrieve the exact c-kNN result. By using split points on its trajectory, a moving object only needs to send a segment-based location update to the server when it reaches each split point. The server then uses the approximate segment-based location information of each moving object to compute the AC-kNN for the query trajectory. We define avedist for measuring the approximate distance between two segments. We assume a straight-line moving trajectory, which in reality might be a polyline or curve. In our future work, we intend to utilize existing index structures such as MON-trees, which we believe can be applied to handle non-linear trajectories efficiently. Our experiments show that our algorithm outperforms the Beach Line algorithm in terms of disk access rate while retaining a competitive accuracy.

Chapter 6
Conclusions and Future Work

6.1 Summary

This thesis addresses two challenging issues in efficient query evaluation and low communication cost related to frequent location updates. We propose the ASR and PLU algorithms to reduce the number of location updates for different movement models. We have designed an ASR-based framework for trajectory movement environments. The novel concept of an adaptive safe region is introduced to provide a mobile object with a reasonable-sized safe region that adapts to the surrounding queries. Hence, the communication overhead resulting from the query movements is greatly reduced. To further decrease the network traffic caused by c-kNN query region expansions to cover sufficient NNs for the result sets, our approach caches extra NNs. An incremental result update mechanism that checks only the set of affected points to refresh the query answers is presented.

The PLU approach is designed for arbitrary movement environments where mobile units may change their locations without any constraints. The novel concept of a Location Information Table is introduced to provide a mobile object with information about queries, hence enabling it to estimate query movements and transmit a location update to the server only when it affects the query results. To further reduce network traffic, the server uses smart on-demand location probes. Finally, the proposed mechanism efficiently determines the set of objects that are affected by a query insertion, improving scalability. In addition, we present a spatial data compression method to reduce the size of the Location Information Table and fit it into an Internet packet. Experimental results demonstrate that our approach scales better than existing techniques.

Another research direction proposed in this dissertation is to reduce the response time of evaluating continuous queries. We design the ESC and C-SKY algorithms for skyline queries with totally-ordered and partially-ordered domains. Our ESC algorithm achieves a faster response time and better overall CPU performance for skyline queries with totally-ordered domains. With the adoption of the pre-computed S2 sets, ESC can efficiently update the skyline query results and delegate the most complex computations to a separate procedure that executes after the updates of the query results are completed. An approximate exclusive data region (AEDR) is proposed, and our experiments confirm the feasibility of AEDR, which has a low amortized cost of the exclusive data evaluation in high-dimensional and dynamic data environments. The S1Evaluation procedure first examines all the incoming data requests and updates the S1 result if necessary, and the S2Evaluation procedure integrates our lemmas and heuristics to achieve a low CPU overhead and reduced I/O cost. For skyline queries with partially-ordered domains, we introduce the C-SKY algorithm, which caches the query results with their unique user preference profiles. The query response time of a new query can be greatly reduced by retrieving its result from the cached result sets of the cached queries with compatible
The query response time of a new query can be greatly reduced by retrieving its result from the caching result set of the caching queries with compatible 138 speci¯cations. Our similarity function facilitates the query processor to ¯nd a small can- didate result set. We propose less expensive constraint skyline queries to restore missing data tuples for unanswerable queries. To lower space overhead, we propose a scheme of cache management where only the most popular speci¯cations are preserved. Finally, in some applications, the exact spatial query results are not required. We introduce an algorithm to perform an approximate continuous nearest neighbor search, while signi¯cantly reducing the disk access rate compared with existing AC-kNN algo- rithms that retrieve the exact c-kNN result. By using split points on its trajectory, a moving object only needs to send a segment-based location update to the server when it reaches each split point. The server then uses the approximate segment-based location information of each moving object to compute the AC-kNN for the query trajectory. We de¯neavedist formeasuringtheapproximatedistancebetweentwosegments. Ourexper- iments show that our algorithm outperforms the Beach Line algorithm in terms of disk access rate while retaining a competitive accuracy. 6.2 Future Work Iwillfocusinmyfutureworkonrelatedpropertiesandapplications. Furthermore, Iplan to extend our work as follows. 6.2.1 Supporting Road Network Data Objects To produce more realistic mobile movements, we will import the road network from the TIGER/Line [TIG] street vector data set available from the U.S. Census Bureau and use thesameprinciplesadoptedin[KZWW06]tointegratetheroadsegments(e.g., freeways, 139 primary highways, secondary and connecting roads, and rural roads) into a complete road network. 
We plan to study the impact of road network data sets, which are not uniformly distributed in space, and provide solutions to alleviate the performance degradation resulting from skewed data sets.

6.2.2 Handling Large Queries

Our current work [HZWK07] and the existing work [HXL05, PXK+02] have proposed safe-region-based algorithms to reduce the number of location updates issued by mobile clients. However, this technique cannot handle a large set of queries. When the density of queries is high, the chance that a moving object enters a query range is high as well. Our simulation results from the current work show that the performance deteriorates as the number of queries increases. We would like to address this issue in our future work, as it is critical for enabling the system to support better location-based services in metropolitan areas with very dense subscribers. In this case, the periodic scheme may outperform our approach, which incurs extra network messages for informing the moving objects of the LITs. We plan to implement an empirical approach that combines our partition-based lazy update approach with the periodic scheme. We first partition the space into several manageable subareas. The suggested system is able to determine an appropriate scheme based on the density of the queries in each partition area. If the query density in the area is low, the server adopts the partition-based lazy update approach. Otherwise, the system switches to the periodic method, which requires only one-directional messages from the moving objects to report their current locations to the server at each time instance. By determining a threshold value for the query density and applying different update schemes in different partitions, our goal is to reduce the total number of network messages between the server and mobile clients.
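The density-based choice between the two update schemes proposed above can be sketched as follows. This is a hypothetical illustration of the planned approach, not an implemented or evaluated design: the uniform grid partitioning, the function names, and the representation of each query by a single point are all assumptions made for the example.

```python
from collections import Counter


def partition_of(point, cell_size):
    """Map a 2-D point to the id of the grid cell (partition) containing it."""
    return (int(point[0] // cell_size), int(point[1] // cell_size))


def choose_update_schemes(query_points, cell_size, density_threshold):
    """Pick an update scheme for every partition that contains queries.

    A partition whose query density (queries per unit area) exceeds the
    threshold uses the periodic scheme; otherwise it uses the
    partition-based lazy update (PLU) scheme.
    """
    counts = Counter(partition_of(p, cell_size) for p in query_points)
    area = cell_size * cell_size
    return {
        cell: "periodic" if n / area > density_threshold else "lazy"
        for cell, n in counts.items()
    }


def scheme_for(cell, schemes):
    """Partitions without any query default to the lazy scheme."""
    return schemes.get(cell, "lazy")


# Three queries fall into cell (0, 0), one into cell (1, 1).
queries = [(1, 1), (2, 2), (3, 3), (15, 15)]
schemes = choose_update_schemes(queries, cell_size=10, density_threshold=0.02)
```

In such a design the threshold would have to be tuned empirically, since it trades the extra LIT messages of the lazy scheme against the fixed per-time-unit reports of the periodic scheme.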
6.2.3 The Impact of Communication Delay

Due to the limited network resources and battery life of moving objects, there might be communication delays or unexpected disconnections between the server and the mobile objects. The issue of data consistency is an important topic for our study, because it essentially affects the correctness of query results. Currently, our work assumes that there is no communication delay. We plan to study and simulate the impact of the communication delay by comparing our approach with a similar work [HXL05], which has evaluated the impact of communication delay.

References

[AGI05] Victor Teixeira De Almeida, Ralf Hartmut Güting, and Praktische Informatik. Indexing the trajectories of moving objects in networks. GeoInformatica, 9:33-60, 2005.

[AM93] Sunil Arya and David M. Mount. Approximate nearest neighbor queries in fixed dimensions. In SODA '93: Proceedings of the fourth annual ACM-SIAM Symposium on Discrete algorithms, pages 271-280, Philadelphia, PA, USA, 1993. Society for Industrial and Applied Mathematics.

[BAG03] Sid-Ahmed Berrani, Laurent Amsaleg, and Patrick Gros. Approximate searches: k-neighbors + precision. In CIKM '03: Proceedings of the twelfth international conference on Information and knowledge management, pages 24-31, New York, NY, USA, 2003. ACM.

[BD03] L. Barkhuus and A. Dey. Location-based services for mobile telephony: a study of user's privacy concerns, 2003.

[BKS01] Stephan Börzsönyi, Donald Kossmann, and Konrad Stocker. The Skyline Operator. In Proceedings of the 17th International Conference on Data Engineering (ICDE), Heidelberg, Germany, pages 421-430, 2001.

[BKSS90] Norbert Beckmann, Hans-Peter Kriegel, Ralf Schneider, and Bernhard Seeger. The R*-tree: an efficient and robust access method for points and rectangles. In SIGMOD '90: Proceedings of the 1990 ACM SIGMOD international conference on Management of data, pages 322-331, New York, NY, USA, 1990. ACM Press.
[CET05] Chee Yong Chan, Pin-Kwang Eng, and Kian-Lee Tan. Stratified computation of skylines with partially-ordered domains. In Proceedings of the ACM SIGMOD International Conference on Management of Data, Baltimore, Maryland, USA, pages 203–214, 2005.
[CHC04] Ying Cai, Kien A. Hua, and Guohong Cao. Processing Range-Monitoring Queries on Heterogeneous Mobile Objects. In Mobile Data Management, pages 27–38, 2004.
[CyLPL07] Reynold Cheng, Kam yiu Lam, Sunil Prabhakar, and BiYu Liang. An Efficient Location Update Mechanism for Continuous Queries Over Moving Objects. Inf. Syst., 32(4):593–620, 2007.
[FTAA00] Hakan Ferhatosmanoglu, Ertem Tuncel, Divyakant Agrawal, and Amr El Abbadi. Vector approximation based indexing for non-uniform high dimensional data sets. In Proceedings of the 9th ACM Int. Conf. on Information and Knowledge Management, pages 202–209. ACM Press, 2000.
[GG98] Volker Gaede and Oliver Günther. Multidimensional Access Methods. ACM Comput. Surv., 30(2):170–231, 1998.
[GL04] Bugra Gedik and Ling Liu. MobiEyes: Distributed Processing of Continuously Moving Queries on Moving Objects in a Mobile System. In EDBT, 2004.
[Gut84] Antonin Guttman. R-trees: a dynamic index structure for spatial searching. In SIGMOD '84: Proceedings of the 1984 ACM SIGMOD international conference on Management of data, pages 47–57, New York, NY, USA, 1984. ACM Press.
[HLOT06] Zhiyong Huang, Hua Lu, Beng Chin Ooi, and Anthony K. H. Tung. Continuous Skyline Queries for Moving Objects. IEEE Trans. Knowl. Data Eng., 18(12):1645–1658, 2006.
[HS03] Tianming Hu and Sam Yuan Sung. Spatial similarity measures in location prediction. Geographic Information and Decision Analysis, 7:95–96, 2003.
[HXL05] Haibo Hu, Jianliang Xu, and Dik Lun Lee. A generic framework for monitoring continuous spatial queries over moving objects. In SIGMOD Conference, pages 479–490, 2005.
[HZWK07] Yu-Ling Hsueh, Roger Zimmermann, Haojun Wang, and Wei-Shinn Ku. Partition-based lazy updates for continuous queries over moving objects. In ACMGIS, 2007.
[ioRt] Dimitris Papadias: Java implementation of R*-tree. http://www.rtreeportal.org.
[ISS03] Glenn S. Iwerks, Hanan Samet, and Kenneth P. Smith. Continuous k-nearest neighbor queries for continuously moving points with updates. In VLDB, pages 512–523, 2003.
[JLO04] Christian S. Jensen, Dan Lin, and Beng Chin Ooi. Query and Update Efficient B+-Tree Based Indexing of Moving Objects. In VLDB, 2004.
[KRR02] Donald Kossmann, Frank Ramsak, and Steffen Rost. Shooting Stars in the Sky: An Online Algorithm for Skyline Queries. In Proceedings of 28th International Conference on Very Large Data Bases (VLDB), Hong Kong, China, pages 275–286, 2002.
[KZWW06] Wei-Shinn Ku, Roger Zimmermann, Chi-Ngai Wan, and Haojun Wang. Maple: A mobile scalable p2p nearest neighbor query system for location-based services. In ICDE, page 160, 2006.
[Lib] Spatial Index Library. http://www.research.att.com/~marioh/spatialindex/index.html.
[LYH04] Yifan Li, Jiong Yang, and Jiawei Han. Continuous k-nearest neighbor search for moving objects. In SSDBM, pages 123–126, 2004.
[LYWL05] Xuemin Lin, Yidong Yuan, Wei Wang, and Hongjun Lu. Stabbing the Sky: Efficient Skyline Computation over Sliding Windows. In Proceedings of the 21st International Conference on Data Engineering (ICDE), Tokyo, Japan, pages 502–513, 2005.
[LYZZ07] Xuemin Lin, Yidong Yuan, Qing Zhang, and Ying Zhang. Selecting Stars: The k Most Representative Skyline Operator. In Proceedings of the 23rd International Conference on Data Engineering (ICDE), Istanbul, Turkey, pages 86–95, 2007.
[LZLL07] Ken C. K. Lee, Baihua Zheng, Huajing Li, and Wang-Chien Lee. Approaching the Skyline in Z Order. In Proceedings of the 33rd International Conference on Very Large Data Bases (VLDB), University of Vienna, Austria, pages 279–290, 2007.
[MA05] Mohamed F. Mokbel and Walid G. Aref. Gpac: generic and progressive processing of mobile queries over mobile data. In MDM '05: Proceedings of the 6th international conference on Mobile data management, pages 155–163, New York, NY, USA, 2005. ACM Press.
[McD99] A. Bruce McDonald. A mobility-based framework for adaptive dynamic cluster-based hybrid routing in wireless ad-hoc networks, 1999.
[MHP05] Kyriakos Mouratidis, Marios Hadjieleftheriou, and Dimitris Papadias. Conceptual Partitioning: An Efficient Method for Continuous Nearest Neighbor Monitoring. In SIGMOD Conference, 2005.
[MPBT05] Kyriakos Mouratidis, Dimitris Papadias, Spiridon Bakiras, and Yufei Tao. A Threshold-Based Algorithm for Continuous Monitoring of k Nearest Neighbors. IEEE Trans. Knowl. Data Eng., 17(11):1451–1464, 2005.
[MPG06] Michael D. Morse, Jignesh M. Patel, and William I. Grosky. Efficient Continuous Skyline Computation. In Proceedings of the 22nd International Conference on Data Engineering (ICDE), Atlanta, GA, USA, page 108, 2006.
[MPJ07] Michael D. Morse, Jignesh M. Patel, and H. V. Jagadish. Efficient Skyline Computation over Low-Cardinality Domains. In Proceedings of the 33rd International Conference on Very Large Data Bases (VLDB), University of Vienna, Austria, pages 267–278, 2007.
[MXA04] Mohamed F. Mokbel, Xiaopeng Xiong, and Walid G. Aref. Sina: Scalable incremental processing of continuous queries in spatio-temporal databases. In SIGMOD Conference, pages 623–634, 2004.
[PJET05] Jian Pei, Wen Jin, Martin Ester, and Yufei Tao. Catching the Best Views of Skyline: A Semantic Approach Based on Decisive Subspaces. In Proceedings of the 31st International Conference on Very Large Data Bases (VLDB), Trondheim, Norway, pages 253–264, 2005.
[PTFS03] Dimitris Papadias, Yufei Tao, Greg Fu, and Bernhard Seeger. An Optimal and Progressive Algorithm for Skyline Queries. In Proceedings of the 2003 ACM SIGMOD international conference on Management of data, pages 467–478, New York, NY, USA, 2003.
[PTFS05] Dimitris Papadias, Yufei Tao, Greg Fu, and Bernhard Seeger. Progressive Skyline Computation in Database Systems. ACM Trans. Database Syst., 30(1):41–82, 2005.
[PXK+02] Sunil Prabhakar, Yuni Xia, Dmitri V. Kalashnikov, Walid G. Aref, and Susanne E. Hambrusch. Query Indexing and Velocity Constrained Indexing: Scalable Techniques for Continuous Queries on Moving Objects. IEEE Trans. Computers, 51(10):1124–1140, 2002.
[SJLL00] Simonas Saltenis, Christian S. Jensen, Scott T. Leutenegger, and Mario A. Lopez. Indexing the Positions of Continuously Moving Objects. In SIGMOD Conference, 2000.
[SPP09] Dimitris Sacharidis, Stavros Papadopoulos, and Dimitris Papadias. Topologically sorted skylines for partially ordered domains. In Proceedings of the 25th International Conference on Data Engineering (ICDE), Shanghai, China, 2009.
[SRF87] Timos K. Sellis, Nick Roussopoulos, and Christos Faloutsos. The r+-tree: A dynamic index for multi-dimensional objects. In The VLDB Journal, pages 507–518, 1987.
[SS06] Mehdi Sharifzadeh and Cyrus Shahabi. The spatial skyline queries. In Proceedings of the 32nd International Conference on Very Large Data Bases (VLDB), Seoul, Korea, pages 751–762, 2006.
[TEO01] Kian-Lee Tan, Pin-Kwang Eng, and Beng Chin Ooi. Efficient Progressive Skyline Computation. In Proceedings of the 27th International Conference on Very Large Data Bases (VLDB), pages 301–310, San Francisco, CA, USA, 2001. Morgan Kaufmann Publishers Inc.
[TFPL04] Yufei Tao, Christos Faloutsos, Dimitris Papadias, and Bin Liu. Prediction and indexing of moving objects with unknown motion patterns. In SIGMOD '04: Proceedings of the 2004 ACM SIGMOD international conference on Management of data, pages 611–622, New York, NY, USA, 2004. ACM Press.
[TIG] TIGER/Line. http://www.census.gov/geo/www/tiger/.
[TPS03] Yufei Tao, Dimitris Papadias, and Jimeng Sun. The TPR*-Tree: An Optimized Spatio-Temporal Access Method for Predictive Queries. In VLDB, pages 790–801, 2003.
[TWZ+07] Li Tian, Le Wang, Peng Zou, Yan Jia, and Aiping Li. Continuous Monitoring of Skyline Query over Highly Dynamic Moving Objects. In Sixth ACM International Workshop on Data Engineering for Wireless and Mobile Access (MobiDE), Beijing, China, pages 59–66, 2007.
[WAEA07] Ping Wu, Divyakant Agrawal, Ömer Egecioglu, and Amr El Abbadi. Deltasky: Optimal Maintenance of Skyline Deletions without Exclusive Dominance Region Generation. In Proceedings of the 23rd International Conference on Data Engineering (ICDE), The Marmara Hotel, Istanbul, Turkey, pages 486–495, 2007.
[XMA05] Xiaopeng Xiong, Mohamed F. Mokbel, and Walid G. Aref. SEA-CNN: Scalable processing of continuous k-nearest neighbor queries in spatio-temporal databases. In ICDE, pages 643–654, 2005.
[XMA06] Xiaopeng Xiong, Mohamed F. Mokbel, and Walid G. Aref. LUGrid: Update-tolerant grid-based indexing for moving objects. In MDM, 2006.
[YPK05] Xiaohui Yu, Ken Q. Pu, and Nick Koudas. Monitoring k-nearest neighbor queries over moving objects. In ICDE, pages 631–642, 2005.
Asset Metadata
Creator
Hsueh, Yuling (author)
Core Title
Efficient updates for continuous queries over moving objects
School
Viterbi School of Engineering
Degree
Doctor of Philosophy
Degree Program
Computer Science
Publication Date
08/04/2009
Defense Date
03/25/2009
Publisher
University of Southern California (original), University of Southern California. Libraries (digital)
Tag
location-based services, moving object processing, OAI-PMH Harvest, scalable continuous query processing, spatial data indexing, spatio-temporal databases
Language
English
Contributor
Electronically uploaded by the author (provenance)
Advisor
Zimmermann, Roger (committee chair), Kuo, C.-C. Jay (committee member), Shahabi, Cyrus (committee member)
Creator Email
hsueh@usc.edu,yulinghsueh@gmail.com
Permanent Link (DOI)
https://doi.org/10.25549/usctheses-m2469
Unique identifier
UC1478587
Identifier
etd-Hsueh-3091 (filename),usctheses-m40 (legacy collection record id),usctheses-c127-180899 (legacy record id),usctheses-m2469 (legacy record id)
Legacy Identifier
etd-Hsueh-3091.pdf
Dmrecord
180899
Document Type
Dissertation
Rights
Hsueh, Yuling
Type
texts
Source
University of Southern California (contributing entity), University of Southern California Dissertations and Theses (collection)
Repository Name
Libraries, University of Southern California
Repository Location
Los Angeles, California
Repository Email
cisadmin@lib.usc.edu