MOVNET: A FRAMEWORK TO PROCESS LOCATION-BASED QUERIES ON MOVING OBJECTS IN ROAD NETWORKS

by

Haojun Wang

A Dissertation Presented to the
FACULTY OF THE GRADUATE SCHOOL
UNIVERSITY OF SOUTHERN CALIFORNIA
In Partial Fulfillment of the
Requirements for the Degree
DOCTOR OF PHILOSOPHY
(COMPUTER SCIENCE)

May 2009

Copyright 2009 Haojun Wang

Dedication

This dissertation is dedicated to my beloved parents, Yimi Wang and Yifen Liu, who gave me everything to realize my dream.

Acknowledgements

First, I would like to express my most profound gratitude to my advisor, Professor Roger Zimmermann, for his guidance and support. Working with him has been an invaluable experience in my life. Professor Zimmermann is a brilliant computer scientist and a great man with a gentle heart. It has been a huge honor for me to be his student.

Second, I would like to extend my appreciation to my other dissertation committee members, Professor Cyrus Shahabi of the Department of Computer Science and Professor Jean-Pierre Bardet of the Department of Civil and Environmental Engineering, for reviewing my work. Additionally, I want to thank Professor Viktor Prasanna of the Department of Electrical Engineering, Professor Alexandre R. J. François of the Department of Computer Science, and Professor Elaine Chew of the Department of Industrial and Systems Engineering for providing thoughtful suggestions for this dissertation.

My appreciation also goes to all my dear friends at the University of Southern California. In particular, I would like to thank my colleagues in the Database Management Research Lab for their help and collaboration.

Last but not least, I sincerely appreciate my parents. Their love has been my greatest support all these years. They gave me the courage to pursue my dream no matter how hard and how far away it seemed. None of my achievements would have been possible without their unreserved love.
Contents

Dedication
Acknowledgements
Abstract

Chapter 1: Introduction
1.1 Motivation
1.2 Overview
1.3 Contributions
1.4 Outline
1.5 Terminology Definitions

Chapter 2: Background and Related Work
2.1 Global Positioning System (GPS)
2.2 Euclidean-distance-based Spatial Query Processing on Stationary POIs
2.3 Network-distance-based Spatial Query Processing on Stationary POIs
2.4 Index Design on Moving POIs
2.5 Euclidean-distance-based Spatial Query Processing on Moving POIs
2.6 Network-distance-based Spatial Query Processing on Moving POIs

Chapter 3: System Assumptions and Problem Statements
3.1 Network Modeling and Assumptions
3.2 Problem Statements

Chapter 4: Index Design
4.1 Dual-index Structure Design
4.2 The Minimum and Maximum Bounds on the Number of Cells Overlapping with an Edge

Chapter 5: Snapshot Query Processing
5.1 Snapshot Range Query Algorithm
5.2 Snapshot k Nearest Neighbor Query Algorithm

Chapter 6: Performance Analysis
6.1 The Number of Overlapping Cells with regard to a Range Query
6.2 Analysis of MNDR
6.3 Analysis of MKNN

Chapter 7: Continuous Query Processing
7.1 Data Structure Design
7.2 Continuous Range Query Algorithm
7.2.1 Initial Result Computation in C-MNDR
7.2.2 Monitoring Object Updates in C-MNDR
7.2.3 Query Object Update Processing in C-MNDR
7.2.4 Overview of the Continuous Range Query Processing
7.3 Continuous k Nearest Neighbor Query Algorithm
7.3.1 Initial Result Computation in C-MKNN
7.3.2 Monitoring Object Updates in C-MKNN
7.3.3 Overview of the Continuous kNN Query Processing

Chapter 8: Experimental Evaluation
8.1 Simulator Implementation
8.2 Snapshot Query Processing Simulation Results
8.3 Continuous Query Processing Simulation Results
8.3.1 Object Update Cost in MOVNet
8.3.2 Connecting Vertex Distribution in MOVNet
8.3.3 Performance Study of C-MNDR
8.3.4 Performance Study of C-MKNN

Chapter 9: Conclusion and Future Work

Appendix A: Distributed Continuous Range Query Processing on Moving Objects
A.1 Introduction
A.2 Related Work
A.3 System Design and Components
A.3.1 System Infrastructure and Assumptions
A.3.2 Server Design
A.3.2.1 Service Zones
A.3.2.2 Grid Index
A.3.3 Query Processing on Moving Objects
A.4 Experimental Evaluation
A.4.1 Simulator Implementation
A.4.2 Simulation Results
A.5 Conclusions and Future Directions

References

List of Tables

1.1 Summary of symbolic notations
2.1 Properties of several query processing methods
7.1 Running steps of an example of C-MKNN
8.1 Snapshot query processing simulation parameters
8.2 Continuous query processing simulation parameters
A.1 Message types in query processing

List of Figures

1.1 The differences of kNN query results when applying the network distance versus the Euclidean distance.
3.1 An example of a road network and its modeling graph.
4.1 An example of a road network and its modeling graph data structure.
4.2 An example network indexed by the grid index and its data storage.
4.3 Computing the length of edges with regard to the number of affected cells.
5.1 Mobile Network Distance Range query example.
5.2 Mobile Network Distance k-NN query example.
6.1 The relationship between the number of overlapping cells with regard to a range query.
7.1 An example network indexed by a 2×2 grid index.
7.2 An example of the data structure of C-MNDR.
7.3 The distances of vertices in an example of initial continuous range query processing.
7.4 An example of the SD-tree.
7.5 The update of the SD-tree when the query point moves to an edge that is recorded in the original SD-tree.
7.6 The transformation of the SD-tree when q moves to an edge that is not recorded in the original SD-tree.
7.7 The update of the SD-tree when the query point moves to an edge that is not recorded in the original SD-tree.
7.8 The distances of expanded vertices and the SD-tree for an example of C-MKNN.
7.9 The change of the SD-tree with regard to object updates in C-MKNN.
8.1 The CPU time of update cost as a function of POIs.
8.2 The performance of MNDR as a function of the number of cells.
8.3 The performance of MNDR as a function of POIs.
8.4 The performance of MNDR as a function of range.
8.5 The CPU time improvement of using Corollary 1.
8.6 The performance of MKNN as a function of the number of cells.
8.7 The performance of MKNN as a function of k.
8.8 The performance of MKNN as a function of POIs.
8.9 The portion of CPU time used in Algorithm 1 and graph construction.
8.10 The portion of CPU time used in Algorithm 1 and graph construction.
8.11 The CPU time of the update cost in MOVNet compared to S-GRID.
8.12 The distribution of connecting vertices as a function of the number of cells.
8.13 The performance of initial query result processing in C-MNDR as a function of the number of cells.
8.14 The performance of initial query result processing in C-MNDR as a function of POIs.
8.15 The performance of initial query result processing in C-MNDR as a function of query range.
8.16 The cost of query updates in C-MNDR as a function of the number of cells.
8.17 The performance of query updates in C-MNDR as a function of the number of POIs.
8.18 The performance of query updates in C-MNDR as a function of range.
8.19 The performance of query updates in C-MNDR as a function of the number of queries.
8.20 The performance of query updates in C-MNDR as a function of the percentage of object updates.
8.21 The overall performance of C-MNDR.
8.22 The performance of initial query result processing in C-MKNN as a function of the number of cells.
8.23 The performance of initial query result processing in C-MKNN as a function of POIs.
8.24 The performance of initial query result processing in C-MKNN as a function of k.
8.25 The performance of query updates in C-MKNN as a function of the number of cells.
8.26 The performance of query updates in C-MKNN as a function of the number of POIs.
8.27 The performance of query updates in C-MKNN as a function of k.
8.28 The performance of query updates in C-MKNN as a function of the number of queries.
8.29 The performance of query updates in C-MKNN as a function of the percentage of object updates.
8.30 The overall performance of C-MKNN.
9.1 Algorithms in MOVNet that are best fit for various system requirements.
A.1 The system infrastructure.
A.2 An example of the system with 7 servers and their service zone identifier (SID) tree.
A.3 Service zones and grid cells.
A.4 The grid index.
A.5 Number of grid index entries analysis.
A.6 The number of grid index entries as a function of the number of queries.
A.7 The server and mobile communication cost as functions of the number of queries.

Abstract

Recently, the usage of various GPS devices (e.g., car-based navigation systems and handheld PDAs) has become very popular in many areas, especially urban ones. More and more people carry these devices on the go. Moreover, since wireless connectivity is embedded into these devices, the end users are able to report their positions while using the service and hence themselves become Points Of Interest (POIs) during query processing. With the rapidly growing number of users willing to subscribe to various location-based services, designing novel systems that support a very large number of users is attracting intense interest in the research community. Specifically, the mobility that is made possible by these portable GPS devices in metro cities results in two fundamental requirements when implementing location-based services: road network-based distance computation and the capability to process moving objects as POIs.
It is highly desirable to design a novel infrastructure that (i) efficiently manages object location updates and (ii) provides fast distance computation with regard to the connectivity of the underlying network. To the best of our knowledge, there have been few studies that address both challenges at the same time.

In this dissertation, we first discuss the requirements and challenges of designing a scalable and flexible location-based service on moving POIs in metro cities. We propose a novel system to process location-based queries on MOVing objects in road Networks (MOVNet, for short). In its current form, MOVNet utilizes a dual-index structure in which an on-disk R-tree stores the network connectivity and an in-memory grid index maintains moving object position updates. In addition, a technique to speedily compute the overlapping grid cells in the network is proposed to relate these two indices. Moreover, given an arbitrary edge in the space, we analyze the minimum and maximum number of grid cells that can possibly be affected, which can be effectively used to prune the search space during query processing. Based on these features, we first propose algorithms for snapshot network-distance-based range queries and k nearest neighbor queries to support mobile location-based services. Next, we extend the functionality of MOVNet to support continuous query processing by introducing the concept of a Shortest-Distance-based Tree (SD-tree, for short). We illustrate that the network connectivity and distance information can be preserved and reused by the SD-tree when the query point moves to a new location, hence reducing the update cost for continuous queries. We demonstrate via theoretical analysis and experimental results that MOVNet yields excellent performance in various networks and with a very large number of moving objects.

Chapter 1: Introduction

1.1 Motivation

In recent years, sales of portable Global Positioning System (GPS) devices have been booming.
According to the sales statistics report released by Canalys [Can07] in March 2007, over 2,867,820 portable GPS devices were shipped in the United States in 2006, an increase of 269% over the year 2005. An example of these latest portable GPS devices is the Garmin Nüvi [Nüv07], which can achieve an accuracy of less than 3 meters with the support of WAAS [Adm] and store a full-coverage map of the United States with tens of thousands of stationary POIs (e.g., restaurants, gas stations, etc.). Moreover, by mandated regulations of the Federal Communications Commission (FCC), all wireless carriers in the country must meet certain criteria for supporting location-based services [Com96]. The mandate requires 95% of handsets to be resolved within 300 meters for network-based tracking (e.g., wireless base station triangulation) and 150 meters for handset-based tracking (e.g., GPS triangulation). To meet the above criteria, some of the latest cell phones are integrated with GPS receivers. An example of these products is the gpsOne chipset from Qualcomm [Qua], which allows the cell phone to relay the location of the user during a phone call.

With the widespread use of these latest GPS devices, as well as emerging wireless technologies such as 3G and WiMAX [WiM] that provide wider and more stable communications, more and more people are willing to subscribe to location-based services. Various novel applications, such as road-side assistance, location-based games, and location-aware advertisements, are becoming popular in many areas, especially urban ones. This has intensified research interest in overcoming the inherent challenges of designing scalable and efficient infrastructures to support very large numbers of users concurrently.
The mobility that is made possible by the usage of car-based or handheld GPS devices in metro cities results in two fundamental system requirements: distance computations based on a (road) network and processing of moving POIs. Some early techniques for spatial data processing use the Euclidean distance computation. However, the capability of network-based distance computation is appealing when deploying a location-based service in an area with a dense road network. It provides improved precision in terms of the actual distance, and hence the answers to the queries become more realistic. As an example to demonstrate the differences between network-based and Euclidean-based distance computation, let us assume that object m1 in Figure 1.1 wants to find its three nearest neighbors. The answer set is m3, m4, and m2 in increasing order of the Euclidean distance, while it is m4, m2, and m3 in increasing order of the network distance. Clearly, the Euclidean distance cannot capture the actual path length in the network during the distance computation; hence, network-based distance computation is required for many applications.

Figure 1.1: The differences of kNN query results when applying the network distance versus the Euclidean distance.

Moreover, the integration of wireless communication capabilities with GPS devices enables end users or vehicles to report their positions on the go and hence themselves become POIs during query processing. Although some of the latest standalone GPS devices come with abundant internal memory to store tens of thousands of stationary POIs, managing a very large number of moving POIs is challenging due to the overhead of location updates. Therefore, a centralized server is usually introduced to maintain the dynamic information and offer query processing capabilities to the end users.
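The contrast between the two distance measures can be reproduced with a small sketch. The coordinates and edge lengths below are hypothetical values chosen to mirror the Figure 1.1 scenario, not data taken from the figure:

```python
import heapq
import math

def dijkstra(adj, src):
    """Shortest-path (network) distances from src over a weighted adjacency dict."""
    dist = {src: 0.0}
    heap = [(0.0, src)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, math.inf):
            continue
        for v, w in adj.get(u, []):
            if d + w < dist.get(v, math.inf):
                dist[v] = d + w
                heapq.heappush(heap, (d + w, v))
    return dist

# Hypothetical positions and road segments (undirected, weight = segment length).
coords = {"m1": (0, 0), "m2": (4, 0), "m3": (1, 2), "m4": (3, 1)}
adj = {
    "m1": [("m4", 4.0), ("m2", 5.0)],
    "m2": [("m1", 5.0), ("m4", 1.5)],
    "m3": [("m4", 6.0)],
    "m4": [("m1", 4.0), ("m2", 1.5), ("m3", 6.0)],
}

euclid = lambda a, b: math.dist(coords[a], coords[b])
net = dijkstra(adj, "m1")

by_euclid = sorted(["m2", "m3", "m4"], key=lambda m: euclid("m1", m))
by_network = sorted(["m2", "m3", "m4"], key=lambda m: net[m])
print(by_euclid)   # → ['m3', 'm4', 'm2']  (straight-line order)
print(by_network)  # → ['m4', 'm2', 'm3']  (road-network order)
```

Here the detour through the network reverses the ranking: m3 is closest as the crow flies but farthest by road, exactly the kind of discrepancy the text describes.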
A number of recently proposed techniques incorporate POI mobility or network-distance processing, but often not both. Several techniques [CC05, KS04b, PZMT03] strive to solve location-based (Continuous) k Nearest Neighbor (kNN/C-kNN) queries in spatial networks. These methods assume that the positions of POIs are fixed (e.g., gas stations or bus stops). Some prior work has focused on providing the functionality for moving POI processing [MHP05, XMA05, YPK05]. Specifically, these techniques aim to continuously monitor a set of moving nearest neighbors. However, one of the limitations of these methods is their reliance on Euclidean distance measures. On the other hand, there is an increasing number of applications that require query processing of moving POIs based on an underlying network. For example, a visitor may desire to find the three nearest available taxis on the road and call for service. When a pedestrian calls for emergency assistance, the call center may want to locate all police cars within a five-mile distance and dispatch them to the call-originating location. To the best of our knowledge, there have been few studies on supporting location-based query processing on moving POIs in road networks.

1.2 Overview

The main challenges when supporting POI mobility on an underlying road network are to (a) efficiently manage object location updates and (b) provide fast network-distance computations. To cope with these challenges, we propose our novel design of a location-based system on MOVing objects in road Networks (MOVNet). MOVNet aims to support various query types (e.g., snapshot queries, continuous queries, range queries, and k nearest neighbor queries) with the capability of network distance computation on moving objects. MOVNet is a centralized solution in which a server manages location updates of various objects and processes location-based query requests from end users.
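As a sketch of how such a server might absorb frequent location updates, the following minimal in-memory grid index hashes each object into a cell and touches at most two cells per update. The class and method names are our own illustration, not MOVNet's actual implementation:

```python
from collections import defaultdict

class GridIndex:
    """Minimal in-memory grid index for moving-object updates (illustrative
    sketch only; the cell side length and update protocol are assumptions)."""

    def __init__(self, cell_size):
        self.cell = cell_size
        self.cells = defaultdict(set)   # (col, row) -> ids of objects in that cell
        self.where = {}                 # object id -> its current (col, row)

    def _key(self, x, y):
        return (int(x // self.cell), int(y // self.cell))

    def update(self, obj, x, y):
        """Record obj's new position; moves it between cells only when needed."""
        new = self._key(x, y)
        old = self.where.get(obj)
        if old == new:
            return                       # same cell: nothing to rehash
        if old is not None:
            self.cells[old].discard(obj)
        self.cells[new].add(obj)
        self.where[obj] = new

    def objects_in_cell(self, col, row):
        return set(self.cells.get((col, row), ()))

idx = GridIndex(cell_size=100.0)
idx.update("taxi7", 130.0, 250.0)   # lands in cell (1, 2)
idx.update("taxi7", 410.0, 260.0)   # moves to cell (4, 2)
print(idx.objects_in_cell(4, 2))    # → {'taxi7'}
```

The constant-time cell lookup is what makes a grid attractive for highly dynamic data, in contrast to tree structures that must be rebalanced under frequent updates.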
Moving POIs, such as pedestrians and cars, are equipped with GPS devices and periodically report their locations to the central server via the cellular-based wireless network. Additionally, a number of the mobile users submit query requests that are based on their locations. To cope with these queries efficiently, MOVNet combines an on-disk R*-tree [BKSS90] structure that stores the connectivity information of the road network with an in-memory grid index that efficiently processes moving object position updates. The inherent motivation for such a dual-index structure is that R-tree-based structures have been widely studied for efficiently handling large stationary spatial data sets, while the grid index has been verified as well suited for managing dynamic spatial data. A feature of MOVNet is the bi-directional mapping between the two indices, which enables the retrieval of a minimal set of data for query processing. We illustrate that, by using this bi-directional mapping, the two indices can be quickly related during query processing. Based on this infrastructure, we propose algorithms to execute snapshot range queries as well as snapshot kNN queries to illustrate the scalability and flexibility of MOVNet. Moreover, we introduce the concept of affected cells, the set of grid cells overlapping with a given edge, and we provide an efficient algorithm to compute these cells. We analyze the geometric relationship between the affected cells and the edges of the network. We compute the minimum and maximum number of cells affected by an arbitrary edge in the network, which enables the pruning of the search space during mobile range query processing. In the mobile kNN query algorithm, we utilize the concept of a progressive probe into the grid index to estimate a subspace enclosing the result set, which results in a substantial improvement of the system throughput.
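One simple way to enumerate the cells overlapped by an edge, offered here as an illustrative method rather than the dissertation's actual algorithm, is to split the segment at every grid-line crossing and record the cell containing each resulting piece:

```python
import math

def affected_cells(p1, p2, l):
    """Grid cells (column, row) overlapped by the segment p1-p2 on a uniform grid
    with cell side length l. Illustrative sketch: split the segment at every
    grid-line crossing, then take the cell of each piece's midpoint."""
    (x1, y1), (x2, y2) = p1, p2
    ts = {0.0, 1.0}
    for lo, hi, d in ((x1, x2, x2 - x1), (y1, y2, y2 - y1)):
        if d != 0:
            for k in range(math.floor(min(lo, hi) / l) + 1, math.ceil(max(lo, hi) / l)):
                ts.add((k * l - lo) / d)   # parameter where a grid line is crossed
    cells = set()
    ts = sorted(t for t in ts if 0.0 <= t <= 1.0)
    for a, b in zip(ts, ts[1:]):
        t = (a + b) / 2                    # midpoint of a piece lies inside one cell
        cells.add((int((x1 + t * (x2 - x1)) // l), int((y1 + t * (y2 - y1)) // l)))
    return cells

print(sorted(affected_cells((0.5, 0.5), (2.5, 1.5), 1.0)))
# → [(0, 0), (1, 0), (1, 1), (2, 1)]
```

A diagonal edge that spans two columns and one row of unit cells touches four cells here, which is consistent with the intuition behind the minimum and maximum bounds analyzed in Chapter 4.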
The continuous query is one of the most complicated query types in location-based services due to its expensive consumption of memory and computational resources. However, it provides a prolonged perspective on the change of object movements and hence is well suited for monitoring purposes. In the second part of the dissertation we extend the functionality of MOVNet to support continuous queries. Specifically, we introduce the concept of a connecting vertex in each grid cell. We demonstrate that pre-computation can be used in MOVNet so that a corresponding distance table can be created in each grid cell to speed up range query processing. We also present our design of a Shortest-Distance-based tree (SD-tree, for short) that preserves the network connectivity and distance information from snapshot query processing. We propose a novel algorithm that rotates, truncates, and extends the edges of an SD-tree with regard to the query point movement. This algorithm facilitates continuous query processing by avoiding re-computation of the network connectivity and distance information when the query point moves to a new position. Based on these techniques, we propose algorithms to process continuous range queries and continuous kNN queries.

1.3 Contributions

The contributions of our design of MOVNet are:

• To the best of our knowledge, MOVNet is the first dual-index design that utilizes an R-tree to store network connectivity and a grid index to support location updates from moving POIs. We demonstrate that these two indices can be seamlessly correlated through a bi-directional mapping mechanism; at the same time, this method offers substantial flexibility and scalability.

• We propose snapshot range query and k nearest neighbor query algorithms to illustrate the functionality and performance of MOVNet. For comparison, we devised two baseline algorithms that represent traditional techniques.
The simulation results consistently show that our algorithms outperform the baseline algorithms. Additionally, we compared our snapshot kNN query algorithm with an existing work, which showed that our design is more efficient in various settings.

• We analyze the geometric properties of the minimum and maximum number of cells affected by an arbitrary edge. We demonstrate that the maximum bound can be effectively used to enhance the performance of the range query algorithm. Additionally, in the kNN query algorithm our technique of a progressive probe significantly improves the system throughput.

• We introduce a novel data structure, the SD-tree, to preserve the network connectivity and distance information. We show that under certain conditions we can infer network connectivity and distance information even when the query point moves to a new position. We use this data structure to monitor the space affected by the query and to update the result with regard to the query point movements. We illustrate that such a technique facilitates continuous query processing.

• We propose algorithms that support continuous query processing in MOVNet. We verify the performance of MOVNet on continuous queries through rigorous experiments. We illustrate that MOVNet is very efficient at continuous query processing with a large number of moving POIs in metro road networks.

Our MOVNet infrastructure can be used in many applications, such as:

Cellular applications: MOVNet can manage a large number of moving POIs. With the capability of network-based distance computation, a large number of mobile users in a metro city may enjoy the services at the same time.

Fleet management: Many users want to find available taxis when walking on the road. MOVNet enables the service center to receive these requests, locate the nearest available taxis from the caller's location, and dispatch one of them to the caller with a very short system latency.
Enhanced-911: MOVNet supports end users calling for emergency service. The 911 center is able to quickly locate the position of the caller and keep track of that position during the phone call. Based on this information, the 911 center can dispatch rescue teams to the call-originating location. Additionally, the 911 center can monitor the spatial positions of the caller and the rescue teams, and respond quickly to optimize the emergency handling.

1.4 Outline

The remainder of this dissertation is organized as follows. In Chapter 2 we review the related work. Chapter 3 discusses our assumptions and problem statements. In the following Chapter 4 we describe our design of a dual-index structure, algorithms that relate the on-disk R-tree structure and the in-memory grid index during query processing, and the minimum and maximum bounds on the number of cells overlapping with an arbitrary edge in the space. Based on our dual-index design, we propose algorithms to process snapshot mobile network distance range queries and k nearest neighbor queries in Chapter 5. In Chapter 6 we discuss the number of overlapping grid cells with regard to an arbitrary range query in the space. Moreover, we study the complexity of our snapshot range query and k nearest neighbor query algorithms. We extend the functionality of MOVNet to support continuous query processing in Chapter 7. We rigorously verify the performance of MOVNet and present the simulation results in Chapter 8. We present our future research directions for MOVNet in Chapter 9.

1.5 Terminology Definitions

Before we proceed, we introduce useful symbolic notations for the purpose of mathematical clarity. Table 1.1 briefly presents all the symbols used in this thesis and their meanings.
Table 1.1: Summary of symbolic notations

Symbol            Meaning
G                 a directional weighted graph representing a road network
v                 a vertex in the network
V                 the set of vertices in the network
e(v1, v2)         an edge in the network
E                 the number of edges in the network
E                 the set of edges in the network
m                 a moving object in the network
M                 the number of moving objects in the network
M                 the set of moving objects in the network
length(e)         the length of edge e
loc_t(m)          the location of object m at time t
loc(m)            the location of object m at the current time
(x_m, y_m)        the coordinates of object m
q                 the query point
t                 a time stamp
Δt                the time interval of location updates
dist_t(m1, m2)    the network distance from m1 to m2 at time t
dist(m1, m2)      the network distance from m1 to m2 at the current time
dist_t(e, m1)     the network distance from the starting vertex of e to m1 at time t
dist(e, m1)       the network distance from the starting vertex of e to m1 at the current time
l                 the side length of a grid cell
n                 the number of cells overlapping with an edge
c(column, row)    the column and row index of a grid cell
d                 the range of a range query
L_i               the ith level of grid cells in the progressive probe

Chapter 2: Background and Related Work

2.1 Global Positioning System (GPS)

The invention of GPS has had a huge influence on modern navigation systems. GPS was developed by the U.S. Department of Defense in the mid-1980s. Since it became fully functional in 1994, GPS has been acting as the backbone of modern navigation systems around the world. The GPS consists of a constellation of 24 satellites in circular orbits at an altitude of 20,200 kilometers [Lan]. Each satellite circles the Earth twice a day. Furthermore, there are six orbital planes with four satellites in each plane. The orbits were designed so that at least four satellites are always within line of sight from most places on the Earth.
Each GPS satellite repeatedly broadcasts radio signals traveling by line of sight, meaning that they will pass through air but will not penetrate most solid objects. Specifically, GPS signals contain three pieces of information [HWLC94]: a pseudo-random sequence, ephemeris data, and almanac data. The pseudo-random sequence identifies which satellite is transmitting the signal. Ephemeris data allows the GPS receiver to determine the location of GPS satellites at any time throughout the day. Almanac data consists of information about the satellite status and the current time from the on-board atomic clock of the satellite.

The GPS receiver calculates its location based on GPS signals using the principle of trilateration [Ken02]. First, the GPS receiver calculates its distance to a GPS satellite as the timing signal's transmission delay from the satellite to the receiver multiplied by the speed of radio signals. After measuring its distance to at least four satellites, the GPS receiver calculates its current position at the intersection of four abstract spheres, one around each satellite, each with a radius equal to the distance from that satellite to the GPS receiver.

The accuracy of GPS can be affected by atmospheric conditions (e.g., the ionosphere and troposphere) as well as by reflections of the radio signal off the ground and the structures surrounding a GPS receiver. The normal GPS accuracy is about 30 meters horizontally and 52 meters vertically at the 95% probability level. To improve the accuracy of GPS, the Wide Area Augmentation System (WAAS) [LWE+95] has recently been widely embedded in GPS devices. WAAS uses 25 ground reference stations across the United States to receive GPS signals and calculate correction messages. The correction messages are uploaded to a geosynchronous satellite and then broadcast from the satellite on the same frequency as GPS to the receivers. As of 2006, the WAAS system only works for North America.
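The trilateration principle can be illustrated with a simplified two-dimensional sketch. Real GPS solves the three-dimensional problem and needs the fourth satellite to absorb the receiver's clock error; the anchor coordinates below are made-up values:

```python
import math

def trilaterate_2d(anchors, dists):
    """Position from distances to three known anchors (2-D sketch of trilateration;
    real GPS solves the 3-D case with a fourth satellite for the clock bias)."""
    (x1, y1), (x2, y2), (x3, y3) = anchors
    r1, r2, r3 = dists
    # Subtracting the circle equations pairwise cancels the quadratic terms,
    # leaving a 2x2 linear system in the unknown position (x, y).
    a1, b1 = 2 * (x2 - x1), 2 * (y2 - y1)
    c1 = r1**2 - r2**2 + x2**2 - x1**2 + y2**2 - y1**2
    a2, b2 = 2 * (x3 - x2), 2 * (y3 - y2)
    c2 = r2**2 - r3**2 + x3**2 - x2**2 + y3**2 - y2**2
    det = a1 * b2 - a2 * b1          # Cramer's rule for the 2x2 system
    return ((c1 * b2 - c2 * b1) / det, (a1 * c2 - a2 * c1) / det)

# Simulate exact range measurements from a known true position.
anchors = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)]
true_pos = (3.0, 4.0)
dists = [math.dist(a, true_pos) for a in anchors]
print(trilaterate_2d(anchors, dists))  # → (3.0, 4.0)
```

In practice the measured ranges are noisy, so a receiver solves an over-determined version of this system by least squares rather than the exact intersection shown here.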
Most positioning signal receivers are designed for use with the GPS system. These devices have been manufactured in a wide variety of form factors for different purposes, from devices integrated into cars, PDAs, and phones, to dedicated devices such as handheld GPS receivers. The most popular variants are used in car-based navigation systems that visualize the position information calculated from GPS signals to locate an automobile on a road retrieved from a map database. In these car-based systems, the map database usually consists of vector information for some area of interest. Streets and points of interest are encoded and stored as geographic coordinates. The client is able to find a desired place by searching by address, name, or geographic coordinates. The map database is usually stored on removable media, such as a CD or flash memory. A common approach is to have a base map permanently stored in the ROM of the GPS device; additional detailed information about areas of interest can later be downloaded from a CD or online by the user.

2.2 Euclidean-distance-based Spatial Query Processing on Stationary POIs

Spatial query processing has been extensively studied and widely used in various applications. Early on, the research community focused on processing spatial queries in the Euclidean-distance space on stationary POIs. Several query types, such as kNN queries and range queries, have been comprehensively studied. For kNN query processing, the R-tree and its variants [Gut84, BKSS90, SRF87] are prevalent as the index structures. For example, branch-and-bound depth-first algorithms (e.g., [FN75, RKV95]) were proposed to find the nearest neighbors of a query point. In contrast, Hjaltason et al. [HS99] proposed an approach that reduces the number of visits to R-tree nodes by proceeding in a best-first manner. Moreover, Korn et al. [KSF+96] presented a multi-step kNN algorithm. As an improvement, Seidl et al.
[SK98] proposed an optimal version that evaluates ranking queries on the index structure incrementally. As for range (window) queries, the R-tree family as well as the Quadtree family [AA01, Cla83, FB74] can support efficient query processing. In general, a minimum bounding rectangle (MBR) is defined as the query window and a tree traversal is performed; the query processing only visits tree nodes that overlap with the query window. In the mobile environment, users are able to submit location-based queries (i.e., the coordinates of the query point are the current location of the user). Therefore, the mobility of the query point becomes an issue, which also enables continuous query processing. Sistla et al. [SWCD97] first addressed the importance of processing this type of query; however, no applicable approach was proposed. Indeed, the first technique for processing location-based queries on stationary POIs with Euclidean distances was proposed by Zheng et al. [ZL01]. In this work, Voronoi regions [OBSC00] of the POIs are pre-computed and stored in an R-tree. When a user submits a nearest neighbor query, the Voronoi diagram is used to compute the nearest neighbor. In addition, the server computes a conservative approximation of the validity time of the result, assuming a maximum speed of the query point. In other words, the estimated validity time predicts when the query point will cross the closest boundary of the Voronoi region. This work only deals with first-NN queries, as generating k-order Voronoi diagrams is very complicated, if not impossible. In contrast, Song et al. [SR01] utilized the property that if the server returns the m nearest objects, with m > k, from the query point, the kNN result remains among these m objects as long as the query point moves within a constrained area. This work can be applied to an arbitrary k, though a good estimate of m is difficult to find in practice. Later, Tao et al.
[TPS02] addressed the issue of continuous nearest neighbor search. Assuming that the query point moves along a line segment, this work generates the set of split points along the segment at which the query result changes, and consequently produces the kNN result for each sub-segment bounded by the split points. As an improvement, time-parameterized queries [TP02] only assume a constant velocity of the query point while providing an expiry time and the location that causes the expiration of the current result. However, this technique is still based on the assumption that the future location can be predicted from the current movement of the query point, which may not be realistic for many applications. To solve this problem, Zhang et al. [ZZP+03] proposed a framework aimed at solving kNN as well as range queries without assumptions about the movement of the query point. Specifically, this technique calculates the validity region and the influence set of the query at the current time and returns them to the query point. It is then up to the query point to decide when to re-submit the query request based on its new location.

2.3 Network-distance-based Spatial Query Processing on Stationary POIs

The issue of processing spatial queries on stationary POIs with network distances has been intensively studied in recent years. Papadias et al. [PZMT03] first presented an architecture with disk-based data storage that integrates network and Euclidean information in processing network-based queries. Specifically, their technique is based on the ideas of Euclidean restriction and network expansion. The concept of Euclidean restriction is that for two arbitrary objects in a network, the network distance is at least as large as the Euclidean distance between them, which can be regarded as a lower bound in the distance computation.
In contrast, the network expansion method performs the query directly from the query point by expanding to nearby vertices in order of their distances from the query point. Consequently, the authors proposed algorithms for kNN and range query processing in each of the two manners. As an improvement, the VN3 approach [KS04b] was proposed as a Voronoi-based technique that partitions a large network into a set of small Voronoi regions. The goal is to avoid online distance computation when processing kNN queries by pre-computing the distances within and across Voronoi regions. Moreover, Huang et al. [HJS05] addressed the same problem by introducing the islands approach, which estimates the overhead of pre-computation and the trade-off between query and update performance for kNN queries with varying densities of POIs and networks. As another example, Sankaranarayanan et al. [SAS05] proposed a technique that decomposes the network space and indexes it with a Quadtree to compute kNN objects. Recently, Samet et al. [SSA08] proposed a novel technique based on pre-computation and distance encoding. Specifically, the shortest paths between all pairs of vertices in the network are collected, and the query processing for kNN objects is reduced to a search in the encoded subspace. To cope with continuous kNN (C-kNN) queries on stationary POIs in a network, Kolahdouzan et al. [KS04a] proposed the Intersection Examination and Upper Bound Algorithm (IE/UBA), which can be regarded as the counterpart of VN3 for C-kNN query processing; it computes the kNN objects of all nodes on the path and the split points between adjacent nodes whose nearest neighbors differ. Later, Cho et al. [CC05] solved the same problem by introducing UNICONS, which incorporates precomputed kNN lists into Dijkstra's algorithm. The simulation results showed that it outperforms the IE/UBA approach on dense networks. Another interesting study was presented by Shekhar et al.
[SY03], addressing the issue of finding the in-route nearest neighbor such that the detour from the original route on the way to the destination remains the smallest. The results showed that a pre-computing approach, which partitions the space into a group of Voronoi-based zones centered at the POIs, is very efficient with regard to the map size and the route length. In short, these works rely on the assumption that the POIs are static. Therefore, the idea of pre-computing distances between POIs and vertices is widely used and has proven efficient. On the other hand, these works cannot be applied to a dynamic environment where the POIs are constantly moving.

2.4 Index Design on Moving POIs

A large number of spatial applications require the capability to process moving POIs. This requirement raises the issue of managing the location updates of moving POIs in an index structure. Although tree-based index structures (e.g., the R-tree and its variants [Gut84, BKSS90, SRF87]) have been widely used for managing stationary spatial data, they suffer from the expensive overhead of node reconstruction when dealing with location updates. To overcome this challenge, using the trajectories of moving POIs to presume the movement of objects has been adopted in R-tree-based structures (e.g., the TPR-tree and its variants [TPS03, SJLL00]), B-tree-based structures (e.g., the Bx-tree [JLO04]), and Quadtree-based structures [TUW98]. In general, these works assume that the movement of a POI can be represented as a linear function of time; changing the velocity vector of a moving POI consequently invokes an update of its movement function. As an alternative, STRIPES [PCC04] introduces the idea of transforming the trajectories of objects in d-dimensional space into points in a dual space. However, the assumption of being able to predict the trajectories of moving objects is not always realistic.
If the prediction of object movements fails (e.g., pedestrians strolling in a shopping mall), these approaches are inappropriate. Hence, the Lazy-Update R-tree (LUR-tree) [KLL02] and its extension [LHJ+03] modify the original R-tree design to support frequent updates with no restriction on object movement. Alternatively, by assuming that objects move on a fixed network, Frentzos proposed the Fixed Network R-tree (FNR-tree) [Fre03], a 2-tier R-tree infrastructure in which a forest of 1-D R-trees indexing the moving objects on road segments sits on top of a 2-D R-tree indexing the network structure. Recently, grid-based index structures have attracted intensive interest due to their simplicity and efficiency in managing moving objects. For instance, Xiong et al. proposed LU-Grid [XMA06], an update-tolerant on-disk grid index that outperforms the LUR-tree in terms of update and query costs. Accordingly, most recent works leverage either an in-memory grid index [CAA03, GL04, MHP05, YPK05] or an on-disk grid index [XMA05] for spatio-temporal processing. Another notable technique was proposed by Hu et al. [HLL06], who introduced the concept of a distance signature that specifically focuses on indexing the network distance between objects by partitioning the distances into categories to prune the search space. However, this work assumes a network with long edges, which does not apply to our data sets. Consequently, MOVNet utilizes an in-memory grid index to manage the location updates of moving POIs.

2.5 Euclidean-distance-based Spatial Query Processing on Moving POIs

Many research works have addressed the issue of spatial query processing on moving POIs with Euclidean distances. Most of these works rely on a grid index to maintain the position updates. For instance, Chon et al. [CAA03] first presented an algorithm, based on the trajectories of moving POIs overlapping the grid cells, to solve snapshot range queries and kNN queries.
By introducing computation capabilities on the mobile client side, MobiEyes [GL04] proposed a distributed infrastructure to process mobile range queries on dynamic POIs. Similarly, Hu et al. [HXL05] proposed a generic framework to handle continuous queries by introducing the concept of a safe region, through which the location updates from the mobile clients can be further reduced. In contrast, SINA [MXA04] and SEA-CNN [XMA05] were introduced as centralized solutions that use the idea of shared execution to process continuous range and kNN queries on moving POIs. Yu et al. [YPK05] proposed an algorithm (referred to as YPK-CNN) for monitoring C-kNN queries on moving objects by defining a search region based on the maximum distance between the query point and the current locations of the previous kNNs. As an enhancement, Mouratidis et al. [MHP05] presented a solution (CPM) for C-kNN queries that defines a conceptual partitioning of the space by organizing grid cells into rectangles. Location updates are handled only when objects fall within the vicinity of queries, thereby improving the system throughput. However, the above techniques do not consider network distance computation, which makes them unsuitable for applications where network-based distances are required.

2.6 Network-distance-based Spatial Query Processing on Moving POIs

For environments where POIs are dynamic and distances are based on network paths, only a few techniques exist. Jensen et al. [JKPT03] addressed the challenge of query processing on moving POIs in a network. Specifically, this work described an abstract infrastructure for handling the location updates of moving POIs in a network and proposed a kNN query algorithm. This work is fundamentally different from MOVNet due to its system assumptions: MOVNet adopts a centralized infrastructure with periodic location updates from moving POIs, while their method assumes that the mobile client is willing to participate in the kNN query processing.
As a centralized alternative, S-GRID [HJLS07] was introduced as a means to process kNN queries; a pre-computed structure is maintained for the spatial network data to improve the efficiency of query processing. Moreover, Mouratidis et al. [MYPM06] addressed the issue of processing C-kNN queries in road networks by proposing two algorithms (namely, IMA/GMA) that handle arbitrary object and query movement patterns in a road network. This work utilizes an in-memory data structure to store the network connectivity; it is therefore undesirable for large networks (e.g., metropolitan areas) due to the memory requirements. Instead, MOVNet uses an on-disk R-tree structure, which has been intensively studied for large 2-D data sets. In brief, Table 2.1 lists the properties of the query processing methods described above compared to MOVNet.

Method    | Query      | Distance  | POI type | Index
----------|------------|-----------|----------|-------------
VN3       | kNN        | Network   | static   | R-tree
Islands   | kNN        | Network   | static   | R-tree
IE/UBA    | C-kNN      | Network   | static   | R-tree
UNICONS   | C-kNN      | Network   | static   | R-tree
MQM       | C-range    | Euclidean | moving   | Grid
MobiEyes  | C-range    | Euclidean | moving   | Grid
YPK-CNN   | C-kNN      | Euclidean | moving   | Grid
SEA-CNN   | C-kNN      | Euclidean | moving   | Grid
CPM       | C-kNN      | Euclidean | moving   | Grid
S-GRID    | kNN        | Network   | moving   | Grid
IMA/GMA   | C-kNN      | Network   | moving   | Quad-tree
MOVNet    | kNN, range | Network   | moving   | Grid, R-tree

Table 2.1: Properties of several query processing methods

Chapter 3
System Assumptions and Problem Statements

In this chapter, we describe our data model of the road network, our system assumptions, and the problem statements.

3.1 Network Modeling and Assumptions

MOVNet focuses on processing moving objects in a road network; we define the road network as follows:

Definition 1. A road network (or network for short) is a directional weighted graph G consisting of a set of edges (i.e., road segments) E and a set of vertices (i.e., intersections and dead ends) V, where E ⊆ V × V.

Definition 2.
For any network G(E, V), each edge e is represented as e(v1, v2), meaning that it connects the two vertices v1 and v2, where v1 and v2 are the starting and ending vertex, respectively, and v1 ≠ v2. Each edge e is associated with a length, given by a function length(e): E → R+, where R+ is the set of positive real numbers.

[Figure 3.1: An example of a road network (a) and its modeling graph (b).]

A network in MOVNet is transformed into a modeling graph in memory. Specifically, graph vertices represent the following three cases: (i) the intersections of the network, (ii) the dead end of a road segment, and (iii) the points where the curvature of a road segment exceeds a certain threshold, so that the road segment is split in two to preserve the curvature property. Although polylines could also be used to represent the edges, we use sets of straight line segments due to the nature of our data set. As a result, the modeling graph is a piecewise approximation of the network. Figure 3.1(a) shows an example road network, and Figure 3.1(b) demonstrates the corresponding modeling graph. Different objects (e.g., cars, taxis, and pedestrians) move along the road segments of a network. These objects are identified as the set of moving objects M. We assume that these objects are always located on the edges at any time. A moving object m is a POI located in the network. For simplicity, if the distance between an object m and its closest edge e is ≤ δ, where δ is a threshold, we assume that m is on e (although other existing map matching techniques could be used). The location of m at time t is defined as loc_t(m) = (x_m, y_m), where x_m and y_m are the x and y coordinates of m at time t, respectively. A query point q is a moving object q ∈ M that issues a location-based spatial query at any time. Note that queries are processed with network distances in MOVNet.
For simplicity, we use the term distance to refer to the network distance in the following sections unless a different metric is explicitly denoted. MOVNet assumes that periodic sampling of the coordinates of moving objects is used to represent their locations as a function of time. This method is also used in [XMA05, YPK05] and provides a good approximation of the positions of moving objects. Our primary goal is to reduce the evaluation cost during query processing. A spatial query submitted by the user at time t1 is computed based on loc_t0(M), where the system took the last snapshot of the moving objects at t0 with t0 ≤ t1 and t1 − t0 < Δt, Δt being a fixed time interval; the result is valid until t0 + Δt. We define the distance between objects and vertices in the network as follows:

Definition 3. The distance function of two moving objects m1 and m2 at time t is dist_t(m1, m2): loc_t(m1) × loc_t(m2) → R+. dist_t(m1, m2) denotes the length of the shortest path from m1 to m2 in the metric of the network distance at time t. For simplicity, we write dist(m1, m2) for the distance function of m1 and m2 at the current time.

Definition 4. The distance function of an edge e(v1, v2) and a moving object m at time t is defined as dist_t(e, m): loc(v1) × loc_t(m) → R+. dist_t(e, m) denotes the length of the shortest path from v1 to m in the metric of the network distance at time t. For simplicity, we write dist(e, m) for the distance function of e(v1, v2) and m at the current time.

3.2 Problem Statements

MOVNet supports the processing of location-based, network-distance-based queries. Using the definitions introduced in Section 3.1, we formalize these queries as follows:

• Given a query point q, a value d, a network G, and a set of moving objects M, a snapshot network-distance-based range query retrieves all POIs of M that are within distance d from q at time t.
The query can be represented as rangeQuery_t(q, d): loc_t(q) × loc_t(M) → {m_i, i = 1, ..., n}, where ∀ m_i, dist_t(q, m_i) ≤ d.

• Given a query point q, a value d, a network G, and a set of moving objects M, a continuous network-distance-based range query retrieves all POIs of M that are within distance d from q during a time period [t1, t2]. The query can be represented as continuousRangeQuery_t(q, d): loc_t(q) × loc_t(M) → {m_i, i = 1, ..., n}, where ∀ m_i, dist_t(q, m_i) ≤ d, t ∈ [t1, t2].

• Given a query point q, a value k, a network G, and a set of moving objects M, a snapshot network-distance-based k nearest neighbor query retrieves the k objects of M that are closest to q according to the network distance at time t. Formally, the query can be represented as kNNQuery_t(q, k): loc_t(q) × loc_t(M) → {m_i, i = 1, ..., k}, where ∀ m_j ∈ M − {m_i}, dist_t(q, m_j) ≥ dist_t(q, m_i).

• Given a query point q, a value k, a network G, and a set of moving objects M, a continuous network-distance-based k nearest neighbor query retrieves the k objects of M that are closest to q according to the network distance during a time period [t1, t2]. Formally, the query can be represented as continuousKNNQuery_t(q, k): loc_t(q) × loc_t(M) → {m_i, i = 1, ..., k}, where ∀ m_j ∈ M − {m_i}, dist_t(q, m_j) ≥ dist_t(q, m_i), t ∈ [t1, t2].

Chapter 4
Index Design

In this chapter, we first describe the design of the dual-index structure in MOVNet. After that, we analyze the minimum and maximum bounds on the number of grid cells that can overlap with an arbitrary edge. We show that the maximum bound can be used to prune the search space during snapshot range query processing in Section 5.1.

4.1 Dual-index Structure Design

The distance between two moving objects highly depends on the lengths of edges and the connectivity of vertices, as well as the current locations of the objects.
To efficiently manage both the stationary network connectivity and the dynamic object position updates, MOVNet utilizes a dual-index structure. First, an on-disk R-tree stores the stationary network data. Second, an in-memory grid index maintains the positions of moving objects. Since MOVNet has two indices for managing the data, it is important to design efficient means to relate these data during query processing. We propose an incremental cell overlapping algorithm that quickly computes the overlapping cells of an arbitrary edge. We elaborate on these features below.

[Figure 4.1: An example of a road network and its modeling graph data structure (vertex list, hashing buckets, edge lists, and vertex coordinates).]

MOVNet utilizes an on-disk R*-tree, which has been intensively studied for handling very large 2-D spatial data, to record the connectivity and coordinates of the vertices of a stationary network. The edges of the network are stored as MBRs bounded by their vertices. Once the edges are retrieved from disk, a corresponding modeling graph is constructed in memory using the following structure. A vertex array stores the coordinates of the vertices of the graph; for each vertex, the array maintains a list recording its outgoing edges. To quickly locate a vertex in the array, MOVNet manages a hash table that maps the coordinates of a vertex to its index in the vertex array. Figure 4.1 illustrates an example modeling graph of the network and its corresponding data representation. A memory-based grid index is used to store the locations of moving objects [YPK05]. Without loss of generality, we assume the service space is a square.
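The in-memory modeling graph just described (vertex array, hash table, and outgoing-edge lists) might be organized as follows. This is a minimal sketch with illustrative names, not MOVNet's actual implementation:

```python
class ModelingGraph:
    """Sketch of the in-memory modeling graph: a vertex array holding
    coordinates, a per-vertex list of outgoing edges, and a hash table
    mapping a vertex coordinate to its index in the vertex array."""

    def __init__(self):
        self.coords = []      # vertex array: index -> (x, y)
        self.out_edges = []   # index -> list of (neighbor index, length)
        self.index_of = {}    # hash table: (x, y) -> index in vertex array

    def add_vertex(self, xy):
        if xy not in self.index_of:
            self.index_of[xy] = len(self.coords)
            self.coords.append(xy)
            self.out_edges.append([])
        return self.index_of[xy]

    def add_edge(self, a_xy, b_xy, length):
        a, b = self.add_vertex(a_xy), self.add_vertex(b_xy)
        self.out_edges[a].append((b, length))

# Build a two-edge fragment and look a vertex up by coordinate.
g = ModelingGraph()
g.add_edge((0.0, 0.0), (3.0, 4.0), 5.0)
g.add_edge((3.0, 4.0), (3.0, 0.0), 4.0)
v = g.index_of[(3.0, 4.0)]
print(v, g.out_edges[v])  # 1 [(2, 4.0)]
```

The hash-table lookup lets edges retrieved from the R*-tree be stitched into one graph without scanning the vertex array.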
We partition the space into a regular grid of cells of size l × l. We use c(column, row) to denote a specific cell in the grid index (the cells are ordered from the bottom left corner of the space). At time t, a moving object m has loc_t(m) = (x_m, y_m); therefore it overlaps with cell c(⌊x_m/l⌋, ⌊y_m/l⌋). Each cell maintains an object list containing the identifiers of the enclosed objects. The objects' coordinates are stored in an object array, and the object identifier is the index into this array.

[Figure 4.2: An example network indexed by the grid index and its data storage (grid cells, the object list of c(5, 5), and the object array).]

Figure 4.2 shows a part of the network of Figure 3.1(b) managed by a grid index with 8 × 8 cells. An example object on e(v2, v4) is enclosed by c(5, 5). Accordingly, the object list of c(5, 5) records the object identifier, and hence we can retrieve the coordinates of the object from the object array. Cells that overlap with the underlying road segments contain the actual data of moving object positions. In general, if the service space is an urban area with a dense network, moving objects can be regarded as uniformly distributed over the grid at a certain granularity. In the extreme case where the moving objects are highly skewed, we can use the hierarchical object index [YPK05] to achieve better load balancing. Given a set of grid cells, retrieving the underlying network can be performed by range queries on the R-tree structure. It is highly desirable to have an algorithm that, for an arbitrary edge, finds the overlapping cells very quickly. Although this is similar to line rasterization algorithms (e.g., Bresenham's algorithm [Bre65]), it is noteworthy that these existing algorithms only obtain an approximation of the overlapping cells (or pixels, in that case).
In contrast, our goal is to compute the complete set of overlapping cells. For this purpose, we devise an incremental algorithm. Before describing our design of the cell overlapping algorithm, we define the concept of affected cells as follows:

Definition 5. Assume that the service space is managed by a grid-based index. We define the set of cells {c1, c2, ..., cn} that an edge e(v1, v2) consecutively overlaps from v1 to v2 as the set of affected cells of e.

For instance, in Figure 4.2, the affected cells of e(v1, v2) are {c(1,6), c(2,6), c(3,6), c(4,6)}. Given an edge e(v1, v2), the coordinates of the vertices v1 and v2 are (x_v1, y_v1) and (x_v2, y_v2), respectively. The set of affected cells of e can be computed with Algorithm 1. We use straight line segments to represent edges in the network. Therefore, any edge e(v1, v2) can be described by a first-degree polynomial function of the form y = m · x + b with x ∈ [x_v1, x_v2]. Algorithm 1 first captures the gradient m and the y-intercept b of an edge (Line 3). After that, it computes the cells overlapping with the starting and ending vertex of the edge, respectively (Lines 4 - 5). The algorithm follows a step-forward approach where, in each step, it moves one cell on the x-axis from the cell overlapping with the starting vertex and calculates the affected cells along the y-axis (Lines 7 - 18). Finally, it terminates once it reaches the cell overlapping with the ending vertex (Lines 19 - 21). The complexity of Algorithm 1 is linear in the length of the edge.
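The step-forward scheme of Algorithm 1 can be sketched in Python. This is an illustrative sketch, not MOVNet's implementation; it assumes a non-vertical edge (a vertical edge would step along y instead):

```python
from math import floor

def affected_cells(x1, y1, x2, y2, l):
    """Cells of an l-by-l grid overlapped by the segment (x1,y1)-(x2,y2),
    following the step-forward-in-x scheme described in the text.
    Assumes x1 != x2."""
    m = (y2 - y1) / (x2 - x1)            # gradient and y-intercept
    b = y1 - m * x1
    cx, cy = floor(x1 / l), floor(y1 / l)
    ex, ey = floor(x2 / l), floor(y2 / l)
    step = 1 if ex > cx else -1
    cells = []
    while cx != ex:                      # advance one column at a time
        boundary_x = (cx + (1 if step > 0 else 0)) * l
        ny = floor((m * boundary_x + b) / l)
        for row in range(min(cy, ny), max(cy, ny) + 1):
            cells.append((cx, row))      # cells crossed in this column
        cx, cy = cx + step, ny
    for row in range(min(cy, ey), max(cy, ey) + 1):
        cells.append((cx, row))          # cells in the final column
    return cells

# The edge e(v1, v2) of Figure 4.2, roughly horizontal across row 6:
print(affected_cells(1.2, 6.5, 4.8, 6.5, 1.0))
# [(1, 6), (2, 6), (3, 6), (4, 6)]
```

On this edge the sketch reproduces the affected cells {c(1,6), c(2,6), c(3,6), c(4,6)} given after Definition 5.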
Our experimental results show that the CPU time used for computing overlapping cells consumes less than 5% of the query processing time under various settings. This indicates that our method is well suited for online computation. More importantly, Algorithm 1 gives MOVNet a means to bi-directionally map between the underlying network and the moving object positions. The query processing algorithms in the following chapters show the flexibility and scalability of this dual-index approach.

Algorithm 1 Compute-affected-cells(e, l)
1: /* e is the edge */
2: /* l is the side length of a cell */
3: m = (y_v2 − y_v1) / (x_v2 − x_v1), b = y_v1 − m · x_v1
4: startX = ⌊x_v1 / l⌋, startY = ⌊y_v1 / l⌋
5: endX = ⌊x_v2 / l⌋, endY = ⌊y_v2 / l⌋
6: cellList = ∅
7: while startX ≠ endX do
8:   if endX > startX then
9:     nextX = startX + 1
10:  else
11:    nextX = startX − 1
12:  end if
13:  nextY = ⌊(m × nextX × l + b) / l⌋
14:  for i = startY to nextY do
15:    cellList = cellList ∪ c(startX, i)
16:  end for
17:  startX = nextX, startY = nextY
18: end while
19: for i = startY to endY do
20:   cellList = cellList ∪ c(endX, i)
21: end for
22: return cellList

4.2 The Minimum and Maximum Bounds on the Number of Cells Overlapping with an Edge

Although the network is indexed by an R*-tree, once the edges are retrieved from disk, a corresponding modeling graph is maintained in memory. Meanwhile, the cells of the grid index overlap with the graph in the same service space. We present an important geometric property of an arbitrary edge overlapping with grid cells.

[Figure 4.3: Computing the length of edges with regard to the number of affected cells.]

Given an edge that is represented as a straight line segment in the space, the relationship between the length of an edge e(v1, v2) and the number of its affected cells can be described as follows.

Lemma 1. Assume that the service space is managed by a grid-based index with a cell size of l × l.
For an edge e(v1, v2) with a set of affected cells {c1, c2, ..., cn}, the maximum length of e is √2 · l · n. The minimum length of e is

  length_min(e) = 0,                                     if 1 ≤ n ≤ 2
  length_min(e) = √( ((n−3)/2)² + ((n−2)/2)² ) · l,      if n ≥ 3

Proof. Without loss of generality, consider an edge e in the service space that overlaps with grid cells as shown in Figure 4.3. Assume the number of affected cells of e is n. Then, for 0 ≤ e_xi ≤ l and 0 ≤ e_yi ≤ l, we have

  length(e) = √( (Σ_{i=0}^{n−1} e_xi)² + (Σ_{i=0}^{n−1} e_yi)² )    (4.1)

We observe that when e_xi = e_yi = l for 0 ≤ i ≤ n − 1, substituting e_xi and e_yi in (4.1) yields the maximum length of e:

  length_max(e) = √( n² · l² + n² · l² ) = √2 · n · l

To compute the minimum length of e, we observe from Figure 4.3 that e_y1 + e_y2 = e_x2 + e_x3 = l, and so on, which can be summarized as

  e_x(2j) + e_x(2j+1) = l,   1 ≤ j ≤ (n−3)/2
  e_y(2k−1) + e_y(2k) = l,   1 ≤ k ≤ (n−2)/2          (4.2)

When n = 1, the minimum length of e is √(e_x0² + e_y0²) = 0, attained with e_x0 = e_y0 = 0. Similarly, when n = 2, the minimum length of e is 0, attained with e_x0 = e_y0 = e_x1 = e_y1 = 0.
When n ≥ 3 and odd, we have

  (Σ_{i=0}^{n−1} e_xi)² = (e_x0 + e_x1 + Σ_{j=1}^{(n−3)/2} (e_x(2j) + e_x(2j+1)) + e_x(n−1))²
  (Σ_{i=0}^{n−1} e_yi)² = (e_y0 + Σ_{k=1}^{(n−2)/2} (e_y(2k−1) + e_y(2k)) + e_y(n−2) + e_y(n−1))²

Using the properties in (4.2), the above equations can be transformed into

  (Σ_{i=0}^{n−1} e_xi)² = (e_x0 + e_x1 + ((n−3)/2) · l + e_x(n−1))²
  (Σ_{i=0}^{n−1} e_yi)² = (e_y0 + ((n−2)/2) · l + e_y(n−2) + e_y(n−1))²

Substituting the corresponding parts of (4.1) with the above equations, we conclude that, if e_x0 = e_x1 = e_x(n−1) = e_y0 = e_y(n−2) = e_y(n−1) = 0,

  length_min(e) = √( ((n−3)/2)² + ((n−2)/2)² ) · l

Similarly, when n ≥ 3 and even, we have

  (Σ_{i=0}^{n−1} e_xi)² = (e_x0 + e_x1 + Σ_{j=1}^{(n−3)/2} (e_x(2j) + e_x(2j+1)) + e_x(n−2) + e_x(n−1))²
  (Σ_{i=0}^{n−1} e_yi)² = (e_y0 + Σ_{k=1}^{(n−2)/2} (e_y(2k−1) + e_y(2k)) + e_y(n−1))²

Using the same properties as in (4.2), we conclude that length_min(e) = √( ((n−3)/2)² + ((n−2)/2)² ) · l for n ≥ 3 and even. We have thus shown that when n ≥ 3, in both the even and the odd case, length_min(e) = √( ((n−3)/2)² + ((n−2)/2)² ) · l.

Lemma 1 states the minimum and maximum bounds on the length of an edge given a fixed number of affected cells. We further deduce from Lemma 1 the maximum and minimum number of affected cells of an arbitrary edge.

Corollary 1. Assume that the service space is managed by a grid-based index with a cell size of l × l. For an edge e(v1, v2), the maximum and minimum numbers of affected cells are (√2 · length(e)) / l + 3 and length(e) / (√2 · l), respectively.

Proof. Lemma 1 tells us that length(e) ≤ √2 · l · n for an edge e(v1, v2); hence we directly deduce that n ≥ length(e) / (√2 · l). Similarly, since length(e) ≥ √( ((n−3)/2)² + ((n−2)/2)² ) · l ≥ √( 2 · ((n−3)/2)² ) · l, we conclude that n ≤ (√2 · length(e)) / l + 3.

To summarize, Corollary 1 provides a precise range on how edges overlap with grid cells.
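The bounds of Lemma 1 and Corollary 1 can be cross-checked numerically: any edge length achievable with n affected cells must fall inside the cell-count range that Corollary 1 derives from that length. A small sketch (function names are ours):

```python
from math import sqrt

def min_length(n, l):
    """Minimum length of an edge whose affected-cell count is n (Lemma 1)."""
    if n <= 2:
        return 0.0
    return sqrt(((n - 3) / 2) ** 2 + ((n - 2) / 2) ** 2) * l

def max_length(n, l):
    """Maximum length of an edge whose affected-cell count is n (Lemma 1)."""
    return sqrt(2) * n * l

def cell_count_bounds(length, l):
    """Corollary 1: (minimum, maximum) number of affected cells."""
    return length / (sqrt(2) * l), sqrt(2) * length / l + 3

# Consistency check over a range of cell counts (small float tolerance).
l = 1.0
for n in range(1, 100):
    for length in (min_length(n, l), max_length(n, l)):
        lo, hi = cell_count_bounds(length, l)
        assert lo <= n + 1e-9 and n <= hi + 1e-9, (n, length)
print("Corollary 1 bounds hold for all n < 100")
```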
As we shall see, this property can be incorporated into the snapshot range query processing to prune the search space.

Chapter 5
Snapshot Query Processing

In this chapter, we present our design of snapshot query algorithms based on our dual-index structure. We begin with snapshot range query processing. Additionally, we illustrate that the maximum bound on the number of cells that overlap with an arbitrary edge, as described in Section 4.2, can be used to prune the search space during query processing. Finally, we propose a snapshot kNN query algorithm by introducing the concept of a progressive probe and leveraging our range query algorithm.

5.1 Snapshot Range Query Algorithm

We propose a Mobile Network Distance Range query algorithm (MNDR) that supports the processing of snapshot location-based, network-distance-based range queries. It executes the following steps. First, we know from the Euclidean distance restriction [PZMT03] that for an object m in a network, dist(q, m) is always larger than or equal to the Euclidean distance from q to m. Therefore, if we perform a Euclidean range query with q as the center and d as the radius, the enclosed sub-network G′ resulting from the query contains all moving objects within distance d from q. The advantage of performing this operation as the first step of MNDR stems from the fact that only the network data in MOVNet is stored on disk: once these data are retrieved from the R-tree and the corresponding modeling graph is created, the remaining steps can be performed in memory. As a comparison, we implemented a baseline algorithm in the simulation based on the idea of network expansion [PZMT03]; the results show that our approach performs better over a wide range of settings. Second, the starting vertex of an edge e(v1, v2) has the property that if dist(q, v1) > d, the affected cells of the edge need not be examined, because any moving object on e has a distance greater than d from q.
Hence, for each vertex in the modeling graph from the first step, MNDR leverages Dijkstra's algorithm [Dij59] to compute its distance from q. In addition, our algorithm avoids unnecessary processing of any edge whose distance from the query point is greater than d, because no object on such edges can be in the result set. Finally, for each edge whose starting vertex has a distance ≤ d, MNDR generates the list of affected cells using Algorithm 1 and retrieves the corresponding moving objects from the grid index. Algorithm 2 details MNDR. When MOVNet receives a range query request from a query object q, it first executes a Euclidean distance range query based on the position of q and the range d (Line 5). After the set of edges and vertices is retrieved, a corresponding modeling graph is built in memory (Line 6) that includes all the network connectivity information needed for the query. Next, our algorithm locates q on an edge and adds it as the starting vertex of the modeling graph (Lines 7 - 8). Subsequently, MNDR invokes a modified Dijkstra's algorithm to compute the distance of each vertex from q.

Algorithm 2 Mobile Network Distance Range Query (q, d)
1: /* q is the query object */
2: /* d is the distance */
3: resultObjs = ∅
4: /* Find the set of edges E′ and vertices V′ overlapped by the circle with center point q and radius d */
5: (E′, V′) = Euclidean-range(q, d)
6: G′ = Create-modeling-graph(E′, V′)
7: e = Object-map-matching(q, E′)
8: q = Add-vertex-into-graph(G′, q, e)
9: S = Compute-distance(G′, q, d)
10: for each vertex v in S do
11:   for each edge e outgoing from v do
12:     cellSet = cellSet ∪ cellOverlapping(e, d − dist(q, v))
13:   end for
14: end for
15: resultObjs = Retrieve-objects(cellSet, G′)
16: for each object m in resultObjs do
17:   e(v1, v2) = Object-map-matching(m, E′)
18:   dist(q, m) = min(dist(q, v1) + dist(v1, m), dist(q, v2) + dist(v2, m))
19:   if dist(q, m) > d then
20:     resultObjs = resultObjs − m
21:   end if
22: end for
23: return resultObjs
Note that we add a constraint d to the distance computation so that any edge e(v1, v2) with dist(q, e) > d will not be processed. This constraint prunes the search space. Once the distance computation terminates, the set S (Line 9) consists of the list of vertices whose distances from q are no longer than d, together with the constituting shortest paths. The algorithm further generates the set of grid cells overlapping with the edges in S, as shown in Lines 10-14. Next, the system extracts moving objects from the grid cells in cellSet. Note that this step is executed after we have obtained the complete set of overlapping grid cells; hence we avoid retrieving objects from the grid index repeatedly. Lines 16-22 describe a final filtering step that is performed on the retrieved moving objects, ensuring that for every object m, dist(q, m) ≤ d.

Algorithm 3 Compute-distance (G, q, d)
1: /* G is the network to be visited */
2: /* q is the starting vertex */
3: /* d is the distance of the expansion range */
4: cellList = φ
5: for each vertex v in G do
6:   dist(q, v) = ∞
7:   previous[v] = NULL
8: end for
9: dist(q, q) = 0
10: S = φ
11: verticesQueue = G(V)
12: while verticesQueue ≠ φ AND dist(q, u) < d, where u = Extract-min(verticesQueue) do
13:   S = S ∪ u
14:   for each edge e(u, v) outgoing from u do
15:     if dist(q, u) + dist(u, v) < dist(q, v) then
16:       dist(q, v) = dist(q, u) + dist(u, v)
17:       previous[v] = u
18:     end if
19:   end for
20: end while
21: return S

Algorithm 3 describes an algorithm based on Dijkstra's algorithm that computes the distance of each vertex from the query point. It uses the same concept of edge relaxation as the original algorithm, greedily searching for the vertex u in the vertex set verticesQueue that has the least dist(q, u) value. Moreover, Line 12 shows that the program terminates when the minimum distance among the vertices remaining in verticesQueue is larger than d. After that, the set S consists of the list of vertices whose shortest-path distance from q is no longer than d, together with the constituting shortest paths.
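The distance-bounded expansion of Algorithm 3 can be sketched in Python as follows. This is a minimal illustration under our own assumptions (an adjacency-map graph representation and a lazy-deletion priority queue), not MOVNet's actual implementation:

```python
import heapq

def compute_distance(graph, q, d):
    """Distance-bounded Dijkstra (cf. Algorithm 3): return the distances
    of all vertices whose shortest path from q is no longer than d.
    graph: {vertex: [(neighbor, edge_length), ...]}"""
    dist = {q: 0.0}
    settled = {}                       # the set S of Algorithm 3
    pq = [(0.0, q)]
    while pq:
        du, u = heapq.heappop(pq)      # Extract-min(verticesQueue)
        if u in settled:
            continue                   # stale queue entry
        if du > d:
            break                      # every remaining vertex is out of range
        settled[u] = du
        for v, w in graph.get(u, []):  # edge relaxation
            nd = du + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(pq, (nd, v))
    return settled
```

On a small chain q-a-b-c with edge lengths 1, 2, and 5, a bound of d = 4 settles only q, a, and b, mirroring how the constraint keeps the expansion from reaching out-of-range vertices.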
Figure 5.1: Mobile Network Distance Range query example. (a) The modeling graph, with q inserted as the starting vertex; (b) the distance of each vertex from q and the affected cells (shadowed).

To illustrate our MNDR algorithm with an example, let us assume that the system is processing a network as shown in Figure 4.2, where the side length of the cells is 1.0 unit. A query object q with dist(q, v2) = 1.0 submits a range query with a range d = 3.5. MOVNet first launches a Euclidean distance range query with q as the center and d as the radius. Consequently, edges overlapping with the shadowed area are retrieved from the R-tree index and a corresponding modeling graph is built, as shown in Figure 5.1(a). Note that q is inserted as the starting vertex into the modeling graph. Next, Dijkstra's algorithm is invoked. The algorithm runs in a greedy network expansion manner to compute the shortest path from the starting vertex (i.e., q) to the other vertices under the distance constraint d. When Dijkstra's algorithm finishes, the distance of each vertex from q is as shown in Figure 5.1(b). Note that v8 is not processed because dist(q, v1) > 3.5; no object on e(v1, v8) can be in the result set. In addition, S = {(v2, 1), (v4, 2.5), (v3, 3), (v5, 3.4)}. Based on this information, MNDR computes cellSet in Lines 10-14, shown as the shadowed cells in Figure 5.1(b). After that, the moving objects in cellSet are retrieved from the grid index to constitute the result set. However, several steps are required to ensure that the distance of each moving object is within the range d (Lines 16-22). First, some cells might overlap with several edges. For instance, c(6,6) overlaps with e(v2, v3) and e(v3, v4). Hence for each object in the result set, MNDR determines which edge the object is located on (Line 17).
Second, some objects may be reachable by more than one path from the query point. Our system only considers the shortest path and examines that path against the range d (Line 18). For example, moving objects on edge e(v3, v4) have two paths from q (q → v2 → v3 and q → v4). Our algorithm computes the distance of each object via each path and only uses the shorter one. Finally, once the distance from q to the object is determined, MNDR confirms that the distance is ≤ d. For instance, for any object m retrieved from c(5,0), dist(q, m) > 3.5, thus the algorithm removes these objects in Lines 19-21. The MNDR algorithm is able to compute the exact result set for mobile network distance range queries. However, when computing cellSet in Algorithm 2, some cells can be further pruned before the system extracts the corresponding moving objects from the grid index. For instance, in the example illustrated above, every cell overlapping e(v4, v6) is in cellSet. Clearly, some of these cells should be pruned because their distances from q exceed d. This optimization can be achieved by using the geometric properties introduced in Section 4.2. We utilize the maximum number of affected cells given by Corollary 1 to prune the search space. Let us assume that MNDR generates the list of cells overlapping with an edge e(v1, v2) and that there are n1 affected cells. By using Corollary 1 we deduce that a range of d − dist(q, v1) is only able to overlap with at most n2 cells, where n2 < n1. Therefore MNDR only records the first n2 cells on e(v1, v2) into cellSet. As an example, consult Figure 5.1(b). We know that dist(q, v4) = 2.5, therefore we only need to record cells on e(v4, v6) within a range of 3.5 − 2.5 = 1.0 from v4. Using the maximum bound on the number of affected cells, MNDR records the first 5 cells on e(v4, v6) starting from v4, even though there are 6 cells overlapping with e(v4, v6).
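This pruning step can be sketched as follows. The sketch is ours, and the bound function is a stand-in for Corollary 1 (whose formula appears in Section 4.2); we pass it in as a parameter rather than restate it:

```python
def prune_affected_cells(cells_along_edge, remaining_range, corollary1_bound):
    """Keep only the first n2 cells along an edge, where n2 is the maximum
    number of cells that a network range of length `remaining_range` can
    overlap (Corollary 1). `cells_along_edge` is ordered starting from the
    edge's starting vertex; `corollary1_bound` maps a range to that maximum."""
    n2 = corollary1_bound(remaining_range)
    return cells_along_edge[:n2]

# In the example of Figure 5.1(b), a remaining range of 1.0 unit overlaps at
# most 5 cells, so only 5 of the 6 cells on e(v4, v6) are recorded. The cell
# coordinates below are hypothetical placeholders.
cells = [(2, 3), (3, 3), (3, 2), (4, 2), (4, 1), (5, 1)]
assert prune_affected_cells(cells, 1.0, lambda r: 5) == cells[:5]
```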
Our simulation results indicate that this property offers substantial performance improvements when computing the affected cells over long edges (i.e., freeway segments).

5.2 Snapshot k Nearest Neighbor Query Algorithm

We present a Mobile k Nearest Neighbor query algorithm (MKNN) that leverages our MNDR algorithm to efficiently compute the kNN POIs from the query point in the network. Moreover, we introduce the concept of a progressive probe to estimate the result space as the first step of MKNN. We observe that the grid index in MOVNet enables fine-grained space partitioning. Additionally, the grid index maintains an object list in each grid cell, which can be quickly accessed to retrieve the number of enclosed objects. Therefore, we begin by searching the area surrounding the query point in the grid index and continuously enlarging the area until we find a sub-space that contains the kNN POIs in terms of the Euclidean distance. We term this procedure a progressive probe. Note that in the progressive probe, we only retrieve the size of the object list from each cell; the distance of each object from the query point is not computed, because we aim only at obtaining an approximate area enclosing the kNN objects by network distance. Our experimental study shows that in 30% to 48% of the test cases the actual kNN objects are bounded by our progressive probe. More importantly, the complexity of retrieving the object list size from each cell is O(1), which is very efficient, especially since our grid index is an in-memory structure. We detail the progressive probe next. Cells in the grid index are grouped into levels centered at c(⌊xq/l⌋, ⌊yq/l⌋), where q is a moving object submitting a mobile kNN query and l is the side length of a cell. The first level L0 is the single cell c(⌊xq/l⌋, ⌊yq/l⌋), the cells in the next level are the cells surrounding L0, and so on.
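This level-by-level scan can be sketched as follows. The sketch is a minimal illustration under our own assumptions (`counts` stands in for the per-cell object-list sizes that the grid index maintains), not the system's implementation:

```python
def level_cells(cx, cy, i):
    """Cells in level L_i around the center cell (cx, cy): L_0 is the
    center cell itself, L_i (i >= 1) is the surrounding ring."""
    if i == 0:
        return [(cx, cy)]
    ring = []
    for x in range(cx - i, cx + i + 1):        # top and bottom rows
        ring.extend([(x, cy + i), (x, cy - i)])
    for y in range(cy - i + 1, cy + i):        # left and right columns
        ring.extend([(cx - i, y), (cx + i, y)])
    return ring

def progressive_probe(center, counts, k):
    """Scan levels L_0, L_1, ... until at least k objects are found;
    return the index of the last level scanned. Assumes the grid
    contains at least k objects overall."""
    total, i = 0, 0
    while True:
        total += sum(counts.get(c, 0) for c in level_cells(*center, i))
        if total >= k:
            return i
        i += 1
```

Each ring L_i (i ≥ 1) contains exactly 8·i cells, so the scanned area grows quadratically while each cell lookup stays O(1).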
Formally, the cells in level Li (i ∈ {1, 2, ...}) can be represented as

Li = c(x1, y1) ∪ c(x2, y2) ∪ c(x3, y3) ∪ c(x4, y4), where
  ⌊xq/l⌋ − i ≤ x1 ≤ ⌊xq/l⌋ + i, y1 = ⌊yq/l⌋ + i
  x2 = ⌊xq/l⌋ − i, ⌊yq/l⌋ − i + 1 ≤ y2 ≤ ⌊yq/l⌋ + i − 1
  x3 = ⌊xq/l⌋ + i, ⌊yq/l⌋ − i + 1 ≤ y3 ≤ ⌊yq/l⌋ + i − 1
  ⌊xq/l⌋ − i ≤ x4 ≤ ⌊xq/l⌋ + i, y4 = ⌊yq/l⌋ − i

By using the definition above, the progressive probe first retrieves the number of objects in L0 via the grid index. If there are fewer than k objects in L0, it continues to scan the number of objects in the next level of cells, and so on. Figure 5.2(a) illustrates an example of these steps. Assume the system is maintaining a network as shown in Figure 4.2 and a query object q in c(5,5) submits a nearest neighbor query with k = 10. The progressive probe first locates q in c(5,5), which becomes L0. After that, the number of POIs in c(5,5) is retrieved from the grid index. If there are fewer than 10 POIs in L0, the progressive probe sequentially searches the next levels Li, where i ∈ {1, 2, ...}, illustrated as the shadowed areas in Figure 5.2(a). Assuming that at least 10 POIs have been found after the scan of L2, the probe stops and yields an estimated space for the kNN objects in the network.

Figure 5.2: Mobile Network Distance k-NN query example.

Based on the fact that the R-tree resides on secondary storage in MOVNet, MKNN utilizes this estimated area to launch a range query extracting the edges from the R-tree, instead of following a network expansion approach that retrieves a few edges at a time. Moreover, we introduce the following data structures in MKNN: candidateObjs and unvisitedVertices. These are minimum priority queues on the value of the distance from the query point. The set of candidate objects is retrieved from the grid index as possible objects for the final result set.
The set of unvisited vertices is to be expanded when fewer than k objects have been found during query processing. Additionally, we manage resultObjs as a maximum priority queue, of size k, in terms of the distance from the query point. Algorithm 4 elaborates on the MKNN algorithm. MKNN first executes the progressive probe in the grid index so that an approximate query result space is created. After that, MKNN uses this subspace as an initial range to invoke the MNDR module so that the corresponding edges are retrieved from the R-tree and the distance of each vertex from q is computed (Lines 8-12). Given the example of Figure 5.2(a), Figure 5.2(b) demonstrates the correlated modeling graph and the distance to each vertex. Next, a vertex is de-queued from unvisitedVertices (Line 15). For each outgoing edge of the vertex, the set of affected cells is computed and objects are retrieved from the corresponding grid cells and placed into candidateObjs (Lines 21-24). After that, we examine two possible cases. First, if there are fewer than k objects in resultObjs, MKNN de-queues objects from candidateObjs into resultObjs (Lines 25-27). Second, if the distance of the kth result object is greater than the distance of the first element of candidateObjs, the kth result object is de-queued and inserted into candidateObjs; next, candidateObjs de-queues an object and inserts it into resultObjs (Lines 28-30). The algorithm terminates when resultObjs contains k POIs and the distance of the kth result object is less than the distance of the minimum vertex in unvisitedVertices (Lines 17-20). Otherwise, if the last vertex v in the modeling graph (i.e., the vertex with the longest distance to q) has been visited and the distance of the kth result object is greater than dist(q, v), MKNN uses dist(q, v) as the radius to launch a range query on the R-tree as a new iteration of MKNN (Line 32).
Although this step causes I/O operations as well as the overhead of creating a modeling graph again, MKNN maintains the set of visited vertices in each iteration to avoid visiting these vertices in future iterations (Line 13). As our simulation results have verified, under various settings MKNN requires no more than two iterations during query processing in more than 97% of the test cases. Therefore, this method significantly reduces the I/O cost and ensures high system throughput.

Algorithm 4 Mobile Network Distance kNN Query (q, k)
1: /* q is the query object */
2: /* k is the number of NN objects */
3: /* l is the side length of a cell */
4: foundkObjs = false
5: visitedVertices = φ
6: radius = Progressive-probe(q, k, l)
7: while foundkObjs = false do
8:   (E, V) = Euclidean-range(qx, qy, radius)
9:   G = Create-modeling-graph(E, V)
10:  e = Object-map-matching(q, E)
11:  q = Add-vertex-into-graph(G, qx, qy, e)
12:  S = Compute-distance(G, q)
13:  unvisitedVertices = S − visitedVertices
14:  while unvisitedVertices != NULL do
15:    minVertex = De-queue(unvisitedVertices)
16:    cellSet = φ
17:    if resultObjs.size = k AND minVertex.dist ≥ kth resultObjs.dist then
18:      foundkObjs = true
19:      break
20:    end if
21:    for each edge e outgoing from minVertex do
22:      cellSet = cellSet ∪ cellOverlapping(e, d − dist(q, v))
23:    end for
24:    candidateObjs = candidateObjs ∪ Retrieve-objects(cellSet, G)
25:    while resultObjs.size < k do
26:      De-queue(candidateObjs) to resultObjs
27:    end while
28:    while Peek(candidateObjs).dist ≤ kth resultObjs.dist do
29:      Swap(De-queue(candidateObjs), De-queue(resultObjs))
30:    end while
31:  end while
32:  radius = minVertex.dist
33: end while
34: return resultObjs

Chapter 6

Performance Analysis

We present our theoretical analysis of MOVNet in the following sections. We begin by discussing the relationship between the number of overlapping cells and a range query in the space.
Next, we analyze the system costs of MNDR and MKNN, respectively. We assume that the network and the moving objects are uniformly distributed in a 1-unit-square space (i.e., for an object m, 0 ≤ xm < 1 and 0 ≤ ym < 1), which is similar to previous works [MHP05, YPK05]. Although this is an idealized simplification, we use it to obtain general observations about the effect of the system parameters. A grid index with cells of side length l accepts the moving object location updates. There are in total E edges and M moving objects in the network.

6.1 The Number of Overlapping Cells with Regard to a Range Query

Assume a user submits a range query in the space with an area of d × d, with d > l. Then d can be represented as d = i × l + x, where x ∈ [0, l) and i is an integer.

Figure 6.1: The relationship between the number of overlapping cells and a range query.

Without loss of generality, let us consider the case where the top-left corner of the query Q is located somewhere within the top-left cell of the grid index, as shown in Figure 6.1. It can be verified that if the top-left corner of Q is inside Set0, then Q covers (i+1)^2 cells; for Set1, Q covers (i+1)^2 + i + 1 cells; and for Set2 it covers (i+2)^2 cells [KPHA02]. Additionally, we know that the area of Set0 is (l − x)^2, the area of Set1 is 2(l − x)x, and the area of Set2 is x^2. Assuming that the spatial range queries submitted by users are uniformly distributed in the space, we can deduce that, on average, a range query of size d × d overlaps with the following number of grid cells:

NumberOfOverlappingCells = (d + l)^2 / l^2

The formula above shows that the number of overlapping cells with regard to a range query is affected by two factors: the side length of the range query and the cell size. The number of overlapping cells is quadratic in the side length of the range query (i.e., it is proportional to the area of the range query).
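The averaging step can be checked numerically: weighting the three corner cases by the areas of Set0, Set1, and Set2 reproduces the closed form (d + l)^2 / l^2. The following self-check is our own illustration:

```python
def avg_overlapping_cells(d, l):
    """Average number of grid cells overlapped by a d x d range query,
    obtained by weighting the three corner-placement cases by the areas
    of Set0, Set1, and Set2 within the top-left cell (area l^2)."""
    i = int(d // l)                                  # d = i*l + x, x in [0, l)
    x = d - i * l
    set0 = (l - x) ** 2 * (i + 1) ** 2               # corner inside Set0
    set1 = 2 * (l - x) * x * ((i + 1) ** 2 + i + 1)  # corner inside Set1
    set2 = x ** 2 * (i + 2) ** 2                     # corner inside Set2
    return (set0 + set1 + set2) / (l * l)

# Agrees with the closed form (d + l)^2 / l^2:
for d, l in [(3.7, 1.0), (0.6, 0.5), (2.0, 0.25)]:
    assert abs(avg_overlapping_cells(d, l) - (d + l) ** 2 / l ** 2) < 1e-6
```

Expanding the weighted sum algebraically gives ((i+1)·l + x)^2 / l^2 = (d + l)^2 / l^2, which is the closed form stated above.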
Additionally, the number of overlapping cells is inversely proportional to the cell size.

6.2 Analysis of MNDR

For an MNDR query with a range d, let us assume the query covers an area of 2d × 2d. Although the Euclidean distance query in MNDR is actually performed within an area of πd^2, this assumption does not change the quality of our analysis. During the processing of MNDR, O(d^2·E) edges are retrieved from the on-disk R-tree in the Euclidean distance range query. The next step, which creates the modeling graph, is of complexity O(d^2·E) since every edge is recorded in the graph. Finding the edge on which the query point is located can be accomplished during the modeling graph construction. Additionally, inserting the query point into the modeling graph as the starting vertex requires only O(1) operations. The running time of Dijkstra's algorithm to compute the distance of each vertex from the query point is O(d^2·E·lg(d^2·E)). Next, MNDR calculates the set of cells overlapping with the edges based on the distance information. Note that each edge is examined at most once during this step; therefore, O(d^2·E) iterations are needed to calculate the overlapping cells. Moreover, since the length of an edge is bounded by d, the total complexity of this step is O(d^3·E). Finally, MNDR retrieves the objects from the grid index and computes the result set. For a range query with a side length of 2d, the number of overlapping grid cells is (2d + l)^2 / l^2 [WZK05]. For each cell, we can assume that there are l^2·M objects. Hence the number of moving objects retrieved in the final step is O((2d + l)^2·M). To sum up, the cost of MNDR can be represented as

O(d^2·E·lg(d^2·E) + (2d + l)^2·M)

We observe that the cost of MNDR is linear in the number of moving objects. Similarly, the system throughput is proportional to the side length of the cells (or inversely proportional to the number of cells).
Additionally, both factors are lower-bounded by the cost of the graph construction, Dijkstra's algorithm, and the overlapping cell computation. Finally, the CPU cost is a quadratic function of d, which means that a larger range results in a serious increase in CPU cost.

6.3 Analysis of MKNN

For simplicity, let us assume that the progressive probe results in a subspace containing the k nearest neighbor objects. In the case that MKNN needs to expand to a larger space with more iterations, the cost can be modeled as our cost model times a constant, which does not change the character of our analysis. Since the POIs are uniformly distributed, the subspace containing the kNN objects has a size of k/M. Therefore, MKNN needs to scan k/(M·l^2) cells to find the kth object and return a subspace. The subsequent steps, which perform a Euclidean distance range query, construct the modeling graph, compute the overlapping cells, and retrieve objects, are the same as in MNDR. Hence the cost of these operations can be summarized as O((kE/M)·(2 + lg(kE/M) + kE/M) + (kE/M + l)^2·M). The final step, which filters objects from candidateObjs into resultObjs, is bounded by the size of resultObjs (i.e., k). In summary, the cost of MKNN is

O(k/(M·l^2) + (kE/M)^2 + (kE/M + l)^2·M)

The equation above shows that the CPU cost of MKNN is proportional to k. The explanation is that with an increasing k, MKNN needs to search a larger space to find the query result. Additionally, the CPU cost is inversely proportional to the number of objects. This is because with more POIs, the search space for finding the kth object becomes smaller, and vice versa. Finally, the system throughput as a function of the cell size is bounded by two factors: the cost of the progressive probe and the cost of retrieving objects from the grid index. A smaller cell size results in more overhead from the progressive probe.
In contrast, increasing the size of the cells implies that more objects are retrieved from the grid index.

Chapter 7

Continuous Query Processing

In this chapter, we extend the functionality of MOVNet to support continuous query processing. We start by presenting our design of the data structures, leveraging the dual-index capability of our snapshot query processing. Next, we present our continuous range query algorithm (C-MNDR). After that, we discuss our efficient continuous kNN query processing algorithm, namely C-MKNN.

7.1 Data Structure Design

Recall that in our design of the snapshot query algorithms, we estimate the set of affected cells before we retrieve any edge or object, thereby minimizing the I/O cost on the R*-tree. This is accomplished by using the Euclidean distance restriction in MNDR and the progressive probe in MKNN, respectively. On the other hand, for a dense network, such as the Los Angeles County data set that is used in our simulation, we observe the following characteristics. First, the network distance between the query point and a moving object grows much faster than the Euclidean distance. For instance, when a query has a 4-mile range, the average Euclidean distance from the query point to the POIs in the result set is 1.327 miles. In contrast, the average Euclidean distance is 3.478 miles with a 10-mile range setting. Since we use the Euclidean distance restriction as the first step to retrieve edges, a very large number of out-of-range edges are retrieved from the R*-tree. Second, dense networks contain a very large number of short edges, and each cell has only a few edges that cross the cell boundary; Corollary 1, introduced in Section 4.2, works efficiently only with long edges. Based on these observations, we introduce the concept of connecting vertices to efficiently support continuous query processing in dense networks.
A connecting vertex is a vertex in the network that has at least one outgoing edge that crosses the boundary of its enclosing cell. For example, Figure 7.1 shows a dense network that overlaps with a 2 × 2 grid index. There are 4 connecting vertices in c(0,1): v1, v2, v3, and v5. The outgoing edge of a connecting vertex connects with its pairing connecting vertex; a connecting vertex has at least one corresponding pairing connecting vertex. For instance, the connecting vertex v1 has two pairing connecting vertices, v6 and v7. The number of connecting vertices in MOVNet varies as the user changes the number of cells of the grid index. Figure 7.2 shows an example of the data structures used in continuous query processing for c(0,1) in Figure 7.1. Based on our dual-index design, an on-disk R*-tree structure is used to manage the road network data. Additionally, an in-memory grid index overlays the service space. Each cell has an object list, which records the identifiers of the enclosed objects. The identifiers point to the object array in memory, which stores the coordinates of the objects. When a cell overlaps with a query, the enclosed edges are retrieved from the R*-tree by using a stationary range query whose range is the area of the cell.

Figure 7.1: An example network indexed by a 2 × 2 grid index.

Figure 7.2: An example of the data structures of C-MNDR.
Correspondingly, a modeling graph is created on-the-fly and stored in memory. For this purpose, we design a function Create-graph(tr, c). It creates the graph for the cell c by retrieving the edges from the R*-tree tr. It also initializes the distance of every vertex to ∞. Note that the modeling graph only records the edges that are fully enclosed in the cell. For instance, the edge list of v2 records e(v2, v1) and e(v2, v3). In contrast, e(v2, v9) crosses the boundary of c(0,1), hence it is not recorded in the modeling graph. Moreover, for each cell in the grid index, we pre-compute the following data. First, we record the set of connecting vertices and their pairing connecting vertices in the connecting vertex list. Second, we create a distance table of the connecting vertices in each cell; it records the distance between each pair of connecting vertices in the same cell. For example, for the connecting vertex v1 in c(0,1), the distances to the other connecting vertices in the same cell are dist(v1, v2) = 2.0, dist(v1, v3) = 3.0, and dist(v1, v5) = 3.5, respectively. Finding the connecting vertices of a grid cell can be accomplished by launching an intersection range query and an enclosed range query with regard to the area of the grid cell: edges that are in the result set of the intersection range query but not in the result set of the enclosed range query connect the pairs of connecting vertices. Once we have obtained the connecting vertices of a grid cell, computing the values in the distance table can be accomplished by invoking Dijkstra's algorithm starting from every connecting vertex. After the pre-computation is finished, the data is stored in memory in MOVNet. The user can also choose to store the connecting vertex list and the distance table on disk when there is a memory constraint. The connecting vertex list, together with the distance table, provides a more precise estimation of the edges and cells that are affected by a query.
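The per-cell pre-computation can be sketched as follows. This is our own minimal illustration; the edge weights are our reading of Figure 7.2 for c(0,1), and the adjacency-map representation is an assumption:

```python
import heapq

def dijkstra(adj, src):
    """Shortest-path distances from src over a cell's enclosed edges.
    adj: {vertex: [(neighbor, length), ...]} (treated as undirected)."""
    dist = {src: 0.0}
    pq = [(0.0, src)]
    while pq:
        du, u = heapq.heappop(pq)
        if du > dist.get(u, float("inf")):
            continue
        for v, w in adj.get(u, []):
            if du + w < dist.get(v, float("inf")):
                dist[v] = du + w
                heapq.heappush(pq, (du + w, v))
    return dist

def distance_table(cell_adj, connecting):
    """Pairwise distances between a cell's connecting vertices,
    computed with one Dijkstra run per connecting vertex."""
    return {(s, t): dijkstra(cell_adj, s).get(t, float("inf"))
            for s in connecting for t in connecting}

# Edges fully enclosed in c(0,1), as depicted in Figure 7.2:
adj = {"v1": [("v2", 2.0), ("v5", 3.5)],
       "v2": [("v1", 2.0), ("v3", 1.0)],
       "v3": [("v2", 1.0), ("v4", 4.0)],
       "v4": [("v3", 4.0)],
       "v5": [("v1", 3.5)]}
table = distance_table(adj, ["v1", "v2", "v3", "v5"])
assert table[("v2", "v5")] == 5.5 and table[("v3", "v5")] == 6.5
```

The asserted values match the distance table of connecting vertices shown in Figure 7.2 (e.g., dist(v2, v5) = 5.5 via v1).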
In the following section, we describe how to use our data structures to process continuous queries.

7.2 Continuous Range Query Algorithm

In this section, we describe the efficient processing of continuous network-distance-based range queries. We first describe our design for computing the initial query results. More importantly, we introduce the concept of a Shortest-Distance-based Tree (SD-tree) to monitor the edges that are affected by a continuous query. When the query point moves, we present a novel algorithm that rotates, truncates, and expands the SD-tree to obtain the updated set of affected cells and the distances of the vertices with regard to the query point's movement. With this technique, continuous query processing can be accomplished in an incremental manner, which significantly reduces the query cost. Finally, we outline the complete procedure for continuous range query processing.

7.2.1 Initial Result Computation in C-MNDR

To process a continuous range query, the first step is to initiate snapshot range query processing to obtain the initial query result. When a moving object q submits a continuous range query, we first locate the cell in which q is located. After that, we insert the query point as the starting vertex into the modeling graph of the cell. Next, Dijkstra's algorithm is invoked to compute the distance of each vertex in the graph from the query point. We set the distance constraint d when executing Dijkstra's algorithm; therefore, the algorithm does not expand beyond any vertex whose distance is > d. When the distance computation finishes, we obtain the distance of each connecting vertex in the cell from the query point. We maintain a minimum priority queue CVToExpand to store these connecting vertices, keyed on their distance values from the query point.
Note that if the distance of a connecting vertex is out of the query range, it will not be inserted into CVToExpand. Moreover, resultCellSet stores the set of cells that may contain objects in the query range. We initialize it by inserting the cell that encloses q into resultCellSet. Next, we start to compute the other cells that may contain result objects. We begin by popping the first connecting vertex v from CVToExpand. For each outgoing edge of the connecting vertex, Algorithm 1 is invoked to compute the overlapping cells. These cells are recorded in resultCellSet. After that, the pairing vertex v′ of v is selected and the cell c′ that encloses v′ is located. Next, we use the distance table to determine the distances of the other connecting vertices in c′. When a connecting vertex is in the query range, it is appended to CVToExpand. The algorithm continues to expand connecting vertices until CVToExpand becomes empty. At that moment, we have discovered all the cells that are affected by the range query. The next phase is to retrieve the moving objects in resultCellSet from the grid index to constitute the result set. Algorithm 5 details the computation of the initial range query results. After we have located the enclosing cell of the query point (Line 2), created the modeling graph (Line 4), and inserted the connecting vertices that are in range into CVToExpand (Lines 7-9), we start to obtain the complete set of affected cells by using the distance tables (Lines 10-28). To illustrate this part of the algorithm with an example, let us assume that the system is processing a network as shown in Figure 7.1. A moving object m5 with dist(m5, v2) = 0.5 submits a continuous range query with range d = 7.5.
After we finish executing 58 Algorithm 5 Compute-init-cont-rangeQuery (q, d) 1: resultObjs = φ, resultCellSet = φ 2: c = Locate-cell(q) 3: resultCellSet = resultCellSet ∪ c 4: G = Create-graph(tree, c) 5: Add-vertex-into-graph(G, q) 6: S = Compute-distance(G, q, d) 7: for each connecting vertex v in S where dist(q,v) <d do 8: CV ToExpand = CV ToExpand ∪ v 9: end for 10: while CV ToExpand != NULL do 11: v = De-queue(CV ToExpand) 12: for each pairing vertex v of v do 13: resultCellSet = resultCellSet ∪ cellOverlapping(e(v,v ),d− dist(q,v)) 14: if dist(q,v)+ length(v,v )<d AND dist(q,v)+ length(v,v ) < dist(q,v ) then 15: c = Locate-cell(v ) 16: if G in c == null then 17: G = Create-graph(tree, c ) 18: end if 19: dist(q,v )= dist(q,v)+ length(v,v ) 20: for each connecting vertex v in c do 21: if dist(q,v )<d AND the pairing vertex of v != v then 22: CV ToExpand = CV ToExpand ∪ v 23: end if 24: end for 25: Compute-distance(G , v , d) 26: end if 27: end for 28: end while 29: resultObjs =Retrieve-objects(resultCellSet) 30: return resultObjs Line 9 in Algorithm 5, the distance of each vertex in c(0,1) is shown in Figure 7.3 (a). Additionally, CV ToExpand has (v 2 =0.5), (v 1 =1.5), (v 3 =1.5), (v 5 =5).Next, v 2 is de-queued from CV ToExpand. It has a pairing connecting vertex v 9 , which is located in c(1,1). We insert c(1,1) into resultCellSet. With the distance information stored in the connecting vertex list, we determine that dist(q,v 9 )= 3.5 <d. Therefore, we retrieve the edges in c(1,1) and create the corresponding modeling graph. 
Figure 7.3: The distances of the vertices in an example of initial continuous range query processing.

Additionally, the connecting vertex list of c(1,1) indicates that the connecting vertices in c(1,1) are v9, v10, and v12. Based on the values stored in the distance table of c(1,1), we conclude that dist(q, v10) = 7.3 and dist(q, v12) = 7.7 (which is out of range). We set the condition in Line 21 to avoid expansion in a loop; hence our expansion from v9 will not move back to v2. Additionally, v9 also connects with v14, which lies on a path that leads to c(1,0). Consequently, we insert v9 and v10 into CVToExpand. Now CVToExpand holds the following items: (v1 = 1.5), (v3 = 1.5), (v9 = 3.5), (v5 = 5), (v10 = 7.3). The distance of each vertex within the query range in c(1,1) is shown in Figure 7.3(b). Next, we de-queue v1 from CVToExpand. Its pairing connecting vertex v7 is out of range, while v6 has dist(q, v6) = 5.0. Therefore, we insert c(0,0) into resultCellSet. On the other hand, the other connecting vertices in c(0,0) are out of range based on the values in the distance table; therefore, no connecting vertex is inserted into CVToExpand. When the expansion of c(0,0) finishes, the distance of each vertex within the query range is shown in Figure 7.3(c). Now we de-queue v3 from CVToExpand.
The pairing connecting vertex is v10. Although v10 was already expanded and we recorded dist(q, v10) = 7.3, there is a shorter route from q via v3 where dist(q, v10) = 4.5. This case is recognized by our algorithm in Line 14, and we update the distance of v10. Additionally, following the path from v10, the other connecting vertices in c(1,1) have dist(q, v9) = 8.3 and dist(q, v12) = 12.0, which are both out of range. Therefore, no connecting vertex is inserted into CVToExpand. After that, the distance of each vertex in c(1,1) is computed again. However, when the distance computation finishes, no other vertex distances in c(1,1) are shortened. The distances of the vertices in c(1,1) are presented in Figure 7.3 (d).

There are three connecting vertices left in CVToExpand: (v9 = 3.5), (v5 = 5), (v10 = 7.3). We de-queue v9 first. It has two pairing vertices, v2 and v14. Since dist(q, v2) = 0.5 is already recorded and it is shorter than the path via v9, we do not expand on v9 at this moment. Additionally, v14 is out of range. We only insert c(1,0) into resultCellSet. Next, we de-queue v5. Its pairing vertex is v7, which is out of range, hence no expansion is needed. Finally, v10 is de-queued and its pairing vertex is v3. However, dist(q, v3) = 1.5 is already recorded, which indicates that we have already found a shorter route. Therefore we do not expand v3. Since there is no vertex left in CVToExpand, we finish searching the affected cells with resultCellSet = {c(0,1), c(1,1), c(0,0), c(1,0)}. Finally, the moving objects in resultCellSet are retrieved, and the distance from q is computed based on the distances of the starting and ending vertices of the edge where each object is located (Line 29) to constitute the result set. In this example, resultObjs has 3 objects when the algorithm terminates: m1, m2, and m3.
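The distance computation used in Line 29 (and again later when objects report new positions) reduces to taking the shorter of the two paths through the endpoints of the edge on which an object lies. The following Python sketch illustrates the idea; the function name and the numeric values are hypothetical, not part of MOVNet itself:

```python
def object_distance(dist_q, edge, offset, edge_len):
    """Network distance from query point q to an object on edge (vs, ve).

    dist_q   -- dict mapping a vertex to its shortest network distance from q
    edge     -- (vs, ve), the starting and ending vertices of the object's edge
    offset   -- the object's distance from vs along the edge
    edge_len -- total length of the edge
    """
    vs, ve = edge
    via_start = dist_q.get(vs, float("inf")) + offset
    via_end = dist_q.get(ve, float("inf")) + (edge_len - offset)
    return min(via_start, via_end)

# Hypothetical example: dist(q, vs) = 2.0, dist(q, ve) = 5.0, and the
# object sits 1.0 from vs on an edge of length 4.0.
d = object_distance({"vs": 2.0, "ve": 5.0}, ("vs", "ve"), 1.0, 4.0)
```

An object is then admitted to resultObjs exactly when this value does not exceed the range d.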
7.2.2 Monitoring Object Updates in C-MNDR

After the initial query result is constructed, our next step is to monitor the change of the query result with regard to moving object updates. To accommodate continuous query processing, we extend the data structure of the object array that stores the coordinates of moving objects. We assume that when an object sends an updated position to MOVNet, the server invokes a map-matching procedure to locate the object on a road segment. For each element in the object array, we record the coordinates of the moving object and the edge on which the moving object is located (i.e., the starting and ending vertices of the edge). We also create a queryPoint flag indicating whether the moving object belongs to a query point.

More importantly, we observe that the initial query result processing creates several data sets that are useful for later processing. First, resultCellSet indicates a snapshot of the area in the service space that is affected by the query. Second, the expansion procedure of the initial query processing provides the complete set of distances of the vertices that are within the query range. Based on this information, we introduce the concept of the Shortest-Distance-based Tree (SD-tree, for short) to facilitate continuous query processing. The SD-tree is a tree-based structure that continuously records the distances of vertices and the corresponding shortest paths from the query point. For instance, in processing the query in Figure 7.3 (d), we finish with the set of vertices whose distances are within the range: (v2 = 0.5), (v1 = 1.5), (v3 = 1.5), (v9 = 3.5), (v10 = 4.5), (v5 = 5), (v4 = 5.5). Consequently, the initial SD-tree is constructed as shown in Figure 7.4. Starting from the query point, we record every vertex that is within the query range in the SD-tree. The branches of the SD-tree represent the shortest path for each vertex from the query point.
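As an illustration of this structure, the SD-tree can be assembled directly from the output of the distance computation (distances plus shortest-path parent pointers). The node layout and the builder below are our own sketch, with hypothetical values, not code from MOVNet:

```python
from dataclasses import dataclass, field

@dataclass
class SDNode:
    vertex: str
    dist: float                 # shortest network distance from the query point
    complete: bool = False      # True iff every outgoing edge is in the tree
    children: list = field(default_factory=list)

def build_sd_tree(dist, parent_of, root="q"):
    """Turn Dijkstra output (distances plus parent pointers) into an SD-tree.

    dist      -- vertex -> shortest distance from q (q itself has distance 0)
    parent_of -- vertex -> its predecessor on the shortest path from q
    """
    nodes = {v: SDNode(v, d) for v, d in dist.items()}
    for v, p in parent_of.items():
        nodes[p].children.append(nodes[v])   # each branch is a shortest-path edge
    return nodes[root]

# Hypothetical fragment of the running example: q -> v2 -> v3.
root = build_sd_tree({"q": 0.0, "v2": 0.5, "v3": 1.5},
                     {"v2": "q", "v3": "v2"})
```

Edges whose endpoints both lie in range but that are on no shortest path, such as e(v9, v10) below, simply never appear as parent pointers and are therefore absent from the tree.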
For instance, dist(q, v10) = 4.5, and the shortest path is q → v2 → v3 → v10. Note that the SD-tree does not record every edge whose starting and ending vertices both fall within the range, such as e(v9, v10), because these edges do not belong to any shortest path. Additionally, we define two types of vertices in the SD-tree. A complete vertex in the SD-tree indicates that every edge starting from the vertex is recorded in the SD-tree. In contrast, a partial vertex implies that some edges starting from the vertex are not recorded in the SD-tree. For example, v2 is connected with q, v3, and v9. These edges are all recorded in the SD-tree, hence v2 is a complete vertex, which is shown in black in Figure 7.4. In contrast, v5 is connected with v1, v4 and v7. Since v7 is out of range, it is not recorded in the SD-tree. Additionally, v5 and v4 are not connected in the SD-tree because each of them follows a different shortest path from q. Therefore v5 is a partial vertex, which is shown in red in Figure 7.4.

Figure 7.4: An example of the SD-tree.

To process moving object updates, we distinguish two types of object updates: query point updates and non-query-point updates. Since MOVNet uses periodic sampling on moving object positions, a number of object updates are received and stored in an object update buffer during each cycle. At the beginning of each new cycle, MOVNet invokes a procedure to process these object updates. If MOVNet receives a query point update (which can be confirmed by checking the queryPoint bit in the object array), we update the coordinates of the object first. The query point is also inserted into a queue for further processing, which we will describe in more detail in the next section. Let us first assume that the query point has no update. For non-query-point updates, our goal is to monitor how the resultCellSet changes.
Specifically, there are two cases that affect the result set. First, an object m moves onto an edge whose starting or ending vertex is in the SD-tree, and hence m might be in the result set. Second, an object m ∈ resultObjs moves to an edge whose starting and ending vertices are not in the SD-tree, and therefore m will be removed from the result set. Algorithm 6 shows the details of updating the result object set when the query point is stationary during an update cycle.

Algorithm 6 Update-CRange-resultSet()
1: /* M is the set of updated objects */
2: for each m ∈ M do
3:   if m ∈ resultObjs then
4:     Remove-Obj(m, resultObjs)
5:   end if
6:   c = Locate-cell(m)
7:   if c ∈ resultCellSet then
8:     /* m is located on e(v1, v2) */
9:     dist(q, m) = min(dist(q, v1) + dist(v1, m), dist(q, v2) + dist(v2, m))
10:    if dist(q, m) ≤ d then
11:      resultObjs = resultObjs ∪ m
12:    end if
13:  end if
14: end for
15: return resultObjs

For an updated object m, if it was previously in the result set, it is removed from the result set first (Line 4). After that, we locate the enclosing cell with regard to the updated coordinates of the object. If the cell is in resultCellSet, we perform the distance computation based on the starting and ending vertices of m's edge (Line 9). Otherwise, m is not in resultCellSet, hence it is not in resultObjs. Finally, when dist(q, m) ≤ d, m is inserted into the result set.

7.2.3 Query Object Update Processing in C-MNDR

In this section, we consider the scenario when MOVNet receives a query point update. Recall that the SD-tree stores a snapshot of the connectivity and the distance information within the query range. Therefore, if the query point moves to a position that is still within this range, we are able to use the SD-tree to avoid launching the query processing from the very beginning. On the other hand, if the query point moves out of the SD-tree monitoring area, we have to invoke the initial query result processing to obtain the query result.
For instance, if the query point in Figure 7.3 (d) moves to e(v11, v12) within one update interval, there is no information we can use directly from the query processing in previous steps. Hence a new range query is issued. Note that this is unlikely to happen unless the query object moves very fast. In the scenario that the query point moves to a new position that is within the range from the original location, there are two cases: First, the query point moves to an edge that is recorded in the SD-tree. Second, the query point moves to an edge that is not recorded in the SD-tree; however, both the starting and ending vertices of the edge are recorded in the SD-tree.

Figure 7.5: The update of the SD-tree when the query point moves to an edge that is recorded in the original SD-tree.

Let us consider the first case based on the example in Figure 7.5. Assume q moves to e(v5, v1) where dist(v5, q) = 0.5. In order to update resultCellSet and the SD-tree, we first move q to the edge on which it is located, as shown in Figure 7.5 (a). Next, we rotate the tree to place q such that q is the root element of the SD-tree again. We also recompute the distance for each vertex correspondingly, as displayed in Figure 7.5 (b). Next, we check the validity of the vertices and their distances on the rotated tree. We know that for a complete vertex in the SD-tree, we have already retrieved the complete connectivity information; therefore its updated distance is always valid, because the change of its parent and children vertices records all possible distance updates.
However, for a partial vertex, if the updated distance is shortened, the distance information on the children of the vertex is invalid because some shorter paths may now be constructed. Therefore, we need to expand these shortened-distance partial vertices in the SD-tree. Moreover, if the updated distance for a partial vertex becomes longer, there is no need to expand the network from the vertex. This is because during the processing in previous steps (e.g., the initial query result processing), we have already ensured that the routes that are not recorded from the partial vertex are either out of range or non-optimal. With a longer distance value, it is not necessary to expand the vertex.

As we can see in Figure 7.5 (b), only the distance of the partial vertex v5 is shortened, from 5.0 to 0.5, when we compare the distance information to the one in Figure 7.4. The other partial vertices have longer distances from q after the tree rotation. Therefore, we insert v5 into a queue to later invoke the network expansion. In case v5 has any children, we remove these child vertices. After that, we update the distances of vertices according to the values in the SD-tree. If a vertex is not recorded in the updated SD-tree, its distance is reset. Next, we truncate the vertices that become out of range after the rotation. In Figure 7.5 (b), v9, v4 and v10 become out of range; they are removed from the SD-tree and their parent vertices become partial vertices (Figure 7.5 (c)). Note that when a vertex is removed, the overlapping cells of its outgoing edges are removed from resultCellSet, except the one that connects its parent vertex in the SD-tree. Finally, we expand from v5 to obtain the other vertices that are within the range with an optimal path. This step can be regarded as the same expansion algorithm we described in Section 7.2.1, with v5 having an initial distance of 0.5.
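The re-expansion rule above (re-expand only partial vertices whose distance strictly shrank) can be stated as a small predicate. This is our own distillation of the rule, with hypothetical vertex names and values taken in the spirit of the example:

```python
def vertices_to_expand(old_dist, new_dist, partial_vertices):
    """Select the partial vertices that must be re-expanded after re-rooting.

    A complete vertex is always valid, and a partial vertex whose distance
    grew cannot open any new in-range route, so only partial vertices with
    a strictly shorter updated distance are returned.
    """
    inf = float("inf")
    return [v for v in partial_vertices
            if new_dist.get(v, inf) < old_dist.get(v, inf)]

# Hypothetical values mirroring Figure 7.5 (b): v5 shrinks from 5.0 to 0.5,
# while v9 grows from 3.5 to 8.0, so only v5 is queued for expansion.
queued = vertices_to_expand({"v5": 5.0, "v9": 3.5},
                            {"v5": 0.5, "v9": 8.0},
                            ["v5", "v9"])
```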
Once the expansion finishes, the SD-tree has been transformed into the one in Figure 7.5 (d). Specifically, we found a shortest path q → v5 → v4 where dist(q, v4) = 4.0. Additionally, resultCellSet = {c(0,1), c(1,1), c(0,0)}.

We now move on to the second case. Let us assume that q moves to e(v9, v10) where dist(v10, q) = 0.1. The challenge is that e(v9, v10) is not recorded as an edge in the SD-tree, hence we are not able to move q directly to that edge in the SD-tree. Recall that a dense network is very highly connected. Our study suggests that the SD-tree records about 50% of the edges that are within the range for the Los Angeles County network data set. It is therefore highly desirable to have an algorithm that supports processing query point updates in this case. We observe that for a query point q located on e(v9, v10), the shortest path to a vertex is via either v9 or v10. Based on the information stored in the SD-tree, we are able to construct the updated distance values in a few steps. To accomplish that, we first use v9 as the root element to rotate the SD-tree. Next, we insert the rotated SD-tree as a child node of q connected by e(q, v9) (i.e., the sub-tree starting from the left child of q in Figure 7.6). After that, we use v10 as the root element to rotate the original SD-tree. This rotated SD-tree is added as another sub-tree, starting as the right child node of q and connected by e(q, v10) in Figure 7.6.

Figure 7.6: The transformation of the SD-tree when q moves to an edge that is not recorded in the original SD-tree.

Figure 7.6 explores the possible paths to each vertex that was originally within the range, based on the connectivity information stored in the SD-tree.
More specifically, for each vertex there are two paths from q, via the starting and the ending vertex of the edge on which q is located, respectively. To compute the optimal path, we invoke a breadth-first tree traversal to obtain the path that has the shorter distance from q. Let us start with v9 in this example, which is the first child of q in Figure 7.6. There are two paths, q → v9 and q → v10 → v3 → v2 → v9, that are recorded in the transformed SD-tree. Since the first path is shorter, we delete v9 as the child of v2. Additionally, we set v2 to be a partial vertex. Next, v10 is examined. As we shall see, the path q → v10 has a shorter distance than q → v9 → v2 → v3 → v10. Therefore, we keep the first path. When we examine v2, we find that the path q → v10 → v3 → v2 is shorter than q → v9 → v2. Consequently, we remove v2 as the child of v9, as well as all children of v2. The result of the removal operation is shown in Figure 7.7 (a).

Figure 7.7: The update of the SD-tree when the query point moves to an edge that is not recorded in the original SD-tree. (a) Tree traversal to obtain the shortest path for vertices in the SD-tree. (b) The updated SD-tree with valid shortest distances on the vertices.

The traversal continues to process the other vertices in the SD-tree. Since each vertex has only one instance in Figure 7.7 (a), there is no operation during the remaining tree traversal. The resulting SD-tree has the optimal distance values of the vertices based on the information stored in the original SD-tree. After that, we check the validity of the vertices and their distances. Figure 7.7 (a) indicates that among the partial vertices, dist(q, v10) is shortened. Therefore, v10 is inserted into the queue for expansion in later steps.
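The net effect of this traversal is that every vertex keeps the minimum of its two candidate path lengths, one through each endpoint of q's new edge. A sketch of that computation (our own distillation; the offsets and distance tables are hypothetical):

```python
def distances_after_edge_move(off_vs, off_ve, dist_from_vs, dist_from_ve):
    """New distances after q moves onto an edge (vs, ve) not in the SD-tree.

    off_vs / off_ve            -- q's offsets to the two endpoints of its edge
    dist_from_vs / dist_from_ve -- distances of SD-tree vertices from each
                                   endpoint, read off the two rotated subtrees
    """
    vertices = set(dist_from_vs) | set(dist_from_ve)
    inf = float("inf")
    return {v: min(off_vs + dist_from_vs.get(v, inf),
                   off_ve + dist_from_ve.get(v, inf))
            for v in vertices}

# Hypothetical numbers in the spirit of Figure 7.6: q is 3.7 from v9 and
# 0.1 from v10; v3 is 4.0 from v9 but only 3.0 from v10, so the path via
# v10 wins.
d = distances_after_edge_move(3.7, 0.1, {"v3": 4.0}, {"v3": 3.0})
```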
Meanwhile, we have already constructed some network connectivity and distance information based on the data stored in the SD-tree, and thus avoid starting the query processing from the very beginning. Finally, we remove the vertices that are out of range in the SD-tree; the result is shown in Figure 7.7 (b). In summary, we elaborate on the procedure of maintaining the SD-tree with regard to query point updates in Algorithm 7.

Algorithm 7 Update-SD-tree(q, d)
1: if q moves to an edge e(v1, v2) that is not recorded in the SD-tree then
2:   subTree1 = Rotate-tree(SD-tree, v1)
3:   subTree2 = Rotate-tree(SD-tree, v2)
4:   SD-tree = φ
5:   SD-tree = Add-child(subTree1)
6:   SD-tree = Add-child(subTree2)
7:   Determine-shortest-paths(SD-tree)
8: else
9:   SD-tree = Rotate-tree(SD-tree, q)
10: end if
11: Remove-out-of-range-vertices(SD-tree)
12: for each vertex v whose distance is shortened do
13:   Delete-child-vertices(v)
14:   Expand-vertex(v, d)
15: end for

7.2.4 Overview of the Continuous Range Query Processing

In the previous sections, we presented the issues and solutions involving initial query result processing, creating the SD-tree, dealing with object updates, and maintaining the SD-tree with regard to query point updates. In this section we combine all the different components and describe the complete procedure of C-MNDR for processing a continuous network-distance-based range query in Algorithm 8. When MOVNet receives a request from q for a continuous range query, it launches the initial query result processing (Line 3) as we described in Section 7.2.1. Once the processing is finished, we create a corresponding SD-tree based on the connectivity and distance information we have collected (Line 4). Next, at the beginning of each update cycle, MOVNet examines whether the query point has submitted an update. If the query point moves to a position that is out of the SD-tree monitoring area, MOVNet again invokes the initial query result processing and constructs a new SD-tree afterward (Lines 7 - 11).
On the other hand, if the query point moves to a position that is in the area enclosed by the SD-tree, we utilize the current SD-tree to expedite the query processing (Lines 12 - 14). In case the query point has no update, C-MNDR only launches Update-CRange-resultSet to update the result set.

Algorithm 8 C-MNDR(q, d)
1: /* q is the query object */
2: /* d is the range */
3: Compute-init-cont-rangeQuery(q, d)
4: Build-SD-tree(q)
5: for each update cycle do
6:   if query point position is updated then
7:     if query point moves out of SD-tree then
8:       Compute-init-cont-rangeQuery(q, d)
9:       Build-SD-tree(q)
10:      continue
11:    else
12:      resultObjs = φ, resultCellSet = φ
13:      Update-SD-tree(q, d)
14:      resultObjs = Retrieve-objects(resultCellSet)
15:    end if
16:  else
17:    Update-CRange-resultSet()
18:  end if
19: end for

In summary, MOVNet processes continuous queries by using the connecting vertices to determine the set of cells and vertices that overlap with the query. Additionally, MOVNet uses the SD-tree to monitor the changes of the query along the time dimension. We have presented the algorithms to rotate, truncate, and extend the edges of the SD-tree with regard to object updates. Our simulation results indicate that C-MNDR is much more efficient than executing the snapshot-based query processing at the beginning of each update cycle. Extensive results will be presented in Section 8.3.

7.3 Continuous k Nearest Neighbor Query Algorithm

In this section, we leverage our design of C-MNDR to accommodate the processing of continuous kNN queries (C-MKNN). We first use the connecting vertices to obtain the initial query result. After that, we discuss how to use the SD-tree to maintain the query result. Finally, we summarize the complete procedure for processing continuous kNN queries.
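The per-cycle control flow of Algorithm 8, which the kNN variant in this section reuses with only minor changes, can be condensed into a small decision function. This is a sketch for illustration only; the action labels are our own, not MOVNet identifiers:

```python
def plan_update_cycle(query_updated, query_inside_sd_tree):
    """Decide what MOVNet does at the start of an update cycle (Algorithm 8).

    Returns one of three hypothetical action labels:
      "recompute"     -- full initial processing plus a fresh SD-tree
      "reuse-sd-tree" -- rotate/truncate/extend the SD-tree, then retrieve
      "update-result" -- query point is stationary; refresh objects only
    """
    if not query_updated:
        return "update-result"      # Update-CRange-resultSet alone suffices
    if query_inside_sd_tree:
        return "reuse-sd-tree"      # Update-SD-tree, then Retrieve-objects
    return "recompute"              # Compute-init-cont-rangeQuery + Build-SD-tree
```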
7.3.1 Initial Result Computation in C-MKNN

In our design of C-MNDR, the query processing for the initial query result starts with computing the distance information in the cell where q is located. After that, we are able to use the connecting vertices of the cell to estimate the distances of moving objects in neighboring cells. Specifically, a procedure of cell-based expansion is invoked based on the distances of the connecting vertices stored in CVToExpand and the values in the distance tables. As the first step of processing a continuous kNN query, we adopt the cell-based expansion technique of C-MNDR, which provides fine granularity in the search space for kNN objects. As indicated, we utilize the data structures CVToExpand and resultCellSet from C-MNDR. Moreover, we introduce a minimum priority queue termed candidateObjs, which acts as a buffer to store moving objects that are retrieved from the grid index as possible objects in the result set. We store these objects in the order of their distances from the query point. Additionally, we manage resultObjs as a maximum priority queue of size k, ordered by the distance from the query point.

We define a function Update-resultObjs(candidateObjs). Its purpose is to update resultObjs by examining three cases in the following order: First, if an object m has instances in both resultObjs and candidateObjs, we check the distance of each instance. If dist(q, m) in resultObjs is less than the distance between m and q in candidateObjs, we discard the instance in candidateObjs. On the other hand, if m in resultObjs has a longer distance, we replace it with the instance in candidateObjs. Second, if there are fewer than k objects in resultObjs, we de-queue objects from candidateObjs into resultObjs.
Third, if the distance of the kth result object is greater than the distance of the first element of candidateObjs, the kth result object is de-queued and inserted into candidateObjs. Then candidateObjs de-queues an object and inserts it into resultObjs.

Algorithm 9 Compute-init-cont-NNQuery(q, k)
1: foundkObjs = false
2: c = Locate-cell(q)
3: resultCellSet = resultCellSet ∪ c
4: G = Create-graph(tree, c)
5: Add-vertex-into-graph(G, q)
6: S = Compute-distance(G, q)
7: Add each connecting vertex v in S to CVToExpand
8: candidateObjs = Retrieve-objs(c, G)
9: resultObjs = Update-resultObjs(c, G)
10: if resultObjs.size == k AND 1st CVToExpand.dist ≥ kth resultObjs.dist then
11:   foundkObjs = true
12: else
13:   resultObjs = Cell-based-expansion(CVToExpand, k, foundkObjs)
14: end if
15: return resultObjs

Algorithm 9 elaborates on the details of computing the initial query result in C-MKNN. We first locate the cell c that encloses q (Line 2). Then we insert q as the starting point into the modeling graph, and Dijkstra's algorithm is invoked to compute the distance of each vertex in the graph from the query point (Lines 3 - 6). In C-MKNN, we compute the distances for all vertices in the same cell. When the distance computation finishes, we obtain the distance of each connecting vertex in the cell from the query point. We use CVToExpand to store these connecting vertices, ordered by their distances from the query point (Line 7). Moreover, we retrieve the objects in c to build the result set (Line 9). At this moment, if there are fewer than k objects in resultObjs, or the distance of the kth result object is longer than the distance of the first connecting vertex, we need to examine the neighboring cells to compute the complete result set. Hence we invoke a cell-based network expansion, which is presented in Algorithm 10. During the cell-based expansion, we de-queue the first connecting vertex v from CVToExpand.
For each outgoing edge of the connecting vertex, Algorithm 1 is invoked to compute the overlapping cells. These cells are recorded in resultCellSet. After that, the pairing vertex v' of v is selected and the overlapping cell c' for v' is located. Please note the condition that we examine in Line 7 to avoid unnecessary network expansion: if we have already found a path to v' that is shorter than the path via v, we do not expand v' at this step. Next, we compute the distances of the vertices in c' and retrieve objects to update our result set (Lines 11 - 13 in Algorithm 10). Finally, we use the distance table to determine the distances of the other connecting vertices in c' and insert them into CVToExpand (Lines 14 - 18 in Algorithm 10). The cell-based network expansion terminates once we have found k objects and the kth result object is closer than the minimum connecting vertex in CVToExpand. At this time, resultObjs contains the complete result set.

Algorithm 10 Cell-based-expansion(CVToExpand, k, foundkObjs)
1: while CVToExpand != NULL AND foundkObjs == false do
2:   v = De-queue(CVToExpand)
3:   for each pairing vertex v' of v do
4:     C = cellOverlapping(e(v, v'))
5:     resultCellSet = resultCellSet ∪ C
6:     candidateObjs = Retrieve-objs(C)
7:     if dist(q, v) + length(v, v') < dist(q, v') then
8:       c' = Locate-cell(v')
9:       G' = Create-graph(tree, c')
10:      dist(q, v') = dist(q, v) + length(v, v')
11:      Compute-distance(G', v')
12:      candidateObjs = Retrieve-objs(c', G')
13:      resultObjs = Update-resultObjs(c', G')
14:      for each connecting vertex v'' in c' do
15:        if the pairing vertex of v'' != v then
16:          CVToExpand = CVToExpand ∪ v''
17:        end if
18:      end for
19:    end if
20:  end for
21:  if resultObjs.size == k AND 1st CVToExpand.dist ≥ kth resultObjs.dist then
22:    foundkObjs = true
23:  end if
24: end while
25: return resultObjs

Table 7.1: Running steps of an example of C-MKNN

Step  CVToExpand                                                                   resultObjs
1     (v2 = 0.5), (v3 = 1.5), (v1 = 1.5), (v5 = 5)                                 (m1 = 3.0), (m2 = 4.5)
2     (v3 = 1.5), (v1 = 1.5), (v9 = 3.5), (v5 = 5), (v10 = 7.3), (v12 = 7.7)       (m1 = 3.0), (m2 = 4.5), (m3 = 7.7)
3     (v1 = 1.5), (v9 = 3.5), (v5 = 5), (v10 = 7.3), (v12 = 7.7)                   (m1 = 3.0), (m2 = 4.5), (m3 = 4.9)
4     (v9 = 3.5), (v5 = 5), (v10 = 7.3), (v12 = 7.7), (v8 = 10.0), (v7 = 10.5)     (m1 = 3.0), (m2 = 4.5), (m3 = 4.9)
5     (v5 = 5), (v10 = 7.3), (v12 = 7.7), (v8 = 10.0), (v7 = 10.5),                (m1 = 3.0), (m2 = 4.5), (m3 = 4.9)
      (v14 = 10.5), (v13 = 13.5)

Let us illustrate the execution of C-MKNN with an example as shown in Figure 7.1. There are 5 moving objects in the service space. The positions of these objects at the current time are: dist(v1, m1) = 1.5, dist(v3, m2) = 3.0, dist(v10, m3) = 0.4, dist(v12, m4) = 6.5, and dist(v2, m5) = 0.5. Assume that m5 submits a continuous kNN query where k = 3. C-MKNN first locates the query point in c(0,1). After we finish executing Line 9 in Algorithm 9, the values of CVToExpand and resultObjs are shown in Step 1 of Table 7.1. Since there are fewer than 3 objects in resultObjs, v2 is de-queued from CVToExpand. It has a pairing connecting vertex v9, which is located in c(1,1). We insert c(1,1) into resultCellSet. Moreover, with the distance information stored in the connecting vertex list, we determine that dist(q, v9) = 3.5. After we compute the distances of the vertices and retrieve the objects from c(1,1), m3 is inserted into resultObjs. The connecting vertex list of c(1,1) indicates that the connecting vertices in c(1,1) are v9, v10 and v12. Based on the values stored in the distance table of c(1,1), we insert each connecting vertex into CVToExpand, as shown in Step 2 of Table 7.1.

Although there are 3 objects in resultObjs, the minimum connecting vertex v3 in CVToExpand has a shorter distance than the kth object m3 in resultObjs. Therefore, we de-queue v3 from CVToExpand. v3 is connected with v10. We know that dist(q, v10) = 7.3 from the previous step of computing. Now we find a shorter route via v3, hence we update dist(q, v10) = 4.5.
Based on the updated distance information, we update m3 in resultObjs with dist(q, m3) = 4.9. Additionally, the paths to the other connecting vertices (i.e., v9 and v12) via v3 are longer than the ones recorded in CVToExpand, hence we keep our previous distance records in CVToExpand, as shown in Step 3 of Table 7.1.

Next, v1 is de-queued from CVToExpand. Its pairing vertices are v6 and v7, which are both located in c(0,0). Since there is no object in that cell, resultObjs keeps the same value. The updated structure of CVToExpand is shown in Step 4 of Table 7.1. Finally, v9 is de-queued from CVToExpand. The updated CVToExpand queue is illustrated in Step 5 of Table 7.1. There is one object m4 in c(1,0), whose distance is longer than that of m3. At this point, we have found k objects, and the next element in CVToExpand is v5, which has a longer distance than m3. We conclude that the result set is complete and the algorithm terminates. At this moment, resultCellSet = {c(0,1), c(1,1), c(0,0), c(1,0)}. The distances of the expanded vertices are shown in Figure 7.8 (a).

Figure 7.8: The distances of expanded vertices and the SD-tree for an example of C-MKNN. (a) The distances of expanded vertices. (b) The corresponding SD-tree.

7.3.2 Monitoring Object Updates in C-MKNN

After the initial query result is computed, C-MKNN begins to monitor the location changes of moving objects over time. Similar to the design of C-MNDR, we utilize the SD-tree to maintain the connectivity and distance information of the network. In this section, we elaborate on the details of dealing with moving object updates in C-MKNN.
We observe that once the initial query result is computed, the distance information of the network within the range of the kth object in resultObjs is optimal, i.e., for each object in resultObjs, C-MKNN obtains the shortest path during the initial query result processing. Therefore, we can create a corresponding SD-tree with a range constraint that equals the distance of the kth object in resultObjs. For instance, in the example described in Section 7.3.1, the kth object m3 has a distance of 4.9. Therefore we create the corresponding SD-tree as shown in Figure 7.8 (b).

Similar to our design of C-MNDR, we distinguish two types of object updates: query point updates and non-query-point updates. Let us consider non-query-point updates first. We present the details of processing non-query-point updates in Algorithm 11. Since resultCellSet records the cells that are affected by a query, we monitor the change of objects in these cells as the first step of maintaining the query result. For an updated object m, if it was in resultObjs or candidateObjs, it is removed from these data structures. After that, we locate the enclosing cell with regard to the coordinates of the updated object. If the cell is in resultCellSet and the starting or ending vertex of the edge on which the object is located is recorded in the SD-tree, we perform the distance computation based on those starting and ending vertices, and insert the object into candidateObjs. After all updated objects are processed, we invoke Update-resultObjs to update resultObjs from candidateObjs.
Algorithm 11 Update-CNN-resultSet()
1: foundkObjs = false
2: /* M is the set of updated objects */
3: for each m ∈ M do
4:   if m ∈ resultObjs then
5:     Remove-Obj(m, resultObjs)
6:   end if
7:   c = Locate-cell(m)
8:   if c ∈ resultCellSet AND m is located on e(v1, v2) where v1 or v2 is recorded in the SD-tree then
9:     dist(q, m) = min(dist(q, v1) + dist(v1, m), dist(q, v2) + dist(v2, m))
10:    candidateObjs = candidateObjs ∪ m
11:  end if
12: end for
13: Update-resultObjs(candidateObjs)
14: if resultObjs.size == k AND 1st CVToExpand.dist ≥ kth resultObjs.dist then
15:   foundkObjs = true
16: else
17:   resultObjs = Cell-based-expansion(CVToExpand, k, foundkObjs)
18:   Update-SD-tree()
19: end if
20: return resultObjs

Assume that m3 updates its position during an update cycle. Let us consider two cases. First, m3 moves to e(v3, v10) where dist(v3, m3) = 2.0. In this case, we remove m3 in Line 4 of Algorithm 11. After that, since v3 and v10 are both recorded in the SD-tree, we can compute its updated distance from q as 3.5. m3 is inserted into candidateObjs in Line 9. Once we arrive at Line 12, we have resultObjs = (m1 = 3.0), (m3 = 3.5), (m2 = 4.5). Since the kth object m2 has a shorter distance than the first element v5 in CVToExpand, we have computed the complete result set (Line 15). Note that the distance of the kth object in resultObjs has changed from 4.9 to 4.5. However, we do not shrink the SD-tree in such a case. Our goal is for the SD-tree to maintain the shortest paths from the query point during the query processing. This way, we avoid expanding these shortest paths again if more objects move toward the query point. Additionally, the more shortest-path information we store in the SD-tree, the less network expansion is performed when the query point moves, as we shall see in the following section. In the second case, assume that m3 moves to e(v10, v11) with dist(v10, m3) = 0.6, which is shown in Figure 7.9 (a).
After we execute Line 13 in Algorithm 11, resultObjs = (m1 = 3.0), (m2 = 4.5), (m3 = 5.1). Since the kth object m3 has a longer distance than the first element v5 in CVToExpand, we need to expand the connecting vertex to ensure the completeness and correctness of the result set. Therefore, we de-queue v5 from CVToExpand by invoking Cell-based-expansion in Line 17. After the expansion finishes, v10 becomes the first element in CVToExpand, with a longer distance than m3. At this moment, we update the SD-tree with a range of 5.1. Figure 7.9 (b) shows the updated SD-tree.

Figure 7.9: The change of the SD-tree with regard to object updates in C-MKNN. (a) m3 moves to dist(v10, m3) = 0.6. (b) The updated SD-tree.

Finally, let us consider the case when the query point updates its position. If the query point moves to a location that is outside the SD-tree monitoring area, we invoke the initial query result processing. On the other hand, if the query point moves to a position that is inside the SD-tree, we can use the distance information from the SD-tree to improve the efficiency of the query processing. Specifically, we utilize the same technique as in C-MNDR, described in Section 7.2.3. We use the distance of the kth object in resultObjs as the range to rotate, truncate, and extend the SD-tree. Once the distances of vertices in the range are updated, we start to retrieve objects in resultCellSet to compute the new set of resultObjs. If we cannot find kNN objects within the range, we start from the partial vertex with the shortest distance in the SD-tree to explore the network.
The algorithm follows the cell-based expansion we described in the previous section, which terminates when kNN objects are found.

7.3.3 Overview of the Continuous kNN Query Processing

In this section, we summarize the complete procedure of C-MKNN as shown in Algorithm 12.

Algorithm 12 C-MKNN(q, k)
1: /* q is the query object */
2: Compute-init-cont-NNQuery(q, k)
3: build-SD-tree(q)
4: for each update cycle do
5:   if query point position is updated then
6:     if query point moves out of the SD-tree then
7:       Compute-init-cont-NNQuery(q, k)
8:       build-SD-tree(q)
9:       continue
10:    else
11:      d = kth resultObjs.dist
12:      resultObjs = φ, resultCellSet = φ
13:      Update-SD-tree(q, d)
14:      resultObjs = Retrieve-objects(resultCellSet)
15:      if resultObjs.size = k AND 1st CVToExpand.dist ≥ kth resultObjs.dist then
16:        foundkObjs = true
17:      else
18:        resultObjs = Cell-based-expansion(CVToExpand, k, foundkObjs)
19:        Update-SD-tree()
20:      end if
21:    end if
22:  else
23:    Update-CNN-resultSet()
24:  end if
25: end for

When MOVNet receives a request from q for a continuous kNN query, it launches the initial query result processing (Line 2) as described in Section 7.3.1. Once the processing is finished, we create the corresponding SD-tree based on the connectivity and distance data we have collected (Line 3). Next, at the beginning of each update cycle, MOVNet first examines whether the query point has submitted an update. If the query point moves to a position that is outside of the SD-tree monitoring area, MOVNet invokes the initial query result processing and constructs the SD-tree afterwards (Lines 6-9). On the other hand, if the query point moves to a position that is still within the area enclosed by the SD-tree, we utilize the SD-tree to expedite the query processing (Lines 11-20). Specifically, we use the distance of the kth result object as the range to update the SD-tree. After that, we retrieve objects from the affected cells and generate the updated result set.
If we cannot find kNN objects within the range from the updated query point position, we need to explore the network further to retrieve more objects. This is accomplished by using our cell-based expansion algorithm (Line 18). Finally, in the case that there are only non-query-point updates during an update cycle, Update-CNN-resultSet is executed to update the result set.

Chapter 8 Experimental Evaluation

To evaluate the performance of MOVNet, we performed extensive simulations on a real data set of a road network with high density. The results indicate that MOVNet achieves good throughput with a wide variety of input data. In Section 8.1 we start by describing the data set used in our simulation and our simulator implementation. After that, experimental results of the performance of snapshot query processing in MOVNet are presented in Section 8.2. Later on, we discuss the performance of continuous query processing in MOVNet in Section 8.3.

8.1 Simulator Implementation

We obtained a real data set from TIGER/Line [Bur06]. The Los Angeles County (LA) data set has 304,162 road segments distributed over an area of 4,752 square miles. The average length of road segments is 0.1066 miles. For simplicity, we assume that each road segment is bi-directional. Additionally, we used a network simulator [Bri02] to generate the positions of 100,000 moving objects in the road network. The simulator assumes a uniform distribution of the objects at the initial time. After that, each object follows the random walk model [Fel68] with a maximum speed limit to move in the service space. At each time stamp, a number of moving objects report their updated locations. The ratio of objects that report updates to the total number of POIs is an input parameter of the simulator.
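The per-time-stamp update generation described above can be sketched as follows (a simplified sketch: the actual simulator [Bri02] constrains movement to road segments, which we omit here, and the function names and the free-space walk are our assumptions):

```python
import random

def random_walk_step(x, y, max_speed, width, height):
    # One random-walk step, bounded by max_speed per axis and clamped to
    # the service space (the real simulator moves along road segments).
    x = min(max(x + random.uniform(-max_speed, max_speed), 0.0), width)
    y = min(max(y + random.uniform(-max_speed, max_speed), 0.0), height)
    return x, y

def generate_updates(objects, update_ratio, max_speed, width, height):
    # At each time stamp, a fixed ratio of the objects report new
    # positions; the ratio is an input parameter, as described above.
    moving = random.sample(sorted(objects), int(len(objects) * update_ratio))
    return {oid: random_walk_step(*objects[oid], max_speed, width, height)
            for oid in moving}
```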
Our design of MNDR adopts the concept of Euclidean restriction [PZMT03]; hence, in our simulation of snapshot query processing, we leveraged the concept of network expansion from the same work to design baseline algorithms for performance comparisons. The baseline algorithm for mobile range queries executes as follows. First, we retrieve the edge where the query point is located. Next, the closest vertex to the query point is expanded and the outgoing edges from this vertex are retrieved. The expansion stops once all vertices whose distances from the query point are less than d have been expanded. After that, for each expanded edge, the overlapping cells are computed and the POIs in these cells are retrieved to constitute the result set. Based on the same idea, the baseline algorithm for mobile kNN queries has the following steps. First, we locate the road segment on which the query point is moving and compute the affected cells. Next, the corresponding moving objects are retrieved from the affected cells. If there are fewer than k objects in the result set, or if the distance from the query point to the closest vertex is less than the distance of the kth object from the query point, the closest vertex is expanded and the outgoing edges from this vertex are retrieved. Afterward, the set of affected cells of the outgoing edges is computed and the corresponding objects are retrieved from the grid index. The vertex expansion process stops when there are k objects in the result set and the distance from the query point to the kth object is no greater than the distance from the query point to the closest un-expanded vertex. To achieve a fair comparison, the baseline algorithms also use the R*-tree as the index structure to facilitate the retrieval of the network data. Additionally, we implemented the design of S-GRID [HJLS07] in Java to compare it with the performance of snapshot kNN query processing in MOVNet.
Specifically, we implemented the Vertex-Edge component in S-GRID as an on-disk module. Edges of the network are indexed by an R*-tree. The pre-computed results in S-GRID are stored in memory. In our study of continuous range query processing, we adopted two algorithms to compare with C-MNDR. First, we leveraged our design of MNDR. Since MNDR mainly focuses on snapshot query processing, a full-recomputation method was used to accommodate continuous query processing: at the beginning of each update cycle, we invoked MNDR from the query point as if to process a new snapshot range query. Second, we designed a baseline continuous range query processing algorithm. It utilizes the same data structures as C-MNDR. At the beginning of each update cycle, if the query point moved to a new location, the baseline algorithm invoked a new initial query result processing step. Therefore, the baseline algorithm did not use the information in the SD-tree to project the network connectivity with regard to the updated query point location. We compared the simulation results of the baseline algorithm and C-MNDR to quantify the performance improvement produced by the SD-tree during query processing. We implemented a simulator in Java. We arranged the road segments of the LA County data set into an R*-tree index file, in which we set the page size to 4 KB. The high-level functionality of our simulator is as follows. For each test case, our simulator creates a service space with an area equal to the size of LA County. It then opens the R*-tree index file and uses a 10-page buffer for caching the disk pages read by MOVNet. Next, a grid index is created in memory. At the beginning of each test, our simulator acquires the positions of objects from a text file that records the coordinates. For continuous query processing tests, the simulator reads from another file that contains the records of object updates at the beginning of each time stamp.
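The in-memory grid index used by the simulator can be sketched as follows (a sketch; the class and method names are ours, and the real index additionally records the edge each object is map-matched to):

```python
class GridIndex:
    # A uniform grid over the service space; each cell keeps the ids of
    # the objects whose coordinates fall inside it.
    def __init__(self, width, height, cells_per_axis):
        self.cell_w = width / cells_per_axis
        self.cell_h = height / cells_per_axis
        self.n = cells_per_axis
        self.cells = {}       # (col, row) -> set of object ids
        self.positions = {}   # object id -> (x, y)

    def locate_cell(self, x, y):
        # Map coordinates to the enclosing cell, clamped to the grid.
        return (min(int(x / self.cell_w), self.n - 1),
                min(int(y / self.cell_h), self.n - 1))

    def insert(self, oid, x, y):
        self.positions[oid] = (x, y)
        self.cells.setdefault(self.locate_cell(x, y), set()).add(oid)

    def update(self, oid, x, y):
        # A position update is a delete from the old cell plus an insert;
        # no disk access is involved, so the cost is pure CPU time.
        old_x, old_y = self.positions[oid]
        self.cells[self.locate_cell(old_x, old_y)].discard(oid)
        self.insert(oid, x, y)

    def retrieve(self, cell):
        return self.cells.get(cell, set())
```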
8.2 Snapshot Query Processing Simulation Results

In our experiments of snapshot query processing, the simulator randomly picks a moving object and launches a query from its location. Table 8.1 summarizes the parameters used. In each experimental setting we varied a single parameter and kept the remaining ones at their default values. The experiments measured the CPU time (in milliseconds) and the number of disk page accesses as the performance indicators of the query processing. For each experimental configuration, the simulator executed 1,000 iterations and reported the average result. The simulation was executed on a workstation with 1.5 GB memory and a 3.0 GHz Xeon processor.

Table 8.1: Snapshot query processing simulation parameters
Parameter                   Default Value   Range
Number of POIs              50K             10K - 100K
POI distribution            Uniform         Uniform
Number of NNs (k)           50              2 - 128
Radius (miles)              5               2 - 10
Number of cells per axis    1K              200 - 1,400

We are first interested in verifying the update costs from POIs in MOVNet. Since we use an in-memory grid index to handle these updates, there is no disk access to the R*-tree index file, which resides on secondary storage. Therefore, we measured the CPU time of the update processing. Note that the update and query processing should be finished within one update cycle to ensure the correctness of the query results. We assume that at the beginning of each update period, 20% of the POIs submit their new positions. Figure 8.1 shows that when there are 20,000 update messages in one period (i.e., 100,000 POIs), MOVNet is able to record these changes in about 200 milliseconds. Additionally, the update cost is proportional to the number of update messages. Therefore, it is possible to estimate the CPU time that is required to process updates.

Figure 8.1: The CPU time of update cost as a function of POIs

We start by verifying the performance of MNDR.
Figure 8.2(a) illustrates the effect of the number of cells with the LA County data set. The results show that MNDR requires less than half of the CPU time of the baseline algorithm. Correspondingly, Figure 8.2(b) examines the page accesses of both algorithms. As we can see, the baseline algorithm incurs more than 3,000 page accesses across the various cell sizes. In comparison, MNDR requires fewer than 100 page accesses during query processing. An important observation is that a small number of cells causes the CPU time of MNDR to degrade. On the other hand, the disk access pattern of MNDR is stable across different cell sizes. This can be explained by the fact that a disk access only occurs when we retrieve the road segments from the R*-tree file. Since we use a fixed range in this test, the number of disk accesses is not affected by changing the cell size. However, a larger cell size results in a larger number of POIs being retrieved from the grid index during query processing. Therefore, the CPU time expended in this portion is larger than with smaller cell sizes. Overall, we conclude that MOVNet scales very well with a varying number of cells. Note that with MNDR, the setting of 1,000 cells per axis achieves stable and optimal performance, hence we set the default number of cells per axis to 1,000 in our other tests.

Figure 8.2: The performance of MNDR as a function of the number of cells. (a) CPU time. (b) Disk accesses.

Next, Figure 8.3(a) illustrates the effect of the number of POIs on the execution time of MNDR. As we can see, MNDR outperforms the baseline algorithm with various numbers of POIs. In the case of 20K POIs, the CPU time of MNDR is about 30% of the time of the baseline algorithm.
Additionally, the output shows that the CPU time increases linearly with the number of POIs, which matches our complexity analysis expectation. The very small gradient of the MNDR output suggests that MOVNet is very scalable and can support a very large number of POIs. More importantly, with 100K POIs, the processing time for LA County is about 0.1 seconds. This demonstrates how efficiently MOVNet executes. Figure 8.3(b) plots the number of disk accesses of both algorithms. Similarly to the CPU time results, the cost of MNDR is consistently much lower than that of the baseline algorithm. The I/O cost of MNDR is only about 10% of that of the baseline algorithm. Finally, the changing number of POIs has no effect on the I/O cost in MNDR. This is because POIs are stored in memory in MOVNet. For range queries with a fixed range, the I/O cost results from retrieving the underlying road network, which is also fixed.

Figure 8.3: The performance of MNDR as a function of POIs. (a) CPU time. (b) Disk accesses.

Figure 8.4(a) plots the CPU time (using a logarithmic scale) versus the query range with the LA County data set. The CPU time increases quadratically with a larger range. When the range is 4 miles, MNDR costs 0.076 seconds. Processing a range of 8 miles requires 0.2 seconds with MNDR, compared with 0.65 seconds with the baseline algorithm. Additionally, MNDR consistently consumes only about 40% of the CPU time of the baseline algorithm during query processing. Figure 8.4(b) plots the corresponding page accesses. The trend matches the CPU time output, as well as our complexity analysis results. Assuming the road network is uniformly distributed, the number of edges grows quadratically with the increase of the range.
Since these edges must be retrieved from the R*-tree file during query processing, the performance of MNDR deteriorates correspondingly.

Figure 8.4: The performance of MNDR as a function of range. (a) CPU time. (b) Disk accesses.

Next, we are interested in the performance improvement from using Corollary 1 in MNDR. Figure 8.5(a) plots the CPU time when using Corollary 1 to prune the search space in MNDR compared to not using it when handling the LA County data. The performance improvement of using Corollary 1 is about 10% when the range is 6.0 miles and less than 5% when the range is 2.0 miles. We believe this is largely due to the fact that the TIGER/Line data set consists of many very short road segments (0.1066 miles on average). Only a few cells overlap with each edge, which implies that there is little chance to prune cells during query processing. However, the system improvement from using Corollary 1 is substantial when it is applied to long road segments. To illustrate this fact, we extracted the freeway segments in LA County (the average length of freeway segments is 2.7127 miles) and performed the simulation on just this network. Figure 8.5(b) shows the results, with query ranges from 2.0 up to 10.0 miles. The results indicate that the improvement in system throughput from applying Corollary 1 is very noticeable. In particular, when the range is 6 miles, the system performance achieves a gain of over 30%.

Figure 8.5: The CPU time improvement of using Corollary 1. (a) LA County. (b) LA County, freeways only.
Hence we conclude that for a network with long road segments, it is very appealing to use Corollary 1 to prune the search space. Next we study the performance of MKNN. Figures 8.6(a) and (b) illustrate the CPU time and disk accesses of MKNN as a function of the number of cells, respectively. One observation is that the throughput of MKNN is relatively stable when the number of cells per axis exceeds 400. This is because when we use a fairly large cell size (e.g., 200 cells per axis), the grid index only provides a very coarse-grained space partitioning. Hence the progressive probe indicates an estimated sub-space with many unnecessary road segments, which must be retrieved in later steps. However, the performance of MKNN becomes stable when we choose small cell sizes, because the progressive probe with these cell sizes results in a more accurately estimated result space. Hence we set the default number of cells per axis to 1,000 in our other MKNN tests. More importantly, MKNN consistently requires less CPU time as well as fewer disk accesses than the baseline algorithm. In general, MKNN requires less than 50 milliseconds to process a kNN query with k = 50. Correspondingly, the I/O cost of MKNN is less than 50% of that of the baseline algorithm. The results show that by using the progressive probe, we are able to define a candidate result space at the beginning of the query processing, which helps to minimize the I/O cost of retrieving edges from the road network.

Figure 8.6: The performance of MKNN as a function of the number of cells. (a) CPU time. (b) Disk accesses.

Figure 8.7(a) plots the performance of MKNN with regard to k. The CPU time grows proportionally with k. More importantly, MKNN outperforms the baseline algorithm.
The growth of the CPU time of MKNN as a function of k is much slower than that of the baseline algorithm. MKNN costs less than 80% of the CPU time of the baseline algorithm where k = 128.

Figure 8.7: The performance of MKNN as a function of k. (a) CPU time. (b) Disk accesses.

Figure 8.7(b) shows the number of disk accesses of both algorithms. The gradient of the MKNN output is very small, which suggests that with increasing k, the progressive probe in MKNN avoids a significant number of excessive I/O operations on the R*-tree. Finally, when k = 128, the CPU time in LA County is less than 0.5 seconds. This clearly shows that MOVNet can support a very large value of k. Figure 8.8(a) illustrates the CPU time of MKNN as a function of the number of POIs. The result shows that the CPU time is inversely proportional to the number of POIs, which is what we expect from the theoretical analysis. With a larger number of POIs, the performance of MKNN improves. This characteristic ensures that MOVNet is very applicable for use in metropolitan areas. When there are 100K POIs in the service area, processing a kNN query with k = 50 requires only 26 milliseconds. Another important observation is that MKNN achieves better system throughput than the baseline algorithm across varying numbers of POIs. The improvement ranges from a factor of 4.23 up to 5.80. Figure 8.8(b) shows the disk access counts of both MKNN and the baseline algorithm, which correlate with our CPU time measurements. For instance, the I/O cost of MKNN is about 30% of that of the baseline algorithm when there are 100K POIs in the system.
Figure 8.8: The performance of MKNN as a function of POIs. (a) CPU time. (b) Disk accesses.

Next, we detail the number of iterations required when running MKNN during query processing. Figure 8.9(a) shows the number of iterations of MKNN with regard to varying numbers of cells in the space. We observe that the results are enclosed in the space from the progressive probe in over 45% of these cases once the number of cells per axis exceeds 1,200. Overall, fewer than 4% of the test cases require more than two iterations in the MKNN execution. This explains why MKNN consistently outperforms the baseline algorithm: by introducing the progressive probe as the first step in MKNN, MOVNet is able to estimate the result space. Consequently, MOVNet uses this approximate area to retrieve the edges instead of retrieving a few edges at a time as in the baseline algorithm. Therefore, I/O costs are significantly reduced in MKNN. Figure 8.9(b) demonstrates the number of iterations as a function of k, which exhibits the same quality. It is noteworthy that MKNN also records the set of vertices that have already been visited during previous iterations, which further reduces the CPU cost during query processing.

Figure 8.9: The number of iterations in MKNN. (a) Number of iterations with regard to the number of cells. (b) Number of iterations with regard to k.

Since our cell overlapping algorithm and the modeling graph construction are processed on the fly, we would like to study the overhead of these two modules during query processing. Figures 8.10(a) and (b) plot the percentage of CPU time used in these two parts as a function of the number of cells and range in MNDR, respectively.
They show that the cost of computing overlapping cells is less than 5% with a varying number of cells. Although the cost of constructing the modeling graph increases with a larger range, it amounts to no more than 12% of the CPU time with a 10-mile range. Similarly, Figures 8.10(c) and (d) plot the percentage of CPU time of these two modules as a function of the number of cells and k in MKNN, respectively. The results exhibit similar characteristics. Therefore, we note that the overhead of these two online computation parts is a very small portion of the query processing.

Figure 8.10: The portion of CPU time used in Algorithm 1 and graph construction. (a) MNDR, number of cells. (b) MNDR, range. (c) MKNN, number of cells. (d) MKNN, k.

8.3 Continuous Query Processing Simulation Results

In this section, we present and discuss the simulation results of continuous query processing. Table 8.2 summarizes the parameters used in our simulations. For each experimental setting, we varied a single parameter and kept the remaining ones at their default values. We assume that objects are moving in the area of LA County. The maximum speed for a moving object during each update cycle is 1/250 of the distance along the x-axis and y-axis, respectively. For each continuous query, the simulator randomly picks a moving object and launches a query from its location. We monitored the change of objects for 20 update cycles. The simulator output the initial query result as well as the updated result set after each time stamp. The experiments measured the CPU time and the number of disk page accesses as the performance metrics of the query processing. For each experimental configuration, the simulator executed 50 iterations and reported the average result. The simulation was executed on a Linux server with 16 GB memory and a 3.0 GHz Xeon processor.
Table 8.2: Continuous query processing simulation parameters
Parameter                      Default Value   Range
Number of POIs                 50K             20K - 100K
POI distribution               Uniform         Uniform
Number of NNs (k)              50              2 - 128
Radius (miles)                 3               1 - 5
Number of cells per axis       400             100 - 500
Number of queries              2K              500 - 3K
Percentage of updated objects  5               2 - 10
Update cycles                  10              10

For a continuous query, the total cost consists of the object update cost (i.e., the cost of updating the locations of objects in the grid index), the initial query result processing cost, and the query update cost. Note that the CPU time for MOVNet to process all continuous queries should be less than one update cycle to ensure the correctness of the query results. Otherwise, the query results would become invalid before the system finishes processing in each update cycle.

8.3.1 Object Update Cost in MOVNet

We first verified the object update costs in MOVNet compared to S-GRID. We assume that at the beginning of each update period, 10% of the POIs submit their new positions. Figure 8.11 shows that when there are 10,000 update messages in one period (i.e., 100,000 POIs), MOVNet is able to record these changes in about 4.5 seconds. Additionally, the update cost of MOVNet is slightly less than that of S-GRID. This is because both techniques include the map-matching procedure in the object update to record the edge where the object is located. Moreover, S-GRID records an object in a cell if its nearest vertex on edge e belongs to this cell. Therefore, distance computation is performed during an object update in S-GRID. In contrast, MOVNet directly inserts the object into the cell that encloses it, thereby simplifying the update procedure.
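The difference between the two update procedures can be sketched as follows (our simplification: S-GRID's nearest-vertex test is approximated with straight-line segment distances, and the helper names are hypothetical):

```python
def movnet_update_cell(x, y, cell_w, cell_h):
    # MOVNet: the object is placed in the cell enclosing its coordinates;
    # no distance computation is needed during the update.
    return int(x / cell_w), int(y / cell_h)

def sgrid_update_cell(obj_pos, edge, endpoint_cells):
    # S-GRID (as described above): the object is recorded in the cell of
    # the nearest endpoint of its edge, so every update also pays for
    # two distance computations.
    (x, y), ((x1, y1), (x2, y2)) = obj_pos, edge
    d1 = ((x - x1) ** 2 + (y - y1) ** 2) ** 0.5
    d2 = ((x - x2) ** 2 + (y - y2) ** 2) ** 0.5
    return endpoint_cells[0] if d1 <= d2 else endpoint_cells[1]
```

The extra per-update distance work in S-GRID is small, which is consistent with the modest gap between the two curves in Figure 8.11.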
Figure 8.11: The CPU time of the update cost in MOVNet compared to S-GRID

8.3.2 Connecting Vertex Distribution in MOVNet

Next, we are interested in studying the relationship between the number of connecting vertices and the cell size. Figure 8.12(a) shows the statistics of the connecting vertex distribution. First, the number of connecting vertices grows with the increase of the number of cells. Additionally, when MOVNet utilizes 500 × 500 cells, there are over 100K connecting vertices, which is more than 3 times the number with 100 × 100 cells. In contrast, the number of connecting vertices is about 130K when MOVNet has 1000 × 1000 cells. This shows that when the cell size is relatively small, the growth rate of the number of connecting vertices slows down. As we shall see in the following section, a very large number of connecting vertices can become the bottleneck of the system performance. In Figure 8.12(b), we show the density of connecting vertices in MOVNet, which refers to the average number of connecting vertices in a cell. It indicates that the density of the connecting vertices becomes smaller with a larger number of cells in the system.

Figure 8.12: The distribution of connecting vertices as a function of the number of cells. (a) Number of CVs in MOVNet. (b) Density of CVs in MOVNet.

8.3.3 Performance Study of C-MNDR

We are first interested in verifying the performance of the initial query result processing in C-MNDR. Figure 8.13(a) illustrates the effect of the number of cells. The results show that C-MNDR requires about half of the CPU time compared with the MNDR algorithm.
This is because, with the help of distance tables, C-MNDR improves the efficiency of finding the set of network data in the range before retrieving objects from the grid index. Additionally, the CPU cost of C-MNDR increases with the growth of the number of cells. This suggests that with a larger number of cells, the number of connecting vertices becomes larger, which also increases the system cost. In contrast, Figure 8.13(b) illustrates the page accesses of both algorithms. As we can see, the C-MNDR algorithm incurs more page accesses than MNDR across the various cell sizes. This can be explained by the fact that MNDR uses the Euclidean distance restriction as the first step when retrieving network data. Although this is a preliminary estimation of the network that is within the range, it minimizes the I/O cost by performing just one range query. On the other hand, C-MNDR uses cell-based network retrieval, which results in a significant number of I/O operations, especially when there are a large number of cells in MOVNet.

Figure 8.13: The performance of initial query result processing in C-MNDR as a function of the number of cells. (a) CPU cost. (b) I/O cost.

Next, Figure 8.14(a) illustrates the effect of the number of POIs. As we can see, C-MNDR consumes less than 50% of the CPU time of MNDR across various numbers of POIs. In the case of 100K POIs, the CPU time of C-MNDR is about 33 milliseconds. This demonstrates how efficiently C-MNDR executes. Additionally, the
The I/O cost of C-MNDR is about 3 times that of the MNDR algorithm. Moreover, the I/O cost remains stable with the change of the POIs in MOVNet. This is because POIs are managed by the in-memory grid index, hence it has no effect on I/O operations. 20 40 60 80 100 0 20 40 60 80 100 CPU time (milisec) MNDR C-MNDR Number of objects (K) 20 40 60 80 100 100 200 300 400 800 900 MNDR C-MNDR Disk access (pages) Number of objects (K) (a) CPU Cost (b) I/O Cost Figure 8.14: The performance of initial query result processing in C-MNDR as a function of POIs Figure 8.15(a) plots the CPU time of C-MNDR while changing of the range. The CPU time quadratically increases with a larger range. When the range is 4 miles, C- MNDR costs 0.025 seconds. Processing a range of 10 miles requires 0.095 seconds by using C-MNDR compared with 0.257 seconds when using the MNDR algorithm. Addi- tionally, when the range is small, the CPU cost of C-MNDR and MNDR are almost the 103 same. With a larger range, C-MNDR becomes much more faster than MNDR, which in- dicates the advantage of using the connecting vertices and corresponding distance tables. Figure 8.15(b) plots the corresponding page accesses. In contrast to the CPU output, C-MNDR requires more I/O operations than MNDR. For instance, when the range is 10 miles, MNDR consumes less than 1,100 page accesses while C-MNDR needs more than 2,200 page accesses. Moreover, the I/O cost grows quadratically in both algorithms, which shows the same characteristics as the CPU time. 2468 10 0 100 200 300 400 CPU time (milisec) MNDR C-MNDR Range (miles) 2468 10 0 400 800 1200 1600 2000 2400 Disk access (pages) MNDR C-MNDR Range (miles) (a) CPU cost (b) I/O cost Figure 8.15: The performance of initial query result processing in C-MNDR as a function of query range In summary, we conclude that with the help of connecting vertices, the CPU cost of C-MNDR in initial query result processing is lower than with MNDR. 
Additionally, MNDR has the advantage of using the Euclidean distance restriction to minimize the I/O cost in snapshot query processing. We now move on to studying the query update cost of C-MNDR. Figures 8.16(a) and (b) show the CPU and I/O costs of C-MNDR with regard to the number of cells, respectively. C-MNDR and the baseline algorithm consume only about 10% of the CPU time of MNDR. Moreover, C-MNDR consistently requires only about 70% of the CPU time of the baseline algorithm. For 2,000 continuous queries in a service space covered by 400 × 400 cells, the query update cost is less than 3.5 seconds with C-MNDR. Finally, the savings in I/O cost are tremendous, as C-MNDR only requires about 5% of the cost of MNDR. Although MNDR consumes far fewer page accesses during the initial query result processing, its update cost is much higher than that of C-MNDR. As a continuous query runs longer, the cumulative I/O cost of C-MNDR becomes much lower than that of MNDR. We will discuss this characteristic in the following section.

Figure 8.16: The cost of query updates in C-MNDR as a function of the number of cells. (a) CPU cost; (b) I/O cost.

Next, Figure 8.17(a) demonstrates the CPU cost of C-MNDR compared with MNDR with regard to the number of POIs. As we shall see, all algorithms incur larger costs as the number of POIs increases, which is caused by retrieving more objects from the grid index. The small gradient of these curves shows that MOVNet scales very well with the growth of POIs. We attribute this to our design of using an in-memory grid index to manage the POIs. Additionally, C-MNDR consumes less than 90% of the CPU time of MNDR.
Compared with the baseline algorithm, C-MNDR saves 33% of the CPU time on average. The I/O costs of C-MNDR and the baseline algorithm are only about 5% of that of MNDR (Figure 8.17(b)), which shows the benefits of using the SD-tree during query processing.

Figure 8.17: The performance of query updates in C-MNDR as a function of the number of POIs. (a) CPU cost; (b) I/O cost.

Figure 8.18 shows the effect of the range on query update processing. When the range is 5 miles, MNDR requires over 100 seconds to process 2,000 queries in each update cycle, which may be unacceptable in many application scenarios. In contrast, C-MNDR costs only 6.7 seconds and the baseline algorithm consumes 10 seconds. Correspondingly, the I/O cost of C-MNDR is about 5% of that of MNDR with a 5-mile range. The results indicate that our design is well-suited for large range queries, because it saves more CPU and I/O cost in these cases due to the use of the SD-tree.

Figure 8.18: The performance of query updates in C-MNDR as a function of the range. (a) CPU cost; (b) I/O cost.

Figure 8.19 plots the effect of the number of queries on the query updates of continuous range query processing. The CPU cost and the I/O cost grow proportionally with an increasing number of queries in MOVNet. The small gradient of C-MNDR compared with MNDR, for both CPU and I/O costs, indicates that C-MNDR is able to support a very large number of queries at the same time. The more queries exist in MOVNet, the higher the performance improvement achieved by using C-MNDR to process continuous queries.
Specifically, C-MNDR can support 3,000 queries with 3.6 seconds of CPU time and 19K page accesses during each cycle when processing the object updates to obtain the updated result set. In contrast, MNDR requires 69 seconds and over 400K page accesses. Additionally, C-MNDR saves about 35% of the CPU time as well as 45% of the page accesses compared with the baseline algorithm with 3,000 queries.

Figure 8.19: The performance of query updates in C-MNDR as a function of the number of queries. (a) CPU cost; (b) I/O cost.

We also demonstrate the system performance with various update rates of moving objects in Figures 8.20(a) and (b). In MNDR, every query needs to update its result by invoking the complete MNDR procedure. Therefore, the update rate of moving objects does not affect the performance output. On the other hand, with more objects reporting their location updates in one update cycle, C-MNDR requires more CPU time and page accesses. Specifically, the relationship between these two factors is linear. As we can see, the system performance of C-MNDR is much better than that of MNDR with respect to both CPU time and page accesses. Moreover, the CPU time of C-MNDR is only about 70% of that of the baseline algorithm. To summarize the performance of C-MNDR, we demonstrate an example of 2,000 continuous range queries running over 10 update cycles in Figure 8.21. The CPU and I/O costs at time stamp 0 in Figure 8.21 represent the initial query result processing. As we can see, C-MNDR requires less CPU time but more page accesses during this step. However, after one update cycle, the total CPU time and page accesses of C-MNDR become less than those of MNDR.
This is due to the help of the SD-tree, which saves a significant amount of re-computation of the network connectivity and distance information. As the queries continue to run, C-MNDR increasingly outperforms MNDR. Therefore, we conclude that C-MNDR is very efficient in processing continuous range queries.

Figure 8.20: The performance of query updates in C-MNDR as a function of the percentage of object updates. (a) CPU cost; (b) I/O cost.

Figure 8.21: The overall performance of C-MNDR. (a) CPU cost; (b) I/O cost.

8.3.4 Performance Study of C-MKNN

Similar to our study of C-MNDR, we are first interested in verifying the performance of the initial query result processing in C-MKNN. Figure 8.22(a) illustrates the effect of the number of cells. Overall, the results show that C-MKNN requires more CPU time than MKNN. This suggests that the progressive probe used in MKNN is more efficient than the distance table in C-MKNN for snapshot query processing. In the example that we present for C-MKNN in Section 7.3.1, some vertices obtain their shortest paths from the query point only after several rounds of connecting vertex expansion. This affects the efficiency of snapshot query processing while providing flexibility in continuous query processing. As we shall see, C-MKNN is able to minimize the query update cost so that the overall query cost is smaller than that of MKNN in the long run. Additionally, the CPU cost of C-MKNN increases with the number of cells, suggesting that with a larger number of cells, the number of connecting vertices becomes larger.
Hence the network expansion of the cells becomes more time-consuming. Figure 8.22(a) also shows that C-MKNN performs better than S-Grid with various cell sizes. When the cell size is relatively large, the performance of S-Grid becomes even worse relative to C-MKNN. This is because S-Grid records the distance between each pair of vertex and border point in the same cell. When the cell size is large, there is a very large number of vertices per cell, especially for a dense network. Therefore, searching the Vertex-Border component becomes cumbersome without the support of any index structure, which affects the system performance. Figure 8.22(b) studies the output in terms of page accesses. As we can see, the C-MKNN algorithm consumes more page accesses than MKNN with various cell sizes. This can be explained by the fact that MKNN uses the progressive probe as the first step to define an estimated area for the result set. The corresponding network data is retrieved based on this area, which simplifies the I/O operation. On the other hand, C-MKNN uses a cell-based network expansion. The network data retrieval is accomplished cell by cell, which results in a large number of I/O operations. Moreover, the result suggests that the I/O operations grow proportionally with the number of cells.

Figure 8.22: The performance of initial query result processing in C-MKNN as a function of the number of cells. (a) CPU cost; (b) I/O cost.

Figure 8.23 plots the performance of C-MKNN in initial query result processing with regard to the number of POIs. We make the following observations. First, with a larger number of POIs, the processing cost becomes smaller. This can be explained by the fact that a larger number of POIs means a higher POI density.
For a fixed k, the query search area becomes correspondingly smaller, hence the system cost is reduced. Second, MKNN outperforms C-MKNN in both CPU and I/O cost. When there are 20K POIs in the system, processing a snapshot kNN query with k = 50 in MKNN requires about 35% of the CPU time of C-MKNN. Moreover, the I/O cost for such a query in MKNN is about 10% of the page accesses of C-MKNN. Third, C-MKNN consistently performs better than S-Grid. When the number of POIs is 20K, the CPU processing cost of C-MKNN is over 30% less than that of S-Grid. When the number of POIs increases to 100K, the CPU cost of C-MKNN is 73% of that of S-Grid.

Figure 8.23: The performance of initial query result processing in C-MKNN as a function of the number of POIs. (a) CPU cost; (b) I/O cost.

Figure 8.24 demonstrates the effect of k on snapshot query processing using C-MKNN. It suggests that the CPU time and page accesses grow linearly with k. Moreover, MKNN becomes increasingly efficient with the growth of k compared with C-MKNN and S-Grid. This can be explained by the fact that with a larger k, more cells need to be expanded in C-MKNN, which involves many more operations for network and object retrieval. Hence the performance of C-MKNN deteriorates. However, C-MKNN is more efficient than S-Grid in various settings. For a kNN query with k = 16, C-MKNN requires 3.39 milliseconds while S-Grid needs 4.16 milliseconds. When k grows to 128, C-MKNN requires only 70% of the CPU time of S-Grid. This suggests that C-MKNN is more scalable than S-Grid for larger k.
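The cell-by-cell expansion whose cost these experiments measure can be illustrated with a small, self-contained sketch. The code below is a generic grid-based kNN search, not the actual MKNN or C-MKNN procedure: it expands rings of cells around the query cell until k objects are found and no unexplored cell can contain a closer one, and it uses Euclidean distance as a stand-in for the network distances that MOVNet computes via connecting vertices and distance tables. All names and parameters are illustrative, and the sketch assumes at least k objects exist.

```python
import heapq
import math

def grid_knn(objects, q, k, cell_size):
    """Simplified grid-based kNN: expand rings of cells around the query
    cell until k candidates are found and no unexplored cell can be closer.
    Euclidean distance stands in for network distance here."""
    # Bucket objects into grid cells.
    cells = {}
    for oid, (x, y) in objects.items():
        key = (int(x // cell_size), int(y // cell_size))
        cells.setdefault(key, []).append((oid, (x, y)))

    qc = (int(q[0] // cell_size), int(q[1] // cell_size))
    best = []  # max-heap via negated distances: (-dist, oid)
    ring = 0
    while True:
        # Visit the ring of cells at Chebyshev distance `ring` from the query cell.
        for dx in range(-ring, ring + 1):
            for dy in range(-ring, ring + 1):
                if max(abs(dx), abs(dy)) != ring:
                    continue
                for oid, (x, y) in cells.get((qc[0] + dx, qc[1] + dy), []):
                    d = math.hypot(x - q[0], y - q[1])
                    heapq.heappush(best, (-d, oid))
                    if len(best) > k:
                        heapq.heappop(best)  # drop the farthest candidate
        # Any object in an unexplored cell is at least ring * cell_size away,
        # so once the k-th candidate is within that bound we can stop.
        if len(best) == k and -best[0][0] <= ring * cell_size:
            break
        ring += 1
    return sorted((-d, oid) for d, oid in best)
```

The termination bound mirrors the reasoning in the text: a larger k forces more rings (cells) to be expanded before the bound holds, which is exactly why the cost grows with k.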
Figure 8.24: The performance of initial query result processing in C-MKNN as a function of k. (a) CPU cost; (b) I/O cost.

In summary, we conclude that the CPU cost of C-MKNN in initial query result processing is lower than that of S-Grid. However, the algorithm consumes more CPU time and I/O page accesses than MKNN. Based on the simulation results, we conclude that MKNN is a very efficient solution for snapshot kNN query processing. We now move on to study the query update cost of C-MKNN. Figures 8.25(a) and (b) show the CPU cost and I/O cost of C-MKNN with regard to the number of cells, respectively. We observe that C-MKNN and the baseline algorithm consume less CPU time than MKNN. When there are 500 × 500 cells in MOVNet, C-MKNN requires about 20% of the CPU time of MKNN. Moreover, C-MKNN consistently requires less CPU time than the baseline algorithm. Additionally, a smaller number of cells requires more CPU time in C-MKNN. This is because C-MKNN creates a modeling graph and computes the distances of vertices on the fly. With a larger cell size, each cell has more vertices to process on average, which affects the system performance. Finally, in terms of the I/O cost, C-MKNN only requires about 10% of the cost of MKNN, and less than 60% of the page accesses of the baseline algorithm in various settings. Our simulation results suggest that MKNN requires less CPU and I/O cost for snapshot query processing. On the other hand, C-MKNN excels at query update processing. As we shall see in the later discussion, after a continuous kNN query runs for a while, the overall performance of C-MKNN becomes better than that of MKNN.
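The on-the-fly distance computation mentioned above (building a modeling graph for the expanded cells and computing vertex distances) is in essence a shortest-path expansion. A minimal sketch using a plain Dijkstra over an adjacency list follows; the graph layout and names are illustrative, not MOVNet's actual structures.

```python
import heapq

def network_distances(graph, source):
    """Dijkstra over an adjacency-list graph {v: [(neighbor, weight), ...]},
    returning shortest network distances from `source` to every reachable
    vertex. This mirrors the kind of on-the-fly distance computation that a
    cell's modeling graph requires."""
    dist = {source: 0.0}
    pq = [(0.0, source)]
    while pq:
        d, v = heapq.heappop(pq)
        if d > dist.get(v, float('inf')):
            continue  # stale queue entry, a shorter path was already found
        for u, w in graph.get(v, []):
            nd = d + w
            if nd < dist.get(u, float('inf')):
                dist[u] = nd
                heapq.heappush(pq, (nd, u))
    return dist
```

With more vertices per cell, this expansion touches more edges per call, which is consistent with the observation that larger cells increase the per-cell processing cost.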
Figure 8.25: The performance of query updates in C-MKNN as a function of the number of cells. (a) CPU cost; (b) I/O cost.

Figure 8.26 plots the change in system performance of C-MKNN with regard to the number of POIs in MOVNet. Overall, the system performance of C-MKNN becomes better when there are more POIs in the system. Additionally, C-MKNN is more efficient than both the baseline algorithm and MKNN for query update processing. Specifically, when MOVNet manages 100K POIs, processing 2,000 continuous kNN queries (k = 50) using C-MKNN requires 0.54 seconds, compared with 0.62 seconds for the baseline algorithm and 3.3 seconds for MKNN. The difference among these algorithms becomes even more pronounced when there are fewer POIs in the system. Additionally, the page access metric shows similar properties to the CPU cost.

Figure 8.26: The performance of query updates in C-MKNN as a function of the number of POIs. (a) CPU cost; (b) I/O cost.

Figure 8.27(a) shows that the CPU cost of C-MKNN for query updates grows proportionally with k. When k = 128, the CPU time of MKNN is 8.8 seconds, while the CPU time of C-MKNN is 2.8 seconds. Apparently, C-MKNN is more efficient in query update processing with regard to k. Although the difference in performance between C-MKNN and the baseline algorithm is negligible when k is relatively small, C-MKNN saves more CPU and I/O resources as k increases. Specifically, the CPU time of C-MKNN is about 87% of that of the baseline algorithm when k = 128.
Correspondingly, the number of page accesses of C-MKNN is about 52% of that of the baseline algorithm with k = 128. Therefore, we conclude that the SD-tree scales very well in saving CPU time and page accesses with regard to k during query processing.

Figure 8.27: The performance of query updates in C-MKNN as a function of k. (a) CPU cost; (b) I/O cost.

Next, we study the relationship between the system performance of C-MKNN and the number of queries in MOVNet. Figure 8.28(a) shows that the CPU time grows linearly with an increasing number of queries. When the number of queries is 3,000, the CPU time for C-MKNN to process query updates is 1.2 seconds, compared with 5.8 seconds for MKNN. Additionally, the baseline algorithm consumes 1.6 seconds in the same setting, over 30% more CPU time than C-MKNN. Similarly, the I/O cost, plotted in Figure 8.28(b), shows that C-MKNN is the most efficient technique for processing query updates. Overall, the small gradient of the C-MKNN output curves suggests that it is very scalable in supporting a large number of simultaneous queries.

Figure 8.28: The performance of query updates in C-MKNN as a function of the number of queries. (a) CPU cost; (b) I/O cost.

Figures 8.29(a) and (b) show the system performance of C-MKNN with various update rates of moving objects in one update cycle. In our design of MKNN, every query needs to update its result by invoking the complete MKNN procedure at the beginning of each update cycle.
Therefore, the ratio of object updates has no impact on the system performance of MKNN. On the other hand, with more objects reporting their location updates in one update cycle, C-MKNN requires more CPU time and page accesses to process these updates. As we can see from the results, the relationship between these two factors is linear. Specifically, the system performance of C-MKNN is much better than that of MKNN in various settings. Moreover, the CPU time of C-MKNN is about 70% of that of the baseline algorithm. Our simulation results presented above suggest that the design of MKNN is very efficient for snapshot query processing. In contrast, the query update cost of C-MKNN is much lower than that of MKNN, which proves it to be a better candidate for continuous query processing.

Figure 8.29: The performance of query updates in C-MKNN as a function of the percentage of object updates. (a) CPU cost; (b) I/O cost.

To support this conclusion, we use Figure 8.30 to demonstrate the total system cost over 10 update cycles. We assume that there are 2,000 continuous kNN queries in MOVNet with k = 50. The grid index has 400 × 400 cells. As we can see, in update cycle 0, when initial query processing is invoked, MKNN outperforms C-MKNN. However, as the queries continue to run, the cost of MKNN grows much faster than that of C-MKNN. The overall CPU cost of C-MKNN becomes smaller than that of MKNN after 3 update cycles. Similarly, the overall I/O cost of C-MKNN becomes lower than that of MKNN after 9 update cycles. Users tend to keep continuous queries running for a long time, during which C-MKNN offers more benefits because it requires far fewer resources to perform query update processing.
Therefore, the design of C-MKNN is a scalable approach to support continuous query processing.

Figure 8.30: The overall performance of C-MKNN. (a) CPU cost; (b) I/O cost.

Chapter 9 Conclusion and Future Work

Location-based services have generated growing interest in the research community. To cope with location-based query processing on moving POIs in road networks, we present the MOVNet infrastructure. Specifically, we design a dual-index structure that manages the location updates of moving POIs as well as the connectivity of the underlying networks. Additionally, we present a cell overlapping algorithm that quickly relates the underlying network and moving objects during query processing. Based on the MOVNet infrastructure, we propose algorithms to process snapshot and continuous queries in mobile environments. The experimental evaluation suggests that MOVNet is highly efficient in processing these queries over various networks. Specifically, when a user wants to consider the system requirements on query type (range vs. kNN), query duration (snapshot vs. continuous), and performance metric (CPU cost vs. I/O cost), Figure 9.1 illustrates the most efficient approach for each application scenario in MOVNet. In summary, we conclude our design of mobile query processing in MOVNet as follows.

Figure 9.1: Algorithms in MOVNet that best fit various system requirements.

Snapshot Range Queries: Our simulation results showed that the technique of connecting vertices and the corresponding distance tables is very helpful in finding all network segments within the query range, thus reducing the CPU cost.
On the other hand, the Euclidean distance restriction used in MNDR aims to minimize the I/O operations for retrieving the road network data, which yields a superb improvement in disk page accesses.

Snapshot kNN Queries: The progressive probe technique in MKNN is an efficient approach to estimate the result space. Additionally, by obtaining an estimated result space as the first step of query processing, we are able to minimize the I/O operations for retrieving the network data. Hence, MKNN is well-suited for snapshot kNN query processing.

Continuous Range Queries: Our experimental results verified that C-MNDR outperforms MNDR in CPU cost for both the initial query result processing and the query update processing. Additionally, the I/O cost of C-MNDR during query update processing is much lower than that of MNDR. In the long run, C-MNDR requires far fewer resources to process a large number of simultaneous continuous range queries.

Continuous kNN Queries: Although C-MKNN requires more CPU and I/O resources during initial query result processing, our simulation results verified that with the help of the SD-tree, the query update cost of C-MKNN is much lower than that of MKNN. Overall, C-MKNN is an efficient approach to process continuous kNN queries with regard to both CPU and I/O costs.

In the future, we plan to extend our work as follows.

Investigation of the network distribution in the space: We find that the network data is usually not uniformly distributed in the space. Therefore, for some grid cells, the object lists are always empty because no edge overlaps these cells. Recent techniques, such as the one presented in [YPK05], have proposed a hierarchical structure to manage skewed data in the grid index in terms of Euclidean distance. However, there is no existing work on designing a grid index that manages moving objects under the constraints of underlying networks.
We plan to study the modeling of different network distributions and design an improved grid index data structure to reduce the memory requirement.

Integrating traffic information in distance computation: Traffic information is a type of dynamic information in location-based services. It is very important to integrate traffic conditions when deploying a location-based service in a metropolitan area, because various traffic events (e.g., traffic jams, car accidents) affect the travel time of moving POIs in the service region. We plan to leverage the concept of the Travel Time Network (TTN) [KZWW05] to efficiently support dynamic traffic information updates and to integrate the travel time network into distance computation, so that query results become more realistic in metropolitan applications.

Appendix A Distributed Continuous Range Query Processing on Moving Objects

A.1 Introduction

With the growing popularity of GPS-enabled mobile devices and the advances in wireless technology, the efficient processing of continuous range queries, defined as retrieving the information of moving objects inside a user-defined region and continuously monitoring the change of the query results in this region over a certain time period, has been of increasing interest. Continuous range query processing is very important due to its broad application base. For instance, the Department of Transportation may want to monitor the traffic change on a freeway section to develop a traffic control plan. In a natural disaster, it is highly desirable to locate all fire engines within a certain area for emergency response. Continuous range queries pose new challenges to the research community because the movement of objects causes the query results to change correspondingly. Applying a central server processing solution where moving objects periodically update their locations is obviously not scalable.
On the other hand, the growing computing capabilities of mobile devices have enabled approaches such as MobiEyes [GL04] and MQM [CHC04] that use mobile devices to answer continuous range queries, with a centralized server acting as a mediator. However, these solutions suffer from some limitations. First, a centralized server is not robust enough under certain situations. In the mentioned example of natural disasters, some servers might be down or only provide limited computational capacity. Therefore, it is highly desirable to have a fault-resilient infrastructure. Second, the communication between the server and moving objects should be minimized in order to manage data in large mobile environments. Finally, the memory and computing capabilities of mobile devices are limited, so the implementation of in-memory processing on moving objects needs to be carefully considered. In this paper, we address the problem of processing real-time continuous range queries by proposing a robust and scalable infrastructure. The goal is to build a system that supports a large number of moving objects with limited server and communication resources. In our design, continuous range queries are mainly processed by mobile devices. Our work distinguishes itself from previous work with two contributions. First, we propose a distributed server infrastructure. We introduce the feature of service zones. A service zone is a subspace recursively binary-partitioned from the entire service region. Each server controls a service zone. Our system is able to adaptively allocate and merge service zones as servers join or leave. In addition, we propose a novel grid index on continuous range queries. Instead of recording the distribution of queries, our grid index design preserves the change of the query distribution and is more compact than other grid index structures.
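The change-preserving idea behind this grid index can be sketched as a minimal data structure. The class and method names below are hypothetical illustrations, loosely consistent with the {+Q, -Q} example entries shown in Figure A.4, and not the system's actual implementation: each cell records only the queries added to and removed from it since the last synchronization, rather than the full set of overlapping queries.

```python
class DeltaGridIndex:
    """Minimal change-preserving grid index sketch: each cell keeps only the
    queries added (+Q) and removed (-Q) since the last synchronization."""
    def __init__(self):
        self.deltas = {}  # cell id -> (added set, removed set)

    def add_query(self, cell, qid):
        added, removed = self.deltas.setdefault(cell, (set(), set()))
        removed.discard(qid)  # re-adding a query cancels a pending removal
        added.add(qid)

    def remove_query(self, cell, qid):
        added, removed = self.deltas.setdefault(cell, (set(), set()))
        if qid in added:
            added.discard(qid)  # removing an unsent addition cancels it
        else:
            removed.add(qid)

    def flush(self, cell):
        """Return and clear the pending changes for one cell, e.g. to build
        a compact update message for the moving objects."""
        return self.deltas.pop(cell, (set(), set()))
```

Because unchanged cells carry no entries at all, a structure like this stays compact when the query distribution changes slowly, which matches the compactness claim above.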
Our experimental results show that our design efficiently supports numerous continuous range queries with a very large number of moving objects in mobile environments. The rest of this appendix is organized as follows. The related work is described in Section A.2. In Section A.3 we introduce the design of service zones, the grid index, and the support of continuous range query processing. The experimental validation of our design is presented in Section A.4. Finally, we discuss the conclusions and future work in Section A.5.

A.2 Related Work

A number of studies have addressed continuous spatial queries. Related work, such as that presented in [SR01], [TPS02], and [ZZP+03], addressed the processing of continuous spatial queries on the server. For the efficient processing of a large number of simultaneous continuous queries, Prabhakar et al. [PXK+02] addressed the issue of stationary continuous queries in a centralized environment. In addition, Mokbel et al. [MXA04] proposed SINA, which supports moving queries over moving objects in server-based processing. By contrast, MQM [CHC04] and MobiEyes [GL04] assume a distributed environment where the mobile hosts have the computing capability to process continuous queries. A centralized server is introduced by both approaches to act as a mediator coordinating the query processing. In MQM, the concept of a resident domain is introduced as a subspace surrounding the moving object to process continuous queries. Continuous queries are partitioned into a set of monitor regions, where only the monitor regions covered by
Kalashnikov et al. [KPHA02] eval- uated the efficiency of indexing moving objects and concluded that using a grid approach for query indexing results in the best performance. Other methods to process continuous queries without a specific index can be found such as the usage of validity regions [TPS02], safe regions [HXL05], safe periods [MXA04], and No-Action regions [XMA + 04]. These approaches have in common that they return a valid time or region of the answer. Once the result becomes invalid, the client submits the query for reevaluation. Our work distinguishes itself from the above approaches, by specifically addressing the scalability and robustness of the system. We adaptively organize servers to cooperatively work in the entire service space. Furthermore, we propose a grid index that is able to be implemented as an in-memory data structure on mobile devices. There is no restriction on the movement of objects and the system is extremely efficient to support continuous range queries with a very large number of moving objects. A.3 System Design and Components A.3.1 System Infrastructure and Assumptions Figure A.1 illustrates the system infrastructure of our design. We are considering mobile hosts with abundant power capacity, such as vehicles, that are equipped with a Global 127 Figure A.1: The system infrastructure Positioning System (GPS) for obtaining continuous position information. We assume that the mobile host has some memory and computing capacity to store the queries and process range query operations. In our paper, we use the term moving objects to refer to these mobile hosts participated in the query processing. On the base-station side, our design has two assumptions. First, the servers and moving objects communicate via cellular-based wireless network. Moreover, protocols such as GeoCast [NI97] can be adopted for sending messages within a certain region. 
Second, the servers with spatial databases are connected via the wired Internet infrastructure. Each server is able to receive query requests from any user and forward them to the appropriate servers. In our design, moving objects are represented as points and range queries are denoted as rectangular regions. Given a set of moving objects and continuous range queries, the challenge is to calculate which objects lie within which query regions at a certain time. In this paper, we focus on range queries, which are widely used in spatial applications and can serve as preprocessing tools for other queries, such as nearest neighbor queries. For simplicity, we use the term queries to refer to continuous range queries in the following sections.

Figure A.2: An example of the system with 7 servers (a) and their service zone identifier (SID) tree (b).

A.3.2 Server Design

In this section, we describe our design of the server infrastructure. First, we describe how the system adaptively manages the service region by adapting to a time-varying set of servers through the concept of the service zone. Next, we present another important feature, the grid index. By using the grid index, our system avoids excessive query retrieval from the server and significantly reduces the communication overhead.

A.3.2.1 Service Zones

We leverage the design of the Content Addressable Network (CAN) [Rat02] to dynamically partition the entire service region into a set of subspaces. Each subspace is controlled by a server. We define the term service zone as the subspace controlled by a server. Each service zone is addressed with a service zone identifier (SID), which is calculated from the location of the service zone. Figure A.2a shows an example of the entire service region partitioned into 7 service zones.
The service zone partitioning is a binary partition approach that always divides a larger service zone equally into two smaller child service zones. Hence the SID addresses of the service zones can be represented with a binary tree structure, as shown in Figure A.2b. Each server maintains a routing table with tuples ⟨SID, address⟩ storing the routing information of its neighbor servers. By using the same routing mechanism as CAN, our system is able to locate any service zone with complexity O(n log n) in a system of n servers.

When a new query q is submitted, the system first forwards it to all servers covered by its query region through the M-CAN multicast algorithm from the design of CAN. When a server receives the query, it is inserted into the query repository. Consequently, the grid index on the server is updated. We will describe the details of the grid index in the next section. Finally, the server broadcasts a message GridIndexUpdate(G_Index) to all moving objects associated with it, where G_Index is the updated grid index. When a query q is about to be deleted, the server searches through its repository to delete the corresponding entry. Consequently, the server updates the grid index.

When a new server joins the system, several steps must be taken to allocate a service zone for it. First, the new server must find a bootstrap server, which is already a member of the system. Second, the bootstrap server broadcasts a message that a new server is about to join the system. The other servers in the system reply with their current system load and service zone information. Our goal is to balance the system load among the servers. Hence the server with the highest system load (measured, for instance, by average used disk space, average memory usage, or other user-identified resources) has its service zone partitioned into halves. Next, the bootstrap
Next, the bootstrap 130 server sends a message to the partitioned server to forward queries overlapping the new server’s service zone. The partitioned server also broadcasts the updated service zone information to moving objects associated with it. Moving objects register with the new server if their current locations are controlled by the new server. After the new server receives queries forwarded from the partitioned server, it creates and maintains the grid index correspondingly. Finally, the neighbors of the partitioned server will be notified to update their routing tables. When a server leaves the system, we need to ensure that the corresponding service zone is taken over by the remaining servers. The departing server explicitly hands over its repository of moving objects and queries to one of its neighbors whose service zone can be merged with the departing servers zone to produce a valid single service zone. A.3.2.2 Grid Index The memory capacity of moving objects is limited. On the other hand, it is highly desirable to have an index structure that helps moving objects to retrieve queries from the server only when they are very close to the queries. Therefore, the index also needs to be compact in terms of the size to be used on moving objects. Here we present a grid index structure fulfilling these requirements. Previous work of grid index structures, such as [KPHA02] and [GL04], aims at random data access by recording the distribution of queries. However, objects move continuously along a trajectory in mobile environments, therefore queries in large parts of the service region can be pruned and no random access is needed. Based on this observation, our 131 (a) A 7 service zone exam- ple (b) A 64 grid cell example (c) Service zones and grid cells overlapping Figure A.3: Service zones and grid cells. C1 Q1 Q2 Grid Index Examples: C1: {{+Q2}, {-Q1}} C2 C3 C2: { ĭ , ĭ } C3: {{-Q3}, {-Q1}} Q3 o C1 {-Q1} {+Q2} ĭ {+Q3} (a) (b) Figure A.4: The grid index. 
grid index preserves the differences in the query distribution, which can be efficiently used for continuous query processing. The basis of our grid index is a set of cells. Each cell is a region of space obtained by partitioning the entire service region in a uniform order. Figure A.3a demonstrates a system with 7 servers. Figure A.3b shows the entire service region divided into 64 grid cells. Figure A.3c shows how these grid cells are distributed over the example servers. By using a uniform grid order to partition the service region into grid cells, given the coordinates of an object, it is easy to calculate the cell in which the object resides.

The server maintains the grid index for its service zone. For each cell, the grid index structure consists of two lists, identified as right and lower, that record the change of the query distribution relative to the right and lower neighbor cells, respectively. In the example shown in Figure A.4a, a service zone is divided into 16 grid cells. Cells C1, C2, and C3 are partially covered by a query Q1. Queries Q2 and Q3 cover the right and left neighbor cells of C1, respectively. As shown in Figure A.4a, the grid index for cells C1, C2, and C3 is {{+Q2}, {−Q1}}, {∅, ∅}, and {{−Q3}, {−Q1}}, respectively.

Figure A.5: Number of grid index entries analysis.

Once a moving object is associated with a server, the server forwards the grid index of its service zone to the moving object. By using the grid index stored in its local memory, the moving object is able to forecast the query locations with a refined granularity. As an example, shown in Figure A.4b, if a moving object o in cell C1 is about to move across the right edge of C1, the right list of C1 is {+Q2}. Hence the object submits a request to retrieve the query Q2 from the server. If the object is about to cross the lower edge of C1, the lower list of C1 is {−Q1}.
The object could either retain the information of query Q1, if there is enough memory, or remove Q1 if more memory is needed for query processing. If the object is about to move across the upper edge of C1, the lower list of the upper neighbor cell is retrieved and the values in the list are inverted. In this example, the object retrieves the lower list of C2 and calculates the inverse value, which is ∅. This indicates that no query needs to be retrieved from the server. When the object is about to move across the left edge of C1, a similar process is performed on the right list of the left neighbor cell (i.e., C3). In this example, the inverse value of the list is {−Q3}. Therefore, the moving object submits a request to retrieve the query Q3 from the server.

To study the impact of our design on the index size, let us assume that queries and grid cells are squares and that the side length of a query Q is q. Let c denote the side length of each grid cell, with q > c. Then q can be represented as i × c + x, where x ∈ [0, c) and i is an integer. Without loss of generality, consider the case where the top-left corner of query Q is located somewhere within the top-left grid cell of the system, as shown in Figure A.5. It can be verified that if the top-left corner of Q is inside Set0, it results in 4(i + 1) index entries. For Set1 the number of index entries is 2(i + 1) + 2(i + 2), and for Set2 it is 4(i + 2). Assuming a uniform distribution of queries, on average Q results in 4(q + c)/c index entries. On the other hand, recording the distribution of queries requires (q + c)²/c² index entries for each Q [WZK05]. For all q/c ≥ 3, our approach requires fewer index entries than recording the distribution of queries on the grid.

A.3.3 Query Processing on Moving Objects

In this section, we describe the functionality of the moving objects.
In our design, the following information is stored in the memory of moving objects for query processing:

• OID: the unique identifier of the moving object.
• currentPos: the current location of the moving object.
• G_Index: the grid index of the current service zone covering the moving object.
• Queries: the list of queries received from the server.

Table A.1: Message types in query processing.
  RegisterObject(OID) — the message to register a moving object.
  UnregisterObject(OID) — the message to delete a moving object.
  UpdateResult(OID, QID, Flag) — the message to update a query result.
  RequestQueries(OID, Q_List) — the message to retrieve a set of queries.
  GridIndexUpdate(G_Index) — the grid index update broadcast by the server.

In order to implement the query processing mechanism on the moving object, a set of messages is defined as shown in Table A.1. A moving object is associated with a server at all times. When a moving object turns its power on, it broadcasts a message RegisterObject(OID). The server monitoring the location of the object inserts it into the object repository and replies with a GridIndexUpdate(G_Index) message. The server also sends the set of queries covering the current grid cell of the moving object. When a moving object is about to leave its current service zone, it sends a message UnregisterObject(OID) to the server. The server deletes the moving object from its repository and sends back a set of tuples ⟨SID, address⟩ from its routing table identifying adjacent service zones. The moving object then sends a RegisterObject(OID) message to the server controlling the zone it is entering.

When a moving object is about to move into a new grid cell, it consults the grid index as described in the previous section. If there are queries in the grid index that need to be retrieved, the moving object sends a message RequestQueries(OID, Q_List) to the server, where Q_List is a list of query identifiers (QIDs).
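The per-edge grid index consultation described in Section A.3.2.2 can be sketched as follows. The encoding (each cell stores its right and lower lists as a pair of added/removed query-ID sets) and all helper names are illustrative assumptions, not the dissertation's actual data structures.

```python
# Sketch: building the per-cell right/lower delta lists and looking them
# up when an object is about to cross a cell edge. Rectangles are
# (x0, y0, x1, y1); cells are addressed as (column i, row j).
def covering(cell_rect, queries):
    """Query IDs whose rectangle overlaps a grid cell."""
    cx0, cy0, cx1, cy1 = cell_rect
    return {qid for qid, (qx0, qy0, qx1, qy1) in queries.items()
            if qx0 < cx1 and cx0 < qx1 and qy0 < cy1 and cy0 < qy1}

def build_grid_index(n, c, queries):
    """Per-cell right/lower delta lists for an n x n grid of side-c cells."""
    cover = {(i, j): covering((i * c, j * c, (i + 1) * c, (j + 1) * c), queries)
             for i in range(n) for j in range(n)}
    index = {}
    for (i, j), here in cover.items():
        right = cover.get((i + 1, j), set())  # assumed +x neighbor
        lower = cover.get((i, j - 1), set())  # assumed -y neighbor
        index[(i, j)] = {'right': (right - here, here - right),
                         'lower': (lower - here, here - lower)}
    return index

def on_edge_cross(index, cell, edge):
    """Return (queries to fetch, queries that may be dropped) for a crossing.
    Crossing 'upper' or 'left' uses the neighbor's list with +/- inverted,
    as described in the text."""
    i, j = cell
    if edge == 'right':
        fetch, drop = index[(i, j)]['right']
    elif edge == 'lower':
        fetch, drop = index[(i, j)]['lower']
    elif edge == 'upper':
        drop, fetch = index[(i, j + 1)]['lower']
    else:  # 'left'
        drop, fetch = index[(i - 1, j)]['right']
    return fetch, drop
```

On a crossing, the fetched queries are exactly those in the +Q part of the relevant list, so a RequestQueries message is sent only when that set is non-empty; the −Q part marks queries the object may evict to reclaim memory.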
Once the server receives the message, it sends the corresponding queries to the moving object. At all times, the moving object checks its current location currentPos against the queries in the Queries list. If the object moves into or out of a query region, it sends a message UpdateResult(OID, QID, Flag) to the server, where QID is the query identifier and Flag indicates whether the object resides in the query region. Query processing on moving objects enables real-time updates to the query result while substantially reducing the cost of server processing. We study the impact of our techniques in experiments and show that the results match our analytical expectations.

A.4 Experimental Evaluation

In this section we describe the experimental verification of our design. There are three metrics of interest extensively studied in our simulations. First, the number of grid index entries is measured as the average number of index entries generated on a server and forwarded to the moving objects associated with it. This measure indicates the efficiency of our grid index design and whether the grid index can be used for in-memory processing on moving objects. Second, the server communication cost is measured as the average number of messages transmitted from servers to the moving objects. More specifically, the server communication cost consists of the registration messages, which are generated when a moving object enters or leaves a service zone, and the query retrieval messages, which are generated when a server receives a RequestQueries message from a moving object. This metric indicates whether the server may become a bottleneck in the system. Finally, the mobile communication cost is measured as the total number of messages transmitted from moving objects to servers. The mobile communication cost also consists of registration messages and query retrieval messages.
Additionally, query update messages are generated by moving objects when they enter or leave a query region. This measure reflects the primary query processing cost and hence is important for demonstrating the scalability of our system.

A.4.1 Simulator Implementation

We implemented a prototype simulator that is structured into three main components: the service zone generator, the object and query loader, and the performance monitor. The service zone generator creates a virtual square space with dimensions of 100 km × 100 km. In the experiments, we partition the space into 64 service zones. Each service zone is identified by a SID representing a server.

In the next step, the object and query loader generates moving objects and imports continuous range queries into the system. We use the random walk model to simulate the movement of objects. Initially, 10,000 objects are uniformly distributed in the space. Each of them moves with a constant velocity, randomly selected in the range from 10 m to 20 m per second, for a duration that is exponentially distributed with a mean value of 100 seconds. We also generated two sets of rectangular regions as continuous range queries that are uniformly distributed in the space with an average area of 1% and 10% of the plane size, respectively. After the objects and queries are loaded into the system, the performance monitor generates the grid index for each server, with 256 grid cells partitioning the entire service space.

Figure A.6: The number of grid index entries as a function of the number of queries. (a) The total number of index entries in the system; (b) the average number of index entries on each server.
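The random-walk workload described above can be sketched as follows. The function and parameter names are illustrative, and clamping objects at the boundary of the space is our assumption; the thesis does not specify the boundary behavior.

```python
# Sketch of the simulator's random-walk model: constant-velocity legs of
# 10-20 m/s with exponentially distributed durations (mean 100 s), inside
# a 100 km x 100 km square. Positions are clamped at the boundary (an
# assumption); one position is yielded per simulated second.
import math
import random

def random_walk(x, y, steps, extent=100_000.0):
    t = 0
    while t < steps:
        speed = random.uniform(10, 20)            # metres per second
        heading = random.uniform(0, 2 * math.pi)  # uniformly random direction
        duration = max(1, int(random.expovariate(1 / 100)))
        for _ in range(min(duration, steps - t)):
            x = min(max(x + speed * math.cos(heading), 0.0), extent)
            y = min(max(y + speed * math.sin(heading), 0.0), extent)
            t += 1
            yield (x, y)
```

Since each leg moves at most 20 m per second, an object crosses a 6.25 km grid cell only every few minutes, which is why edge-crossing events (and thus RequestQueries messages) are rare relative to position updates.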
Each simulation runs for 5,000 seconds, and the performance monitor reports the number of grid index entries, the server communication cost, and the mobile communication cost. Currently, our simulation focuses on the system performance in the steady state, i.e., we do not add any more queries once the objects start to move. We plan to implement a dynamic simulation environment in the future.

A.4.2 Simulation Results

We were first interested in the efficiency of our grid index in terms of its size. Figure A.6a plots the total number of grid index entries in the system as a function of the number of queries. The results clearly show that the total number of grid index entries increases linearly with the number of queries. Additionally, our grid index structure performs more efficiently with a larger average query size. With an average query size equal to 10% of the entire space, our grid index only doubles the number of entries compared with the case where the average query size equals 1% of the space. This behavior corroborates our analytical results described in Section A.3. Furthermore, the absolute size of the grid index is very small. If we use 16 bytes to identify a query, it takes only 3.35 MB to represent 10,000 queries with an average size of 10% of the space. Figure A.6b shows the benefit of using a distributed infrastructure on the server side, which further reduces the size of the grid index on each server. In the case of 10,000 queries with an average size of 10% of the space, the average size of the index entries is 54 KB on each server. This substantially reduces the memory requirement on moving objects.

Figure A.7: The server and mobile communication cost as functions of the number of queries. (a) The server communication cost; (b) the mobile communication cost.
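A rough back-of-the-envelope check of these size figures can be made from the 4(q + c)/c formula of Section A.3.2.2. The geometry below (square queries, a uniform 16 × 16 grid over the 100 km plane) is our assumption; the estimate lands in the same ballpark as the reported numbers rather than matching them exactly.

```python
# Sketch: estimate grid-index size under assumed square-query geometry.
# 256 grid cells over a 100 km square give a 6.25 km cell side; a query
# covering 10% (resp. 1%) of the plane has side sqrt(0.10) (resp.
# sqrt(0.01)) times the plane side.
import math

def entries_per_query(q_side, cell_side):
    """Average index entries per query, 4(q + c)/c from Section A.3.2.2."""
    return 4 * (q_side + cell_side) / cell_side

plane = 100_000.0              # plane side in metres
cell = plane / 16              # 256 cells -> 16 x 16 grid
q10 = math.sqrt(0.10) * plane  # query covering 10% of the plane
q01 = math.sqrt(0.01) * plane  # query covering 1% of the plane

# 10,000 queries, 16-byte query identifiers
total_bytes = 10_000 * entries_per_query(q10, cell) * 16
ratio = entries_per_query(q10, cell) / entries_per_query(q01, cell)
```

Under these assumptions the estimate is roughly 3.9 MB for 10,000 queries of the larger size (the thesis reports 3.35 MB), and the 10%-size index holds about 2.3 times as many entries as the 1%-size index, consistent with the observation that the larger queries "only double" the entry count.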
Figure A.7a illustrates the average communication cost on each server for the set of queries with an average area of 10% of the plane. As a general trend, the number of query retrieval messages increases with the number of queries. Intuitively, with a larger number of queries, the probability that objects retrieve query information from the server is higher. More importantly, the server communication cost is small in our simulation results. With 10,000 queries and 10,000 objects in the system, the server communication cost is about 1 message per second, which demonstrates that our server infrastructure is very scalable and suitable for mobile environments. Figure A.7b demonstrates the mobile communication cost with respect to the number of queries. It shows that the query update messages are the primary component of the mobile communication cost. However, with 10,000 queries, the query update message count on each object is about 0.7 per second. Assuming the size of a query update message is 32 bytes, the average message volume transmitted from each object is about 22 bytes per second. Therefore, our design on the mobile object side is very scalable.

A.5 Conclusions and Future Directions

Continuous range queries have generated intense interest in the research community because advances in GPS devices are enabling new applications. We have presented a novel system that utilizes the computing capability of moving objects for continuous range query processing. Our design of service zones and a grid index is able to provide accurate real-time query results for a very large number of moving objects and queries. In the future, we intend to study the communication costs so that the size of the grid can be optimized with respect to the query distribution. Moreover, a dynamic grid index retrieval from the server with respect to the memory capacity on moving objects is worth exploring.

References

[AA01] Ashraf Aboulnaga and Walid G. Aref.
Window Query Processing in Linear Quadtrees. Distributed and Parallel Databases, 10(2):111–126, 2001.

[Adm] Federal Aviation Administration. The Wide Area Augmentation System (WAAS) Standard. URL: http://gps.faa.gov/Programs/WAAS/waas.htm.

[BKSS90] Norbert Beckmann, Hans-Peter Kriegel, Ralf Schneider, and Bernhard Seeger. The R*-Tree: An Efficient and Robust Access Method for Points and Rectangles. In Proceedings of the ACM SIGMOD International Conference on Management of Data, pages 322–331, 1990.

[Bre65] Jack Bresenham. Algorithm for Computer Control of a Digital Plotter. IBM Systems Journal, 4(1):25–30, 1965.

[Bri02] Thomas Brinkhoff. A Framework for Generating Network-Based Moving Objects. GeoInformatica, 6(2):153–180, 2002.

[Bur06] U.S. Census Bureau. The TIGER/Line Data Set, Second Edition, 2006. URL: http://www.census.gov/geo/www/tiger/.

[CAA03] Hae Don Chon, Divyakant Agrawal, and Amr El Abbadi. Range and kNN Query Processing for Moving Objects in Grid Model. MONET, 8(4), 2003.

[Can07] Canalys.com. US GPS Navigation Market Picks Up Speed, March 2007. URL: http://www.canalys.com/pr/2007/r2007031.htm.

[CC05] Hyung-Ju Cho and Chin-Wan Chung. An Efficient and Scalable Approach to CNN Queries in a Road Network. In Proceedings of the 31st International Conference on Very Large Data Bases, pages 865–876, 2005.

[CHC04] Y. Cai, K. Hua, and G. Cao. Processing Range-Monitoring Queries on Heterogeneous Mobile Objects. In Proceedings of the 5th IEEE International Conference on Mobile Data Management (MDM), pages 27–38, 2004.

[Cla83] K. L. Clarkson. Fast Algorithm for the All Nearest Neighbors Problem. In Proceedings of the Annual Symposium on Foundations of Computer Science (FOCS), pages 226–232, 1983.

[Com96] Federal Communications Commission. FCC 94-102, July 1996. URL: http://www.fcc.gov/Bureaus/Wireless/Orders/1996/fcc96264.txt.

[Dij59] E. Dijkstra. A Note on Two Problems in Connexion with Graphs. Numerische Mathematik, 1, 1959.

[FB74] R. A. Finkel and J. L. Bentley.
Quadtree: A Data Structure for Retrieval on Composite Keys. Acta Informatica, 1974.

[Fel68] William Feller. An Introduction to Probability Theory and Its Applications, Vol. 1. Wiley, 1968.

[FN75] Keinosuke Fukunaga and Patrenahalli M. Narendra. A Branch and Bound Algorithm for Computing k-Nearest Neighbors. IEEE Transactions on Computers, 24(7):750–753, 1975.

[Fre03] Elias Frentzos. Indexing Objects Moving on Fixed Networks. In Proceedings of Advances in Spatial and Temporal Databases, 8th International Symposium (SSTD), pages 289–305, 2003.

[GL04] Bugra Gedik and Ling Liu. MobiEyes: Distributed Processing of Continuously Moving Queries on Moving Objects in a Mobile System. In Proceedings of the 9th International Conference on Extending Database Technology (EDBT), pages 67–87, 2004.

[Gut84] Antonin Guttman. R-Trees: A Dynamic Index Structure for Spatial Searching. In Proceedings of the ACM SIGMOD International Conference on Management of Data, pages 47–57, 1984.

[HJLS07] Xuegang Huang, Christian S. Jensen, Hua Lu, and Simonas Saltenis. S-GRID: A Versatile Approach to Efficient Query Processing in Spatial Networks. In SSTD, 2007.

[HJS05] Xuegang Huang, Christian S. Jensen, and Simonas Saltenis. The Islands Approach to Nearest Neighbor Querying in Spatial Networks. In Proceedings of Advances in Spatial and Temporal Databases, 9th International Symposium (SSTD), pages 73–90, 2005.

[HLL06] Haibo Hu, Dik Lun Lee, and Victor C. S. Lee. Distance Indexing on Road Networks. In Proceedings of the 32nd International Conference on Very Large Data Bases, pages 894–905, 2006.

[HS99] Gísli R. Hjaltason and Hanan Samet. Distance Browsing in Spatial Databases. ACM Transactions on Database Systems, 24(2):265–318, 1999.

[HWLC94] B. Hofmann-Wellenhof, H. Lichtenegger, and J. Collins. Global Positioning System: Theory and Practice. Springer-Verlag, 1994.

[HXL05] Haibo Hu, Jianliang Xu, and Dik Lun Lee.
A Generic Framework for Monitoring Continuous Spatial Queries over Moving Objects. In Proceedings of the ACM SIGMOD International Conference on Management of Data, pages 479–490, 2005.

[JKPT03] Christian S. Jensen, Jan Kolář, Torben Bach Pedersen, and Igor Timko. Nearest Neighbor Queries in Road Networks. In Proceedings of the 11th ACM International Symposium on Advances in Geographic Information Systems (ACM GIS), pages 1–8, 2003.

[JLO04] Christian S. Jensen, Dan Lin, and Beng Chin Ooi. Query and Update Efficient B+-Tree Based Indexing of Moving Objects. In Proceedings of the 30th International Conference on Very Large Data Bases, pages 768–779, 2004.

[Ken02] M. Kennedy. The Global Positioning System and GIS: An Introduction, 2nd Edition. Taylor and Francis, 2002.

[KLL02] Dongseop Kwon, Sangjun Lee, and Sukho Lee. Indexing the Current Positions of Moving Objects Using the Lazy Update R-tree. In Proceedings of the 3rd International Conference on Mobile Data Management (MDM), pages 113–120, 2002.

[KPHA02] Dmitri V. Kalashnikov, Sunil Prabhakar, Susanne E. Hambrusch, and Walid G. Aref. Efficient Evaluation of Continuous Range Queries on Moving Objects. In Proceedings of the 13th International Conference on Database and Expert Systems Applications (DEXA), pages 731–740, 2002.

[KS04a] Mohammad R. Kolahdouzan and Cyrus Shahabi. Continuous K-Nearest Neighbor Queries in Spatial Network Databases. In Proceedings of the 2nd International Workshop on Spatio-Temporal Database Management (STDBM), pages 33–40, 2004.

[KS04b] Mohammad R. Kolahdouzan and Cyrus Shahabi. Voronoi-Based K Nearest Neighbor Search for Spatial Network Databases. In Proceedings of the 30th International Conference on Very Large Data Bases, pages 840–851, 2004.

[KSF+96] Flip Korn, Nikolaos Sidiropoulos, Christos Faloutsos, Eliot Siegel, and Zenon Protopapas. Fast Nearest Neighbor Search in Medical Image Databases.
In Proceedings of the 22nd International Conference on Very Large Data Bases, pages 215–226, 1996.

[KZWW05] Wei-Shinn Ku, Roger Zimmermann, Haojun Wang, and Chi-Ngai Wan. Adaptive Nearest Neighbor Queries in Travel Time Networks. In Proceedings of the 13th ACM International Workshop on Geographic Information Systems (ACM-GIS), pages 210–219, 2005.

[Lan] R. B. Langley. The Orbits of GPS Satellites. GPS World, 2(3):50–53.

[LHJ+03] Mong-Li Lee, Wynne Hsu, Christian S. Jensen, Bin Cui, and Keng Lik Teo. Supporting Frequent Updates in R-Trees: A Bottom-Up Approach. In Proceedings of the 29th International Conference on Very Large Data Bases, pages 608–619, 2003.

[LWE+95] R. Loh, V. Wullschleger, B. Elrod, M. Lage, and F. Haas. The U.S. Wide-Area Augmentation System (WAAS). Navigation, 42(3), 1995.

[MHP05] Kyriakos Mouratidis, Marios Hadjieleftheriou, and Dimitris Papadias. Conceptual Partitioning: An Efficient Method for Continuous Nearest Neighbor Monitoring. In Proceedings of the ACM SIGMOD International Conference on Management of Data, pages 634–645, 2005.

[MXA04] Mohamed F. Mokbel, Xiaopeng Xiong, and Walid G. Aref. SINA: Scalable Incremental Processing of Continuous Queries in Spatio-Temporal Databases. In Proceedings of the ACM SIGMOD International Conference on Management of Data, pages 623–634, 2004.

[MYPM06] Kyriakos Mouratidis, Man Lung Yiu, Dimitris Papadias, and Nikos Mamoulis. Continuous Nearest Neighbor Monitoring in Road Networks. In Proceedings of the 32nd International Conference on Very Large Data Bases, pages 43–54, 2006.

[NI97] Julio C. Navas and Tomasz Imielinski. GeoCast - Geographic Addressing and Routing. In International Conference on Mobile Computing and Networking, 1997.

[Nüv07] Nüvi. Garmin, Inc., 2007. URL: http://www.garmin.com.

[OBSC00] A. Okabe, B. Boots, K. Sugihara, and S. N. Chiu, editors. Spatial Tessellations: Concepts and Applications of Voronoi Diagrams. John Wiley and Sons Ltd., 2000.

[PCC04] Jignesh M.
Patel, Yun Chen, and V. Prasad Chakka. STRIPES: An Efficient Index for Predicted Trajectories. In Proceedings of the ACM SIGMOD International Conference on Management of Data, pages 637–646, 2004.

[PXK+02] S. Prabhakar, Y. Xia, D. Kalashnikov, Walid G. Aref, and S. Hambrusch. Query Indexing and Velocity Constrained Indexing: Scalable Techniques for Continuous Queries on Moving Objects. IEEE Transactions on Computers, 51(10):1124–1140, 2002.

[PZMT03] Dimitris Papadias, Jun Zhang, Nikos Mamoulis, and Yufei Tao. Query Processing in Spatial Network Databases. In Proceedings of the 29th International Conference on Very Large Data Bases, pages 802–813, 2003.

[Qua] Qualcomm. gpsOne. URL: http://www.qualcomm.com.

[Rat02] Sylvia Ratnasamy. A Scalable Content-Addressable Network. Ph.D. Dissertation, University of California, Berkeley, 2002.

[RKV95] Nick Roussopoulos, Stephen Kelley, and Frédéric Vincent. Nearest Neighbor Queries. In Proceedings of the ACM SIGMOD International Conference on Management of Data, pages 71–79, 1995.

[SAS05] Jagan Sankaranarayanan, Houman Alborzi, and Hanan Samet. Efficient Query Processing on Spatial Networks. In Proceedings of the 13th ACM International Workshop on Geographic Information Systems (ACM-GIS), pages 200–209, 2005.

[SJLL00] Simonas Saltenis, Christian S. Jensen, Scott T. Leutenegger, and Mario A. Lopez. Indexing the Positions of Continuously Moving Objects. In Proceedings of the ACM SIGMOD International Conference on Management of Data, pages 331–342, 2000.

[SK98] Thomas Seidl and Hans-Peter Kriegel. Optimal Multi-Step k-Nearest Neighbor Search. In Proceedings of the ACM SIGMOD International Conference on Management of Data, pages 154–165, 1998.

[SR01] Zhexuan Song and Nick Roussopoulos. K-Nearest Neighbor Search for Moving Query Point. In Proceedings of Advances in Spatial and Temporal Databases, 7th International Symposium (SSTD), pages 79–96, 2001.

[SRF87] Timos K. Sellis, Nick Roussopoulos, and Christos Faloutsos.
The R+-Tree: A Dynamic Index for Multi-Dimensional Objects. In Proceedings of the 13th International Conference on Very Large Data Bases, pages 507–518, 1987.

[SSA08] Hanan Samet, Jagan Sankaranarayanan, and Houman Alborzi. Scalable Network Distance Browsing in Spatial Databases. In Proceedings of the ACM SIGMOD International Conference on Management of Data, pages 43–54, 2008.

[SWCD97] A. Prasad Sistla, Ouri Wolfson, Sam Chamberlain, and Son Dao. Modeling and Querying Moving Objects. In Proceedings of the 13th International Conference on Data Engineering, pages 422–432, 1997.

[SY03] Shashi Shekhar and Jin Soung Yoo. Processing In-Route Nearest Neighbor Queries: A Comparison of Alternative Approaches. In Proceedings of the 10th ACM International Symposium on Advances in Geographic Information Systems (ACM GIS), pages 9–16, 2003.

[TP02] Yufei Tao and Dimitris Papadias. Time-Parameterized Queries in Spatio-Temporal Databases. In Proceedings of the ACM SIGMOD International Conference on Management of Data, pages 334–345, 2002.

[TPS02] Yufei Tao, Dimitris Papadias, and Qiongmao Shen. Continuous Nearest Neighbor Search. In Proceedings of the 28th International Conference on Very Large Data Bases, pages 287–298, 2002.

[TPS03] Yufei Tao, Dimitris Papadias, and Jimeng Sun. The TPR*-Tree: An Optimized Spatio-Temporal Access Method for Predictive Queries. In Proceedings of the 29th International Conference on Very Large Data Bases, pages 790–801, 2003.

[TUW98] Jamel Tayeb, Özgür Ulusoy, and Ouri Wolfson. A Quadtree-Based Dynamic Attribute Indexing Method. The Computer Journal, 41(3), 1998.

[WiM] WiMax. URL: http://www.wimaxforum.org/.

[WZK05] Haojun Wang, Roger Zimmermann, and Wei-Shinn Ku. ASPEN: An Adaptive Spatial Peer-to-Peer Network. In Proceedings of the 13th ACM International Workshop on Geographic Information Systems (ACM-GIS), pages 230–239, 2005.

[XMA+04] Xiaopeng Xiong, Mohamed F. Mokbel, Walid G. Aref, Susanne E. Hambrusch, and Sunil Prabhakar.
Scalable Spatio-Temporal Continuous Query Processing for Location-Aware Services. In Proceedings of the 16th International Conference on Scientific and Statistical Database Management (SSDBM), pages 317–328, 2004.

[XMA05] Xiaopeng Xiong, Mohamed F. Mokbel, and Walid G. Aref. SEA-CNN: Scalable Processing of Continuous K-Nearest Neighbor Queries in Spatio-Temporal Databases. In Proceedings of the 21st International Conference on Data Engineering (ICDE), pages 643–654, 2005.

[XMA06] Xiaopeng Xiong, Mohamed F. Mokbel, and Walid G. Aref. LUGrid: Update-Tolerant Grid-Based Indexing for Moving Objects. In Proceedings of the 7th International Conference on Mobile Data Management (MDM), page 13, 2006.

[YPK05] Xiaohui Yu, Ken Q. Pu, and Nick Koudas. Monitoring k-Nearest Neighbor Queries over Moving Objects. In Proceedings of the 21st International Conference on Data Engineering (ICDE), pages 631–642, 2005.

[ZL01] Baihua Zheng and Dik Lun Lee. Semantic Caching in Location-Dependent Query Processing. In Proceedings of Advances in Spatial and Temporal Databases, 7th International Symposium (SSTD), pages 97–116, 2001.

[ZZP+03] Jun Zhang, Manli Zhu, Dimitris Papadias, Yufei Tao, and Dik Lun Lee. Location-Based Spatial Queries. In Proceedings of the ACM SIGMOD International Conference on Management of Data, pages 443–454, 2003.
Abstract
Recently, the usage of various GPS devices (e.g., car-based navigation systems and handheld PDAs) has become very popular in many areas, especially urban ones. More and more people carry these devices on the go. Moreover, since wireless connectivity is embedded in these devices, the end users are able to report their positions while using the service and hence themselves become Points Of Interest (POIs) during query processing. With the rapidly growing number of users willing to subscribe to various location-based services, designing novel systems that support a very large number of users has raised intense interest in the research community.
Linked assets
University of Southern California Dissertations and Theses
Conceptually similar
Efficient updates for continuous queries over moving objects
Location-based spatial queries in mobile environments
Ensuring query integrity for spatial data in the cloud
Privacy in location-based applications: going beyond K-anonymity, cloaking and anonymizers
Generalized optimal location planning
Gradient-based active query routing in wireless sensor networks
WOLAP: wavelet-based on-line analytical processing
Query processing in time-dependent spatial networks
Efficient reachability query evaluation in large spatiotemporal contact networks
Domical: a new cooperative caching framework for streaming media in wireless home networks
Scalable processing of spatial queries
Efficient and accurate in-network processing for monitoring applications in wireless sensor networks
Automatic image matching for mobile multimedia applications
Partitioning, indexing and querying spatial data on cloud
Efficient indexing and querying of geo-tagged mobile videos
Modeling and predicting with spatial-temporal social networks
Enabling spatial-visual search for geospatial image databases
Edge indexing in a grid for highly dynamic virtual environments
Spatiotemporal traffic forecasting in road networks
Data replication and scheduling for content availability in vehicular networks
Asset Metadata
Creator
Wang, Haojun (author)
Core Title
MOVNet: a framework to process location-based queries on moving objects in road networks
School
Viterbi School of Engineering
Degree
Doctor of Philosophy
Degree Program
Computer Science
Publication Date
02/21/2009
Defense Date
10/23/2008
Publisher
University of Southern California (original); University of Southern California. Libraries (digital)
Tag
location-based services, mobile computing, OAI-PMH Harvest, spatial-temporal data management
Language
English
Contributor
Electronically uploaded by the author (provenance)
Advisor
Zimmermann, Roger (committee chair), Bardet, Jean-Pierre (committee member), Shahabi, Cyrus (committee member)
Creator Email
leohjwang@hotmail.com, whjun@hotmail.com
Permanent Link (DOI)
https://doi.org/10.25549/usctheses-m1982
Unique identifier
UC1323027
Identifier
etd-Wang-2610 (filename), usctheses-m40 (legacy collection record id), usctheses-c127-150329 (legacy record id), usctheses-m1982 (legacy record id)
Legacy Identifier
etd-Wang-2610.pdf
Dmrecord
150329
Document Type
Dissertation
Rights
Wang, Haojun
Type
texts
Source
University of Southern California (contributing entity), University of Southern California Dissertations and Theses (collection)
Repository Name
Libraries, University of Southern California
Repository Location
Los Angeles, California
Repository Email
cisadmin@lib.usc.edu