Efficient reachability query evaluation in large spatiotemporal contact networks
EFFICIENT REACHABILITY QUERY EVALUATION IN LARGE
SPATIOTEMPORAL CONTACT NETWORKS
by
Houtan Shirani-Mehr
A Dissertation Presented to the
FACULTY OF THE USC GRADUATE SCHOOL
UNIVERSITY OF SOUTHERN CALIFORNIA
In Partial Fulfillment of the
Requirements for the Degree
DOCTOR OF PHILOSOPHY
(COMPUTER SCIENCE)
August 2013
Copyright 2013 Houtan Shirani-Mehr
to my parents
Acknowledgments
First of all, I would like to express my sincere gratitude to my academic advisor, Professor Cyrus Shahabi, without whom this work would not have been possible. His motivation, patience, and immense knowledge have always been a great inspiration for me during this journey. I am truly proud to have had him as my advisor during my PhD studies.
I am also greatly indebted to the folks at InfoLab for their support and encouragement during my PhD studies. Among them, special thanks go to Dr. Farnoush Banaei-Kashani and Dr. Jeff Khoshgozaran-Haghighi.
My sincere thanks also go to Dr. Andrei Popa, Dr. Curtis Padget, and Mr. Yugandhar Veluvali for offering me summer internship opportunities in their groups and leading me to work on diverse, exciting projects.
I am thankful to Professor Jay Kuo, Professor Kostas Psounis, Professor Shri Narayanan, and Dr. Aram Galystan, who served on my qualification committee and provided many insightful comments on my research work. I would also like to thank Professor Jay Kuo and Professor Shri Narayanan for serving on my defense committee.
Last but not least, I sincerely appreciate the love, support, and encouragement of those to whom this dissertation is dedicated. To my dearest parents, Dr. Zahra Emami Naeini and Dr. Homayoun Shirani-Mehr, for their endless love and support throughout all these years. Words cannot describe the gratitude I owe to you. To my brothers, Dr. Hooman Shirani-Mehr and Houshmand Shirani-Mehr, for always being there for me. I'm truly blessed to have you all, and I admire your continuous support.
Contents
Dedication
Acknowledgments
List of Figures
List of Tables
1 Introduction
1.1 Motivation
1.2 Reachability Query Processing in Contact Networks with no Constraints
1.3 Reachability Query Processing in Contact Networks with Constraints
1.4 Thesis Statement
1.5 Road Map
2 Related Work
2.1 Graph Reachability
2.2 Trajectory Join and Trajectory Indexing
2.3 External Graph Traversal and Graph Indexing
2.4 Contact Networks Analysis
3 Problem Definition
3.1 Contact Network with Constraints
3.2 Reachability Query
4 Reachability Query Processing in Contact Networks with no Constraints
4.1 Problem Definition
4.2 ReachGrid: A Spatiotemporal Reachability Index
4.2.1 Index Construction
4.2.2 Query Processing
4.2.3 ReachGrid Optimization
4.3 ReachGraph: A Connectivity-based Reachability Index
4.3.1 Index Construction
4.3.2 Query Processing
4.4 Experiments
4.4.1 ReachGrid
4.4.2 ReachGraph
4.4.3 ReachGrid vs. ReachGraph
4.4.4 Comparison with Graph Reachability
5 Reachability Query Processing in Contact Networks with Constraints
5.1 Introduction
5.2 Constraints Categorization for Reachability Query in Contact Networks
5.2.1 Contact-level Constraints
5.2.2 Object-level Constraints
5.3 Reachability in Contact Networks with Latency
5.3.1 Problem Definition
5.3.2 ReachGraph and ReachGrid Extension
5.3.3 IO Complexity
5.3.4 Experiments
5.4 Extension to Contact Networks with Other Constraints
6 Conclusions and Future Work
Reference List
List of Figures
1.1 Objects positions and contacts between them during the time interval [0,3]
4.1 An example of index constructed for ReachGrid
4.2 ReachGrid Index Example
4.3 ReachGrid Query Processing
4.4 TEN model of C (a) and the corresponding DAG (b)
4.5 D_N at the end of reduction step
4.6 D_N^3 for H_N whose D_N^1 is the graph in Figure 4.5
4.7 ReachGraph for the contact network in Figure 4.5
4.8 ReachGrid resolutions optimization
4.9 ReachGrid construction time
4.10 ReachGrid Query Processing
4.11 Contact network edges (a) and vertices (b)
4.12 Contact network (D_N) construction time
4.13 IO count vs different partition depths
4.14 ReachGraph online query processing for different approaches
4.15 ReachGrid vs. ReachGraph
4.16 CPU time
5.1 Constraints Classification of Reachability Queries in Contact Networks
5.2 ReachGraph at the end of reduction step
5.3 Hilbert Construction
5.4 Hilbert Curve at Resolution 2 to Map 2d Space into 1d space
5.5 ReachGrid resolutions optimization
5.6 IO count for the row-wise and Hilbert-based disk placement techniques for RWP_20k dataset
5.7 IO count for B-traversal and U-traversal and different contact network latencies
5.8 Contact network edges (a) and vertices (b)
5.9 ReachGrid vs. ReachGraph
List of Tables
4.1 Complexity Comparison
4.2 Data Collection Size
4.3 System Specifications
4.4 Average vertex degree for D_N^i
4.5 GRAIL vs. ReachGraph (denoted by RG)
In many application scenarios, an item, such as a message, a piece of sensitive information, a contagious virus, or a piece of malware, passes between two objects, such as moving vehicles, individuals, or cell phone devices, when the objects are sufficiently close (i.e., when they are so-called in contact) and some application-specific constraints are satisfied. An example of a “constraint” in the transmission of malware is that it takes some time for the malware to be activated on a cell phone before it can be transmitted to another one via Bluetooth. As another example of a constraint, a message passes between two vehicles with a probability that depends on various conditions, such as the distance between the vehicles. In such applications, once an item is initiated, it can penetrate the object population through the evolving network of contacts among objects, termed the “contact network”. A reachability query evaluates whether two objects are “reachable” through the contact network. In this dissertation, we define and study the reachability query in large (i.e., disk-resident) contact datasets, which verifies whether two objects are reachable through the contact network represented by such contact datasets. The main characteristics of our problem are the large scale of the contact dataset and the dynamism of the network that models the contact dataset. This underlying network evolves over the time period during which the contact dataset is constructed: as the objects move in the environment, new contacts appear and old contacts disappear over time.
In this dissertation, due to the complexity of the general problem, we first simplify the problem by focusing on reachability in contact datasets with no constraints. With such contact datasets, an item passes between two objects whenever they are close enough. We propose two contact dataset indexes, termed ReachGrid and ReachGraph, for efficient reachability query processing. With ReachGrid, at query time only a small necessary portion of the contact dataset is constructed and traversed. With ReachGraph, we precompute and leverage reachability at different scales for efficient query processing. We optimize the disk placement of both indexes for efficient query processing.
Afterward, we extend ReachGrid and ReachGraph to contact networks with constraints. To this end, as a case study we focus on a specific type of constraint, i.e., the latency constraint, and adapt ReachGraph and ReachGrid for efficient reachability query processing. Furthermore, we discuss how to generalize ReachGraph and ReachGrid to contact networks with general constraints based on the insights we obtain from focusing on contact networks with latency.
1 Introduction
1.1 Motivation
In many application scenarios, items such as infectious viruses, ideas and habits, malware, and broadcast messages can be transmitted between moving objects, e.g., individuals, mobile devices, or vehicles, when they are sufficiently close to each other, i.e., once they are so-called in contact, and some application-specific constraints are satisfied. The application-specific constraints can take various forms. For example, an application constraint in the transmission of malware is that it takes some time for the malware to be activated on a cell phone before it can be transmitted to another one via Bluetooth. Imagine another application in which moving persons transmit files over cell phone Bluetooth. With this application, a file transmission is only possible when two cell phones are in contact long enough for the entire file to be transmitted from one cell phone to the other. Here, the application constraint is that the two cell phones must remain in contact for a sufficient amount of time. Accordingly, given a population of moving objects, once an item is initiated by an object and the application constraints are satisfied, the item can penetrate the evolving network of contacts among objects, termed the contact network.
Arguably, one of the main building blocks for item propagation analysis in evolving contact networks is the ability to compute reachability queries, which evaluate whether two objects are “reachable” through the evolving contact network. Previously, the lack of accurate datasets that capture contact networks has limited the accuracy and applicability of propagation analysis (and particularly, reachability analysis) in contact networks, and previous studies have inevitably resorted to simplified contact network models or to small-scale and inaccurate contact datasets. However, with the recent advances in developing accurate positioning devices and the prevalence of location-based services, it is becoming possible to capture the location of objects at large scale and for extended periods of time, resulting in very large contact datasets that capture the history of object contacts accurately and with high spatiotemporal resolution. In this dissertation, we focus on defining and efficiently evaluating reachability queries in large-scale (disk-resident) historic contact datasets in the presence of possible application-specific constraints. Our main challenge is to reduce the computation time for query evaluation.
The two main characteristics of our problem are the large scale of the contact dataset and the dynamism of the network that models the contact dataset. The large scale of the contact dataset makes it impossible to store it in main memory. Therefore, the contact dataset should be indexed and carefully placed on disk for fast and efficient traversal during query processing. Furthermore, the underlying network that models the contact dataset over a period of time is an evolving network. In such a network, contacts between objects appear and disappear over time as objects move in the environment.
Consider the contact network depicted in Figure 1.1, which shows the positions of a set of objects at each time instance within the time interval T = [0,3]. In this figure, two objects are connected by a link if they are in contact; for instance, o_1 and o_2 are in contact at time 0. Assume an application-specific constraint such that an item needs to stay for at least L time instances at o_i before it can move to o_j. The object o_4 is reachable from o_1 during the time interval [0,1] when L = 0. The reason is that if an item is initiated by o_1 at time 0, it can pass from o_1 to o_2 at time 0 and then from o_2 to o_4 at time 1. Note that in the same figure, o_4 is not reachable from o_1 during [0,1] when L ≥ 1.
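The example above can be replayed with a minimal in-memory sketch. It is only an illustration under stated assumptions, not the disk-based method developed in this dissertation: contacts are given as hypothetical (time, object, object) triples, only the two contacts named in the text are included (Figure 1.1 contains more), and the source object is assumed to start holding the item at the beginning of the query interval, so the latency L applies to it as well. The function name and parameters are illustrative.

```python
from collections import defaultdict


def reachable(contacts, source, target, t_start, t_end, latency=0):
    """Check whether an item initiated at `source` at `t_start` can reach `target` by `t_end`.

    `contacts` is an iterable of (time, a, b) triples (hypothetical format).
    An object may pass the item on only `latency` time instances after receiving it.
    """
    by_time = defaultdict(list)
    for t, a, b in contacts:
        if t_start <= t <= t_end:
            by_time[t].append((a, b))

    # Earliest time instance at which each object holds the item.
    received = {source: t_start}
    for t in sorted(by_time):
        changed = True
        while changed:  # repeated passes allow multi-hop transfer within one instant when latency == 0
            changed = False
            for a, b in by_time[t]:
                for holder, other in ((a, b), (b, a)):
                    if (holder in received and other not in received
                            and received[holder] + latency <= t):
                        received[other] = t
                        changed = True
    return target in received


# The two contacts described in the text for Figure 1.1.
contacts = [(0, "o1", "o2"), (1, "o2", "o4")]
print(reachable(contacts, "o1", "o4", 0, 1, latency=0))
print(reachable(contacts, "o1", "o4", 0, 1, latency=1))
```

With these two contacts the first call reports that o_4 is reachable and the second that it is not, matching the discussion of L = 0 and L ≥ 1.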
Consider the following examples of how reachability query evaluation plays a fundamental role in analyzing item propagation through contact networks in the context of some application scenarios. In the first example, assume a set of individuals O who are known to carry a dangerous contagious virus. By performing a batch of reachability queries between each individual in O and the rest of the population, the persons who could have been directly or indirectly contaminated within a certain time interval can be identified by determining the set of people reachable from O in the same time interval. An application constraint in this case is that a virus needs to stay for at least L time instances on its host o before o can transmit the virus to another individual o′. Note that this application requires running potentially numerous reachability queries between pairs of individuals, which can be very time consuming. On the other hand, timely medication administration can save lives with most viral diseases. Next, imagine a set of persons O, e.g., criminals, who are on a watch list and need to be monitored. Law enforcement agencies may need to discover those who have potentially been in contact with any of the individuals in O. In this case, the individuals' contacts are discovered by analyzing trajectories obtained from aerial imagery, surveillance cameras, or GPS in the mobile devices used by those individuals. Again, this requires performing a batch of reachability queries to find all the people reachable from or to any individual in O. Such analysis may help in preventing new crimes and in analyzing the relationships between criminals.
Graph reachability problems, which verify whether a path exists between two given vertices of a graph, have been extensively studied in recent years, e.g., [YCZ10, YC10, YCZ13]. Our problem differs from the prior work on graph reachability in two ways. First, while previous work on graph reachability assumes the graph is memory-resident, we focus on very large disk-resident contact networks. Accordingly, we study how to index and place the contact network on disk to enable efficient query processing. Second, in our problem objects are associated with time and space information as they move in an environment over time. We show that we can leverage such information to significantly improve the efficiency of reachability query processing, whereas the existing work on graph reachability only focuses on datasets that are modeled by abstract graphs with no connection to space and time. In particular, we propose novel multiresolution and bidirectional graph traversal techniques which allow for unprecedented improvement in the efficiency of state-of-the-art reachability query processing approaches.
Figure 1.1: Objects positions and contacts between them during the time interval [0,3]
In this dissertation, we first simplify the problem by studying reachability query processing in contact networks with no constraints. This is still a very challenging problem, and studying it enables us to identify the challenges and build promising techniques for processing reachability queries in contact networks with constraints. Thereafter, we extend our techniques to contact networks with constraints. To this end, we focus on a specific type of constraint, i.e., the latency constraint, and adapt ReachGrid and ReachGraph for efficient query processing. Finally, we discuss how to generalize ReachGrid and ReachGraph to contact networks with different constraints based on the insights and ideas obtained by focusing on contact networks with latency.
1.2 Reachability Query Processing in Contact Networks with no Constraints
In this dissertation, we first study the reachability query in contact networks with no constraints. With such contact networks, two objects can transmit items whenever they are in contact, and no application-specific constraint needs to be satisfied. We propose two index structures for indexing such contact networks, namely ReachGrid and ReachGraph [SMKS12]. Consider a reachability query which verifies whether an object (query source) can reach another object (query destination) through the contact network, if we consider only the contacts occurring during a given query time interval (query interval). With ReachGrid, our approach is to compute reachability on-the-fly by expanding the contact network starting from the query source. However, naïve expansion of the network is prohibitively costly. Instead, to enable “guided” expansion, we leverage the following simple and powerful observation about contact networks: only contacts that occur in the same spatial and temporal locality are relevant for exploration, and therefore exploration of the contacts can be guided through relevant spatiotemporal localities and can avoid other localities for enhanced performance. In particular, with ReachGrid we propose a spatiotemporal grid to index all contacts in the contact network dataset into distinct spatiotemporal localities. At query time, this index is used to guide on-the-fly expansion of the contact network to verify reachability. We propose a bidirectional traversal technique which can quickly identify whether the query destination is reachable from the query source during the query interval. Furthermore, we optimize the query processing time by proposing to precalculate spatiotemporal joins to avoid the complexity of processing them at query time.
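The locality idea behind ReachGrid can be illustrated with a minimal sketch. It assumes each contact record carries a time instance and a representative location, and it uses hypothetical names (build_grid, cell_size, slot_length) that do not come from this dissertation; the actual index construction is the subject of Section 4.2.

```python
from collections import defaultdict


def build_grid(contacts, cell_size, slot_length):
    """Bucket contacts into spatiotemporal cells.

    `contacts` holds (t, x, y, a, b) records (assumed format): objects a and b
    were in contact at time t near location (x, y).
    The cell key is (column, row, time slot).
    """
    grid = defaultdict(list)
    for t, x, y, a, b in contacts:
        key = (int(x // cell_size), int(y // cell_size), int(t // slot_length))
        grid[key].append((t, a, b))
    return grid


contacts = [
    (0, 1.0, 1.0, "o1", "o2"),
    (1, 12.0, 3.0, "o2", "o4"),
    (5, 1.5, 0.5, "o1", "o3"),
]
grid = build_grid(contacts, cell_size=10, slot_length=4)
for key in sorted(grid):
    print(key, grid[key])
```

A traversal guided by such a structure only needs to read the cells that the expanding set of reached objects can touch in each time slot, rather than scanning the whole dataset.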
On the other hand, with ReachGraph we use the alternative approach of precomputing the reachability between objects. It is impractical to precompute reachability for all combinations of query source, destination, and interval. Therefore, we propose to precompute reachability only for carefully selected combinations of query source, destination, and interval, and to leverage these combinations to compute reachability for all other combinations on-the-fly. In turn, at query time this allows recursively breaking the given reachability query into a set of precomputed reachability queries for efficient query processing.
The placement of the index on disk can significantly affect the efficiency of query processing. A naïve approach of placing index elements (graph nodes and grid cells) on random disk blocks significantly deteriorates query efficiency. Accordingly, guided by the following two observations, we develop enhanced disk placement approaches for ReachGrid and ReachGraph. First, contacts are processed in order of occurrence time during query processing. Second, during index traversal, an object o′ is traversed after o if o′ is reachable from o.
While ReachGrid evaluates reachability by sweeping contacts along the space and time dimensions, ReachGraph computes reachability by traversing a connectivity graph. Accordingly, one can expect ReachGrid to outperform ReachGraph when the query time interval is small, and vice versa. This expectation is confirmed by our empirical study in Section 5.3.4. Moreover, our proposed approaches outperform the existing reachability query processing algorithms by 76% on average.
1.3 Reachability Query Processing in Contact Networks with Constraints
After studying contact networks with no constraints, we extend reachability query processing to contact networks where additional constraints are imposed by the application. We first categorize the constraints that arise in reachability query processing over contact networks into two groups, i.e., contact-level constraints and object-level constraints. With contact-level constraints, the sequence of contacts between objects through which an item travels in the contact network should satisfy a set of constraints. On the other hand, with object-level constraints the objects making the contacts should themselves satisfy a set of application constraints. Thereafter, we focus on one particular contact network constraint, namely latency. With the latency constraint, an object can only transmit an item at least L time instances after receiving it. For example, a contagious virus should stay in the body of the first individual long enough to enable that individual to infect others. As another example, it takes some time for malware that propagates over Bluetooth to be installed on a cell phone before it can be transmitted to another cell phone.
We show how to extend the ReachGrid and ReachGraph index structures to process reachability queries in contact networks with latency in two steps. First, we describe how one needs to modify index construction such that the indexes work for contact networks with latency. We show that some steps of ReachGraph construction are not applicable to contact networks with latency. The ReachGraph index proposed for contact networks with no constraints outperforms ReachGrid when the query interval is large. Motivated by this, we also improve the efficiency of the ReachGrid index structure by proposing a new disk placement approach which better leverages the locality of object trajectories by placing ReachGrid cells along a Hilbert curve rather than using the row-wise placement technique proposed for contact networks with no constraints. A space filling curve is a function which maps a multi-dimensional space into a one-dimensional space. The Hilbert space filling curve has been studied extensively in the literature and has been shown to outperform other approaches in terms of preserving the locality of objects [Jag97]. Next, we design an efficient query processing technique for contact networks with latency by proposing a bidirectional contact network traversal which enables pruning a large part of the contact network compared with unidirectional contact network traversal. We analyze the IO complexity of our proposed approaches and empirically evaluate the efficiency of our proposed techniques. Based on our experimental results, our bidirectional query traversal technique outperforms the unidirectional traversal technique by more than 41% for both ReachGrid and ReachGraph. Also, the ReachGrid technique outperforms the adapted ReachGraph by 18% on average in terms of the number of disk accesses.
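To make the space filling curve idea concrete, the following sketch maps a grid cell to its position along a Hilbert curve using the standard iterative bit-manipulation conversion. It is a generic illustration, not the placement code used in this dissertation; the function name hilbert_index and the choice of a square 2^order by 2^order grid are assumptions.

```python
def hilbert_index(order, x, y):
    """Map cell (x, y) on a 2**order x 2**order grid to its position along the Hilbert curve."""
    n = 1 << order
    d = 0
    s = n // 2
    while s > 0:
        rx = 1 if (x & s) else 0
        ry = 1 if (y & s) else 0
        d += s * s * ((3 * rx) ^ ry)
        # Rotate or flip the quadrant so the sub-curve has the standard orientation.
        if ry == 0:
            if rx == 1:
                x = n - 1 - x
                y = n - 1 - y
            x, y = y, x
        s //= 2
    return d


# Order the 16 cells of a 4 x 4 grid along the curve, as one might order cells on disk.
cells = [(x, y) for x in range(4) for y in range(4)]
placement = sorted(cells, key=lambda c: hilbert_index(2, *c))
print(placement)
```

In the resulting order every cell is a spatial neighbor of the cell before it, which is the locality property that motivates preferring this placement over a row-wise one, where the end of one row is followed by the far-away start of the next.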
The study of reachability query processing in contact networks with latency provides us with insights on how to tackle the challenges that arise with different contact network constraints. Finally, we discuss the extensions of ReachGrid and ReachGraph to process reachability queries in contact networks with general constraints. The pros and cons of the extensions are also discussed and studied.
1.4 Thesis Statement
In this dissertation, we introduce reachability query processing in contact networks, which is a fundamental building block for propagation analysis in contact networks. We show that real-time evaluation of reachability queries is feasible in large and evolving spatiotemporal contact networks, enabling new applications in public health analysis, surveillance, and intelligent traffic monitoring.
1.5 Road Map
The remainder of this dissertation is organized as follows. In Chapter 2 we review the related work. The formal definition of reachability query processing in contact networks with constraints is presented in Chapter 3. We discuss the problem of reachability query processing in contact networks with no constraints and with constraints in Chapters 4 and 5, respectively. Finally, we conclude this dissertation with a summary of our contributions and future work in Chapter 6.
2 Related Work
We review the related work in four categories: graph reachability, trajectory indexing and trajectory join, external graph traversal and graph indexing, and finally, contact networks analysis.
2.1 Graph Reachability
Given two vertices u and v in a directed graph G, graph reachability verifies whether there is a path from u to v [YC10, YCZ10]. Two basic approaches at the two ends of the spectrum of solutions for graph reachability queries are 1) precompute reachability between every possible pair of vertices of the graph, i.e., find the transitive closure of G, which allows for answering a query in O(1), or 2) perform on-the-fly graph search for each query without relying on precomputation. Both of these approaches are impractical for large graphs. The first approach requires O(|V|^2) storage space and the second approach incurs O(|V| + |E|) query processing time, both of which are unacceptable given the large size of the graph. Therefore, various approaches have been proposed in the literature to reduce the storage cost while maintaining a reasonable query processing time. These approaches can be broadly categorized into two categories [YCZ13]: interval labeling [ABJ89, CC08, TL07] and 2-Hop labeling [CHKZ02]. In the first category, interval labeling approaches assign either the min-post labeling [ABJ89] or the pre-post labeling [CC08, TL07] to a spanning subtree of the DAG of the original graph. With pre-post labeling, each vertex v is assigned an interval [b,e] where b and e are the pre-order and post-order ranks of v. The rank of v is calculated by a DFS traversal of the DAG, starting from the root(s) and incrementing the rank whenever we enter a vertex or back-track from a vertex. On the other hand, in a min-post labeling, e is the post-order rank of v, and b is the minimum rank of any vertex which is a descendant of v. With 2-Hop labeling [CHKZ02, CYL+06], two sets are constructed for each vertex v. The first set includes all the vertices which are reachable from v and the other set includes all the vertices from which v is reachable. Given two vertices u and v, the intersection set S of all the vertices reachable from u and those from which v is reachable is taken, and if S is non-empty then v is reachable from u.
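The pre-post labeling described above can be sketched as follows for a rooted tree. This is a simplified illustration with hypothetical function names; it covers only the spanning-tree part of the labeling, whereas the cited approaches add further machinery to account for the non-tree edges of the DAG.

```python
def pre_post_labels(tree, root):
    """Assign each vertex an interval (b, e) by a DFS over a rooted tree.

    `tree` maps a vertex to the list of its children. A single counter is
    incremented on entering a vertex and again on back-tracking from it.
    """
    labels = {}
    counter = 0

    def visit(v):
        nonlocal counter
        counter += 1              # rank on entering v
        begin = counter
        for child in tree.get(v, []):
            visit(child)
        counter += 1              # rank on back-tracking from v
        labels[v] = (begin, counter)

    visit(root)
    return labels


def tree_reaches(labels, u, v):
    """True if v lies in the subtree of u, i.e., v's interval is nested in u's."""
    bu, eu = labels[u]
    bv, ev = labels[v]
    return bu <= bv and ev <= eu


tree = {"a": ["b", "c"], "b": ["d"]}
labels = pre_post_labels(tree, "a")
print(labels)
print(tree_reaches(labels, "a", "d"), tree_reaches(labels, "b", "c"))
```

Once the labels are stored, a reachability test along tree edges reduces to two integer comparisons, which is why such labelings trade a small amount of storage for fast query time.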
The dynamism of the graph, i.e., vertices and edges being removed and added over time, is considered in reachability queries as well [JRXW11]. The study of basic reachability has also been extended to include constraints. Reachability in uncertain graphs is studied in [JLDW11]. The problem of k-hop reachability, i.e., whether there exists a path of length at most k from a source vertex to a destination vertex, is studied in [CSC+12]. Reachability in edge-labeled graphs is studied in [JHW+10], where the query is whether vertex u can reach vertex v through a path whose edge labels are constrained by a set of labels. The problem of weight constraint reachability is formalized and studied in [QCQ+12]. This problem asks whether there is a path between two input vertices on which each real-valued edge (or vertex) weight satisfies a range constraint.
Although we also reduce our problem to graph reachability by converting the contact network into a hypergraph, our problem differs from previous work on graph reachability in several ways. First, in contrast with the previous work, where the focus is on memory-resident graphs, we consider disk-resident contact networks. With disk-resident contact networks, one needs to consider efficient disk placement and traversal, which affect the disk access time. Second, we focus on “spatiotemporal” graphs and accordingly leverage the spatial and temporal properties of such graphs for enhanced index construction and graph traversal. In particular, a vertex of our graph may represent multiple objects, and moreover an object can be associated with multiple vertices. Finally, our proposed multi-resolution graph indexing and bidirectional graph traversal approaches are unique and novel, allowing for unprecedented improvement in the efficiency of state-of-the-art reachability query processing approaches.
2.2 Trajectory Join and Trajectory Indexing
The research on moving objects data management has traditionally focused mainly on range and nearest neighbor queries. Recently, trajectory join has also been studied [BHKT05, AJ06]. The problem of Closest-Point-of-Approach (CPA) is proposed and studied in [AJ06]. Given a set of trajectories, CPA finds the pairs of objects whose closest distance is less than d. Although the CPA problem is different from trajectory join, the solution to the CPA problem can be adapted to solve trajectory join. Although we use trajectory join algorithms to find the contacts of a contact network and hence to construct the contact network, our focus is on indexing the contact network for efficient reachability query processing.
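For intuition only, the step of deriving contacts from trajectories can be sketched as a naive pairwise comparison. This is not one of the cited trajectory join algorithms, which avoid comparing every pair; the data layout (a dictionary from object to its position at each time instance) and the function name are assumptions made for the illustration.

```python
from itertools import combinations
from math import dist


def find_contacts(trajectories, d):
    """Return sorted (t, a, b) triples for object pairs closer than d at time t.

    `trajectories` maps an object id to {time instance: (x, y)} (assumed layout).
    Every pair of objects is compared, so the cost is quadratic in the number of objects.
    """
    contacts = []
    for a, b in combinations(sorted(trajectories), 2):
        common_times = trajectories[a].keys() & trajectories[b].keys()
        for t in common_times:
            if dist(trajectories[a][t], trajectories[b][t]) < d:
                contacts.append((t, a, b))
    return sorted(contacts)


trajectories = {
    "o1": {0: (0.0, 0.0), 1: (5.0, 5.0)},
    "o2": {0: (0.5, 0.0), 1: (0.0, 0.0)},
    "o3": {1: (0.0, 0.4)},
}
print(find_contacts(trajectories, d=1.0))
```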
Another relevant body of related work on trajectory processing is trajectory indexing [CEP03, CMWM10], which focuses on indexing trajectories for efficient processing of range queries and their variations. In contrast, our problem is how to index a contact network for efficient reachability query processing, which is much more complex than range queries and their variations.
Shortest path on graphs [SSA08, DKSR11, WXD+12] is another body of related work. Given a graph G = (V,E), a shortest path query finds the shortest path assuming a traveling cost between each pair of adjacent graph vertices. In contrast, with the reachability query we are interested in verifying whether a contact path exists between two objects which satisfies a set of application-specific constraints.
2.3 External Graph Traversal and Graph Indexing
With external memory graph traversal [MM02, Vit08], researchers have extended the classic graph traversal approaches such as Depth-First-Search (DFS) and Breadth-First-Search (BFS). As mentioned earlier, both DFS and BFS can be leveraged to answer reachability queries. However, in our work we try to avoid unnecessary expansion of graph nodes by leveraging the spatiotemporal information embedded in the contact network and graph and, subsequently, designing an efficient multi-resolution index structure and traversal approaches.
Another category of relevant work focuses on indexing temporal graphs. The time expanded network (TEN) and the time aggregated network (TAN) [SX08] are two models for representing time-varying networks. TEN represents the time dependence by instantiating a snapshot of the network at every time instance. TAN extends TEN in that the time-varying attributes are further aggregated over edges and vertices. We utilize TEN to initially model a contact network but afterward convert it to a more complex index structure, as discussed in Section 4.3. Recently, [EYKS10] studied efficient indexing of spatiotemporal networks represented by TEN. However, the focus of that work is on indexing techniques to enable efficient processing of route evaluation and retrieval queries, as opposed to our work, which focuses on the complex reachability query processing.
2.4 Contact Networks Analysis
Recent studies [TMML10, KBV07, KA12] have focused on analyzing characteristics of contact networks such as the average contact path length between two objects, or the time duration until two objects contact each other again. This area of work is orthogonal to ours, as we focus on indexing a contact network for efficient reachability query processing in contact networks with constraints.
Routing in delay-tolerant networks (DTN), which lack continuous network connectivity, is another body of relevant work [HCY11, ZXSW13]. The difference between this body of work and our work is twofold. First, the goal of routing in DTN is to find a best path from a source to a destination node based on a cost metric such as message delivery ratio. Second, our reachability query is associated with a time interval parameter which is leveraged during index construction and query processing to enable efficient reachability query processing.
3 Problem Definition
Here, we first formally define the contact network with constraints. Afterward, we formalize the reachability query, which verifies whether an item can travel from one object to another by propagating over a sequence of contacts through the contact network, which may have application-specific constraints.
3.1 Contact Network with Constraints
Consider a set of objects O moving in an environment E. We say a contact c={o_i,o_j} has happened between two objects o_i,o_j ∈ O when they are within a sufficiently close distance to transmit an item, i.e., when their distance is less than a threshold d_T and a set of application-specific constraints F = {f_1, f_2, f_3, ..., f_n} is satisfied. The value of d_T depends on the application of interest. For example, for disease propagation through human populations d_T is on the order of meters, while for Bluetooth data transfer through a set of mobile devices d_T is on the order of a hundred meters. Each constraint f_i is a function which evaluates a sequence of contacts according to a constraint parameter value set α_i to check if the sequence can transmit an item from source to destination. For example, with disease propagation, a constraint set can be defined as F={f_1,f_2} with α_1={7 days} and α_2={5 minutes}, where f_1 checks if a virus stays for the required amount of time (7 days, derived from α_1) on o_i to make it capable of transmitting the virus to o_j, while f_2 verifies if o_i and o_j stay in contact for enough time (5 minutes, derived from α_2) such that the virus can transmit from one to the other. We call o_i and o_j the contacting objects during c, and we define the time interval T_c within which the contact persists as the validity interval of c.
Consider a time interval T during which objects in O are moving in an environment E and making various contacts over time. The movement of each object o ∈ O can be modeled by the trajectory of o, which captures the position of o at each time instant t ∈ T. We term the collection of contacts between pairs of objects in O during the time interval T the contact network of O during T and represent it by C. For example, in Figure 1.1, c_1={o_1,o_2}, c_2={o_2,o_4}, c_3={o_3,o_4} and c_4={o_1,o_2} are the contacts occurring during T=[0,3], having validity intervals T_{c_1}=[0,0], T_{c_2}=[1,1], T_{c_3}=[1,2] and T_{c_4}=[2,3]. Notice that we differentiate c_1 and c_4 although they have the same contacting objects, because by definition a validity interval is required to be continuous.
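As a rough illustration of this definition (ignoring the constraints F), the following Python sketch derives contacts and their continuous validity intervals from sampled trajectories. The function name build_contacts, the dictionary layout of the trajectories, and the use of integer time instants are assumptions made for this example only; they are not part of the original text.

```python
from itertools import combinations
from math import dist


def build_contacts(trajectories, d_T):
    """Hypothetical helper: trajectories is {object_id: {t: (x, y)}} with
    integer time instants. Returns (o_i, o_j, t_start, t_end) tuples, one
    per continuous validity interval."""
    contacts = []
    for a, b in combinations(sorted(trajectories), 2):
        common = sorted(set(trajectories[a]) & set(trajectories[b]))
        start = prev = None
        for t in common:
            close = dist(trajectories[a][t], trajectories[b][t]) < d_T
            if close and start is not None and t == prev + 1:
                prev = t  # extend the current validity interval
                continue
            if start is not None:
                # the interval was interrupted, so it is emitted as its own contact
                contacts.append((a, b, start, prev))
                start = None
            if close:
                start = prev = t
        if start is not None:
            contacts.append((a, b, start, prev))
    return contacts
```

Two objects that meet, separate, and meet again therefore yield two distinct contacts, mirroring the distinction between c_1 and c_4 above.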
3.2 Reachability Query
Consider a contact network C which is constructed based on the history of movement of objects O in an environment E during a time interval T. Assume a set of application-specific constraints F={f_1, f_2, ..., f_n} is required to be satisfied by the application. Given a pair of objects (o_i,o_j), o_i,o_j ∈ O, a time interval T_p ⊆ T, and a set of constraint parameter values A={α_1, α_2, ..., α_n}, the reachability query q verifies whether there exists a contact path p_ij from o_i to o_j during time interval T_p where
f_i(p_ij, α_i) ≡ T, ∀ f_i ∈ F. (3.1)
Here T on the right-hand side denotes the Boolean value true, i.e., every constraint must hold on the path.
Intuitively, a contact path between two objects o_i and o_j consists of a sequence of contacts in the contact network C through which any virtual item i can travel the network to go from o_i to o_j. We define a contact path from object o_i to object o_j as a series of contacts (c_1, c_2, ..., c_n) in C, where T_{c_i} overlaps T_p (1 ≤ i ≤ n), and for each pair of contacts c_i and c_{i+1} (1 ≤ i ≤ n − 1) we have 1) the contacts share an object, i.e., if c_1={o_1,o_2} and c_2={o_3,o_4} then o_2=o_3, and 2) T_{c_i} starts before T_{c_{i+1}} in time.
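The conditions of this definition can be checked mechanically. The sketch below is one possible reading, assuming each contact is given as a pair of objects plus a (start, end) validity interval; the name is_contact_path and the explicit tracking of the object currently holding the item are illustrative assumptions rather than the dissertation's formulation.

```python
def is_contact_path(path, source, destination, T_p):
    """Hypothetical checker. path is a list of ((a, b), (start, end))
    contacts in traversal order; T_p is the (start, end) query interval."""
    if not path:
        return False
    holder = source          # object currently carrying the virtual item
    prev_start = None
    for (a, b), (start, end) in path:
        if end < T_p[0] or start > T_p[1]:
            return False     # validity interval must overlap T_p
        if holder not in (a, b):
            return False     # consecutive contacts must share an object
        if prev_start is not None and not prev_start < start:
            return False     # each contact must start after the previous one
        holder = b if holder == a else a
        prev_start = start
    return holder == destination
```

For the contacts of Figure 1.1, the sequence (c_1, c_2) passes this check as a contact path from o_1 to o_4 during [0,3], while the reversed sequence does not.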
We further clarify the condition in Equation 3.1 by giving an example. Consider the set of constraints F={f_1,f_2} and constraint parameter values α_1 and α_2 as in Section 3.1. To evaluate whether f_1(p_ij, α_1) ≡ T, one needs to check if an item can be initiated from o_i at the beginning of T and reach o_j by traversing the contacts in p_ij while staying for at least 7 days on each object o, where o is an object in a contact c = {o,o′} ∈ p_ij which receives the item from o_i and transmits it to o′.
We call o_i, o_j, T_p, F, A the query source, query destination, query interval, query constraints and query constraint values, respectively, and denote such a query by q : o_i ⇝^{T_p;F;A} o_j.
4 Reachability Query Processing in Contact Networks with no Constraints
In this chapter, we study the problem of reachability query processing in contact networks with no constraints. The solution to this problem gives us the insights and necessary building blocks to later study reachability query processing in contact networks with constraints.
We propose two novel index structures, ReachGrid and ReachGraph, which can efficiently process a reachability query in contact networks with no constraints. With ReachGrid, the contact network is expanded at query time from the query source toward the query destination. To avoid traversing unnecessary portions of the contact network, we leverage the fact that the contacts which are relevant for query processing occur in the same spatiotemporal locality. Therefore, we first index the spatiotemporal localities of the contact network and, at query time, propose a bidirectional traversal technique to enable efficient query processing. We further reduce the CPU time by precalculating the contacts as well.
With ReachGraph, we use the technique of precalculating reachability between pairs of objects for different time intervals. Subsequently, we carefully select a subset of query source, destination and interval combinations and precalculate reachability for those selected combinations. The input query is broken into a set of precalculated queries at query time to enable efficient query processing.
During query processing, some parts of the index structure which are relevant to the query should be retrieved from disk. The efficiency of query processing relies extensively on the efficiency of the index placement on disk. We propose efficient placement techniques for both index structures based on two principles. First, during query processing the contacts that occurred at time instance t_1 should be processed before those of t_2 if t_1 < t_2. Second, objects reachable to each other should be placed on disk close to each other, as they are retrieved together with high probability during query processing. We present our proposed disk placement approaches for ReachGrid and ReachGraph in Sections 4.2 and 4.3, respectively.
The rest of this chapter is organized as follows. We formally define the problem of reachability query processing in contact networks with no constraints in Section 4.1. We propose our ReachGrid and ReachGraph index structures in Sections 4.2 and 4.3, respectively. Finally, we present our experimental evaluation in Section 4.4.
4.1 Problem Definition
Consider a contact network C which is constructed based on the history of movement of objects O in an environment E during a time interval T. Given a pair of objects (o_i,o_j), o_i,o_j ∈ O, and a time interval T_p ⊆ T, a reachability query on a contact network with no constraints is defined in the same way as in Chapter 3, except that F and A are both empty sets. In this case, we represent a reachability query only by the query source, destination and interval, and accordingly denote it by q : o_i ⇝^{T_p} o_j.
4.2 ReachGrid: A Spatiotemporal Reachability Index
To evaluate q : o_i ⇝^{T_p} o_j, one approach is to first materialize the contact network C′, which captures all contacts that have occurred during T_p. It is obvious that other contacts are irrelevant to processing q. One can construct C′ as follows. Suppose the trajectory of an object o_i ∈ O during T is represented by r_i={(\vec{v}_1,t_1), ..., (\vec{v}_n,t_n)}, which is a sequence of position-vector and time-stamp pairs (\vec{v}_j,t_j), where \vec{v}_j is the position vector of o_i at time t_j ∈ T. Accordingly, a segment r_i(w) of a trajectory r_i during a time window w is defined as the subset of (\vec{v}_j,t_j) pairs from r_i whose time stamps belong to w, i.e., r_i(w)={(\vec{v}_j,t_j) | t_j ∈ w}. Assume that the set of trajectory segments from all moving objects o ∈ O during T_p is denoted by R(T_p), i.e., R(T_p)={r_i(T_p)}. A window trajectory join between two sets of trajectories P and Q, denoted by P ⋈_{d_T} Q, returns tuples (p,q,w) where p ∈ P and q ∈ Q are within the distance d_T during w. C′ can be constructed by performing a spatiotemporal self-join on R(T_p), i.e., R(T_p) ⋈_{d_T} R(T_p), and subsequently creating a contact between objects o_i and o_j at time t if the join result includes (o_i,o_j,w), where t ∈ w. Once generated, C′ can be traversed to identify any existing contact path between o_i and o_j.
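The materialize-then-traverse approach described above can be sketched as follows. This is a simplified illustration, not the dissertation's implementation: it assumes the join result is already available as (o_i, o_j, start, end) tuples over integer time instants, that an item can pass through several contacts within one instant (transfer delay is treated as negligible), and the name naive_reachable is hypothetical.

```python
def naive_reachable(contacts, source, destination, T_p):
    """Hypothetical baseline. contacts: iterable of (a, b, start, end)
    tuples with integer instants; T_p: (t1, t2)."""
    t1, t2 = T_p
    # Materialize C': keep only contacts whose validity interval overlaps T_p.
    relevant = [c for c in contacts if c[2] <= t2 and c[3] >= t1]
    reached = {source}
    for t in range(t1, t2 + 1):
        active = [(a, b) for a, b, s, e in relevant if s <= t <= e]
        changed = True
        while changed:  # propagate to a fixpoint within the instant
            changed = False
            for a, b in active:
                if (a in reached) != (b in reached):
                    reached |= {a, b}
                    changed = True
        if destination in reached:
            return True
    return destination in reached
```

Every contact overlapping T_p is examined here, including contacts that can never involve an object reachable from the source.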
Although the aforementioned approach correctly answers reachability queries, it can be very inefficient due to redundant processing. In particular, one may not need to consider all the contacts in C′ to process a query q, for two reasons. First, all contacts between objects which are not reachable from the query source o_i during the query interval T_p are irrelevant to q. For example, for Figure 1.1 and q : o_1 ⇝^{[2,3]} o_2, it is unnecessary to process the contact between o_3 and o_4, as neither can possibly be reachable from o_1 during [2,3]. Second, we observe that o_j may be reachable from o_i during T′_p ⊂ T_p where |T′_p| ≪ |T_p|. In this case, the contacts whose validity intervals do not overlap T′_p are irrelevant to q and redundant for query processing. For example, for Figure 1.1 and q : o_1 ⇝^{[0,3]} o_4, there is no need to process the contacts occurring during [2,3], as o_4 is reachable from o_1 during [0,1].
Inspired by the aforementioned observations, we introduce an efficient query processing approach that, given a reachability query q, tries to construct only the portion of C′ which is necessary for processing q. To this end, first during an offline phase we construct a spatiotemporal index structure, dubbed ReachGrid. ReachGrid enables pruning most of the contacts irrelevant to the query q. During the online processing phase, we incrementally find the objects reachable from the query source, in the order in which they become reachable when sweeping over the query interval. We stop the process either when the query destination is discovered to be reachable from the query source, or when all the contacts occurring during the query interval between objects reachable from the query source have been processed.
4.2.1 Index Construction
ReachGrid leverages the locality of objects over space and time to avoid traversing contacts irrelevant to a reachability query. It leverages temporal locality to stop query processing as soon as a contact path between query source and destination is discovered when traversing the contacts ordered by their occurrence time. To this end, the object trajectory segments are grouped based on the time stamps of the position-vector and time-stamp pairs in the object trajectories. A contact between two objects occurs when they are in close proximity. Therefore, grouping the objects based on spatial locality tends to place objects which are in contact over time together in the same group. This enables traversing a subset of groups which includes only the objects reachable from the query source when processing the query. ReachGrid exploits temporal and spatial locality by imposing two grids on the object trajectories. The first grid partitions the time interval T (the time interval during which all the contacts in C occurred). The second grid spatially partitions the trajectory segments within each time interval in T.
We construct ReachGrid as follows. First, we partition the time interval T into a set of disjoint time intervals, i.e., T=(T_1, ..., T_n). Next, we spatially partition the trajectory segments during each T_i, i.e., the trajectory segments in R(T_i), based on locality. To this end, for each time interval T_i we impose a grid C_i on the environment E, which subsequently partitions the trajectory segments in R(T_i). In this way, a grid cell c in C_i includes the trajectory segments which span the area represented by c. Notice that a trajectory segment r_i(T_i) ∈ R(T_i) may span multiple cells of C_i. The resolutions of the temporal and spatial grids depend on the input contact network and query workload, and we select them empirically in Section 5.3.4.
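The two-level partitioning can be illustrated with a small in-memory sketch. It assumes uniform temporal buckets of length bucket_len and square spatial cells of side cell_size; these parameters, the function name build_reachgrid, and the nested-dictionary layout are assumptions for illustration, whereas the dissertation's index is disk-resident and its resolutions are chosen empirically.

```python
from collections import defaultdict


def build_reachgrid(trajectories, bucket_len, cell_size):
    """Hypothetical builder. trajectories: {object_id: [(x, y, t), ...]}.
    Returns {time_bucket: {(cx, cy): [(t, object_id, x, y), ...]}}."""
    index = defaultdict(lambda: defaultdict(list))
    for obj, samples in trajectories.items():
        for x, y, t in samples:
            bucket = int(t // bucket_len)                      # temporal grid: which T_i
            cell = (int(x // cell_size), int(y // cell_size))  # spatial grid cell in C_i
            index[bucket][cell].append((t, obj, x, y))
    for cells in index.values():
        for entries in cells.values():
            entries.sort()  # keep each cell ordered by time stamp
    return index
```

Because every sample is assigned to its own cell, a trajectory segment that crosses cell boundaries naturally contributes entries to several cells of the same temporal bucket.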
An example of a constructed index is shown in Figure 4.2, where T is partitioned into six time intervals. Furthermore, a 4×4 grid is imposed on the environment to spatially partition the trajectory segments during T_0 and T_1. T_0 and T_1 have three and two time instances, respectively. The grid cells for the first two time intervals, i.e., the grids C_0 and C_1, are shown, while the rest are omitted for clarity of illustration. Three different objects are in O and are represented by a circle, a square and a triangle over time.
As query processing progresses by exploring trajectory segments in spatial grid cells, we propose to place the trajectories in a cell c in C_i on consecutive blocks on disk to enable efficient retrieval of the necessary trajectory segments during query processing. Moreover, the position-vector and time-stamp pairs (\vec{v},t) of the trajectory segments in c are placed on disk ordered by their time stamps. This avoids processing all the trajectory segments within c once a contact path between query source and destination is discovered. The placement of the cells in different time grids on disk, i.e., cells in C_i versus cells in C_j where i < j, should also be decided. Based on the same goal of early query processing termination, we place the cells in C_i before the cells in C_j on disk.
Figure 4.1: An example of index constructed for ReachGrid
Figure 4.2: ReachGrid Index Example
Figure 4.3: ReachGrid Query Processing
4.2.2 Query Processing
With the ReachGrid index construction process, we address both aforementioned requirements to avoid unnecessary contacts. First, spatial clustering enables traversing the contact network directed toward the contacts necessary for query processing. The traversal direction is decided based on the coordinates of the grid cells in which objects reachable from the query source are located. Second, temporal clustering enables early termination of query processing if a contact path is found from query source to destination, i.e., we process the trajectory segments in R(T_{i−1}) before those of R(T_i). Query processing aims to incrementally find the objects reachable from the query source by sweeping the query interval. To this end, at the beginning of query processing, the query interval is broken into a sequence of time intervals by imposing the temporal grid constructed in the previous section, i.e., T_p=(T_j, ..., T_k). Afterward, the grid cell c in C_j which includes the query source position at the beginning of the query interval is located. This can be executed in a constant number of IOs, assuming that an external hash table maps each object to its trajectory over time. We call the set of objects found reachable from the query source during query processing the seed set. Initially, the seed set includes only the query source. To process the reachability query, the algorithm iterates over each T_i in T_p and discovers new seeds. To this end, at the beginning of T_i the grid cells which include the current seeds are located. Subsequently, objects which are reachable from at least one of the seeds during T_i are found and added to the seed set. Notice that as soon as a new object reachable from the query source is discovered, it is added to the seed set, and hence the process continues with the updated seed set. The order in which new seeds are discovered follows the time order in which they become reachable from any of the current seeds. In some cases, T_j may be an interval whose start point differs from the query interval start point. In these cases, we start processing T_j from the query interval start point. We stop the query processing when the query destination is added to the seed set or when the entire query interval is processed.
The main step in query processing is discovering new seeds during each T_i, j ≤ i ≤ k. Assume that the set of current seeds at the beginning of T_i is S_i. The goal is to discover S_{i+1}, i.e., the set of seeds at the beginning of T_{i+1}, which is the same as that at the end of T_i. Let the set of grid cells in which the seeds in S_i are located be denoted by C_{S_i}. We first discover all the other cells which may contain an object o in contact with a seed during T_i. We call such cells potential seed cells and denote them by N_i. The cells within N_i can be found efficiently by creating the minimum bounding regions (MBRs) of the trajectory segments of objects in S_i and then finding and filtering the cells within a distance of at most d_T from those MBRs. During query processing, whenever N_i is updated, the first object o′ in N_i is discovered which is not in S_i but becomes reachable from any of the seeds in S_i. Intuitively, we propagate a virtual item i from the objects in the seed set at the beginning of T_i and find the first object which receives i. This can be done by performing a spatiotemporal join which works by sweeping time during the join interval. Consequently, we add o′ to S_i and accordingly update N_i. Assume that o′ is found reachable from a seed during [t_1,t′] (T_i=[t_1,t_2]). We continue the process recursively with the updated sets but during [t′,t_2]. Notice that during T_i, the retrieved cells are buffered to prevent unnecessary future retrievals from disk and are discarded at the end of T_i.
An example is shown in Figure 4.3 for query processing during a T_i. The locations of objects o_1, o_2, o_3 and o_4 at time instances t_0, t_1 and t_2 (t_0 < t_1 < t_2) during T_i are highlighted. The trajectory of each object is shown by links connecting its positions at the aforementioned time stamps. Assume the query source and destination are o_1 and o_2, respectively, and the query interval is [t_0,t_2]. The shaded area around the trajectory segment of an object o denotes the MBR of the trajectory segment with the width of d_T. This MBR shows that o is in the seed set S_i, and any other object whose trajectory is within the MBR of the trajectory segment of o will make a contact with o and be added to S_i. At t_0, S_i contains o_1. At t_1, o_1 and o_3 make a contact and hence o_3 is added to S_i. During [t_1,t_2] both o_1 and o_3 are in S_i. Finally, at t_2 the cells c_1 and c_2, in which o_2 and o_4 are located, respectively, are added to N_i and subsequently o_2 is added to S_i. Therefore, during [t_0,t_2] the query destination is reachable from the query source. For illustration purposes, we only discussed how N_i changes at t_2 in this example.
The entire online processing step is summarized in Algorithm 1. The algorithm takes the query source, destination, interval and the index constructed during the offline process. First, the query interval is quantized into time intervals from T. Afterward, the initialization is performed in lines 2-5. The algorithm iterates over the T_i in T_p, and for each T_i it performs a join in line 9 to find the first object reachable from a seed during the interval w. R_{C_{S_i}}(w) denotes the set of object trajectory segments during w which span the cells in C_{S_i}. We adopt the join approach in [AJ06], which sweeps the time interval w and terminates whenever a new object, not in the seed set and reachable from the query source, is discovered. Consequently, the sets are updated in line 11. Finally, the algorithm terminates when o_j is added to the seed set or all the intervals in T_p are processed.
Assume each cell of C_j includes the trajectories of n_c distinct objects on average and each disk block contains b_c cells of C_j on average. Finally, assume T′_p=[t_1,t] ⊆ T_p=[t_1,t_2] is the smallest time interval during which the query destination is reachable from the source. If the query destination is not reachable from the query source during T_p, we assume T′_p=T_p. The following theorem states the complexity of ReachGrid query processing and index construction.
Theorem 4.2.1 ReachGrid can be constructed with O(|O||T|) IOs. The IO complexity of query processing is O(|O||T′_p| / (n_c × b_c)).
Proof The ReachGrid index can be constructed by scanning the entire object trajectories once and assigning trajectories to the proper spatiotemporal grid cells. With the query processing, the number of cells retrieved from disk is upper bounded by the number of cells in C_i where j ≤ i ≤ k.
The IO complexity of our query processing is much lower than that of the naïve approach, which retrieves all the object trajectory location and time-stamp pairs whose time stamps overlap with T_p. The naïve approach has an IO complexity of O(|O||T_p|). Our approach significantly outperforms the naïve approach as |T′_p| ≤ |T_p| and, moreover, most of the cells retrieved with ReachGrid are those with objects reachable from the query source, and hence many object trajectory pairs are not retrieved during query processing.
Algorithm 1 ReachGrid Query Processing
1: procedure QUERY PROCESSING(o_i, o_j, T_p, I)
2:     T_p = (T_j, T_{j+1}, ..., T_k)
3:     S_j = o_i                          ▷ Initializing the seed set
4:     C_{S_j} = FindCells(S_j, t)        ▷ Find the cells containing the seed
5:     C_{S_j} = Update(C_{S_j}, N_j)     ▷ Update C_{S_j} based on the cells in N_j
6:     for i = j to k do
7:         w = T_i = [t_1, t_2]
8:         repeat
9:             (o′, t′) = R_{C_{S_i}}(w) ⋈_{d_T} R_{C_{S_i}}(w)    ▷ o′ ∉ S_i and is reachable from a seed
10:            w = [t′, t_2]
11:            Update N_i, C_{S_i} and S_i
12:        until o′ = NULL or o′ = o_j    ▷ Termination condition
13:        if o_j ∈ S_i then
14:            Return 'reachable'
15:        end if
16:    end for
17:    Return 'not reachable'
18: end procedure
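The seed-expansion idea behind Algorithm 1 can be approximated by the following runnable sketch. It deliberately simplifies the original: positions are given per integer time instant, the spatial grid is rebuilt in memory at each instant with cell side d_T (so only the 3×3 neighbourhood of a seed's cell needs inspection) instead of being read from a disk-resident index, and the spatiotemporal join is replaced by a direct distance test. The name grid_reachable and the input layout are assumptions.

```python
from collections import defaultdict
from math import dist


def grid_reachable(positions, source, destination, T_p, d_T):
    """Hypothetical sketch. positions: {t: {object_id: (x, y)}} for integer
    instants t; T_p: (t1, t2); d_T: contact distance threshold."""
    seeds = {source}
    for t in range(T_p[0], T_p[1] + 1):
        snapshot = positions.get(t, {})
        cells = defaultdict(list)            # spatial grid with cell side d_T
        for obj, (x, y) in snapshot.items():
            cells[(int(x // d_T), int(y // d_T))].append(obj)
        frontier = [s for s in seeds if s in snapshot]
        while frontier:                      # keep expanding with newly found seeds
            s = frontier.pop()
            sx, sy = snapshot[s]
            cx, cy = int(sx // d_T), int(sy // d_T)
            for nx in (cx - 1, cx, cx + 1):  # only cells that can hold a contact
                for ny in (cy - 1, cy, cy + 1):
                    for o in cells.get((nx, ny), ()):
                        if o not in seeds and dist(snapshot[o], (sx, sy)) < d_T:
                            seeds.add(o)
                            frontier.append(o)
        if destination in seeds:
            return True                      # early termination
    return False
```

Objects far from every seed are never compared against anything, which is the pruning effect the index is designed to provide.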
4.2.3 ReachGrid Optimization
In this section, we propose two optimization techniques to further improve the efficiency of the ReachGrid query processing technique introduced in Section 4.2.2. To this end, we first present a bidirectional traversal approach which traverses ReachGrid in parallel in two directions, one starting from the query source and the other from the query destination. Next, we explain our technique which precalculates contacts during ReachGrid construction to avoid the complexity of spatiotemporal joins at query time.
Bidirectional Traversal
We first present the transitivity property, which is the basis of the bidirectional traversal of ReachGrid.
Property 4.2.2 [Transitivity] Suppose o_j is reachable from o_i during T_p=[t_1,t_2] and o_k is reachable from o_j during T′_p=[t′_1,t′_2]. If t_2 ≤ t′_2 then o_k is reachable from o_i during T″_p=[t_1,t′_2].
Subsequently, with the bidirectional traversal, two instances of ReachGrid traversal are performed in parallel. First, the query interval T=[t_1,t_2] is split into two subintervals T_1=[t_1,(t_1+t_2)/2] and T_2=[(t_1+t_2)/2,t_2]. Afterward, two instances of ReachGrid traversal are initiated. The first instance traverses the ReachGrid for all the contacts during T_1 and finds all the objects O_1 reachable from the query source during T_1. The second instance runs in parallel and finds all the objects O_2 from which the query destination is reachable during T_2. The query processing is terminated and reachability is verified as true if at least one object is visited during both traversals, i.e., O_1 ∩ O_2 ≠ ∅. Otherwise, i.e., when O_1 ∩ O_2 = ∅, the query destination is not reachable from the source during T. Note that the index structure is the same as that of ReachGrid, but to avoid confusion we term it the BiReachGrid index. The pseudocode for bidirectional ReachGrid traversal is presented in Algorithm 2. The inputs to the query processing module are the same as those of ReachGrid. Two pointers t_F and t_B are instantiated to enable forward and backward traversals, respectively. Two sets O_1 and O_2 are initialized, which include the objects reachable from the query source and the objects from which the query destination is reachable, respectively. Afterward, Reachable(o_i,t_1,t_F) and Reachable_R(o_j,t_B,t_2) find all the objects reachable from o_i during [t_1,t_F] and reachable to o_j during [t_B,t_2], respectively. Accordingly, the sets O_1 and O_2 are updated. The procedures Reachable and Reachable_R are implemented based on the ReachGrid algorithm proposed in Algorithm 1. The query processing is terminated either when O_1 ∩ O_2 has at least one element or when all the necessary contacts in the query interval are processed.
Given a reachability query q : o_i ⇝^{T_p} o_j, the following theorem proves that bidirectional traversal accurately verifies q.
Algorithm 2 Bidirectional Query Processing
1: procedure QUERY PROCESSING(o_i, o_j, T_p, I)
2:     t_F = t_1
3:     t_B = t_2
4:     O_1 = o_i
5:     O_2 = o_j
6:     while t_F ≤ t_B do
7:         O_1 = O_1 ∪ Reachable(o_i, t_1, t_F)
8:         O_2 = O_2 ∪ Reachable_R(o_j, t_B, t_2)
9:         t_F++
10:        t_B−−
11:        if O_1 ∩ O_2 ≠ ∅ then
12:            Return 'reachable'
13:        end if
14:    end while
15:    Return 'not reachable'
16: end procedure
Theorem 4.2.3 BiReachGrid accurately verifies the query q.
Proof We prove the accuracy of Algorithm 2 for two cases:
• Case a: the query destination is reachable from the source during T_p. In this case, there is a contact path (c_1, ..., c_n) where c_1 and c_n have o_i and o_j as contacting objects, respectively. If n ≥ 2, then there is an object o_k which is either reachable from o_i during T_1=[t_1,(t_1+t_2)/2] or reachable to o_j during T_2=[(t_1+t_2)/2,t_2], and therefore the algorithm accurately verifies q. With n=1 the proof is similar.
• Case b: the query destination is not reachable from the source during T_p. It is easy to prove by contradiction that the algorithm verifies q correctly in this case as well.
This completes the proof.
The worst case complexity of Algorithm 2 is the same as that of ReachGrid in the previous section (Theorem 4.2.1). However, on average Algorithm 2 significantly outperforms Algorithm 1, as we experimentally show in Section 5.3.4. Note that the reverse traversal of the index structure does not incur any additional storage cost.
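The meet-in-the-middle idea can be sketched as below. This is a single-pass simplification rather than the incremental pointer scheme of Algorithm 2: the forward sweep covers the first half of the query interval, the backward sweep processes the second half in reverse time order, and reachability is reported when the two visited sets intersect. Contacts are assumed to be (o_i, o_j, start, end) tuples over integer instants with negligible transfer delay; the names sweep and bidirectional_reachable are hypothetical.

```python
def sweep(contacts, start, instants):
    """Objects connected to `start` when the given instants are processed in order."""
    reached = {start}
    for t in instants:
        active = [(a, b) for a, b, s, e in contacts if s <= t <= e]
        changed = True
        while changed:  # propagate to a fixpoint within the instant
            changed = False
            for a, b in active:
                if (a in reached) != (b in reached):
                    reached |= {a, b}
                    changed = True
    return reached


def bidirectional_reachable(contacts, source, destination, T_p):
    t1, t2 = T_p
    mid = (t1 + t2) // 2
    forward = sweep(contacts, source, range(t1, mid + 1))            # O_1
    backward = sweep(contacts, destination, range(t2, mid - 1, -1))  # O_2, reversed time
    return bool(forward & backward)
```

The backward sweep relies on contacts being symmetric: an object can reach the destination during [mid, t_2] exactly when the destination "reaches" it with time reversed.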
Precalculating the Contacts
With ReachGrid, at query time a spatiotemporal join is performed to extract the subset of contacts which occurred during the query interval. This process takes significant CPU time for large query intervals. Therefore, here we propose to calculate and store the contact information between objects on disk. To this end, at each time instance t, if there is a contact between o_i and o_j, the pairs (o_i,o_j) and (o_j,o_i) are introduced and each is placed in the cell containing the location of its first component at time t. The pairs are placed in the ReachGrid index similarly to the placement of object trajectories in Section 4.2.1. Within each cell, the pairs are placed on disk ordered by the time at which the corresponding contact occurred.
In many cases, the same contact occurs first at t and is then repeated over t′ consecutive time instances. In such cases, we can merge all such contacts and represent them by a triple (o_i,o_j,t′), which shows that o_i and o_j were in a contact with the validity interval [t,t+t′]. In this way, we can significantly decrease the amount of information stored on disk. Accordingly, at query time the contacts which are necessary for query processing can be reconstructed.
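The merging step amounts to a run-length encoding of per-instant contact records. The sketch below is illustrative only: it stores the start instant t explicitly alongside the run length t′ (the text leaves the start implicit in the time-ordered placement), assumes integer instants, and the name merge_contacts is hypothetical.

```python
def merge_contacts(records):
    """Hypothetical helper. records: iterable of (o_i, o_j, t) contact
    records. Returns (o_i, o_j, t, t_prime) runs covering [t, t + t_prime]."""
    runs = []
    last = {}  # pair -> position in runs of that pair's most recent run
    for a, b, t in sorted(records, key=lambda r: r[2]):
        i = last.get((a, b))
        if i is not None and runs[i][2] + runs[i][3] + 1 == t:
            # the contact continues at the next instant: extend the run
            runs[i] = (a, b, runs[i][2], runs[i][3] + 1)
        else:
            last[(a, b)] = len(runs)
            runs.append((a, b, t, 0))
    return runs
```

Expanding each run back into its individual instants recovers the original per-instant records, so the encoding loses no information.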
4.3 ReachGraph: A Connectivity-based Reachability Index
Here, we first present the ReachGraph index construction steps and thereafter discuss ReachGraph query processing.
4.3.1 Index Construction
To construct the ReachGraph for a given contact network C, we start from C and apply a series of transformations that eventually convert it to the ReachGraph hypergraph H_N. The transformations are performed in two phases, namely the reduction phase and the augmentation phase. First, we observe that in a contact network C one can identify disjoint subsets of nodes, where all nodes in a subset are equivalently reachable or not reachable to/from any other node v in C. Accordingly, in the reduction phase we precompute these subsets and reduce all nodes in each subset (along with their connections) to a single hypernode. We call the resulting hypergraph D_N, which is significantly smaller than C. Next, in the augmentation phase, to further improve ReachGraph we precompute the reachability between pairs of nodes in D_N at predefined time intervals. We perform this precomputation at several time resolutions and accordingly augment D_N with a hierarchy of extra links to generate the ReachGraph hypergraph H_N. With H_N, a reachability query can be effectively broken into a set of precomputed reachability queries for real-time query answering.
There are two principles in the disk placement of H_N vertices which can improve query processing. First, an efficient placement should place vertices which are reachable to each other on the same disk block. In this way, while retrieving a vertex during query processing, a set of vertices which should be retrieved in the future are read and buffered as well. Second, there is an inherent order in which the vertices of H_N are traversed during query processing, which should be leveraged when storing H_N on disk. This ordering is enforced by the time order in which the contacts in the vertices of H_N occurred. We explain how to consider these two principles in storing H_N on disk to enable efficient query processing.
In the rest of this section, we first present our model for C as a so-called time expanded network. Next, we explain the aforementioned transformations C → D_N (reduction) and D_N → H_N (augmentation) in detail. Finally, we discuss how to store H_N on disk.
Contact Network Model
We represent a contact network C with the Time Expanded Network (TEN) model [SX08]. TEN captures the time dependency of a network by including a separate instance of the network at each time instance. Accordingly, each object o_i at time instance t ∈ T is associated with a separate vertex o_i(t). To capture contacts, a bidirectional edge e=(o_i(t),o_j(t)) is introduced between o_i(t) and o_j(t) if they are in contact at time t. Such an edge captures the fact that an item can transfer from o_i to o_j at t. Note that we assume the transfer delay is negligible and hence e is bidirectional. Moreover, an edge is introduced between vertices corresponding to the same object at consecutive time instances, i.e., an edge e′=(o_i(t),o_i(t+1)) is created between o_i(t) and o_i(t+1) at each time t. In this case, e′ is a directed edge which shows that o_i can hold an item during [t,t+1]. We define a graph G_t of all vertices and edges at time t, i.e., G_t=(V,E) where V={o_i(t) | o_i ∈ O}, as a snapshot of C at t.
Figure 4.4(a) shows an example C which corresponds to the contact network in Figure 1.1. With G_0 in Figure 4.4(a), V={o_1(0),o_2(0),o_3(0),o_4(0)} and E={(o_1(0),o_2(0))}. It is easy to observe that o_j is reachable from o_i during T_p=[t_1,t_2] if and only if there is a path from o_i(t_1) to o_j(t_2). This path represents the contact path from o_i to o_j during T_p. For example, in Figure 4.4(a), o_4 is reachable from o_1 during T_p=[0,1] given the path (o_1(0),o_2(0),o_2(1),o_4(1)).
Figure 4.4: TEN model of C (a) and the corresponding DAG D_N (b)
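The path condition on the TEN can be checked with an ordinary breadth-first search over (object, time) vertices, generated on the fly. This is a minimal sketch under the assumption that contacts are given per integer time instant as {t: [(o_i, o_j), ...]}; the function name ten_reachable and this input layout are illustrative.

```python
from collections import deque


def ten_reachable(contacts_at, source, destination, T_p):
    """Hypothetical sketch. contacts_at: {t: [(o_i, o_j), ...]};
    TEN vertices are (object, t) pairs; T_p: (t1, t2)."""
    t1, t2 = T_p

    def neighbours(v):
        o, t = v
        for a, b in contacts_at.get(t, ()):  # bidirectional contact edges at t
            if a == o:
                yield (b, t)
            elif b == o:
                yield (a, t)
        if t < t2:
            yield (o, t + 1)                 # directed hold edge o(t) -> o(t+1)

    start, goal = (source, t1), (destination, t2)
    seen, queue = {start}, deque([start])
    while queue:
        v = queue.popleft()
        if v == goal:
            return True
        for n in neighbours(v):
            if n not in seen:
                seen.add(n)
                queue.append(n)
    return False
```

With the contacts of Figure 1.1, the search from o_1(0) finds o_4(1) through o_2(0) and o_2(1), matching the path given in the example above.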
Transforming the Contact Network
Transforming by Reduction In the reduction phase, we perform two distinct steps to convert C into a hypergraph D_N of significantly smaller size. Reducing C makes it more efficient to traverse for finding possible contact paths during query processing. Notice that these reduction steps are lossless and preserve the accuracy of query processing. We first state two properties which are utilized for reduction.
Property 4.3.1 [Snapshot Symmetry] If o_j is reachable from o_i during a time instance t, i.e., query interval T_p=[t,t], then o_i is reachable from o_j during the same interval.
At the first step of the reduction phase, the idea is to precompute and materialize the reachability between objects at each time instance $t$. According to properties 4.3.1 and 5.3.1, the connected components of each snapshot $G_t$ of $C$ capture the sets of objects that are reachable from each other at $t$. For instance, in Figure 4.4 (b), $c_4 = \{o_2(1), o_3(1), o_4(1)\}$, which captures the fact that all objects $o_2$, $o_3$ and $o_4$ are reachable from each other at time instance $t = 1$. Furthermore, if one object from a connected component $c \in G_t$ is reachable from another object in a connected component $c' \in G_{t'}$ during $T_p = [t, t']$, then it is easy to deduce from properties 4.3.1 and 5.3.1 that all objects in $c$ are reachable from all other objects in $c'$ during $T_p$. Accordingly, at the first step of the reduction phase, we transform $C$ to a graph $D_N$ whose vertices are the connected components of $C$. To this end, first in every $G_t \in C$ we replace all the vertices within the same connected component $c$ by a single vertex represented by $c$. Suppose the collection of the connected components of $G_t$ is denoted by $C_t$. Next, we create an edge from every $c \in C_t$ to every other $c' \in C_{t+1}$, if in $C$ we find at least one edge from a vertex in $c$ to a vertex in $c'$. This transforms $C$ into a directed acyclic graph (DAG) $D_N$ with a significantly smaller number of vertices and edges as compared to $C$ while preserving reachability between objects. With $D_N$, $o_j$ is reachable from $o_i$ during $T_p = [t_1, t_2]$ if the connected component of $o_j(t_2)$ is reachable from the connected component of $o_i(t_1)$. Therefore, to answer a reachability query, we need to find the corresponding connected components of $o_i(t_1)$ and $o_j(t_2)$ given $o_i(t_1)$ and $o_j(t_2)$ at the query time. As we explain later, we generate and use an external hash table $H_t$ for each time instance $t \in T$ to locate the connected component corresponding to each vertex $o_i(t)$.
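The first reduction step can be illustrated with a minimal Python sketch. It is not the dissertation's implementation: the function names, the in-memory dictionaries standing in for the external hash tables $H_t$, and the representation of a vertex as a pair (time, set of member objects) are all assumptions made for illustration.

```python
from collections import defaultdict


def snapshot_components(objects, contacts):
    """Map every object to a representative of its connected component
    in one snapshot G_t, using a small union-find."""
    parent = {o: o for o in objects}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in contacts:
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra
    return {o: find(o) for o in objects}


def build_dag(objects, contacts_by_time):
    """contacts_by_time[t] lists the contact pairs at time instance t.

    Returns (hash_tables, edges) where hash_tables[t][o] is the frozenset of
    objects in o's component at t (a stand-in for H_t, assumed in memory) and
    edges holds pairs ((t, c), (t + 1, c2)) between consecutive snapshots.
    """
    hash_tables = []
    for contacts in contacts_by_time:
        roots = snapshot_components(objects, contacts)
        groups = defaultdict(set)
        for o, root in roots.items():
            groups[root].add(o)
        table = {}
        for group in groups.values():
            component = frozenset(group)
            for o in group:
                table[o] = component
        hash_tables.append(table)

    edges = set()
    for t in range(len(hash_tables) - 1):
        for o in objects:
            # The TEN edge (o(t), o(t+1)) induces an edge between components.
            edges.add(((t, hash_tables[t][o]), (t + 1, hash_tables[t + 1][o])))
    return hash_tables, edges
```

For a toy input with four objects, one contact at $t = 0$ and two contacts at $t = 1$, `build_dag([1, 2, 3, 4], [[(1, 2)], [(2, 3), (3, 4)]])` yields three components at $t = 0$, two at $t = 1$, and four edges between them.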
The second step of the reduction phase merges identical connected components in consecutive $G_t$s over time. If a set of objects $O' \subseteq O$ are reachable from each other (and only from each other) during a time interval $T' \subseteq T$, in $D_N$ they all belong to snapshots of the same connected component during $T'$. Therefore, to further reduce the size of $D_N$ we can keep one copy of such a connected component during $T'$ and consider it as the connected component of the objects in $O'$ during the entire $T'$. For example, in Figure 4.4 (b) $c_5$ and $c_7$ are snapshots of the same connected component during $T' = [3, 4]$ and can be merged. To generalize, assume a set of connected components $c_t \in C_t$, $c_{t+1} \in C_{t+1}$, ..., $c_{t+n} \in C_{t+n}$ all have the same members $O'$, and $T' = [t, t+n]$. In such a case, we remove $c_t, \ldots, c_{t+n-1}$ and connect the parent of $c_t$ in $D_N$ (say a connected component in $G_{t-1}$ denoted by $d$) to $c_{t+n}$ by a weighted edge $e(n)$. We call $e(n)$ an aggregated edge, where the weight captures the fact that for the next $n$ time instances, $d$ can only reach the objects in $O'$. Figure 4.5 shows $D_N$ from Figure 4.4 (b) after this step of reduction. $c_5$ is removed, and $c_4$ and $c_3$ are connected to $c_7$ by aggregated edges $e(2)$ and $e'(2)$. This significantly shrinks $D_N$, especially when the sampling rate of the objects' positions is high relative to the objects' moving speed.
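As a rough illustration of the second reduction step, the following Python sketch finds the maximal runs of consecutive time instances over which a connected component keeps exactly the same members. It only identifies what can be merged; rewiring parents with aggregated edges is left out. The function name and the input layout (one set of frozensets per time instance) are assumptions, not part of the original design.

```python
def merge_runs(components_by_time):
    """components_by_time[t] is the set of components C_t (frozensets of objects).

    Returns a sorted list of (members, t_start, t_end), one entry per maximal
    run of consecutive snapshots in which the same component appears.
    """
    runs = []
    open_runs = {}  # members -> start time of the run currently in progress
    for t, components in enumerate(components_by_time):
        for members in list(open_runs):
            if members not in components:  # the run ended at t - 1
                runs.append((members, open_runs.pop(members), t - 1))
        for members in components:
            open_runs.setdefault(members, t)
    last = len(components_by_time) - 1
    runs.extend((members, start, last) for members, start in open_runs.items())
    return sorted(runs, key=lambda run: (run[1], sorted(run[0])))
```

Every run longer than one time instance is a candidate for keeping a single copy of the component, as described above.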
Transforming by Augmentation In order to find a path between two connected components $c_i \in C_t$ and $c_j \in C_{t'}$, we can simply expand $D_N$ starting from $c_i$ and check if we can find a path that reaches $c_j$. Although $D_N$ is much smaller than $C$, such expansion can still take a long time to terminate. Hence, we propose to precompute reachability between certain vertices of $D_N$ to enable quick traversal of $D_N$.

In particular, we propose to precompute reachability during different predefined time intervals. To this end, we break $T$ into a set of disjoint intervals $I_1, I_2, \ldots, I_n$ with equal length $L$, and precompute reachability between vertices in $C_{t_a}$ and $C_{t_b}$ for each $I_i = [t_a, t_b]$. Accordingly, $D_N$ is augmented with a new directed edge from every connected component $c \in C_{t_a}$ to every other connected component $c' \in C_{t_b}$ if there is a path of length $L$ from $c$ to $c'$. We call such edges the long edges and weight them by $L$, which indicates the number of time instances that they encompass. The resulting augmented hypergraph $H_N$ can be considered as the union of $D_N$ with a new graph consisting of long edges each with a weight $L$. We term the latter graph the contact network at the $L$-th resolution and denote it by $D_{N_L}$. Accordingly, $D_N$ can be considered as the contact network at the first resolution, or $D_{N_1}$. One can extend this idea and precompute reachability at other time intervals to generate a multi-resolution graph $H_N = D_N \cup D_{N_{L_1}} \cup \ldots \cup D_{N_{L_n}}$. However, this can significantly increase the number of edges if overdone and hence adversely reduce the efficiency of query expansion. In Section 4.4.2, we experimentally select the optimal resolutions for $H_N$. Figure 4.6 depicts $D_{N_3}$ for the $D_N$ shown in Figure 4.5.

Figure 4.5: $D_N$ at the end of the reduction step

Figure 4.6: $D_{N_3}$ for $H_N$ whose $D_{N_1}$ is the graph in Figure 4.5
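The augmentation step can be sketched in Python as follows. This is a simplified illustration under stated assumptions: every first-resolution edge spans exactly one time instance (aggregated edges are ignored), the intervals are taken as $[0, L], [L, 2L], \ldots$ so that long edges can be chained, and the function name and vertex encoding (time, label) are hypothetical.

```python
from collections import defaultdict


def long_edges(edges, t_max, L):
    """edges: iterable of ((t, c), (t + 1, c2)) first-resolution edges.

    Returns the set of long edges (src, dst, L) where src lives at the start
    t_a of an interval, dst at t_b = t_a + L, and dst is reachable from src.
    """
    succ = defaultdict(set)
    by_time = defaultdict(set)  # time instance -> vertices at that time
    for u, v in edges:
        succ[u].add(v)
        by_time[u[0]].add(u)
        by_time[v[0]].add(v)

    result = set()
    for t_a in range(0, t_max - L + 1, L):
        for src in by_time[t_a]:
            frontier = {src}
            for _ in range(L):  # advance exactly one time instance per step
                frontier = {w for u in frontier for w in succ[u]}
            for dst in frontier:
                result.add((src, dst, L))
    return result
```

Calling the function once per chosen resolution and taking the union of the results with the original edges gives a multi-resolution edge set in the spirit of $H_N$.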
Disk Placement

We distinguish two cases in the traversal of $H_N$, which is a disk-resident hypergraph. In the first case, internal memory can hold $c \times |V(H_N)|$ values, where $V(H_N)$ is the set of vertices in $H_N$ and $c$ is a small constant ($c \approx 12.375$) [SAM02]. In this case, it is possible to construct the DFS tree of the graph, maintain it in internal memory and traverse it during query processing to verify reachability. In the second case, the aforementioned assumption on the number of vertices does not hold. For this case, we adopt the idea of external BFS presented in [MM02] to enable efficient retrieval of vertices during query processing. In [MM02], the authors partition the input graph into small subgraphs and store the adjacency lists of the vertices in a subgraph contiguously on the disk. The input graph is partitioned by choosing master vertices independently and uniformly with probability $\mu$ and running a BFS from these master vertices "in parallel". The expected number of master vertices is $O(\mu |V|)$, where $V$ is the set of graph vertices, and the optimal value of $\mu$ is $\mu = \min\left\{1, \sqrt{\frac{|E| + |V|}{|V| B}}\right\}$, where $E$ and $B$ are the set of edges of the input graph and the block size, respectively. Creating a partition allows for retrieval of multiple vertices simultaneously from the disk, which otherwise needs multiple separate IOs. Similar to [MM02], we partition $H_N$ and place the vertices within the same partition on consecutive disk blocks. However, we adapt the technique for directed graphs as $H_N$ is a DAG. To this end, we first sort the vertices of $H_N$ in topological order, which is the same order in which $H_N$ is traversed during reachability query processing. Notice that finding such an order is trivial as the vertices of $H_N$ are created over $T$ in topological order (vertices in $C_i$ are generated before those of $C_{i+1}$). Afterward, from each vertex $v$ we find all the vertices $U$ with a shortest distance of at most $d_p$ from $v$, i.e., vertices at a depth of at most $d_p$ from $v$. The vertices in $U \cup \{v\}$ are reachable from $v$ and form a partition $p_v$. We term $v$ the root of $p_v$. We iterate over the vertices and create a partition rooted at a vertex $u$ if $u$ is not already assigned to a partition. Notice that only the edges in $D_N$ are considered in creating the partitions and hence long edges are ignored during the partitioning process to preserve the temporal locality of graph vertices within the same partition. The partitions are placed on disk in the same order they are generated.

Figure 4.7: ReachGraph for the contact network in Figure 4.5
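The depth-bounded partitioning described above can be sketched as follows. This is a minimal in-memory illustration, not the disk-based procedure itself; the function name is hypothetical, and it assumes that a vertex already claimed by an earlier partition stays there even if a later root reaches it within $d_p$ hops.

```python
from collections import deque


def partition_by_depth(topo_order, succ, d_p):
    """Group vertices into partitions rooted at unassigned vertices.

    topo_order: vertices in topological order
    succ:       dict vertex -> iterable of children (first-resolution edges only)
    d_p:        maximum depth explored from each root
    Returns the partitions as lists of vertices, in creation order.
    """
    assigned = {}
    partitions = []
    for root in topo_order:
        if root in assigned:
            continue
        pid = len(partitions)
        part = [root]
        assigned[root] = pid
        seen = {root}
        queue = deque([(root, 0)])
        while queue:
            u, depth = queue.popleft()
            if depth == d_p:
                continue  # do not expand beyond the partition depth
            for w in succ.get(u, ()):
                if w in seen:
                    continue
                seen.add(w)
                queue.append((w, depth + 1))
                if w not in assigned:
                    assigned[w] = pid
                    part.append(w)
        partitions.append(part)
    return partitions
```

Writing the returned partitions to disk in list order would mirror the placement rule that partitions are stored in the order they are generated.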
The final index for our running example is shown in Figure 4.7, where the long edges are denoted by $e_i(3)$, $i = 1, \ldots, 4$, and the aggregated edges by $e(2)$ and $e'(2)$. The hypergraph $H_N$ and the hash tables which associate objects with the partitions of $H_N$ are located on the disk. Each hash table $H_t$ locates the partition which contains $o_j(t)$ given object $o_j$ and the time instance $t$. In this example, five partitions $p_0, p_1, \ldots, p_4$ are generated whose connected component members are $\{c_0, c_3, c_4\}$, $\{c_1\}$, $\{c_2\}$, $\{c_6, c_8, c_9\}$ and $\{c_7\}$, respectively. The members of the connected components are placed within the vertices of $H_N$ as we discuss in the next section. Although not shown in the figure, we store the reverse graph of $D_{N_1}$ on disk as well, i.e., if $e = (u, v) \in D_{N_1}$ then we add the reversed edge $(v, u)$ to $H_N$. This enables efficient bidirectional traversal of $H_N$ as we discuss in the next section. Finally, a hash table is stored in main memory to enable fast lookup of $H_t$ for a given $t$ and consequently finding the partition of $H_N$ which includes the query source (destination) at the beginning (end) of the query interval on disk.
4.3.2 Query Processing

Consider a reachability query $q: o_i \overset{T_p}{\rightsquigarrow} o_j$ where $T_p = [t_1, t_2]$. To process $q$, one can first find the vertices $v_1$ and $v_2$ in $H_N$ which represent the connected components of $o_i$ and $o_j$ at $t_1$ and $t_2$, respectively. Afterward, starting from $v_1$, $H_N$ can be traversed either by BFS or DFS techniques to visit all the vertices at a depth of at most $|t_2 - t_1|$ from $v_1$. $o_j$ is reachable from $o_i$ during $T_p$ if and only if $v_2$ is among the visited vertices. Unfortunately, this approach may visit a huge number of vertices, especially when $t_1 \ll t_2$. Here, we propose two powerful ideas which significantly reduce the number of visited vertices during $H_N$ traversal. First, we leverage the multi-resolution index to traverse $H_N$. Consequently, whenever possible the long edges with the largest weights are taken during traversal (the traversal is performed on the higher resolutions first) to enable fast traversal of $H_N$. Second, motivated by transitivity property 5.3.1, we traverse $H_N$ from both directions to find a possible contact path between query source and destination faster. In particular, $H_N$ is traversed forward starting from the query source and, in parallel, it is traversed backward on the reverse of $D_N$ starting from the query destination. The traversal is terminated in two cases: either an object which is reachable from the query source and can reach the query destination is found, or $H_N$ is traversed in both directions until the bidirectional traversal stops at the middle of the query interval.
As a counterpart to traversal algorithms for memory-resident graphs, traversal algorithms for external graphs are studied in the literature as well [MM02, Vit08]. We denote external BFS and DFS by E-BFS and E-DFS, respectively. Although both E-DFS and E-BFS can be adopted to traverse $H_N$, we adopt E-BFS to enable bidirectional traversal of $H_N$. Accordingly, our ReachGraph query processing works by performing E-BFS in parallel from $v_1$ and $v_2$, where the search from $v_2$ traverses the reverse graph of $D_{N_1}$. Assume the set of objects in vertices visited during forward traversal, i.e., traversal originating from $v_1$, is denoted by $O_F$. Accordingly, we denote the set of objects in vertices visited during backward traversal by $O_B$. The traversal is terminated either when $O_B \cap O_F$ becomes non-empty or when all the vertices reachable from $o_i$ during $[t_1, \frac{t_1 + t_2}{2}]$ and able to reach $o_j$ during $[\frac{t_1 + t_2}{2}, t_2]$ are traversed. In the first case the query destination is reachable from the source, while this is not true for the latter case. A partition is retrieved and buffered during traversal to enable in-memory lookup of some of the future vertices. Older partitions in memory can be discarded when there is not enough space for new partitions. During forward traversal, if a vertex is connected to long edges, the edges with the largest weight are traversed and the other edges are ignored.

We term this approach Bidirectional Multi-resolution BFS or BM-BFS. The pseudocode of the BM-BFS technique is presented in Algorithm 3. The algorithm first finds the vertices $v_1, v_2 \in H_N$ in lines 2-3. The function FindVertex($p$, $o$, $t$) gets a partition $p$, an object $o$ and a time instance $t$ and returns the vertex of $H_N$ which contains $o(t)$. Afterward, two queues are initialized for the forward and backward traversal of the input graph in line 4. $O_F$ and $O_B$ are also initialized in line 5. We denote the set of objects whose instances are included in $v$ by $O_v$. The algorithm runs the forward (line 7) and backward (line 8) traversals in parallel by running the ProcessQueue procedure until both $Q_F$ and $Q_B$ become empty or reachability is verified. With the ProcessQueue procedure, the vertex $v_h$ at the head of either queue is extracted in line 2. Each object in $O_{v_h}$ is examined to check whether it is already visited in the reverse traversal (lines 3-7). If this is the case, the query destination is reported reachable from the query source. Afterward, each child $v$ of $v_h$ is added to the traversal queue to enable the next steps of traversal. The Child($v$, direction) procedure returns the edges at the highest resolution originating from $v$ whose end points are the vertices representing a time instance $t \in [t_1, \frac{t_1 + t_2}{2}]$ and $t \in [\frac{t_1 + t_2}{2}, t_2]$ for the forward and backward traversal directions, respectively. The following proves the correctness of BM-BFS.
Theorem 4.3.2 BM-BFS verifies the reachability from query source to destination during the query interval.
Algorithm 3 BM-BFS
 1: procedure BM-BFS(o_i, o_j, T_p = [t_1, t_2], H_N)
 2:     v_1 = FindVertex(H_{t_1}(o_i), o_i, t_1)
 3:     v_2 = FindVertex(H_{t_2}(o_j), o_j, t_2)
 4:     Q_F.push(v_1), Q_B.push(v_2)
 5:     O_F.add(O_{v_1}), O_B.add(O_{v_2})
 6:     while !Q_F.isEmpty() || !Q_B.isEmpty() do
 7:         ProcessQueue(O_B, Q_F, F)
 8:         ProcessQueue(O_F, Q_B, B)
 9:     end while
10:     return false
11: end procedure

 1: procedure PROCESSQUEUE(O, Q, direction)
 2:     if !Q.isEmpty() and v_h = Q.pop() is not visited before then
 3:         for o ∈ O_{v_h} do
 4:             if O.contains(o) then
 5:                 return true
 6:             end if
 7:         end for
 8:         for c ∈ Child(v_h, direction) do
 9:             Q.add(c)
10:         end for
11:     end if
12: end procedure
Proof First, assume that $H_N$ only includes one resolution, i.e., $D_{N_1}$. $H_N$ is a DAG whose vertices are topologically sorted and time stamped. The forward traversal visits all the vertices that represent contacts whose validity interval is a subset of $[t_1, \frac{t_1 + t_2}{2}]$ and that are reachable from the query source. Accordingly, the backward traversal visits all the vertices representing contacts whose validity interval is a subset of $[\frac{t_1 + t_2}{2}, t_2]$. Therefore, if a path $p$ from $v_i$ to $v_j$ exists, then the vertices in $p$ are discovered after the forward and backward traversals of $H_N$. In addition, the vertices in $p$ are time stamped and therefore the order of the vertices in $p$ is preserved during traversal of $H_N$. When we consider long edges during traversal, some vertices may not be visited. However, the general connectivity of the graph is preserved at all the resolutions and therefore by taking long edges the query can still be verified correctly. Also, based on the transitive property 5.3.1 the early termination condition accurately terminates the traversal. This completes the proof.
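To make the traversal concrete, the following Python sketch runs an in-memory version of BM-BFS. It is an illustration under several assumptions rather than the dissertation's implementation: the graph lives in dictionaries instead of disk partitions, first-resolution edges span one time instance (no aggregated edges), and the names `bm_bfs`, `succ`, `pred`, `time` and `members` are hypothetical. Unlike Algorithm 3 as printed, the sketch explicitly adds the objects of every visited vertex to $O_F$ or $O_B$ and propagates the early-termination result.

```python
from collections import deque


def bm_bfs(v1, v2, t1, t2, succ, pred, time, members):
    """Return True if the destination vertex v2 is reachable from v1.

    succ[v]:    list of (child, weight); weight > 1 marks a long edge
    pred[v]:    list of parents in the reverse first-resolution graph
    time[v]:    time instance of vertex v
    members[v]: set of objects whose instances are inside v
    """
    mid = (t1 + t2) / 2
    q_f, q_b = deque([v1]), deque([v2])
    seen_f, seen_b = {v1}, {v2}
    o_f, o_b = set(members[v1]), set(members[v2])
    if o_f & o_b:
        return True

    def children(v, forward):
        if not forward:
            return [u for u in pred.get(v, ()) if time[u] >= mid]
        usable = [(w, wt) for w, wt in succ.get(v, ()) if time[w] <= mid]
        if not usable:
            return []
        best = max(wt for _, wt in usable)  # highest usable resolution only
        return [w for w, wt in usable if wt == best]

    def step(queue, seen, own, other, forward):
        if not queue:
            return False
        v = queue.popleft()
        for w in children(v, forward):
            if w in seen:
                continue
            seen.add(w)
            own |= members[w]        # record objects reached in this direction
            if members[w] & other:   # an object was seen from both directions
                return True
            queue.append(w)
        return False

    while q_f or q_b:
        if step(q_f, seen_f, o_f, o_b, True):
            return True
        if step(q_b, seen_b, o_b, o_f, False):
            return True
    return False
```

The forward search is restricted to vertices up to the middle of the query interval and the backward search to vertices from the middle onward, so the two searches meet on an object that holds the item across the midpoint.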
Assume that each partition includes instances of $n_p$ distinct objects and each disk block holds $b_p$ partitions on average. The following theorem proves the complexity of ReachGraph query processing and index construction ($|T'_p|$ is defined in Theorem 4.2.1).

Theorem 4.3.3 The ReachGraph index can be constructed with $O(|O||T|)$ IOs. The query processing IO complexity is $O\left(\frac{|O||T'_p|}{n_p \times b_p}\right)$.

Proof Assume that the object trajectories during $T$ are sorted based on the time stamp of the location vector and time stamp pairs of the trajectory components. The ReachGraph index can be constructed by scanning the trajectories while sliding a window over the trajectories to build the multi-resolution graph and perform partitioning at the same time. For query processing, the number of IOs is bounded by the complexity of BFS on ReachGraph. The BFS complexity on a graph $G = (V, E)$, where $V$ and $E$ are the sets of vertices and edges, respectively, is $O(|V| + |E|)$. With $H_N$, each edge is created between a vertex $v \in G_t$ and a vertex $u \in G_{t+1}$ when an object $o$ is present in both $u$ and $v$. With ReachGraph, the total number of edges and vertices for a time interval $T$ is bounded by $2|O||T|$. In addition, we assume that a partition is retrieved from disk with one random IO. Consequently, the complexity results in Theorem 4.3.3 are obtained.
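One way to read the final step of the proof is sketched below. It assumes that the traversal touches each vertex and edge of the early-terminated interval at most once, and that each retrieved disk block contributes roughly $n_p \times b_p$ object instances to the traversal; both are interpretive assumptions rather than statements from the proof itself.

```latex
\[
|V| + |E| \;\le\; 2\,|O|\,|T'_p|
\quad\Longrightarrow\quad
\#\mathrm{IO} \;\le\; \frac{2\,|O|\,|T'_p|}{n_p \times b_p}
\;=\; O\!\left(\frac{|O|\,|T'_p|}{n_p \times b_p}\right)
\]
```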
GRAIL [YCZ10] is one of the most efficient graph reachability approaches for memory-resident graphs. It works based on the idea of randomized interval labeling of graph vertices. Table 4.1 compares the index construction and query time complexity of ReachGrid and ReachGraph with that of GRAIL when adopted on a disk-resident $D_N$ to process reachability queries. Our approaches significantly outperform GRAIL because of efficient disk placement and also early termination of queries ($|T'_p| \le |T_p|$). With GRAIL, $d$ is a small constant and it is the number of intervals assigned to each graph vertex. $n_r$ is the average number of objects which are reachable from any object $o \in O$ at each time instance $t \in T$.

Table 4.1: Complexity Comparison
                    GRAIL                  ReachGraph                                      ReachGrid
Query Time          $O(|O||T_p| n_r)$      $O\left(\frac{|O||T'_p|}{n_p \times b_p}\right)$   $O\left(\frac{|O||T'_p|}{n_c \times b_c}\right)$
Construction Time   $O(d|O||T|)$           $O(|O||T|)$                                     $O(|O||T|)$
4.4 Experiments

We perform our experiments on both synthetic and real datasets modeling the contacts between moving objects which are either vehicles or individuals. Our synthetic datasets are generated by two different data generators. The first data generator, GMSF [BLS08], models the movement of individuals in an environment of 100 km$^2$ assuming their movement patterns follow the random waypoint model with an average speed of 2 m/s. The trajectory samples are captured every 6 seconds. Random waypoint is one of the most used models in the literature to model individuals' movement. With this model, every individual selects a random destination and speed and then moves toward that destination. Afterward, she selects another random destination and moves toward it [MPGW07]. The second data generator is the Brinkhoff generator, which is commonly used for generating realistic moving object trajectories [Bri03]. We generated the trajectories of a constant set of vehicles moving on the road network of the city of San Francisco, covering an area of approximately 300 km$^2$. The vehicles' locations are recorded on average every 5 seconds. The reason for using two different synthetic data generators is to study the difference between reachability query processing for different categories of moving objects, i.e., individuals and vehicles. In particular, vehicles are restricted to move on a road network while individuals can move to any point of the environment.

With the Brinkhoff generator, we generate the trajectories of 1000, 2000 and 4000 vehicles. We denote these datasets by $VN_{1k}$, $VN_{2k}$ and $VN_{4k}$, respectively, and term the collection VN datasets. With the GMSF generator, we generate the trajectories of 10,000, 20,000 and 40,000 individuals. We term these datasets $RWP_{10k}$, $RWP_{20k}$ and $RWP_{40k}$, respectively, and call the set of these datasets RWP datasets. The reason for generating more object trajectories for the RWP datasets is that their objects are distributed over the entire space, as opposed to the VN datasets in which objects only move on the road network. With both generators, we generate trajectories for a duration of four months (more than 119 days). Accordingly, RWP and VN datasets include more than 1,700,000 and 2,048,000 time instances, respectively. The size of the data for each dataset is presented in Table 4.2.

Table 4.2: Data Collection Size
Dataset        Size
$RWP_{10k}$    190 GB
$RWP_{20k}$    380 GB
$RWP_{40k}$    760 GB
$VN_{1k}$      23 GB
$VN_{2k}$      46 GB
$VN_{4k}$      92 GB

Table 4.3: System Specifications
Memory Size    4 GB
Disk Size      5 disks, each 1.36 TB
OS             Windows 7 SP1 64-bit
CPU            3.34 GHz
Page Size      4 KB
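For readers unfamiliar with the random waypoint model, the following Python sketch generates one individual's sampled trajectory. It is not the GMSF generator: the function name, the square 10 km by 10 km environment (one possible shape for 100 km$^2$), and the use of a constant 2 m/s speed instead of a random speed per leg are simplifying assumptions.

```python
import math
import random


def random_waypoint(n_samples, side_m=10_000.0, speed_mps=2.0, dt_s=6.0, seed=0):
    """Return n_samples positions (x, y) in meters, one every dt_s seconds."""
    rng = random.Random(seed)
    x, y = rng.uniform(0, side_m), rng.uniform(0, side_m)
    dest = (rng.uniform(0, side_m), rng.uniform(0, side_m))
    track = [(x, y)]
    for _ in range(n_samples - 1):
        budget = speed_mps * dt_s  # distance covered between two samples
        while budget > 0:
            dx, dy = dest[0] - x, dest[1] - y
            dist = math.hypot(dx, dy)
            if dist <= budget:
                # Waypoint reached: move there and draw a new destination.
                x, y = dest
                budget -= dist
                dest = (rng.uniform(0, side_m), rng.uniform(0, side_m))
            else:
                x += dx / dist * budget
                y += dy / dist * budget
                budget = 0
        track.append((x, y))
    return track
```

At 6-second sampling, four months of movement correspond to roughly 1.7 million samples per individual, which matches the number of time instances reported for the RWP datasets.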
Our real dataset captures the movements of vehicles in the city of Beijing. This dataset covers the GPS tracks of more than 2500 distinct vehicles collected during a day. The vehicles' GPS tracks cover an area of approximately 600 km$^2$. The vehicles' locations are recorded every minute and further interpolated to reflect the locations for every five seconds. Unfortunately, because of the small scale of this dataset we only use it in a subset of the experiments.
Our experimental system specification is presented in Table 4.3. For each experiment setting, we run the algorithm 400 times to compute the average values. The query sources and destinations are selected randomly, and the query interval is selected as a random interval whose length is a random number between 150 and 350 unless otherwise stated. We presume vehicles contact each other by communicating over the DSRC protocol, which has an effective range of 300 meters. Similarly, we assume individuals make contacts by communicating over the Bluetooth protocol, which has a typical range of 25 meters. Therefore, we set $d_T = 25$ for RWP and $d_T = 300$ for VN datasets.
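The role of $d_T$ can be illustrated with a short Python sketch that lists the contacts at a single time instance. It assumes that two objects are in contact when their Euclidean distance is at most $d_T$ meters; the function name is hypothetical, and the all-pairs comparison is only meant for small inputs.

```python
import math
from itertools import combinations


def contacts_at(positions, d_t):
    """positions: {object_id: (x, y)} in meters at one time instance.

    Returns the pairs of objects whose distance is at most d_t.
    """
    return [
        (a, b)
        for (a, pa), (b, pb) in combinations(sorted(positions.items()), 2)
        if math.dist(pa, pb) <= d_t
    ]
```

With three objects placed at 0 m, 20 m and 200 m along a line, a Bluetooth-like threshold of 25 yields a single contact, while a DSRC-like threshold of 300 puts all three pairs in contact.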
Finally, to measure the performance of reachability query processing we measure the number of random IOs. Hence, the sequential IOs are normalized to random accesses by assuming that each random access costs as much as 20 sequential accesses [CMTV04]. Notice that these numbers are system dependent; however, the general trends in the results should be obtained for machines with different settings as well. The rest of this section is organized as follows. We first evaluate the efficiency of the ReachGrid and ReachGraph approaches, respectively. Thereafter, we present the empirical comparison between ReachGrid and ReachGraph. Finally, we present the comparison between ReachGraph and graph reachability.
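The normalization of sequential IOs described above amounts to a one-line computation, shown here as a small Python sketch. The function and parameter names are hypothetical; only the factor of 20 comes from the text.

```python
def normalized_random_ios(random_ios, sequential_ios, seq_per_random=20):
    """Express a mix of IOs as an equivalent number of random IOs,
    assuming one random access costs as much as seq_per_random sequential ones."""
    return random_ios + sequential_ios / seq_per_random
```

For example, 10 random IOs plus 200 sequential IOs count as 20 random IOs under this convention.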
4.4.1 ReachGrid

In this section, we first focus on the efficiency of the index construction and then the query processing step of ReachGrid.
Index Construction

The performance of ReachGrid depends on the resolution of the temporal and spatial grids which quantize the time interval $T$ and the environment $E$, respectively. There is a tradeoff in selecting both temporal and spatial resolutions. By increasing any of the resolutions, the number of random accesses to disk blocks increases when processing a reachability query and hence the number of IOs increases. The reason is that the locality in time and space is not fully leveraged. On the other hand, decreasing the resolution of the grids results in the placement of a huge number of trajectory segments within a grid cell. As a result, many trajectory segments which are irrelevant for query processing are processed for each reachability query. This increases the number of IOs during query processing.
Here, we empirically optimize the grid resolutions by varying both temporal and spatial grids and selecting a combination which minimizes the number of IOs when processing reachability queries. There is a huge number of possible combinations of temporal and spatial resolutions, and therefore we assume the same resolution for all the spatial grids $C_i$ to reduce the number of possible combinations. We vary the temporal resolution from 5 to 80 for both datasets and the spatial resolution from 128 m to 10 km (17 km) for RWP (VN) datasets and select a combination which minimizes the number of IOs while processing reachability queries. We denote the optimal spatial and temporal resolutions by $R_S$ and $R_T$, respectively. With RWP datasets, $R_S = 1024$ m and $R_T = 20$, and with VN datasets, $R_S = 17$ km and $R_T = 20$. With VN datasets, the optimal ReachGrid indexes have lower resolutions than those of RWP datasets. The reason is that VN datasets capture the movement of fewer objects as compared to RWP datasets and hence the spatial grid cells are larger to place more objects within the same cell. Figures 4.8 (a) and (b) show how the IO count varies with the spatial and temporal resolutions for RWP datasets, respectively. In Figure 4.8 (a) the temporal resolution is 20 and in Figure 4.8 (b) the spatial resolution equals 1024 m. The results for VN datasets follow the same pattern.
We also measured the time required to construct the optimal ReachGrid indices. The results are shown in Figures 4.9 (a) and (b) for RWP and VN datasets. The x-axis shows the length of the time period $T$ over which the ReachGrid index is constructed. All these intervals share the same starting point but have different ending points. Over all the cases, the index construction time is less than 4.3 hours. As expected, increasing the number of objects and the duration of $T$ makes index construction slower.

Figure 4.8: ReachGrid resolutions optimization. (a) IO count vs. spatial grid resolution, (b) IO count vs. temporal grid resolution

Figure 4.9: ReachGrid construction time. (a) RWP datasets, (b) VN datasets
Query Processing

Here, we compare the efficiency of the ReachGrid basic traversal (Section 4.2.2), termed UniReachGrid, with that of the bidirectional traversal (Section 4.2.3), termed BiReachGrid. To this end, we count the number of IOs for random queries generated with query interval lengths of 100, 300 and 500. The results are shown for the $RWP_{20k}$ and $VN_{2k}$ datasets. Overall, BiReachGrid outperforms UniReachGrid by 54% on average over all the cases. The reason is that with BiReachGrid the traversal is performed in two directions simultaneously and there is a high chance that an object is visited in both directions. However, with UniReachGrid, the traversal is terminated only when the query destination is visited or all the contacts between objects reachable from the query source during the query interval are processed.

Figure 4.10: ReachGrid Query Processing. (a) $RWP_{20k}$, (b) $VN_{2k}$
4.4.2 ReachGraph

Here, we first study the efficiency of index construction and afterward the online query processing approaches of ReachGraph. Next, the experiments on the performance of the multi-resolution construction are presented.

Index Construction

In this section, we first focus on evaluating the efficiency of index construction for the basic contact network ($D_N$) and afterward evaluate the efficiency of the augmentation step. We conclude this section by studying the placement of ReachGraph on disk.
Contact Network Size Here, we empirically measure the contact network size by counting the number of vertices ($|V|$) and edges ($|E|$) of the contact network ($D_N$) when generating the contact network for different time intervals $T$. The results for RWP datasets are shown in Figures 4.11 (a) and (b) for edges and vertices, respectively. The results for VN datasets follow a similar pattern. The x-axis represents the length of the time interval $T$ during which the contact network is constructed, assuming that all time intervals start from the same time instance, i.e., $T = [0, |T|]$. As expected, both $|E|$ and $|V|$ increase when $|T|$ increases. The reason is that the number of contacts and accordingly the number of edges increase when $|T|$ increases. Similarly, the number of edges and vertices increases when the number of objects increases as well. The most important observation from this experiment is that the contact network can become prohibitively large to reside in main memory. In particular, the numbers of edges and vertices are more than 17,466 and 10,545 million for $RWP_{40k}$, respectively.

Figure 4.11: Contact network edges (a) and vertices (b)
Next, we measure the efficiency of the reduction step (Section 4.3). To this end, we compare the number of vertices and edges of $C_N$ and that of $D_N$ for the same settings of the experiments in this section. With RWP datasets, over all the cases on average the number of vertices (edges) of $D_N$ is 81% (80%) less than that of $C_N$. Similarly, with VN datasets, over all the cases the number of vertices (edges) of $D_N$ is 64% (61%) less than that of $C_N$. The results show that the reduction step significantly reduces the size of the contact network modeled by TEN.
Contact Network Construction Time In this section, we measure the construction time of $D_N$ for different time intervals $T$. The results are shown in Figures 4.12 (a) and (b) for RWP and VN datasets. For all datasets, increasing the number of objects and $|T|$ increases the construction time. The reason is that more contacts need to be processed in order to create the contact network. With our experimental setting, the construction time for all datasets is less than 14 days. Although this running time is large, it reflects the time it takes to construct the entire contact network over $T$. However, it is also possible to construct the contact network incrementally over time by acquiring the objects' positions at new time instances and appending the corresponding new vertices and edges to the previously constructed contact network.

Figure 4.12: Contact network ($D_N$) construction time. (a) RWP datasets, (b) VN datasets
Multi-resolution Graph In this section, we study the performance of constructing the contact network at various resolutions. To this end, we measure the average degree of the vertices of $H_N$ at different resolutions ($D_{N_2}, D_{N_4}, \ldots, D_{N_{32}}$). The average degree for $D_{N_i}$ only considers vertices which have at least one edge at the $N_i$-th resolution. Table 4.4 shows the results for $RWP_{40k}$ and $VN_{4k}$, which have the largest number of objects among RWP and VN datasets, and also $VN_R$, which corresponds to our real dataset. As the contact network resolution increases, the average degree of vertices in the corresponding resolution increases. The reason is that over larger time intervals, objects are reachable from more objects and hence more long edges are introduced at higher resolutions. $VN_R$ has a significantly smaller average vertex degree than the other datasets. The reason is that the size of the contact network $D_N$ for this dataset is much smaller than that of the other datasets. We decide the optimal number of ReachGraph resolutions later in the current section.

Table 4.4: Average vertex degree for $D_{N_i}$
Resolution      $VN_{4k}$    $RWP_{40k}$    $VN_R$
$D_{N_2}$       2.9          3.0            1.5
$D_{N_4}$       6.1          8.1            1.7
$D_{N_8}$       16.3         33.4           2.3
$D_{N_{16}}$    55.5         75.6           3.69
$D_{N_{32}}$    221.4        322            9.0
Disk Placement Here, we empirically optimize the placement of the multi-resolution contact network graph on disk. ReachGraph has two parameters, i.e., the number of resolutions and the depth of partitioning, which need to be optimized in order to construct and place the index on disk. Here, we empirically find the optimal values for both parameters. To this end, we vary the partition depth from 1 to 64 and the number of resolutions from 1 to 7 and count the number of IOs for both datasets. Based on our experiments, the optimal partition depth and number of resolutions are 32 and 6, respectively, i.e., $d_p = 32$ and $H_N = D_{N_1} \cup D_{N_2} \cup \ldots \cup D_{N_{32}}$.
Figure 4.13 shows how changing the depth of partitions varies the number of IOs for the $RWP_{20k}$ and $VN_{2k}$ datasets when processing reachability queries ($H_N$ includes the contact network at the first six resolutions). Increasing the depth of partitions gives the opportunity to buffer more vertices which will be visited in the future and hence reduces the total number of IOs. On the other hand, if the partitions become too large then many vertices redundant for query processing are retrieved from disk, which will deteriorate the performance of query processing. Therefore, there is a trade-off between partition depth and IO count. A similar trade-off is present between the number of ReachGraph resolutions and IO count.
Query Processing

Here, we evaluate the efficiency of online ReachGraph query processing. The goal of this experiment is to study how the bidirectional traversal and multi-resolution index construction techniques improve the performance of ReachGraph. To this end, we compare the efficiency of the bidirectional multi-resolution traversal (BM-BFS) approach with the bidirectional traversal (B-BFS) and external DFS (E-DFS) approaches. B-BFS traverses $H_N$ similar to BM-BFS but only at the single resolution of $D_N$. E-DFS is the naive approach which only checks whether there is a path on $H_N$ from query source to destination during the query interval. We select E-DFS as the baseline approach as it is faster than E-BFS. Notice that E-DFS does not investigate the members of the connected components, as opposed to BM-BFS and B-BFS, and therefore it only finds the contact paths with the length of the query time interval. The results for $RWP_{20k}$ and $VN_{2k}$ are shown in Figure 4.14. BM-BFS outperforms E-DFS and B-BFS by more than 80% and 15%, respectively, for both datasets. The reason is that it leverages long edges to make traversal faster and at the same time investigates the objects within connected components to stop the traversal as soon as a contact path is found between query source and destination. B-BFS also outperforms E-DFS significantly because it terminates graph traversal as soon as a contact path is discovered between query source and destination.

Figure 4.13: IO count vs. different partition depths

Figure 4.14: ReachGraph online query processing for different approaches
4.4.3 ReachGrid vs. ReachGraph

In this section, we compare the efficiency of ReachGrid (bidirectional ReachGrid traversal, denoted by BiReachGrid) and ReachGraph. We generate random queries with varying query intervals of 100, 300 and 500 time instances and compare the number of IOs for the ReachGrid (BiReachGrid) and ReachGraph (BM-BFS) approaches. The results are shown in Figures 4.15 (a) and (b) for the $RWP_{20k}$ and $VN_{2k}$ datasets, respectively. Based on our results, the ReachGrid approach outperforms ReachGraph for the cases in which the query interval is small. The reason is that in such cases, only a small portion of the contact network should be traversed, which is placed on consecutive blocks on disk and can be efficiently retrieved from disk by ReachGrid. Another important observation is that in addition to the query interval size, the distribution of objects also affects the performance of ReachGrid. With the $VN_{2k}$ dataset, the objects are located on the road network and within a small portion of the entire environment $E$, as opposed to the $RWP_{20k}$ dataset for which the objects are almost uniformly distributed in $E$. As a result, with the $VN_{2k}$ dataset the ReachGraph approach significantly outperforms ReachGrid (on average by 39%). The reason is that the ReachGrid spatial grid cannot leverage spatial locality for non-uniform object distributions.

Figure 4.15: ReachGrid vs. ReachGraph. (a) $RWP_{20k}$, (b) $VN_{2k}$

Figure 4.16: CPU time. (a) $RWP_{20k}$, (b) $VN_{2k}$
We also compare the CPU time of both approaches which is the time it takes by the
algorithms while ignoring retrievals from disk. The result is shown in Figure 4.16 for
RWP
20k
. As expected, ReachGraph has lower CPU time because of precalculating the
entireconnectivityoftheobjects. ThetrendissimilarforVN
2k
.
56
Table 4.5: GRAIL vs. ReachGraph (denoted by RG)

(a) Memory Resident Contact Datasets
Dataset    GRAIL (runtime)    RG (runtime)
VN_2k      3.5 ms             9.0 ms
RWP_20k    60 ms              39 ms

(b) Disk Resident Contact Datasets
Dataset    GRAIL (IO count)   RG (IO count)
VN_2k      213                49
RWP_20k    6790               570
4.4.4 Comparison with Graph Reachability
Here, we compare ReachGraph query processing with the existing graph reachability techniques. In particular, we compare our approach with GRAIL [YCZ10]. First, we consider contact datasets which reside in memory. We compare the performance of ReachGraph and GRAIL on the RWP_20k and VN_2k contact datasets with |T| = 1000, which are memory-resident datasets. GRAIL takes $D_N$ as input and verifies whether the query source can reach the query destination. Table 4.5 (a) shows the results of this comparison in terms of runtime for random queries with an interval length of 300. GRAIL converges to simple DFS for reachability queries when the source and destination are reachable. Therefore, our approach outperforms GRAIL for VN_2k, while this is not the case for RWP_20k, because of the existence of more pairs of reachable objects in VN_2k than in RWP_20k. With RWP_20k, GRAIL is 30% faster than ReachGraph. In sum, we conclude that our approach is comparable with GRAIL for memory-resident contact datasets.
Next, we adapt GRAIL for disk-resident contact datasets and subsequently compare the performance of GRAIL and ReachGraph in terms of the number of IOs for disk-resident contact networks. To this end, we issue the same queries but on the disk-resident contact datasets. We assume that with GRAIL the vertices are placed on disk in the same order they are generated during contact network construction. The results are shown in Table 4.5 (b). As expected, our approach significantly outperforms GRAIL for disk-resident datasets. In particular, it outperforms GRAIL by 76% and 88% for the VN_2k and RWP_20k datasets, respectively.
5 Reachability Query Processing in Contact Networks with Constraints

5.1 Introduction
Reachability query processing in contact networks with no constraints, where no application constraint is defined when processing a reachability query, is studied in Chapter 4. In particular, we proposed two index structures for indexing contact networks, namely ReachGrid and ReachGraph. Consider a reachability query which verifies whether an object (query source) can reach another object (query destination) through the contact network, if we consider only the contacts occurring during a given query time interval (query interval). With ReachGrid, our approach is to compute reachability on-the-fly by expanding the contact network starting from the query source. To construct ReachGrid, we propose a spatiotemporal grid to index all contacts in the contact network dataset into distinct spatiotemporal localities. At query time, this index is used to guide on-the-fly expansion of the contact network to verify reachability. With ReachGraph we use the alternative approach of precomputing the reachability between objects. It is impractical to precompute reachability for all combinations of query source, destination and interval. Therefore, we propose to precompute reachability only for carefully selected combinations of query source, destination and interval, and leverage these combinations to compute reachability for all other combinations on-the-fly. In turn, at query time this allows recursively breaking the given reachability query into a set of precomputed reachability queries for efficient query processing. Based on our experimental results, ReachGrid outperforms ReachGraph when the query time interval is small, and vice versa.
In this chapter, we focus on reachability query processing in contact networks where there are additional constraints imposed by the application. We first present a categorization of different contact network constraints for reachability query processing. Thereafter, we focus on the latency constraint and adapt the ReachGraph and ReachGrid index structures for efficient reachability query processing. In contact networks with a latency constraint, an object can only transmit an item at least L time instances after receiving it. Studying the problem of reachability query processing in contact networks with latency will give us insights on how to construct efficient index structures for reachability query processing in contact networks with general constraints.
We adapt ReachGraph and ReachGrid for contact networks with latency in two steps. First, we show how one needs to modify index construction such that the indexes work for contact networks with latency. We show that some steps of the ReachGraph construction proposed in Chapter 4 are not applicable to contact networks with latency. The ReachGraph index proposed in Chapter 4 outperforms ReachGrid when the query interval is large. Therefore, we also improve the efficiency of the ReachGrid index structure by proposing a new disk placement approach which better leverages the locality of object trajectories by placing ReachGrid cells based on the Hilbert curve rather than the row-wise placement technique proposed in Chapter 4. A space-filling curve is a function which maps a multi-dimensional space into a one-dimensional space. The Hilbert space-filling curve has been studied extensively in the literature and has been shown to outperform other approaches in terms of preserving the locality of objects [Jag97]. Next, we design an efficient query processing technique for contact networks with latency by proposing a bidirectional contact network traversal which enables pruning a large part of the contact network compared with unidirectional contact network traversal. We analyze the IO complexity of our proposed approaches and empirically evaluate the efficiency of our proposed techniques. Based on our experimental results, our bidirectional traversal technique outperforms the unidirectional traversal technique by more than 41% for both ReachGrid and ReachGraph. Also, the ReachGrid technique outperforms the adapted ReachGraph by 18% on average in terms of the number of disk accesses.
The rest of the chapter is organized as follows. We present a categorization of contact network constraints in Section 5.2. Afterward, we focus on a specific contact network constraint, the latency constraint, in Section 5.3 and study how to process reachability queries in contact networks with latency. In Section 5.4, we discuss the generalization of ReachGraph and ReachGrid for different contact network constraints.
5.2 Constraints Categorization for Reachability Query in Contact Networks
We categorize the constraints on contact paths into two categories: contact-level and object-level constraints. With contact-level constraints, either a contact between two objects or a sequence of contacts in a contact path is constrained to satisfy some specific properties. On the other hand, with object-level constraints, the objects themselves need to satisfy some constraints such that an item can travel through a contact path. In this section, we define each of these categories. An overview of all the categories and sub-categories is presented in Figure 5.1. Notice that although we define and describe each individual category, it is possible to have a combination of multiple constraints in some applications as well.
Figure 5.1: Constraints Classification of Reachability Queries in Contact Networks

5.2.1 Contact-level Constraints
We categorize contact-level constraints of contact paths into three sub-categories: temporal constraints, spatial constraints and non-spatiotemporal constraints. We describe each of these categories in this section.

Temporal Constraints
In Chapter 4, we considered a temporal constraint in defining a contact path p such that all the contacts in p should have occurred during a certain time interval termed the query interval. In addition to the query interval, in many application scenarios there could be other temporal constraints which should be satisfied such that an item can transmit through a contact path. These temporal constraints can be in the form of inter-contact or intra-contact constraints. With the inter-contact constraint, we consider two cases. In the first case, two objects $o$ and $o'$ should maintain a distance of at most $d$ for $t_d$ time instances in order for a contact to occur and an item to transmit between $o$ and $o'$. For example, with file transfer between two cell phones in contact over Bluetooth, they should be in proximity for enough time such that the entire file can be transmitted. With the second case of the inter-contact constraint, an object can only transmit an item during the $t_d$ time instances after it receives the item. Consider an individual infected by the flu virus as an example. In this case, the individual will recover from the disease after a few days and hence cannot transmit the virus to others afterward. With the intra-contact delay, once object $o$ receives an item $i$, it needs at least $t_d$ time instances before it can transmit $i$ to another object $o'$. An example of such a temporal constraint is contagious viruses which have a latency period such as a day.
Spatial Constraints
With a spatial constraint, a set of constraints holds on the locations at which objects are making contact. The contacts in a contact path p should (not) occur in a set of locations L. For example, imagine an application that aims to find the potential sources of a sensitive information leak. With such an application, the contacts which occur in areas under video surveillance should be excluded, as individuals will not transmit sensitive information in such places. In some cases, L can be constrained to be an ordered list or a regular expression representing the order of locations. As an example, the sensitive information leak should originate from some particular locations.
Non-spatiotemporal Constraints
In some applications, a set of constraints on a contact path p should be satisfied which are not spatiotemporal constraints. With such applications, a label is assigned to each contact and a constraint should hold on the sequence of the labels or on the values of the labels themselves. For example, a contact path through which sensitive information travels from one criminal to another does not pass through contacts which involve police officers. Consider uncertain contact networks as another example. With uncertain contact networks, each contact c is associated with a probability, which is the probability that an item can be transmitted during c. In such contact networks, it is natural to include in a contact path only the contacts whose probability is above a certain threshold.
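The per-contact constraints described above (query interval, excluded locations, probability threshold) can be viewed as predicates that filter contacts before a contact path is assembled. The following is a minimal sketch under that assumption; the `Contact` record, its field names and the helper functions are hypothetical and are not part of the index structures proposed in this work. Constraints on the sequence of contacts (for example an ordered list of locations) are not covered by this sketch.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass(frozen=True)
class Contact:
    """Hypothetical contact record used only for this illustration."""
    t: int                     # time instance at which the contact occurs
    a: str                     # first object
    b: str                     # second object
    location: str              # label of the place where the contact occurs
    probability: float = 1.0   # transmission probability (uncertain networks)


Constraint = Callable[[Contact], bool]


def within_interval(t1: int, t2: int) -> Constraint:
    """Temporal constraint: the contact occurs during the query interval."""
    return lambda c: t1 <= c.t <= t2


def outside_locations(excluded: Iterable[str]) -> Constraint:
    """Spatial constraint: the contact does not occur in an excluded place."""
    banned = frozenset(excluded)
    return lambda c: c.location not in banned


def min_probability(threshold: float) -> Constraint:
    """Non-spatiotemporal constraint: keep sufficiently likely contacts."""
    return lambda c: c.probability >= threshold


def admissible_contacts(contacts: Iterable[Contact],
                        constraints: Iterable[Constraint]) -> List[Contact]:
    """Return the contacts that satisfy every given constraint."""
    checks = list(constraints)
    return [c for c in contacts if all(ok(c) for ok in checks)]
```

A reachability query would then expand the contact network only over `admissible_contacts(...)`, which is how combinations of multiple constraints could be handled in this simplified view.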
5.2.2 Object-level Constraints
With object-level constraints, the objects which are making contacts should satisfy a collection of constraints. We consider two object-level constraints, where the first constraint is on the number of objects which participate in the contacts in a contact path p and the other one is on the particular objects which should (not) be involved in the contacts in p. In some applications, only a limited number of objects can be in the contacts in p. For example, consider a contact path through which criminal activity information travels from one criminal to another. The number of individuals on such a contact path should usually be small and less than k, where k is decided based on the total number of suspicious individuals in that region. In this case, the number of objects participating in the contacts of the contact path is constrained. It is also possible that the objects which are involved in the contacts of a contact path should (not) be from a specific set or match a regular expression describing their relative order. For example, in the previous example, only people who are suspicious can participate in the contact path described earlier.
5.3 Reachability in Contact Networks with Latency
In this section, we focus on contact networks with latency. We first formally define the problem of reachability query in contact networks with latency. Afterward, we study how to adapt ReachGrid and ReachGraph for contact networks with latency and finally, we study the IO complexity for query processing of both indexes.

5.3.1 Problem Definition
Consider a contact network C which is constructed based on the history of movement of objects O in an environment E during a time interval T. For contact networks with a latency constraint, the application-specific constraint F is a single function f which evaluates whether a contact path satisfies the latency value of $\alpha = L$ (Chapter 3), i.e., for a contact path $p=(c_1,c_2,\ldots,c_n)$ during $T_p \subseteq T$ from $o_i$ to $o_j$, whether

$$f(p,\alpha) \equiv \mathrm{true}. \tag{5.1}$$

Assume the list of objects on the contacts in p is $(o_i,o_1,o_2,\ldots,o_m,o_j)$, where the objects are in the same order as they receive a virtual item from $o_i$, if $o_i$ initiates a virtual item $i$ at the beginning of $T_p$. In this case, $f(p,\alpha) \equiv \mathrm{true}$ if $i$ can stay on each $o_k$, $1 \le k \le m$, for at least L time instances.

Given a pair of objects $(o_i,o_j)$, $o_i,o_j \in O$, a time interval $T_p \subseteq T$, and a latency value of L, a reachability query on a contact network with latency is defined in a similar way as the definition presented in Chapter 3 except that equation 3.1 should be satisfied. In this case, we represent a reachability query only by the query source, destination, interval and latency and accordingly denote it by $q : o_i \overset{T_p;L}{\rightsquigarrow} o_j$.
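The latency predicate $f(p,\alpha)$ can be illustrated with a small check over a single contact path. This is only a sketch under simplifying assumptions that are not stated in the definition above: each contact is taken to occur at one time instance, the path is given in the order in which the item travels, and the `Contact` tuple and function name are hypothetical.

```python
from typing import Hashable, NamedTuple, Sequence


class Contact(NamedTuple):
    """Hypothetical contact: objects a and b meet at time instance t."""
    t: int
    a: Hashable
    b: Hashable


def satisfies_latency(path: Sequence[Contact], source: Hashable, latency: int) -> bool:
    """Check whether an item started at `source` can follow `path` when every
    intermediate object must hold the item for at least `latency` instances."""
    holder = source
    received_at = None            # the source may transmit immediately
    for c in path:
        if holder not in (c.a, c.b):
            return False          # the contacts do not form a chain
        if received_at is not None and c.t - received_at < latency:
            return False          # the item did not stay long enough on holder
        holder = c.b if holder == c.a else c.a
        received_at = c.t
    return True
```

For instance, with contacts at time instances 0 and 2 the path is valid for L = 2 but not for L = 3, because the intermediate object would have to transmit after holding the item for only two time instances.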
5.3.2 ReachGraph and ReachGrid Extension
Here, we first study the index construction of ReachGraph and ReachGrid for contact networks with latency and then focus on reachability query processing based on these index structures.

Index Construction
ReachGraph  In this section, we first overview the ReachGraph index for contact networks with no constraints and thereafter describe how to extend it for contact networks with latency. To construct the ReachGraph for a given contact network C, we start from C and apply a series of transformations to C that eventually convert it to the ReachGraph hyper graph $H_N$. The transformations are performed in two phases, i.e., the reduction phase and the augmentation phase. In the reduction phase we precompute the objects which are reachable to each other and reduce all nodes in each subset (along with their connections) to a single hyper node. We call the resulting hyper graph $D_N$, which is a significantly reduced version of C in size. $D_N$ is constructed by replacing the connected components of the TEN model of C with hyper nodes. At the next step of reduction, identical connected components in consecutive time instances are merged. Afterward, in the augmentation phase, to further improve ReachGraph, we precompute the reachability between pairs of nodes in $D_N$ at predefined time intervals. We perform this precomputation at several time resolutions and accordingly augment $D_N$ with a hierarchy of extra links to generate the ReachGraph hyper graph $H_N$. With $H_N$, a reachability query can be broken into a set of precomputed reachability queries for efficient query answering.
Unfortunately, the first step of the reduction phase is not extendable to the case of contact networks with latency. For example, consider the ReachGraph index constructed after the first reduction step, which is shown in Figure 5.2. Assume two reachability queries, $q_1$ and $q_2$, with the same query interval of [0,2] and the query source and destination of $o_2$ and $o_4$, respectively. For $q_1$ and $q_2$ presume that L=0 and L=2, respectively. It is easy to show that the edge between $o_2(1)$ and $o_4(1)$ is valid for $q_1$ and not for $q_2$. As a result, it is not possible to replace the edges within the same connected component with a hyper vertex. On the other hand, the second step of the reduction can be applied to merge the contacts (not connected components) and hence reduce the size of the ReachGraph index.
The augmentation step of ReachGraph precomputes reachability for different combinations of query source, destination and interval. For contact networks with latency, the reachability depends on latency as well. For example, in the ReachGraph index in Figure 5.2, $o_3$ is reachable from $o_1$ during [0,1] if L=0 but not if L=1. Therefore, this step also cannot be applied to contact networks with latency.

Figure 5.2: ReachGraph at the end of reduction step

Consequently, the ReachGraph index for contact networks with latency is constructed by applying only the second step of the reduction to the TEN model of the contact network. We use the same disk placement discussed in Chapter 4 to place ReachGraph on disk, as that approach is valid regardless of the value of L.
ReachGrid  We first present an overview of the ReachGrid construction for contact networks without latency, which was proposed in Chapter 4. ReachGrid leverages the locality of objects over space and time to avoid traversing contacts that are irrelevant to a reachability query. It leverages temporal locality to stop query processing as soon as a contact path between the query source and destination is discovered when traversing the contacts ordered by their occurrence time. To this end, the object trajectory segments are grouped based on the time stamp of the position-vector pairs in the object trajectories. A contact between two objects occurs when they are in close proximity. Therefore, grouping the objects based on spatial locality tends to aggregate the objects which are in contact over time into the same group. This enables traversing a subset of groups which includes only the objects reachable from the query source when processing the query. ReachGrid enables temporal and spatial locality by imposing two grids on the object trajectories. The first grid partitions the time interval T (T is the time interval during which all the contacts in C occurred). The second grid spatially partitions the trajectory segments within each time interval in T.
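The two grids can be pictured as a mapping from each trajectory sample to a (temporal cell, row, column) key. The sketch below is an illustration only: it assumes square spatial cells, non-negative coordinates measured from the lower-left corner of E, time instances counted from the start of T, and hypothetical function and parameter names. It is not the construction procedure of Chapter 4.

```python
from collections import defaultdict


def reachgrid_cell(x, y, t, cell_size, temporal_resolution):
    """Map one trajectory sample to its (temporal cell, row, column) key.

    cell_size is the side of a spatial cell (e.g. in meters) and
    temporal_resolution is the number of time instances per temporal cell.
    """
    time_cell = int(t // temporal_resolution)
    row = int(y // cell_size)
    col = int(x // cell_size)
    return time_cell, row, col


def bucket_samples(samples, cell_size, temporal_resolution):
    """Group (object, x, y, t) samples by the grid cell they fall into."""
    buckets = defaultdict(list)
    for obj, x, y, t in samples:
        key = reachgrid_cell(x, y, t, cell_size, temporal_resolution)
        buckets[key].append((obj, x, y, t))
    return dict(buckets)
```

Objects that are close in space and time end up in the same bucket, so a query only needs to read the buckets that the expanding set of reachable objects actually touches.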
The ReachGrid index structure proposed in Chapter 4 can be readily applied to contact networks with latency. However, in this section we propose a new disk placement technique for ReachGrid which enhances query processing by placing the ReachGrid cells which are necessary for query processing on disk blocks which are closer to each other than with the approach proposed in Chapter 4. In particular, the ReachGrid cells in Chapter 4 were placed in row-wise order on disk. We propose to place the ReachGrid cells based on the Hilbert space-filling curve to better preserve the locality of the object trajectories. We first present an overview of the Hilbert space-filling curve and afterward discuss how to place ReachGrid cells on disk based on it.
A space-filling curve maps an n-dimensional space into a one-dimensional space by assigning an integer number to each point of the space. The Hilbert space-filling curve is a space-filling curve which is constructed recursively as follows [Jag97]. We start from Figure 5.3(a) as the level 1 curve (the first resolution). The level 1 curve is replicated in each quadrant of the next level. The lower left quadrant is rotated clockwise 90 degrees during replication. Correspondingly, the lower right quadrant is rotated anti-clockwise 90 degrees during replication. The sense (or direction of traversal) of both lower quadrants is reversed during the replication process. The two upper quadrants have no rotation and no change of sense. Consequently, Figure 5.3(b) is obtained (the second resolution). A repetition of this step gives rise to Figure 5.3(c) (the third resolution) by considering that all rotation and sense computations are relative to the previously obtained rotation and sense in a particular quadrant. Further repetition results in Figure 5.3(d) (the fourth resolution).

Figure 5.3: Hilbert Construction
Figure 5.4: Hilbert Curve at Resolution 2 to Map 2d Space into 1d Space
A Hilbert curve at resolution $i$ can map a $2^{(i+1)}$ grid space into a one-dimensional space by assigning an integer to each grid cell. An example is shown in Figure 5.4 for $i=2$. Consider the ReachGrid cells $C_i$ with the resolution of $m \times n$ corresponding to the temporal cell $T_i$. We use the Hilbert curve at resolution $x-1$ to order the cells in $C_i$ on disk, where $x$ is the maximum of $m$ and $n$.
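As an illustration of Hilbert-based ordering, the sketch below uses the widely known iterative formulation that converts a cell coordinate into its position along a Hilbert curve over a square grid whose side is a power of two. The function names are hypothetical, the orientation of the curve may differ from the one drawn in Figure 5.3, and padding an m×n grid to the next power of two is an assumption of this sketch rather than a detail taken from the text.

```python
def hilbert_index(side, x, y):
    """Position of cell (x, y) along a Hilbert curve covering a side x side
    grid, where side is a power of two and 0 <= x, y < side."""
    d = 0
    s = side // 2
    while s > 0:
        rx = 1 if (x & s) else 0
        ry = 1 if (y & s) else 0
        d += s * s * ((3 * rx) ^ ry)
        # Rotate/flip the quadrant so the sub-curve has the right orientation.
        if ry == 0:
            if rx == 1:
                x = side - 1 - x
                y = side - 1 - y
            x, y = y, x
        s //= 2
    return d


def hilbert_cell_order(m, n):
    """Order the cells of an m x n grid by their Hilbert position, using the
    smallest power-of-two side that covers max(m, n)."""
    side = 1
    while side < max(m, n):
        side *= 2
    cells = [(x, y) for x in range(m) for y in range(n)]
    return sorted(cells, key=lambda c: hilbert_index(side, c[0], c[1]))
```

Writing the cells of one temporal cell to disk in the order returned by `hilbert_cell_order` keeps spatially neighboring cells close to each other on disk, which is the property the placement relies on; a row-wise order only preserves proximity along one axis.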
Query Processing
In this section, we propose a bidirectional traversal framework to traverse both ReachGrid and ReachGraph. We call this framework B-Traversal. With B-Traversal, two parallel instances of index traversal are initiated, where the forward traversal instance explores the index starting from the query source and the backward traversal instance explores the index starting from the query destination and in the backward direction. Both traversal instances traverse the portion of the contact network which corresponds to the query interval. The query processing is terminated either when the same object is visited during both the forward and backward traversals or when the entire query interval is explored by either of the traversal instances. In the former case the query destination is confirmed to be reachable from the query source, while this is not true for the latter case. In the rest of this section, we first provide the implementation of B-Traversal and thereafter prove that it accurately verifies a reachability query in contact networks with latency.
In Chapter 4, we proposed a bidirectional traversal technique to traverse the ReachGraph index structure based on the transitivity property, which is as follows.

Property 5.3.1 [Transitivity] Suppose $o_j$ is reachable from $o_i$ during $T_p=[t_1,t_2]$ and $o_k$ is reachable from $o_j$ during $T'_p=[t'_1,t'_2]$. If $t_2 \le t'_2$ then $o_k$ is reachable from $o_i$ during $T''_p=[t_1,t'_2]$.
However, transitivity in the above form does not hold for contact networks with latency. In particular, it is possible that $o_j$ is reachable from $o_i$ during $T_p=[t_1,t_2]$ (the first reachability query) and $o_k$ is reachable from $o_j$ during $T'_p=[t'_1,t'_2]$ (the second reachability query), but $o_k$ is not reachable from $o_i$ during $T''_p=[t_1,t'_2]$. This happens when the latency is $L>0$. Based on the reachability definition for contact networks with latency, a virtual item $i$ can be initiated by $o_i$ and travel to $o_j$ during $T_p$, and a virtual item $i'$ can be initiated by $o_j$ and travel to $o_k$ during $T'_p$. However, a virtual item $i''$ may not be able to travel from $o_i$ to $o_k$ during $T''_p$ if $i''$ is initiated at the beginning of $T''_p$. The reason is that $i''$ needs to stay for $L>0$ time instances on $o_j$ before $o_j$ can transmit it to another object. This happens because the second reachability query does not apply the latency to $o_j$ and assumes that $o_j$ can transmit an item at the beginning of the query interval $T'_p$.
We address this issue by considering latency during query processing. In particular, each object $o$ is assigned an activation time $a_o$, which is the earliest time $o$ can transmit an item it receives from the query source to another object. We call an object which can transmit an item received from the source to another object an active object. The activation time of the query source is initialized as the starting time instance of the query interval. The activation time is utilized in the forward traversal. Correspondingly, we define the reverse activation time $a'_o$ for each object $o$ as the earliest time $o$ can transmit an item $i$ it receives from the query destination in the reverse traversal of the index, assuming the query destination initiates $i$ at the end of the query interval.
A priority queue $Q_F$ is used to prioritize objects based on activation time, where the object with the least activation time has the highest priority. $Q_F$ holds (object, activation time) pairs and prioritizes the pairs based on the second element. Given a query $q : o_i \overset{T_p=[t_1;t_2];L}{\rightsquigarrow} o_j$, $Q_F$ is initialized with $(o_i,t_1)$ and subsequently utilized to discover the objects reachable from the query source at each time instance. During the forward traversal, objects visited at time instance $t$ and reachable from an object $o \in Q_F$ with the activation time $a_o \le t$ are added to $Q_F$ with the activation time of $t+L$. Symmetric to the forward traversal, a priority queue $Q_B$ is initialized with $(o_j,t_2)$ to process the backward traversal. For $Q_B$, the pairs are prioritized in the reverse order of that of $Q_F$. In particular, the objects with the highest reverse activation time are placed on the top of the queue. During the backward traversal, objects visited at the time instance $t$ and reachable from an object $o \in Q_B$ are added to $Q_B$ with the reverse activation time of $t-L$.
During both the forward and backward traversals, whenever an object $o$ is added to $Q_F$ ($Q_B$) with the activation time of $a$, it is verified whether $o$ is in $Q_B$ with a reverse activation time of $a'$ where $a \le a'$. If such a case is discovered, then based on the transitivity property, the reachability query is verified to be true. Otherwise, the query is terminated whenever the forward traversal meets the backward traversal and the entire query interval is covered. In this case, the query is verified to be false.
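The forward and backward bookkeeping can be sketched in memory as follows. This is a simplified illustration, not the B-Traversal implementation: it works on a plain list of hypothetical `(t, u, v)` contact tuples instead of a disk-resident index, it runs both passes to completion rather than interleaving them and stopping early, and it records for each object the earliest time it holds the item (forward) and the latest time by which it must receive the item (backward). Comparing these two times is this sketch's own reading of the meeting test and is stated here as an assumption.

```python
import heapq
from collections import defaultdict


def _adjacency(contacts, t1, t2):
    """Index the contacts that fall inside [t1, t2] by object."""
    adj = defaultdict(list)
    for t, u, v in contacts:
        if t1 <= t <= t2:
            adj[u].append((t, v))
            adj[v].append((t, u))
    return adj


def forward_receive_times(adj, source, t1, latency):
    """Earliest time each object holds the item. The source is treated as if
    it received the item at t1 - latency, so it is active from t1 on."""
    recv = {source: t1 - latency}
    heap = [(t1, source)]                 # (activation time, object)
    while heap:
        a, o = heapq.heappop(heap)
        if a > recv[o] + latency:
            continue                      # outdated queue entry
        for t, other in adj[o]:
            if t >= a and t < recv.get(other, float("inf")):
                recv[other] = t
                heapq.heappush(heap, (t + latency, other))
    return recv


def backward_receive_deadlines(adj, dest, t2, latency):
    """Latest time by which each object must receive the item so that it can
    still reach dest by t2 (the destination itself may receive it up to t2)."""
    deadline = {dest: t2}
    heap = [(-t2, dest)]                  # max-heap on the deadline
    while heap:
        neg_d, o = heapq.heappop(heap)
        d = -neg_d
        if d < deadline[o]:
            continue                      # outdated queue entry
        for t, other in adj[o]:
            # other hands the item to o at time t <= d, so other must have
            # received it no later than t - latency.
            if t <= d and t - latency > deadline.get(other, float("-inf")):
                deadline[other] = t - latency
                heapq.heappush(heap, (-(t - latency), other))
    return deadline


def reachable_with_latency(contacts, source, dest, t1, t2, latency):
    """True if some object is met by both passes with compatible times."""
    adj = _adjacency(contacts, t1, t2)
    recv = forward_receive_times(adj, source, t1, latency)
    deadline = backward_receive_deadlines(adj, dest, t2, latency)
    return any(o in deadline and recv[o] <= deadline[o] for o in recv)
```

With the contacts `[(0, "oi", "oj"), (1, "oj", "ok")]` and a latency of 2, the sketch reports that `oj` is reachable from `oi` during [0,0] and `ok` is reachable from `oj` during [1,1], but that `ok` is not reachable from `oi` during [0,1], which mirrors the failure of plain transitivity discussed above.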
The following theorem states that B-Traversal accurately verifies a reachability query in a contact network with latency.

Theorem 5.3.2  B-Traversal accurately verifies a reachability query in a contact network with latency.
Proof  We need to prove two statements. First, if a contact path is discovered by B-Traversal, this contact path really exists in the contact network. Second, if a collection of contact paths (at least one contact path) exists in the contact network, it will be discovered by B-Traversal. The first statement can be easily verified, as we expand the contact network and at the same time take the latency into consideration through the activation time and the reverse activation time. We prove the second statement by contradiction. Assume that B-Traversal does not find any contact path from the query source to the destination. Assume the reachability query is denoted by $q : o_i \overset{T_p;L}{\rightsquigarrow} o_j$. The forward traversal will expand the contact network for a portion $T_1$ of the query interval and the backward traversal will expand the rest of the query interval, i.e., $T_2 = T_p \setminus T_1$. The forward traversal will expand the same portion of the contact network as any algorithm which expands the contact network in only the forward direction, as it injects a virtual item into the contact network and considers the contact network latency. Assume that the backward traversal expanded the contact network but did not find any contact path from the query source to the destination while at least one, say contact path p, exists. However, this is not possible, as any object on the contact path p can receive an item from the query source and after the activation time the item can travel to the query destination. Therefore, the objects on p should be discovered when traversing the contact network in the backward direction.
5.3.3 IO Complexity
In this section, we analytically study the complexity of query processing. The analysis is similar for both ReachGraph and ReachGrid and hence we focus on ReachGraph in this section.
The set of contacts retrieved during a time interval T is denoted by $C_T$. We count the number of retrieved contacts $|C_{T'_p}|$ during the time interval $T'_p=[t,t']$ when expanding the contact network starting from the query source $o_i$. $T'_p$ is either the entire query interval, if the query destination is not reachable from the source, or the smallest time interval $T'_p=[t,t''] \subseteq T_p$ ($T_p$ is the query interval) during which one can verify that the query destination is reachable from the source. The value of $|C_{T_p}|$ is proportional to the number of IOs during the same query interval. Although with the bidirectional traversal the contact network is traversed in two directions, each traversal is still performed during a time interval and hence the same logic can be applied to each case as well. We assume that each object at each time instance $t \in T_p$ is making contacts with $x$ new objects on average and subsequently find the number of objects which are retrieved during query processing. We also assume without loss of generality that the query interval is $T_p=[0,T]$. Subsequently, we find the number of retrieved contacts over time as follows. During the time interval $T_0=[0,L-2]$ only the query source is active and can transmit an item to another object. Therefore, for each time instance $t \in T_0$, $x$ new objects are discovered to be reachable from the query source and hence $x$ new contacts are retrieved from disk. At the end of $T_0$, $x(L-1)$ contacts have been retrieved. At time instance $L-1$, the objects which were reachable from the query source at time instance 0 become activated, and hence the number of new contacts to be retrieved becomes $(x+1) \times x$, since the query source and the objects activated at $L-1$ are creating contacts each with $x$ new objects. Subsequently, the total number of contacts retrieved becomes $Lx+(x+1)x$. Similarly, at the time instance $L$, the new objects which were found reachable from the query source during the time interval [0,1] are activated and make $(2x+1) \times x$ contacts. As a result, the total number of retrieved contacts becomes $Lx+(x+1)x+(2x+1)x$. The total number of contacts retrieved during a time interval $[t_1,t_2]$ can be obtained as

$$|C_{[t_1;t_2]}| = |C_{[t_1;t_2-1]}| + x \times |C_{[t_1;t_2-L]}| \tag{5.2}$$

where $|C_{[t_1;t_1+L-2]}| = x \times (L-1)$. Assuming that contacts are distributed uniformly on disk and the disk block size is denoted by B, on average $\frac{|C_{[t_1;t_2]}|}{B}$ IOs are required to process the query for the time interval $[t_1,t_2]$ when the entire time interval is processed. Based on Equation 5.2, increasing the value of L will result in retrieving fewer contacts from disk for the worst case of query processing during a specific time interval and hence the number of IOs will decrease as well, and vice versa.
5.3.4 Experiments
In this section, we present our experimental results for reachability query processing in contact networks with latency. The experimental setting in this section is the same as that of Section 4.4. In addition, we need to select the value of latency for our experiments. The value of latency is 20 time instances (two minutes) unless otherwise stated. This selection of value for L represents applications for which the latency is low, such as small file transfer via Bluetooth or message transfer in vehicular networks via the DSRC protocol.

In the rest of this section, we first study ReachGrid and ReachGraph efficiency for reachability query processing in contact networks with latency. Thereafter, we present the empirical comparison between ReachGrid and ReachGraph.
ReachGrid
In this section, we first focus on the efficiency of the index construction and then on the query processing step of ReachGrid.
Index Construction  The performance of ReachGrid depends on the resolution of the temporal and spatial grids, which quantize the time interval T and the environment E, respectively. There is a tradeoff in selecting both temporal and spatial resolutions. By increasing any of the resolutions, the number of random disk accesses increases during query processing and subsequently the number of IOs increases. The reason is that the locality in time and space is not fully leveraged. On the other hand, decreasing the resolution of the grids results in the placement of a huge number of trajectory segments within a grid cell. As a result, many trajectory segments which are irrelevant for query processing are processed for each reachability query. This increases the number of IOs during query processing.
We empirically optimize the grid resolutions by varying both the temporal and spatial
grids and selecting the combination which minimizes the number of IOs when processing
reachability queries. We vary the temporal resolution from 5 to 80 for both datasets and
the spatial resolution from 128 m to 10 km (17 km) for the RWP (VN) datasets. We
denote the optimal spatial and temporal resolutions by $R_S$ and $R_T$, respectively. With
RWP datasets, $R_S$ = 1024 m and $R_T$ = 20 and, accordingly, with VN datasets, $R_S$ = 17 km
and $R_T$ = 20. With VN datasets, the optimal ReachGrid indexes have a lower spatial
resolution than those of RWP datasets. The reason is that VN datasets capture the
movement of fewer objects as compared to RWP datasets and hence the spatial grid cells
are larger in order to place more objects within the same cell. Figures 5.5 (a) and (b)
show how the IO count varies when the spatial and temporal resolutions vary for RWP
datasets, respectively. In Figure 5.5 (a) the temporal resolution is 20 and in Figure 5.5 (b)
the spatial resolution equals 1024 m. Because of lack of space and the fact that the
results for VN datasets follow the same pattern, we do not show the results for VN
datasets.
Figure 5.5: ReachGrid resolutions optimization. (a) IO count vs. spatial grid resolution; (b) IO count vs. temporal grid resolution
We also measured the time required to construct the optimal ReachGrid indexes.
Over all the cases, the index construction time is less than 4.3 hours. We omit further
details of this experiment as it is the same as the ReachGrid construction time discussed
in Chapter 4.
Query Processing Here, we first evaluate the efficiency of the ReachGrid disk placement
technique and thereafter compare the efficiency of the B-traversal technique proposed in
Section 5.3.2 with that of unidirectional traversal.
Figure 5.6: IO count for the row-wise and Hilbert-based disk placement techniques for the RWP$_{20k}$ dataset
We first compare the query processing (B-traversal technique) efficiency of the
Hilbert-based disk placement proposed in Section 5.3.2 with that of the row-based disk
placement proposed in Chapter 4. Figure 5.6 shows the number of IOs for processing
queries with different query interval lengths for RWP$_{20k}$ under the two disk
placement techniques. With the row-based disk placement, the grid cells corresponding
to the same temporal cell are placed on disk in row-wise order. As the results show, the
Hilbert-based disk placement outperforms the row-wise disk placement by 15% on average
in terms of the number of IOs. The reason is that with the Hilbert-based disk placement
the locality of object trajectories, and hence of the contacts, is better preserved on disk.
A similar pattern is observed for the other datasets as well.
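The difference between the two placements can be illustrated with a short sketch that orders the spatial cells of one temporal cell along a Hilbert curve. This uses the standard iterative coordinate-to-index conversion for a Hilbert curve; it is not taken from the dissertation, and the function names as well as the assumption that the grid has `n` by `n` cells with `n` a power of two are ours.

```python
def hilbert_index(n, x, y):
    """Position of cell (x, y) along a Hilbert curve over an n x n grid.

    Assumes n is a power of two and 0 <= x, y < n.
    """
    d = 0
    s = n // 2
    while s > 0:
        rx = 1 if (x & s) else 0
        ry = 1 if (y & s) else 0
        d += s * s * ((3 * rx) ^ ry)
        # Rotate/flip the quadrant so the sub-curve is oriented correctly.
        if ry == 0:
            if rx == 1:
                x = n - 1 - x
                y = n - 1 - y
            x, y = y, x
        s //= 2
    return d


def hilbert_order(cells, n):
    """Sort (x, y) cells in the order they would be written to disk."""
    return sorted(cells, key=lambda c: hilbert_index(n, c[0], c[1]))


def row_wise_order(cells):
    """Row-wise placement for comparison: sort by row, then by column."""
    return sorted(cells, key=lambda c: (c[1], c[0]))


n = 4
cells = [(x, y) for x in range(n) for y in range(n)]
order = hilbert_order(cells, n)
```

In the Hilbert order every pair of consecutive cells is spatially adjacent, whereas the row-wise order jumps across the grid at the end of each row; this is one way to see why spatially close trajectories tend to end up in nearby disk blocks.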
Next, we compare the efficiency of the B-traversal query processing technique with that
of unidirectional traversal. The difference is that with unidirectional traversal the
contact network is traversed only starting from the query source. Figure 5.7 (a) shows
the number of IOs for processing queries with query intervals of length 250 and for
different latency values. The unidirectional traversal is denoted by U-Traversal. Two
observations can be made from this figure. First, the B-traversal technique outperforms
unidirectional traversal by 41% on average. The reason is that with the B-traversal
technique, as the number of objects reachable from the query source (in the forward
traversal) and to the query destination (in the backward traversal) grows during query
processing, the chance that the two sets intersect increases if the query destination is
reachable from the query source. Moreover, the guided traversal from both directions
decreases the number of contacts which are traversed during query processing. The
second observation is that, for a constant query interval, increasing the latency
decreases the number of IOs. The reason is that if a virtual item is generated by the
query source at the beginning of the query interval, fewer objects receive it when the
latency increases.
Figure 5.7: IO count for B-traversal and U-traversal and different contact network latencies. (a) ReachGrid (RWP$_{20k}$); (b) ReachGraph (RWP$_{20k}$)
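The intuition behind bidirectional traversal can be sketched in memory as follows. This is a simplified illustration, not the B-traversal algorithm of Section 5.3.2: it ignores the disk-resident index entirely and assumes a store-and-forward latency model in which an object that is in contact at time `t` with a holder of the item holds the item itself from time `t + latency` onward, with `latency >= 1`. The function name, the `(t, a, b)` contact tuples, and this latency model are all assumptions made for the example.

```python
from math import inf


def b_traversal(contacts, src, dst, t_start, t_end, latency):
    """Decide whether dst is reachable from src within [t_start, t_end].

    contacts: iterable of symmetric (t, a, b) contacts at integer times.
    Assumed model: a transfer started at time t completes at t + latency.
    """
    if latency < 1:
        raise ValueError("this sketch assumes latency >= 1")
    cs = sorted(c for c in contacts if t_start <= c[0] <= t_end)
    arrival = {src: t_start}  # earliest time each object holds the item
    latest = {dst: t_end}     # latest time an object may hold it and still reach dst

    def met(x):
        # The forward and backward frontiers meet at x.
        return arrival.get(x, inf) <= latest.get(x, -inf)

    if met(src):
        return True
    lo, hi = 0, len(cs) - 1
    forward_turn = True
    while lo <= hi:
        if forward_turn:
            t, a, b = cs[lo]
            lo += 1
            for u, v in ((a, b), (b, a)):
                if arrival.get(u, inf) <= t and t + latency < arrival.get(v, inf):
                    arrival[v] = t + latency
                    if met(v):
                        return True
        else:
            t, a, b = cs[hi]
            hi -= 1
            for u, v in ((a, b), (b, a)):
                if t + latency <= latest.get(v, -inf) and t > latest.get(u, -inf):
                    latest[u] = t
                    if met(u):
                        return True
        forward_turn = not forward_turn
    return False


contacts = [(1, "a", "b"), (3, "b", "c"), (5, "c", "d")]
reachable = b_traversal(contacts, "a", "d", t_start=0, t_end=7, latency=2)
```

The forward sweep consumes contacts in increasing time order and the backward sweep in decreasing time order; the search stops as soon as some object is known to receive the item early enough to still forward it to the destination. Under this model a larger latency lets fewer objects receive the item within the same interval, consistent with the second observation above.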
ReachGraph
Here, we first study the efficiency of index construction and afterward the online query
processing approaches of ReachGraph.
Index Construction Here, we empirically measure the contact network size by counting
the number of vertices ($|V|$) and edges ($|E|$) of ReachGraph after the reduction step
when the index is generated for different time intervals $T$. Notice that, in comparison
with the ReachGraph construction in Chapter 4, the first step of reduction is not
possible for contact networks with latency. The results for RWP datasets are shown in
Figures 5.8 (a) and (b) for edges and vertices, respectively. The results for VN datasets
follow a similar pattern and are omitted due to space constraints. The x-axis represents
the length of the time interval $T$ during which the contact network is constructed,
assuming that all time intervals start from the same time instance, i.e., $T = [0, |T|]$. As
expected, $|E|$ and $|V|$ increase when $|T|$ increases. The reason is that the number of
contacts and accordingly the number of edges increase when $|T|$ increases. Likewise, the
number of edges and vertices increases when the number of objects increases. The most
important observation from this experiment is that the contact network can become
prohibitively large to reside in main memory. In particular, the numbers of edges and
vertices are more than 26,199 and 17,815 million for RWP$_{40k}$, respectively.
Figure 5.8: Contact network edges (a) and vertices (b)
Query Processing The disk placement technique proposed in Chapter 4 is readily
applicable to contact networks with latency. Therefore, we use the same approach
for ReachGraph placement on disk. Here, we compare the efficiency of the B-traversal
query processing technique with that of unidirectional traversal in the same setting as
Section 5.3.4. The result is shown in Figure 5.7 (b). Similar to Section 5.3.4, the
B-traversal approach outperforms the unidirectional (U-traversal) technique by 49%
on average. Furthermore, for a constant query interval, increasing the latency
decreases the number of IOs. The reasoning behind both observations is
similar to that for ReachGrid in Section 5.3.4.
Figure 5.9: ReachGrid vs. ReachGraph. (a) RWP$_{20k}$; (b) RWP$_{20k}$
ReachGrid vs. ReachGraph
In this section, we compare the efficiency of ReachGrid and ReachGraph. We generate
random queries with query intervals of 100, 300 and 500 time instances and
compare the number of IOs for the ReachGrid and ReachGraph (B-traversal) approaches.
The results are shown in Figure 5.9 (a) and (b) for the RWP$_{20k}$ and VN$_{2k}$ datasets,
respectively. Based on our results, ReachGrid outperforms ReachGraph by 18% on
average over all the cases. The reason is the complexity of the ReachGraph index in
comparison with ReachGrid. In particular, ReachGraph needs to store the reverse of the
graph for the backward traversal during the B-traversal approach, while this is not the
case for ReachGrid. Furthermore, the disk placement of ReachGraph does not consider
spatial locality, while ReachGrid does so by leveraging a Hilbert-based
placement technique. As the figure shows, with the VN$_{2k}$ dataset ReachGraph
outperforms ReachGrid in some cases. The reason is that the ReachGrid spatial grid
cannot fully leverage spatial locality for non-uniform object distributions. In the future
we plan to consider other spatial indexing techniques with ReachGrid, such as the
R-tree or the quad-tree.
5.4 Extension to Contact Networks with Other Constraints
In this section, we discuss the challenges in extending the ReachGraph and ReachGrid
index structures to contact networks with constraints other than latency. We first
discuss the necessary changes to the index construction of both indexes and thereafter
focus on the query processing modifications.
The ReachGraph index for contact networks with no constraints is constructed by
applying a series of transformations to the TEN model of the contact network. These
transformations include (1) precalculating the contacts of the TEN model of the contact
network, (2) replacing the connected components of the TEN model by hyper vertices,
(3) reducing the redundancy by removing the connected components which are exactly
repeated over consecutive time instances and (4) augmenting the resulting graph by
precalculating the reachability over time intervals at different resolutions.
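As an illustration of step (2), the sketch below groups the objects of a single time instance into connected components using a union-find structure; each resulting component would play the role of one hyper vertex. This is a hedged, in-memory example rather than the construction procedure of Chapter 4, and the function name and the input layout (a list of object identifiers plus a list of contacting pairs at one time instance) are assumptions.

```python
def hyper_vertices(objects, contacts):
    """Return the connected components induced by contacts at one time instance.

    objects: iterable of object ids; contacts: iterable of (a, b) pairs.
    """
    objects = list(objects)
    parent = {o: o for o in objects}

    def find(o):
        # Path halving keeps the trees shallow.
        while parent[o] != o:
            parent[o] = parent[parent[o]]
            o = parent[o]
        return o

    for a, b in contacts:
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra

    groups = {}
    for o in objects:
        groups.setdefault(find(o), set()).add(o)
    return [frozenset(g) for g in groups.values()]


components = hyper_vertices("abcde", [("a", "b"), ("b", "c"), ("d", "e")])
```

Collapsing a component into one vertex is only sound when every object in the component can reach every other one within that time instance, which is the transitivity assumption that the latency discussion later in this section calls into question.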
Let us first focus on contact network constraints which are not spatial constraints.
Depending on the type of the contact network constraint, some of the transformation
steps may not be readily applicable. Complete precalculation of the contacts is not
possible when the contact definition also depends on the input query. For example, with
the inter-contact temporal constraint, two objects need to be in proximity for $t_d$ time
instances, where $t_d$ is determined by the input query. In this case, it is not possible to
precalculate the contacts before the query for all possible values of $t_d$. A possible
solution is to create the contact network at different resolutions and utilize a
combination of resolutions at query time, similar to the augmentation step of the
ReachGraph index construction.
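The dependence on $t_d$ can be made concrete with a small sketch that derives contacts from per-instant proximity records. The input layout, a list of `(t, a, b)` tuples stating that objects `a` and `b` are in proximity at integer time `t`, and the function name are assumptions for illustration; the dissertation does not prescribe this representation.

```python
from collections import defaultdict


def contacts_with_min_duration(proximity, t_d):
    """Return (start, end, a, b) runs of consecutive proximity lasting >= t_d instances."""
    times = defaultdict(list)
    for t, a, b in proximity:
        times[tuple(sorted((a, b)))].append(t)

    result = []
    for (a, b), ts in times.items():
        ts = sorted(set(ts))
        start = prev = ts[0]
        # The trailing None flushes the last run.
        for t in ts[1:] + [None]:
            if t is not None and t == prev + 1:
                prev = t
                continue
            if prev - start + 1 >= t_d:
                result.append((start, prev, a, b))
            start = prev = t
    return sorted(result)


proximity = [(1, "a", "b"), (2, "a", "b"), (3, "b", "a"),
             (5, "a", "b"), (7, "c", "d"), (8, "c", "d")]
short = contacts_with_min_duration(proximity, t_d=2)
long_ = contacts_with_min_duration(proximity, t_d=3)
```

The same proximity data yields two contacts for `t_d=2` but only one for `t_d=3`, so a contact set computed for one value of $t_d$ cannot simply be reused for another; storing the run lengths, or precomputing for a few values of $t_d$, is one conceivable reading of the "different resolutions" idea above.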
The second step of ReachGraph construction may not be applicable for all contact
network constraints either. In particular, if the transitivity property does not hold for
query intervals of length zero, then it is not possible to replace a connected
component with a hyper vertex. For example, this is the case for contact networks with
latency greater than zero. As a result, in the second step of ReachGraph
construction connected components are not formed for some contact network
constraints. Subsequently, the third step of ReachGraph construction, i.e., reduction by
removing the connected components which are exactly repeated over consecutive time
instances, cannot be applied. However, the third step can be applied in a different form,
for example by removing redundant contacts or object trajectories which are exactly
repeated over consecutive time instances.
The final step of the ReachGraph construction is the augmentation step, which may
also not be applicable for some contact network constraints, in particular if the
reachability between a pair of objects for a specific time interval depends on some
parameter which is part of the input query. For example, for contact networks with
latency, the augmentation step is not applicable. However, one may apply the
augmentation step for different resolutions of both time and latency in this example.
We are focusing on such an extension as part of our future work.
For contact network constraints which are spatial constraints, one needs to
associate each contact with spatial information as well to enable correct query
processing. Furthermore, we may impose a spatial index structure on top of ReachGraph
to enable efficient query processing. The type of the index structure should be selected
based on the type of the constraint.
Finally, the placement of ReachGraph on disk should also be optimized to take
the constraints into account. In particular, the objects which are reachable from each
other based on the query constraint parameters should be placed on consecutive disk
blocks, and this placement should consider the fact that the query parameter differs
from one query to another.
With ReachGrid, because the precalculation of the contacts is avoided, the index can
be readily applied to contact networks with different constraints. However, the
placement of the index on disk and the choice of index structure (i.e., the
spatiotemporal grid) may still be modified based on the contact network constraints for
more efficient query processing. For example, with a contact network constraint which
enforces that objects on a contact path should be on a particular sequence of predefined
locations, it is more efficient to index those predefined locations as well in order to
prune a large portion of the search space during query processing.
For the query processing, the first step is to define how to evaluate the constraint on a
contact path. For example, with the uncertainty constraint, which rejects contact paths
with probability less than a threshold $p_T$, one needs to define how the probability of a
contact path is computed. Accordingly, the query processing needs to consider the
probability of each contact path during index traversal for more efficient query
processing. This can be done by utilizing graph traversal techniques such as shortest
path algorithms which prioritize graph expansion. Finally, the bidirectional query
processing is based on the transitivity property, which does not hold for some of the
constraints. For example, consider the object-level constraint which enforces that the
number of objects on a contact path should be less than $k$. It is easy to prove that with
this constraint the transitivity property does not necessarily hold. Although it is still
possible to traverse the ReachGrid index in two directions and in parallel, it is not
possible to leverage the transitivity property for earlier termination of the query
processing.
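A small worked example may help to see why transitivity can fail under the object-level constraint. The notation here is ours and is not taken from the dissertation: $|P|$ denotes the number of objects on a contact path $P$, and $P_1 \cdot P_2$ denotes the concatenation of two paths that share the intermediate object $o_m$.

```latex
% Assumed notation: P_1 is a contact path from o_s to o_m,
% P_2 is a contact path from o_m to o_d, and o_m is counted once.
\begin{align*}
  |P_1 \cdot P_2| &= |P_1| + |P_2| - 1, \\
  \text{with } k = 4,\ |P_1| = |P_2| = 3:\quad
  |P_1| < k,\ |P_2| &< k,\ \text{but } |P_1 \cdot P_2| = 5 \ge k .
\end{align*}
```

So $o_m$ being reachable from $o_s$ and $o_d$ being reachable from $o_m$, each under the constraint, does not by itself imply that $o_d$ is reachable from $o_s$ under the constraint, unless some other, shorter path exists. A bidirectional traversal would therefore have to track object counts on both sides rather than stop at the first intersection.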
6 Conclusions and Future Work
In this dissertation, we introduced and studied, for the first time, the problem of
reachability queries in disk-resident spatiotemporal contact networks in the presence of
application constraints. We first focused on contact networks with no constraints. For
this category of contact networks, we proposed two different indexing approaches,
ReachGrid and ReachGraph, to enable efficient reachability query processing. We
conducted an empirical study with both real and synthetic datasets to evaluate our
proposed techniques. The experimental results show that our proposed techniques
outperform the existing reachability query processing approaches in contact networks
by 76% on average. Next, we extended the reachability query processing to contact
networks with constraints. We introduced and categorized the problem of reachability
query processing in contact networks with constraints. We also focused on the latency
constraint and showed how to extend the ReachGrid and ReachGraph index structures to
solve this problem. In particular, we optimized the ReachGrid index construction
process and also proposed a bidirectional query processing framework for both index
structures. Based on our extensive experimental results, ReachGrid outperforms
ReachGraph in terms of the number of disk accesses during query processing. The
solution proposed for contact networks with latency provides insights on how one can
tackle the problem of reachability query processing in contact networks with general
constraints. Accordingly, we discussed how one can extend ReachGrid and ReachGraph
for contact networks with general constraints.
There are many interesting possible directions for future work. First, in this
dissertation we focused on the problem of reachability of one object (query destination)
from another object (query source) over a period of time (query interval). In the future,
we plan to extend this to consider reachability of many objects from one object (many
query destinations and one query source), one object from many objects (one query
destination but many query sources) and, finally, many objects from many objects (many
query destinations from many query sources).
Second, we plan to extend our proposed approaches to cloud-computing
environments to further enhance the efficiency of query processing. In particular,
the bidirectional contact network traversal can be extended to be performed in
parallel in both directions. Moreover, the extensions of reachability query processing
discussed above can be carried over to cloud-computing environments as well.
Last but not least, we intend to implement and evaluate contact networks with
constraints other than latency as well. Two particular examples are uncertain contact
networks and non-immediate contact networks. With uncertain contact networks, two
objects that are in contact transmit an item with a probability of p. The value of
p depends on various factors such as the distance between the contacting objects. With
non-immediate contact networks, an item is initiated by an object and remains in the
environment for a while until another object receives the item. For example, an
individual may be contaminated by a virus v when using shared materials already
contaminated by v from another individual.
Abstract
In many application scenarios, an item, such as a message, a piece of sensitive information, a contagious virus or a piece of malware, passes between two objects, such as moving vehicles, individuals or cell phone devices, when the objects are sufficiently close (i.e., when they are, so-called, in contact), and some application specific constraints are satisfied. An example of a "constraint" in the transmission of malware is that it takes some time for the malware to be activated on a cell phone before it can be transmitted to another one via Bluetooth. As another example of a constraint, a message passes between two vehicles with a probability which depends on various conditions such as the distance between the vehicles. In such applications, once an item is initiated, it can penetrate the object population through the evolving network of contacts among objects, termed the "contact network". A reachability query evaluates whether two objects are "reachable" through the contact network. In this dissertation, we define and study the reachability query in large (i.e., disk-resident) contact datasets, which verifies whether two objects are reachable through the contact network represented by such contact datasets. The main characteristics of our problem are the large scale of the contact dataset as well as the dynamism of the network which models the contact dataset. This underlying network evolves over the time period during which the contact dataset is constructed, as the objects are moving in the environment and subsequently new contacts appear and old contacts disappear over time.

In this dissertation, due to the complexity of the general problem, we first simplify the problem by focusing on reachability in contact datasets with no constraints. With such contact datasets, an item passes between two objects when they are close enough. We propose two contact dataset indexes, termed ReachGrid and ReachGraph, for efficient reachability query processing. With ReachGrid, at query time only a small necessary portion of the contact dataset is constructed and traversed. With ReachGraph, we precompute and leverage reachability at different scales for efficient query processing. We optimize the disk placement of both indexes for efficient query processing.

Afterward, we extend ReachGrid and ReachGraph to contact networks with constraints. To this end, as a case study we focus on a specific type of constraint, i.e., the latency constraint, and adapt ReachGraph and ReachGrid for efficient reachability query processing. Furthermore, we discuss how to generalize ReachGraph and ReachGrid for contact networks with general constraints based on the insights we obtain from focusing on contact networks with latency.
Asset Metadata
Creator
Shirani-Mehr, Houtan
(author)
Core Title
Efficient reachability query evaluation in large spatiotemporal contact networks
School
Viterbi School of Engineering
Degree
Doctor of Philosophy
Degree Program
Computer Science
Publication Date
07/31/2013
Defense Date
06/14/2013
Publisher
University of Southern California
(original),
University of Southern California. Libraries
(digital)
Tag
contact networks,OAI-PMH Harvest,query processing,reachability query
Format
application/pdf
(imt)
Language
English
Contributor
Electronically uploaded by the author
(provenance)
Advisor
Shahabi, Cyrus (
committee chair
), Kuo, C.-C. Jay (
committee member
), Narayanan, Shrikanth S. (
committee member
)
Creator Email
hshirani@gmail.com,hshirani@usc.edu
Permanent Link (DOI)
https://doi.org/10.25549/usctheses-c3-307725
Unique identifier
UC11294866
Identifier
etd-ShiraniMeh-1904.pdf (filename),usctheses-c3-307725 (legacy record id)
Legacy Identifier
etd-ShiraniMeh-1904.pdf
Dmrecord
307725
Document Type
Dissertation
Format
application/pdf (imt)
Rights
Shirani-Mehr, Houtan
Type
texts
Source
University of Southern California
(contributing entity),
University of Southern California Dissertations and Theses
(collection)
Access Conditions
The author retains rights to his/her dissertation, thesis or other graduate work according to U.S. copyright law. Electronic access is being provided by the USC Libraries in agreement with the a...
Repository Name
University of Southern California Digital Library
Repository Location
USC Digital Library, University of Southern California, University Park Campus MC 2810, 3434 South Grand Avenue, 2nd Floor, Los Angeles, California 90089-2810, USA