MODELING INTERMITTENTLY CONNECTED VEHICULAR NETWORKS
by
Quynh Nguyen
A Dissertation Presented to the
FACULTY OF THE USC GRADUATE SCHOOL
UNIVERSITY OF SOUTHERN CALIFORNIA
In Partial Fulfillment of the
Requirements for the Degree
DOCTOR OF PHILOSOPHY
(ELECTRICAL ENGINEERING)
May 2018
Copyright 2018 Quynh Nguyen
Dedication
To my beloved family: Nguyen Tan Sinh, Nguyen Thi Xuan, Nguyen Thi
Phuong Thao, Nguyen Duy Linh and Titi.
Acknowledgments
I would like to express my sincere gratitude to the significant individuals who helped
me through my PhD study.
Firstly, I would like to thank Professor Bhaskar Krishnamachari, my dear Advisor.
Without his guidance, I would never have been able to go through this journey.
In spite of the setbacks and the doubts, it was an intellectual enrichment and a
personal development opportunity. I have learnt a lot from him, not only about academic
research but also about attitude toward life. He is understanding, perseverant,
inspiring and patient with all of his students. He is truly professional as a teacher,
a researcher and an advisor.
Secondly, I would like to thank all of my thesis committee: Professor Leana Golubchick
and Professor Cyrus Shahabi for their insightful comments on my thesis. I would
also like to thank Dr. Fan Bai and Dr. Ugur for giving me valuable intuition
during the research projects.
My sincere thanks also go to Professor John Silvester, Professor Leana Golubchick,
Professor Murali Annavaram and Professor Cyrus Shahabi for serving on my
qualification committee, and for their understanding and helpful comments on my work.
I also thank my ANRG lab-mates for sharing the journey with me through ups and
downs, for all of their encouragement, memorable discussions and fun moments. I also
thank all the staff from USC, and specifically the EE department, for making my PhD
life run smoother and easier.
I would like to show my appreciation for all the funding support coming from NSF
(this work was supported in part by the US National Science Foundation under grant
CNS-1217260), GM Research, IMSC and Vietnam Education Foundation during my
PhD study.
Last but not least, I would like to dedicate this thesis to my family: my parents
and my sister, for all of their love and support throughout my whole life.
Table of Contents
Dedication
Acknowledgments
List Of Figures
List Of Tables
Abstract
Chapter 1: Introduction
1.1 General Settings
1.1.1 Opportunistic network statistics
1.1.2 Random Walk
1.2 Contributions
1.2.1 Study 1: On Computing the Encounter Distributions of Multiple Random Walkers on Graphs
1.2.2 Study 2: Area-Based Dissemination in Vehicular Networks
1.2.3 Study 3: A Study of Contact Durations for Vehicle to Vehicle Communications
1.2.4 Study 4: Enhanced Random Walk Model
Chapter 2: Background
2.1 Random Walk, Markov Chain and Markov Jump Process
2.2 K-means Clustering
2.3 Two sample hypothesis test
2.3.1 Kolmogorov-Smirnov Test
2.3.2 Anderson Darling Test
2.4 Traditional models in traffic engineering
2.5 Simulation of Urban MObility (SUMO) - Traffic Simulator
Chapter 3: Related Work
3.1 On Computing the Encounter Distributions of Multiple Random Walkers on Graphs
3.2 Area-Based Dissemination in Vehicular Networks
3.3 A Study of Contact Durations for Vehicle to Vehicle Communications
Chapter 4: On Computing the Encounter Distributions of Multiple Random Walkers on Graphs
4.1 Problem Formulation
4.1.1 Computation of distribution of PET
4.1.2 Approximation of IAET distribution
4.2 Illustrative Example
4.2.1 Pairwise Encounter Time
4.2.2 Inter-Any Encounter Time
4.3 Simulation results
4.3.1 Walkers on a circular strip
4.3.2 Walkers on a general connected graph
4.4 PET-IAET computation of multiple communities on general connected graphs
Chapter 5: Area-Based Dissemination in Vehicular Networks
5.1 Simulation methodology
5.2 Problem Formulation
5.3 Simulation results
Chapter 6: A Study of Contact Durations for Vehicle to Vehicle Communications
6.1 Dataset introduction and approach methodology
6.1.1 Shanghai Dataset
6.1.2 Methodology
6.2 Factual observations
6.2.1 Time
6.2.2 Location
6.2.3 Direction
6.2.4 Parameter evaluation
6.3 Conclusion
Chapter 7: Enhanced Random Walk Model
7.1 Random Walk simulation on SUMO
7.1.1 Normal Random Walk
7.1.2 Birth Death Random Walk
7.2 Conventional Origin Destination simulations using SUMO
7.2.1 Improve the fit between RW and ODE
7.2.2 Partition
7.2.2.1 Multiple transition probability matrices
Chapter 8: Conclusions
Reference List
List Of Figures
2.1 SUMO LA simulation
2.2 SUMO LA zoom-in simulation
3.1 Mobility modelling taxonomy
4.1 PET, IAET extraction for a road network scenario
4.2 Circular Random Walk example
4.3 Corresponding Markov Chain for the example
4.4 PET analysis, n=100, k=20, T=10000
4.5 PET and IAET computation for circular random walks
4.6 PET and IAET analysis of random walks on a connected graph, n=12, k=6
4.7 PET and IAET of a general connected graph, n=12, k=6
4.8 PET and IAET of a general connected graph, n=k=36, m=630
4.9 PET and IAET for a 3-communities network, k1=k2=k3=12
4.10 PET and IAET for a 3-communities network, k1=k2=k3=1
5.1 Grid placement partitioning scheme
5.2 Random Voronoi partitioning scheme
5.3 Density-based Voronoi partitioning scheme
5.4 K-Means partitioning scheme
5.5 Example network of 4 areas
5.6 Corresponding constructed Markov chain
5.7 Area Distribution (M=632, N=42)
5.8 Propagation time versus number of cars
5.9 Propagation time versus number of areas
6.1 Shanghai Covered Region
6.2 Shanghai 3x3 grid
6.3 Contact duration over 10 days at 9 areas - 7am
6.4 Contact duration over 10 days at 9 areas - 11am
6.5 Contact duration general statistics of area 5
6.6 Contact duration over 10 days at 9 areas - 7am
6.7 Contact duration over 10 days at 9 areas - 11am
6.8 Contact duration at 9 areas over 10 days - 7am
6.9 Contact duration at 9 areas over 10 days - 11am
6.10 CCDF of aggregated and directed contact duration - 7am
6.11 CCDF of aggregated and directed contact duration - 11am
6.12 Aggregated contact duration at 1 area on 1 day
6.13 Aggregated contact duration for all areas on all days
7.1 Normal RW simulation versus empirical data - Feb 05th
7.2 Birth death RW simulation versus empirical data - Feb 05th
7.3 Birth death RW simulation versus empirical data over multiple days
7.4 OD based SUMO simulation versus empirical data
7.5 Encounter distribution Comparison for multiple communities
7.6 RW model (2 transition probability matrices) versus empirical data
List Of Tables
3.1 Existing models and their characteristics
5.1 Mean total propagation time (5 areas)
6.1 Statistic result: portion of pair of days having similarity over 9 areas
6.2 Statistic result: portion of 9 areas having similarity over 10 consecutive days
6.3 Aggregated contact duration statistics
Abstract
Vehicular communication has gained increasing worldwide interest, and IEEE
802.11p has been standardized to support Wireless Access for Vehicular Environments.
Several consortia (EU C2C-CC [1], US-ITS [2]) have focused on developing
key network technologies for communications, security, wireless, routing, etc. for
future vehicular communication systems. Such vehicular communication systems will
enable efficient new safety and infotainment applications for in-vehicle consumption
by offloading traffic away from increasingly crowded cellular infrastructure.
In this thesis, we study different perspectives of vehicular networks, including
inter-encounter statistics, contact duration and expected data dissemination time,
utilizing Markov chains, the random walk model and simulation using vehicular data
traces of Beijing and Shanghai. First, we model the mobility of nodes and extract their
encounter information in a network using a random walk model. We generate random
walks from statistics extracted from real data traces to reduce the gap between the
random nature of the random walk model and the mobility characteristics of real data
traces. After that, we look into a case study of estimating data dissemination using
area-based models in a vehicular network. Estimating the data dissemination time for
a vehicular network can be considered an application of the encounter information of
opportunistic networks. The third study investigates how time, space and driving
direction affect the aggregated contact duration distribution in the Shanghai data
trace. The last work considers different approaches to enhance the accuracy of the
random walk model in capturing the inter-encounter intervals as well as the inter-any
encounter intervals for vehicles in the network. For all problems, vehicle traces are
processed such that the whole region of interest is divided into multiple zones. Zones
can be chosen as major intersections from a given street map, geographical zones, hot
spots, separate streets or even partial streets. Different area partitioning algorithms
generate different zones. Under coarse-grained data processing, we turn the bounded
region into a graph for a random walk, with zones as the vertices and vehicles moving
from zone to zone following a transition probability matrix extracted from the vehicle
traces. In the data dissemination case study, there is one more useful parameter: the
average staying time of vehicles in each zone. Vehicles staying at the same vertex are
considered to be in contact with each other. Whenever there are encounters, data can
be transferred among the cars. The generated random walks provide movement
information of vehicles in the network as well as their encounter information. With
all the required information, we can proceed to study the pairwise inter-encounter
time, the individual-to-any inter-encounter time, the expected data dissemination
time, as well as the contact duration.
Chapter 1
Introduction
Intermittently Connected Mobile Networks (ICMNs) have become an emerging area
attracting extensive research effort in the past few years. ICMNs possess a dynamic
structure of mobile nodes that communicate with each other through wireless
links and self-organize to form temporary topology and connectivity. In ICMNs, it
is challenging to create an end-to-end path between any pair of nodes due to either
extremely dynamic and node-behavior dependent mobility or very sparse network
architecture [3].
Even though the works in this thesis are applicable broadly to ICMNs, the motivation
and evaluation are entirely based on Vehicular Networks (VANETs), which are likely
to be deployed in the near future, enabled by short range vehicle-to-vehicle radios.
Dedicated short range communications (DSRC) and the standardized IEEE wireless
access in vehicular environments (WAVE) communication stack allow vehicles to share
both safety information (accident prevention, post-accident investigation, traffic jam
warnings) and travel efficiency information (entertainment, weather or spots of
interest). Some unique characteristics of VANETs include predictable mobility (vehicle
movements are constrained by road topology and map layout), no power constraints,
large scale and high computational ability. VANETs are also characterized by their
variable network density and rapid changes in network topology [4]. The traffic density
can be extremely high at rush hour in suburban areas but very sparse in rural
areas or at night. Intermittent connectivity has become a critical challenge for
VANET applications.
In these opportunistic vehicular networks, a contact (an encounter) between any
pair of vehicles is generated when the pair is within communication range of each
other. However, vehicles in the network keep moving in their own ways (different
speeds and directions), so communication links among mobile nodes switch on and off
continuously; a pair is no longer in contact when one car moves out of the
other car's vicinity or when the link quality fluctuates. Therefore, there is hardly
ever a complete path from source to destination, and even discovered complete
paths are very unstable and short-lived. In those intermittently connected
vehicular networks (ICVNs), whose links among nodes are generated highly dynamically
and intermittently, store-carry-forward techniques and opportunistic data
dissemination frameworks are introduced as a way to overcome intermittent connectivity,
although relatively high delay is sometimes the cost of getting information from a
source to a destination. Data transfer is performed by intermediate cars over
multi-hop routing paths, in which a relaying vehicle keeps the data locally, travels
around the network and transmits the message whenever it comes into contact with
another vehicle. The transfer-on-contact process continues from node to node in the
network until the message reaches the destination.
Understanding how the contact process really works and knowing the related encounter
statistics can improve data distribution in terms of either reducing the overall
delay, saving relaying bandwidth, local storage buffer and energy cost at intermediate
nodes, or increasing data throughput and transfer reliability.
A pair of vehicles having a higher meeting frequency and a shorter inter-encounter time
than the others will have a better opportunity to transmit data to each other.
Cars that encounter a larger group of neighboring nodes more frequently
within a shorter time can become better candidates as intermediate nodes
for relaying data in routing algorithms. A contact duration longer than
those of other pairs is a sign that the contact link between the pair is more stable,
and data transferred between them is also more likely to reach the destination
successfully.
1.1 General Settings
1.1.1 Opportunistic network statistics
A few critical encounter statistics have been introduced [5]:
- hitting and meeting times: the expected time until there is a new contact or
encounter between two nodes.
- contact time (encounter duration): the period of time from when two nodes come
into contact until they move out of radio range of each other.
- pair-wise inter meeting time (pair-wise inter encounter time): the period of time
from when a pair of nodes moves out of range of each other until
they encounter each other again.
- inter any encounter time: the period of time from when a specific node moves
out of range of all its neighboring nodes (it becomes isolated) until
it encounters any other node in the network.
Encounter statistics are such crucial metrics in modelling opportunistic networks
that a good prediction of these statistics can help in the design, optimization,
analysis and evaluation of algorithms and protocols for data communication, routing,
dissemination, and storage.
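As a minimal sketch of how contact times and pairwise inter-encounter times could be read off a trace, assume that one pair of nodes is observed at unit time steps and that a list of booleans records whether the pair is within range at each step. The helper name `contact_and_gap_lengths` and the input format are assumptions made for illustration; they are not taken from the thesis.

```python
from itertools import groupby


def contact_and_gap_lengths(in_contact):
    """Split a boolean contact timeline into contact durations and
    pairwise inter-encounter times (both measured in time steps).

    Only gaps that lie between two contacts are counted, because the
    leading and trailing gaps are cut off by the observation window.
    Contact runs that touch the window edges are kept as observed.
    """
    # Run-length encode the timeline: [(value, run_length), ...]
    runs = [(value, sum(1 for _ in group)) for value, group in groupby(in_contact)]

    contact_durations = [length for value, length in runs if value]
    inter_encounter_times = [
        length
        for index, (value, length) in enumerate(runs)
        if not value and 0 < index < len(runs) - 1
    ]
    return contact_durations, inter_encounter_times


# Hypothetical timeline for one pair of vehicles (True = within range).
timeline = [False, True, True, False, False, False, True, False, True, True, True]
durations, gaps = contact_and_gap_lengths(timeline)
print(durations)  # [2, 1, 3]
print(gaps)       # [3, 1]
```

Applying the same run-length idea to the indicator "this node is in contact with at least one other node" would give the inter any encounter time in the same way.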
1.1.2 Random Walk
We generate random walks from statistics extracted from real traffic
data traces to reduce the gap between the random nature of the random walk model and
the mobility characteristics of real data traces. Vehicle traces are processed such
that the whole region of interest is divided into multiple zones. Zones can be chosen
as major intersections from a given street map, geographical zones, hot spots, separate
streets or even partial streets. Different area partitioning algorithms generate
different zones. Under coarse-grained data processing, we turn the bounded region into
a graph for a random walk, with zones as the vertices and vehicles moving
from zone to zone following a transition probability matrix extracted from the vehicle
traces. Vehicles staying at the same vertex are considered to be in contact with
each other (see footnote 1). Whenever there are encounters, data can be transferred
among the cars. The generated random walks provide movement information of vehicles
in the network as well as their encounter information.
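The construction above can be sketched as follows, under the assumption that each vehicle trace has already been mapped to a sequence of zone indices sampled at a common time step. The function names, the input format and the uniform fallback for zones that are never left in the trace are illustrative assumptions, not the processing pipeline used in the thesis.

```python
from collections import defaultdict


def estimate_transition_matrix(zone_sequences, num_zones):
    """Estimate a row-stochastic transition matrix from zone-level traces.

    zone_sequences: one list of zone indices per vehicle, sampled at a
    common time step (hypothetical input format).
    """
    counts = [[0] * num_zones for _ in range(num_zones)]
    for sequence in zone_sequences:
        for current_zone, next_zone in zip(sequence, sequence[1:]):
            counts[current_zone][next_zone] += 1

    matrix = []
    for row in counts:
        total = sum(row)
        if total:
            matrix.append([count / total for count in row])
        else:
            # Zone with no observed departures: uniform row (an assumption).
            matrix.append([1.0 / num_zones] * num_zones)
    return matrix


def encounters_at_step(zone_sequences, step):
    """Return groups of vehicle ids that share a zone at the given step."""
    occupants = defaultdict(list)
    for vehicle_id, sequence in enumerate(zone_sequences):
        occupants[sequence[step]].append(vehicle_id)
    return [vehicles for vehicles in occupants.values() if len(vehicles) > 1]


# Three hypothetical vehicles moving over three zones.
traces = [
    [0, 1, 1, 2],  # vehicle 0
    [2, 1, 0, 2],  # vehicle 1
    [0, 0, 1, 2],  # vehicle 2
]
P = estimate_transition_matrix(traces, num_zones=3)
print(P[0])                           # [0.25, 0.5, 0.25]
print(encounters_at_step(traces, 1))  # [[0, 1]]
```

Treating co-location in a zone as a contact mirrors the modelling approximation stated in the text; a finer-grained model would also check radio range inside the zone.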
1.2 Contributions
Our research to date has consisted of four studies: the first focuses on a methodology
to compute exact or near-exact distributions of encounter statistics for multiple
random walkers in polynomial time; the second analyzes information
dissemination in a large-scale vehicular network using a coarse-grained random walk
model in which the states correspond to large areas; the third analyzes the contact
duration statistics, along with some factors that affect them, using Shanghai
taxi data traces; and the last compares the proposed random walk model with the
traditional origin-destination model in terms of encounter statistics, using both
real-world data traces and the well-known Simulation of Urban MObility traffic
simulator (SUMO [6]), and enhances the current random walk model so as to reduce
the gap between the two models.
Footnote 1: This is a modelling approximation for tractability.
1.2.1 Study 1: On Computing the Encounter Distributions
of Multiple Random Walkers on Graphs
For intermittently connected mobile networks such as sparsely-deployed vehicular
networks, it is of great interest to characterize the distribution of encounter times.
While prior work has focused on characterizing the encounter times observed in empirical
traces, the theoretical understanding of how to compute encounter time distributions
for a given model has lagged behind. We consider a very general mobility model in
which each device is assumed to be moving through a given graph (which could represent,
for instance, a given street map) following a general random walk with arbitrary
transition probabilities. We consider first the pairwise inter-encounter time
distribution for a particular pair of random walkers and present a recursive
polynomial-time computation that yields the exact solution. We then consider the
individual-to-any inter-encounter time (i.e., the time between contacts of a particular
walker with any of the other walkers in the population). For this harder problem, we
give an approximate computation that is also polynomial time. We validate the accuracy
of the presented solutions using numerical simulations. We show how the model can
be generalized to consider multiple populations of devices, each following a different
transition probability matrix. A special case of this generalization is where each
device has a distinct mobility pattern.
1.2.2 Study 2: Area-Based Dissemination in Vehicular
Networks
Pure opportunistic dissemination of content in a vehicular network can incur high
delays if the number of vehicles is relatively low. We consider in this study an
area-based approach to information broadcast in which vehicle to vehicle (V2V)
communication is supplemented with vehicle to infrastructure (V2I) communication in
order to improve the delay performance. We show how area-based dissemination can be
analyzed mathematically using a Markovian (random walk) model. We also investigate
through trace-based simulations how different area-partitioning approaches affect the
total dissemination time.
1.2.3 Study 3: A Study of Contact Durations for Vehicle to
Vehicle Communications
Understanding how the contact process really works and knowing the related encounter
statistics can improve data distribution in terms of either reducing the overall
delay, saving relaying bandwidth, local storage buffer and energy cost at intermediate
nodes, or increasing data throughput and transfer reliability. A pair of nodes
having a higher meeting frequency and a shorter inter-contact time [7, 8, 9] than
the others will have a better opportunity to transmit data to each other. Nodes
that encounter a larger group of neighboring nodes more frequently within a
shorter time can become better candidates as intermediate nodes for relaying
data in routing algorithms. A pair whose contact duration is longer than those
of other pairs has a more stable contact link, and data transferred between them is
more likely to reach a given destination successfully.
Among the multiple parameters of the contact process, contact duration is a
critical one [10, 11]. In this study we present an intensive analysis of vehicular
contact duration based on the data obtained by Shanghai Jiao Tong University
[12]. We make two sets of contributions:
- We analyze the aggregated contact duration distribution based on the real taxi
data trace, and
- We quantify the contact duration conditioned on different parameters including
time, location and vehicle directions.
1.2.4 Study 4: Enhanced Random Walk Model
While the first study approximates the encounter distributions using multiple
random walkers on graphs, in this study we try different approaches to enhance
the accuracy and narrow the gap between the distributions estimated using random
walk models and the empirical distributions obtained from the real traffic
trace. We present a comparison between the random walk model and the traditional
traffic forecasting model using simulations generated with SUMO, a well-known traffic
simulation suite. We conclude that integrating a birth-death process into the random
walk model improves its performance.
Chapter 2
Background
This chapter includes some theoretical background as well as basic definitions for the
four works presented in the thesis.
2.1 Random Walk, Markov Chain and Markov
Jump Process
A random walk [13] is a fundamental stochastic process by which objects move randomly
away from their point of departure. Given a sequence of independent identically
distributed discrete random variables $X_1, X_2, \ldots, X_n$, for each $n$ we define:
\[ S_n = \sum_{i=1}^{n} X_i \tag{2.1} \]
The sequence $\{S_n\}_{n=1}^{\infty}$ is called the random walk.
A Markov chain, according to [14], is a random process on a set of states
$S = \{s_1, s_2, \ldots, s_r\}$ in which the process starts from one of the states $s_i$
and moves to another state $s_j$ in the state space $S$ according to the transition
probability $p_{ij}$ described in the transition probability matrix $P$. The process
begins at an initial distribution defined on $S$. The Markov chain must satisfy the
Markov property, the memoryless property of the stochastic process: the conditional
probability distribution of future states of the process depends only on the present
state.
A Markov jump process [13] is a Markov process defined on a discrete state space but
continuous in time. The time spent in each state follows an exponential distribution.
Finally, a random walk on a graph, that is, the sequence of random nodes visited on a
given graph, is a Markov chain [15].
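The two notions of random walk used above can be illustrated with a short sketch. The step distribution ($\pm 1$ with equal probability), the four-node ring and the function names are assumptions chosen for the example only.

```python
import random
from itertools import accumulate


def simple_random_walk(num_steps, rng):
    """Partial sums S_n = X_1 + ... + X_n with i.i.d. steps X_i in {-1, +1}."""
    steps = [rng.choice((-1, 1)) for _ in range(num_steps)]
    return list(accumulate(steps))


def walk_on_graph(transition_matrix, start, num_steps, rng):
    """Markov chain path: the next node depends only on the current node."""
    nodes = list(range(len(transition_matrix)))
    path = [start]
    for _ in range(num_steps):
        current = path[-1]
        # Row `current` of the matrix holds the probabilities p_ij.
        next_node = rng.choices(nodes, weights=transition_matrix[current])[0]
        path.append(next_node)
    return path


# Hypothetical ring of 4 nodes: move to either neighbour with probability 1/2.
ring = [
    [0.0, 0.5, 0.0, 0.5],
    [0.5, 0.0, 0.5, 0.0],
    [0.0, 0.5, 0.0, 0.5],
    [0.5, 0.0, 0.5, 0.0],
]
partial_sums = simple_random_walk(20, random.Random(0))
path = walk_on_graph(ring, start=0, num_steps=50, rng=random.Random(1))
print(partial_sums)
print(path)
```

Running several such walkers on the same graph and recording when two of them occupy the same node gives simulated encounter times that can be compared against analytical results.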
2.2 K-means Clustering
In the chapter "Area-Based Dissemination in Vehicular Networks", we use k-means
clustering as a technique to partition the whole region. K-means clustering is a
well-known unsupervised clustering algorithm that classifies a given dataset into K
groups based on their attributes and features [16]. After picking an initial set of K
centroids, the algorithm alternates between two steps:
- For each observation of the dataset, calculate the distance from it to all
of the centroids. The observation belongs to the cluster of the centroid at
minimum distance.
- Update the locations of the K centroids to be the new means of their clusters.
This process is repeated until convergence, that is, until no observation
moves to a new cluster.
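The two alternating steps can be sketched in plain Python as below. The text does not say how the initial centroids are picked, so this sketch takes them as an explicit argument; the function name, the 2-D tuples and the choice to keep an empty cluster's centroid unchanged are assumptions for illustration.

```python
import math


def k_means(points, initial_centroids, max_iterations=100):
    """Lloyd-style k-means on tuples of coordinates.

    Returns (centroids, labels), where labels[i] is the cluster index
    assigned to points[i].
    """
    centroids = [tuple(c) for c in initial_centroids]
    k = len(centroids)
    labels = None
    for _ in range(max_iterations):
        # Assignment step: each observation joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: math.dist(point, centroids[c]))
            for point in points
        ]
        if new_labels == labels:
            break  # converged: no observation changed cluster
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = tuple(sum(axis) / len(members) for axis in zip(*members))
    return centroids, labels


# Two hypothetical groups of vehicle positions.
positions = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centroids, labels = k_means(positions, initial_centroids=[(0, 0), (10, 10)])
print(labels)  # [0, 0, 0, 1, 1, 1]
print(centroids)
```

The result of k-means depends on the initial centroids, so in practice the algorithm is often restarted from several initializations and the partition with the smallest total within-cluster distance is kept.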
2.3 Two sample hypothesis test
Hypothesis testing is a procedure which uses probability theory to determine whether
a hypothesis is a reasonable statement (not rejected) or not reasonable (rejected).
2.3.1 Kolmogorov-Smirnov Test
The Kolmogorov-Smirnov test [17, 18] is a nonparametric hypothesis test (a
distribution-free test, in which the distribution under investigation is not
assumed to follow a specific distribution) used to decide whether a sample comes from a
population with a known distribution (one-sample KS test) or whether two samples come
from the same distribution (two-sample KS test). In the Shanghai contact duration
study, we are interested in the two-sample KS test because we want to verify whether
the contact duration distribution is consistent under different factors (time,
location, direction).
Let:
- $x_1, x_2, \ldots, x_m$ be independent and identically distributed observation
samples with cumulative distribution function $F$.
- $y_1, y_2, \ldots, y_n$ be independent and identically distributed observation
samples with cumulative distribution function $G$.
The null hypothesis is stated as:
\[ H_0 : F = G \tag{2.2} \]
The alternative hypothesis is:
\[ H_1 : F \neq G \tag{2.3} \]
Furthermore, let $F_m(x)$ and $G_n(x)$ be the corresponding empirical cdfs. The test
statistic is:
\[ D_{m,n} = \sqrt{\frac{mn}{m+n}} \, \sup_x \left| F_m(x) - G_n(x) \right| \tag{2.4} \]
Moreover, according to [17], when both of the underlying distributions are continuous,
the distribution of $D_{m,n}$ under $H_0$ does not depend on $F$ and $G$. Furthermore,
with $H(t)$ denoting the Kolmogorov-Smirnov distribution:
\[ P(D_{m,n} \le t) = P\left( \sqrt{\frac{mn}{m+n}} \, \sup_x \left| F_m(x) - G_n(x) \right| \le t \right) \to H(t) \tag{2.5} \]
\[ H(t) = 1 - 2 \sum_{i=1}^{\infty} (-1)^{i-1} e^{-2 i^2 t^2} \tag{2.6} \]
If the null hypothesis is true, the distribution of $D_{m,n}$ depends only on $m$ and
$n$, and if $m, n$ are large enough, it can be approximated by the Kolmogorov-Smirnov
distribution. If the null hypothesis fails, then for large $m, n$ and a small enough
$\delta > 0$ we have:
\[ D_{m,n} = \sqrt{\frac{mn}{m+n}} \, \sup_x \left| F_m(x) - G_n(x) \right| > \sqrt{\frac{mn}{m+n}} \, \delta \to \infty \quad \text{as } m, n \to \infty \tag{2.7} \]
To test the null hypothesis, we define a threshold $c$ and the decision rule:
\[ \text{decision} = \begin{cases} H_0 : & D_{m,n} \le c \\ H_1 : & D_{m,n} > c \end{cases} \tag{2.8} \]
The threshold is determined by the level of significance $\alpha$:
\[ \alpha = P(D_{m,n} > c \mid H_0) \tag{2.9} \]
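A minimal sketch of the two-sample procedure, assuming continuous data and samples large enough for the asymptotic approximation to be reasonable: it computes the scaled statistic of (2.4) and an approximate p-value from the standard Kolmogorov series $H(t) = 1 - 2\sum_{i \ge 1} (-1)^{i-1} e^{-2 i^2 t^2}$. The function names and the truncation tolerance are illustrative assumptions.

```python
import math
from bisect import bisect_right


def kolmogorov_cdf(t, tolerance=1e-12):
    """H(t) = 1 - 2 * sum_{i>=1} (-1)^(i-1) * exp(-2 * i^2 * t^2)."""
    if t <= 0:
        return 0.0
    total, i = 0.0, 1
    while True:
        term = math.exp(-2.0 * i * i * t * t)
        total += (-1) ** (i - 1) * term
        if term < tolerance:
            break
        i += 1
    return min(1.0, max(0.0, 1.0 - 2.0 * total))


def ks_two_sample(x, y):
    """Return (D_{m,n}, approximate p-value) for samples x and y."""
    xs, ys = sorted(x), sorted(y)
    m, n = len(xs), len(ys)
    # The supremum of |F_m - G_n| is attained at one of the sample points.
    sup_gap = max(
        abs(bisect_right(xs, v) / m - bisect_right(ys, v) / n) for v in xs + ys
    )
    d = math.sqrt(m * n / (m + n)) * sup_gap
    p_value = 1.0 - kolmogorov_cdf(d)  # asymptotic P(D_{m,n} > d | H_0)
    return d, p_value


# Hypothetical contact-duration samples (seconds) from two conditions.
d_same, p_same = ks_two_sample([1, 2, 3, 4], [1, 2, 3, 4])
d_diff, p_diff = ks_two_sample([1, 2, 3], [4, 5, 6])
print(d_same, p_same)
print(d_diff, p_diff)
```

Rejecting $H_0$ when the p-value falls below $\alpha$ is equivalent to comparing $D_{m,n}$ with the threshold $c$ that satisfies $H(c) = 1 - \alpha$. For samples as small as in this toy example the asymptotic p-value is only a rough guide.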
2.3.2 Anderson Darling Test
Another powerful two-sample test for comparing data distributions is the Anderson-Darling
test [19, 20]. This test is more suitable for data sets of smaller size and better at
detecting very small differences between two distributions.
\[ AD = \frac{1}{mn} \sum_{i=1}^{n+m} \left( N_i Z_{n+m-ni} \right)^2 \frac{1}{i Z_{n+m-i}} \tag{2.10} \]
In the above equation, $Z_{n+m}$ represents the combined and ordered samples $X_n$ and
$Y_m$, and $N_i$ represents the number of observations in $X_n$ which are equal to or
smaller than the $i$-th observation in $Z_{n+m}$. The null hypothesis is rejected if the
computed AD is larger than the corresponding critical value in [20].
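For illustration, the sketch below computes the two-sample Anderson-Darling statistic in the rank form commonly attributed to Pettitt, $\frac{1}{mn} \sum_{i=1}^{n+m-1} \frac{(N_i (n+m) - n i)^2}{i \,(n+m-i)}$, with $N_i$ defined as in the text. This is an assumption about the intended computation and may not match the notation of (2.10) exactly; the sketch also assumes there are no ties in the pooled sample. Critical values are not included and would have to be taken from [20].

```python
from bisect import bisect_right


def anderson_darling_two_sample(x, y):
    """Two-sample Anderson-Darling statistic (rank form, no ties assumed).

    x plays the role of X_n (n observations), y of Y_m (m observations).
    """
    n, m = len(x), len(y)
    total = n + m
    xs = sorted(x)
    pooled = sorted(list(x) + list(y))
    statistic = 0.0
    for i in range(1, total):  # i = 1 .. n + m - 1
        # N_i: number of x observations <= the i-th smallest pooled value
        n_i = bisect_right(xs, pooled[i - 1])
        statistic += (n_i * total - n * i) ** 2 / (i * (total - i))
    return statistic / (m * n)


# Interleaved samples look alike; fully separated samples do not.
ad_interleaved = anderson_darling_two_sample([1, 3, 5], [2, 4, 6])
ad_separated = anderson_darling_two_sample([1, 2, 3], [4, 5, 6])
print(ad_interleaved)
print(ad_separated)
```

The weights $1 / (i\,(n+m-i))$ emphasize the tails of the pooled sample, which is why this statistic tends to be more sensitive than the KS statistic to differences away from the median.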
2.4 Traditional models in trac engineering
Traditional forecasting trac models includes two major trends: four-step models
(urban transportation planning) and activity based models. The rst one aims at
trac forecasting for dierent zones in the whole region while the latter performs
prediction for individuals along with their specic activities.
14
In trac engineering, the conventional four-step transportation forecasting process
includes [21, 22]:
Trip Generation: identifying number of trips originating from or destined for
a particular trac analysis zone. In this step, these attractions or productions
statistics are often estimated using household surveys.
Destination Choice: developing a trip table, a matrix displaying number of
trips going from each origin to each destination. How the trips are distributed
among these zones in the whole region are predicted in this step. The Origin
Destination matrix estimation is very important. The most common method
to estimate OD matrix is the gravity model.
Mode Choice: allowing the modeler to determine what mode of transport will
be used. The result of this step is dierent OD matrices for dierent travel
modes.
Route Choice: describing the selection of routes between origins and destina-
tions in transportation networks. Trac constraints include capacity, speed
limit, speed-delay curves as well as human behaviors are taken into account in
this step for optimization.
Similarly, activity based models procedure also includes:
Activity Generation: activities are generated
Destination Choice: destinations for the activities are identied
15
Mode Choice: travel modes are determined
Route Choice: route used for each trip is predicted
We can see clearly that activity based models enable individual activity details
like constrained time and space conditions specically for dierent activities while
four-step models focus more on zone aggregation.
In this thesis, because of the coarse-grained embedding of the time-series
characteristics of the random walk, we consider only the traditional four-step
transportation forecasting model. In the real world, each vehicle typically starts
from a specific origin and heads for a corresponding destination. For example,
public utility vehicles, taxis and commercial vehicles leave their stations to follow
a scheduled route with regular stops (buses) or to serve temporary targets during
the day (taxis), and finally return to their home stations. Private vehicles leave
home for schools and offices in the morning and head home in the evening.
Consequently, Origin-Destination (OD) estimation (the second step in the above
model) is the crucial step in transportation planning, design and implementation.
OD matrices provide vehicle flow information from one geographical area to
another.
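The gravity model mentioned above can be illustrated with a short sketch. It assumes a singly constrained form in which trips from zone i to zone j are proportional to the attraction of j divided by a power of the travel cost; the exponent `beta`, the cost matrix and the zone totals are hypothetical values chosen only for illustration and are not taken from this thesis.

```python
import numpy as np

def gravity_od(productions, attractions, cost, beta=2.0):
    """Singly constrained gravity model (illustrative assumption).

    T[i, j] = O_i * A_j * c_ij^(-beta) / sum_j' (A_j' * c_ij'^(-beta)),
    so every row i sums to the trips produced in zone i.
    """
    productions = np.asarray(productions, dtype=float)
    attractions = np.asarray(attractions, dtype=float)
    friction = np.asarray(cost, dtype=float) ** (-beta)   # impedance term
    weights = attractions[None, :] * friction
    weights /= weights.sum(axis=1, keepdims=True)          # share of each destination
    return productions[:, None] * weights

# Hypothetical three-zone example (diagonal = intra-zone travel cost).
productions = [100.0, 200.0, 50.0]
attractions = [150.0, 120.0, 80.0]
cost = [[1.0, 4.0, 6.0],
        [4.0, 1.0, 3.0],
        [6.0, 3.0, 1.0]]
od = gravity_od(productions, attractions, cost)
print(np.round(od, 1))
```

Each row of `od` sums to the production of the corresponding zone, which is the only constraint this simple variant enforces.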
2.5 Simulation of Urban MObility (SUMO) -
Traffic Simulator
SUMO (Simulation of Urban MObility) is a widely used open-source traffic simulation
suite for modelling traffic systems [6]. SUMO is a microscopic traffic simulator built
on an extension of the car-following model developed by Stefan Krauß. Since its first
open-source release in 2002, SUMO has offered a full-featured suite of traffic
modelling utilities and has been used to investigate several research topics, including
vehicle-to-vehicle (V2V) and vehicle-to-infrastructure (V2I) communication, route
choice and dynamic navigation, traffic light algorithms, and the evaluation of traffic
surveillance systems [23]. SUMO has also been used in many traffic research projects:
TrafficOnline (determining travel times using GSM telephony data), iTETRIS (coupling
the communication simulator ns3 with SUMO, with in-depth descriptions of V2X-based
traffic management), VABENE (implementing a system that helps public authorities
decide which action to take during big events or catastrophes), CityMobil (setting up
different scenarios of automated cars or personal rapid transit) [24], as well as
ongoing work on emission and noise modeling.
Figure 2.1 shows the OpenStreetMap data for Los Angeles imported into SUMO,
while Figure 2.2 presents a snapshot of a SUMO traffic simulation of
Los Angeles.
Figure 2.1: SUMO LA simulation
Figure 2.2: SUMO LA zoom-in simulation
Chapter 3
Related Work
Mobility modelling in general, and encounter modelling in particular, for
Intermittently Connected Mobile Networks has attracted wide interest from researchers
in the opportunistic network area because of its critical importance in studying,
analyzing and developing network protocols and forwarding algorithms, as well as in
establishing their practical validity. Different taxonomies have been proposed for
mobility modelling in Intermittently Connected Mobile Networks [25, 26, 27]. We
combine them in the diagram of Figure 3.1.
Figure 3.1: Mobility modelling taxonomy
In mobility modelling for delay tolerant networks, three major factors influence
mobility characteristics: temporal, spatial and social constraints [28]. From the
temporal perspective, humans and vehicles are only active (in movement) during
specific time periods of the day; in vehicular networks there are rush hours and
off-peak periods at night. From the spatial perspective, humans do not move randomly
but strictly follow routes on a map and are restricted by building structures and
other obstacles; they also tend to have preferred locations between which they move
back and forth. From the social perspective, humans have their own communities and
relationships, as well as their own driving behaviors, interests and preferences, all
of which affect their mobility patterns. Social relationships and community-forming
rules should be studied and derived carefully from social network data traces and
human agenda surveys, while spatial constraints should be obtained from real maps.
In this work, however, we consider only the temporal and spatial perspectives;
social constraints are left for future work.
Random walks and their variants have been applied in Delay Tolerant Networks
because they admit tractable mathematical analysis. Nonetheless, most traditional
random walk models belong to the class of Random Mobility Models, in which movement
is governed by simplified constraints with random speeds, random directions or
random pauses. Representatives of this class are the Random Waypoint model [29] and
its two variants, the Random Walk and Random Direction models. Specifically, in the
Random Walk model, every node in the network moves with a random speed, direction
and destination; on reaching the destination, the node picks new random parameters
for the next leg. Random Waypoint is another well-known model in which each node
moves at a constant speed towards a randomly chosen destination, waits for a random
pause time, and then chooses another destination. Similarly, in the Random Direction
model, a mobile node only pauses when it reaches the boundary of the simulation area
and then travels towards a new location. In their series of studies on intermittently
connected mobile ad hoc networks, Spyropoulos et al. also assume these types of
mobility models, but more generally [30, 31, 32]: all nodes move according to some
stochastic mobility model whose meeting times are approximately exponentially
distributed.
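As an illustration of the Random Waypoint behavior described above, the following is a minimal sketch for a single node in a rectangular area. The area size, speed, pause range and the function name `random_waypoint` are assumptions made for this example only; they do not come from [29] or from this thesis.

```python
import math
import random

def random_waypoint(steps, width=100.0, height=100.0, speed=1.0,
                    max_pause=5, seed=0):
    """Simulate one Random Waypoint node; returns the list of positions."""
    rng = random.Random(seed)
    x, y = rng.uniform(0, width), rng.uniform(0, height)
    tx, ty = rng.uniform(0, width), rng.uniform(0, height)  # current waypoint
    pause = 0
    trace = [(x, y)]
    for _ in range(steps):
        if pause > 0:
            pause -= 1                      # waiting at the waypoint
        else:
            dist = math.hypot(tx - x, ty - y)
            if dist <= speed:
                x, y = tx, ty               # waypoint reached in this step
                pause = rng.randint(0, max_pause)
                tx, ty = rng.uniform(0, width), rng.uniform(0, height)
            else:
                x += speed * (tx - x) / dist  # constant-speed move towards it
                y += speed * (ty - y) / dist
        trace.append((x, y))
    return trace

trace = random_waypoint(steps=500)
print(trace[:3])
```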
A group of map-constrained mobility models, which take maps as a geographical
constraint, adds realism to mobility modelling, although these inputs make
mathematical analysis more difficult. Some existing models and their
characteristics [27] are summarized in Table 3.1.
We have tried to integrate random walks with map-based input, aiming at a
tractable mobility model for studying encounter statistics.
Table 3.1: Existing models and their characteristics
Model                    Speed   Direction  Pause time  Travel length  Area pattern  Routes
Random Walk              random  random     NA          NA             NA            NA
Random Waypoint          random  random     random      NA             NA            NA
Random Direction         random  random     random      NA             NA            NA
Levy Walks               random  random     power-law   power-law      NA            NA
Map-Based                random  random     NA          NA             map           NA
Shortest Path-Based Map  random  random     NA          NA             map           shortest path
Route-Based Map          random  random     NA          NA             map           predetermined
Manhattan Mobility       random  random     NA          NA             grid          predefined probability
Numerous empirical studies have tried to reveal the characteristics of inter-contact
time in various categories of wireless networks. Investigations into the encounters
of wireless local area network (WLAN) users based on USC WLAN traces have shown a
BiPareto distribution, and the richness of connectivity enables information flooding
without infrastructure [33]. Passarella et al. provided a statistical analysis of
pairwise inter-contact patterns in three different Delay Tolerant Network data sets
(Dartmouth, iMote and MIT) and showed that log-normal and exponential curves fit the
inter-contact time distribution well [34], while other researchers investigated the
characteristics of inter-contact time in Mobile Ad Hoc Networks through empirical
observation of taxi and bus traces in Shanghai [35]. The latter concluded that the
inter-contact time has an exponential tail, in contrast to the previously well-known
power-law result [36].
While there is still some controversy among previous works analyzing the statistics
of real traces, many theoretical models have also been proposed to characterize the
mobility of such opportunistic networks from the ground up. Among both analytical
and simulation approaches, the tractable, correlation-capturing randomness inherent
in the random walk model has made it a popular candidate for simulating the random
movement of mobile nodes in Wireless Ad Hoc Sensor Networks [25] and Vehicular Ad
Hoc Networks [26]. The exact mean encounter time is another crucial topic for many
authors using the random walk model [37], [38], [39]. Moreover, James et al.
investigated the encounter probability for pairs of random walkers on 1D, 2D and 3D
lattices [40]. Another analytical model looked into the distribution of inter-contact
time as well as the aggregated inter-encounter time in a general heterogeneous
network [41]. In our work, we also choose the random walk as a tool to tackle the
problem, but in our encounter statistics modeling we aim at numerically estimating
the full distributions of the aggregated pairwise inter-encounter time and the
inter-any encounter time. We also aim to use multiple random walk models to analyze
information dissemination and routing mechanisms, as demonstrated in our prior work
on area-based dissemination.
3.1 On Computing the Encounter Distributions
of Multiple Random Walkers on Graphs
Inter-contact time has attracted wide interest from researchers in the networking
area because of its critical importance in developing forwarding algorithms and
establishing their practical validity. Numerous empirical studies have tried to
reveal the characteristics of inter-contact time in various categories of wireless
networks. Investigations into the encounters of wireless local area network (WLAN)
users based on USC WLAN traces have shown a BiPareto distribution, and the richness
of connectivity enables information flooding without infrastructure [33]. Passarella
et al. provided a statistical analysis of pairwise inter-contact patterns in three
different Delay Tolerant Network data sets (Dartmouth, iMote and MIT) and showed that
log-normal and exponential curves fit the inter-contact time distribution well [34],
while other researchers investigated the characteristics of inter-contact time in
Mobile Ad Hoc Networks through empirical observation of taxi and bus traces in
Shanghai [35]. The latter concluded that the inter-contact time has an exponential
tail, in contrast to the previously well-known power-law result [36].
While the results obtained from real traces and statistical approaches remain
controversial, many theoretical models have tried to characterize the mobility of
such opportunistic networks. Because of its tractable mathematical analysis, the
random walk model and its variants have become popular candidates for simulating the
random movement of mobile nodes in Wireless Ad Hoc Sensor Networks [42], Vehicular
Ad Hoc Networks [26] and Delay Tolerant Networks [27]. Kalay [38] studied the
statistics (including mean and variance) of the first passage time (the time at which
a specific node encounters its target) in a finite 1D lattice partitioned into
domains and in a 2D lattice, for a single random walker and an immobile target.
Furthermore, the first passage time distribution for the former case was presented.
Moreover, Colin Cooper et al. gave precise results for the cover time (the time to
broadcast a piece of information to all of the particles, given that they can
communicate with each other when meeting at a vertex) of multiple random walkers on
random regular graphs under different scenarios [39]. Furthermore, James et al.
investigated the encounter probability for an individual pair of random walkers on
1D, 2D and 3D lattices [40], while another analytical model looked into the
aggregated inter-encounter time, given that the distribution of the individual pairs'
inter-contact time is known in advance, for both unified and general heterogeneous
networks (in which not all pairs have the same contact patterns) [41]. Finally,
Sanders [37] provided the exact mean encounter time of a given particle in a system
of multiple particles undergoing random walks and indicated that:
"the full probability distribution of encounter times, and the effect of
different network structures on those results are subjects for future studies"
While all of these previous works presented statistical results, including the mean
and variance of the first hitting time, the mean cover time, the encounter
probability of an individual pair, the mean inter-contact time, and the distribution
of the aggregated pairwise inter-contact time given prior knowledge of the individual
pairs' inter-contact times, our work presents the first method to numerically
calculate the aggregated pairwise inter-encounter time distribution and to
approximate the inter-any encounter time distribution for a particular walker with
the other walkers in a general graph, given their movement patterns.
3.2 Area-Based Dissemination in Vehicular
Networks
Mobility has been considered a crucial aspect affecting the accuracy of connectivity
and performance analysis of VANETs [43]. Markov mobility models have been widely
used as a strong mathematical tool for evaluating mobility in MANETs [44][45]. In
VANETs, [46] utilized a Markov chain to predict vehicle directions and velocities,
and in a closely related work, [47] showed through validation over real data sets
that the Markov Jump mobility model could be used to predict network performance
parameters accurately. Further, data dissemination in networks has attracted numerous
researchers. An epidemic model based on the extracted spatial mobility of the users
of a network of devices is developed in [48]. In [49], the authors described a data
dissemination scheme that buffers and retransmits data at the intersections of the
network. The authors of [50] addressed the fluctuating connectivity of V2V
communication by proposing a Voronoi-based algorithm for the placement of RSUs in
V2I networks, while in [51], a forwarding algorithm is presented to extend the
coverage of Road Side Units (RSUs) in a V2I infrastructure network. Our work builds
on some of these prior works and makes a complementary, novel contribution by
analyzing the area-dissemination process taking advantage of both V2I and V2V
communications.
3.3 A Study of Contact Durations for Vehicle to
Vehicle Communications
Contact duration has become an important parameter used by many researchers in
different protocols and techniques to enhance data access, data transfer and network
connectivity in opportunistic networks, both in delay tolerant networks in general
and in vehicular networks. The authors in [52] presented the Link Contact
Duration-based Routing Protocol to deliver as many messages as possible within a
short time. In [53], which concerns PRoPHET, a probabilistic routing protocol for
Delay Tolerant Networks, both the inter-meeting time and the contact duration are
taken into account when computing the delivery probability, in order to improve the
performance of the proposed routing protocol. Another work that applies contact
duration and meeting frequency to estimate the message delivery probability and
presents a novel routing algorithm is [54]. X. Zhuo et al. also used contact duration
to enhance the traditional cooperative caching protocol and improve the performance
of data access in DTNs. In [11], the same technique is used to improve data
replication for data sharing in DTNs.
Furthermore, other works have tried to characterize contact duration patterns in
vehicular networks. Y. Li et al. [10] carried out experiments using Beijing and
Shanghai traces to study contact duration characteristics. They concluded that the
contact duration obeys an exponential distribution up to a characteristic time
point, beyond which it decays as a power law. While many works such as these have
used contact duration as a crucial factor to enhance the performance of their
protocols in DTNs and VANETs, our work focuses on the impact of multiple factors
(time, location and direction) on the contact duration distribution, based on an
intensive analysis of contact information obtained from the Shanghai taxi trace.
Chapter 4
On Computing the Encounter Distributions of
Multiple Random Walkers on Graphs
The work in this chapter is based on [55].
Intermittently connected mobile networks (ICMN) are networks in which communication
links and contacts among mobile nodes are not fixed but are created dynamically and
intermittently. Nodes in an ICMN keep moving, so the contact links between them
switch on and off continuously. There may be no end-to-end path between a given
source and destination in such a network. Such networks are relevant, for instance,
to peer-to-peer mobile device applications and to sparsely deployed vehicular
networks with short-range vehicle-to-vehicle communication links.
Modelling of ICMN aims at predicting different parameters and features of mobile
networks in order to propose suitable algorithms, enhance routing performance and
reduce the total delay of data dissemination in networks of vehicles. In an ICMN,
because the links among nodes are unstable, a store-and-forward scheme is used in
which a node keeps the data, travels, and transmits the data whenever there is a
contact. If two nodes meet each other more often, they have a higher probability of
transmitting data to each other. And if a node meets any other node soon after an
encounter in which it received some data, it can help disseminate that data faster
through the network. Therefore, inter-encounter times are a key metric
characterizing such opportunistic networks.
We consider in this work a general model of multiple random walkers (representing
the mobile devices) travelling with a random movement pattern in a network
represented by a graph. All of the walkers make a movement decision in every time
slot; each walker follows a transition probability matrix, either staying at its
current location or moving from one vertex to another. We consider in this chapter
two important random variables related to the encounter process: the Pairwise
Encounter Time (PET), the inter-encounter time between any pair of walkers, and the
Inter-Any Encounter Time (IAET), the inter-encounter time between a particular
walker and any of the other walkers in the network.
The following are our key contributions:
• We present the first exact computation of the PET distribution.
• We present the first approximate computation of the IAET distribution on a
general connected graph and validate the approximation through numerical
simulations.
• For greater generality, we further extend our results from the single-community
case (all walkers follow the same movement pattern) to the multiple-community
case (multiple groups of walkers in the network, wherein the walkers within
each group follow the same movement pattern but different groups have different
patterns). For this generalization, too, the complexity of our computations
remains polynomial. In addition, we validate this extension through numerical
simulations.
4.1 Problem Formulation
We formulate the problem as a random walk model. Vehicles are considered random
walkers, while the road map is represented as a directed connected graph $G(V, E)$.
There are $k$ walkers in total walking on the graph, which has $n = |V|$ vertices
and $m = |E|$ edges. A walker at a vertex can move to any of its neighboring
vertices or stay at that vertex, following a transition probability matrix $P$. The
set of available locations of a single walker is $V = \{1, 2, 3, \ldots, n\}$.
$P(x, y)$ is the probability that a vehicle moves from vertex $x$ to vertex $y$. We
want to study the PET (Pairwise Encounter Time) and the IAET (Inter-Any Encounter
Time) in a vehicular network. The PET is the inter-encounter time of a pair of
vehicles, while the IAET is the inter-encounter time between a particular walker and
any of the other walkers in the network.
Assume that we have the road network scenario described in Figure 4.1a. There are
four areas $A_1, A_2, A_3, A_4$, identified by the four squares in the map. There
are also three car paths, drawn as the solid, dashed and dotted lines, along with
timestamps specifying the area ($A_1$, $A_2$, $A_3$ or $A_4$) in which each car is
located at each time. From the road network scenario, we can generate a
corresponding movement graph as in Figure 4.1b, in which each area is a node and
each car is a random walker. At the beginning of each time slot, every car decides
either to move to another area or to stay in the same area. Any two cars at the same
node in a given time slot are considered to encounter each other. Given the
positions of all vehicles in the system in all time slots, we can derive the
transition probability matrix illustrated in Figure 4.1c, which captures the
movements of the vehicles in the system. Consequently, we can proceed to perform
simulations using a random walk on the graph and calculate the corresponding
Pairwise Encounter Time (PET) and Inter-Any Encounter Time (IAET). Finally, Figure
4.1d illustrates an example of the PET and IAET calculation for the scenario. The
dashed line represents walker 1, the dotted line represents walker 2, and the solid
path shows the movement of walker 3. Collecting all PETs for all pairs of cars and
all IAETs for all cars in the network allows us to estimate the corresponding
distributions for the given scenario.
In the following subsections, we demonstrate how to compute the PET distribution
exactly and the IAET distribution approximately.
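The extraction steps of this scenario can be sketched in a few lines. The three traces below are hypothetical (they are not the ones drawn in Figure 4.1), areas A1 to A4 are encoded as 0 to 3, and the function names are illustrative. Estimating the transition matrix by counting observed moves and normalizing each row is one simple choice, assumed here for the example.

```python
import numpy as np

def estimate_transition_matrix(traces, n_areas):
    """Row-normalized counts of observed area-to-area moves."""
    counts = np.zeros((n_areas, n_areas))
    for trace in traces:
        for a, b in zip(trace[:-1], trace[1:]):
            counts[a, b] += 1
    row_sums = counts.sum(axis=1, keepdims=True)
    # Assumption: an area with no observed departure keeps the walker in place.
    safe = np.where(row_sums > 0, row_sums, 1.0)
    return np.where(row_sums > 0, counts / safe, np.eye(n_areas))

def pairwise_encounter_times(trace_a, trace_b):
    """Gaps between consecutive slots in which two cars share an area."""
    meet = [t for t, (a, b) in enumerate(zip(trace_a, trace_b)) if a == b]
    return [t2 - t1 for t1, t2 in zip(meet[:-1], meet[1:])]

def inter_any_encounter_times(target, others):
    """Gaps between consecutive slots in which `target` meets anyone."""
    meet = [t for t in range(len(target))
            if any(o[t] == target[t] for o in others)]
    return [t2 - t1 for t1, t2 in zip(meet[:-1], meet[1:])]

# Hypothetical area indices per time slot for three cars.
car1 = [0, 1, 1, 2, 3, 3]
car2 = [1, 1, 2, 2, 3, 0]
car3 = [3, 2, 1, 1, 0, 0]

P = estimate_transition_matrix([car1, car2, car3], n_areas=4)
print(P)
print(pairwise_encounter_times(car1, car2))           # [2, 1]
print(inter_any_encounter_times(car1, [car2, car3]))  # [1, 1, 1]
```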
(a) Road network scenario example
(b) Corresponding movement graph (c) Corresponding transition probability matrix
(d) Corresponding encounters
Figure 4.1: PET, IAET extraction for a road network scenario
4.1.1 Computation of distribution of PET
PET is the inter-encounter time for the aggregated pairs of walkers (see footnote 2).
In order to solve the above problem, we define a Markov chain in which each
state stores the locations of both walkers. Each walker's location can be any vertex
of the set $V$ of graph $G$. Let the state space of the Markov chain be
$S = V \times V = \{(1,1), (1,2), (1,3), \ldots, (n,n)\}$. Let the transition matrix
of the Markov chain be $M$. Since all walkers follow the same transition probability
matrix $P$ over graph $G$ and move independently of each other, we can derive the
corresponding transition probability $M_{(x,y)(x',y')}$, the probability for walker 1
to move from $x$ to $x'$ and walker 2 to move from $y$ to $y'$ in one time step:
$$M_{(x,y)(x',y')} = P(x, x')\,P(y, y') \tag{4.1}$$
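Because the two walkers move independently, the joint transition matrix is the Kronecker product of the single-walker matrix with itself. The sketch below uses a hypothetical three-vertex matrix and assumes that state $(x, y)$ is stored at index `x * n + y` with zero-based vertices.

```python
import numpy as np

# Hypothetical lazy walk on a three-vertex path (rows sum to 1).
P = np.array([[0.50, 0.50, 0.00],
              [0.25, 0.50, 0.25],
              [0.00, 0.50, 0.50]])
n = P.shape[0]

# M[x*n + y, x2*n + y2] = P[x, x2] * P[y, y2]
M = np.kron(P, P)

x, y, x2, y2 = 0, 1, 1, 2
print(M.shape)                          # (9, 9)
print(M[x * n + y, x2 * n + y2])        # 0.125 = P[0, 1] * P[1, 2]
```

Building `M` explicitly needs $n^4$ entries, which is why the later sketches work directly with $n \times n$ arrays instead.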
Consider a random variable $X$, the place where the pair (walker 1 and walker 2)
currently meets, and a random variable $Y$, the inter-encounter time, i.e. the
time it takes them to meet again. We want to find the distribution of $Y$. Let
$P(x, y, t)$ be the probability that walker 1, initially at vertex $x$, and walker 2,
initially at vertex $y$, first meet after $t$ time steps. Then $P(z, z, t)$, the
probability that the two walkers meet again for the first time after $t$ steps given
that they start at $z$, is defined as:
$$P(z, z, t) = P(Y = t \mid X = z) \tag{4.2}$$
Footnote 2: Since all walkers follow the same transition probability matrix $P$, the
inter-encounter time for aggregated pairs is the same as for any individual pair.
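Moreover, we can generate a recursive set of equations in order to estimate
$P(z, z, t)$:
$$\begin{aligned}
P(x, x, 0) &= 1, && \forall x \in V \\
P(x, y, 0) &= 0, && \forall x, y \in V,\ x \neq y \\
P(x, y, 1) &= \sum_{z \in V} P(x, z)\,P(y, z), && x, y \in V \\
P(x, y, t) &= \sum_{\substack{x', y' \in V \\ x' \neq y'}} P(x', y', t-1)\,P(x, x')\,P(y, y'), && \forall t \geq 2,\ x, y \in V
\end{aligned} \tag{4.3}$$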
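The recursion (4.3) can be written with matrix products, since the sum over $x' \neq y'$ is a product with $P$ on both sides after the diagonal of the previous step has been zeroed. The following is a minimal sketch; the function name `first_meeting_probs` and the two test matrices are hypothetical.

```python
import numpy as np

def first_meeting_probs(P, T):
    """F[t, x, y] = probability that walkers starting at x and y
    first meet after exactly t steps, following recursion (4.3)."""
    n = P.shape[0]
    F = np.zeros((T + 1, n, n))
    F[0] = np.eye(n)                    # base case at t = 0
    if T >= 1:
        F[1] = P @ P.T                  # sum_z P(x, z) P(y, z)
    off_diag = 1.0 - np.eye(n)          # keeps only states with x' != y'
    for t in range(2, T + 1):
        F[t] = P @ (F[t - 1] * off_diag) @ P.T
    return F

# Two vertices, uniform moves: the pair meets with probability 1/2 per step,
# so the first-meeting time is geometric and F[t] = 0.5 ** t.
F2 = first_meeting_probs(np.full((2, 2), 0.5), 3)
print(F2[3])

# Hypothetical lazy walk on a three-vertex path.
P = np.array([[0.50, 0.50, 0.00],
              [0.25, 0.50, 0.25],
              [0.00, 0.50, 0.50]])
F = first_meeting_probs(P, 200)
print(F[1:].sum(axis=0))               # close to 1 for every start pair
```

Each step costs two $n \times n$ matrix products, so the joint matrix $M$ never has to be formed.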
Let $\pi_z$ be the steady-state probability for a walker to be at location $z$.
Because walkers on the graph move independently, $\pi_{xy}$, the steady-state
probability that a pair of walkers is at vertices $(x, y)$, $x, y \in V$, is
calculated as:
$$\pi_{xy} = \pi_x \pi_y \tag{4.4}$$
A pair of walkers can meet at any vertex in $V$. Therefore, the steady-state
probability that both walkers of a pair are at location $z$, $z \in V$, is
$\pi_{zz} = \pi_z^2$. Consequently, the proportion of meetings of the pair that
occur at location $z$, $z \in V$, is:
$$P(X = z) = \frac{\pi_{zz}}{\sum_{x} \pi_{xx}} \tag{4.5}$$
Finally, the PET distribution is calculated by the following equation, where
$\pi_z$ is the steady-state probability of location $z$:
$$P(Y = t) = \sum_{z \in V} P(Y = t \mid X = z)\,P(X = z) = \sum_{z \in V} P(z, z, t)\,\frac{\pi_z^2}{\sum_{x} \pi_x^2} \tag{4.6}$$
In order to estimate the distribution, we need to calculate $P(Y = t)$,
$t = 0, 1, \ldots, T$, for a time horizon $T$ of interest. Given $T$, the time
complexity of the above procedure is $O(V^2 T)$. However, it takes $O(V^4)$ to
compute $M$.
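Equations (4.4) to (4.6) combine the first-meeting recursion with the steady-state distribution. The sketch below is one possible arrangement, not the thesis implementation: it obtains the steady-state vector from an eigen-decomposition (an assumption; any stationary solver would do), evaluates the PET for $t \geq 1$ only and leaves index 0 at zero. The matrix `P` is the same hypothetical three-vertex example.

```python
import numpy as np

def stationary_distribution(P):
    """Left eigenvector of P for eigenvalue 1, normalized to sum to 1."""
    vals, vecs = np.linalg.eig(P.T)
    v = np.real(vecs[:, np.argmin(np.abs(vals - 1.0))])
    return v / v.sum()

def pet_distribution(P, T):
    """pet[t] = P(Y = t) for t = 1..T, following (4.5) and (4.6)."""
    n = P.shape[0]
    pi = stationary_distribution(P)
    w = pi ** 2 / np.sum(pi ** 2)       # P(X = z): where meetings happen
    off_diag = 1.0 - np.eye(n)
    F = P @ P.T                          # first-meeting probabilities at t = 1
    pet = np.zeros(T + 1)
    for t in range(1, T + 1):
        pet[t] = w @ np.diag(F)          # sum_z P(z, z, t) P(X = z)
        F = P @ (F * off_diag) @ P.T     # advance recursion (4.3) by one step
    return pet

P = np.array([[0.50, 0.50, 0.00],
              [0.25, 0.50, 0.25],
              [0.00, 0.50, 0.50]])
pet = pet_distribution(P, 200)
print(stationary_distribution(P))        # approximately [0.25, 0.5, 0.25]
print(pet[1], pet.sum())
```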
4.1.2 Approximation of IAET distribution
Based on the PET modelling, we can proceed to computationally approximate the
IAET, the inter-encounter time between a particular walker and any of the other
walkers on the graph. To approximate the IAET, we assume that at the beginning the
particular walker meets exactly one of the remaining walkers, and all of the others
are distributed according to their steady-state distribution. We have already
computed $P(x, y, t)$, the probability that two walkers with initial locations $x$
and $y$ respectively first meet after $t$ time slots, following equation set (4.3).
Based on $P(x, y, t)$, we can compute $\bar{P}(x, y, t)$, the probability that, with
the same initial locations, the pair has not met for $t$ time slots:
$$\bar{P}(x, y, t) = \bar{P}(x, y, t-1) - P(x, y, t), \quad x \in V,\ y \in V \tag{4.7}$$
Let $X_i$, $i \in \{1, 2, \ldots, k\}$, be the current location of walker $i$ on the
graph. At the beginning, the $k$ walkers start with an initial location profile
$\{X_1, X_2, \ldots, X_k\}$, $X_i \in V$. Without loss of generality, we take the
particular walker to be the first walker, meeting the second walker at location $z$,
$z \in V$. Therefore, the initial location profile is $\{z, z, X_3, \ldots, X_k\}$.
All walkers move independently on the graph. Therefore, the particular walker's
meetings with the other walkers are independent of each other.
Let us define $L_z$ as the random variable indicating the elapsed time during which
the particular walker, having met one other walker at location $z$, has not met any
other walker. For a given initial profile $\{z, z, X_3, \ldots, X_k\}$, the
probability that the particular walker has not met any other walker from the start
up to time slot $t$ is given below, where $\bar{P}$ is the not-yet-met probability
of (4.7):
$$P(L_z = t \mid z, z, X_3, \ldots, X_k) = \bar{P}(z, z, t) \prod_{i=3}^{k} \bar{P}(z, X_i, t) \tag{4.8}$$
There are $n^{k-2}$ profiles in total, since each remaining walker can be at any
location in the network. The probability for $L_z$ can be approximated as (see
footnote 3):
$$P(L_z = t) = \frac{\sum_{X_3, \ldots, X_k} P(L_z = t \mid z, z, X_3, \ldots, X_k)}{n^{k-2}} \tag{4.9}$$
The probability of such an initial profile $\{z, z, X_3, \ldots, X_k\}$,
$X_i \in V$ for $i \in \{3, \ldots, k\}$, is in fact very difficult to obtain;
however, we can rewrite equation (4.9) as:
$$P(L_z = t) = \bar{P}(z, z, t)\,\frac{\left(\sum_{X_i = 1}^{n} \bar{P}(z, X_i, t)\right)^{|k-2|}}{n^{k-2}} \tag{4.10}$$
Let us define $L$ as the random variable indicating the elapsed time during which
the particular walker, having met one of the remaining walkers, has not met any
other walker. The initial meeting location can be any of the $n$ vertices, and the
probability conditioned on each location is calculated according to equation (4.10).
With $\pi_z$ the steady-state probability of vertex $z$, we then have:
$$P(L = t) = \sum_{z \in V} P(L = t \mid X_1 = z)\,P(X_1 = z) = \sum_{z \in V} P(L_z = t)\,\pi_z \tag{4.11}$$
Footnote 3: We will show numerically that this approximation still yields accurate
results.
Finally, let $U$ be the IAET random variable. The distribution of $U$ is calculated
according to the following equation:
$$P(U = t) = P(L = t-1) - P(L = t) \tag{4.12}$$
In order to estimate the distribution, we need to calculate $P(U = t)$,
$t = 0, 1, \ldots, T$, for a time horizon $T$ of interest. Given $T$, the time
complexity of the above procedure is $O(TNV^2 + V^4)$.
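The chain of equations (4.7) to (4.12) can be sketched as follows. This is an illustration under stated assumptions rather than the thesis code: the not-yet-met probability is taken to be 1 at $t = 0$ for every start pair, the other $k - 2$ walkers are averaged uniformly over all $n$ vertices as in (4.10), and the steady-state vector `pi` of the hypothetical three-vertex matrix is supplied by hand.

```python
import numpy as np

def iaet_distribution(P, pi, k, T):
    """U[t] = approximate P(U = t) for t = 1..T, from (4.7)-(4.12)."""
    n = P.shape[0]
    off_diag = 1.0 - np.eye(n)
    F = P @ P.T                  # first-meeting probabilities at t = 1, eq. (4.3)
    S = np.ones((n, n))          # not-yet-met probability at t = 0 (assumed 1)
    L = np.zeros(T + 1)
    L[0] = 1.0
    for t in range(1, T + 1):
        S = S - F                                       # eq. (4.7)
        Lz = np.diag(S) * S.mean(axis=1) ** (k - 2)     # eq. (4.10)
        L[t] = pi @ Lz                                  # eq. (4.11)
        F = P @ (F * off_diag) @ P.T                    # next step of (4.3)
    U = np.zeros(T + 1)
    U[1:] = L[:-1] - L[1:]                              # eq. (4.12)
    return U

P = np.array([[0.50, 0.50, 0.00],
              [0.25, 0.50, 0.25],
              [0.00, 0.50, 0.50]])
pi = np.array([0.25, 0.50, 0.25])    # satisfies pi @ P == pi for this P

U2 = iaet_distribution(P, pi, k=2, T=200)
U6 = iaet_distribution(P, pi, k=6, T=200)
print(U2[1], U6[1])
```

With more walkers the particular walker is more likely to meet someone in the very first slot, so `U6[1]` exceeds `U2[1]`.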
4.2 Illustrative Example
We present an example of the above model by considering $k$ walkers walking on a
circular strip of $n$ cells. In every time step, a walker moves right, moves left
or stays in the same cell, each with probability $1/3$. Figure 4.2 depicts the
scenario of the illustrative example for $k = 3$ and $n = 7$. Figure 4.3 shows the
Markov chain constructed for the circular random walk.
Figure 4.2: Circular Random Walk example
Figure 4.3: Corresponding Markov Chain for the example
4.2.1 Pairwise Encounter Time
Let $d_M$ be the maximum distance between two arbitrary nodes. Given a circular
strip of length $n$, $d_M$ is:
$$d_M = \left\lceil \frac{n}{2} \right\rceil \tag{4.13}$$
Similarly, consider a random variable $X$, the current distance between two walkers,
walker 1 and walker 2, and a random variable $Y$, the inter-encounter time, i.e.
the time until their first meeting. We again construct a Markov chain, in which the
states are the distances between the two walkers. The set of available distances is
$D = \{0, 1, \ldots, d_M\}$. Let $P(d, t)$ be the probability that two walkers at
initial distance $d$ first meet after $t$ time steps.
$$P(d, t) = P(Y = t \mid X = d) \tag{4.14}$$
Based on the transition probabilities computed from the constructed Markov chain,
we can generate the following recursive set of equations. Equation set (4.15) shows
the initial steps for the illustrative example of a circular strip of length $n$.
$$\begin{aligned}
&P(0, 0) = 1, \quad P(d, 0) = 0,\ \forall d \geq 1,\ d \in D \\
&P(0, 1) = 1/3, \quad P(1, 1) = 2/9, \quad P(2, 1) = 1/9
\end{aligned} \tag{4.15}$$
The recursive equation set (4.16) gives the steps for calculating $P(d, t)$ from
the previous time step and the neighboring distances.
$$\begin{aligned}
P(0, t) &= \tfrac{4}{9} P(1, t-1) + \tfrac{2}{9} P(2, t-1) \\
P(1, t) &= \tfrac{4}{9} P(1, t-1) + \tfrac{2}{9} P(2, t-1) + \tfrac{1}{9} P(3, t-1) \\
P(2, t) &= \tfrac{2}{9} P(1, t-1) + \tfrac{3}{9} P(2, t-1) + \tfrac{2}{9} P(3, t-1) + \tfrac{1}{9} P(4, t-1) \\
P(d, t) &= \tfrac{1}{9} P(d-2, t-1) + \tfrac{2}{9} P(d-1, t-1) + \tfrac{3}{9} P(d, t-1) \\
&\quad + \tfrac{2}{9} P(d+1, t-1) + \tfrac{1}{9} P(d+2, t-1), \qquad 3 \leq d \leq d_M - 2
\end{aligned} \tag{4.16}$$
Given a circular strip of length $n$, we also have to consider the special case in
which the two walkers are separated by half of the strip, as in equation (4.17).
$$\begin{aligned}
P(d_M - 1, t) &= \tfrac{2}{9} P(d_M, t-1) + \tfrac{4}{9} P(d_M - 1, t-1) + \tfrac{2}{9} P(d_M - 2, t-1) + \tfrac{1}{9} P(d_M - 3, t-1) \\
P(d_M, t) &= \tfrac{1}{3} P(d_M, t-1) + \tfrac{4}{9} P(d_M - 1, t-1) + \tfrac{2}{9} P(d_M - 2, t-1)
\end{aligned} \tag{4.17}$$
Finally, $P(0, t)$ is the needed PET probability.
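The coefficients in (4.15) to (4.17) can be generated mechanically by enumerating the nine joint moves of the two walkers. The sketch below does this for an even strip length (so that $\lceil n/2 \rceil = n/2$, as the boundary equations (4.17) assume); the function names and the choice $n = 16$ are illustrative.

```python
import numpy as np

def circular_distance(a, b, n):
    d = abs(a - b) % n
    return min(d, n - d)

def distance_chain(n):
    """T[d, d2] = probability that the distance goes from d to d2 in one
    step when both walkers move -1, 0 or +1 with probability 1/3 each."""
    d_max = n // 2
    T = np.zeros((d_max + 1, d_max + 1))
    for d in range(d_max + 1):
        for a in (-1, 0, 1):            # move of walker 1, placed at cell 0
            for b in (-1, 0, 1):        # move of walker 2, placed at cell d
                T[d, circular_distance(a, d + b, n)] += 1.0 / 9.0
    return T

def first_meeting_by_distance(n, horizon):
    """F[t, d] = P(d, t): first meeting after exactly t steps from distance d."""
    T = distance_chain(n)
    F = np.zeros((horizon + 1, T.shape[0]))
    F[0, 0] = 1.0
    F[1] = T[:, 0]                      # meet in exactly one step, eq. (4.15)
    Q = T.copy()
    Q[:, 0] = 0.0                       # forbid passing through distance 0
    for t in range(2, horizon + 1):
        F[t] = Q @ F[t - 1]             # eqs. (4.16) and (4.17)
    return F

T = distance_chain(16)
F = first_meeting_by_distance(16, 2000)
print(F[1, :3])                          # 1/3, 2/9, 1/9
print(F[2, 0])                           # 10/81
```

The column `F[:, 0]` is the PET distribution of the strip.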
4.2.2 Inter-Any Encounter Time
Based on the PET modelling, we can proceed to IAET modelling for a particular
walker. We can compute $P(d, t)$ following equation (4.16).
Let $\bar{P}(d, t)$ be the probability that two walkers at initial distance $d$ have
not met for $t$ time steps.
$$\bar{P}(d, t) = \bar{P}(d, t-1) - P(d, t) \tag{4.18}$$
To approximate the IAET computation, we assume that among the $k$ walkers, two are
in the same cell and the other $k - 2$ walkers are distributed uniformly over the
other cells. The initial distance profile is $\{0, d_1, d_2, \ldots, d_{k-2}\}$, in
which $d_i$ is the distance between walker $i$ and walker $i + 1$, $d_i \in D$.
Let $L$ be the random variable indicating the elapsed time during which the
particular walker, having initially met one of the other walkers, has not met anyone
else. Given such an initial profile:
$$P(L = t \mid \{0, d_1, d_2, \ldots, d_{k-2}\}) = \bar{P}(0, t) \prod_{i=1}^{k-2} \bar{P}(d_i, t) \tag{4.19}$$
Then we have:
$$P(L = t) = \frac{\sum_{d_1, d_2, \ldots, d_{k-2}} P(L = t \mid \{0, d_1, d_2, \ldots, d_{k-2}\})}{d^{k-2}} \tag{4.20}$$
Finally, let $U$ be the IAET random variable, the time elapsed between the
particular walker initially meeting one of the other walkers and its next encounter
with another walker.
$$P(U = t) = P(L = t-1) - P(L = t) \tag{4.21}$$
4.3 Simulation results
4.3.1 Walkers on a circular strip
Figure 4.4 shows the CCDF of the PET distribution for $k = 20$ and $n = 100$. The
simulation and the theoretical calculation match well.
Figure 4.4: PET analysis, n=100, k=20, T=10000
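A simulation of the kind compared against the theory can be sketched as follows. This is not the simulator used for the figures; it is a small Monte Carlo run for a single pair of walkers on a strip of $n = 16$ cells, with a fixed seed and a hypothetical function name, that collects PET samples so their empirical frequencies can be compared with the recursion of Section 4.2.1.

```python
import random

def simulate_pet_samples(n, steps, seed=0):
    """Simulate two lazy walkers on an n-cell circular strip and
    return the observed pairwise inter-encounter times."""
    rng = random.Random(seed)
    a = b = 0                    # both walkers start in the same cell
    last_meet = 0
    samples = []
    for t in range(1, steps + 1):
        a = (a + rng.choice((-1, 0, 1))) % n
        b = (b + rng.choice((-1, 0, 1))) % n
        if a == b:
            samples.append(t - last_meet)
            last_meet = t
    return samples

samples = simulate_pet_samples(n=16, steps=200_000)
frac_one = samples.count(1) / len(samples)
mean_pet = sum(samples) / len(samples)
print(len(samples), round(frac_one, 3), round(mean_pet, 2))
```

The fraction of samples equal to 1 should be close to $P(0, 1) = 1/3$ from (4.15), up to sampling noise.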
Figure 4.5 shows the distributions of PET and IAET for several scenarios: n=16
and k=4, n=16 and k=7, n=24 and k=4, n=24 and k=7. The methodology works
well for the special case of the circular random walk.
(a) n=16, k=4 (b) n=16, k=7
(c) n=24, k=4 (d) n=24, k=7
Figure 4.5: PET and IAET computation for circular random walks
4.3.2 Walkers on a general connected graph
Given the number of vertices $n$ and the number of walkers $k$, we vary the number
of edges $m$ and generate a random connected graph (characterized by $n$ and $m$) as
well as a random transition probability matrix $P$ on the corresponding graph. We
then follow the proposed computation for PET and IAET.
Figure 4.6: PET and IAET analysis of random walks on a connected graph, n=12,
k=6
Figure 4.6 shows the comparison of PET and IAET when the walkers are simple random
walkers, meaning that at each vertex the probabilities of staying at the same vertex
and of moving to each neighboring vertex are all equal.
Figure 4.7 shows the comparison of PET and IAET when the walkers follow a
randomly generated transition probability matrix.
In both cases in the above figures, the number of vertices is 12 and the number of
random walkers is 6. We varied the number of edges from 20 to 66 (the fully
connected graph). Regardless of the density of the graph, PET and IAET are
approximated accurately.
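One way to generate inputs of the kind used in these experiments is sketched below. The construction (a random spanning tree plus extra random edges, treated as undirected) and the function names are assumptions made for illustration; the thesis does not state how its random graphs and matrices were drawn.

```python
import numpy as np

def random_connected_graph(n, m, rng):
    """Boolean adjacency matrix of a connected undirected graph, n vertices, m edges."""
    assert n - 1 <= m <= n * (n - 1) // 2
    order = rng.permutation(n)
    edges = set()
    for i in range(1, n):                       # random spanning tree
        u, v = int(order[i]), int(order[rng.integers(0, i)])
        edges.add((min(u, v), max(u, v)))
    if m > len(edges):                          # add the remaining edges at random
        candidates = [(u, v) for u in range(n) for v in range(u + 1, n)
                      if (u, v) not in edges]
        extra = rng.choice(len(candidates), size=m - len(edges), replace=False)
        edges.update(candidates[int(i)] for i in extra)
    A = np.zeros((n, n), dtype=bool)
    for u, v in edges:
        A[u, v] = A[v, u] = True
    return A

def uniform_walk(A):
    """Stay or move to any neighbor with equal probability."""
    allowed = (A | np.eye(A.shape[0], dtype=bool)).astype(float)
    return allowed / allowed.sum(axis=1, keepdims=True)

def random_walk(A, rng):
    """Random transition probabilities supported on self-loops and edges."""
    allowed = A | np.eye(A.shape[0], dtype=bool)
    W = rng.random(A.shape) * allowed
    return W / W.sum(axis=1, keepdims=True)

rng = np.random.default_rng(0)
A = random_connected_graph(12, 20, rng)
P_uniform = uniform_walk(A)
P_random = random_walk(A, rng)
print(int(A.sum()) // 2, P_uniform.sum(axis=1)[:3], P_random.sum(axis=1)[:3])
```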
Figure 4.7: PET and IAET of a general connected graph, n=12, k=6
Furthermore, we want to verify the estimation approach on large networks. Let m be the number of edges of the graph. Figure 4.8 illustrates the PET-IAET computation for a large-input case: n = 36, k = 36, m = 630 (36 vertices, 36 walkers).
4.4 PET-IAET computation of multiple
communities on general connected graphs
So far, we have considered only a single transition probability matrix P, which means that all walkers in the system move in the same manner. To extend our results, suppose the system has c communities in total, indexed by 1, 2, ..., c, and the walkers of each community follow that community's own transition probability matrix.
Figure 4.8: PET and IAET of a general connected graph, n=k=36, m=630
The set of available locations for all walkers is $V = \{1, 2, 3, \ldots, n\}$. There are c transition probability matrices $P_1, P_2, \ldots, P_c$, one for each community. Moreover, there are $k_i$ walkers in community i and k walkers in total. There are therefore $c(c+1)/2$ types of encountering pair, indexed by 11, 12, ..., cc. Using the same approach as in 4.1.1, let $Y_{11}, Y_{12}, \ldots, Y_{cc}$ be the pairwise inter-encounter time random variables for the corresponding pairs of communities. We compute $P(Y_{11} = t), P(Y_{12} = t), \ldots, P(Y_{cc} = t)$ for all types of encountering pair. The proportion of pairs whose two walkers come from the same community i is denoted $\alpha_{ii}$ and is calculated as:
$$\alpha_{ii} = \frac{\binom{k_i}{2}}{\binom{k}{2}} \qquad (4.22)$$
Similarly, the proportion of pairs whose walkers come from two different communities i and j is:
$$\alpha_{ij} = \frac{k_i k_j}{\binom{k}{2}} \qquad (4.23)$$
Finally, let Y be the PET random variable over all communities. Then the PET distribution is calculated as:
$$P(Y = t) = \sum_{i,j=1}^{c} P(Y_{ij} = t)\,\alpha_{ij} \qquad (4.24)$$
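The weighting in (4.22) to (4.24) can be illustrated with a short Python sketch. It assumes that the sum in (4.24) runs over unordered community pairs (i ≤ j), which is the reading under which the proportions add up to 1; the names `pair_proportions` and `mix_pet` and the per-type distributions passed in are hypothetical.

```python
from math import comb


def pair_proportions(sizes):
    """Proportion of walker pairs of each community type, eqs. (4.22)-(4.23).

    sizes[i] is k_i; keys (i, j) with i <= j index unordered pair types.
    """
    k = sum(sizes)
    total = comb(k, 2)
    alpha = {}
    for i, ki in enumerate(sizes):
        alpha[(i, i)] = comb(ki, 2) / total
        for j in range(i + 1, len(sizes)):
            alpha[(i, j)] = ki * sizes[j] / total
    return alpha


def mix_pet(alpha, pet_by_type):
    """Eq. (4.24): mix the per-type PET distributions with weights alpha."""
    horizon = len(next(iter(pet_by_type.values())))
    return [sum(alpha[key] * pet_by_type[key][t] for key in alpha)
            for t in range(horizon)]


# Three communities of 12 walkers each, as in the large-input case.
alpha = pair_proportions([12, 12, 12])
```

With three communities of 12 walkers, each same-community type gets weight 66/630 and each cross-community type gets 144/630.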
To analyze IAET for multiple communities, without loss of generality, consider the particular walker to be the first person of community s, and suppose he initially meets the first walker of community d at location z (s = 1, 2, ..., c; d = 1, 2, ..., c). We denote by $X_{ij}$ the location of the jth person of community i. All walkers move independently on the graph. Therefore, the particular walker's meetings with the other walkers are independent.
We denote by $L_{s,d,z}$ an initial profile in which the particular walker, the first person of community s, meets the first person of community d at location z (which means $X_{s1} = X_{d1} = z$):
$$L_{s,d,z} = (\ldots, z, X_{s2}, \ldots, X_{s k_s}, \ldots, z, X_{d2}, \ldots, X_{d k_d}, \ldots, X_{c1}, \ldots, X_{c k_c}) \qquad (4.25)$$
Let $Y_{s,d,z}$ denote the random variable for the elapsed time during which the particular walker from community s, having met one of the walkers from community d at location z, has not met any other walker up to time slot t. Recall that $P(x, y, t)$ is the probability that a pair initially located at x and y has not met for t time slots. Then we have:
$$P(Y_{s,d,z} = t \mid L_{s,d,z}) = \prod_{i=1}^{k_1} P(z, X_{1i}, t) \cdots \prod_{i=2}^{k_s} P(z, X_{si}, t) \cdots \prod_{i=2}^{k_d} P(z, X_{di}, t) \cdots \prod_{i=1}^{k_c} P(z, X_{ci}, t) \qquad (4.26)$$
We assume that such initial profiles are uniformly distributed, with $X_{ij} \in V$ for $i = 1, 2, 3, \ldots, c$ and $j = 1, 2, \ldots, k_i$:
$$P(Y_{s,d,z} = t) = \frac{\sum_{L_{s,d,z}} P(Y_{s,d,z} = t \mid L_{s,d,z})}{|n|^{k-2}} \qquad (4.27)$$
However, we have to account for the proportion of communities from which the initially meeting pair comes. The proportion for the particular walker and the other walker coming from the same community s, denoted $\alpha_{ss}$, is:
$$\alpha_{ss} = \frac{k_s (k_s - 1)}{k (k - 1)} \qquad (4.28)$$
The proportion for the particular walker and the other walker coming from two different communities s and d is:
$$\alpha_{sd} = \frac{k_s k_d}{k (k - 1)} \qquad (4.29)$$
Finally, let U be the IAET random variable. The approximated IAET distribution is calculated as:
$$P(U = t) = \sum_{s=1,\,d=1}^{c} P(Y_{s,d,z} = t)\,\alpha_{sd} \qquad (4.30)$$
Figure 4.9 shows the result for a large-input 3-community case with $k_1 = 12$, $k_2 = 12$, $k_3 = 12$, and n = 36. The case in which each walker follows his own movement transition pattern is a special case of multiple communities; Figure 4.10 shows the result for k = 3.
Figure 4.9: PET and IAET for a 3-communities network, k1=k2=k3=12
Figure 4.10: PET and IAET for a 3-communities network, k1=k2=k3=1
In general, for a community of k walkers walking on a general connected graph with n vertices, each following his own transition pattern, the complexity of the above procedure is $O(T k^2 |n|^2)$. The time complexity for c communities is $O(T c k^2 |n|^2)$.
Chapter 5
Area-Based Dissemination in Vehicular Networks
Fleet-wide information broadcast for both non-urgent and urgent data is one of the critical building blocks of vehicular networks. With epidemic propagation using purely vehicle-to-vehicle (V2V) communications, data is transferred between two nodes only when they are in contact with each other [57]. However, precisely because such networks are sparse, epidemic propagation can in fact be a very slow process. We consider the possibility of additionally using vehicle-to-infrastructure (V2I) communication (between vehicles and road-side units) to speed up the dissemination process.
We consider a hybrid V2V/V2I system in which the infrastructure nodes deployed in a city are organized into distinct sub-networks that we refer to as "areas." We assume that the communications within each area are organized so as to enable very fast propagation of the content to all infrastructure nodes within that area. Vehicles provide the functionality of connecting the areas together by carrying data between areas.
The work in this chapter is based on [56].
The information broadcast in this area-based dissemination model works as follows: initially, the content to be distributed is available on a vehicle or infrastructure node within one area (in the former case, the information is first sent from that vehicle to the first infrastructure node it encounters within the area). Once an infrastructure node in that area has the information, it is very rapidly propagated to all infrastructure nodes and vehicles within that area, and the information continues to persist within the area. Now, when a vehicle from that area moves to a different area, it communicates its information to the other cars and infrastructure nodes it encounters within that area. Again, the information from an infrastructure node in the new area is propagated rapidly to all other infrastructure nodes within that area, and the process continues. In this area-based approach to information dissemination, then, the propagation within areas is very fast, and the latency is primarily dominated by the time taken for vehicles to move from an area that has the content to a new area that does not. We are essentially transforming the problem from epidemic propagation between vehicles alone to one of epidemic propagation between areas (with rates determined by the movement of vehicles between them).
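The dissemination rule described above can be sketched as a small trace-driven simulation. This is a simplified illustration, not the simulator used for the results: it assumes time is discretized, that `area_of[t][v]` gives the area of vehicle v at step t, and that intra-area propagation is instantaneous; the function name `area_dissemination_time` is hypothetical.

```python
def area_dissemination_time(area_of, source_area):
    """Return the first time step at which every area seen in the trace
    is infected, or None if that never happens.

    area_of[t][v] is the area index of vehicle v at time step t.
    """
    all_areas = {a for step in area_of for a in step}
    infected_areas = {source_area}
    carriers = set()
    for t, step in enumerate(area_of):
        changed = True
        while changed:  # repeat until nothing new happens in this step
            changed = False
            for v, a in enumerate(step):
                if a in infected_areas and v not in carriers:
                    carriers.add(v)        # vehicle picks up the content
                    changed = True
                elif v in carriers and a not in infected_areas:
                    infected_areas.add(a)  # carrier infects a new area
                    changed = True
        if infected_areas == all_areas:
            return t
    return None


# Two vehicles, three areas; vehicle 0 starts in the source area 0.
trace = [[0, 1], [1, 1], [1, 2]]
steps_needed = area_dissemination_time(trace, source_area=0)
```

In the toy trace, vehicle 0 carries the content into area 1 at step 1, vehicle 1 picks it up there, and area 2 is reached at step 2.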
Intuitively, the fewer the areas, the quicker the dissemination process will be (since we assume rapid information propagation within the infrastructure of each area). In practice, however, the number of areas will be determined by the cost of establishing the V2I and I2I infrastructure for the sub-networks (initially, each area may even consist of only a single infrastructure node or a small number of proximate ones, and is slowly expanded, with the number of areas decreasing as more infrastructure nodes are deployed over time). We set aside the issue of how the number of areas should be determined as outside the scope of our investigation, but explore how the propagation time varies with the number of areas and the number of cars. We also examine how the location and structure of the areas affect the speed of information dissemination.
5.1 Simulation methodology
All our simulations and evaluations in this work are based on a real data set consisting of GPS traces of taxis in Beijing. The data trace contains 272253 GPS points spanning from 39.7082°N to 40.0963°N in latitude and from 116.2010°E to 116.5983°E in longitude. We extracted the critical input parameters for the model (transition probability matrix, service) from the data set: the coordinates of 453 taxis during the high-traffic period from 09:00 am to 07:00 pm, recorded approximately every minute in Beijing. The input information is used to convert the model into a Jump Markov Process based on equations 5.4 and 5.5, which supply the input parameters for our simulation.
The number of areas N takes the values 4, 9, 16, 25, 36, 49, and 64 in the simulations. The number of cars M takes the values 200, 300, 400, and 453; for the smaller numbers of cars, we choose random subsets of the total data set.
Area Partitioning Schemes: First, we mark 500 major intersections in the bounded area. The whole bounded region is divided into N areas, each containing many base stations at multiple major intersections in the same neighborhood that cooperate with each other. In the following simulations, we conduct experiments based on 4 different ways of choosing the areas:
• Grid placement scheme (Figure 5.1)
• Random Voronoi scheme, with a random list of intersections [58] (Figure 5.2)
• Density-based Voronoi scheme, with intersections chosen in the busiest areas (Figure 5.3)
• K-means scheme [59] (Figure 5.4)
For the grid placement scheme, the whole region is divided into a grid of cells with equal edge length. For the next two schemes, given the coordinates of the N major intersections, the Voronoi partition determines the specific area (Voronoi cell) of each vehicle in the real data trace at a specific time. The only difference is that the major intersections are chosen randomly in the random Voronoi scheme but according to vehicle density in the density-based scheme. For the last scheme, only N is specified, and the standard K-means algorithm is used to partition the whole region into N clusters. K-means can therefore serve as a general framework, because only N is required.
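The grid and Voronoi assignments above can be sketched as follows. The bounding box comes from the data set description in section 5.1; everything else is an assumption made for illustration: the function names `grid_cell` and `voronoi_cell` are hypothetical, the seed coordinates are made up, and distances are taken as plain Euclidean distances in degrees rather than geodesic distances.

```python
def grid_cell(lat, lon, bounds, side):
    """Index of the grid cell (row-major) containing a GPS point.

    bounds = (lat_min, lat_max, lon_min, lon_max); the region is cut
    into side x side equal cells, so N = side * side.
    """
    lat_min, lat_max, lon_min, lon_max = bounds
    row = min(int((lat - lat_min) / (lat_max - lat_min) * side), side - 1)
    col = min(int((lon - lon_min) / (lon_max - lon_min) * side), side - 1)
    return row * side + col


def voronoi_cell(lat, lon, seeds):
    """Index of the nearest seed intersection, i.e. the Voronoi cell."""
    return min(range(len(seeds)),
               key=lambda i: (lat - seeds[i][0]) ** 2 + (lon - seeds[i][1]) ** 2)


# Bounding box of the Beijing trace; the seed points are hypothetical.
BOUNDS = (39.7082, 40.0963, 116.2010, 116.5983)
seeds = [(39.80, 116.30), (40.00, 116.50)]
```

Applying one of these functions to every GPS record turns each vehicle trace into a sequence of area indices, which is the form needed to estimate the transition probabilities between areas.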
5.2 Problem Formulation
The whole region (say, a city) is divided into N areas, each of which contains one major intersection. Let the areas be numbered $A_1, A_2, \ldots, A_N$. There are M vehicles in total, moving independently among the areas of the system.
Vehicles stay in area $A_n$ for an exponentially distributed amount of time with mean $1/\mu_n$ and switch from area $A_m$ to area $A_n$ with probability $p_{mn}$ (m, n = 1, 2, ..., N). The numbers of cars in the areas, denoted $W_1, W_2, W_3, \ldots, W_N$ ($w_i = 1, 2, \ldots, M$), follow the steady-state distribution $\pi_1, \pi_2, \pi_3, \ldots, \pi_N$ correspondingly. We consider the stable network in which the vehicles follow the global balance equation:
$$\sum_{j \neq i} \pi_j (p_{ji}\,\mu_j) = \sum_{j \neq i} \pi_i (p_{ij}\,\mu_i) \qquad (5.1)$$
The probability that there are k vehicles in area i is:
$$P(W_i = k) = \binom{M}{k} \pi_i^{k} (1 - \pi_i)^{M-k} \qquad (5.2)$$
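As a quick numerical illustration of the binomial occupancy law (5.2), the sketch below tabulates P(W_i = k) for one area. The function name `occupancy_pmf` and the example values of M and the steady-state probability are assumptions chosen for illustration, not values from the data set.

```python
from math import comb


def occupancy_pmf(M, pi_i):
    """Eq. (5.2): P(W_i = k) for k = 0..M, with M independent vehicles
    each located in area i with steady-state probability pi_i."""
    return [comb(M, k) * pi_i ** k * (1 - pi_i) ** (M - k)
            for k in range(M + 1)]


# Hypothetical example: 20 vehicles, area occupied with probability 0.25.
pmf = occupancy_pmf(20, 0.25)
mean_vehicles = sum(k * p for k, p in enumerate(pmf))
```

The mean of this distribution is M times the steady-state probability, which gives a simple check on fitted parameters.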
Given that a vehicle initially holds the message, the goal is to estimate the epidemic time, that is, the time at which the message has been disseminated throughout the whole network. Vehicles move around the system from one area to another. We assume that whenever an area gets infected, all infrastructure nodes within the area get the message rapidly and broadcast it to all vehicles within the area from that time on (i.e., an "infected" area stays infected). Whenever a vehicle carrying information from an infected area enters a new non-infected area, it sends the piece of information to the nearest road-side unit / base station in that area, infecting that area rapidly (see footnote 2). This process can be modeled as a Markov process whose states are the infection statuses of all areas.
Let I be the current state. There are currently $L_I$ infected areas in state I; let $S_I$ be the set of infected areas. Recall that $W_i$ denotes the number of vehicles in area i of the set $S_I$. We can compute the joint vehicular distribution of all infected areas in state I for a specific case $W = (w_1, w_2, \ldots, w_{L_I})$ as:
$$\Pi_I(W) = P(W_1 = w_1, W_2 = w_2, \ldots, W_{L_I} = w_{L_I}) = \prod_{i=1}^{L_I} P(W_i = w_i) \qquad (5.3)$$
Here, $w_i$ can take any value in $0, 1, \ldots, M$. We denote the set of all possible combinations of values for state I as $M_{L_I}$.
When we consider the transition from state I to state J, there is exactly one more newly infected area, so $L_J = L_I + 1$. Denote the newly infected area as $s_J$, and denote the area from which the infection arrives as s. Because there are $L_I$ infected areas in state I, area $s_J$ can be infected by an infected vehicle coming from any area in the set $S_I$. The first infected car leaving the infected area s for the non-infected area $s_J$ spreads the infection. Therefore, we can compute the transition probability from state I to state J as:
$$P_{IJ} = \sum_{W \in M_{L_I}} \Pi_I(W = (w_1, w_2, \ldots, w_{L_I})) \sum_{s \in S_I} \left( \frac{\mu_s w_s}{\sum_{k \in S_I} \mu_k w_k}\, P_{s,s_J} \right) \qquad (5.4)$$
In the above equation, $\sum_{k \in S_I} \mu_k w_k$ is the rate at which the first vehicle in the whole region leaves an area for another area, and $P_{s,s_J}$ is the probability that the vehicle leaving area s goes to area $s_J$. Knowing the area distribution of the system, we can compute the mean sojourn time in state I:
$$T_I = \sum_{W \in M_{L_I}} \Pi_I(W = (w_1, w_2, \ldots, w_{L_I})) \frac{1}{\sum_{k \in S_I} \mu_k w_k} \qquad (5.5)$$
Footnote 2: For simplicity, we consider that this time is negligible; however, it is straightforward to extend the model to allow for a non-negligible (possibly random) time for intra-area information propagation.
$E_I$ is the expected propagation time from state I until the final state is reached, and L is the total number of states in the problem.
An illustrative example is given in Figure 5.5, with 4 areas in total. Figure 5.6 shows an example of the corresponding Jump Markov Chain diagram. The infection status of each area in Figure 5.6 is marked as 1 (all vehicles in the area have the message) or 0 (at least one vehicle in the area does not have the information). For example, state 2, "1100", indicates that areas 1 and 2 are already infected but areas 3 and 4 are not.
We can construct a system of L linear equations:
$$E_I (1 - P_{II}) = \sum_{\substack{I, J \in L \\ I \neq J}} (T_I + E_J)\, P_{IJ} \qquad (5.6)$$
If $I_0$ is the initial state, in which exactly one area is infected, the final epidemic time is given as:
$$EP = T_{I_0} + E_{I_0} \qquad (5.7)$$
From the theoretical analysis, the computational complexities of evaluating equations (5.4) and (5.5) are $O(MN^3)$ and $O(MN^2)$ respectively. The complexity of solving the linear equation system (5.6) is $O(4^N)$.
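To make the structure of system (5.6) concrete, here is a minimal Python sketch on a made-up three-area chain. It is not the solver used in this work: it assumes the sojourn times T_I and transition probabilities P_IJ are already available, and it exploits the fact that infected areas stay infected, so states can be evaluated recursively instead of by a general linear solve. The names `expected_times`, `T` and `P` and all numbers are hypothetical.

```python
from functools import lru_cache


def expected_times(T, P, final):
    """Solve E_I (1 - P_II) = sum_{J != I} (T_I + E_J) P_IJ, eq. (5.6).

    T[I] is the mean sojourn time of state I and P[I][J] the transition
    probability from I to J. Because states only gain infected areas,
    the recursion below terminates at the final (all-infected) state.
    """
    @lru_cache(maxsize=None)
    def E(I):
        if I == final:
            return 0.0
        p_stay = P[I].get(I, 0.0)
        total = sum((T[I] + E(J)) * p for J, p in P[I].items() if J != I)
        return total / (1.0 - p_stay)

    return {I: E(I) for I in T}


# Toy chain with 3 areas; states are infection bit strings.
T = {"100": 4.0, "110": 2.0, "101": 3.0, "111": 0.0}
P = {"100": {"110": 0.6, "101": 0.4},
     "110": {"111": 1.0},
     "101": {"111": 1.0},
     "111": {}}
E = expected_times(T, P, final="111")
```

In this toy chain the value for state "100" is (4 + 2)·0.6 + (4 + 3)·0.4 = 6.4 time units.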
Table 5.1: Mean total propagation time (5 areas)
M          Model sim. time   Model cal. time   Empirical time
200 cars   548               555               447
300 cars   341               348               302
400 cars   265               261               254
453 cars   229               227               198
5.3 Simulation results
Table 5.1 compares the mean total propagation time obtained from the model, via exact numerical calculation and via model-based simulation (both using parameters fitted to the Beijing data set), with the total propagation time obtained by directly simulating the process on the empirical data traces. As expected, the model-based calculations and simulations match closely (providing further verification of the correctness of the analysis). The empirically obtained total propagation times are within 3 to 20% of the model predictions in all cases.
Figure 5.7 also indicates that increasing the number of areas (N=42) and the number of cars (M=632) clearly improves the accuracy of the model.
Figure 5.8 compares the 4 schemes for a varying number of areas. The density-based Voronoi placement scheme clearly obtains the best performance: its total propagation time is the lowest among the 4 schemes presented. Grid placement is much worse than the other 3 schemes: the flooding time for a given piece of information to reach all of the nodes in the network is much longer for every number of areas, because intersections in the Beijing area do not match the strict architecture of grid placement and the vehicle densities also vary significantly across the grid cells. Moreover, the performance of this scheme is also harder to predict. The other 2 schemes show average performance, although the random Voronoi placement seems to perform better in some cases.
There is a common trend for all schemes, as anticipated: the fewer the areas, the less total propagation time is needed to spread the information throughout the network. As mentioned before, in practice the number of areas will be determined by extrinsic factors such as the cost of deploying and connecting the infrastructure nodes in the V2I network.
Figure 5.9 compares the 4 schemes for a varying number of cars selected from the data set. The more cars there are, the less time is required to flood the piece of information through the network. The reason is easy to see: the more connectivity there is among vehicles in the network, the easier it is for one of the cars to move from one area to another and transfer the message.
Figure 5.1: Grid placement partitioning scheme
Figure 5.2: Random Voronoi partitioning scheme
Figure 5.3: Density-based Voronoi partitioning scheme
Figure 5.4: K-Means partitioning scheme
Figure 5.5: Example network of 4 areas
Figure 5.6: Corresponding constructed Markov chain
Figure 5.7: Area Distribution (M=632, N=42)
Figure 5.8: Propagation time versus number of cars
Figure 5.9: Propagation time versus number of areas
Chapter 6
A Study of Contact Durations for Vehicle to
Vehicle Communications
Vehicular ad hoc networks (VANETs) have become an emerging area that has attracted extensive research effort in the past few years. VANETs have a dynamic structure of mobile nodes that communicate with each other through wireless links and self-organize to form temporary topologies and connectivity. In VANETs, it is challenging to create an end-to-end path between any pair of nodes, because of either extremely dynamic, node-behavior-dependent mobility or a very sparse network architecture [3].
In these opportunistic VANETs, a contact (an encounter) between a pair of nodes occurs when the pair is within communication range of each other. However, nodes in the network keep moving in their own ways, so communication links among mobile nodes switch on and off continuously; the pair will no longer be in contact when one node moves out of the other node's vicinity or when the link quality fluctuates. Therefore, complete paths from sources to destinations hardly ever exist, and even the complete paths that are discovered are very unstable and short-lived. In VANETs whose links are generated highly dynamically and intermittently, store-carry-forward techniques and an opportunistic data dissemination framework have been introduced as a way to overcome intermittent connectivity, although relatively high delay is sometimes the cost of getting information from a source to a destination. Data transfer is performed by intermediate nodes over multi-hop routing paths, in which a relaying node keeps the data locally, travels around the network, and transmits the message whenever it comes into contact with another node. The transfer-on-contact process continues from node to node in the network until the message reaches the destination.
The work in this chapter is based on [60].
Understanding how the contact process really works and knowing the related encounter statistics can improve data distribution by reducing the overall delay, by saving relaying bandwidth, local storage buffer, and energy at intermediate nodes, or by increasing data throughput and transfer reliability. A pair of nodes with a higher meeting frequency and a shorter inter-contact time [7, 8, 9] than others has a better opportunity to transmit data to each other. Nodes that encounter a larger group of neighboring nodes more frequently within a shorter time can become better candidates as intermediate nodes for relaying data in routing algorithms. A pair whose contact duration is longer than those of other pairs, a sign that the contact link between them is more stable, is also a more reliable candidate for delivering data to a given destination successfully.
Among the crucial parameters of the contact process, contact duration is a critical one [10, 11]. In this work we present an intensive study of vehicular contact duration based on the data obtained by Shanghai Jiao Tong University [12]. We make two sets of contributions:
• We analyze the aggregated contact duration distribution based on the real taxi data trace, and
• We quantify the contact duration conditioned on different parameters, including time, location, and vehicle direction.
The rest of this work is organized as follows: section 3.3 lists related work; section 6.1 introduces the data set and the methodology for the vehicular contact duration study; section 6.2 presents observations and draws conclusions about the vehicle contact duration distribution and statistics and about the impact of time, location, and direction on vehicle contact. Finally, we present a concluding discussion in section 6.3.
6.1 Dataset introduction and approach
methodology
6.1.1 Shanghai Dataset
The SUVnet-Trace data, obtained from the Wireless Sensor Networks Lab (WnSN) at Shanghai Jiao Tong University, includes GPS information for roughly 2400 taxis in the city of Shanghai from January 31 to February 27. The covered area is 6340 square kilometers. Figure 6.1 shows the total region covered by the Shanghai data set.
We studied contact duration for 10 weekdays from Feb 5 to Feb 16 in two different time slots, 7am-9am and 11am-1pm. The first slot is the rush period, while the second represents normal hours. After dividing the whole region into a grid of 1 km x 1 km cells, we computed the densities of all cells and picked the busiest cells for further investigation. The chosen region is a rectangular grid of 3x3 cells of 1 km x 1 km each, indicated in Figure 6.2. Areas 1, 4, and 7 are mainly parts of the Shanghai Hongqiao International Airport. Areas 5 and 6 cover the Shanghai Zoo. Areas 2, 3, 8, and 9 are civil areas. We make the standard assumption that the effective V2V communication radio range is 300 m using 802.11p [61].
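Under the 300 m range assumption, contact durations for one pair of vehicles could be extracted from their GPS traces roughly as sketched below. This is an illustration only: it assumes both traces are sampled at the same timestamps, uses the haversine great-circle distance with a mean Earth radius of 6371 km, and the function names and sample coordinates are hypothetical.

```python
import math


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two GPS points."""
    radius = 6371000.0  # mean Earth radius in meters
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * radius * math.asin(math.sqrt(a))


def contact_durations(trace_a, trace_b, radio_range_m=300.0):
    """Durations (seconds) of the contacts between two vehicles.

    Each trace is a list of (timestamp_s, lat, lon) tuples, assumed to
    be sampled at the same timestamps for both vehicles.
    """
    durations, start, last = [], None, None
    for (t, la, lo), (_, lb, lob) in zip(trace_a, trace_b):
        in_range = haversine_m(la, lo, lb, lob) <= radio_range_m
        if in_range and start is None:
            start = t                      # a contact begins
        elif not in_range and start is not None:
            durations.append(t - start)    # the contact ends
            start = None
        last = t
    if start is not None:                  # contact still open at trace end
        durations.append(last - start)
    return durations


# Vehicle A is parked; vehicle B approaches, leaves, and returns.
offsets = [0.005, 0.002, 0.001, 0.004, 0.001, 0.001]
trace_a = [(10 * i, 31.2, 121.4) for i in range(6)]
trace_b = [(10 * i, 31.2 + d, 121.4) for i, d in enumerate(offsets)]
durations = contact_durations(trace_a, trace_b)
```

Real traces are rarely synchronized, so in practice the positions would first need to be interpolated onto a common time grid.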
6.1.2 Methodology
There are a few parameters which impact contact duration that we want to analyze,
including time, location and direction.
Figure 6.1: Shanghai Covered Region
Figure 6.2: Shanghai 3x3 grid
Analyzing the selected 3x3-cell area described in 6.1.1, we collect all empirical contact durations from the GPS and timestamp information for each cell within the period of interest. In the study, we used the two-sample Kolmogorov-Smirnov test and the two-sample Anderson-Darling test, which are nonparametric tests for checking whether two samples are generated from the same distribution. Both are statistical tools used to test whether two underlying one-dimensional probability distributions differ. However, according to [19][20], the Anderson-Darling test is more reliable for smaller numbers of samples, while the Kolmogorov-Smirnov test is suitable for larger numbers of samples. Based on the number of samples (larger or smaller than 10000), we performed the correspondingly suitable validation test to decide whether any two empirical contact duration sets at different times, locations, or directions are generated from the same distribution, meaning that they share the same contact duration characteristics. Based on those results, we quantified the impact of time, location, and direction on contact duration by estimating the portion of pairs that have the same distribution under the corresponding conditions.
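The pairwise comparison procedure can be sketched in plain Python for the Kolmogorov-Smirnov case. This is an assumption-laden illustration rather than the code used in the study: it uses the large-sample approximation of the critical value, omits the Anderson-Darling branch, and the names `ks_two_sample` and `similar_fraction` are hypothetical.

```python
import math
from bisect import bisect_right
from itertools import combinations


def ks_two_sample(x, y, alpha=0.05):
    """Two-sample KS statistic and a decision at level alpha.

    Returns (D, same), where same is True when the hypothesis of a
    common distribution is not rejected. The critical value uses the
    large-sample approximation sqrt(-ln(alpha/2)/2) * sqrt((n+m)/(n*m)).
    """
    xs, ys = sorted(x), sorted(y)
    n, m = len(xs), len(ys)
    # The largest gap between the two empirical CDFs occurs at a data point.
    d = max(abs(bisect_right(xs, v) / n - bisect_right(ys, v) / m)
            for v in xs + ys)
    critical = math.sqrt(-0.5 * math.log(alpha / 2)) * math.sqrt((n + m) / (n * m))
    return d, d <= critical


def similar_fraction(samples):
    """Portion of sample pairs judged to share the same distribution."""
    pairs = list(combinations(samples, 2))
    return sum(ks_two_sample(a, b)[1] for a, b in pairs) / len(pairs)


# Two identical sets of durations and one clearly shifted set.
day1 = list(range(100))
day2 = list(range(100))
day3 = list(range(200, 300))
portion = similar_fraction([day1, day2, day3])
```

With 10 days there are 45 pairs of days per area, and with 9 areas there are 36 pairs of areas per day, so a portion such as 24/45 ≈ 0.5333 means that 24 of the 45 pairs passed the test.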
We will make the contact duration statistics dataset publicly available at
http://anrg.usc.edu/www/Downloads/
6.2 Factual observations
This section shows the observation of encounter duration statistics varying time,
location and direction conditions, and examines their impact.
6.2.1 Time
Figures 6.3 and 6.4 show the CCDF (complementary cumulative distribution function) of the aggregated contact duration in the 9 areas over 10 days at 7am and 11am respectively. Different colors represent different days. For both periods of the weekdays, even though there is a little variation for larger durations, the encounter durations in the same area on different days tend to match each other quite well, especially for smaller values. Moreover, the CCDFs of the same area generally show the same maximum contact duration: long for Area 4, medium for Areas 1, 2, 3, 5, and 6, and quite short for Areas 7, 8, and 9. It appears that traffic flow characteristics within a specific area are stable even across different weekdays, resulting in a stable contact duration pattern for each area.
For a particular region, Figure 6.5 shows the mean and confidence interval of the contact duration distribution for the two periods (7am-9am and 11am-1pm) side by side. Apart from the extreme variance peak on day 9 in the second period, there is not much variation in the contact duration mean and confidence interval across the studied days. The mean duration ranges from 35 seconds to 50 seconds for the first period and from 50 seconds to 60 seconds for the second. This may be attributed to higher (and hence slower) traffic in the second period compared to the first.
Furthermore, the bar charts in Figures 6.6 and 6.7 illustrate how the encounter duration changes over the 10 days for all areas. Following the same trend, and despite the peak mean at Area 9 on day 5, the mean remains steady, with very little variation across different days within the same area in both periods. The only exception is Area 9, whose mean contact duration almost triples around noon compared to early morning. Area 9 appears to have higher traffic due to its high urban road density (Figure 6.2).
Finally, Table 6.1 shows the average result of applying validation tests at the 95% confidence level (Kolmogorov-Smirnov or Anderson-Darling, depending on the number of samples available in each case), which indicate whether the contact duration distributions are similar, over different pairs of days. Higher numbers indicate more similar distributions for a given area across different pairs of days. There is little difference in the portions of similarity between the two periods, except for Area 5, which covers the Shanghai Zoo, and Area 9, where there is a big gap.
Location   7am-9am   11am-1pm
Area 1     0.5333    0.5556
Area 2     0.2444    0.2667
Area 3     0.7333    0.5333
Area 4     0.0667    0
Area 5     0.5333    0.1778
Area 6     0.5111    0.3556
Area 7     0.5333    0.8
Area 8     0.7111    0.6667
Area 9     0.6222    0.3111
Table 6.1: Statistic result: portion of pairs of days having similarity over 9 areas
In summary, we can draw some conclusions:
• Each area has its own contact distribution pattern on different days. The day does not have a large impact on the contact distribution within the same area.
• Some areas show greater variation in the contact distribution over time than others.
• The variation in contact distribution can also depend on the particular time of day.
6.2.2 Location
Similarly, the bar charts in Figures 6.8 and 6.9 illustrate how the encounter duration changes across the 9 areas for all 9 days. In these figures, we can see clear variation across different areas within the same day for both periods. The mean encounter duration for Area 2 is much higher than for all other areas over the 9 days. The only exception is Area 9, whose mean contact duration is unpredictably high in the early morning of Feb 9th. Area 9 also has a much higher mean contact duration around noon. In all other areas, the mean is roughly 50-60 seconds for the first period and 60-70 seconds for the second. There is not much variation in the mean among neighboring areas such as Areas 1, 3, 4, 5, 6, 7, and 8; the exceptions are Areas 2 and 9.
Furthermore, Table 6.2 shows the proportion of pairs of areas for which the KS or AD test at the 95% confidence level indicates that the contact duration distributions are identical. Higher numbers indicate more similar distributions across areas on a given day. There is little difference in the portions of similarity between the two periods, except on Feb 14th. However, the portions of similarity among different locations are quite low compared to the portions calculated among days. Therefore, we can see a strong impact of location on contact duration: there is low coherence in contact duration across different locations.
Day        7am      11am
Feb 5th    0.3889   0.1667
Feb 6th    0.2778   0.1389
Feb 7th    0.1944   0.2500
Feb 8th    0.1667   0.1667
Feb 9th    0.2222   0.3333
Feb 12th   0.3056   0.1667
Feb 13th   0.3333   0.2778
Feb 14th   0.4722   0.0833
Feb 15th   0.2778   0.2222
Feb 16th   0.2222   0.1389
Table 6.2: Statistic result: portion of 9 areas having similarity over 10 consecutive days
In summary, we can draw some conclusions:
• Each area has its own contact distribution pattern. Location has a large impact on the contact distribution within the same day.
• The variation of the mean contact duration across areas depends on the particular day.
6.2.3 Direction
Figures 6.10 and 6.11 show the CCDF (complementary cumulative distribution function) of the aggregated contact duration as well as the contact duration distributions for the same direction, the opposite direction, and the perpendicular direction in the 9 areas over 10 days at 7am and 11am respectively. Vehicles travelling in the directions East, West, South, and North are identified, and contact duration is estimated for vehicles going in the same, opposite, or perpendicular directions. For all areas and all days in both time intervals, the CCDFs for the different types of relative direction are quite different. The CCDF generated from vehicles flowing in the same direction appears closest to the aggregated CCDF compared to the other two types.
Furthermore, Table 6.3 gives summary statistics (mean and standard deviation) for the aggregated contact duration as well as for the directed contact durations (same direction, opposite direction, and perpendicular direction).
Direction       Mean-7am   Std-7am   Mean-11am   Std-11am
Same            117.472    233.62    227.204     493.80
Opposite        92.9967    184.00    201.288     461.4
Perpendicular   51.767     83.845    102.857     290.66
Aggregated      73.5851    168.51    179.269     411.62
Table 6.3: Aggregated contact duration statistics
In summary, we can draw some conclusions:
• The contact duration distributions are sensitive to the pairwise direction of the two vehicles in contact. The shortest contact durations occur for vehicles travelling perpendicular to each other.
• The contact durations for opposite-direction contacts have the least variation across days. The contact durations for same-direction contacts have the most variation across days.
6.2.4 Parameter evaluation
Figures 6.13 and 6.12 plot the empirical PDF and CDF of contact duration, with a radio range of 300 m, for all areas on all days and for 1 area (Area 5) on 1 day, respectively. In the case of 1 area, there are two peaks, around 20 seconds and 1 minute.
6.3 Conclusion
This is one of the first studies in the literature to undertake a fine-grained analysis of contact duration from a real set of vehicular traces. Besides presenting the aggregate contact duration, we have examined and presented how contact durations vary with area, time of day, and day, as well as with relative pairwise vehicular direction. We find that contact durations tend to be somewhat stable over different days but vary over areas. In general, the contact durations in the 11am-1pm period are longer than those observed in the 7am-9am period, due to higher (hence slower) traffic. The pairwise direction of vehicles is shown to have a significant impact on contact duration: vehicles going in the same direction tend to have the longest contact duration, while vehicles going in perpendicular directions were found to have the shortest.
[Plot: Contact Duration CCDF 7am-9am; one panel per area (Area 1 to Area 9); both axes logarithmic]
Figure 6.3: Contact duration over 10 days at 9 areas - 7am
[Figure: nine log-log panels (Area 1 to Area 9), "Contact Duration CCDF 11am-1pm"]
Figure 6.4: Contact duration over 10 days at 9 areas - 11am
[Figure: "Contact duration of Area 5"; contact duration in seconds (0 to 350) for each
of the 10 weekdays D1 to D10, with two entries per day labeled 7 and 11]
Figure 6.5: Contact duration general statistics of area 5
[Figure: nine panels (Area 1 to Area 9), "Contact Durations over 10 days at 9 areas
(7am-9am)"; x-axis: days 1 to 10, y-axis: seconds]
Figure 6.6: Contact duration over 10 days at 9 areas - 7am
[Figure: nine panels (Area 1 to Area 9), "Contact Durations over 10 days at 9 areas
(11am-1pm)"; x-axis: days 1 to 10, y-axis: seconds]
Figure 6.7: Contact duration over 10 days at 9 areas - 11am
[Figure: nine panels (Feb 5th, 6th, 7th, 8th, 9th, 12th, 13th, 14th, 15th), "Contact
Durations at 9 areas over 9 days (7am-9am)"; x-axis: areas 1 to 9, y-axis: seconds]
Figure 6.8: Contact duration at 9 areas over 10 days - 7am
[Figure: nine panels (Feb 5th, 6th, 7th, 8th, 9th, 12th, 13th, 14th, 15th), "Contact
Durations at 9 areas over 9 days (11am-1pm)"; x-axis: areas 1 to 9, y-axis: seconds]
Figure 6.9: Contact duration at 9 areas over 10 days - 11am
[Figure: nine log-log panels (Feb 5, 6, 7, 8, 9, 12, 13, 14, 15), "Contact duration
CCDF given direction - area 5 - 7 am"; x-axis: Time (second), y-axis: CCDF; curves:
same direction, opposite direction, perpendicular, aggregate]
Figure 6.10: CCDF of aggregated and directed contact duration - 7am
[Figure: nine log-log panels (Feb 5, 6, 7, 8, 9, 12, 13, 14, 15), "Contact duration
CCDF given direction - area 5 - 11 am"; x-axis: Time (second), y-axis: CCDF; curves:
same direction, opposite direction, perpendicular, aggregate]
Figure 6.11: CCDF of aggregated and directed contact duration - 11am
[Figure: estimated probability density function f(x) and cumulative distribution
function F(x) of contact duration x; mean(x) = 34.9967, var(x) = 1417.38]
Figure 6.12: Aggregated contact duration at 1 area on 1 day
[Figure: estimated probability density function f(x) and cumulative distribution
function F(x) of contact duration x; mean(x) = 73.5851, var(x) = 28397]
Figure 6.13: Aggregated contact duration for all areas on all days
Chapter 7
Enhanced Random Walk Model
Until now, we have used the random walk model to capture the contact process of the
vehicular network. The theoretical verification was done using Matlab simulation in
chapter 4. In this chapter, we use SUMO, a well-known traffic simulator described in
chapter 2, to see how well the model fits the real-world dataset and to compare the
random walk model with the traditional origin-destination model.
7.1 Random Walk simulation on SUMO
7.1.1 Normal Random Walk
In this experiment, the total number of vehicles is constant and is obtained from the raw
trace. We use only the normal random walk with very simple inputs: the total number of
vehicles and the generated transition probability matrix. Initial locations of the
vehicles are generated randomly. The RW model is built on discrete time slots: each
vehicle makes a movement decision at the beginning of each time slot based on the
extracted transition probability matrix. Any two vehicles at the same location in the
same time slot are considered to encounter each other, and the time unit of an encounter
is the duration of one time slot. To keep the simulation as close as possible to the real
scenario, encounters of the same pair of vehicles in consecutive time slots are counted
as one encounter.
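The procedure described above can be sketched as follows. This is a minimal
illustration under stated assumptions, not the simulation code used in the thesis: the
function names are hypothetical, cells are numbered 0 to n-1, and the inter-encounter
time is assumed to be measured between the starts of successive encounters.

```python
import random


def simulate_walks(P, starts, num_slots, rng):
    """Simulate independent discrete-time random walks.

    P[i][j] is the probability of moving from cell i to cell j in one slot.
    Returns one list of visited cells per vehicle (length num_slots + 1).
    """
    cells = list(range(len(P)))
    paths = [[s] for s in starts]
    for _ in range(num_slots):
        for path in paths:
            # Draw the next cell from the row of the current cell.
            path.append(rng.choices(cells, weights=P[path[-1]])[0])
    return paths


def encounter_starts(path_a, path_b):
    """Slots at which a new encounter between two vehicles begins.

    Sharing a cell in a slot is an encounter; co-location over
    consecutive slots is merged into a single encounter.
    """
    starts, together = [], False
    for t, (a, b) in enumerate(zip(path_a, path_b)):
        if a == b and not together:
            starts.append(t)
        together = a == b
    return starts


def inter_encounter_times(path_a, path_b):
    """Gaps (in slots) between the starts of successive encounters (assumed definition)."""
    starts = encounter_starts(path_a, path_b)
    return [later - earlier for earlier, later in zip(starts, starts[1:])]
```

Collecting `inter_encounter_times` over all vehicle pairs would give the samples from
which a CCDF such as the one in Figure 7.1 can be drawn.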
[Figure: log-log CCDF versus time slots, "Comparison (empirical versus Random Walk)
Shanghai traces for 1 day"; curves: empirical-IVE, empirical-IAVE, RW-IVE-noBD,
RW-IAVE-noBD]
Figure 7.1: Normal RW simulation versus empirical data - Feb 05th
Figure 7.1 compares the encounter distributions from the raw data with those from the
normal random walk. For the IVE distribution, the RW curve lies above the raw curve. For
the IAVE distribution, there is a large gap between the RW curve and the raw curve. Even
though the RW model captures the trend of the IVE and IAVE statistics of the vehicular
traffic, it seems to underestimate these statistics. More encounters are observed in the
real world because the random walk model is built on a grid of 300 m x 300 m cells laid
over the original map, and only cars sharing the same cell are considered to meet each
other, which is a constraint compared to the real scenario.
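The effect of the grid constraint can be seen in a small sketch. It assumes planar
coordinates in meters and assumes that, in the raw trace, two vehicles are in contact
whenever they are within a 300 m radio range; both assumptions and all names below are
illustrative only.

```python
import math

CELL_SIZE_M = 300.0    # edge of a grid cell in the random walk model
RADIO_RANGE_M = 300.0  # assumed radio range for contacts in the raw trace


def cell_of(x, y, cell_size=CELL_SIZE_M):
    """Map a position in meters to its (column, row) grid cell."""
    return (int(x // cell_size), int(y // cell_size))


def same_cell(p, q):
    """Encounter rule of the grid-based random walk model."""
    return cell_of(*p) == cell_of(*q)


def within_range(p, q, radio_range=RADIO_RANGE_M):
    """Assumed encounter rule for the raw trace."""
    return math.hypot(p[0] - q[0], p[1] - q[1]) <= radio_range


# Two hypothetical vehicles 20 m apart on either side of a cell boundary.
a, b = (290.0, 100.0), (310.0, 100.0)
missed_by_grid = within_range(a, b) and not same_cell(a, b)
```

Here the two vehicles are well within radio range but fall into different cells, so the
grid-based model does not count the encounter.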
7.1.2 Birth Death Random Walk
In this experiment, birth and death states are introduced into the random walk model.
The idea comes from the fact that, in the real scenario, cars enter the region and leave
it after finishing their trips. Initially, there is already a number of cars in the
simulation. Cars then enter and leave the simulation according to the birth and death
probabilities in the transition probability matrix.
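One possible way to encode birth and death in a transition probability matrix is to add
a single extra "outside" state, as in the sketch below. This construction is an
assumption made for illustration; the thesis does not spell out the exact form of its
matrix here, and the parameter names are hypothetical.

```python
def add_outside_state(P, leave_prob, enter_dist, stay_outside):
    """Extend an n x n transition matrix with one 'outside' state (index n).

    leave_prob[i]: assumed probability that a car in cell i leaves the region (death)
    enter_dist[j]: assumed distribution of the cell through which a car enters (birth)
    stay_outside:  assumed probability that an outside car stays outside for one slot
    """
    n = len(P)
    extended = []
    for i in range(n):
        # Scale the in-region moves so that the row still sums to 1.
        row = [(1.0 - leave_prob[i]) * P[i][j] for j in range(n)]
        row.append(leave_prob[i])
        extended.append(row)
    birth_row = [(1.0 - stay_outside) * enter_dist[j] for j in range(n)]
    birth_row.append(stay_outside)
    extended.append(birth_row)
    return extended
```

With this encoding, a vehicle in the outside state takes part in no encounters, and the
same random walk machinery can be reused on the extended matrix.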
Figure 7.2 shows the new comparison of the IVE and IAVE distributions between the birth
death RW and the empirical data. For the IVE distribution, the RW curve is shifted and
now lies below the raw curve. For the IAVE distribution, the gap between the RW curve
and the raw curve is reduced, but the RW curve still lies above the raw curve. In
summary, adding birth death states does help to reduce the IAVE gap. Moreover, adding
birth death states may reduce the number of cases in which a pair of vehicles meets only
once or twice in the normal RW model, which shifts the IVE curve below the raw curve.
[Figure: log-log CCDF versus time slots, "Comparison (empirical versus Random Walk)
Shanghai traces for 1 day"; curves: empirical-IVE, empirical-IAVE, RW-IVE-BD,
RW-IAVE-BD]
Figure 7.2: Birth death RW simulation versus empirical data - Feb 05th
Figure 7.3 shows the comparison between the raw data and the birth death random walk
over multiple days (Feb 6th, Feb 12th, Feb 18th, Feb 23rd).
7.2 Conventional Origin Destination simulations using SUMO
[Figure: four log-log panels of CCDF versus time slots, "Comparison (empirical versus
Random Walk) Shanghai traces", for Feb06, Feb12, Feb18 and Feb23; curves:
empirical-IVE, empirical-IAVE, RW-IVE-BD, RW-IAVE-BD]
Figure 7.3: Birth death RW simulation versus empirical data over multiple days
In chapter 2, the conventional traffic model includes a specification step that collects
information on origin and destination pairs in the vehicular network. To generate this
information, we extracted the origins and destinations of taxis in the Shanghai trace
based on occupancy status, along with the departure times at which the taxis started
their trips. All of this information is provided to SUMO as an input trip file. There is
also randomness in this SUMO simulation, because SUMO lets each vehicle choose its own
shortest path from its origin to its destination. Consequently, other vehicular
information, such as the corresponding velocities, accelerations, directions and arrival
times, cannot be controlled.
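The extraction of trips from occupancy status can be sketched as follows. The record
layout (timestamp, location, occupied flag) and the function name are assumptions made
for illustration; the actual trace format and the conversion to a SUMO trip file are not
shown.

```python
def extract_trips(records):
    """Extract occupied trips from one taxi's time-ordered records.

    records: iterable of (timestamp, location, occupied) tuples.
    Returns a list of (departure_time, origin, destination) tuples,
    one per completed occupied trip.
    """
    trips = []
    start = None            # (departure_time, origin) of the trip in progress
    last_location = None    # location of the previous record
    prev_occupied = False
    for timestamp, location, occupied in records:
        if occupied and not prev_occupied:
            # Passenger picked up: this record marks the origin.
            start = (timestamp, location)
        elif not occupied and prev_occupied and start is not None:
            # Passenger dropped off: the last occupied record is the destination.
            trips.append((start[0], start[1], last_location))
            start = None
        prev_occupied = occupied
        last_location = location
    return trips
```

A trip that is still in progress when the records end is dropped, since its destination
is unknown.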
Figure 7.4 shows the comparison between the empirical encounter distributions
(inter-encounter and inter-any-encounter) and the corresponding statistics obtained from
the simulation generated by SUMO. Even though there is randomness in the routes and
other vehicular parameters, we can still see a match between the traditional OD
simulation generated by SUMO and the raw statistics for IVE and IAVE.
Figure 7.4: OD based SUMO simulation versus empirical data
7.2.1 Improve the fit between RW and ODE
7.2.2 Partition
In [62], by analyzing the Shanghai mobility traces, the authors confirmed that, rather
than forming a homogeneous system, the data contain three distinct social communities
that are consistently well connected. Following their approach, we used the Louvain
algorithm to partition the entire taxi fleet trace into 3 major communities, and
estimated a transition probability matrix and an initial steady-state distribution for
each of them. We then used this information as the input to the random walk simulation
and extracted the corresponding inter-encounter and inter-any-encounter distributions.
Figure 7.5 plots the encounter distributions for the empirical trace, the
homogeneous-input random walk model (single community) and the heterogeneous-input
random walk model (three communities).
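Once each vehicle has a community label, a separate transition probability matrix can be
estimated per community. The sketch below assumes the community detection step (for
example, the Louvain algorithm) has already produced the labels; the function names, the
input layout and the uniform fallback for unvisited cells are illustrative choices.

```python
def estimate_transition_matrix(paths, num_cells):
    """Row-normalized counts of observed cell-to-cell moves."""
    counts = [[0] * num_cells for _ in range(num_cells)]
    for path in paths:
        for src, dst in zip(path, path[1:]):
            counts[src][dst] += 1
    matrix = []
    for row in counts:
        total = sum(row)
        if total:
            matrix.append([c / total for c in row])
        else:
            # Cell never left in the data: fall back to a uniform row (arbitrary choice).
            matrix.append([1.0 / num_cells] * num_cells)
    return matrix


def per_community_matrices(paths_by_vehicle, community_of, num_cells):
    """Estimate one transition matrix per community label."""
    grouped = {}
    for vehicle, path in paths_by_vehicle.items():
        grouped.setdefault(community_of[vehicle], []).append(path)
    return {label: estimate_transition_matrix(paths, num_cells)
            for label, paths in grouped.items()}
```

Passing all vehicles under a single label reproduces the homogeneous (single community)
input.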
[Figure: log-log CCDF versus time slots, "Comparison of Shanghai traces on Feb 5th - 3
communities"; curves: empirical-IVE, empirical-IAVE, RW-IVE-bd - 3 communities,
RW-IAVE-bd - 3 communities, RW-IVE-bd - 1 community, RW-IAVE-bd - 1 community]
Figure 7.5: Encounter distribution Comparison for multiple communities
From the figure, for both the inter-encounter and inter-any-encounter distributions,
the multiple heterogeneous transition probability matrices give a better fit to the
empirical distribution than the single homogeneous transition probability matrix.
7.2.2.1 Multiple transition probability matrices
Another idea is to use multiple transition probability matrices to capture how the
underlying movement of vehicles in the network changes over the whole day. Figure 7.6
shows the comparison obtained by dividing the 24 hours into 2 intervals of 12 hours
each.
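Selecting the matrix that applies to a given time slot can be sketched as below. The
function name and the assumption that the day is split into equal-length intervals are
illustrative; the thesis only states that two 12-hour intervals were used.

```python
def matrix_for_slot(slot, slots_per_day, matrices):
    """Pick the transition matrix that applies in a given time slot.

    The day is split into len(matrices) equal intervals, and
    matrices[k] is used for every slot that falls in interval k.
    """
    position = slot % slots_per_day
    interval = position * len(matrices) // slots_per_day
    return matrices[interval]
```

For example, with 24 one-hour slots and two matrices, slots 0 to 11 use the first matrix
and slots 12 to 23 use the second.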
[Figure: log-log CCDF versus time slots, "Comparison (empirical versus Random Walk)
Shanghai traces - 2 Intervals"; curves: empirical-IVE, empirical-IAVE, RW-IVE-BD,
RW-IAVE-BD]
Figure 7.6: RW model (2 transition probability matrices) versus empirical data
Chapter 8
Conclusions
With the ongoing transformation of the auto industry and the rapid expansion of
autonomous vehicle projects in recent years, we can expect in the near future a
generation of connected and intelligent vehicles that integrates wireless technologies,
efficient car-to-car communication and real-time location services. This makes it
possible for vehicles to exchange data and information cheaply, quickly and reliably,
leading to a wide range of applications that enhance passengers' experience. Therefore,
vehicular networks and inter-vehicular communication have been of significant interest
to researchers and companies.
In order to fully exploit the characteristics of opportunistic vehicular networks, it
is necessary to understand the contact process among vehicles in the network. In this
thesis, different aspects of the contact process, including inter-contact intervals,
inter-any-contact intervals and contact duration, have been studied. We have presented a
recursive polynomial-time computation that yields the exact pairwise inter-encounter
time distribution for a particular pair of random walkers, and given an approximate
computation, also polynomial time, for the individual-to-any inter-encounter time (i.e.,
the time between contacts of a particular walker with any of the other walkers in the
population). We have also provided a Markovian analytical model for the propagation of
content via area-based dissemination. Furthermore, numerical studies of real Shanghai
taxi traces have shown how contact durations vary with area, time of day and day, as
well as with relative pairwise vehicular direction. Inter-encounter statistics from the
theoretical random walk model have been compared with the empirical data as well as
with the traditional traffic forecasting origin-destination model simulated in SUMO (a
verified traffic simulator). Finally, further techniques have been considered to reduce
the gap between the real world and the theoretical model.
Reference List
[1] "Car to car communication consortium." http://www.car-to-car.org.
[2] "Intelligent transportation system." http://www.its.dot.gov.
[3] A. Jamalipour and Y. Ma, Intermittently Connected Mobile Ad Hoc Networks:
From Routing to Content Distribution. Springer Science & Business Media, 2011.
[4] S. Al-Sultan, M. M. Al-Doori, A. H. Al-Bayatti, and H. Zedan, "A comprehensive
survey on vehicular ad hoc network," Journal of Network and Computer
Applications, vol. 37, pp. 380-392, 2014.
[5] T. Spyropoulos, A. Jindal, and K. Psounis, "An analytical study of fundamental
mobility properties for encounter-based protocols," International Journal of
Autonomous and Adaptive Communications Systems, vol. 1, no. 1, pp. 4-40, 2008.
[6] "Sumo official website."
http://www.dlr.de/ts/en/desktopdefault.aspx/tabid-9883/16931_read-41000/.
[7] H. Zhu, M. Li, L. Fu, G. Xue, Y. Zhu, and L. M. Ni, "Impact of traffic influxes:
Revealing exponential inter-contact time in urban vanets," IEEE Transactions
on Parallel and Distributed Systems, vol. 22, no. 8, pp. 1258-1266, 2011.
[8] A. Chaintreau, P. Hui, J. Crowcroft, C. Diot, R. Gass, and J. Scott, "Impact of human
mobility on the design of opportunistic forwarding algorithms," 2006.
[9] X. Zhang, J. Kurose, B. N. Levine, D. Towsley, and H. Zhang, "Study of a
bus-based disruption-tolerant network: mobility modeling and impact on routing,"
in Proceedings of the 13th annual ACM international conference on Mobile
computing and networking, pp. 195-206, ACM, 2007.
[10] Y. Li, D. Jin, L. Zeng, and S. Chen, "Revealing patterns of opportunistic
contact durations and intervals for large scale urban vehicular mobility," in
2013 IEEE International Conference on Communications (ICC), pp. 1646-1650,
IEEE, 2013.
[11] X. Zhuo, Q. Li, W. Gao, G. Cao, and Y. Dai, "Contact duration aware data
replication in delay tolerant networks," in 2011 19th IEEE International Conference
on Network Protocols, pp. 236-245, IEEE, 2011.
[12] "Shanghai jiao tong university taxi trace data."
http://wirelesslab.sjtu.edu.cn/taxi_trace_data.html.
[13] C. Schuette and P. Metzner, "Markov chains and jump processes," 2009.
[14] C. M. Grinstead and J. L. Snell, Introduction to Probability, Chapter 11. AMS,
2003.
[15] L. Lovász, "Random walks on graphs," Combinatorics, Paul Erdős is eighty,
vol. 2, pp. 1-46, 1993.
[16] K. Teknomo, "K-means clustering tutorials."
http://people.revoledu.com/kardi/tutorial/kMean/.
[17] "Kolmogorov-smirnov test tutorial 1."
http://ocw.mit.edu/courses/mathematics/18-443-statistics-for-applications-fall-2006/lecture-notes/lecture14.pdf.
[18] "Kolmogorov-smirnov test tutorial 2."
http://www.maths.qmul.ac.uk/~bb/CTS_Chapter3_Students.pdf.
[19] S. Engmann and D. Cousineau, "Comparing distributions: the two-sample
anderson-darling test as an alternative to the kolmogorov-smirnoff test," Journal
of Applied Quantitative Methods, vol. 6, no. 3, pp. 1-17, 2011.
[20] A. N. Pettitt, "A two-sample anderson-darling rank statistic," Biometrika,
vol. 63, no. 1, pp. 161-168, 1976.
[21] "Fundamentals of transportation."
http://en.wikibooks.org/wiki/FundamentalsofTransportation/.
[22] D. Nguyen-Luong, "Traffic forecasting models in the usa: Application to the
elaboration of regional transportation plans," 2000.
[23] M. Behrisch, L. Bieker, J. Erdmann, and D. Krajzewicz, "Sumo - simulation of
urban mobility: an overview," in Proceedings of SIMUL 2011, The Third
International Conference on Advances in System Simulation, ThinkMind, 2011.
[24] D. Krajzewicz, J. Erdmann, M. Behrisch, and L. Bieker, "Recent development
and applications of sumo - simulation of urban mobility," International Journal
On Advances in Systems and Measurements, vol. 5, no. 3&4, 2012.
[25] F. Bai and A. Helmy, "A survey of mobility models," Wireless Adhoc Networks.
University of Southern California, USA, vol. 206, p. 147, 2004.
[26] J. Harri, F. Filali, and C. Bonnet, "Mobility models for vehicular ad hoc
networks: a survey and taxonomy," IEEE Communications Surveys & Tutorials,
vol. 11, no. 4, pp. 19-41, 2009.
[27] M. Shahzamal, M. Parvez, M. Zaman, and M. Hossain, "Mobility models for
delay tolerant network: A survey," International Journal of Wireless & Mobile
Networks, vol. 6, no. 4, p. 121, 2014.
[28] D. Karamshuk, C. Boldrini, M. Conti, and A. Passarella, "Spot: Representing
the social, spatial, and temporal dimensions of human mobility with a unifying
framework," Pervasive and Mobile Computing, vol. 11, pp. 19-40, 2014.
[29] D. Maltz, "Dynamic source routing in ad hoc wireless networks," Mobile
Computing, vol. 353, no. 1, pp. 153-181, 1996.
[30] T. Spyropoulos, K. Psounis, and C. S. Raghavendra, "Efficient routing in
intermittently connected mobile networks: The single-copy case," IEEE/ACM
Transactions on Networking (ToN), vol. 16, no. 1, pp. 63-76, 2008.
[31] T. Spyropoulos, K. Psounis, and C. S. Raghavendra, "Efficient routing in
intermittently connected mobile networks: the multiple-copy case," IEEE/ACM
transactions on networking, vol. 16, no. 1, pp. 77-90, 2008.
[32] T. Spyropoulos, K. Psounis, and C. S. Raghavendra, "Spray and wait: an efficient
routing scheme for intermittently connected mobile networks," in Proceedings of
the 2005 ACM SIGCOMM workshop on Delay-tolerant networking, pp. 252-259,
ACM, 2005.
[33] W. Hsu and A. Helmy, "On nodal encounter patterns in wireless lan traces,"
IEEE Transactions on Mobile Computing, vol. 9, no. 11, pp. 1563-1577, 2010.
[34] V. Conan, J. Leguay, and T. Friedman, "Characterizing pairwise inter-contact
patterns in delay tolerant networks," in Proceedings of the 1st international
conference on Autonomic computing and communication systems, p. 19, ICST
(Institute for Computer Sciences, Social-Informatics and Telecommunications
Engineering), 2007.
[35] H. Cai and D. Y. Eun, "Crossing over the bounded domain: from exponential
to power-law inter-meeting time in manet," in Proceedings of the 13th annual
ACM international conference on Mobile computing and networking, pp. 159-170,
ACM, 2007.
[36] A. Chaintreau, P. Hui, J. Crowcroft, C. Diot, R. Gass, and J. Scott, "Impact
of human mobility on opportunistic forwarding algorithms," IEEE Transactions
on Mobile Computing, vol. 6, no. 6, pp. 606-620, 2007.
[37] D. P. Sanders, "Exact encounter times for many random walkers on regular and
complex networks," Physical Review E, vol. 80, no. 3, p. 036119, 2009.
[38] Z. Kalay, "Effects of confinement on the statistics of encounter times: exact
analytical results for random walks in a partitioned lattice," Journal of Physics
A: Mathematical and Theoretical, vol. 45, no. 21, p. 215001, 2012.
[39] C. Cooper, A. Frieze, and T. Radzik, "Multiple random walks and interacting
particle systems," in International Colloquium on Automata, Languages, and
Programming, pp. 399-410, Springer, 2009.
[40] J. P. Lavine, "The encounter probability for random walkers in a confined space,"
in MRS Proceedings, vol. 899, pp. 0899-N07, Cambridge Univ Press, 2005.
[41] A. Passarella and M. Conti, "Characterising aggregate inter-contact times in
heterogeneous opportunistic networks," in International Conference on Research
in Networking, pp. 301-313, Springer, 2011.
[42] F. Bai and A. Helmy, "A survey of mobility models," Wireless Adhoc Networks.
University of Southern California, USA, vol. 206, p. 147, 2004.
[43] S. Madi and H. Al-Qamzi, "A survey on realistic mobility models for vehicular
ad hoc networks (vanets)," in Networking, Sensing and Control (ICNSC), 2013
10th IEEE International Conference on, pp. 333-339, IEEE, 2013.
[44] D. Li, J. Zhou, J. Wang, and P. Yu, "Linking generation rate based on
gauss-markov mobility model for mobile ad hoc networks," in Networks Security,
Wireless Communications and Trusted Computing, 2009. NSWCTC'09. International
Conference on, vol. 2, pp. 358-361, IEEE, 2009.
[45] J. Ariyakhajorn, P. Wannawilai, and C. Sathitwiriyawong, "A comparative study
of random waypoint and gauss-markov mobility models in the performance
evaluation of manet," in 2006 International Symposium on Communications and
Information Technologies, pp. 894-899, IEEE, 2006.
[46] S. Bitam and A. Mellouk, "Markov-history based modeling for realistic mobility
of vehicles in vanets," in Vehicular Technology Conference (VTC Spring), 2013
IEEE 77th, pp. 1-5, IEEE, 2013.
[47] Y. Li, D. Jin, Z. Wang, P. Hui, L. Zeng, and S. Chen, "A markov jump process
model for urban vehicular mobility: modeling and applications," IEEE
Transactions on Mobile Computing, vol. 13, no. 9, pp. 1911-1926, 2014.
[48] R. Agarwal, V. Gauthier, M. Becker, T. Toukabrigunes, and H. Afifi, "Large
scale model for information dissemination with device to device communication
using call details records," Computer Communications, vol. 59, pp. 1-11, 2015.
[49] J. Zhao, Y. Zhang, and G. Cao, "Data pouring and buffering on the road: A new
data dissemination paradigm for vehicular ad hoc networks," IEEE transactions
on vehicular technology, vol. 56, no. 6, pp. 3266-3277, 2007.
[50] P. Patil and A. Gokhale, "Voronoi-based placement of road-side units to improve
dynamic resource management in vehicular ad hoc networks," in Collaboration
Technologies and Systems (CTS), 2013 International Conference on, pp. 389-396,
IEEE, 2013.
[51] P. Salvo, F. Cuomo, A. Baiocchi, and A. Bragagnini, "Road side unit coverage
extension for data dissemination in vanets," in Wireless On-demand Network
Systems and Services (WONS), 2012 9th Annual Conference on, pp. 47-50, IEEE,
2012.
[52] K.-H. Jung, W.-S. Lim, J.-P. Jeong, and Y.-J. Suh, "A link contact
duration-based routing protocol in delay-tolerant networks," Wireless networks,
vol. 19, no. 6, pp. 1299-1316, 2013.
[53] H.-J. Lee, J.-C. Nam, W.-K. Seo, Y.-Z. Cho, and S.-H. Lee, "Enhanced prophet
routing protocol that considers contact duration in dtns," in 2015 International
Conference on Information Networking (ICOIN), pp. 523-524, IEEE, 2015.
[54] C. Yu, Z. Tu, D. Yao, F. Lu, and H. Jin, "Probabilistic routing algorithm based
on contact duration and message redundancy in delay tolerant network,"
International Journal of Communication Systems, 2015.
[55] Q. Nguyen and B. Krishnamachari, "Computing inter-encounter time
distributions for multiple random walkers on graphs," Information Theory and
Applications Workshop, 2017.
[56] Q. Nguyen and B. Krishnamachari, "Area-based dissemination in vehicular
networks," Workshop on Cellular Offloading to Opportunistic Networks (MASS),
2014.
[57] F. Ye, S. Roy, and H. Wang, "Efficient data dissemination in vehicular ad hoc
networks," IEEE Journal on Selected Areas in Communications, vol. 30, no. 4,
pp. 769-779, 2012.
[58] A. Nocaj and U. Brandes, "Computing voronoi treemaps: Faster, simpler, and
resolution-independent," in Computer Graphics Forum, vol. 31, pp. 855-864,
Wiley Online Library, 2012.
[59] "Kmeans definition." http://www.mathworks.com/help/stats/kmeans.html.
[60] Q. Nguyen and B. Krishnamachari, "A study of contact durations for vehicle
to vehicle communications," International Symposium on Sensor Networks,
Systems and Security, 2017.
[61] J. Yin, T. ElBatt, G. Yeung, B. Ryu, S. Habermas, H. Krishnan, and T. Talty,
"Performance evaluation of safety applications over dsrc vehicular ad hoc
networks," in Proceedings of the 1st ACM international workshop on Vehicular ad
hoc networks, pp. 1-9, ACM, 2004.
[62] F. Bai, K. R. Moghadam, and B. Krishnamachari, "A tale of two cities -
characterizing social community structures of fleet vehicles for modeling v2v
information dissemination," in 2015 12th Annual IEEE International Conference
on Sensing, Communication, and Networking (SECON), 2015.
Asset Metadata
Creator: Nguyen, Quynh (author)
Core Title: Modeling intermittently connected vehicular networks
School: Viterbi School of Engineering
Degree: Doctor of Philosophy
Degree Program: Electrical Engineering
Publication Date: 04/06/2018
Defense Date: 01/17/2018
Publisher: University of Southern California (original), University of Southern California. Libraries (digital)
Tag: encounter distribution, intermittently connected vehicular networks, mobility modeling, OAI-PMH Harvest, random walk model, vehicular networks
Language: English
Contributor: Electronically uploaded by the author (provenance)
Advisor: Krishnamachari, Bhaskar (committee chair), Golubchik, Leana (committee member), Shahabi, Cyrus (committee member)
Creator Email: quynh.t.nguyen1711@gmail.com, quynhngu@usc.edu
Permanent Link (DOI): https://doi.org/10.25549/usctheses-c89-161
Unique identifier: UC11672276
Identifier: etd-NguyenQuyn-6160.pdf (filename), usctheses-c89-161 (legacy record id)
Legacy Identifier: etd-NguyenQuyn-6160.pdf
Dmrecord: 161
Document Type: Dissertation
Rights: Nguyen, Quynh
Type: texts
Source: University of Southern California (contributing entity), University of Southern California Dissertations and Theses (collection)
Access Conditions: The author retains rights to his/her dissertation, thesis or other graduate work according to U.S. copyright law. Electronic access is being provided by the USC Libraries in agreement with the a...
Repository Name: University of Southern California Digital Library
Repository Location: USC Digital Library, University of Southern California, University Park Campus MC 2810, 3434 South Grand Avenue, 2nd Floor, Los Angeles, California 90089-2810, USA