Improving Lookup Latency in Distributed Hash
Table Systems Using Random Sampling
Hui Zhang, Student Member, IEEE, Ashish Goel, and Ramesh Govindan
Abstract—Distributed hash table (DHT) systems are an impor-
tant class of peer-to-peer routing infrastructures. They enable
scalable wide-area storage and retrieval of information, and will
support the rapid development of a wide variety of Internet-scale
applications ranging from naming systems and file systems to
application-layer multicast. DHT systems essentially build an
overlay network, but a path on the overlay between any two nodes
can be significantly different from the unicast path between those
two nodes on the underlying network. As such, the lookup latency
in these systems can be quite high and can adversely impact the
performance of applications built on top of such systems.
In this paper, we discuss a random sampling technique that in-
crementally improves lookup latency in DHT systems. Our sam-
pling can be implemented using information gleaned from lookups
traversing the overlay network. For this reason, we call our ap-
proach lookup-parasitic random sampling (LPRS). LPRS converges
quickly, and requires relatively few modifications to existing DHT
systems.
For idealized versions of DHT systems like Chord, Tapestry,
and Pastry, we analytically prove that LPRS can result in lookup
latencies proportional to the average unicast latency of the net-
work, provided the underlying physical topology has a power-law
latency expansion. We then validate this analysis by imple-
menting LPRS in the Chord simulator. Our simulations reveal
that LPRS-Chord exhibits a qualitatively better latency scaling
behavior relative to unmodified Chord. The overhead of LPRS is
one sample per lookup hop in the worst case.
Finally, we provide evidence which suggests that the Internet
router-level topology resembles power-law latency expansion. This
finding implies that LPRS has significant practical applicability as
a general latency reduction technique for many DHT systems. This
finding is also of independent interest since it might inform the de-
sign of latency-sensitive topology models for the Internet.
Index Terms—Distributed hash table (DHT), Internet topology,
latency expansion, latency stretch, peer-to-peer, random sampling,
randomized algorithm.
Manuscript received October 1, 2003; revised June 13, 2004; approved by IEEE/ACM TRANSACTIONS ON NETWORKING Editor V. Padmanabhan. This
work was supported in part by the National Science Foundation under NSF
Grant CCR0126347 and NSF Career Grant 0133968. A more complete version
of this paper with detailed proofs, simulations, and survey of related work is
available in a companion technical report (Goel et al., Technical Report 04-822,
Computer Science Department, University of Southern California, 2004).
H. Zhang and R. Govindan are with the Department of Computer Science,
University of Southern California, Los Angeles, CA 90089 USA (e-mail:
huizhang@usc.edu; huizhang@enl.usc.edu; ramesh@usc.edu).
A. Goel was with the Department of Computer Science, University of
Southern California, Los Angeles, CA 90089 USA. He is now with the
Department of Management Science and Engineering, and the Department
of Computer Science, Stanford University, Stanford CA 94305 USA (e-mail:
ashishg@stanford.edu).
Digital Object Identifier 10.1109/TNET.2005.857106
I. INTRODUCTION
DISTRIBUTED hash table (DHT) systems [25], [31], [35],
[38] are a new class of peer-to-peer networks which pro-
vide routing infrastructures for scalable information storage and
retrieval. Such systems comprise a collection of nodes (usu-
ally end systems) organized in an overlay network. They sup-
port scalable and distributed storage and retrieval of (key, value) pairs on the overlay network. They do this by associating each
node in the network with a portion of the key space; all data
items whose keys fall into a node’s key space are stored at that
node. Given a key, retrieving the data item corresponding to the
key involves a lookup operation, viz., routing the retrieval re-
quest to the corresponding node. Different DHT systems differ
in the details of the routing strategy as well as in the organiza-
tion of the key space.
With the advent of this class of systems, researchers have pro-
posed a wide variety of applications, services, and infrastruc-
tures built on top of a DHT system. Examples of such proposals
include systems for wide-area distributed storage [9], [18], [22],
[32], naming [8], web caching [14], application-layer multicast
[6], [27], [40], event notification [7], indirection services [34],
and DoS attack prevention [17]. As such, DHT systems hold
great promise for the rapid deployment of Internet-scale appli-
cations and services.
However, because in most DHT systems a request takes $O(\log n)$ overlay hops on average ($n$: the network size), the
routing latency between two nodes on the overlay network can
be different from the unicast latency between those two nodes
on the underlying network. The ratio of these two quantities
is called the latency stretch (or, simply, the stretch). A DHT
network might have a large latency stretch; this happens if the
overlay network topology is not congruent with the underlying
IP topology. A single logical hop on the overlay network could
incur several hops in the underlying IP network. Large latency
stretch factors can clearly result in poor lookup performance,
and can adversely affect the performance of applications and
services that use DHT systems.
Sophisticated algorithms are known for designing DHT sys-
tems with low latency. Plaxton et al. [24] give an elegant algo-
rithm for a DHT system that achieves nearly optimal latency on
graphs that exhibit power-law expansion, while preserving the
scalable routing properties of the DHT system. However, a di-
rect implementation of this algorithm requires pairwise probing
between nodes to determine latencies, and is not likely to scale
to large overlays.
To improve latency stretch, existing DHT systems propose
different practical approaches that are closely tailored to their
routing strategies. For example, CAN [26] proposes a dis-
tributed binning scheme to construct an overlay topology which
closely resembles the underlying IP topology. Both Tapestry
[38] and Pastry [31] use similar approaches to efficiently and
approximately build the proximity-optimized routing tables
described in [24] during node arrival: a new node quickly populates its routing table by copying appropriate routing table rows from each node on the lookup path from some node already in the system (one near the new node in terms of network latency) to the node in the system whose ID is closest to the new node's ID; the new node then further optimizes each entry in its routing table. In addition,
Tapestry [30] enhances its network with a probabilistic location
algorithm based on an attenuated Bloom filter data structure. For
each request, a Tapestry node uses the probabilistic algorithm
to search every node within a small number of hops (hops on
the underlying topology) before forwarding the request to the
next overlay node. Pastry [5] proposes an efficient approach
to locate a nearby node in the system for a new joining node.
These approaches heuristically¹ improve latency stretch, in the
sense that they do not (yet, to our knowledge and based on
[13, Table I]) prove a bound on latency stretch for a general
network topology in the same way that Plaxton et al. [24] do. It
may well be possible that these heuristics (or others based on Plaxton's work) exhibit good latency stretch on real router-level topologies.
¹One exception is the approach in [13], which we discuss in Section VI.
In this paper, we propose a latency improvement technique
that has several desirable properties: it is simple, applicable to a
class of DHT systems, incrementally but rapidly improves their
latency performance, and can be theoretically shown to work
well on a large class of graphs.
Our technique can be applied to a class of DHT systems that
uses “geometric” routing (i.e., where the search space for the
target reduces by a constant factor after each hop; we define the term search space more precisely in Section II). This class
includes Chord, Tapestry, and Pastry, but not CAN. In order to
improve the latency in these systems without radically altering
the routing scheme, the latency for each (or most) of these hops
must be improved. Thus, each node must have a pointer to a
good (i.e., low-latency) node in each geometrically decreasing
“range” of the key-space (we define the term range more pre-
cisely in Section III-A). One simple, and somewhat optimistic,
way for a node to obtain a good pointer for a range is to ran-
domly sample a small number of nodes from this range, mea-
sure the latency to each sampled node, and use the one with the
smallest latency as the pointer for this range. Surprisingly, this
simple scheme gives near-optimum latency stretch for a large
class of networks with a small number of samples. Most of this
paper is devoted to the description, analysis, and simulation of
this random sampling scheme.
We use Chord to illustrate how this idea may be efficiently
implemented in a practical DHT system. Each overlay node on
the path followed by a lookup request samples its distance to the
lookup target (of course, after the lookup has been successfully
resolved). In doing so, each node obtains a distance sample to
one of its key-space ranges, which it uses to update its routing
table. We call this technique lookup-parasitic random sampling
(LPRS), since it piggybacks on lookup operations in Chord.
The Chord routing scheme remains largely unchanged. LPRS
incrementally obtains low-latency routing tables, unlike other
schemes which compute low-latency tables when a node joins
the overlay.
Next, we analyze the random sampling scheme for ideal-
izations of DHT systems which organize their key-space as a
cycle (like Chord) and those which organize their key-space
as a hypercube (like Pastry and Tapestry). We prove that if the
underlying physical network has an exponential expansion in
terms of latency, then no significant latency improvement is
possible without significantly changing the underlying routing
schemes. However, if the underlying physical network has a
power-law expansion in terms of latency, then we prove that the
random sampling scheme improves the average latency from $\Theta(L \log n)$, which these systems achieve without any latency improvement, to $\Theta(L)$. Here, $n$ is the number of nodes in the DHT system, and $L$ is the average latency between pairs of nodes in the underlying physical network. Since $L$ is a lower
bound on the average latency, the simple random sampling
strategy achieves near-optimum average latency, after a fairly
small (poly-logarithmic) number of samples. Furthermore,
the first few samples offer most of the latency improvement
(Sections IV and V-F1); this indicates that the strategy would
work well in the presence of node dynamics, which is further
validated in Section V-F4.
We perform extensive simulations to evaluate the perfor-
mance improvement due to LPRS in realistic settings (as
opposed to the idealized setting in which we obtain our theo-
retical guarantees). We modified the SFS Chord simulator [41]
to implement LPRS. For simple topologies with power-law latency expansion (rings and meshes), the latency stretch of
LPRS-Chord (Chord augmented with LPRS) is quite small and
appears to be independent of network size, whereas the latency
stretch of Chord is much larger and appears to grow as the
logarithm of the network size. For topologies with exponential
latency expansion (PLRG [2] and random graphs [3]), the im-
provement due to LPRS is not as significant. We also simulate
the time-evolution of the latency stretch with LPRS-Chord,
and find that relatively few samples are required to bring the
overall system to an acceptable performance regime. Even with
relatively high levels of node dynamics, LPRS-Chord maintains
its low stretch performance.
To determine what latency expansion real-life networks have,
we measure the latency expansion of a router-level topology
gathered by mapping the Internet [12]. We use geographic
distance as an approximation for the propagation latency be-
tween two nodes. Our results suggest that the Internet exhibits
power-law expansion in terms of latency. Our simulations of
Chord and LPRS-Chord on Internet router-level topologies
show the same kind of qualitative improvement due to LPRS as
on rings and meshes. This indicates that random sampling has
significant practical applicability as a general latency reduction
technique for many DHT systems.
Our finding about the latency expansion properties of In-
ternet router-level graphs has independent interest. The Internet
is widely believed to have an exponential expansion in terms of
hop-count [23], and this has influenced the design of topology
generators [36]. Future topology generators that incorporate
link latency will be impacted by our finding. Also, there are sev-
eral recent results [1], [16], [24] which improve routing/latency
performance in peer-to-peer systems under the assumption that
the underlying network has a power-law expansion in terms of
latency. Thus, our finding about the latency expansion of the
Internet validates not just the algorithms presented in our paper,
but an entire line of research.
The remainder of this paper is organized as follows. Section II
introduces Chord briefly. In Section III, we describe LPRS and
its implementation in Chord. Section IV presents theoretical re-
sults that relate the performance of random sampling to the un-
derlying topology. In Section V, we simulate the LPRS scheme
for different topologies, and also evaluate the latency expansion
of real-life networks. We discuss related work in Section VI, and
Section VII presents our conclusions.
II. CHORD
In this section, we briefly describe the Chord [35] DHT
system. From our perspective, Chord is a representative of the
class of DHT systems that use “geometric” routing (i.e., where
the search space for the target reduces by a constant factor
after each hop). As we discuss in Section III-E, our random
sampling scheme—LPRS—applies more generally to such
systems. However, for concreteness, we describe and evaluate
LPRS in the context of Chord.
Like all other DHT systems, Chord supports scalable storage
and retrieval of arbitrary (key, value) pairs. To do this, Chord assigns each overlay node in the network an $m$-bit identifier
(called the node ID). This identifier can be chosen by hashing the
node’s address using a hash function such as SHA-1. Similarly,
each key is also assigned an $m$-bit identifier (following [35], we
use key and identifier interchangeably). Chord uses consistent
hashing to assign keys to nodes. Each key is assigned to that
node in the overlay whose node ID is equal to the key identifier,
or follows it in the key space (the circle of numbers from 0 to $2^m - 1$). That node is called the successor of the key. An im-
portant consequence of assigning node IDs using a random hash
function is that location on the Chord circle has no correlation
with the underlying physical topology.
The Chord protocol enables fast, yet scalable, mapping of
a key to its assigned node. It maintains at each node a finger table having at most $m$ entries. The $i$th entry in the table for a node whose ID is $x$ contains the pointer to the first node, $s$, that succeeds $x$ by at least $2^{i-1}$ on the ring, where $1 \le i \le m$. Node $s$ is called the $i$th finger of node $x$.
Suppose node $x$ wishes to look up the node assigned to a key $k$ (i.e., the successor node of $k$). To do this, node $x$ searches its finger table for that node $s$ whose ID most immediately precedes $k$, and passes the lookup request to $s$. $s$ then recursively² repeats the same operation; at each step, the lookup request progressively nears the successor of $k$, and the search space, which is the remaining key space between the current request holder and the target key $k$, shrinks quickly. At the end of this sequence, $k$'s predecessor returns the identity (i.e., the IP address) of $k$'s successor to $x$, completing the lookup. Because of the way Chord's finger table is constructed, the first hop of the lookup from $x$ covers (at least) half the identifier space (clockwise) between $x$ and $k$, and each successive hop covers an exponentially decreasing part. It is easy to see that the average number of hops for a lookup is $O(\log n)$ in an $n$-node network. It is also easy to see that the total distance traversed between $x$ and the successor of $k$ on the overlay network may be significantly longer than the unicast distance between those two nodes.
²Chord lookups can also traverse the circle iteratively.
Our brief description of Chord has omitted several details of
the Chord protocol, including procedures for constructing finger
tables when a node joins, and maintaining finger tables in the
face of node dynamics (the Chord stabilization procedure). The
interested reader is referred to [35] for these details.
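To make the geometric-routing property concrete, here is a minimal sketch of Chord-style finger tables and greedy clockwise routing. This is our illustration rather than the Chord implementation: the identifier size, the helper names (clockwise, build_fingers, lookup_path), and the randomly populated ring are all assumptions, and joins, failures, and stabilization are ignored.

import random

M = 16  # bits in the identifier space; IDs and keys live in [0, 2^M)

def clockwise(a, b):
    """Clockwise key-space distance from a to b on the ring."""
    return (b - a) % (2 ** M)

def build_fingers(x, ring):
    """Chord invariant: the i-th finger of node x is the first node whose
    ID is equal to or greater than x + 2^(i-1) (clockwise)."""
    fingers = []
    for i in range(1, M + 1):
        t = (x + 2 ** (i - 1)) % (2 ** M)
        fingers.append(min(ring, key=lambda y: clockwise(t, y)))
    return fingers

def lookup_path(start, key, fingers):
    """Greedy routing: forward to the closest preceding finger. Each hop
    at least halves the remaining clockwise distance to the key, so the
    path has O(log n) hops on average."""
    path, cur = [start], start
    while True:
        # fingers of cur that move clockwise toward key without passing it
        cands = [f for f in fingers[cur]
                 if 0 < clockwise(cur, f) <= clockwise(cur, key)]
        if not cands:
            return path  # cur precedes key; cur's successor owns it
        cur = max(cands, key=lambda f: clockwise(cur, f))
        path.append(cur)

random.seed(7)
ring = sorted(random.sample(range(2 ** M), 64))
fingers = {x: build_fingers(x, ring) for x in ring}
print(lookup_path(ring[0], key=12345, fingers=fingers))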
III. LOOKUP-PARASITIC RANDOM SAMPLING IN CHORD
Routing in several DHT systems such as Chord, Pastry, and
Tapestry has an interesting property: the search space for the
key decreases by a constant factor (i.e., geometrically) after each
hop. In order to improve the latency without radically altering
the routing scheme, the latency for each (or most) of these hops
must be improved. Thus, each node must have a pointer to a
good (i.e., low-latency) node in each geometrically decreasing
“range” of the key-space. One simple, and somewhat optimistic,
way for a node to obtain a good pointer for a range is to ran-
domly sample a small number of nodes from this range, mea-
sure the latency to each sampled node, and use the one with the
smallest latency as the pointer for this range. Surprisingly, this
simple scheme gives almost-optimum latency stretch for a large
class of networks with a small number of samples. In this sec-
tion we first describe the above ideas in more detail, and then
present a particular realization of random sampling in Chord
that we call Lookup-Parasitic Random Sampling (LPRS). We
finish this section by discussing the application of LPRS to other
DHT systems.
A. Random Sampling for Latency Improvement
Let $n$ be the number of nodes in the network. Before pro-
ceeding, we need to introduce the following three terms:
• Range: For a given node in a Chord overlay with ID $x$ (we will henceforth refer to such a node simply as node $x$), its $i$th range is the interval of the key space defined as $[x + 2^{i-1}, x + 2^i)$, where $1 \le i \le m$.
• Latency expansion: Let $N_x(\tau)$ denote the number of nodes in the network that are within latency $\tau$ of node $x$. Informally, for $d \ge 1$, a family of graphs has a $d$-power-law latency expansion if $N_x(\tau)$ grows (i.e., “expands”) proportionally to $\tau^d$, for all nodes $x$. Some simple examples of graph families with power-law latency expansion are rings ($d = 1$), lines ($d = 1$), and meshes ($d = 2$). Informally, a family of graphs has exponential latency expansion if $N_x(\tau)$ grows proportionally to $\alpha^\tau$ for some constant $\alpha > 1$. Formal definitions of power-law and exponential latency expansion are given in the companion technical report [39].
• Sampling: When we say node $x$ samples node $y$, we mean that $x$ measures the latency (e.g., by using ping) to $y$. Depending on the measured value, $x$ may update its finger table.
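In code, the range a sample falls into can be read directly off the clockwise key-space distance. A minimal sketch of the definition above (the function name is ours; M plays the role of $m$):

M = 24  # bits in the key space, as in m above

def range_index(x, y):
    """Return i such that node y lies in node x's i-th range, i.e.,
    2^(i-1) <= (y - x) mod 2^M < 2^i."""
    d = (y - x) % (2 ** M)
    assert d > 0, "y must differ from x"
    return d.bit_length()  # the unique i with 2^(i-1) <= d < 2^i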
Now consider the $i$th range of a node $x$. Suppose there are $z$ nodes whose IDs lie in this range. Because the IDs of the nodes are chosen randomly, the latency of each of these nodes from $x$ is likely to be random. The key idea behind random sampling is that node $x$ picks a small sample from among the $z$ nodes, and sets as its $i$th successor (i.e., its $i$th finger table entry) that node from the sample which is closest to itself in terms of latency.³
Whether this scheme succeeds, and if it does, how large the
sample needs to be, depends on the latency expansion character-
istics of the underlying physical topology. Intuitively, in a net-
work with exponential latency expansion, an overwhelming ma-
jority of the nodes will be very far from a node $x$, and finding
the closest node from a small sample is unlikely to significantly
improve the latency of the overall scheme. However, in a net-
work with power-law latency expansion, a node only needs
to sample a small number of nodes from each range in order to
find a nearby node.
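A back-of-the-envelope calculation (ours, not the paper's formal argument) makes the contrast concrete. Writing $\Delta$ for the largest latency from a node $x$ and $\lambda$ for latency, a $d$-power-law expansion means a uniformly random node lies within latency $\tau$ of $x$ with probability roughly $(\tau/\Delta)^d$, so the nearest of $K$ independent samples $Y_1, \ldots, Y_K$ satisfies

$\mathbb{E}\big[\min_{1 \le j \le K} \lambda(x, Y_j)\big] = \Theta\big(\Delta \, K^{-1/d}\big),$

which falls polynomially in $K$. Under exponential expansion, by contrast, all but a vanishing fraction of nodes sit at latency close to $\Delta$, and the minimum of $K$ samples improves on $\Delta$ only by an additive $O(\log K)$ term.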
It turns out that it is possible to make these intuitions precise.
In a network with $d$-power-law latency expansion, if each node on an $n$-node Chord overlay obtains $O(\log n)$ uniform samples from each of its $\log n$ ranges, the average distance traversed by a lookup is $O(L)$, where $L$ is the average latency between a pair of nodes in the original graph (we prove this formally in Section IV). In a network with exponential latency expansion, no significant improvement can be obtained. For both classes of networks, the original implementation of Chord would result in $\Theta(L \log n)$ average latency.
Two questions then remain:
• How does each node efficiently obtain samples
from each range?
• Do real networks have power-law latency expansion
characteristics?
We address the first question next. We discuss the second in
Section V, where we present evidence which suggests that real
networks resemble graphs with power-law latency expansion.
B. Lookup-Parasitic Random Sampling (LPRS)
Consider a lookup for key $k$ which starts at node $x$, and let $y$ be the target node for the lookup. Since the mapping of documents to nodes is random, node $x$ can consider $y$ to be a random sample from the largest range (i.e., the entire network). Since the search space decreases geometrically along the request path, successive nodes in the path can consider node $y$ to be a random sample from successively smaller ranges (we make these notions precise in Section IV).
This suggests a simple technique to get fast random sampling
in terms of ranges for Chord nodes: when a request completes,
each node on the request path samples the target, and updates
its finger table accordingly. We call this technique Lookup-Par-
asitic Random Sampling (LPRS) since all information the
random sampling scheme needs is contained in the path that a
lookup request traverses. A naive, simpler strategy might have been for just the source to sample the target. Such a scheme is highly unlikely to result in any samples from the smaller ranges, and hence would not lead to optimum latency improvement with few samples.
³This changes the Chord invariant, in which the $i$th successor of a node $n$ is the first node on the Chord circle whose ID is equal to or greater than $n + 2^{i-1}$. As we show later (Section III-D), this does not alter the $O(\log n)$ behavior of the average number of overlay hops for each lookup.
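In code, the sampling step might look as follows. This is our paraphrase of the description above, not the pseudocode of [39]: the ping helper, the table layout, and range_index (the range classifier sketched in Section III-A) are assumptions.

def lprs_on_lookup_complete(path, target, tables, ping):
    """After a lookup resolves, every node on the request path samples
    the target; tables[x] maps a range index i to (finger_id, latency)."""
    for hop in path:
        if hop == target:
            continue
        i = range_index(hop, target)   # which of hop's ranges target falls in
        rtt = ping(hop, target)        # one latency sample (e.g., a ping)
        best = tables[hop].get(i)
        if best is None or rtt < best[1]:
            tables[hop][i] = (target, rtt)  # keep the closest node seen so far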
C. LPRS-Chord
We use the term LPRS-Chord to denote Chord augmented
with LPRS. Four modifications to Chord are required in order
to implement LPRS-Chord.
First, LPRS-Chord needs additional information to be carried
in each lookup message. Specifically, each intermediate hop ap-
pends its IP address to a lookup message. This is a reasonable
implementation choice since each request takes $O(\log n)$ hops
on the Chord overlay (and our modifications don’t break this
property, see below). When the lookup reaches its target, the
target informs each listed hop of its identity. Each intermediate
hop then sends one (or a small number) of pings to get a rea-
sonable estimate of the latency to the target. It is well known
that latency estimation using pings is not perfect (e.g., due to
routing asymmetry), but this methodology should suffice for us,
since we are not interested in making fine-grained latency dis-
tinctions. For example, it is less important for an LPRS-Chord node to correctly pick the closer of two potential successors whose latencies are comparable (e.g., both are topologically equidistant from the node) than to pick the closer of two potential successors whose latencies differ significantly (e.g., because one is on the East Coast and the other on the West Coast).
Second, LPRS-Chord requires the use of recursive lookups,
rather than iterative lookups. Each node incrementally improves
its own finger table for low latency hops to successors. Consider
a lookup from source to target traversing through node .
Suppose ’s successor toward is . An iterative lookup would
have contact , obtain the identity of , and proceed thus. This
can invalidate our latency optimizations, since the latency from
to is clearly not optimized by LPRS-Chord. Only picks as
its successor for the appropriate range with a view to reducing
latency. For this reason, LPRS-Chord requires recursive lookups
(where passes on the request to , and continues the lookup
recursively).
Third, LPRS-Chord needs a latency-sensitive replacement scheme to update its finger table entries based on the samples obtained. The following algorithm describes how a node $x$ updates its finger table when it has a distance estimate for a Chord node $y$. This simplified description assumes, however, that each
node always maintains its predecessor and successor in the
Chord ring. This information is needed in Chord to enable the
join, leave, and stabilization algorithms to continue to work
correctly. The approach for maintaining this information in
LPRS-Chord is identical to that in Chord, and therefore requires
no change in the Chord join and leave algorithms.
1) A node maintains one finger table entry for each of its ranges.
2) For node $y$, node $x$ finds the range $i$ that $y$ lies in.
3) If $y$ is closer than the current $i$th successor of node $x$, make $y$ the new $i$th successor of $x$. (In practice, an implementation might maintain several “candidate” $i$th successors for fast fail-over. In this case, LPRS would replace the furthest candidate successor; one possible realization is sketched below.)
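One possible realization of the candidate-successor variant of step 3 (our reading of the parenthetical remark; the capacity C is an assumed parameter):

C = 3  # candidate successors kept per range (assumed)

def update_range(candidates, node, latency):
    """candidates: list of (node_id, latency) for one range, nearest first.
    Insert the new sample and evict the furthest entry beyond capacity."""
    candidates.append((node, latency))
    candidates.sort(key=lambda entry: entry[1])
    del candidates[C:]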
Finally, because the invariants in original Chord are no longer
maintained in LPRS-Chord, it needs a slightly modified stabi-
lization mechanism to rebuild finger tables in the face of node
dynamics. Instead of refreshing the first node in each range,
the stabilization algorithm in LPRS-Chord refreshes the known
nearest node in each range.
Due to space constraints, we omit the pseudocode for LPRS-
Chord from this paper. The interested reader is referred to the
companion technical report [39].
D. Discussion
There are three issues that need to be discussed with regard
to LPRS-Chord.
The first is that LPRS-Chord breaks Chord’s invariant in con-
structing finger table entries. Chord's invariant is that in any node $x$'s finger table, the $i$th successor is the first node whose ID is equal to or larger than $x + 2^{i-1}$. In LPRS-Chord, this invariant is no longer true. The $i$th finger table entry points to some “nearby” node whose ID lies in the interval $[x + 2^{i-1}, x + 2^i)$.
However, the following theorem holds for LPRS-Chord:
Theorem 1: LPRS-Chord resolves a lookup in $O(\log n)$ hops on an $n$-node overlay network.
Proof: Suppose that node $x$ is the origin of a lookup request for key $k$, and that $k$ resides in $x$'s $i$th range. Let $y$ be the target of the lookup for $k$. In Chord, the first hop from $x$ toward $y$ would have been the Chord node with the lowest ID among all Chord nodes in $[x + 2^{i-1}, x + 2^i)$ (i.e., Chord's $i$th successor, assuming there is at least one node in that range). In LPRS-Chord, that hop may not be feasible, since the finger entry for the $i$th range may point to a node that has an ID greater than $k$. In that case, the next hop for LPRS-Chord would be some node in the $(i-1)$th range of node $x$; this hop would cover at least one-fourth of the distance to $k$ in the key space. Recursively, each hop covers at least one-fourth of the remaining distance in the key-space toward $k$, resulting in $O(\log n)$ hops.
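The hop count follows from the geometric decay alone (a compressed restatement of the argument above): if $d_t$ denotes the remaining key-space distance after $t$ hops, then

$d_{t+1} \le \tfrac{3}{4}\, d_t \;\Longrightarrow\; d_t \le \big(\tfrac{3}{4}\big)^{t} d_0,$

so the distance drops below 1 within $\log_{4/3} d_0$ hops; since $d_0 < 2^m$ and node IDs are spread uniformly over the key space, $O(\log n)$ hops suffice on an $n$-node overlay.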
In practice, as our simulations show (Section V),
LPRS-Chord’s average number of overlay hops is compa-
rable to or better than Chord’s.
A second issue is the particular choice of sampling strategy in
LPRS-Chord. Although we have described the strategy in which
each node on the lookup path samples the target, there are many
other conceivable sampling strategies, some of which may be
easier to implement in practice. We did a comparison of five conceivable sampling strategies in the context of Chord, and pointed out that LPRS is the right strategy to achieve uniform sampling⁴ in terms of ranges (see the companion technical report [39] for the details).
⁴In Chord, when the base of the key space $b$ is not chosen to be 2, LPRS might not achieve strictly uniform sampling. However, we speculate that the skew of sampling probability on each range will be within a constant given a fixed $b$. For example, we can show that for a family of bases $b = 2^m$ ($m \ge 1$ is an integer), the skew (defined as the ratio of the maximal sampling probability to the minimal sampling probability in terms of ranges) is no more than 2. We do not further explore this issue in this paper.
Finally, we note that a practical implementation may choose
not to sample on every request, trading off the time to con-
verge to a “good” latency for reduced sampling overhead. There
are several ways to do this. For example, an adaptive approach
might work as follows: A node initially samples the target of
every request, but when it notices its latency estimates to a par-
ticular range not improving significantly, it reduces the sampling
rate for that range. We do not discuss these implementation de-
tails in this paper.
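One way such an adaptive scheme might look (entirely our sketch; the paper leaves the mechanism open): track, per range, whether recent samples improved the finger, and back off the sampling probability when they stop helping.

import random

def maybe_sample(range_state, improved):
    """range_state: {'p': current sampling probability for this range}.
    Call once per lookup that traverses this range; returns whether to
    sample this time, halving the rate after unproductive samples."""
    if not improved:
        range_state['p'] = max(0.01, range_state['p'] * 0.5)  # back off
    return random.random() < range_state['p']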
E. Generalizations
Revisiting Section III-A, we see that the general property of
Chord that random sampling leverages is that the Chord routing
table leads to geometric routing (defined more formally in the
next section). The random sampling strategy should more gen-
erally apply to any DHT system that uses geometric routing.
Indeed, geometric routing forms the basis of our proofs about
the random sampling property (Section IV).
We observe that both Tapestry and Pastry use geometric
routing. In both of these schemes, a lookup for a key is routed
as follows: the lookup initiator finds the neighbor whose ID’s
most significant bit matches that of the key. This neighbor
looks at the next bit, and finds that neighbor whose ID matches
the corresponding bit in the key, and so on. The details of the
schemes vary slightly; in fact, the comparison is not bit-by-bit,
but digit-by-digit. In principle, however, this kind of forwarding
results in a logarithmic number of hops in the key space.
Thus, LPRS can be retrofitted into both Pastry and Tapestry
quite easily (Pastry and Tapestry implement a simpler heuristic
for latency reduction, which we discuss in Section VI). There is
one difference in detail, however, between Chord and these two
systems. In Chord, lookups traverse the key space in one direc-
tion around the Chord circle. Lookups in these other two DHT
systems can traverse the key space in either direction. Thus,
these systems can sample in both directions (see the companion
technical report [39] for the details).
Interestingly, CAN’s routing tables do not satisfy the geo-
metric property. CAN uses a $d$-dimensional Cartesian coordinate space, and each node only maintains pointers to its $2d$ neighbors. Thus, each hop on the CAN overlay traverses an ap-
proximately uniform distance in the key space. For this reason,
LPRS does not apply to CAN, which needs to use a different
strategy to effect latency reduction (Section VI).
IV. ANALYSIS
In this section, we formally analyze the performance of
random sampling for routing schemes such as Chord, Pastry,
and Tapestry. The proofs of all theorems in this section are
given in the companion technical report [39].
We model the underlying network as an undirected graph $G = (V, E)$ with $n$ nodes and a latency function $\lambda$ defined on the set of links. For a node $u$, $\mathrm{ID}(u)$ denotes the location of $u$ in the virtual name-space. For a document $D$, we will assume that $\mathrm{ID}(D)$ denotes the location, in the virtual name-space, of the node where $D$ is stored. We extend the latency function to all pairs of nodes by defining the latency between a pair of nodes to be the smallest latency along a path between the nodes. For ease of exposition we will assume that $n = 2^h$. We will study
two kinds of systems:
• Hypercube systems: Here, the virtual name-space is structured as an $h$-dimensional hypercube $H$. Each point in this space can be represented as an $h$-bit number. We will assume that $\mathrm{ID} : V \to H$ is a one-one, onto function chosen uniformly at random from the space of all such functions. This is an idealization of systems such as Pastry and Tapestry.
For two points $x, y$, their bit-difference $b(x, y)$ is the most significant bit-position in which they are different (with positions numbered $1, \ldots, h$ from the least significant bit, and $b(x, x) = 0$). For example, $b(101, 001) = 3$, and $b(101, 100) = 1$. An $\ell$-range of node $u$, denoted $R_\ell(u)$, is the set of all nodes $v$ such that $b(\mathrm{ID}(u), \mathrm{ID}(v)) = \ell$.
• Linear systems: Here, the virtual name space is structured as a cycle of length $n$. We will assume that $\mathrm{ID} : V \to \{0, \ldots, n - 1\}$ is a one-one, onto function chosen uniformly at random from the space of all such functions. This is an idealization of systems such as Chord.
The difference from point $x$ to point $y$ in the virtual name-space, denoted $\delta(x, y)$, is $(y - x) \bmod n$. An $\ell$-range of node $u$, denoted $R_\ell(u)$, is the set of all nodes $v$ such that $2^{\ell-1} \le \delta(\mathrm{ID}(u), \mathrm{ID}(v)) < 2^\ell$.
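Both virtual-space primitives are one-liners in code; a sketch under the bit-numbering convention stated above (function names are ours):

def bit_difference(x, y):
    """Most significant bit position in which x and y differ
    (positions 1..h from the least significant bit; 0 if x == y)."""
    return (x ^ y).bit_length()

def linear_difference(x, y, n):
    """Clockwise distance from x to y on a cycle of length n."""
    return (y - x) % n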
Geometric routes: A route from $u$ to $v$ is geometric if it is of the form $u = w_0, w_1, \ldots, w_k = v$, where
1) $w_0 = u$ and $w_k = v$, and
2) for any $0 \le i < k$, $b(\mathrm{ID}(w_{i+1}), \mathrm{ID}(v)) < b(\mathrm{ID}(w_i), \mathrm{ID}(v))$ if the virtual name-space is a hypercube, and $\delta(\mathrm{ID}(w_{i+1}), \mathrm{ID}(v)) \le \delta(\mathrm{ID}(w_i), \mathrm{ID}(v))/c$ for some constant $c > 1$, if the virtual name space is linear.
The intuition behind this definition is quite simple. If the destination node lies in the $\ell$-range of a node $w$, then it lies in the $(\ell-1)$th or smaller range of the node which occurs just after $w$ in a geometric route. Since the size of an $\ell$-range is exponential in $\ell$, the definition of geometric routes ensures that the
“search-space” for the destination node decreases at least geo-
metrically, i.e., by at least a constant factor, after each successive
hop.
Frugal routing: A DHT routing scheme is said to be frugal
if it
1) uses only geometric routes, and
2) if $w$ is an intermediate node in the route, $v$ is the destination, and $v \in R_\ell(w)$, then the node after $w$ in the route depends only on $w$ and $\ell$.
Note that when a request is issued for locating node $v$, all that is known to the source and any intermediate nodes is $\mathrm{ID}(v)$. Still, this is sufficient for a source or intermediate node $w$ to determine $b(\mathrm{ID}(w), \mathrm{ID}(v))$ or $\delta(\mathrm{ID}(w), \mathrm{ID}(v))$, and hence determine the value $\ell$ such that $v \in R_\ell(w)$. In this section, we will only study DHT routing schemes that are frugal. This includes Chord, Pastry, and Tapestry.
Let $\Delta$ denote the diameter of graph $G$. Let $B_u(x)$ denote the set of all nodes $v$ such that $\lambda(u, v) \le x$.
A. Bounds on Latency of Frugal Schemes
We will study the latency of a lookup request made by an ar-
bitrary node to a document chosen uniformly at random. Recall
that $\mathrm{ID}$ is a random function, and hence making a request to a
random document is the same as making a request to a random
node. Further, there is no relation between a node’s position
in the physical topology and its position in the virtual overlay
topology.
Theorem 2: For any frugal DHT scheme, the maximum latency of a request is $O(\Delta \log n)$.
Theorem 3: If $G$ is drawn from a family of graphs with exponential latency expansion, then the expected latency of any frugal DHT scheme is $\Omega(\Delta \log n)$.
Theorems 2 and 3 together imply that if a frugal DHT scheme
is intended to run over graphs with exponential latency expan-
sion, then any optimizations we perform can only yield a con-
stant factor improvement. In contrast, if $G$ is drawn from a family of graphs with $d$-power-law latency expansion, then there exist frugal DHT routing schemes which have expected latency $O(\Delta)$. We will now present one such scheme, based on simple random sampling, and analyze the resulting latency.
B. The Random Sampling Algorithm
We will present and analyze our algorithm only for the case
when the virtual space is organized as a hypercube. When the
virtual space is linear, the algorithm needs to take into account
the fact that the routing becomes “uni-directional.” However, the
same analysis holds with minor technical differences; a detailed
discussion is omitted for ease of exposition.
The random sampling algorithm is described in Fig. 1. The parameter $K$, the number of random samples to be taken from each range, is given as an input to the algorithm. The routing scheme in Fig. 1 is clearly frugal. The following theorem analyzes the expected latency.
Fig. 1. Random Sampling Algorithm $(G, \mathrm{ID}, K)$.
Theorem 4: If we follow the random sampling algorithm above with $K = \Theta(\log n)$, and $G$ is drawn from a family of graphs with $d$-power-law latency expansion, then the expected latency of a request is $O(\Delta)$.
C. Discussion
While the above analysis is involved, it conveys a simple message: if the underlying network has $d$-power-law latency expansion, then for $K = \Theta(\log n)$, a simple random sampling scheme results in expected $O(\Delta)$ latency. For example, for suitable values of $n$ and $d$, each node need only sample 0.8% of the nodes for the network to achieve asymptotically optimal latency performance. This is in stark contrast to Theorem 3 for networks with exponential latency expansion, and motivates the question of whether real-life networks demonstrate power-law or exponential latency expansion; this question is addressed in Section V.
The following additional observations are also worth making:
1) The proof of Theorem 4 would not go through if the random sampling algorithm simply obtained $K \log n$ random samples from the entire network, as opposed to getting $K$ samples from each of the $\log n$ ranges. Intuitively, for geometric routing to result in good performance, we need to do “geometric sampling.” This issue is further explored in the companion technical report [39].
2) The above algorithm and the analysis do not assume
anything about the mechanism for obtaining the samples.
This gives us significant flexibility: the samples can be
obtained either upon network creation/node joining or
by piggybacking over normal DHT functionality, as in
Section III.
3) The improvement in the latency bound of Theorem 4 as a function of the number of samples $K$ is steepest for small $K$; thus, the improvement in the guarantee offered by Theorem 4 is most significant for the first few samples. This is supported by our simulations (Section V-F1).
4) Let $L$ denote the average latency between pairs of nodes in the network. For graphs with $d$-power-law latency expansion as well as for graphs with exponential latency expansion, $L = \Theta(\Delta)$. Therefore, all the theorems in this section could have been stated in terms of $L$ as opposed to $\Delta$.
V. SIMULATION RESULTS
In this section, we evaluate the performance of LPRS-Chord
through simulation. The goal of this evaluation is to validate the
analytical results of Section IV. That analysis assumed ideal-
ized models of Chord (and Pastry and Tapestry, for that matter),
and our validation ensures that those models have not omitted
important details of the Chord protocol. Furthermore, our simu-
lations help quantify the performance of LPRS-Chord vis-a-vis
unmodified Chord; our analysis only describes the asymptotic
behavior of these two systems.
We also address one important issue that determines the prac-
tical applicability of this work: What are the expansion charac-
teristics of real router-level topologies?
A. Simulation Framework
We modified the SFS Chord simulator [41] to implement
LPRS. The Chord simulator is event-driven and simulates
insertion and deletion of “documents” into a Chord network.
It also simulates Chord protocol actions during node joins
and leaves, and the Chord stabilization actions. In the Chord
simulator, the default value of $m$ is 24 and the default routing table size is 40.
Our modifications to the Chord simulator are essentially as
described in Section III-C. First, we changed Chord’s finger
table maintenance algorithms to implement random sampling.
Next, we modified each document retrieval request in the Chord simulator to use recursive lookups (the default mode in the simulator is to use iterative lookups).
Finally, we attached the addresses of each intermediate hop to
the request message. We then modified the lookup procedure
to return to each intermediate node the target’s address, so that
each node can implement the sampling algorithms described in
Section III-B. We also implemented the new stabilization mech-
anism, and we use it to evaluate the impact of node dynamics on
LPRS-Chord (Section V-F4).
B. Simulation Methodology
In all our experiments (except as noted in Section V-F4), the
input to the simulator is a network topology of size $n$. This
topology represents the underlying physical connectivity be-
tween nodes (except when we simulate Chord over an Internet
router-level topology; we discuss our methodology for this in
Section V-E). For simplicity, we assume that all nodes in this
topology participate in the Chord overlay.
Given a network topology, a simulation run for LPRS-Chord
consists of three distinct phases.
In the first phase, the $n$ nodes join the network one-by-one.
During this phase, we use the Chord join algorithms to build
the finger tables. At the end of this phase, then, the finger tables
are exactly what unmodified Chord would have built (i.e., they
maintain the Chord invariant). This allows us to measure how
much of a latency improvement LPRS-Chord obtains over
unmodified Chord.
In the second phase, each node on average inserts four
documents into the network. In principle, during this phase we
could have enabled the sampling and finger table replacement
in LPRS-Chord, since document insertion involves a lookup
request. However, for the ease of understanding our results, we
chose to enable these sampling actions only in the next phase.
In the third phase, each node generates, on average, $O(\log_2 n)$ requests. One request is generated on each simulation clock tick. In most of our simulations, the
targets for these requests are chosen uniformly. However, we
also discuss the impact of a Zipf-ian document popularity on
LPRS-Chord.
C. Performance Metrics
The focus of our paper has been on improving the lookup
latency in Chord and other DHT systems. Our primary metric,
then, is:
• average latency stretch: (henceforth, just stretch) the ratio
of the average latency for each lookup to the average
latency between each pair of nodes on the underlying
topology.
In some places, we discuss two related metrics:
• hops: the average number of hops that each request takes
on the overlay, and
• hop-reach: the average latency on the underlying network
incurred by each hop on the overlay network.
We introduce these two metrics in order to dissect LPRS-Chord
and to more clearly understand its contributions to latency
improvements.
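Concretely, the metrics can be computed from per-lookup traces roughly as follows (a sketch; the record fields are our assumptions, and hop-reach is taken as a ratio of averages):

def compute_metrics(lookups, avg_pairwise_latency):
    """Each lookup record: {'latency': total lookup latency,
    'hops': number of overlay hops taken}."""
    avg_latency = sum(r['latency'] for r in lookups) / len(lookups)
    hops = sum(r['hops'] for r in lookups) / len(lookups)
    hop_reach = avg_latency / hops  # avg underlying latency per overlay hop
    stretch = avg_latency / avg_pairwise_latency
    return stretch, hops, hop_reach

Note that, up to this averaging, stretch is the product of hops and hop-reach divided by the average pairwise latency, which is why the two auxiliary metrics dissect the stretch improvements.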
D. Impact of Topology
Our first set of simulations investigates the performance of
LPRS-Chord for different underlying physical topologies. We
do this because Section IV indicates that the performance of
LPRS-Chord depends crucially on the topology.
Specifically, we simulate LPRS-Chord and unmodified
Chord over the ring, mesh, random graph, and power-law
random graph topologies, for a variety of network sizes ranging
from 100 to 6400. For each network size, we compute the
stretch, hops and hop-reach metrics. For LPRS-Chord we
compute these metrics after each node has performed its lookup requests in the third phase of the simulation run. We
conducted three simulation runs for each topology size. We
note that the computed stretch from these three runs varies
very little. Therefore, we believe that more statistically robust
experiments will not invalidate our conclusions.
1) Topologies With Power-Law Expansion: In this first set
of experiments, we consider the ring and the mesh (or grid)
topologies. In these topologies, we assume that each link has
unit latency. These topologies are known to have power-law expansion, with $d = 1$ for the ring, and $d = 2$ for the mesh.
Fig. 2. Stretch of ring and mesh.
Fig. 2 plots the stretch of the ring and the mesh as a func-
tion of topology size. To understand these graphs, recall that the
analysis of Section IV predicts that the latency in LPRS-Chord
is proportional to the average path length. This corresponds to a
latency stretch that is independent of network size. Indeed, that
is what we observe in these two graphs. For both the ring and the mesh, the stretch is close to three across the range of topologies we consider. Note that LPRS is qualitatively different from
unmodified Chord. As predicted by our analysis, the stretch of
unmodified Chord increases logarithmically with topology size,
and is independent of the underlying topologies.
There are two ways to improve stretch: reducing the number
of hops a lookup traverses on the overlay, and reducing the phys-
ical distance traversed per overlay hop. To better understand
where LPRS-Chord’s performance improvements are coming
from, we examine our two other metrics for the ring topology (the results for the mesh are qualitatively similar, and therefore omitted): hops and hop-reach.
Fig. 3. Ring hops.
Fig. 4. Ring reach.
Figs. 3 and 4 plot these two quantities for LPRS-Chord and
unmodified Chord. LPRS-Chord explicitly targets the reduction
in hop-reach, by trying to find physically nearby successors. It
is therefore not surprising that LPRS-Chord’s hop-reach is sig-
nificantly lower than that of original Chord.
What is interesting is that LPRS-Chord noticeably improves
upon Chord’s hops performance. LPRS-Chord does not explic-
itly target this in its design. In fact, LPRS-Chord breaks Chord’s
invariant that the finger for the $i$th range is the first node whose ID is equal to or larger than $x + 2^{i-1}$. This invariant is sufficient (but not necessary—see Theorem 1, for example) to ensure Chord's $O(\log n)$ performance bound. In Section III-D,
we claim that even with this design change LPRS-Chord can resolve lookups using $O(\log n)$ hops on the overlay. Fig. 3 validates this claim.⁵,⁶
2) Topologies With Exponential Expansion: Our second set
of experiments simulates Chord over two families of topolo-
gies known to have exponential expansion: the classical random
graph [3], and the power-law random graph (PLRG) [2]. In these
topologies as well, each link has unit latency. In our random
graphs, each node has a uniform probability of connecting
to another node. Our power-law random graphs were generated
with a parameter of 0.7.
Fig. 5. Random stretch.
Fig. 6. PLRG stretch.
Figs. 5 and 6 describe the stretch of LPRS-Chord and un-
modified Chord as a function of topology size. Notice that
even for these topologies, LPRS-Chord has at least a factor of
two smaller latency stretch compared to unmodified Chord.
However, in these topologies, the performance improvement is
markedly less dramatic; as topology size increases, the stretch
of LPRS-Chord also increases. In fact, we have verified that
this performance improvement comes almost entirely from
LPRS-Chord using fewer overlay hops, and not from the reduc-
tion in hop-reach derived by random sampling. We omit those
graphs for brevity.
⁵In fact, Fig. 3 suggests that LPRS-Chord noticeably improves upon Chord's hops performance. We believe this improvement is due to some CHORD-SFS simulator quirks; please refer to the companion technical report [39] for the details.
⁶It might be tempting to assume that if Chord could be modified to focus on hop-count reduction, it would exhibit latency stretch comparable to LPRS-Chord. However, since Chord does not make any routing decisions on the basis of latency, each of the $\Theta(\log N)$ hops must still be of expected length $\Theta(L)$, resulting in a latency of $\Theta(L \log N)$.
These results also validate another analytical result from Sec-
tion IV . We argued there that in graphs with exponential expan-
sion, random sampling is not likely to result in orders-of-mag-
nitude latency improvement.
E. LPRS-Chord and the Internet Router-Level Topology
Our analysis and simulations suggest that on graphs with
power-law expansion, LPRS-Chord exhibits a qualitatively
different stretch scaling from that of unmodified Chord. On
graphs with exponential expansion, however, LPRS-Chord
provides some, but not as dramatic, improvement. The applica-
bility of our result to real-world DHT systems, then, seems to
rest on the following question: what are the latency expansion
characteristics of the Internet router-level topology?
The literature on topology modeling and characterization has
generally assumed that real Internet router-level topologies have
an exponential expansion [23].⁷ In fact, recent work [36] has
shown that the PLRG is a reasonable model for the Internet
router-level topology, particularly when attempting to evaluate
large-scale metrics.
⁷There is some debate about this. Earlier work, based on a smaller dataset, had claimed that the expansion of router-level topologies better matched a power-law [11].
However, our simulations show that LPRS-Chord’s perfor-
mance over the PLRG topology is good, but does not repre-
sent a compelling improvement over unmodified Chord in terms
of hop-reach. Does this imply that LPRS is only marginally
useful? No. Previous studies have defined expansion in terms of
router-level hops. We conjecture that when expansion is defined
in terms of latency (as in Section III), the Internet router-level
topology exhibits power-law expansion.
We used a large router-level topology dataset to validate this
conjecture. This dataset was gathered using a methodology sim-
ilar to prior work on topology discovery: traceroutes from sev-
eral locations (in this case, six nodes on the NIMI infrastructure)
to random destinations chosen from BGP routing tables. Router
interfaces were disambiguated using techniques described in
[12].
How do we measure latency on an Internet router-level
topology? We argue that, for the purpose of improving the
performance of DHT systems, the propagation latency between
two nodes is the component that any latency improvement
scheme should attempt to adapt to. The expansion character-
istic of the Internet router-level topology that we are interested
in, then, is the propagation latency expansion.
To measure the propagation latency expansion of the Internet
router-level topology, we make the following simplifying as-
sumption: the propagation latency between any two nodes on
the router-level topology is well approximated by the accumu-
lated geographic distance of the path between the two nodes in
shortest path routing. This particular assumption has not been
validated by any measurement study. While prior work [21] has
clearly pointed out that the direct geographic distance between
two network nodes is by no means the only factor in deciding
their network distance, we used the accumulated geographical
distance of the routing path to more carefully approximate the
propagation latency. The one exception to this is the case when
a link in the router-level topology corresponds to a link-layer
tunnel or circuit that is actually routed through one or more ge-
ographic locations before reaching the end of the link. At the
very least, the geographic distance can be used to establish a
lower bound on the propagation latency for such a link.
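Once every router is assigned a (latitude, longitude), accumulating geographic distance along a path is straightforward; a sketch (ours—the paper works from Geotrack city assignments rather than raw coordinates):

from math import radians, sin, cos, asin, sqrt

def haversine_km(p, q):
    """Great-circle distance in km between (lat, lon) points in degrees."""
    (la1, lo1), (la2, lo2) = ((radians(a), radians(b)) for a, b in (p, q))
    h = sin((la2 - la1) / 2) ** 2 + cos(la1) * cos(la2) * sin((lo2 - lo1) / 2) ** 2
    return 2 * 6371 * asin(sqrt(h))

def path_distance_km(hop_coords):
    """Accumulated geographic distance along a router path, a stand-in
    (and at least a lower bound) for its propagation latency."""
    return sum(haversine_km(a, b)
               for a, b in zip(hop_coords, hop_coords[1:]))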
To assign geographic location to nodes in our router-level
topology, we used the Geotrack tool [21]. This tool uses heuris-
tics based on DNS names to infer node locations. In many cases,
Geotrack cannot assign locations to nodes. In these cases, a node
was assigned the location of the topologically nearest node for
which Geotrack could assign a location (if there was more than
one such neighbor, we randomly picked one).
Having assigned geo-locations to all nodes, we could com-
pute the latency expansion properties of the Internet router-level
graph. However, our router-level topology contained upwards
of 320 000 nodes that were mapped to 603 distinct cities. To
tractably compute the latency expansion of this rather large
topology we randomly sampled 92 824 node pairs and com-
puted the geographic distance between them.⁸
⁸We also attempted other sampling techniques: one-to-all distance measurement for up to 1000 nodes, and full-connection distance measurement for large subgraphs of up to 6400 nodes. All the expansion curves converged to one another as the number of sampled nodes increased.
Before we present our results, we mention that our proce-
dure is approximate for several reasons. The incompleteness of
topology discovery and interface disambiguation (or alias res-
olution) methods is well documented [12], [33]. Geo-location
techniques such as Geotrack are also approximate; in particular,
our approximation of placing un-resolvable nodes near resolv-
able ones can underestimate the actual geographic distance be-
tween two nodes. Despite this, we believe our general conclu-
sions from this experiment will hold up to scrutiny, since we
have been careful to make qualitative judgments from a very
large dataset.
Fig. 7. Latency expansion.
Fig. 7 plots the latency expansion of the router-level graph.
It includes, for calibration, the hop-count expansions of the
ring and PLRG topologies, as well as the hop-count expansion
of the router-level graph itself. We immediately make the fol-
lowing observations. The hop-count expansion of the Internet
router-level graph resembles that of the PLRG and can be
said to be exponential. However, the latency expansion of the
Internet router-level graph is significantly more gradual than
its hop-count counterpart and more closely matches the ring
topology.
We believe, then, that the Internet latency expansion more
closely resembles a power-law. That is, the router-level graph
looks more “mesh”-like when the distances of links are
considered.
This would argue that LPRS-Chord is a very desirable la-
tency improvement scheme for wide-area DHT infrastructures.
In fact, we have verified this by simulating LPRS-Chord on sub-
graphs of the router-level topology. We sampled several sub-
graphs of different sizes. Nodes in these subgraphs consist of
“edge” nodes in the router-level graphs (arbitrarily defined as
those with degree one or two). Each pair of nodes on these sub-
graphs is connected by a link whose latency is the geographic
distance between those two nodes in the underlying topology.
Fig. 8. Stretch on the router-level graph.
On each subgraph, we ran both Chord and LPRS-Chord, as-
suming that each node on the subgraph was part of the Chord
overlay. Fig. 8 depicts the stretch for the two schemes as a func-
tion of network size. We note the striking similarity of this graph
and Fig. 2. In particular, the stretch of LPRS-Chord is essentially
independent of network size, and below three, whereas that of
unmodified Chord increases with network size.
F. Other Issues
This section addresses four aspects of LPRS-Chord’s per-
formance that we have not validated or explored thus far. First,
LPRS-Chord incrementally improves latency using lookups;
we discuss the convergence time properties of LPRS-Chord.
Second, random sampling in LPRS-Chord incurs extra over-
head for lookups. We compare it with the overhead of recursive
lookups. Third, our analysis of LPRS-Chord has assumed a
uniform distribution of requests to targets. In practice, targets
are likely to be Zipf-ian, and we explore the implications of
this for LPRS-Chord through simulation. Last, the routing performance of LPRS-Chord could deteriorate when the
frequency of node joining, leaving, and failure is high. We study
the performance of LPRS-Chord under high node dynamics.
Fig. 9. Convergence time.
1) Convergence Time: How many lookups does it take be-
fore LPRS-Chord converges to a low stretch value on graphs
with power-law expansion? To understand this, Fig. 9 plots the
performance evolution of LPRS-Chord over time for network
sizes 900 and 6400 on the ring topology (for the mesh and the In-
ternet graph, the results are similar and omitted). In these graphs,
the x-axis describes the number of requests generated per node, normalized by $\log_2 n$ (10 for 900 nodes and 13 for 6400 nodes). We calculated the average latency stretch every $n \log_2 n$ requests.
This figure validates our contention in Section IV that LPRS-
Chord is fast. When each node has generated on average $3 \log_2 n$ samples (about 39 requests in the 6400 node case), the network has converged to its eventual stretch value of slightly under three. Furthermore, even with a few samples, the latency stretch improvements are quite dramatic. After $\log_2 n$ samples per node the latency stretch is within 30% of the final stretch, and after $2 \log_2 n$ samples the improvement is within 10%. This bears out
our analysis in Section IV: relatively few samples are needed
for latency improvements, and the improvement from the initial
samples is quite dramatic.
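For concreteness, here is a minimal sketch of how the average latency stretch reported in these plots can be computed; overlay_latency and unicast_latency are hypothetical hooks into the simulator's routing and shortest-path machinery, not actual Chord-SFS functions.

    def average_stretch(requests, overlay_latency, unicast_latency):
        """Mean latency stretch over a batch of (source, key) lookups.
        `overlay_latency` and `unicast_latency` are hypothetical hooks
        into the simulator."""
        ratios = [overlay_latency(src, key) / unicast_latency(src, key)
                  for src, key in requests]
        return sum(ratios) / len(ratios)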
2) Overhead of LPRS: In LPRS-Chord, each node along the
path of a lookup might sample its latency to the target. In a naive
implementation, this would result in an overhead of O(log n)
PING operations per lookup.
In practice, the number of samples is significantly lower than
this theoretical limit. Fig. 10 shows the overhead of LPRS, mea-
sured as the average number of sampling actions per request in
our simulations. For context, the figure also includes the number
of samples that would have been taken in theory, measured as
the average number of hops (routing messages) per request.
The number of samples actually taken is clearly smaller than
the number of hops in a request, because in many cases the
target node is already in the routing table of an intermediate
node. Nevertheless, the LPRS overhead is still large in absolute
terms and, worse, it increases logarithmically with network size.
Fig. 10. Overhead comparison in LPRS-Chord.
In the companion technical report [39], we outline a heuristic
which adapts the sampling rate to the observed improvement
in latency. While we have not performed a detailed evaluation
of this heuristic, it has the potential to significantly reduce
overhead.
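The report only outlines that heuristic, so the following Python fragment is one plausible instantiation under our own assumptions, not the report's exact scheme: each node keeps a smoothed estimate of how much recent samples improved its finger latencies and samples traversing lookups with a probability tied to that estimate. The probability floor, smoothing factor, and scaling constant are all illustrative.

    import random

    class AdaptiveSampler:
        """Sample a traversing lookup with probability tied to how much
        recent samples actually improved finger latencies (an assumed
        instantiation of the heuristic outlined in [39])."""

        def __init__(self, p_min=0.05, alpha=0.9):
            self.p = 1.0              # start by sampling every lookup hop
            self.p_min = p_min        # never stop sampling entirely
            self.alpha = alpha        # smoothing for the improvement estimate
            self.improvement = 1.0

        def should_sample(self):
            return random.random() < self.p

        def record(self, old_latency, new_latency):
            # Fractional latency gain contributed by the last sample.
            gain = max(0.0, (old_latency - new_latency) / old_latency)
            self.improvement = self.alpha * self.improvement + (1 - self.alpha) * gain
            self.p = max(self.p_min, min(1.0, 10 * self.improvement))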
3) Impact of Skewed Request Distributions: Our analysis
has assumed that targets are uniformly accessed. In practice,
the popularity of documents stored in DHT systems is likely
to be Zipf-ian (as it is, for example, for Web access patterns
[4]). How, then, are our results affected by a different pattern
of access to documents?
Fig. 11. Impact of Zipf-ian document popularity.
We simulated LPRS-Chord on a 1600-node ring network.
Unlike the previous simulations, the targets of each node's
requests are chosen from a Zipf distribution. Each node
generates one request every clock tick. Fig. 11 plots the
evolution of stretch as a function of simulation time. We
compute the stretch at every clock tick. (The evolution of
stretch for other topologies is qualitatively similar, and we
omit those results for brevity.)
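For reproducibility, here is a small sketch of how Zipf-distributed targets can be drawn; the exponent s is an assumption, since the text does not fix one.

    import random

    def zipf_target(keys, s=1.0, rng=random):
        """Draw a target key with probability proportional to 1 / rank**s;
        `keys` is assumed to be ordered from most to least popular. The
        exponent s is an assumption."""
        weights = [1.0 / (rank ** s) for rank in range(1, len(keys) + 1)]
        return rng.choices(keys, weights=weights, k=1)[0]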
We find that even with a Zipf-ian popularity, LPRS-Chord
eventually converges to a very desirable stretch. After 32 re-
quests per node, the stretch resulting from Zipf-ian access to
documents is less than 10% higher than the stretch resulting
from a uniform access to documents. This is easily explained;
as long as each node gets enough samples from each of its
log n ranges, LPRS-Chord will eventually converge to a good stretch
value. Zipf-ian access does not prevent nodes from obtaining
these samples—it just takes a little longer for some nodes to get
samples.
Equally interesting is the fact that the Zipf-ian access curve
also exhibits an initial steep reduction in stretch. This is good
news and suggests that relatively few samples are required to
bring the overall system to an acceptable performance regime.
4) Node Dynamics: While it is highly encouraging that
LPRS converges quickly (on the order of tens of lookups per
node, even on relatively large overlays), it is important to
understand the impact of node dynamics (nodes joining and
leaving an overlay) on the latency stretch of LPRS-Chord. In
this section, we examine this impact by investigating highly
dynamic scenarios, where the rates of node joins, leaves, and
failures are comparable to the rate of lookups.
To evaluate LPRS-Chord in such scenarios, we use a slightly
modified version of the simulation methodology described in
Section V-B. First, we enable the new schemes mentioned in
Section III-C during the entire simulation run; in particular,
unlike before, LPRS-Chord collects samples throughout the
run. Second, unlike before, samples are collected during all
key lookup actions, whether they are for finding a data item
or for inserting one. This is what a real implementation would
do. Finally, the input to the simulator is a network topology
of size N, the underlying network from which nodes are
chosen uniformly to join and leave the Chord overlay network
dynamically.
TABLE I
SIMULATION Dyn1600: RING TOPOLOGY, K = 40 000
Table I describes one simulation scenario we designed to
evaluate the impact of node dynamics (results for more
scenarios are available in the companion technical report [39]).
Each node generates events (requests, joins, leaves, and
failures) with equal probability during the respective phases.
The targets for the requests are chosen uniformly. With the
default stabilization mechanism in the Chord-SFS simulator,
each node refreshes one of its routing table entries every 30 s.
In simulation Dyn1600, the size of the Chord network stays
at around 1600 nodes.
Under node dynamics, routing state at various nodes can be
inconsistent, resulting in transient lookup failures. To mea-
sure those, we count the number of timeouts, aborted requests,
and failed requests, defined as follows:
• timeout: During a request, a timeout occurs
when a node finds the next hop missing from the system.
• aborted request: A request for a target key k is
said to be aborted when a node with ID x receives the
request but finds that all the nodes in its routing table
whose IDs lie between x and k are absent from the system.
• failed request: A request for a target key k is
counted as failed when the request is forwarded to a node
such that k falls between that node's ID and its successor's
ID, but k cannot be found in the successor's datastore (see
the sketch below).
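As a concrete reading of these definitions, the sketch below classifies one forwarding step; the node methods and the alive/in_store hooks are hypothetical simulator interfaces, not part of the Chord-SFS code.

    def classify_hop(node, key, alive, in_store):
        """Classify one forwarding step per the definitions above.
        `node`'s methods and the `alive`/`in_store` hooks are
        hypothetical simulator interfaces."""
        if node.responsible_for(key):
            # The key falls between this node's ID and its successor's ID.
            return "ok" if in_store(node.successor, key) else "failed"
        nxt = node.best_finger(key)   # closest candidate toward `key`
        if nxt is None:
            return "aborted"          # all fingers between node and key are gone
        if not alive(nxt):
            return "timeout"          # counted; an alternate hop is then tried
        return "forwarded"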
Fig. 12. Simulation Dyn1600: latency stretch.
Fig. 12 plots the stretch for Chord and LPRS-Chord under
the scenario Dyn1600. In this graph, the x-axis depicts the av-
erage number of requests generated per node; for example,
time 0–3 corresponds to Phase 3, time 4–15 corresponds to
Phase 4, and so on. The y-axis depicts the corresponding metric
averaged over an interval consisting of 1600 requests.
LPRS-Chord maintains its low stretch even when the rate
of node joins, leaves, and failures is comparable to the rate
of lookups. A more careful examination reveals that in highly
dynamic situations there is a slight increase in LPRS-Chord's
latency stretch. This is mainly due to an increase in how far
individual hops reach in the underlying network, which is not
surprising: both old nodes that have recently lost some proximal
fingers and newly joined nodes need enough samples to find
nearby nodes. Counter to intuition, the latency stretch of the
original Chord actually improved with increased churn in the
network. We believe this is mainly due to quirks in the Chord-SFS
simulator; the interested reader is referred to the companion
technical report [39] for details.
Finally, Table II lists the number of lookup failures from our
simulation. In addition to absolute values, we also present the
results as percentages. The percentage of aborted (failed) requests
is the ratio of the number of aborted (failed) requests to
the total number of lookups in Phase 4, i.e., the dynamic phase.
The percentage of timeouts is the ratio of the total number of
timeouts to the total number of lookup hops in Phase 4, which
equals the number of lookups multiplied by the average hops
per lookup. For example, the total number of lookup hops for
LPRS-Chord in Phase 4 of simulation Dyn1600 is 19 200 lookups
× 4.5 hops per lookup, i.e., 86 400 hops; the percentage of
LPRS-Chord timeouts is then the timeout count from Table II
divided by this figure.
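The arithmetic, as a checkable fragment (the timeout count itself comes from Table II):

    # Worked example using the Phase-4 figures quoted above.
    lookups_phase4 = 19_200                        # total lookups in Phase 4
    avg_hops = 4.5                                 # average hops per lookup
    total_hops = int(lookups_phase4 * avg_hops)    # 86 400 hops

    def timeout_percentage(total_timeouts):
        # Timeout count (taken from Table II) over total lookup hops.
        return 100.0 * total_timeouts / total_hops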
TABLE II
ROUTING ROBUSTNESS OF CHORD AND LPRS-CHORD: SIMULATION RESULTS
As shown in Table II, LPRS-Chord had a comparable
number of aborted requests and failed requests, which was a
relatively insignificant fraction of the total number of requests.
But LPRS-Chord incurred about three times the number of
timeouts, largely because finger-table replacement is no longer
based on finger usage. There are two complementary fixes for
this problem:
• adapting our latency-sensitive replacement scheme to
carefully evict fingers that have not been used in a while
(sketched below);
• borrowing ideas from Tapestry [38], such as TCP timeouts
and back-pointers with heartbeat messages, for detecting
failures.
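A minimal sketch of the first fix, under our own assumptions about the finger-table interface; the idle and latency-gain thresholds are illustrative, not values we have evaluated.

    import time

    class Finger:
        def __init__(self, node_id, latency):
            self.node_id = node_id
            self.latency = latency
            self.last_used = time.monotonic()   # refreshed on every use

    def maybe_replace(current, candidate, idle_limit=300.0):
        """Latency-sensitive replacement that prefers evicting stale
        fingers: a recently used finger has known liveness, so it is kept
        unless the candidate's latency gain is large. Thresholds are
        illustrative assumptions."""
        idle = time.monotonic() - current.last_used
        recently_used = idle <= idle_limit
        big_gain = candidate.latency < 0.5 * current.latency
        if candidate.latency < current.latency and (not recently_used or big_gain):
            return candidate
        return current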
VI. RELATED WORK
In early seminal work on the problem of reducing latency
in DHT systems, Plaxton, Rajaraman, and Richa [24] showed
that with carefully pre-configured routing tables, a data repli-
cation system can achieve asymptotically optimum request la-
tency if the underlying topology has power-law latency expan-
sion. Specifically, their scheme guarantees that the latency for
each request will be at most a constant factor larger than the
minimum possible latency for that request. This is stronger than
the guarantee for our random sampling scheme. However, in
their algorithm, each node needs to find the closest node in each
range, which may be prohibitively expensive.
Recently, Abraham, Malkhi, and Dubzinski [1] have built on
the work of Malkhi, Naor, and Ratajczak [19] and others to
achieve constant-degree, logarithmic-hop, small-latency DHT
systems, assuming power-law expansion in terms of latency.
Since their proximity technique borrows heavily from [24], i.e.,
each next hop is the nearest among all candidate nodes, their
routing table initialization and maintenance costs appear to be
comparable to those of [24].
CFS [9], a Chord-based storage system, uses proximity
routing [28] to improve its latency performance. While CFS
evaluates its lookup improvements using simulations, we are
able to rigorously establish LPRS-Chord’s performance on a
large class of graphs. Tapestry [38] and Pastry [31] implement
several heuristic latency optimizations, informed by the algo-
rithm of Plaxton et al. Their approaches require considerable
overhead during the node-joining phase, whereas in LPRS the
effort is piggybacked on normal DHT functionality and spread
out over time. Hildrum et al. [13] analytically study the
tradeoff between the optimal degree and the cost of routing-
table proximity optimization in the context of Tapestry. Rhea
et al. [29] propose Bamboo, a new DHT design based on the
routing structure of Pastry and optimized for high levels
of node dynamics. In Bamboo, a node does both local tuning
and global tuning of its routing table. We observe that LPRS
is a good alternative to the explicit random sampling scheme
used in the global tuning approach. Ratnasamy et al. [26]
proposed using landmarks to make the overlay topology of
CAN resemble the underlying IP topology. Gummadi et al.
[15] studied how the underlying routing geometries affect the
resilience and proximity properties of DHTs, including the
geometric DHTs we studied here.
Karger and Ruhl [16] use similar sampling ideas to obtain an
elegant data structure for the nearest neighbor problem in graphs
with power-law latency expansion. Waldvogel and Rinaldi [37]
proposed a heuristic based on multi-dimensional embeddings
to improve latency in the context of topology-aware content-
addressable overlay networks. Ng and Zhang [10] proposed a
new approach to map P2P nodes into points in Euclidean space
so that distances in the Euclidean space approximate distances
in the underlying space.
Much of the above work is discussed in more detail in a com-
panion technical report [39].
VII. CONCLUSION
We have described LPRS, a fast random sampling tech-
nique for DHT systems that use geometric routing. We have
analytically shown that on graphs with power-law latency
expansion, LPRS can result in an average lookup latency that
is proportional to the average unicast latency. Analysis of a
very large Internet router-level topology dataset shows that
the latency expansion of the Internet resembles a power-law.
This immediately implies that LPRS is a practical approach to
reducing lookup latency in DHT systems.
REFERENCES
[1] I. Abraham, D. Malkhi, and O. Dubzinski, “LAND: Locality aware net-
works for distributed hash tables,” presented at the ACM-SIAM Symp.
Discrete Algorithms (SODA04), New Orleans, LA, 2004.
[2] W. Aiello, F. Chung, and L. Lu, “A random graph model for massive
graphs,” in Proc. 32nd ACM Symp. Theory of Computing, Portland, OR,
2000, pp. 171–180.
[3] B. Bollobas, Random Graphs. Orlando, FL: Academic, 1985.
[4] L. Breslau, P. Cao, L. Fan, G. Phillips, and S. Shenker, “Web caching
and Zipf-like distribution: Evidence and implications,” in Proc. IEEE
INFOCOM, vol. 1, Mar. 1999, pp. 126–134.
[5] M. Castro, P. Druschel, Y. C. Hu, and A. Rowstron, “Exploiting network
proximity in peer-to-peer overlay networks,” Microsoft Research, Tech.
Rep. MSR-TR-2002-82, 2002.
[6] M. Castro, P. Druschel, A.-M. Kermarrec, A. Nandi, A. Rowstron, and
A. Singh, “SplitStream: High-bandwidth multicast in a cooperative en-
vironment,” presented at the 19th ACM Symp. Operating Systems Prin-
ciples (SOSP’03), Lake Bolton, NY, Oct. 2003.
[7] M. Castro, P. Druschel, A.-M. Kermarrec, and A. Rowstron, “SCRIBE:
A large-scale and decentralized application-level multicast infrastruc-
ture,” IEEE J. Sel. Areas Commun., vol. 20, no. 8, pp. 1489–1499, Oct.
2002.
[8] R. Cox, A. Muthitacharoen, and R. Morris, “Serving DNS using Chord,”
presented at the 1st Int. Workshop on Peer-to-Peer Systems (IPTPS’02),
Cambridge, MA, Mar. 2002.
[9] F. Dabek, M. F. Kaashoek, D. Karger, R. Morris, and I. Stoica, “Wide-
area cooperative storage with CFS,” presented at the 18th ACM. Symp.
Operating Systems Principles (SOSP’01), Banff, Canada, Oct. 2001.
[10] T. S. E. Ng and H. Zhang, “Predicting Internet network distance with
coordinates-based approaches,” in Proc. IEEE INFOCOM, vol. 1, Jun.
2002, pp. 170–179.
[11] C. Faloutsos, P. Faloutsos, and M. Faloutsos, “On power-law relation-
ships of the Internet topology,” in Proc. ACM SIGCOMM, Aug. 1999,
pp. 251–262.
[12] R. Govindan and H. Tangmunarunkit, “Heuristics for Internet map
discovery,” in Proc. IEEE INFOCOM, Tel-Aviv, Israel, Mar. 2000, pp.
1371–1380.
[13] K. Hildrum, J. D. Kubiatowicz, S. Rao, and B. Y. Zhao, “Distributed
object location in a dynamic network,” presented at the 14th Annu. ACM
Symp. Parallel Algorithms and Architectures, Winnipeg, MB, Canada,
Aug. 2002.
[14] S. Iyer, A. Rowstron, and P. Druschel, “SQUIRREL: A decentralized
peer-to-peer web cache,” in Proc. 21st ACM Symp. Principles of Dis-
tributed Computing (PODC’02), Monterey, CA, 2002, pp. 213–222.
[15] K. Gummadi, R. Gummadi, S. Ratnasamy, S. Shenker, and I. Stoica,
“The impact of DHT routing geometry on resilience and proximity,” in
Proc. ACM SIGCOMM, Karlsruhe, Germany, Aug. 2003, pp. 381–394.
[16] D. R. Karger and M. Ruhl, “Finding nearest neighbors in growth-re-
stricted metrics,” in Proc. 34th ACM Symp. Theory of Computing
(STOC’02), Montreal, QC, Canada, May 2002, pp. 741–750.
[17] A. Keromytis, V. Misra, and D. Rubenstein, “SOS: Secure overlay ser-
vices,” in Proc. ACM SIGCOMM, Pittsburgh, PA, 2002, pp. 61–72.
[18] J. Kubiatowicz, D. Bindel, Y. Chen, S. Czerwinski, P. Eaton, D. Geels,
R. Gummadi, S. Rhea, H. Weatherspoon, W. Weimer, C. Wells, and B.
Zhao, “Oceanstore: An architecture for global-scale persistent storage,”
in Proc. 9th Int. Conf. Architectural Support for Programming Lan-
guages and Operating Systems (ASPLOS 2000), Cambridge, MA, Nov.
2000, pp. 190–201.
[19] D. Malkhi, M. Naor, and D. Ratajczak, “Viceroy: A scalable and dy-
namic emulation of the butterfly,” in Proc. 21st ACM Symp. Principles of
Distributed Computing (PODC’02), Monterey, CA, 2002, pp. 183–192.
[20] R. Motwani and P. Raghavan, Randomized Algorithms. Cambridge,
U.K.: Cambridge Univ. Press, 1995.
[21] V. N. Padmanabhan and L. Subramanian, “An investigation of ge-
ographic mapping techniques for Internet hosts,” in Proc. ACM
SIGCOMM, San Diego, CA, 2001, pp. 1–13.
[22] A. Muthitacharoen, R. Morris, T. M. Gil, and B. Chen, “Ivy: A read/write
peer-to-peer file system,” presented at the 5th Symp. Operating Systems
Design and Implementation (OSDI’02), Boston, MA, Dec. 2002.
[23] G. Phillips, S. Shenker, and H. Tangmunarunkit, “Scaling of multicast
trees: Comments on the Chuang-Sirbu scaling law,” in Proc. ACM SIG-
COMM, 1999, pp. 41–51.
[24] C. G. Plaxton, R. Rajaraman, and A. W. Richa, “Accessing nearby
copies of replicated objects in a distributed environment,” in Proc. 9th
Annu. ACM Symp. Parallel Algorithms and Architectures, Jun. 1997,
pp. 311–320.
[25] S. Ratnasamy, P. Francis, M. Handley, R. Karp, and S. Shenker, “A
scalable content-addressable network,” in Proc. ACM SIGCOMM, Aug.
2001, pp. 161–172.
[26] S. Ratnasamy, M. Handley, R. Karp, and S. Shenker, “Topologically-
aware overlay construction and server selection,” in Proc. IEEE IN-
FOCOM, 2002, pp. 1190–1199.
[27] S. Ratnasamy, M. Handley, R. Karp, and S. Shenker, “Application-level
multicast using content-addressable networks,” in Proc. 3rd Int. Workshop
on Networked Group Communication (NGC ’01), Nov. 2001, pp. 14–29.
[28] S. Ratnasamy, S. Shenker, and I. Stoica, “Routing algorithms for DHTs:
Some open questions,” presented at the 1st Int. Workshop on Peer-to-
Peer Systems (IPTPS’02), Cambridge, MA, Mar. 2002.
[29] S. Rhea, D. Geels, T. Roscoe, and J. Kubiatowicz, “Handling Churn in
a DHT,” Univ. of California, Berkeley, CA, Tech. Rep. UCB//CSD-03-
1299, Dec. 2003.
[30] S. C. Rhea and J. Kubiatowicz, “Probabilistic location and routing,” in
Proc. IEEE INFOCOM, vol. 3, Jun. 2002, pp. 1248–1257.
[31] A. Rowstron and P. Druschel, “Pastry: Scalable, distributed object lo-
cation and routing for large-scale peer-to-peer systems,” in Proc. 18th
IFIP/ACM Int. Conf. Distributed Systems Platforms (Middleware), Hei-
delberg, Germany, Nov. 2001, pp. 329–350.
[32] A. Rowstron and P. Druschel, “Storage management and caching in
PAST, a large-scale, persistent peer-to-peer storage utility,” presented
at the ACM Symp. Operating Systems Principles (SOSP’01), Banff,
Canada, Oct. 2001.
[33] N. Spring, R. Mahajan, and D. Wetherall, “Measuring ISP topologies
with Rocketfuel,” in Proc. ACM SIGCOMM, Pittsburgh, PA, Aug. 2002,
pp. 133–145.
[34] I. Stoica, D. Adkins, S. Zhuang, S. Shenker, and S. Surana, “Internet
indirection infrastructure,” in Proc. ACM SIGCOMM, Pittsburgh, PA,
Aug. 2002, pp. 73–86.
[35] I. Stoica, R. Morris, D. Karger, F. Kaashoek, and H. Balakrishnan,
“Chord: A peer-to-peer lookup service for Internet applications,”
presented at the ACM SIGCOMM, San Diego, CA, Sep. 2001.
[36] H. Tangmunarunkit, R. Govindan, S. Jamin, S. Shenker, and W. Will-
inger, “Network topology generators: Degree-based vs. structural,” pre-
sented at the ACM SIGCOMM, Pittsburgh, PA, Aug. 2002.
[37] M. Waldvogel and R. Rinaldi, “Efficient topology-aware overlay net-
work,” presented at the 1st Workshop on Hot Topics in Networks (Hot-
Nets-I), Princeton, NJ, Oct. 2002.
[38] B. Y. Zhao, J. D. Kubiatowicz, and A. D. Joseph, “Tapestry: An infra-
structure for fault-resilient wide-area location and routing,” Univ. of Cal-
ifornia, Berkeley, Tech. Rep. UCB/CSD-01-1141, Apr. 2001.
[39] A. Goel, R. Govindan, and H. Zhang. (2004) Improving lookup
latency in distributed hash table systems using random sampling.
Comput. Sci. Dept., Univ. of Southern California. [Online]. Available:
http://www.cs.usc.edu/Research/TechReports/04-825.zip
[40] S. Zhuang, B. Zhao, A. Joseph, R. Katz, and J. Kubiatowicz, “Bayeux:
An architecture for scalable and fault-tolerant wide-area data dissemina-
tion,” in Proc. 11th Int. Workshop on Network and Operating System
Support for Digital Audio and Video (NOSSDAV 2001), Port Jefferson,
NY, Jun. 2001, pp. 124–133.
[41] Chord Simulator. [Online]. Available: http://pdos.lcs.mit.edu/cgi-bin/cvsweb.cgi/sfsnet/simulator
Hui Zhang (S’99) received the B.Eng. degree from
Hunan University, China, and the M.Eng. degree
from the Institute of Automation, Chinese Academy
of Sciences, China. He is currently pursuing the
Ph.D. degree in computer science at the University
of Southern California, Los Angeles.
His research interests lie in algorithmics and modeling for
peer-to-peer systems and the World Wide Web.
Mr. Zhang has been a student member of the ACM
since 1999.
Ashish Goel received the Ph.D. degree in computer
science from Stanford University, Stanford, CA, in
1999.
He was an Assistant Professor of computer
science at the University of Southern California,
Los Angeles, from 1999 to 2002. He is currently
an Assistant Professor in the Departments of
Management Science and Engineering and (by
courtesy) Computer Science at Stanford University.
His research interests lie in the design, analysis, and
applications of algorithms.
Prof. Goel is a recipient of an Alfred P. Sloan faculty fellowship (2004–06), a
Terman faculty fellowship from Stanford, and an NSF Career Award (2002–07).
He has been a member of the ACM since 1999.
Ramesh Govindan received the B.Tech. degree from
the Indian Institute of Technology, Madras, and the
M.S. and Ph.D. degrees from the University of Cali-
fornia at Berkeley.
He is an Associate Professor in the Computer Sci-
ence Department at the University of Southern Cal-
ifornia, Los Angeles. His research interests include
scalable routing in Internetworks, and wireless sensor
networks.
Dr. Govindan is a member of the ACM.