Heuristics for Internet Map Discovery
Ramesh Govindan, Hongsuda Tangmunarunkit
USC/Information Sciences Institute
4676 Admiralty Way
Marina del Rey, CA 90292, USA
Abstract— Mercator is a program that uses hop-limited
probes—the same primitive used in traceroute—to infer an
Internet map. It uses informed random address probing to
carefully explore the IP address space when determining
router adjacencies, uses source-route capable routers wher-
ever possible to enhance the fidelity of the resulting map,
and employs novel mechanisms for resolving aliases (inter-
faces belonging to the same router). This paper describes the
design of these heuristics and our experiences with Merca-
tor, and presents some preliminary analysis of the resulting
Internet map.
I. INTRODUCTION
Obtaining a router-level map of the Internet has received
little attention from the research community. This is per-
haps unsurprising given the perceived difficulty of obtain-
ing a high-quality map using the minimal support that ex-
ists in the infrastructure. However, as we show in this pa-
per, it is useful, and possible, to get an approximate map of
the Internet. Such a map is a first step in trying to under-
stand some of the macroscopic properties of the Internet’s
physical structure. Other potential uses of Internet maps
have been described elsewhere [17] and we do not repeat
them here.
This paper documents a collection of heuristics, some
well-known and some obscure, for inferring the router-
level map of the Internet. Of the many possible defini-
tions of the word map, the one we choose for the purposes
of this paper is: a graph whose nodes represent routers
in the Internet and whose links represent adjacencies be-
tween routers. Two routers are adjacent if one is exactly
one IP-level hop away from the other. In Section IV-A,
we discuss the implications of this definition, and describe
how well our map collection heuristics allow us to infer
the complete Internet map. Inferring the number, or IP ad-
dresses, of Internet hosts is an explicit non-goal.
Perhaps the only ubiquitously available primitive to in-
fer router adjacencies is the hop-limited probe. Such a
probe consists of a hop-limited IP packet, and the corre-
sponding ICMP response (if any) indicating the expiration
of the IP ttl field (or other error indicators). The tracer-
oute tool uses this primitive to infer the path to a given
destination. Generally speaking, all earlier mapping ef-
forts [17], [4], [6] have computed router adjacencies from
a sequence of traceroutes to different Internet destinations.
The destinations to direct the traceroutes are usually de-
rived from one or more databases (e.g. such as routing
tables, the DNS, or a precomputed table of host address-
es). Finally, with one exception, all earlier mapping efforts
have attempted to map the Internet by sending hop-limited
packets from a single location in the network. Section II
describes in greater detail these efforts at mapping the In-
ternet.
In this paper, we describe the heuristics employed by
our Internet mapper program, Mercator. Written entirely
from scratch, Mercator requires no input, i.e., it does
not use any external database in order to direct hop-limited
probes. Instead, it uses a heuristic we call informed ran-
dom address probing; the targets of our hop-limited probes
are informed both by results from earlier probes, as well as
by IP address allocation policies. Such a technique enables
Mercator to be deployed anywhere because it makes no as-
sumptions about the availability of external information to
direct the probes. Mercator also uses source-routing to di-
rect the hop-limited probes in directions other than radially
from the sender. This enables Mercator to discover “cross-
links”—router adjacencies that might otherwise not have
been discovered. As we describe later, these heuristics can
result in several aliases (interface IP addresses) for a single
router. Mercator also contains some heuristics for resolv-
ing these aliases. Section III describes these heuristics in
greater detail, and discusses the limitations of the resulting
map.
In Section IV, we discuss our experiences with a deploy-
ment of mapper. We describe various techniques we have
used to validate the map resulting from this deployment.
Finally, we present some preliminary estimates of the size
of this map, and analyze some graph-theoretic properties
of the map. Finally, Section V summarizes our main con-
tributions, and indicates directions for future work.
II. RELATED WORK
Several Internet mapping projects have attempted, or are
currently attempting, to obtain a router-level map of the Internet.
The earliest attempt we know of [13] traced paths to 5000
destinations from a single network node. These destina-
tions were obtained from a database of Internet hosts that,
in 1995, had sent electronic mail to a particular organiza-
tion. In addition, a small number (11) of these destinations
were used as intermediate nodes in source-routed tracer-
outes to the remaining destinations. Although the use of
source routing can result in greater map fidelity, it is un-
clear how complete the resulting map is, given that the
chosen destinations essentially represent an arbitrary sub-
set of hosts in the Internet.
More recently, researchers [4], [17] have used BGP
backbone routing tables in order to determine the desti-
nations of traceroutes. For each prefix in the table, they
repeatedly generate a randomly chosen IP address from
within that prefix. From traceroutes to each such address,
they determine router adjacencies, building a router adja-
cency graph in this manner. As we show in Section III,
this alone does not result in an Internet map as we have
defined it. In particular, these techniques may miss back-
up links (for which [17] proposes—but does not report re-
sults of—tracing from several locations in the Internet).
Furthermore, these traceroutes may discover two or more
interfaces belonging to the same router; these projects do
not propose techniques for resolving such aliases. Final-
ly, the skitter tool [6] uses a database of Web servers to
determine traceroute targets.
Techniques for collecting other representations of the
Internet, such as the AS (Autonomous System) topology, have
also been documented in the literature. In one approach,
traces of backbone routing activity over a period of several
days have been used to infer inter-AS “links” [12]. Each
link represents an inter-ISP peering or a customer-ISP con-
nection. Instantaneous dumps of backbone routing tables
have also been used to infer AS-level links [3].
In some cases, router support can be used to determine
router adjacencies. For example, Intermapper [7] build-
s a list of router adjacencies by recursively interrogating
routers’ SNMP [5] MIBs. A similar technique can also be
used for the Internet’s multicast overlay network, the M-
Bone [18]. Routers on the MBone support an IGMP query
that returns a list of neighbors.
Our interest in heuristics for Internet mapping is moti-
vated by the desire to understand network structure better.
In this paper, we present some preliminary results from
analyzing the resulting map. More generally, a high qual-
ity Internet map can be used to validate compact topology
models such as those proposed in [9]. Such models, or the
resulting Internet map itself, can be used as input to simu-
lations [1]. Moreover, high quality Internet maps can also
be used to validate hypotheses about scaling limits of real
networks [15].
III. MAPPING HEURISTICS
In this section, we discuss the design of several heuris-
tics for mapping the Internet. This design is driven by sev-
eral goals and requirements, which we discuss first.
A. The Challenge
The challenge we set ourselves at the outset of this
project was to find a collection of heuristics that would
allow us to map the Internet:
From a single, arbitrary, location,
Using only hop-limited probes.
Why is this an interesting formulation of the mapping
problem?
We chose the first restriction for two reasons. First, de-
ployment of the mapping software then becomes trivial,
especially if the heuristics do not require a specific topo-
logical placement (e.g., at exchange points or within back-
bone infrastructures). In fact, the results described in this
paper were obtained by running the Mercator software on
a workstation at the edge of a campus network. Second,
while it is certainly feasible to design a distributed map-
ping scheme, we chose to defer the complexity of imple-
menting such a scheme until after we had explored the cen-
tralized mapping solution. Upon first glance, requiring our
mapping software to run from a single node might seem
too restrictive. It would appear that the single perspective
provided by this restriction could result in large inaccura-
cies in the map. As we show later in Section III-D, the use
of source routing can help alleviate these inaccuracies.
The second restriction—using only hop-limited probes—
makes only minimal assumptions about the availability of
network functionality. This restriction follows from the
first, allowing the mapping software to be deployed any-
where in the network. It also implies that we explicitly
chose not to use any external databases (the DNS, routing
table dumps) to drive map discovery. Not only does this
choice pose an academically interesting question (How
can we map an IP network starting with nearly zero initial
information?), but it can also lead to more robust heuristic-
s for map discovery. As we show later, not all backbones
have routing table entries for all Internet addresses (Sec-
tion III-E), and not all router interfaces are populated in the
DNS (Section IV-C).
There are several secondary requirements that inform
the design of map discovery heuristics.
Obviously, the resulting map must be complete. Clearly,
this requirement conflicts with the single-location restric-
tion; using hop-limited probes from a single location can-
not possibly reveal all router adjacencies. In Section IV-A,
we argue that our heuristics can give us nearly complete
maps of the transit portion of the Internet.
The map discovery heuristics must not impose signifi-
cant probing overhead on the network.
Informally, the heuristics must not result in significant-
ly slower map discovery compared to existing approaches.
Rather than quantify this requirement, we chose to sacri-
fice rapidity of map discovery in favor of completeness and
reduced overhead wherever necessary. This choice has in-
teresting consequences, as described in Section IV-A.
B. Informed Random Address Probing
Mercator uses hop-limited probes to infer router adja-
cencies, but does not use external databases to derive tar-
gets for these probes. In the absence of external infor-
mation, one possible heuristic—with obvious convergence
implications—is to infer adjacencies by probing paths to
addresses randomly chosen from the entire IP address s-
pace [17]. In this section, we describe a different heuristic,
informed random address probing.
The goal of this heuristic is to guess which portions of
the entire IP address space contain addressable nodes. An
addressable node is one which has IP-level connectivity to
the public Internet. Because IP addresses are assigned in
prefixes, Mercator makes informed guesses about which
prefixes might contain addressable nodes. From within
each such addressable prefix, it then uniformly randomly
selects an IP address as the target for one or more hop-
limited probes (described in Section III-C). Mercator uses
two techniques to guess addressable prefixes:
1. Whenever it sees a response to a hop-limited probe from
some IP address A, Mercator assumes that some prefix P of
A must contain addressable nodes. In this sense, Mer-
cator is informed by the map discovery process itself.
2. If P is an addressable prefix, Mercator guesses
that the neighboring prefixes of P (e.g., 128.8/16
and 128.10/16 are the neighboring prefixes for
128.9/16) are also likely to be addressable. This tech-
nique is based on the assumption that address registries
delegate address spaces sequentially. (Note that Mercator
does not rely on these neighboring prefixes being topologically
related—e.g., assigned by the same ISP—to P.)
In the following paragraphs, we describe the details of in-
formed random address probing.
In a given instance of Mercator, repeated application of
these two techniques leads to a gradually increasing pre-
fix population. In order for both the above techniques to
work, the prefix population must be seeded with at least
one prefix. Mercator uses the IP address of the host it is
running on to infer this seed prefix. Then, technique 2
above ensures that this prefix population eventually cov-
ers the entire address space. Technique 1 attempts to en-
sure that addressable prefixes are explored early, leading
to more rapid map discovery. Technique 1 alone is insuf-
ficient for complete map discovery. For some choices of
seed prefix, using this technique alone might only result in
a campus-local map.
For technique 1, we need a heuristic that infers the
length of an addressable prefix from a router's IP ad-
dress A. In classful terms [16], if A is a class A (class B)
address, Mercator assumes that A's prefix length is 8 (re-
spectively 16). This heuristic is based on pre-CIDR [11]
address allocation policies. If A is a class C address,
Mercator assumes that A's prefix is of length 19. This
assumption is based on the practice of some top-level ad-
dress registries in allocating CIDR blocks of length not
greater than 19. For simplicity, we chose relatively coarse-
grained guesses about the size of addressable prefixes. As
a result, only a small portion of an addressable prefix P
might actually contain addressable nodes. However, be-
cause we probe randomly (and repeatedly, as described in
Section III-C) within P, the choice of the prefix length
does not affect the completeness of the resulting map, on-
ly the rapidity of the discovery.
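To make the prefix-length heuristic and the neighbor-prefix rule concrete, the following C++ sketch shows one possible implementation. The Prefix type, the function names, and the treatment of all non-class-A/B addresses as /19s are our illustrative choices; they are not taken from Mercator's source.

    // Sketch of the classful prefix-length guess (technique 1) and the
    // neighbor-prefix rule (technique 2) from Section III-B.
    #include <cstdint>
    #include <vector>

    struct Prefix {
        uint32_t base;   // network address, host byte order
        int      length; // prefix length in bits
    };

    // Guess the length of the addressable prefix containing address a,
    // using pre-CIDR classful rules plus the /19 registry convention.
    int guess_prefix_length(uint32_t a) {
        if ((a >> 31) == 0x0) return 8;   // class A: assume a /8
        if ((a >> 30) == 0x2) return 16;  // class B: assume a /16
        return 19;                        // class C and above: assume a /19
    }

    Prefix prefix_of(uint32_t a) {
        int len = guess_prefix_length(a);
        uint32_t mask = ~uint32_t(0) << (32 - len);
        return Prefix{a & mask, len};
    }

    // Technique 2: the numerically adjacent prefixes of the same length,
    // e.g. 128.8/16 and 128.10/16 for 128.9/16.
    std::vector<Prefix> neighbor_prefixes(const Prefix& p) {
        uint32_t step = uint32_t(1) << (32 - p.length);
        std::vector<Prefix> out;
        if (p.base >= step) out.push_back({p.base - step, p.length});
        if (p.base + step > p.base) out.push_back({p.base + step, p.length});
        return out;
    }
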
Finally, we need a heuristic that determines how of-
ten technique 2 is invoked, and which addressable prefix's
neighbor is chosen. If, within some window W, application
of technique 1 has not resulted in an increase in the prefix
population, Mercator selects a neighbor of some existing
prefix P. In our implementation, W is chosen to be 3 min-
utes. P is chosen from among those prefixes in the pop-
ulation at least one of whose neighbors is not in the pop-
ulation. Among these prefixes, P is selected by a lottery
scheduling [20] algorithm. The number of lottery tickets
for each prefix is proportional to the fraction of successful
probes (Section III-C) on that prefix. This heuristic at-
tempts to explore neighbors of those prefixes that are more
densely populated with addressable nodes.
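The lottery-scheduling step can be sketched as follows. The ticket weights (fraction of successful probes) follow the description above, but the data layout and the default weight given to prefixes that have not yet been probed are our assumptions.

    // Minimal sketch of lottery scheduling over prefixes (Sections III-B, III-C).
    #include <random>
    #include <vector>

    struct PrefixStats {
        unsigned probes = 0;      // probes sent to this prefix
        unsigned successes = 0;   // probes that discovered a new router
    };

    // Returns the index of the winning prefix, or -1 if there are none.
    // Each prefix holds tickets proportional to successes/probes; prefixes
    // never probed get a default weight so they are eventually explored.
    int lottery_pick(const std::vector<PrefixStats>& prefixes, std::mt19937& rng) {
        if (prefixes.empty()) return -1;
        std::vector<double> tickets;
        tickets.reserve(prefixes.size());
        for (const auto& p : prefixes) {
            double w = (p.probes == 0) ? 0.5 : double(p.successes) / p.probes;
            tickets.push_back(w + 1e-6);  // keep every prefix drawable
        }
        std::discrete_distribution<int> dist(tickets.begin(), tickets.end());
        return dist(rng);
    }
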
C. Path Probing
To discover the Internet map, Mercator repeatedly se-
lects a prefix (in a manner described later in this subsec-
tion) from within its population, and probes the path to
an address D selected uniformly from within that prefix.
Like traceroute, Mercator sends UDP packets to D with
successively increasing ttls. To minimize network traffic,
the path probe is self-clocking—the next UDP packet is
not sent until a response to the previous one has been re-
ceived. The probing stops either when D is reached, or
when a probe fails to elicit a response, or a loop is detect-
ed in the path. Unlike traceroute, the latter two termination
conditions are appropriate for Mercator since our interest
is in inferring router adjacencies.
A path probe results in a sequence of routers R1, R2,
..., Rn such that R1 responded with an ICMP time
exceeded for a UDP probe with ttl 1, and so on. From this
sequence, Mercator inserts into its map nodes correspond-
ing to R1, R2, etc., and links (R1, R2), (R2, R3), etc., if
these nodes and links were not already in the map.
To reduce path probing overhead, not all path probes
start at ttl 1. Rather, from the results of each path probe
for prefix P, Mercator computes the furthest router F in
that path that was already in the map at the time the probe
completed. Subsequent path probes to P start at the ttl cor-
responding to F. If the first response is from F, Mercator
continues the path probe, otherwise it backtracks the path
probe to ttl 1. This technique allows Mercator to avoid,
where possible, rediscovering router adjacencies. More-
over, it reduces the probing overhead on routers in the
vicinity of the host on which Mercator executes.
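The control flow of a single path probe, including the start-ttl optimization and the backtracking rule, is sketched below. The probe itself is stubbed out: a real implementation would send a hop-limited UDP packet and wait for the corresponding ICMP response. The Map type, the stub, and the parameter names are ours.

    // Sketch of one path probe (Section III-C): self-clocking, loop detection,
    // and the "start at the furthest already-known router" optimization.
    #include <cstdint>
    #include <optional>
    #include <set>
    #include <utility>
    #include <vector>

    using Addr = uint32_t;

    // Placeholder: send one hop-limited UDP probe toward dst and return the
    // address of the router that answered with ICMP time exceeded (or dst
    // itself if dst answered), or nothing on timeout.
    std::optional<Addr> probe_hop(Addr dst, int ttl) {
        (void)dst; (void)ttl;
        return std::nullopt;
    }

    struct Map {
        std::set<Addr> routers;
        std::set<std::pair<Addr, Addr>> links;
    };

    // start_ttl and expected_router come from the previous probe to the same
    // prefix (the furthest router already in the map, F in the text).
    void path_probe(Addr dst, int start_ttl, std::optional<Addr> expected_router,
                    int max_ttl, Map& map) {
        int ttl = start_ttl;
        std::optional<Addr> prev;
        std::vector<Addr> seen;                           // for loop detection
        while (ttl <= max_ttl) {
            std::optional<Addr> hop = probe_hop(dst, ttl);  // self-clocking
            if (!hop) return;                             // no response: stop
            if (ttl == start_ttl && expected_router && *hop != *expected_router) {
                ttl = 1;                                  // unexpected first hop:
                expected_router.reset();                  // backtrack to ttl 1
                continue;
            }
            if (*hop == dst) return;                      // destination reached
            for (Addr r : seen)
                if (r == *hop) return;                    // loop detected
            seen.push_back(*hop);
            map.routers.insert(*hop);                     // new node, if unseen
            if (prev) map.links.insert({*prev, *hop});    // new adjacency
            prev = hop;
            ++ttl;
        }
    }
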
In what order are prefixes selected for probing? Rather
than selecting prefixes in round-robin fashion, Mercator
uses the lottery scheduling algorithm [20] where each pre-
fix has lottery tickets proportional to the fraction of suc-
cessful probes addressed to the prefix. A probe is deemed
successful if it discovers at least one previously unknown
router. With lottery scheduling, then, Mercator’s probing
is biased towards recently created prefixes densely popu-
lated with addressable nodes. This heuristic attempts to
speed up map discovery.
Finally, Mercator is designed to allow multiple path
probes to proceed concurrently. This configurable num-
ber can be used to tradeoff rapidity of map discovery for
increased overhead.
D. Source-Routed Path Probing
Intuitively, in a shortest-path routed network, one might
expect that path probing (Section III-C) results in a tree
rooted at the host running Mercator. For two reasons, how-
ever, path probing actually discovers a richer view of the
topology:
Inter-domain routing in the Internet is policy-based [19].
Policies can result in widely divergent paths to two topo-
logically contiguous routing domains (Figure 1(a)).
Mercator continuously probes each addressable prefix
over several days. It can therefore potentially discover
backup paths to addressable prefixes (Figure 1(b)).
Even so, because Mercator attempts to discover the In-
ternet topology from a single location, it may miss some
“cross-links” in the Internet map. An obvious approach to
increasing the likelihood of discovering all links in the In-
ternet map is to run Mercator from different nodes in the
network.
In this section, we describe a different solution to the
problem, source-routed path probing. Essentially, our
[Figure 2 plot: percentage of discovered links (0-100%) versus percentage of source-route capable nodes (0-100%), for 100-node and 1000-node topologies of degree 5 and degree 20.]
Fig. 2. In relatively sparse random networks, only a few source-
route capable nodes (< 5%) are sufficient to discover 90% of
the links. The one plot in this figure that doesn’t satisfy this
criterion is a dense random graph (the 100 node graph with
a degree 20). This figure serves to illustrate the intuition
behind the use of source-routed path probing.
technique directs path probes to already discovered routers
via source-route capable routers in the Internet (Fig-
ure 1(c)). Not all routers in the Internet are source-route
capable. In fact, only a small fraction (about 8%, see
Section IV) support source routing. In some cases, these
routers are located at small ISPs who have neglected to
disable this capability in their routers. However, a number
of large backbone provider routers are also source-route
capable. Presumably, these providers use source-routed
traceroutes to diagnose connectivity problems in their in-
frastructures.
Can such a small number of source-route capable router-
s help us discover cross-links? Roughly speaking, each
source-route capable router can give us the same perspec-
tive as an instance of Mercator running at the same loca-
tion. How effective a small fraction of routers is in helping
us discover the entire map depends on several factors in-
cluding the structure of the Internet, the location of the host
running Mercator, and the placement of these source-route
capable nodes. To see whether there is even a plausible ar-
gument for the efficacy of source-routed path probing, we
conducted the following experiment. We generated ran-
dom graphs of different sizes with different average node
degrees. For each such graph, we computed the percentage
of links discovered as a function of the fraction of source-
route capable nodes randomly placed in the network. We
found that (Figure 2) for a relatively sparse graph (one with
a small ratio of the average node degree to the total number
of nodes), 5% of source-route capable routers is enough to
discover 90% of the links.
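Our experiment can be approximated by the following sketch, which generates a random graph, treats node 0 as the Mercator host, marks a random fraction of nodes as source-route capable, and counts the links that appear on shortest-path (BFS) trees rooted at the host and at each capable node. The G(n, p)-style generator, the use of BFS trees as a stand-in for discovered paths, and the seed are our assumptions; the paper does not specify its random-graph model.

    // Approximate reconstruction of the experiment behind Figure 2.
    #include <algorithm>
    #include <cstddef>
    #include <cstdio>
    #include <queue>
    #include <random>
    #include <set>
    #include <utility>
    #include <vector>

    using Graph = std::vector<std::vector<int>>;

    Graph random_graph(int n, double avg_degree, std::mt19937& rng) {
        Graph g(n);
        std::bernoulli_distribution edge(avg_degree / (n - 1));
        for (int u = 0; u < n; ++u)
            for (int v = u + 1; v < n; ++v)
                if (edge(rng)) { g[u].push_back(v); g[v].push_back(u); }
        return g;
    }

    // Add the edges of a BFS (shortest-path) tree rooted at src to out.
    void add_tree_edges(const Graph& g, int src, std::set<std::pair<int, int>>& out) {
        std::vector<int> parent(g.size(), -1);
        std::queue<int> q;
        parent[src] = src;
        q.push(src);
        while (!q.empty()) {
            int u = q.front(); q.pop();
            for (int v : g[u])
                if (parent[v] == -1) {
                    parent[v] = u;
                    out.insert({std::min(u, v), std::max(u, v)});
                    q.push(v);
                }
        }
    }

    int main() {
        std::mt19937 rng(1);
        const int n = 1000;
        Graph g = random_graph(n, 5.0, rng);
        std::size_t total = 0;
        for (int u = 0; u < n; ++u) total += g[u].size();
        total /= 2;                               // each link counted twice
        for (double frac : {0.0, 0.01, 0.05, 0.10}) {
            std::set<std::pair<int, int>> found;
            add_tree_edges(g, 0, found);          // the Mercator host
            std::bernoulli_distribution capable(frac);
            for (int v = 1; v < n; ++v)
                if (capable(rng)) add_tree_edges(g, v, found);
            std::printf("%4.1f%% capable -> %5.1f%% of links discovered\n",
                        100 * frac, 100.0 * found.size() / total);
        }
        return 0;
    }
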
To detect source-route capability in a router R, we at-
tempt to send a source-routed UDP packet via R to a ran-
domly chosen router from our map. This UDP pack-
et is directed to a randomly chosen port and will usually
[Figure 1 panels: (a) Routing policy can lead to richer topology views... (b) ... as can backup paths. (c) Source routing provides a different perspective for probing.]
Fig. 1. Continuous path probing does not result only in a shortest-path tree rooted at the host running Mercator. The ellipses
represent autonomous systems. Where available, source routing can lead to richer topology views.
elicit an ICMP port unreachable message from the
target router. If we do receive such a response, we mark R
source-route capable. We conduct this test for every router
discovered by the path probing heuristic. Furthermore, for
each router R, we repeatedly test for source-route capabili-
ty (to limit overhead, only a small number of such tests are
run concurrently). We do this because a given test may fail
for reasons other than the lack of source-route capability:
R may be unreachable at the time the test was conducted,
R may not have a route to the chosen target, and so on.
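The test logic can be sketched as follows; the probe helper is a placeholder for sending a UDP packet carrying an IP loose-source-route option through R toward the target, and the number of attempts per call is our choice.

    // Sketch of the source-route capability test (Section III-D).
    #include <cstddef>
    #include <cstdint>
    #include <random>
    #include <vector>

    using Addr = uint32_t;

    // Placeholder: returns true if a source-routed UDP probe sent via r toward
    // target elicited an ICMP port unreachable from target.
    bool source_routed_probe(Addr r, Addr target) {
        (void)r; (void)target;
        return false;
    }

    // Test r against a few randomly chosen routers from the map; r is marked
    // source-route capable as soon as any probe succeeds. A failed run is
    // inconclusive, so such routers are retried later.
    bool test_source_route_capable(Addr r, const std::vector<Addr>& map_routers,
                                   int attempts, std::mt19937& rng) {
        if (map_routers.empty()) return false;
        std::uniform_int_distribution<std::size_t> pick(0, map_routers.size() - 1);
        for (int i = 0; i < attempts; ++i)
            if (source_routed_probe(r, map_routers[pick(rng)])) return true;
        return false;
    }
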
Once a router R is deemed source-route capable, we re-
peatedly attempt to send path probes via R to randomly
chosen routers from our map. This method of choosing
probe targets is different from that used by path probing
(Section III-C). We chose this approach only to avoid tar-
geting hosts and setting off alarms—many host implemen-
tations automatically log source-routed packets. Even so,
our probing activity was noticed by several system admin-
istrators (Section IV-B). In order to reduce probing over-
head, the source-routed path probe starts at R. The initial
hop-count is inferred from the response obtained when R
was first discovered. This initial hop-count guess might
be incorrect (e.g., because the path to R has changed), but
this does not affect the correctness of our map discovery.
As before, we send hop-limited probes with successive-
ly increasing ttls, terminating only if we reach the desti-
nation, a loop is detected in the path, or a small number
(three, in our implementation) of consecutive hop probes
have failed. This last criterion was empirically chosen to
carefully balance the overhead of probing against the ra-
pidity of map discovery.
E. Alias Resolution
To simplify the preceding discussion, we said that path
probes (source-routed or otherwise) discover routers. Path
probes actually discover router interfaces. Thus, a single
Mercator instance might discover more than one interface
belonging to the same router (i.e., multiple aliases for the
router). For example, because of policy differences, paths
from the Mercator host to two different destinations can
intersect (Figure 1(a)). Similarly, the primary and backup
paths to a destination might overlap (Figure 1(b)). Finally,
a source-routed path probe can, because it probes from a
different perspective, discover additional router interfaces
(Figure 1(c)).
Resolving these aliases is an important step for obtain-
ing an accurate map. Unfortunately, these aliases cannot
be resolved by examining the syntactic structure of inter-
face addresses. This is because a router’s interfaces may
be numbered from entirely different IP prefixes; this com-
monly occurs at administrative boundaries. The DNS is e-
qually ineffective for these purposes; where interfaces are
assigned DNS names, different interfaces are assigned d-
ifferent names. At administrative boundaries, different in-
terfaces of a router may actually be assigned names be-
longing to different DNS domains.
Our solution leverages a suggested (but not required)
feature of IP implementations [2]. Suppose a host ad-
dresses a UDP packet to interface A of a router (Fig-
ure 3(a)). Suppose further that that packet is addressed to
a non-existent port. In what follows, we call such a packet
an alias probe. The corresponding ICMP port unreachable
response to this packet will contain, as its source address,
the address of the outgoing interface for the unicast route
towards the probing host. In Figure 3(a), this interface is B. We have ver-
ified this behavior of IP stacks belonging to at least two
major router vendors.
This feature suggests a simple heuristic for alias reso-
lution. Send an alias probe to interface X. If the source
address on the resulting ICMP message (assuming there is
one) is Y, then X and Y are aliases for the same router.
This heuristic was also suggested in earlier work [13].
However, Mercator uses two additional refinements neces-
sary to correctly implement alias resolution.
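The bookkeeping behind the basic heuristic can be sketched as follows. The alias probe itself is stubbed out (a real implementation sends a UDP packet to a non-existent port on interface x and reads the source address of the ICMP port unreachable reply), and the use of a union-find structure to merge aliases is our choice of data structure, not something the paper specifies.

    // Sketch of alias resolution (Section III-E): repeated alias probes plus
    // union-find bookkeeping that collapses interfaces into per-router sets.
    #include <cstdint>
    #include <map>
    #include <optional>

    using Addr = uint32_t;

    // Placeholder for the alias probe described in the text.
    std::optional<Addr> alias_probe(Addr x) {
        (void)x;
        return std::nullopt;
    }

    // Interfaces in the same set are considered aliases of one router.
    struct AliasSets {
        std::map<Addr, Addr> parent;
        Addr find(Addr a) {
            auto it = parent.emplace(a, a).first;
            if (it->second != a) it->second = find(it->second);  // path compression
            return it->second;
        }
        void merge(Addr a, Addr b) { parent[find(a)] = find(b); }
    };

    // Called repeatedly for every known interface; over time, dominant routes
    // make it likely that each router's interfaces get merged into one set.
    void resolve(Addr x, AliasSets& sets) {
        if (std::optional<Addr> y = alias_probe(x); y && *y != x)
            sets.merge(x, *y);  // x and *y name the same router
    }
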
[Figure 3 panels: (a) The alias probe: the response exits the router from interface B and carries B as its source address. (b) The source-routed alias probe: a direct alias probe to A fails for lack of a route, so the probe is sent via a source-route capable router.]
Fig. 3. Repeated alias probing can resolve router aliases. In some cases, source-routed alias probes are necessary for alias
resolution.
The first refinement repeatedly sends alias probes to an
interface address. To see why this is necessary, consid-
er the following scenario. Suppose, as in Figure 3(a), a
router has two interfaces A and B. Suppose further that,
at some instant, an alias probe to A returns A itself. This
implies that, at that instant, the router’s route to the send-
ing host exits interface A. It is possible that, at some later
instant, an alias probe to B returns B. This can happen if,
at the instant the second probe is sent, the router’s route to
the sending host exits interface B. Mercator therefore re-
peatedly sends alias probes to every known interface. To
limit probing traffic, only a small number of alias probes
are run concurrently. The efficacy of such repeated alias
probing is based on the observation that there exist domi-
nant paths [14] and routes [12] in the Internet. There is a
high likelihood that, eventually, a large fraction of the alias
probes will be returned via the router’s dominant path to
the sending host.
The second refinement uses source-routed alias probes.
Figure 3(b) explains why this is necessary. In prac-
tice, large backbones do not have complete routing tables.
Thus, the backbone(s) to which the sending host is even-
tually connected may not be able to forward an alias probe
to its eventual destination. To circumvent this, Merca-
tor attempts to send alias probes via source-route capable
routers. This random, combinatorial search of all source-
route capable routers is a time-consuming process. How-
ever, in the absence of other information about where one
might find a route to a given interface, we believe this is
the only option left to us.
F. Mercator Software Design
Although some of the probing primitives that Mercator
uses are available in the traceroute program, we chose to
implement Mercator from scratch. Doing so allowed us the
flexibility of trying different probing heuristics (e.g., dif-
ferent termination conditions for source-routed path prob-
ing, Section III-D). It also enabled us to carefully tune the
overhead imposed by probing without sacrificing the ra-
pidity of map discovery (e.g., the path probing optimiza-
tion described in Section III-C).
Mercator is implemented on top of Libserv, a collection
of C++ classes written by one of the authors that provides
non-blocking access to file system and communication fa-
cilities. This allows Mercator to have several outstanding
probe packets, and to independently tune the number of
outstanding path probes, source-routed path probes, and
alias probes. Together with Libserv, Mercator is about
14,000 lines of C++ code.
Mercator periodically checkpoints its map to disk. It
is completely restartable from the latest (or any) check-
point. We made several heuristic revisions during the im-
plementation of Mercator, and this feature allowed us to
avoid re-discovering sections of the Internet topology. It
also allowed us to recover from buggy implementations of
heuristics.
IV. EXPERIENCES AND PRELIMINARY RESULTS
Although the heuristics described in Section III seem
plausible, several questions remain unanswered: Is the re-
sulting Internet map complete? What techniques might we
use to validate the resulting map? How well do the heuris-
tics work? What does the map look like? This section
attempts to answer some of these questions.
A. Implications of our Methodology
A careful examination of the heuristics described in
Section III reveals several important implications of our
methodology. The “map” discovered by Mercator is not a
topology map—it does not enumerate all router interfaces,
and does not depict shared media.
First, because it uses path probing, Mercator cannot dis-
cover all interface addresses belonging to a router. Instead,
it discovers only those interfaces through which paths from
the Mercator host “enter” the router. Source-routed path
probing can help increase the number of interfaces discov-
ered, but our use of source-routed path probing does not
guarantee that all routers are discovered.
Second, Mercator does not implement heuristics for dis-
covering shared media. To do this, it would have to infer
the subnet mask assigned to router interfaces. Unfortu-
nately, two potential probing techniques for inferring sub-
net masks do not work very well. The ICMP query that
returns the subnet mask associated with an interface is
not widely implemented. Sending ICMP echo requests to
different broadcast addresses (corresponding to differen-
t subnet masks) and inferring the subnet mask from the
broadcast request that elicits the most responses [17] does
not work either—this capability is now disabled in most
routers in response to smurf attacks. It may be possible,
however, to infer shared media by isolating highly meshed
sections of our Internet map; this remains future work.
Mercator’s map is not an instantaneous map of the In-
ternet. To reduce probing overhead, Mercator limits the
number of outstanding path probes. As a result, it takes
several weeks (Section IV-B) to discover the map of the
Internet. During this interval, of course, new nodes and
links may have been added; Mercator is unable to distin-
guish these from pre-existing nodes. For this reason, the
resulting map is a time-averaged picture of the network.
Furthermore, Mercator discovers a time-averaged routed
topology. If a router adjacency is used as a backup link,
and that backup link is never traversed during a Mercator
run, that adjacency will not be discovered. Similarly, if
two routers are physical neighbors on a LAN, but never
exchange traffic between themselves (e.g., for policy rea-
sons), that adjacency is never discovered.
Finally, Mercator does not produce a complete map of
the Internet. In particular, Mercator does not discover de-
tails of stub (campus) networks. Even though a single run
of Mercator sends a large number of path probes, a given
campus network is not probed frequently enough to dis-
cover the entire campus topology. As well, we believe that
many campuses have source-routing turned off, so Merca-
tor is unable to discover cross-links. However, we have
a higher confidence in the degree of completeness of the
Mercator map with respect to the transit portion of the In-
ternet. This is because our probes traverse the transit por-
tion more frequently (and this is true to a greater extent
of the core of the Internet). We could obtain a more com-
plete map by running multiple instances of Mercator and
correlating the results. This is left for future work.
B. Experiences
We implemented the heuristics of Section III and ran
one instance of Mercator on a PC running Linux. When
configured with a limit of 15 concurrent path probes, Mer-
cator takes about 3 weeks to discover nearly 150,000 inter-
faces and nearly 200,000 links. These numbers are greater
than the corresponding numbers obtained from [4], serv-
ing as a simple validation of the completeness of our run.
Before we discuss how we validated our Internet map, and
what that map reveals in terms of the macroscopic Inter-
net structure, we describe our early experiences in running
Mercator and analyzing some of the data.
How well do our techniques for inferring addressable
prefixes work? To answer this question, we first obtained
the list of prefixes contained in a backbone routing table
(obtained from a public route server at route-views.oregon-ix.net). We
then compared this with the prefixes inferred by Mercator.
Only 8% of the prefixes in the routing table were not “cov-
ered” by any prefix inferred by Mercator. Conversely, 20%
of prefixes inferred by Mercator did not have at least one
overlapping prefix in the routing table. This latter figure
is not surprising; our assumption that the “neighbor” of an
addressable prefix is also addressable (Section III-B) can
result in an overestimation of the addressable space. These
numbers validate both the efficacy of the heuristic and the
near completeness of our exploration of the address space.
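The coverage comparison amounts to a prefix-containment check, sketched below; the Prefix type and the convention that two prefixes overlap when one contains the other are our illustrative choices.

    // Sketch of the routing-table coverage comparison (Section IV-B).
    #include <cstdint>
    #include <vector>

    struct Prefix {
        uint32_t base;   // network address, host byte order
        int      length; // prefix length in bits
    };

    bool contains(const Prefix& outer, const Prefix& inner) {
        if (outer.length > inner.length) return false;
        uint32_t mask = (outer.length == 0) ? 0 : ~uint32_t(0) << (32 - outer.length);
        return (outer.base & mask) == (inner.base & mask);
    }

    bool overlaps(const Prefix& a, const Prefix& b) {
        return contains(a, b) || contains(b, a);
    }

    // Fraction of routing-table prefixes not covered by any inferred prefix.
    double uncovered_fraction(const std::vector<Prefix>& table,
                              const std::vector<Prefix>& inferred) {
        if (table.empty()) return 0.0;
        int uncovered = 0;
        for (const Prefix& t : table) {
            bool hit = false;
            for (const Prefix& m : inferred)
                if (overlaps(t, m)) { hit = true; break; }
            if (!hit) ++uncovered;
        }
        return double(uncovered) / table.size();
    }
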
How well does path probing work? There is a percep-
tion that many ISPs disable traceroute capability through
their infrastructures. If this were true of the major US
backbones, we would clearly not be able to see any routers
belonging to these in our map. In fact, we do. There is also
a perception that some ISPs configure their routers to not
decrement TTLs across their infrastructures. If this were
true, our map would contain many routers with a large out-
degree (i.e., the entire ISP appears to be one large router).
We have found one possible instance of this, but no evi-
dence that this practice is widespread. We conjecture that
traceroute availability is higher than supposed by many re-
searchers, because ISPs use this capability to debug their
infrastructures.
We also found some instances of obvious misconfigura-
tion. Mercator received at least one ICMP time exceeded
message whose source address was a Class D (multicas-
t) address. It was also able to successfully probe the path
to some net 10 addresses; such addresses are reserved for
private use and should not be globally routable [11].
Our experience with alias resolution has been mixed.
Mercator was able to resolve nearly 20,000 interfaces.
However, many other interfaces were unreachable from
the Mercator host. Mercator could not therefore determine
whether such interfaces were aliases for already discov-
ered routers. We attribute this to three causes:
Some of these interfaces are numbered from private ad-
dress space [11], and (with minor exceptions) this space
is not globally routable. To these interfaces, then, the alias
resolution procedure cannot be applied.
Some others are assigned non-private addresses, but the
corresponding ISP does not propagate routes to these in-
terfaces beyond its border. To these interfaces, again, the
alias resolution procedure cannot be applied.
Finally, some of these interfaces are assigned non-
private addresses, and some, but not all parts of the In-
ternet core (and, in particular, the backbone ISP to which
the Mercator host defaults) carry routes to these interfaces.
To such addresses, we attempted source-routed alias reso-
lution (Section III-E). This heuristic works, but is very s-
low, because it involves a random search among all source-
route capable routers. As of this writing, we were able
to resolve only about three thousand interfaces using this
technique.
To date, Mercator has discovered about 3,000 distinct pri-
vate addresses (private addresses, by definition, are not re-
quired to be globally unique, so more than one physical in-
terface can be assigned the same address). A total of 15,000
interfaces are not directly reachable from Mercator, and it is
unclear how many of these can be reached using source-routing.
Given the widely-held belief that “source-routing does
not work”, our experiences with source-routing path prob-
ing (Section III-D) are probably of most interest. We sum-
marize our experiences below:
About 8% (nearly 10,000) of the routers on the Internet are
source-route capable. Of these routers, several belong to
large US backbones, implying that they are actively used
(presumably to debug ISP infrastructures using source-
routed traceroutes).
Source-routed path probing works reasonably well (see
the next bullet), although, for a given traceroute, not all in-
termediate hops always respond. However, this is
sufficient for our purposes, since our goal is to infer router
adjacencies, rather than study paths. Nearly 1 in every 6
links in our map was discovered using source-routed prob-
ing alone (i.e., direct path probing did not subsequently
re-discover these links).
We did encounter two bugs in source routing. First,
there is at least one router in the Internet that responds to
source routed packets as though the packet was sent di-
rectly to it. That is, this router completely ignores the IP
loose source route option. Second, some earlier versions of
a router vendor’s operating system non-deterministically
fail to decrement the TTL on UDP packets with the loose
source route option. We do not know the extent of these
bugs. Of these bugs, the latter has the potential for corrupt-
ing our map. We appropriately account for this potential
in our analyses of the map characteristics (Section IV-D).
Finally, our experiences with the “social” consequences
of sending traceroutes were similar to those reported else-
where [4]. Many tens of system administrators (largely
from corporate sites containing firewalls which log UDP
packets to non-existent ports) noticed our activity and reg-
istered abuse complaints with our institution.
C. Validating the Map
How did we validate the resulting Internet map? The
previous section discussed simple validation tests for our
map. Could we have used the data from [4] to carefully
compare, on a link-by-link basis, our two maps? Although
theoretically feasible, such a comparison would have re-
quired significant additional programming effort. That da-
ta set was gathered from a different location in the Internet.
Consequently, it discovered different interface addresses
than Mercator; to compare the two data sets, we would
have had to resolve aliases (Section III-E) between the two
sets of interface addresses. We have deferred such valida-
tion for future work.
Instead, we are currently validating subgraphs of our
Internet map. Our strategy is to compare published ISP
maps against ISP subgraphs extracted from our map. To
extract ISP subgraphs, we infer routers belonging to an
ISP from DNS names assigned to router interfaces (not all
router addresses have corresponding names assigned in the
DNS; we failed to complete the inverse mapping on nearly
30% of the discovered interfaces). Using this technique,
and the nam network animator [8], we were able to visu-
alize ISP structures (Figure 4). In some cases, ISPs use
naming conventions that allow us to infer inter-city links,
“core” routers, and customer connections. Using these, we
can sometimes isolate the core portions of ISP networks;
studying the structure of ISP networks is left for future
work.
The extraction of ISPs is tedious enough, but the valida-
tion of the map is made much harder by the fact that most
commercial ISPs do not publish router-level connectivity
(this is regarded as proprietary information). To date, we
have compared our discovered topology against the back-
bone structures of a local ISP and of part of a statewide
research and educational network (Los Nettos and Calren2,
respectively). Mercator discovered all routers and all but one
link in each of these networks. That these networks are
topologically close to the host which
ran Mercator is probably irrelevant; our probing is not in-
formed by topological closeness, so we expect that Mer-
cator would show similar fidelity for distant, but similarly
sized ISPs.
These validation results are somewhat encouraging. The
Los Nettos and Calren2 respectively.
[Figure 4 panels: (a) A small ISP; (b) A medium-sized ISP; (c) A large national backbone.]
Fig. 4. Mercator can be used to visualize, and analyze, ISP structures.
[Figure 5 plots: number of nodes versus degree, on a log-log scale. (a) The degree distribution. (b) The degree distribution without source-route discovered links.]
Fig. 5. The degree distribution of the Mercator map.
technique appears to give a fairly complete picture of ISPs
which might be considered to be at the edge of the Inter-
net’s transit portion. It gives us greater confidence in the
hypothesis that Mercator can map the transit portion with
reasonable fidelity. There is one caveat, however. The con-
nectivity structures of both networks are relatively simple;
they both consist of simple rings, with a small number of
“spur” links. It is easy to see that Mercator can map such
sparse ISPs quite well. It is less clear how successful Mer-
cator will be in mapping more meshed ISP structures (e.g.,
those that consist of an ATM core with private virtual cir-
cuits established between routers). We are currently vali-
dating such structures in our map.
D. Discussion and Results
The previous subsections have indicated several reasons
why Mercator’s Internet map may be incomplete, and in
parts may even be incorrect. As future work, we hope to
run Mercator from different locations around the Internet
to see if we can improve the fidelity of the map. We be-
lieve, however, that the map we have generated still has
some uses. It provides representative realistic topologies
for protocol simulations [1], it can be used in heuristics for
service proximity [10], and can provide visual context for
isolating faults in networks [18].
Can this graph be used for understanding the large-scale
structure of the Internet? On the one hand, we do not have
mathematical bounds for the likely degree of inaccuracy of
Mercator’s map. Furthermore, some graph-theoretic prop-
erties, such as estimates of neighborhood sizes, can be very
sensitive to missing links. This would argue against be-
ing able to draw reliable conclusions about network struc-
ture. On the other hand, we believe that, unless the Internet
infrastructure improves greatly—in the sense that source-
route capability, and other ICMP services, become more
widely available, and routes to router interfaces are wide-
ly propagated—it is logistically difficult to obtain a map
which has a significantly higher accuracy than our own.
This would argue that we might never be able to confident-
ly analyze the statistical properties of the Internet topolo-
gy.
Despite this, and particularly because there is much re-
[Figure 6 plots: number of pairs of nodes within at most h hops, as a function of h, with and without source-route discovered links. (a) The hop-pair distribution. (b) The hop-pair distribution on a log-log scale.]
Fig. 6. The hop-pair distribution of the Mercator map.
cent interest in trying to understand network structure [9],
[15], we analyze below the degree distribution, and the
hop-pair distribution [9] of the Mercator map. To under-
stand whether the source-routing bug described in Sec-
tion IV-B skews these results, we use two maps in computing
these distributions: the complete Mercator map, as well as the map
without any source-route discovered links.
Figure 5(a) plots the degree distribution on a log-log s-
cale. The plot is interesting. For node degrees less than
30, the plot is linear, lending some support to the conjec-
ture that a power law governs the degree distribution of
real networks [9]. However, starting from about a degree
of 30, the distribution is significantly more diffuse. This
might indicate that a different law governs the distribution
of high degree nodes—such nodes are usually found closer
to the core of the network.
Figure 6(a) plots the number of pairs of nodes within at
most h hops of each other as a function of h. This hop-
pair distribution also has an intriguing shape. For small h
(less than about 5), the number of pairs of nodes within h
hops of each other increases relatively slowly. This could
be an artifact of our methodology. More likely, we be-
lieve, is the explanation that beyond small h, the number
of nodes reachable within h hops increases dramatically
because nodes within a stub domain can reach other stub
domains. Another interesting feature of this curve is that
97% of the hop-pairs are at an h of 15 hops or less. Final-
ly, Figure 6(b) plots the hop-pair distribution on a log-log
scale. Can this distribution be described by a power law,
especially for h less than 15? Quite possibly, although the
slight non-linearity at lower h leads us to believe that that
question still remains open, even ignoring methodological
inaccuracies.
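For completeness, the following sketch shows how the two distributions can be computed from a discovered map represented as an adjacency list; the representation is ours, and the per-node BFS makes this practical only on subsets of a map as large as Mercator's.

    // Sketch: degree distribution (Figure 5) and hop-pair distribution (Figure 6).
    #include <cstddef>
    #include <map>
    #include <queue>
    #include <vector>

    using Graph = std::vector<std::vector<int>>;

    // degree d -> number of nodes with that degree.
    std::map<std::size_t, std::size_t> degree_distribution(const Graph& g) {
        std::map<std::size_t, std::size_t> dist;
        for (const auto& nbrs : g) ++dist[nbrs.size()];
        return dist;
    }

    // h -> number of ordered pairs of distinct nodes within at most h hops.
    std::vector<long long> hop_pair_distribution(const Graph& g, int max_h) {
        std::vector<long long> within(max_h + 1, 0);
        for (int src = 0; src < int(g.size()); ++src) {
            std::vector<int> dist(g.size(), -1);   // BFS hop counts from src
            std::queue<int> q;
            dist[src] = 0;
            q.push(src);
            while (!q.empty()) {
                int u = q.front(); q.pop();
                for (int v : g[u])
                    if (dist[v] == -1) { dist[v] = dist[u] + 1; q.push(v); }
            }
            for (int v = 0; v < int(g.size()); ++v)
                if (v != src && dist[v] > 0 && dist[v] <= max_h)
                    ++within[dist[v]];             // pairs at exactly this distance
        }
        // Convert "exactly h hops" counts into "within h hops" counts.
        for (int h = 1; h <= max_h; ++h) within[h] += within[h - 1];
        return within;
    }
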
Figure 5 and Figure 6 also show that source-route dis-
covered links do not skew the qualitative conclusions we
might draw from these distributions. That is, our conclu-
sions about the two distributions hold, but with possibly
different numerical values, for the map without the source-
route discovered links.
V. CONCLUSIONS AND FUTURE WORK
This paper is, to our knowledge, the most extensive at-
tempt to date to map the Internet. It documents several
techniques to infer the Internet map, and reports on our
experiences in designing and experimenting with different
heuristics for increasing the fidelity of the map. Clearly,
our results have been mixed. We have been able to explore
more of the Internet than previous efforts, but have, in the
process, revealed several reasons why it is exceedingly d-
ifficult to obtain a highly accurate map in the existing In-
ternet infrastructure.
We intend to explore several directions in the future;
running Mercator from more than one location, evaluating
heuristics for inferring topological elements such as shared
media, and validating sections of the map with the help of
ISPs in order to try to bound the degree of inaccuracy of
our map.
ACKNOWLEDGEMENTS
Deborah Estrin and Scott Shenker provided the moti-
vation for this work, and gave valuable advice. Cengiz
Alaettinoglu suggested significant improvement to earlier
versions of this draft.
REFERENCES
[1] Sandeep Bajaj, Lee Breslau, Deborah Estrin, Kevin Fall, Sally
Floyd, Padma Haldar, Mark Handley, Ahmed Helmy, John Hei-
demann, Polly Huang, Satish Kumar, Steven McCanne, Reza
Rejaie, Puneet Sharma, Kannan Varadhan, Ya Xu, Haobo Yu,
and Daniel Zappala. Improving simulation for network re-
search. Technical Report 99-702, University of Southern Cali-
fornia, March 1999.
[2] R. Braden. Requirements for Internet Hosts — Communication
Layers. Request for Comments 1122, Internic Directory Services,
October 1989.
[3] H.-W. Braun and K. C. Claffy. Global ISP Interconnectivity by
AS number. http://moat.nlanr.net/AS/.
[4] Hal Burch and Bill Cheswick. Mapping the Internet. IEEE Com-
puter, 32(4):97–98, April 1999.
[5] J. D. Case, M. Fedor, M. Schoffstall, and C. Davin. Simple Net-
work Management Protocol (SNMP). Request for Comments
1157, Internic Directory Services, May 1990.
[6] K. C. Claffy and D. McRobb. Measurement and Vi-
sualization of Internet Connectivity and Performance.
http://www.caida.org/Tools/Skitter/.
[7] Dartmouth University. Intermapper: An intranet map-
ping and SNMP monitoring program for the Macintosh.
http://www.dartmouth.edu/netsoftware/intermapper.
[8] Deborah Estrin, Mark Handley, John Heidemann, Steven Mc-
Canne, Ya Xu, and Haobo Yu. Network visualization with the
VINT network animator nam. Technical Report 99-703, Univer-
sity of Southern California, March 1999.
[9] C. Faloutsos, M. Faloutsos, and P. Faloutsos. What does Internet
look like? Empirical Laws of the Internet Topology. To appear,
ACM SIGCOMM 1999.
[10] P. Francis, S. Jamin, V. Paxson, L. Zhang, D. Gryniewicz, and
Y. Jin. An Architecture for a Global Internet Host Distance Esti-
mation Service. In Proceedings of IEEE Infocom, March 1999.
[11] V. Fuller, T. Li, J. Yu, and K. Varadhan. Classless Inter-Domain
Routing (CIDR): An Address Assignment and Aggregation Strat-
egy. Request for Comments 1519, Internic Directory Services,
September 1993.
[12] R. Govindan and A. Reddy. An Analysis of Internet Inter-Domain
Topology and Route Stability. In Proc. IEEE INFOCOM ’97,
Kobe, Japan, Apr 1997.
[13] J.-J. Pansiot and D. Grad. On routes and multicast trees in the
Internet. ACM SIGCOMM Computer Communication Review,
28(1):41–50, January 1998.
[14] V. Paxson. End-to-end Routing Behavior in the Internet. In Pro-
ceedings of the ACM SIGCOMM Symposium on Communication
Architectures and Protocols, San Francisco, CA, September 1996.
[15] G. Phillips, H. Tangmunarunkit, and S. Shenker. Scaling of Mul-
ticast Trees: Comments on the Chuang-Sirbu scaling law. To
appear, ACM SIGCOMM 1999.
[16] J. Postel. Internet Protocol. Request for Comments 791, Internic
Directory Services, September 1981.
[17] R. Siamwalla, R. Sharma, and S. Keshav. Discovering internet
topology. Unpublished manuscript.
[18] A. Reddy, D. Estrin, and R. Govindan. Large-Scale Fault Isola-
tion. Technical Report 99-706, Computer Science Department,
University of Southern California, March 1999. Submitted for
publication.
[19] Y. Rekhter and T. Li. A Border Gateway Protocol 4 (BGP-4).
Request for Comments 1771, Internic Directory Services, March
1995.
[20] C. A. Waldspurger and W. E. Weihl. Lottery Scheduling: Flexible
Proportional-Share Resource Management. In First Symposium
on Operating Systems Design and Implementation (OSDI), pages
1–11. USENIX Association, 1995.