Exploiting the Bandwidth-Memory Tradeoff in Multicast State
Aggregation
Pavlin Ivanov Radoslavov, Deborah Estrin, Ramesh Govindan
University of Southern California
Department of Computer Science
Information Sciences Institute
4676 Admiralty Way, Suite 1001
Marina Del Rey, CA 90292-6695, USA
Phone: +1-310-822-1511
Fax: +1-310-823-6714
Email: {pavlin,estrin,govindan}@isi.edu
July 1, 1999
USC Dept. of CS Technical Report 99-697 (Second Revision)
Abstract
Multicast data distribution is achieved through three mechanisms: rendezvous of senders with receivers,
distribution tree building protocol, and multicast forwarding. While the first two mechanisms have been ex-
tensively studied, scaling multicast forwarding state has not been addressed. The lack of efficient strategies
for multicast forwarding state aggregation could seriously inhibit the deployment of inter-domain multicast,
because there is no “natural” limit to the number of concurrent multicast groups, especially given the possible
proliferation of low-bandwidth multicast groups (e.g., event notification).
We analyze two classes of aggregation strategies: non-leaky and leaky. A non-leaky strategy preserves the
semantics of multicast joins, distributing data only in the direction of receivers. A leaky strategy, on the other
hand, carefully trades off gratuitous data distribution for reduced forwarding state. Our simulations show that
the leaky strategy ensures that under expected operating conditions forwarding state scales as the number of
“high-bandwidth” groups. This is desirable scaling behavior, given that network capacity limits the number of
such groups.
Keywords: Multicast, Multicast Forwarding, State Aggregation
Supported by a grant from Sun Microsystems
1 Introduction
Today, an increasingly wide variety of applications are
based on IP multicast [8]: audio and video transmissions [4], data replication [15], Web caching [42], and
collaborative workspaces [14]. These applications owe
their success to a key property of today’s multicast in-
frastructure: mechanisms for scalable distribution of
data flows to multiple receivers that avoid replicating
data traffic on shared links of the distribution paths
(see Figure 1). The design of these applications is
greatly simplified by a key feature of the IP multi-
cast service model: the level of indirection provided
by logical group naming [25].
Scalable and efficient distribution of multicast data
is achieved by multicast routing protocols. These pro-
tocols usually construct a multicast distribution tree.
They contain mechanisms that enable receivers in a
given multicast group to rendezvous with senders, and
then establish the corresponding forwarding state in
routers that achieves efficient distribution. Since Steve
Deering’s original work on multicast datagram service [8],
various distribution tree building and rendezvous mech-
anisms have been proposed and deployed [33, 27, 9,
12, 1, 22]. Some of these mechanisms have limited
scalability (within a single domain), while others have
been designed to scale to the entire Internet.
Figure 1: Example of multicast data distribution: senders S1 (group 1) and S2 (group 2) reach receivers Rx_1 through Rx_4 via routers R1 through R5; some receivers belong to both groups.
Scalable routing mechanisms alone may not ensure
the scalability of the Internet multicast architecture.
Specifically, we believe there is an architectural ar-
gument for considering multicast forwarding state ag-
gregation. This argument proceeds as follows. The
logical naming feature of IP multicast necessitates per-
group forwarding entries in routers. Each such entry
implies a “cost” to the overall architecture. This cost
is affected by the traffic bandwidth associated with
the group. Creation of forwarding entries for high
bandwidth groups (e.g., video and audio
sessions) is justified by the bandwidth savings of us-
ing multicast. To justify supporting relatively low-
bandwidth groups (e.g., whiteboard-type applications [14]
and event notification services [3]), however, we need
mechanisms that amortize the cost of maintaining for-
warding entries over many low-bandwidth groups. One
way to do this is to aggregate multicast forwarding en-
tries for low-bandwidth groups. To our knowledge,
this problem has not been widely considered in the lit-
erature.
Even apart from this architectural argument, there
is a practical reason for studying multicast forward-
ing state aggregation. Unicast forwarding state ag-
gregation is well-understood, but, because multicast
receivers are not—in general—topologically related,
it is unclear whether these unicast techniques apply.
Moreover, the multicast problem is potentially much
greater compared to unicast, because the possible num-
ber of multicast groups grows combinatorially with the
number of network nodes. For example, IPv4 [31] al-
lows up to $2^{28}$ multicast groups. Network capacity
permitting, it is not inconceivable that some significant
fraction (say 50%) of these groups are simultaneously
in use. In this case, some routers may need more than
0.5 GB of memory to store the forwarding entries. Un-
der similar assumptions, an IPv6 [7] router might need
several Terabytes for the forwarding entries!
In this paper, we discuss algorithms for multicast
forwarding state aggregation. These algorithms target
the scaling of forwarding state to the number of high-
bandwidth groups. They do this by aggregating, where
possible, low-bandwidth groups. This aggregation al-
lows for leaks: traffic for a group may follow paths that do not lead to any receiver for that group. (To our knowledge, Van Jacobson was the first to suggest this tradeoff [19] for multicast forwarding.) The aggregation strategy limits the bandwidth allocated for leaks to a fixed fraction of link capacity at each router.
Section 2 describes IP multicast and presents the
rationale for multicast forwarding state aggregation in
greater detail. Section 3 represents the core of the pa-
per. It describes several aggregation strategies, and
works out the details of the leaky aggregation strategy
that forms the focus of this paper. Section 4 shows
that, for a wide variety of traffic mixes and receiver
distributions, the leaky aggregation strategy manages
to closely track the number of high bandwidth groups,
even with dynamic data traffic and a large number of
joins/leaves. Section 5 discusses the interplay between
leaky forwarding state aggregation and existing multi-
cast routing protocols.
2 Motivation
In the previous section, we argued that multicast for-
warding state aggregation is necessary because for-
warding state incurs a bandwidth-dependent architec-
tural cost. Before we develop this argument further,
we briefly describe multicast forwarding and forward-
ing state aggregation, then survey work related to for-
warding state aggregation.
2.1 Multicast Routing and Forwarding
Multicast routing protocols achieve efficient data dis-
tribution from senders to receivers. They do this by
setting up distribution trees that span all group mem-
bers. The subject of scalable tree construction proto-
cols has received much attention in the literature [33,
27, 9, 12, 1, 22]. A common feature of these protocols
is that they all build multicast distribution trees rooted
at some particular node.
The outcome of every tree construction protocol is
a collection of forwarding entries at each router in
the network. Together, these forwarding entries effect
data distribution from sources to receivers. At a given
router, the forwarding entry determines which neigh-
boring routers receive copies of an incoming multicast
packet. For example, Router R1 in Figure 1 has for-
warding entries for group 1 and group 2. The forward-
ing entry for group 1 specifies that links R1-R2 and
R1-R3 belong to the distribution tree of group 1. The
forwarding entry for group 2 specifies that link R1-R2
belongs to the distribution tree of group 2.
In its most general form, a multicast forwarding en-
try contains four pieces of information: a source ad-
dress prefix (an initial bit-substring of an address—
IPv4 prefixes are usually represented, for example, by an IP address followed by the length of the initial bit-substring: 123.4.56.0/24), a group ad-
dress prefix, an incoming interface set, and a set of
outgoing interfaces. For a given forwarding entry, its
address prefixes represent the range of sources and
groups whose distribution tree is represented by that
entry. The incoming interface represents the router’s
“parent” in that distribution tree, and the outgoing in-
terfaces represent the router’s children. A forwarding
entry matches a multicast data packet if its source ad-
dress prefix is the longest initial bit-substring (among
all other forwarding entries) that matches the packet’s
source address, its group address prefix is the longest
initial bit-substring that matches the packet’s group
address, and the interface on which the packet was re-
ceived is included in its incoming interface set. A copy
of every such packet is forwarded on the outgoing in-
terface set (Figure 2).
Figure 2: Forwarding a multicast packet for source = “any” and group = 224.0.1.2. The entry has S=0.0.0.0/0, G=224.0.1.2/32, an incoming interface (iif), and outgoing interfaces (oifs): (1) look up the destination; (2) if the packet was received on iif, forward a copy on all (oifs − iif).
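To make the matching rule concrete, the following is a minimal sketch of a software lookup over such entries; the FwdEntry layout and names are our own illustration (real routers implement this in the forwarding engine), not the paper's.

```python
# Minimal sketch of the matching rule described above; the FwdEntry
# layout and names are our own illustration, not the paper's.
from dataclasses import dataclass
from ipaddress import ip_address, ip_network

@dataclass(frozen=True)
class FwdEntry:
    src_prefix: object   # e.g., ip_network("0.0.0.0/0"): any source matches
    grp_prefix: object   # e.g., ip_network("224.0.1.2/32")
    iifs: frozenset      # incoming interface set
    oifs: frozenset      # outgoing interface set

def lookup(table, src, grp, iif):
    """Return the entry whose source prefix (and then group prefix) is
    the longest match and whose incoming interface set contains iif."""
    candidates = [e for e in table
                  if ip_address(src) in e.src_prefix
                  and ip_address(grp) in e.grp_prefix
                  and iif in e.iifs]
    if not candidates:
        return None
    return max(candidates, key=lambda e: (e.src_prefix.prefixlen,
                                          e.grp_prefix.prefixlen))

def copies(entry, iif):
    # A copy of the packet goes to every outgoing interface except iif.
    return entry.oifs - {iif}

# The entry of Figure 2: source = "any", group = 224.0.1.2.
table = [FwdEntry(ip_network("0.0.0.0/0"), ip_network("224.0.1.2/32"),
                  frozenset({"iif"}), frozenset({"oif_1", "oif_2"}))]
entry = lookup(table, "10.0.0.1", "224.0.1.2", "iif")
assert copies(entry, "iif") == {"oif_1", "oif_2"}
```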
Existing multicast routing protocols differ slightly
in the kinds of forwarding entries they generate:
Group-specific vs. source-group-specific: In group-specific
entries, the source address prefix has zero length.
That is, any source address matches that prefix.
Usually, these entries are created by protocols
that build distribution trees rooted at a group-
specific Rendezvous Point [12, 1] or Root Do-
main [22]. Source-group specific entries con-
tain non-zero source and group address prefix
lengths. This kind of forwarding entry is usu-
ally created for distribution trees rooted at the
source [33, 27, 9, 12, 17, 30], but [1] is a notable exception.
Uni-directional vs. bi-directional: A given multicast rout-
ing protocol may define forwarding entries to
be either uni-directional or bi-directional. Uni-
directional forwarding entries result in multicast
traffic that flows only “down” the distribution
tree (from the root of the tree towards the re-
ceivers) [33, 27, 9, 12, 17]. Bi-directional for-
warding entries result in distribution trees that
allow traffic from a sender to reach its nearest
router on the tree, then traverse the tree simulta-
neously towards the root and towards receivers [1,
22, 30]. To achieve bidirectional forwarding, an
entry’s incoming interface set trivially includes
all interfaces of a router.
These semantic differences between forwarding en-
tries are useful for understanding the design of forward-
ing state aggregation mechanisms.
2.2 Forwarding State Aggregation
But what do we mean by forwarding state aggrega-
tion? Consider Figure 3. This shows two forwarding
entries with matching incoming interface and outgo-
ing interfaces. The group address prefixes of these en-
tries are adjacent—i.e., there exists a single address
prefix that includes both those group address prefixes.
In such a situation, these two multicast forwarding en-
tries can be aggregated (represented by a single en-
try) as shown. This is not the only way to aggregate
multicast forwarding entries; Section 3 describes other
approaches.
Figure 3: Example of aggregating multicast forwarding entries: two entries, 224.0.1.0/32 and 224.0.1.1/32, each with incoming interface iif_0 and outgoing interfaces oif_1 and oif_2, are replaced by the single entry 224.0.1.0/31 with the same interfaces.
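In prefix terms, the two entries of Figure 3 are aggregatable because their group prefixes are adjacent: they are the two halves of a single covering prefix. A small sketch of this test (our helper, using Python's ipaddress module):

```python
# Sketch of the "adjacent prefixes" test behind Figure 3 (our helper).
from ipaddress import ip_network

def adjacent(p, q):
    """True iff p and q are the two halves of a single covering prefix."""
    p, q = ip_network(p), ip_network(q)
    return (p.prefixlen == q.prefixlen and p != q
            and p.supernet() == q.supernet())

def merge(p, q):
    assert adjacent(p, q)
    return ip_network(p).supernet()   # the single covering prefix

assert adjacent("224.0.1.0/32", "224.0.1.1/32")
assert str(merge("224.0.1.0/32", "224.0.1.1/32")) == "224.0.1.0/31"
```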
Notice that the example of Figure 3 omits the source
prefix. In this paper, we focus on aggregating group-
specific multicast forwarding entries. Recall that such
entries match any source address. Group-specific en-
tries are created by multicast routing protocols that
build shared (rather than per-source) distribution trees.
It is generally acknowledged that scalable inter-domain
multicast routing can only be achieved using shared
trees [22].
The following subsections consider these questions:
Why is it important to solve the problem of aggregat-
ing group-specific forwarding entries? Why is it hard?
Is the problem solvable?
2.3 Why is aggregation important?
One way to argue for the importance of multicast for-
warding state aggregation is to observe the following:
there is no “natural” limit to the number of concur-
rently active multicast sessions in an internet. The ca-
pacity of the network bounds, in some loose sense, the
number of concurrent audio or video sessions. How-
ever, the same cannot be said of all multicast applica-
tions. One example is an event notification service [3]:
event generators intermittently multicast events to in-
terested clients. Other examples are congestion-adaptive
multicast applications (e.g., file transfer, multicast dis-
tribution of software updates) that make use of what-
ever bandwidth is available to them. This problem of
unbounded multicast forwarding state growth is par-
ticularly crucial in core or backbone routers (Figure 4).
Figure 4: The router at the center (R3) has the largest number of forwarding entries (three): the distribution trees of groups 1, 2, and 3, each rooted at its own Root Domain and leading to its receivers, all pass through R3.
In the absence of a natural limit to the number of
concurrently active multicast groups, the number of multicast groups is limited only by the available address space: $2^{28}$ for IPv4 and $2^{112}$ for IPv6. Then, if a router has 32 interfaces, the maximum memory required to store multicast forwarding entries (assuming a 50% address space utilization) is 512 MB for IPv4 and several Terabytes of memory for IPv6! It is con-
ceivable that, within 3-4 years, high-end routers will
have enough high-speed forwarding table memory to
satisfy the needs of IPv4. However, forwarding state
aggregation may be imperative for scaling IPv6.
Apart from this pragmatic argument based on tech-
nological trends, an architectural argument can be made
for multicast forwarding state aggregation. This argu-
ment is based on the utility of multicast forwarding
entries. Informally, the utility of a multicast forward-
ing entry to the infrastructure increases in proportion
to the corresponding group’s bandwidth. Thus, bursty
low-bandwidth applications like event notification [3],
or non-bursty low-bandwidth applications (like peri-
odic Web document invalidations [35, 40]) generate
the least utility; their infrequently used forwarding en-
tries occupy expensive router memory. It is unclear
whether such low bandwidth groups will predominate
in the future. However, we believe that the architec-
ture of an internet should not artificially constrain the
numbers of such groups. For this reason, we argue
the need for a forwarding table structure that provides
higher overall utility with respect to forwarding table
occupancy.
2.4 Why is it hard?
Aggregation of unicast forwarding entries is well un-
derstood [34]. This aggregation is achieved by careful
unicast address assignment. Topologically contiguous
networks are assigned adjacent address prefixes. In
this way, an entire autonomous system (AS) can some-
times be represented in the forwarding table by a sin-
gle forwarding entry in a backbone router.
These techniques do not apply to multicast forward-
ing. Today, multicast group addresses are largely as-
signed in a topologically independent way. One pro-
posal for inter-domain multicast address assignment [22]
is to dynamically assign blocks of group ad-
dresses to individual ASs. Even if they were assigned
thus, the receiver distributions for adjacent groups could
differ. That is, there is little likelihood of there being
many situations that correspond to Figure 3. (Some
have argued that, when addresses are assigned thus,
there is some chance of receivership congruence. For
example, the population interested in any group initi-
ated within an AS is likely to be the same. This is an interesting, but as yet unverified, hypothesis).
2.5 Is there hope?
Before we answer this question, we describe a desir-
able goal for multicast forwarding state aggregation.
This goal is motivated by an observation in Section 2.3,
that the utility of a multicast forwarding entry is pro-
portional to the bandwidth of the corresponding group(s).
A desirable goal for forwarding state aggregation, then,
is to scale the worst-case forwarding table size ap-
proximately proportional to the number of high-bandwidth
groups. This goal is desirable because the number of
high bandwidth groups is naturally limited by avail-
able network capacity.
Our statement of the scaling goal is deliberately im-
precise. A more precise statement is not essential to
our argument. Furthermore, it is hard to quantify ex-
actly what high bandwidth means. Intuitively, by this we
mean audio and video applications that have fixed band-
width requirements, or are rate-adaptive within some
relatively small bandwidth range. By contrast, we use
the term low bandwidth to refer to applications such
as event notification [3] or multicast Web document
invalidation [35, 40]. We also do not attempt to quan-
tify our scaling goal. Rather than trying to achieve a
fixed aggregation target (e.g., a table size that is no
more than 5% of the number of groups), we ask what
aggregation is achievable with bounded resource uti-
lization.
We would like to achieve this goal, as we’ve said
before, for aggregating group-specific entries. Fur-
thermore, our approach leverages the topological as-
signment of group addresses; the root of the shared
tree for a group is, with high likelihood, topologically
close to that of a group with an adjacent address. Sev-
eral recent proposals for an inter-domain multicast ar-
chitecture share this property [22, 30, 17]. For con-
sistency, we use the terminology introduced in [22]:
all shared trees are rooted at a Root Domain (RD) and
multicast group prefixes are assigned to individual do-
mains. We believe that, with these assumptions, there
exists a plausible hypothesis that the scaling goal de-
scribed above can be achieved.
That hypothesis proceeds from a three-step argument.
First, with our assumptions listed in the previous para-
graph, we can expect that forwarding entries for
adjacent groups share the same incoming inter-
face.
Second, especially in core routers (Figure 4), we
can expect that these forwarding entries’ out-
going interface sets overlap significantly. This
observation is based on the bounded fanout of
routers vis-a-vis the receiver distribution. For
example, if a group has more than a hundred re-
ceivers, a forwarding entry for that group in a
core router can be expected to have, in its out-
going interface set, many of that router’s inter-
faces.
Finally, we can aggregate adjacent low-bandwidth
groups by “leaking” one group’s traffic down
some of the other group’s outgoing interfaces
(Figure 5(d)).
To complete our argument, we note that recent tech-
nological advances, particularly the deployment of dense
Wavelength Division Multiplexing [21] technologies,
can result in dramatically cheaper bandwidth. This
trend will increase the disparity between switching and
bandwidth costs, especially for backbone or core routers.
For this reason, we believe that a “bandwidth-state”
tradeoff will be more justifiable in the future than it
has been until today.
Our hypothesis, then, is that such leaky aggregation
can achieve our scaling goal. Section 3 describes this
strategy and Section 4 evaluates leaky aggregation.
2.6 Related Work
To our knowledge, leaky multicast forwarding entry
aggregation has not been studied in the literature. However, several other forwarding table compaction approaches have been implemented or considered. We discuss these now.
One alternative to aggregation is to only cache for-
warding entries of currently active multicast groups.
By active, we mean those groups for which a router
received data in the recent past. The cache entries are
populated from a routing table held in slower, less ex-
pensive memory. However, this approach only works
when there is sufficient traffic locality to maintain a high
cache hit ratio. As the number of concurrently active
groups increases, the degree of traffic interleaving will
grow, requiring a large cache to achieve good hit ra-
tios. Moreover, because each cache miss results in
slow path router forwarding, router performance can
degrade with an increase in the number of groups.
A number of papers [10, 41, 28, 23, 37] describe software-based solutions that use carefully cho-
sen data structures to reduce the size of the unicast
forwarding table such that it fits in on-chip caches.
Accesses to such a cache are fast, allowing very high-
speed unicast packet forwarding even without expen-
sive lookup hardware. These solutions are attractive
because of their flexibility, but unfortunately they are
limited by the capacity of the CPU cache. The num-
ber of unicast forwarding entries that can be stored in
the CPU cache is on the order of 100,000. This is an
order of magnitude or more smaller than the possible
number of multicast forwarding entries.
A hybrid hardware/software approach is described
in [16]. Using some simple additional hardware and a
reasonable amount of cheap memory (33 MB of DRAM
for IPv4), the lookup can be completed by using only
1-2 memory accesses (50-100 ns, i.e. 10-20 Gbps lookup
bandwidth) to slower main memory. This type of solu-
tion can be used to store a much larger number of en-
tries (several million entries), but may not be flexible
enough to accommodate the expected size of multicast
forwarding tables.
Paper [39] is the first and only work we are aware
of that directly addresses the problem of reducing the
number of multicast forwarding states/entries. The ba-
sic idea is to dynamically establish tunnels that bypass routers whose fanout for a given group is only one outgoing interface. To accomplish this, however,
the multicast routing protocol needs to be modified,
and there is the additional encapsulation/decapsulation
overhead per data packet. Also, this solution is not
beneficial if the fanout of the groups is more than one.
This solution can be applied together with the leaky
aggregation scheme proposed in this paper.
3 Multicast Forwarding Entry Aggregation
Before we describe strategies for forwarding entry ag-
gregation, we list the requirements that constrain the
design space:
Aggregation must conserve joins. Thus receivers
who have joined a group must continue to re-
ceive traffic even if the corresponding forward-
ing entry has been aggregated.
The aggregation strategy must be largely inde-
pendent of the multicast routing protocol. In
particular, it should not require additional rout-
ing protocol modification or control traffic to
work. If a particular multicast routing protocol
has to be modified, the modifications must be
very simple, and the additional control traffic, if
any, must not result in another scalability issue.
Routers that aggregate forwarding entries must
be able to interoperate with those that do not.
Furthermore, the aggregation strategy must not
require a particular deployment order (e.g., core
routers before leaf routers or vice versa).
In Section 2.5, we hypothesized that leaky aggrega-
tion can help us scale the size of the multicast forward-
ing table approximately proportional to the number of
high-bandwidth groups. In this section, we present a
leaky aggregation strategy. First, however, we present
some non-leaky aggregation strategies. These strate-
gies do not trade off bandwidth for reduced table size,
and form a baseline for understanding the performance
of leaky aggregation.
3.1 Non-Leaky Aggregation
The simplest non-leaky aggregation strategy is what
we call strict aggregation (Figure 3 and Figure 5(b)).
In this strategy, two adjacent group-specific forward-
ing entries are aggregated if and only if:
Their incoming interfaces match.
Their outgoing interface sets match.
Even if the root domains of adjacent groups are topologically aligned, we expect few pairs of group prefixes to satisfy these conditions.
A variant of strict aggregation is what we call pseudo-
strict aggregation (Figure 5(c)). In this form of aggre-
gation, two group prefixes are replaced by the longest
covering prefix if:
Their incoming interfaces match.
Their outgoing interface sets match.
There is no intervening forwarding table entry
whose group prefix matches the longest cover-
ing prefix.
Thus, in Figure 5(c), we can aggregate the entries for 224.0.1.0/31 and 224.0.1.3/32 even though these are not adjacent prefixes (the prefix 224.0.1.2/32 lies between them). This is because that router has not received a join for the group 224.0.1.2/32. Note that this strategy does not “leak” traffic destined for 224.0.1.2/32! If the router did not get a join for 224.0.1.2/32, it could not have propagated a join upstream, and should not receive any traffic for that group. (On a shared LAN, however, a join message sent by one router to its upstream neighbor can cause traffic to “leak” through other downstream routers.) Intuitively, we expect pseudo-strict aggregation to compress the forwarding table better than strict aggregation in many cases.
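The two non-leaky conditions can be written down directly; the sketch below is our formulation, reusing the FwdEntry layout and the adjacent() helper from the earlier sketches.

```python
# Sketch of the strict and pseudo-strict conditions (our formulation,
# reusing FwdEntry and adjacent() from the earlier sketches).
def covering_prefix(p, q):
    """The longest (most specific) prefix that covers both p and q."""
    c = p if p.prefixlen <= q.prefixlen else q
    while not (p.subnet_of(c) and q.subnet_of(c)):
        c = c.supernet()
    return c

def strict_ok(e1, e2):
    return (e1.iifs == e2.iifs and e1.oifs == e2.oifs
            and adjacent(e1.grp_prefix, e2.grp_prefix))

def pseudo_strict_ok(e1, e2, table):
    if e1.iifs != e2.iifs or e1.oifs != e2.oifs:
        return False
    cover = covering_prefix(e1.grp_prefix, e2.grp_prefix)
    # No intervening entry may fall under the covering prefix.
    return not any(e is not e1 and e is not e2
                   and e.grp_prefix.subnet_of(cover) for e in table)
```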
3.2 Leaky Forwarding Entry Aggregation
The basic idea behind leaky forwarding state aggre-
gation is simple. Leaky aggregation relaxes the re-
quirement in pseudo-strict aggregation that the outgo-
ing interface sets of the entries must match. A data
packet that matches the resulting forwarding entry will
be forwarded on all interfaces on which joins have
been received, but it may be forwarded on some other
interfaces as well (i.e., those for which no Join mes-
sage was received). Figure 5(d) shows an example of
prefix-based leaky aggregation. Group 224.0.1.7/32's entry does not include oif_2, but oif_2 is included in the aggregated entry 224.0.1.0/29.
Figure 5: Example of prefix-based strict, pseudo-strict and leaky aggregation. (a) Before aggregation: 224.0.1.0/32 (oif_1, oif_2), 224.0.1.1/32 (oif_1, oif_2), 224.0.1.3/32 (oif_1, oif_2), and 224.0.1.7/32 (oif_1); there is no entry for 224.0.1.2/32. (b) Strict aggregation: 224.0.1.0/31, 224.0.1.3/32, 224.0.1.7/32. (c) Pseudo-strict aggregation: 224.0.1.0/30, 224.0.1.7/32. (d) Leaky aggregation: the single entry 224.0.1.0/29 (oif_1, oif_2).
Obviously, leaky aggregation wastes some bandwidth.
For this reason, such aggregation must be performed
carefully, trading off as little increased bandwidth as
possible for maximal reduction in forwarding table size.
To do this, we first identify low-bandwidth groups,
and only attempt to aggregate such entries. Such con-
trolled leaking is not entirely free of problems. Sec-
tion 5 describes some of the limitations of this ap-
proach.
The design of a leaky aggregation strategy poses
several challenges:
To carefully tradeoff bandwidth and forwarding
table size, we need to estimate the current band-
width of a multicast group. Although techniques
are known for estimating the rate of unicast traf-
fic flows [13], such techniques react at conges-
tion time scales. For leaky aggregation, it proba-
bly suffices to detect application-level phase changes
(e.g., a break in a video lecture).
In our simplified description above, we have sug-
gested only aggregating “low-bandwidth” groups
and not aggregating “high-bandwidth” groups.
In practice, our algorithm cannot assume the ex-
istence of a bi-modal bandwidth distribution, nor
can it assume that there is any a priori band-
width threshold for groups.
As a corollary to the previous point, our strategy
must limit the bandwidth attributable to leaks to
some relatively small fraction of link capacities
(i.e., router administrators should be able to lo-
cally limit the bandwidth attributable to leaks,
to, say, 5% of link capacity). This is a key design
challenge: achieving maximal reduction in table
size while still limiting the amount of wasted
bandwidth. In what follows, we use the term
leak budget to refer to this limit.
Finally, the scheme must take into account re-
ceiver joins and leaves. These may cause the
outgoing interface set of forwarding entries to
change.
Our leaky aggregation scheme involves two related
components. First, a simple technique for estimating
the rate of individual groups. Second, a heuristic that,
given the rates of individual groups, and the link ca-
pacities, computes aggregated forwarding state in a
manner that does not violate the leak budget. We now
describe these two components.
3.3 Estimating the bandwidth of individual
groups
Our basic approach to estimating the bandwidth of an
individual group is to count the number of data packets
that are forwarded using that group’s forwarding entry
over some time interval $T$. This coarse estimation suf-
fices for us, and does not require a change to router for-
warding engines—some major router vendors already
support such counters. In this paper, we do not sug-
gest a value for $T$: further experimentation is needed
to determine this value.
Clearly, we cannot estimate the bandwidth of all
groups simultaneously. To do this, the router would
need to install group-specific (unaggregated) forward-
ing table entries for every group for which it has re-
ceived joins. This defeats the purpose of aggrega-
tion. There is an alternative solution, however. Re-
call that we have assumed topological assignment of
group addresses. In BGMP for example, a block of
group addresses is assigned to each root domain. Our
solution, then, is to only concurrently estimate rates
for (i.e., install group-specific forwarding entries for)
groups rooted at the same root domain. In this way,
we “stagger” the bandwidth estimation across root do-
mains, ensuring that the instantaneous forwarding ta-
ble is never proportional to the total number of groups
for which we have received joins. This approach, how-
ever, exhibits poor responsiveness to traffic pattern changes.
This may not be a significant drawback, since our goal
is only to detect application-level phase changes, and
not to try and respond at congestion time scales.
In order for our aggregation scheme to work, we
cannot estimate only the bandwidth of individual groups.
We also need to estimate the incoming traffic for groups
that neighboring routers are leaking. This estimation
figures into each router’s aggregation strategy (Sec-
tion 3.4). If a router’s leak budget permits, it can choose
to further propagate these incoming leaks. Otherwise,
it can install forwarding entries with an empty outgo-
ing interface set to prevent these leaks. To estimate the
incoming leaks, we install prefixes corresponding to
“holes” in the prefix associated with the root domain.
For example in Figure 6, the router would need to esti-
mate the bandwidth not just on 224.0.1.0/32 and
224.0.1.7/32, but also the leaks on the interven-
ing four prefixes. For a given root domain, the num-
ber of prefixes needed to fill the holes is, in the worst
case, proportional to $G \log_2(P/G)$, where $G$ is the number of groups rooted at that domain and $P$ is the size of the prefix associated with the root domain.
this is still within acceptable bounds for the transient
increase in forwarding table size.
Figure 6: De-aggregation for bandwidth estimation. (a) Forwarding entries before aggregation: 224.0.1.0/32 and 224.0.1.7/32. (b) After leaky aggregation: the single entry 224.0.1.0/29 (oif_1, oif_2). (c) Entries installed to measure bandwidth: 224.0.1.0/32 and 224.0.1.7/32, plus the four hole prefixes 224.0.1.1/32, 224.0.1.2/31, 224.0.1.4/31, and 224.0.1.6/32.
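The hole prefixes of Figure 6(c) can be computed mechanically; a small sketch (our helper, using the standard library's address_exclude):

```python
# Sketch of computing the "hole" prefixes of Figure 6(c) (our helper).
from ipaddress import ip_network

def hole_prefixes(block, joined):
    """Prefixes covering every address of `block` not in `joined`; these
    are installed temporarily to measure incoming leaks."""
    holes = [ip_network(block)]
    for g in (ip_network(j) for j in joined):
        nxt = []
        for h in holes:
            nxt.extend(h.address_exclude(g) if g.subnet_of(h) else [h])
        holes = nxt
    return sorted(holes)

# Figure 6: two joined groups in the /29 leave four hole prefixes.
print(hole_prefixes("224.0.1.0/29", ["224.0.1.0/32", "224.0.1.7/32"]))
# -> 224.0.1.1/32, 224.0.1.2/31, 224.0.1.4/31, 224.0.1.6/32
```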
In summary, a router periodically de-aggregates all
prefixes allocated to a single root domain, and installs
the group-specific forwarding table entries. After time
, it reads the counters associated with each forward-
ing table entry and uses these as input to the algorithm
described in Section 3.4. After computing the aggre-
gates, it installs the corresponding forwarding entries,
and deletes the group-specific entries. Because only one or a few prefixes are de-aggregated at a time, covering only a fraction of all joined groups, the remaining entries (the larger fraction) remain aggregated, and their aggregability continues to contribute to the overall savings. A
pseudo-code description of the algorithm is presented
in Appendix B.1. This algorithm need not be imple-
mented in the forwarding fast path and should not af-
fect router forwarding performance.
3.4 Aggregation Heuristic
Given the coarse-grained bandwidth estimates for each
forwarding entry (Figure 7(a)), our problem is to com-
pute the smallest possible forwarding table size that
fits within the leak budget of every link. This section
describes how we compute the aggregated forwarding
table.
First observe that, given bandwidth estimates for
two adjacent groups, we can easily compute the leaks
resulting from their aggregation. For example, con-
sider the groups 224.0.1.0/32 and 224.0.1.1/32 in Figure 7(a). Knowing the rates of these two groups, we can easily see that the resulting aggregate 224.0.1.0/31 will have a bandwidth of 1005 bps. Furthermore, that aggregate will leak 1000 bps on oif_2 and 5 bps on oif_1. Having computed this, we can easily determine whether 224.0.1.0/31 fits within the leak budget on interfaces oif_1 and oif_2. More generally, given forwarding entries $e_1, \dots, e_k$ and an aggregate $A$ that covers those entries, we can determine whether $A$ fits within the leak budget of all of its outgoing interfaces.
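A small sketch of this computation (our helper; each covered entry is represented as a (bandwidth, oifs) pair):

```python
# Sketch of the leak computation described above (our helper). The
# aggregate's oif set is the union of its children's; a child's traffic
# leaks onto every aggregate oif the child itself did not request.
def leaks_of_aggregate(children):
    """children: iterable of (bw, oifs) pairs covered by the aggregate.
    Returns (aggregate oifs, per-interface leaked bandwidth)."""
    children = list(children)
    agg_oifs = set().union(*(oifs for _, oifs in children))
    leaks = {i: sum(bw for bw, oifs in children if i not in oifs)
             for i in agg_oifs}
    return agg_oifs, leaks

def fits_budget(children, budget):
    _, leaks = leaks_of_aggregate(children)
    return all(leak <= budget[i] for i, leak in leaks.items())

# The example above: 224.0.1.0/32 (1000 bps, oif_1) aggregated with
# 224.0.1.1/32 (5 bps, oif_2) leaks 5 bps on oif_1 and 1000 bps on oif_2.
oifs, leaks = leaks_of_aggregate([(1000, {"oif_1"}), (5, {"oif_2"})])
assert leaks == {"oif_1": 5, "oif_2": 1000}
```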
This basic building block suggests the following ide-
alized algorithm for computing aggregates:
1. Given $N$ forwarding entries, compute all possible aggregates of these entries. There are at most $N-1$ of these (the internal nodes of the radix trie [5] built on top of the $N$ forwarding entries).

2. Take these $N$ forwarding entries and the at most $N-1$ aggregates. Compute all possible subsets of these entries. Among those subsets that a) cover the $N$ forwarding entries and b) fit within the leak budgets of all interfaces, choose the subset with the smallest cardinality. Step b) is computed using the technique outlined in the previous paragraph.
Clearly, this algorithm will compute the optimal ag-
gregation. However, considering all possible subsets of these entries is compute-intensive, simply because the number of subsets is $2^{2N-1}$. So, we choose instead a greedy heuristic that trades off op-
timality for computation time. We now briefly de-
scribe this heuristic: detailed pseudo-code for the heuris-
tic can be found in Appendix B.2.
1. Given the $N$ forwarding entries for which we
have estimated the rate, construct the correspond-
ing radix trie. (Recall that these entries in-
clude not just groups rooted at a given root do-
main, but also the corresponding “holes”, Sec-
tion 3.3). The leaves of this radix trie corre-
spond to the $N$ forwarding entries. Mark each
leaf of the radix trie. Intuitively, a marked en-
try corresponds to one that will be inserted in
the forwarding table. Subsequent steps will at-
tempt to reduce the number of marked entries.
Figure 7(b) shows the result of this step.
2. Starting from the lowest internal nodes, for each
internal node
(a) Mark the node, and remove the mark on
the child of the node with the lower band-
width.
(b) Compute the bandwidth attributable to that
internal node, and appropriately adjust its
outgoing interface set to match the uncov-
ered child.
This step results in $N$ marked nodes, but some
of these nodes represent aggregates of the orig-
inal entries. Intuitively, step 2(a) “moves” up
the marks on the low bandwidth entries. In Fig-
ure 7(c), for example, the mark on the lowest
bandwidth entry, 224.0.1.1/32, is replaced
with a mark on the aggregate 224.0.1.0/30.
An important observation is that this step does
not introduce any leaks; it simply changes the
matching entry corresponding to a group.
3. For each marked entry $e$, in order of increasing
bandwidth:
(a) Consider the impact of unmarking $e$. Unmarking $e$ will cause its nearest marked ancestor to leak more traffic. If this leakage does not violate the leak budget on any interface, we unmark $e$.
This key step tries to greedily eliminate entries.
For example, in Figure 7(d), we were able to
remove the mark on 224.0.1.2/31. This
causes traffic to leak on 224.0.1.0/30, but
these leaks are within the budget. At the end
of this step, the remaining marked entries corre-
spond to the aggregated forwarding table entries
rooted at the domain under consideration.
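Putting the three steps together, the following is a compact, self-contained sketch of the heuristic (the paper's pseudo-code is in Appendix B.2; this Python rendering, the Node layout, and the budget bookkeeping are our own illustration, not the authors' implementation). Each entry records the set of (bandwidth, oifs) flows it currently matches, so the leaks of a candidate merge can be computed exactly as in the building block above.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    prefix: str                                   # label only, e.g. "224.0.1.0/31"
    flows: list = field(default_factory=list)     # (bw, oifs) pairs this entry matches
    children: list = field(default_factory=list)  # [] for a leaf, else two Nodes
    parent: object = None
    marked: bool = False

    @property
    def bw(self):
        return sum(b for b, _ in self.flows)

def leaks_of(node):
    """Per-interface bandwidth this entry leaks: matched traffic sent on
    oifs its flows never requested."""
    oifs = set().union(*(o for _, o in node.flows)) if node.flows else set()
    return {i: sum(b for b, o in node.flows if i not in o) for i in oifs}

def steps_1_and_2(node):
    """Step 1: mark the leaves. Step 2, bottom-up: mark each internal node
    and unmark its lower-bandwidth child, whose traffic the node now
    matches. No leaks are introduced yet."""
    for c in node.children:
        c.parent = node
        steps_1_and_2(c)
    node.marked = True
    if node.children:
        lo = min(node.children, key=lambda c: c.bw)
        lo.marked = False
        node.flows, lo.flows = lo.flows, []

def marked_nodes(node):
    out = [node] if node.marked else []
    for c in node.children:
        out.extend(marked_nodes(c))
    return out

def step_3(root, budget):
    """Step 3: in increasing-bandwidth order, try to unmark each entry,
    letting its nearest marked ancestor absorb its flows, provided the
    extra leaks keep every interface within budget. `budget` is assumed
    to map every router interface to its allowed leaks."""
    used = {i: 0.0 for i in budget}       # leaks committed so far
    for m in sorted(marked_nodes(root), key=lambda n: n.bw):
        a = m.parent
        while a is not None and not a.marked:
            a = a.parent
        if a is None:
            continue                      # nothing above m can absorb it
        merged = Node(a.prefix, flows=a.flows + m.flows)
        new, old_a, old_m = leaks_of(merged), leaks_of(a), leaks_of(m)
        extra = {i: new[i] - old_a.get(i, 0) - old_m.get(i, 0) for i in new}
        if all(used[i] + extra.get(i, 0) <= budget[i] for i in budget):
            for i, v in extra.items():
                used[i] = used.get(i, 0) + v
            a.flows, m.flows, m.marked = merged.flows, [], False
```

Running the sketch on Figure 7's input reproduces the behavior described above (the per-group oif sets below are our assumption, chosen to be consistent with the leak values of 5 and 6 bps shown in the figure):

```python
leaf = lambda p, bw, oifs: Node(p, flows=[(bw, frozenset(oifs))])
l0, l1 = leaf("224.0.1.0/32", 1000, {"oif_1"}), leaf("224.0.1.1/32", 5, {"oif_2"})
l2, l3 = leaf("224.0.1.2/32", 1000, {"oif_2"}), leaf("224.0.1.3/32", 6, {"oif_1"})
root = Node("224.0.1.0/30", children=[Node("224.0.1.0/31", children=[l0, l1]),
                                      Node("224.0.1.2/31", children=[l2, l3])])
steps_1_and_2(root)
step_3(root, budget={"oif_1": 10, "oif_2": 10})
print([(n.prefix, n.bw) for n in marked_nodes(root)])
# [('224.0.1.0/30', 11), ('224.0.1.0/32', 1000), ('224.0.1.2/32', 1000)]
```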
The complexity of this heuristic is $O(N \cdot D)$, where $D$ is the depth of the radix trie: up to half of the nodes of the radix trie are considered exactly once, and for each node, we may need to traverse the path from that node to the root. Figure 7(d)
shows the result of running this heuristic on the input
shown. Notice how the heuristic aggregates the low
bandwidth groups (of bandwidth 5 and 6 bps) but in-
stalls group-specific entries for the “high-bandwidth”
entries.
Finally, running the heuristic for 64K entries for a
32-interface router takes about 200 milliseconds on a
Pentium II 266 MHz PC. This represents an extremely
unlikely worst-case scenario for our heuristic, where
the leak budget and entry bandwidths were set so that
all 64K entries could be aggregated to a single entry.
Moreover, because the de-aggregated entries need to be
kept for some amount of time (e.g. one second) in the
forwarding table to estimate the groups' bandwidth, the
speed of the algorithm will not be the bottleneck if the
de-aggregation and the aggregation are performed on
two blocks in parallel.
Figure 7: An example of the steps of the leaky aggregation algorithm. (a) Input to the algorithm: bandwidth and oifs per group (224.0.1.0/32 bw = 1000, 224.0.1.1/32 bw = 5, 224.0.1.2/32 bw = 1000, 224.0.1.3/32 bw = 6); allowed leaks per interface = 10. (b) Step 1: all four group-specific entries are marked. (c) Step 2: the marks on the two low-bandwidth entries move up to the aggregates 224.0.1.0/30 (bw = 5) and 224.0.1.2/31 (bw = 6), while 224.0.1.0/32 and 224.0.1.2/32 stay marked. (d) Step 3: 224.0.1.2/31 is unmarked and absorbed into 224.0.1.0/30 (bw = 11), with leaks of 5 and 6 bps, within the budget.
3.5 Dynamics of Group Join/Leave and Multicast Data Bandwidth
In Section 3.2, we did not discuss the impact of re-
ceiver dynamics on the leak budget. Consider the fol-
lowing scenario. Suppose group entry $e$ is represented in the forwarding table by its aggregate $A$. Clearly, $A$'s outgoing interface set includes $e$'s outgoing interface set. Now, if a receiver join results in a new interface added to $e$'s outgoing interface set, then $A$'s outgoing interface set must be updated to include this new interface. But doing so can overrun that interface's leak budget, because other groups that match $A$ might now start leaking traffic on this new interface. Notice that we can estimate how much this overrun is, given the most recent bandwidth estimate for $e$ and the estimates for all groups that are covered by $A$.
Hence, the router needs to make a binary decision: add another oif to $A$, or install a group-specific entry in the forwarding table. This decision can be based on whether, at that moment, the amount of leaks or the number of entries is the dominant concern. If it was the first join for the group, the safer solution is to first install a group-specific entry in the forwarding table; later, the leaky aggregation machinery will consider it for aggregation. Thus, short-lived groups might never need to be considered for aggregation. Sim-
ilarly, when a receiver leave results in an interface being removed from $e$'s outgoing interface set, this may reduce the leaks perpetrated by $A$. This is an opportunity for better aggregation. The join/leave dynamics will thus result either in adding or removing outgoing interfaces of existing forwarding entries, or in installing short-lived group-specific forwarding entries that are eventually aggregated, and therefore will not trigger immediate re-aggregation of the forwarding table.
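A sketch of the join-time decision (our formulation of the computation described above; the names and the headroom parameter are illustrative):

```python
# Sketch of the join-time decision described above (our formulation):
# adding new_oif to the aggregate makes every other matched group leak
# onto it, and the overrun is computable from the bandwidth estimates.
def join_decision(covered, group, new_oif, leak_headroom):
    """covered: {group: (bw, oifs)} for the groups matched by aggregate A.
    Returns the action for a join of `group` arriving on `new_oif`."""
    overrun = sum(bw for g, (bw, oifs) in covered.items()
                  if g != group and new_oif not in oifs)
    # The safer choice for a group's first join is a group-specific entry;
    # the aggregation machinery can absorb it later (see above).
    return "add-oif" if overrun <= leak_headroom else "group-specific"
```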
Figure 8: Examples of high, low bandwidth, bursty, and non-bursty multicast applications: bursty and low bandwidth, event notification; bursty and high bandwidth, mirror update of files; non-bursty and low bandwidth, heart-beat synchronization or updates; non-bursty and high bandwidth, audio and video streams.
Clearly, leaky aggregation is targeted towards long-
lifetime groups; the architectural cost of carefully ag-
gregating short-lifetime groups may not be worth the
effort. Figure 8 illustrates another source of dynamics
that can affect leaky aggregation. Bursty multicast ap-
plications (event notification, mirroring software updates) may skew the bandwidth attributable to their
aggregates, and hence may affect the leak budget. We
are not concerned with short-term increases and do not want to control the leaks at a granularity similar to that of congestion control. Instead, we need to control the leaks at a granularity similar to that of the activity of the multicast groups. If a high bandwidth group becomes
idle, this is an opportunity for better aggregation. On the contrary, if an idle group that was aggregated suddenly becomes a high bandwidth group, a
group-specific entry should be installed to reduce the
potential leaks. By periodically reading the forwarded bandwidth of each entry in the forwarding table, the idle groups that have a group-specific entry can be easily identified; an aggregated entry whose aggregate bandwidth has suddenly increased should be considered a source of increased leaks. The block of addresses with the largest increase in total forwarded bandwidth should be re-aggregated (i.e., de-aggregated and then aggregated) next, to control the amount of leaks. Re-aggregating the block of addresses with the largest decrease in total forwarded bandwidth has the potential for better aggregation.
4 Simulation Results
We have studied, through simulation, the performance
of leaky aggregation and have compared it to the non-
leaky methods described in Section 3.1. First, we eval-
uate how well leaky aggregation works across all routers
in the network. Then, we evaluate the dynamics of the
leaky aggregation by considering a single router only.
In this section, we present the results of this study.
4.1 Network-wide Simulations
Our first analysis of leaky aggregation was driven by
two questions: Compared to the non-leaky schemes,
how much additional aggregation does the leaky scheme
offer? How sensitive is the leaky scheme to choices of
various parameters (the leak budget, the traffic band-
width mix, and so on)?
The first set of simulations does not consider the im-
pact of dynamic group formation and removal or re-
ceiver joins and leaves; this is left to the second set of
simulations.
4.1.1 Methodology
Our simulations are based on a January 1999 snap-
shot [32] of the multicast overlay network, the MBone [24].
This was obtained by sending mrinfo [18] queries to
all known MBone routers and their neighbors. The
resulting graph had approximately 4000 nodes. The
choice of topology for simulations was constrained by
the lack of representative router-level Internet topolo-
gies. Our choice has the advantage of being “realis-
tic”. We also verified that our simulation results were
not skewed by our particular choice of topology. To
do this, we used automatically generated transit-stub
topologies [2], but do not present the results here.
On this graph, we labeled as stub nodes those that
were connected to the rest of the network by only one
interface. The remaining nodes were labeled transit
nodes. In our MBone topology, there were 2300 stub
nodes. This distinction separates leaf routers from in-
terior routers. A subset (32) of the stub routers were
randomly chosen to be the root domains for multi-
cast group prefixes. This models our expectation that
group origination will primarily be confined to “leaf”
networks.
In our simulation, we restricted the multicast ad-
dress space to $2^{12}$ (4096 addresses). Because our eval-
uations are comparative, the size of the address space
does not affect our simulations. In the absence of data
indicating otherwise, we divided these addresses into
32 equal-sized blocks and assigned each block to one
root domain.
In each block, we assumed a 75% utilization of the
address space. Lower utilizations would have resulted
in fewer overall forwarding table entries, a regime where
aggregation itself is less important and easier. A higher
utilization may be somewhat unrealistic, even when
multicast addresses are dynamically allocated [22]. Of
the allocated multicast groups, we randomly marked
some groups to be low bandwidth and some others to
be high bandwidth. The ratio of the numbers of low
and high bandwidth groups, as well as the bandwidth
ratios are parameters to our simulations. Clearly, they
significantly affect how much bandwidth we can trade
off for reduced space.
Lacking data for group size distributions, we chose
receivers for each group randomly from among the
stub nodes. This random placement of receivers stresses
forwarding state aggregation, reducing the likelihood
that adjacent groups have identical outgoing interface
sets. The number of receivers per group was a parame-
ter to our simulation: this number affects the outgoing
interface set of the forwarding entries, and hence the
leaks. In our simulations, we varied the number of re-
ceivers between 1 and 1500. For a given run of the
simulation, every group was allocated the same num-
ber of receivers. Finally, we simulated bi-directional
distribution trees (a choice of many recently proposed
multicast routing protocols [22, 30, 11]), but the re-
sults should hold for uni-directional trees as well. All
multicast data was originated either by the RD for the
group, or by some of the receivers. Since few of today's existing applications use non-member senders, we did not simulate them.
How did we select the capacities of links in the net-
work? We could have assigned every link the same
capacity, but that would not have stressed the leaky ag-
gregation scheme enough. That is because we define
the leak budget to be some fraction of the link capacity
(this fraction is yet another parameter to our simula-
tion). So, if all links have the same capacity, then, even
if a link carries little multicast traffic, it may be used
to leak a disproportionate amount of multicast traffic.
This allows greater aggregation.
To obtain a more realistic capacity distribution, we
ran our simulation with one set of parameters from the
space we intended to explore (in our case, this was 512
high bandwidth groups with bandwidth of 200 units,
2560 low bandwidth groups with bandwidth of 1 unit,
and each group had 400 receivers). We performed 100
iterations, and for each iteration we placed at random
the RDs and the receivers and computed the amount of
multicast traffic over each link. The capacity of each
link was defined as twice the amount of maximum ob-
served multicast traffic on that link during any of our
simulation runs (the factor “twice” assumes that the
50% of the network is provisioned for unicast). This
heuristic ensured a heterogeneous distribution of link
capacities. However, because of the way we gener-
ated the link capacities, this sometimes resulted in situ-
ations where some routers had attached links that var-
ied by more than an order of magnitude in capacity.
To limit this skew and model reality more closely, we
adjusted the link capacities so that no router had in-
terfaces whose link capacities varied by more than a
factor of 10.
4.1.2 Results
Figure 9: Prefix-based strict, pseudo-strict and leaky aggregation: worst-case number of forwarding entries vs. number of receivers, for 512 high-bandwidth and 2560 low-bandwidth groups, before aggregation and after strict, pseudo-strict, and leaky (2%, 5%, and 10% leak budget) aggregation.
Figure 10: Effect of bandwidth ratio on leaky aggregation: worst-case number of forwarding entries vs. allowed leaks (% of available link bandwidth), for high:low bandwidth ratios of 200:1 and 1000:1 (512 high-bandwidth and 2560 low-bandwidth groups).
Our first set of simulations compared leaky against
non-leaky aggregation schemes. In all runs of the sim-
ulation, the number of groups was fixed at 3072. Of
these, we assumed that 512 were high-bandwidth groups,
and 2560 low-bandwidth groups. In the absence of
data indicating otherwise, this choice assumes close to
an 80-20 split between low and high bandwidth groups.
Why this choice? If far less than 80% of the groups
were low bandwidth, we might be in a regime where
aggregation is not an important problem. For exam-
ple, a split of 50-50 instead of 80-20 would be primarily not because of a larger number of high bandwidth groups (a number naturally limited by available network capacity), but because of a much smaller number of low bandwidth groups. If the fraction of low bandwidth groups were much larger, it might not realistically represent a future internet.

Figure 11: Leaky aggregation for different percentages of high bandwidth groups: number of entries (% of address space) vs. high bandwidth groups (% of address space), for 400 receivers/group with 5% and 10% leak budgets (address space = 4096; high + low bandwidth groups = 3072).

We also fixed the
high to low bandwidth ratio at 200:1. We believe this
to be a very conservative choice: realistically, a high
bandwidth video stream may have a bandwidth of hun-
dreds of kilobits per second and an event notification
group may only have a few packets per hour. Below,
we explore the sensitivity of our results to variations
in these ratios.
For this choice of parameters, Figure 9 plots the
variation of the worst-case number of forwarding en-
tries with increasing number of receivers. Clearly, non-
leaky techniques do not reduce the worst case forward-
ing table size. Even with a large number of receivers,
there appear to be few instances where adjacent groups’
outgoing interface sets overlap. This can be attributed
to our random placement of receivers. If there is in-
deed a correlation between receivers and the originat-
ing root domain (Section 2.4), we might expect non-
leaky methods to perform well. We might also expect
that non-leaky methods perform well with lower router
fanouts. In some of our simulations, non-leaky meth-
ods reduced the forwarding table by a factor of 2 in
routers with 2-4 interfaces.
Figure 9 also plots, for three different values of the
leak budget, the variation in worst case forwarding
table size with number of receivers. Ideally, we ex-
pect the worst-case forwarding table size to match the
number of high bandwidth groups, 512 in our simula-
tions. Consider the curve for a leak budget of 2%—
i.e., where the leaks on every link are limited to 2% of
the link’s capacity. With our choice of simulation pa-
rameters, the low-bandwidth groups account for about
2.5% of the overall bandwidth requirement. So, it
comes as no surprise that a 2% leak budget does not
match our expected forwarding table size. Despite
this, leaky aggregation performs impressively, reduc-
ing the worst case table size to less than 60% of the un-
aggregated size. Furthermore, with increasing number
of receivers, leaky aggregation approaches our scal-
ing goal. With a large number of receivers, the likeli-
hood of outgoing interface set overlap increases. In
that event, aggregating two entries introduces fewer
leaks, so that more aggregates fit into the leak budget.
With a large leak budget, 10%, our heuristic closely
matches our expected table size. However, the curve
for a 5% leak budget reveals an interesting “hump” for
relatively small numbers of receivers. With our choice
of parameters, a 5% leak budget should be sufficient
to leak all low-bandwidth groups. However, the leak
budget is defined as a fraction of link capacity. So, if a
router has a low capacity link, 5% of its capacity may
not suffice to leak all low bandwidth groups incident
on that router. Such routers account for the hump in
the 5% curve.
Figure 9 validates our design of the leaky aggre-
gation mechanism, demonstrates its ability to achieve
our scaling goal, and highlights its graceful behavior
even with a limited leak budget. How would our re-
sults differ with different choices for simulation pa-
rameters? In particular, our results seem to depend
on the total bandwidth requirement of low bandwidth
groups, and its relationship to the leak budget.
One way to study this relationship is to vary the
bandwidth ratio. We mentioned earlier that our choice
of this ratio is conservative. With a more liberal choice
(Figure 10), we see that our scaling goal is reached for
a much smaller leak budget. This is as expected, since
the low-bandwidth groups now account for a much
smaller fraction of overall multicast bandwidth. (For the 1000:1 curve, we recomputed the link capacities, using the technique described in Section 4.1.1.)
Another way to study this relationship is to vary the number of high-bandwidth groups, keeping the total
number of groups fixed (Figure 11). This figure shows
a largely linear growth in the table size as a function
of the number of high-bandwidth groups. This rather
dramatically illustrates the ability of our heuristic to
achieve the scaling goal described in Section 2.5. How-
ever, with lower numbers of high-bandwidth groups,
notice that there is a deviation from linearity. In this
region, the leak budget is insufficient to achieve ex-
pected aggregation.
4.2 Join/Leave and Multicast Data Traffic Dynamics Simulations
In our second set of simulations we evaluated how well
our re-aggregation strategy contains the size of the for-
warding table, as well as the volume of traffic leakage,
in the face of varying traffic and receiver dynamics.
4.2.1 Methodology
We focused on a single router with 32 interfaces. Our
simulation methodology was designed to stress the re-
aggregation heuristic by employing more-than-realistic
receiver dynamics and widely varying traffic patterns.
The multicast address space had a total of 128K groups.
We defined three types of groups: low, medium, and
high bandwidth, with a bandwidth ratio of 1:33:1000.
Of all groups, 50% were low, 30% were medium, and
10% were high bandwidth; the remaining 10% were
considered idle all the time. To define the amount of
bandwidth on each interface that is available for mul-
ticast, we assumed that the router and its links had the
capacity to carry the traffic of 32K groups (i.e. 25%
of the address space), when each group had, on aver-
age, 8 randomly chosen outgoing interfaces. The leak
budget on each interface was fixed at 5% of the max-
imum multicast traffic on that interface. If the router
had the capacity to carry the traffic of more groups,
or if the average number of oifs per group was higher,
then the absolute amount of allowed leaks on each in-
terface would be higher, and therefore the aggregation
results shown below would be better (i.e. the number
of entries after the aggregation will be smaller and/or
the percentage of leaks will be smaller).
The address space was divided into 32 blocks of 4096 addresses each. Each second a
block was chosen to be de-aggregated, and the corre-
sponding group specific entries were installed in the
forwarding table (including the prefixes that cover the
holes between the groups). The forwarded bandwidth
was measured after one second, the entries were ag-
gregated using the algorithm described in Section 3.4, and the resulting aggregated entries were installed in the forwarding table. Then the same process was repeated for another block of addresses, chosen as the block with the largest momentary increase in the traffic forwarded by its aggregated entries.
We considered three types of bandwidth distribution for a group, which we call Uniform, Exp, and
Pareto. For a given interval of time, the bandwidth
of a Uniform group had a random value between zero
and twice its average bandwidth. Exp and Pareto were
ON/OFF type of traffic. The average length of the ON
periods was equal to the average length of the OFF
periods, and the length of each period had exponential
and Pareto distribution respectively. During each ON
period, the bandwidth was fixed, and its value also had
exponential or Pareto distribution. The shape parame-
ter of the Pareto distribution was 1.1 for the ON/OFF
period and 1.5 for the bandwidth [20]. During a simu-
lation, all traffic had the same distribution type.
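The following sketch shows how the three traffic models might be sampled (the helper names and the use of drand48() are our choices for illustration):

#include <stdlib.h>
#include <math.h>

/* Uniform model: bandwidth is uniform in [0, 2*avg_bw), so its
 * mean is avg_bw. */
double sample_uniform_bw(double avg_bw)
{
    return 2.0 * avg_bw * drand48();
}

/* Exp model: ON/OFF period lengths and ON bandwidth are exponential
 * with the given mean (inverse-CDF sampling). */
double sample_exp(double mean)
{
    return -mean * log(1.0 - drand48());
}

/* Pareto model: shape 1.1 for the ON/OFF period lengths and 1.5 for
 * the ON bandwidth, per the text above [20]. */
double sample_pareto(double scale, double shape)
{
    return scale / pow(1.0 - drand48(), 1.0 / shape);
}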
In the simulations we chose to install a group-specific entry first, when a new group was joined, and only later eventually aggregate it; the alternative was simply to add the new interface to the appropriate aggregated entry. The former reduces the amount of leaks but increases the number of entries; the latter does the opposite. Finally, we should note that we kept track of each group's measured bandwidth, and used a formula to estimate a group's bandwidth from the de-aggregated block. This formula allows us to capture the bursty-bandwidth groups and, at the same time, does not underestimate long-term high-bandwidth groups if they are idle for a short interval.
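As one illustration of an estimator with these two properties (the max-of-decayed-history form and the 0.5 decay factor are assumptions for illustration, not necessarily the formula used in the simulations):

/* Estimate a group's bandwidth from the latest measurement and the
 * previous estimate. Taking the maximum captures a bursty group as
 * soon as it bursts, while the slowly decaying history keeps a
 * long-term high-bandwidth group from being underestimated during
 * a short idle interval. */
double estimate_group_bw(double measured_bw, double prev_estimate)
{
    double decayed = 0.5 * prev_estimate; /* decay factor is an assumption */
    return (measured_bw > decayed) ? measured_bw : decayed;
}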
In each simulation, we initially populated the router
with a number of initial groups (10% of all multicast
groups), and on average each group had 2 oifs. After
1000 seconds we caused a number of receivers to join
or leave groups every second. The number of receivers
joining or leaving was uniformly distributed between
0 and 200; the receivers themselves were randomly
selected.
Finally, we also assumed that the router was receiving leaky traffic from its neighbors. The number of low-bandwidth leaky groups was 70% of the number of low-bandwidth joined groups; the percentages for the medium- and high-bandwidth leaky groups were 30% and 10% respectively. It is realistic to assume that the higher the bandwidth of a group, the lower the probability that a neighbor will be leaking it.
4.2.2 Results
Given the particular number of groups, their bandwidths, and the allowed leaks, we computed that the leaks are enough to aggregate all low-bandwidth groups (both joined and leaking from the neighbors) and approximately 1000 of the medium-bandwidth groups. At the beginning of the simulation there were approximately 6600 low-bandwidth groups, 4000 medium-bandwidth groups, and 1300 high-bandwidth groups; therefore we would expect the leaky aggregation to save approximately 7600 entries. Similarly, when the total number of groups grows to 22000, we would expect the leaky aggregation to save approximately 12000 entries.
Figure 12: Leaky aggregation (Uniform traffic): number of forwarding entries before and after aggregation (5% leak budget), as a function of time.
Figure 12, Figure 13, and Figure 14 show the num-
ber of entries in the forwarding table with and with-
out aggregation for Uniform, Exp, and Pareto traffic
respectively. The average length of the ON and OFF
periods for Exp was 1 minute. After the initial joins at
time 0, there were no other joins or leaves during the
first 1000 seconds, hence the number of total groups
did not change.

Figure 13: Leaky aggregation (Exp ON/OFF traffic): number of forwarding entries before and after aggregation (5% leak budget), as a function of time.

Figure 14: Leaky aggregation (Pareto ON/OFF traffic): number of forwarding entries before and after aggregation (5% leak budget), as a function of time.

The reason the number of total groups increases after time 1000 is that we did
not try to keep the average number of oifs per group at its initial value of 2: over time the average number of oifs became smaller than 2, and the number of joined groups increased. After taking into account that the de-aggregation of a prefix adds up to 2048 more entries (to measure the incoming leaks), we can see that, despite the high variability in traffic and the significant traffic dynamics, leaky aggregation performs as expected. In particular, the size of the forwarding table does not vary widely. This validates our re-aggregation heuristic (Section 3.3).
Figure 15: Amount of leaks on a single interface (Uniform traffic), as a percentage of the maximum multicast traffic, as a function of time.
Figure 16: Amount of leaks on a single interface (Exp ON/OFF traffic), as a percentage of the maximum multicast traffic, as a function of time.
The reason the Uniform aggregation was slightly worse than Exp and Pareto is that the formula we used to estimate a group's bandwidth was more likely to overestimate the bandwidth of a Uniform group. The observed leaks confirm this. Figure 15, Figure 16, and Figure 17 show the amount of leaks on a single interface (the results for all interfaces were similar). Despite the fact that the leak budget was 5%, because of the group bandwidth overestimation Uniform did not use its entire leak budget. The Exp and Pareto leaks, however, went beyond the target leak budget, simply because their momentary bandwidth is less predictable. This is especially true for Pareto, which has a much larger variance. We believe that the medium-bandwidth groups that were aggregated account for most of the leaks beyond the budget. As future work, we propose to investigate whether a better aggregation heuristic and a better group bandwidth estimation mechanism can reduce this excessive leakage.

Figure 17: Amount of leaks on a single interface (Pareto ON/OFF traffic), as a percentage of the maximum multicast traffic, as a function of time.
We ran the same simulations, but with different ON/OFF mean intervals (10 minutes, 5 seconds), different numbers of blocks used to split the address space (128 blocks, 8 blocks), and with a larger initial fanout (8 oifs on average). Table 1 summarizes the results, from which we can make the following observations.
Increasing the average ON/OFF interval from 1 to 10 minutes improves the aggregation. The reason is that a group with a long OFF interval is practically idle and can be aggregated.
Decreasing the average ON/OFF interval from 1 minute to 5 seconds did not change the results significantly.
A much larger number of blocks (128) did not work well. The reason is that, because of the join/leave dynamics, it would take much longer to process all blocks once and aggregate the low-bandwidth groups; similarly, it would take a longer time to re-aggregate all blocks and promptly reduce any increased leaks.

Parameters                 | Aggreg (Uni) | Leaks (Uni) | Aggreg (Exp) | Leaks (Exp) | Aggreg (Pareto) | Leaks (Pareto)
32 blocks, 2 oifs, 1 min.  | 14200/22500  | 1.6-2.0%    | 12000/22500  | 7-13%       | 12200/22500     | 5-11%
10 min. ON/OFF             | —            | —           | 8800/22500   | 4-8%        | 11000/22500     | 5-11%
5 sec. ON/OFF              | —            | —           | 13800/22500  | 4-9%        | 13300/22500     | 4-10%
128 blocks                 | 20000/22500  | 1.1-2.0%    | 18000/22500  | 7-13%       | 18000/22500     | 4-7%
8 blocks                   | 16800/22500  | 3.0-3.5%    | 13700/22500  | 5-11%       | 14600/22500     | 4-10%
8 oifs                     | 30500/57000  | 3-4%        | 29500/57000  | 8-12%       | 26400/57000     | 7-15%

Table 1: Simulation results summary
A small number of blocks (8) has a shorter processing cycle, but because each block is now larger, the number of holes that must be installed to measure the neighbors' leaks is larger too, and therefore so is the total number of forwarding entries.
Increasing the initial average number of inter-
faces per group from 2 to 8 increased the num-
ber of total groups in the long run, because we
did not try to keep the same average number of
oifs per group. Despite the larger number of
groups, the number of entries after aggregation
was as expected.
5 Discussion
We have, so far, postponed the discussion of several in-
teresting questions: What are the limitations of leaky
aggregation? Does leaky aggregation have undesirable
traffic effects, e.g., loops? Are there other aggregation
strategies? Does leaky aggregation address all mul-
ticast routing related scaling issues? The following
paragraphs address some of these issues.
Clearly, leaky aggregation cannot achieve the same
levels of table compression (Figure 9) when low-bandwidth
groups dominate overall multicast bandwidth. While
we have no data to refute this, we believe that, by
analogy with TCP traffic [6] on the Internet, a few
high-bandwidth multicast applications will dominate
the overall multicast bandwidth. Leaky aggregation uses an approximate estimate of the bandwidth of each group, and tries to keep the leaks on each interface within some reasonable limit. However, extremely bursty traffic might result in larger than expected leaks and, worse, it might not always be possible to quickly identify the bursty group.
Leaky aggregation can create traffic loops in mul-
ticast routing protocols whose forwarding entries are
bi-directional. Intuitively, this happens because traf-
fic leaking causes a router to receive traffic for which
it never sent a join. Fortunately, this might happen
only in some rare cases, which are easy to identify in
advance. The solution is simple and needs to be ap-
plied only after those cases are identified. Appendix A describes the problem and the solution in detail, with reference to a particular multicast protocol [22].
An alternative approach for multicast group lookup and non-leaky aggregation has been suggested recently [38]. This approach reorganizes the forwarding table structure so that, for every packet, a per-interface decision is made whether or not to forward the packet out that interface. This alternative organization promises greater non-leaky aggregation: adjacent groups are more likely to share a single interface in their outgoing interface sets than to have identical sets. We expect that this alternative structure will benefit equally from leaky aggregation.
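As a rough, self-contained illustration of the per-interface idea (the bitmap representation and the /8 granularity are our own simplification, not the structure proposed in [38]):

#include <stdint.h>
#include <stdio.h>

#define NUM_IFS     4   /* small fanout, for illustration only */
#define PREFIX_BITS 8   /* per-interface decision at /8 granularity */

/* One forward/don't-forward bit per (interface, prefix). Adjacent
 * groups that agree on a single interface share one entry for that
 * interface, even when their complete oif sets differ. */
static uint8_t fwd_bit[NUM_IFS][1 << PREFIX_BITS];

static void forward(uint32_t group)
{
    uint32_t prefix = group >> (32 - PREFIX_BITS);
    int i;
    for (i = 0; i < NUM_IFS; i++)
        if (fwd_bit[i][prefix])
            printf("forward group 0x%08x on interface %d\n",
                   (unsigned)group, i);
}

int main(void)
{
    fwd_bit[0][0xe0] = 1;  /* all of 224/8 out interface 0 */
    fwd_bit[2][0xe0] = 1;  /* ... and out interface 2 */
    forward(0xe0010101u);  /* 224.1.1.1 */
    return 0;
}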
Even with leaky aggregation, routers will need to
maintain group-specific state in their routing tables.
Unlike forwarding state, scaling the routing table may
be less critical. First, a router needs to have only a sin-
gle copy of this table (high-end routers such as [26] and [29], in contrast, use a number of routing engines with a copy of the forwarding table at each engine). Second, because routing state
is not accessed in the forwarding path, routing tables
can be stored in cheaper and slower memory thereby
alleviating the problem somewhat. If the amount of
routing state becomes an issue, the leaky approach can
be applied at the routing table level in a network architecture where the edge routers carefully aggregate the join/prune messages (i.e., the edge routers have control over the overall network leaks and aggregation). The future will show whether multicast-capable networks will need this kind of engineering.
In soft-state multicast protocols such as PIM, con-
trol traffic can increase linearly with the number of
concurrent groups. Leaky aggregation does not solve
this problem. Approaches that adapt the refresh rate
of explicit joins [36] have been proposed to deal with
this.
In theory, leaky aggregation is applicable to source-group-specific forwarding entries as well. Because source unicast addresses are already topologically assigned, our leaky aggregation strategy will work on prefixes that consider the source and the group address together. It is less clear whether there is a problem to solve here: increasingly, inter-domain multicast protocols [22, 30] instantiate mostly group-specific state. Source-based leaky aggregation can, however, be applied to broadcast-and-prune multicast routing protocols such as DVMRP and PIM-DM, when the number of low-bandwidth entries is too large. (We should note that if a PIM-DM or DVMRP router receives leaky, i.e., unwanted, traffic, it would typically send a Prune message to the upstream router. Usually a forwarding-table cache miss is used to identify the unwanted group traffic and trigger the Prune. If there was a leaky entry in the downstream router too, it will not send a Prune message. If the downstream router did not have any entry for that unwanted group, including an entry with oifs=NULL, the rate of the Prune messages it sends might be used by the upstream router to quickly identify the high-bandwidth groups that should no longer leak on that interface.)
Finally, the meta-question someone might ask is: do we need aggregation at all, and if we do, would it not be possible to achieve the desired result with strict aggregation only? If the forwarding table size was within the capacity of the forwarding table memory most of the time, but the table occasionally overflowed, it is quite likely that router performance would be affected during the overflow. Taking the router off-line for a hardware upgrade might not be the most appropriate or flexible solution. Automatically triggering leaky aggregation before the forwarding table overflows eliminates the need for an unnecessary hardware upgrade, and reduces the management cost.
Indeed, it is possible that using pseudo-strict or strict aggregation might be good enough, and we might not need leaky aggregation at all. In fact, the leaky aggregation algorithm we described in Section 3.4 (see Appendix B.2 for the pseudo-code) could be used for pseudo-strict aggregation when the allowed leaks per interface are zero, or even for strict aggregation (after a simple modification of the aggregation rule), simply because strict and pseudo-strict aggregation are more restricted versions of leaky aggregation. If non-leaky aggregation is not good enough, then leaky aggregation might be used as a last resort.
6 Conclusions and Future Work
Our paper presupposes that there is no “natural” limit
to the number of concurrent multicast groups, and that
it would be undesirable for the multicast architecture
to impose artificial limits on the number of concurrent
multicast groups. We have described leaky aggrega-
tion, a strategy that carefully trades off gratuitous data
distribution for reduced forwarding state. Our simula-
tions show that when the receivers are randomly dis-
tributed, it is almost impossible to perform reasonable
non-leaky aggregation. On the other hand, the number
of forwarding entries created by the leaky aggregation
can scale as the number of high bandwidth groups.
Given that the number of high bandwidth groups is
limited by the network capacity, leaky aggregation re-
duces the likelihood of extremely large forwarding ta-
bles.
A skeptic might ask: It is fairly obvious that one can
tradeoff bandwidth for reduced state, so what’s inter-
esting about the work presented in this paper? In our
view, this paper demonstrates the existence of a solu-
tion that exhibits very desirable properties in achiev-
ing this goal. Our leaky aggregation heuristic is incrementally deployable, locally configurable, requires little or no changes in routing protocols or forwarding architectures, behaves gracefully even with underprovisioned leak budgets, and does not impact router forwarding efficiency. (In our simulations, we fixed the leak budget on all links to be the same; however, the leaky aggregation strategy will work with locally configured leak budgets, because each router takes into account its neighbors' leaks in determining its aggregates; see Section 3.3.) We believe this to be the main con-
tribution of this paper. Our simulations with various data traffic distributions and random joins/leaves show that leaky aggregation can work very well in a dynamic environment.
Scaling multicast forwarding state is one of the last unsolved problems in the design of a large-scale multicast architecture for IP. With an aggregation strategy such as ours, a future Internet can simultaneously support many low-bandwidth multicast-based applications for event notification, Web document invalidation, and other collaborative, distributed applications.
brings up an interesting philosophical question: Is wide-
area multicast the appropriate tool for these applica-
tions? Or can these applications be supported by a
generic hierarchy of “servers” inside the network that,
essentially, provides application-level multicast? It is
not at all obvious that the latter can be implemented in
a way that provides the same level of robustness as IP
multicast.
7 Acknowledgments
This work is based on a five-year-old idea by Van Jacobson, but any mistakes or errors in developing his idea are the responsibility of the authors. Yuri Pryadkil from ISI collected and provided the Mbone map information we used in our simulations. Pavlin's work was supported by a gift from Sun Microsystems, and by the NSF-funded Routing Arbiter project at ISI.
References
[1] A. Ballardie, B. Cain, and Z. Zhang. Core Based
Trees (CBT version 3) Multicast Routing (Pro-
tocol Specification). Internet Draft, draft-ietf-
idmr-cbt-spec-v3-01.txt, August 1998.
[2] Ken Calvert and Ellen Zegura. GT In-
ternetwork Topology Models (GT-ITM).
http://www.cc.gatech.edu/fac/Ellen.Zegura/graphs.html.
[3] A. Carzaniga, D.S. Rosenblum, and A.L. Wolf.
Design of a Scalable Event Notification Ser-
vice: Interface and Architecture. Technical Re-
port CU-CS-863-98, Department of Computer
Science, University of Colorado at Boulder,
September 1998.
[4] S. Casner. First IETF Internet audiocast. Com-
puter Communication Review, 22(3):92–97, July
1992.
[5] Thomas H. Cormen, Charles E. Leiserson, and
Ronald L. Rivest. Introduction to Algorithms.
MIT Press, 1990.
[6] P. Danzig, S. Jamin, R. Caceres, D. Mitzel,
and D. Estrin. An Empirical Workload Model
for Driving Wide-Area TCP/IP Network Simula-
tions. Journal of Internetworking: Research and
Experience, 3(1):1–26, March 1992.
[7] S. Deering and R. Hinden. Internet Protocol, Ver-
sion 6 (IPv6) Specification. Request for Com-
ments 2460, December 1998.
[8] Stephen Deering. Multicast Routing in a Data-
gram Internetwork. PhD thesis, Stanford Univer-
sity, 1991.
[9] Stephen Deering, Deborah Estrin, Dino Fari-
nacci, Van Jacobson, Ahmed Helmy, David
Meyer, and Liming Wei. Protocol Indepen-
dent Multicast Version 2 Dense Mode Specifica-
tion. Internet Draft, draft-ietf-pim-v2-dm-01.txt,
November 1998.
[10] Mikael Degermark, Andrej Brodnik, Svante
Carlsson, and Stephen Pink. Small Forwarding
Tables for Fast Routing Lookups. In Proceed-
ings of the ACM SIGCOMM’97, Cannes, France,
1997.
[11] Deborah Estrin and Dino Farinacci. Bi-
Directional Shared Trees in PIM-SM. Inter-
net Draft, draft-farinacci-bidir-pim-01.txt, May
1999.
[12] Deborah Estrin, Dino Farinacci, Ahmed Helmy,
David Thaler, Stephen Deering, Mark Hand-
ley, Van Jacobson, Ching-Gung Liu, Puneet
Sharma, and Liming Wei. Protocol Indepen-
dent Multicast-Sparse Mode (PIM-SM): Proto-
col Specification. Request for Comments 2362,
June 1998.
[13] Sally Floyd and Kevin Fall. Promoting the Use of
End-to-End Congestion Control in the Internet.
under submission to IEEE/ACM Transactions on
Networking.
[14] Sally Floyd, Van Jacobson, Ching-Gung Liu,
Steven McCanne, and Lixia Zhang. A Reliable
Multicast Framework for Light-weight Sessions
and Application Level Framing. IEEE/ACM
Transactions on Networking, November 1997.
[15] Ramesh Govindan, Haobo Yu, and Deborah Es-
trin. Large-Scale Weakly Consistent Replica-
tion using Multicast. Technical Report 98-682,
Department of Computer Science, University of
Southern California, July 1998.
[16] Pankaj Gupta, Steven Lin, and Nick McKeown.
Routing Lookups in Hardware at Memory Ac-
cess Speeds. In Proceedings of the IEEE Info-
com’98, San Francisco, USA, 1998.
[17] Hugh W. Holbrook and David R. Cheriton.
IP Multicast Channels: EXPRESS Support for
Large-scale Single-source Applications. In Pro-
ceedings of the ACM SIGCOMM’99, Cam-
bridge, Massachusetts, USA, 1999.
[18] Van Jacobson. mrinfo(8): Tool for displaying
configuration info from a multicast router.
[19] Van Jacobson. Some Notes on Multicast Scaling
and PIM. IDMR Working Group Presentation,
30th IETF, Toronto, Canada, July 1994.
[20] Sugih Jamin, Peter B. Danzig, Scott J. Shenker,
and Lixia Zhang. A Measurement-based Admis-
sion Control Algorithm for Integrated Services
Packet Networks (Extended Version). IEEE/ACM
Transactions on Networking, 5(1):56–70, Febru-
ary 1997.
[21] Joseph R. Kiniry. Wavelength Division Multi-
plexing: Ultra High Speed Fiber Optics. IEEE
Internet Computing, 2(2), March/April 1998.
[22] Satish Kumar, Pavlin Radoslavov, David Thaler,
Cengiz Alaettinoglu, Deborah Estrin, and Mark
Handley. The MASC/BGMP Architecture for
Inter-domain Multicast Routing. In Proceedings
of the ACM SIGCOMM’98, Vancouver, Canada,
1998.
[23] B. Lampson, V . Srinivasan, and G. Varghese.
IP Lookups using Multiway and Multicolumn
Search. In Proceedings of the IEEE Infocom’98,
San Francisco, USA, 1998.
[24] Michael R. Macedonia and Donald P. Brutzman.
MBone Provides Audio and Video Across the In-
ternet. IEEE Computer, April 1994.
[25] Steven McCanne. Scalable Multimedia Com-
munication with Internet Multicast, Light weight
Sessions, and the Mbone. Technical Report
CSD-98-1002, Department of Computer Sci-
ence, University of California, Berkeley, March
1998.
[26] Nick McKeown. Fast Switched
Backplane for a Gigabit Switched
Router. white paper, available from
http://www.cisco.com/warp/public/733/12000/fasts wp.pdf.
[27] J. Moy. Multicast Extensions to OSPF. Request
for Comments 1584, March 1994.
[28] Stefan Nilsson and Gunnar Karlsson. Fast ad-
dress lookup for Internet routers. In Proceed-
ings of the IFIP 4th International Conference on
Broadband Communications, pages 11–22, Stut-
gart, Germany, 1998.
[29] C. Partridge, P. Carvey, E. Burgess, I. Castineyra,
T. Clarke, L. Graham, M. Hathaway, P. Her-
man, A. King, S. Kohlami, T. Ma, J. Mcallen,
T. Mendez, W. Milliken, R. Osterlind, R. Pet-
tyjohn, J. Rokosz, J. Seeger, M. Sollins,
S. Storch, B. Tober, G. Troxel, D. Waitzman,
and S. Winterble. A Fifty Gigabit Per Second
IP Router. to appear in IEEE/ACM Transactions
on Networking.
[30] Radia Perlman, Cheng-Yin Lee, Tony Bal-
lardie, Jon Crowcroft, Zheng Wang, and Thomas
Maufer. Simple Multicast: A Design for Sim-
ple, Low-Overhead Multicast. Internet Draft,
draft-perlman-simple-multicast-01.txt, Novem-
ber 1998.
[31] Jon Postel. Internet Protocol. Request for Com-
ments 791, September 1981.
[32] The SCAN Project.
http://www.isi.edu/scan/mbone.html.
[33] Tom Pusateri. Distance Vector Multicast Routing
Protocol. Internet Draft, draft-ietf-idmr-dvmrp-
v3-07.txt, August 1998.
[34] Y . Rekhter and C. Topolcic. Exchanging Rout-
ing Information Across Provider Boundaries in
the CIDR Environment. Request for Comments
1520, September 1993.
[35] Pablo Rodriguez and Ernst W. Biersack. Contin-
uous Multicast Distribution of Web Documents
over the Internet. IEEE Network Magazine,
March/April 1998.
[36] Puneet Sharma, Deborah Estrin, Sally Floyd,
and Van Jacobson. Scalable Timers for Pro-
tocol Independent Multicast (PIM). Internet
Draft, draft-ietf-pimwg-PIM-STimers-00.ps,
December 1998. currently available only from
ftp://catarina.usc.edu/pub/puneetsh/pim/stimers id.ps.
[37] V . Srinivasan and G. Varghese. Fast Ad-
dress Lookups using Controlled Prefix Expan-
sion. ACM Transactions on Computer Systems,
17(1):1–40, February 1999.
[38] David Thaler and Mark Handley. On the aggre-
gatability of multicast forwarding state. Techni-
cal Report MSR-TR-99-34, Microsoft, 1999.
[39] Jining Tian and Gerald Neufeld. Forward-
ing State Reduction for Sparse Mode Multicast
Communication. In Proceedings of the IEEE In-
focom’98, San Francisco, USA, 1998.
[40] Amin Vahdat, Paul Eastham, and
Thomas Anderson. WebFS: A
Global Cache Coherent Filesystem.
http://www.cs.berkeley.edu/˜vahdat/webfs/webfs.html,
December 1996. Department of EECS, Univer-
sity of California, Berkeley.
[41] Marcel Waldvogel, George Varghese, Jon Turner,
and Bernhard Plattner. Scalable High Speed IP
Routing Lookups. In Proceedings of the ACM
SIGCOMM’97, Cannes, France, 1997.
[42] Lixia Zhang, Scott Michel, Khoi Nguyen, Adam
Rosenstein, Sally Floyd, and Van Jacobson.
Adaptive Web Caching: Towards a New Caching
Architecture. 3rd International WWW Caching
Workshop, 1998.
A BGMP-specific Loop Problem and
its Prevention
Figure 18: BGMP-specific potential loop caused by leaky entry aggregation. RD_s is the Root Domain for 224.1.1/24 and RD_l is the Root Domain for 224.1/16; the figure shows the 224.1.1.1 dataflow, the additional 224.1.1.1 dataflow caused by the leaky aggregation, and the (*,224.1.2.0) Join message sent from B to A.
For protocols that use bi-directional trees, traffic leaks
can result in loops. Figure 18 shows such a loop in
BGMP. Router A installs an aggregate forwarding en-
try for 224.1/16. This entry is leaky—that is, the
A–B interface is included in the entry’s outgoing in-
terface list even though B has only sent an explicit
join for one group 224.1.2.0 in that prefix. B in-
stalls a forwarding entry for 224.1.1/24. Traffic
for any group in this prefix can reach B from two di-
rections: RD_s–B and RD_s–RD_l–A–B. The BGMP forwarding rules in B will accept the data over link A–B and will forward it toward RD_s. The data will be forwarded by RD_s to RD_l, then by A down to B again, i.e., the data will start looping.
The reason for the loop problem is that B has Group
Routing Information Base (G-RIB) for two overlap-
ping prefixes (224.1/16 and 224.1.1/24) with
different next-hop routers toward the corresponding
Root Domains. Furthermore, both A and B imple-
ment leaky aggregation. Based on the G-RIB infor-
mation B has, and the join messages that were sent
to the neighbor routers, B can detect potential loops
for some group prefixes (224.1.1/24 in our exam-
ple). One possible solution is for B to instruct A,
via BGMP’s optional attribute negotiation mechanism,
not to send any leaky traffic for 224.1.1/24 over
link A–B. A can prevent the leaks by simply installing
a single entry for 224.1.1/24, and this entry would
never be passed to the leaky aggregation machinery.
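A minimal sketch of the overlapping-prefix check that B could perform (the G-RIB representation below is our own simplification):

#include <stdint.h>

/* Simplified G-RIB entry: a group prefix and the next-hop neighbor
 * toward that prefix's Root Domain. */
typedef struct {
    uint32_t prefix;   /* group prefix, e.g. 224.1.0.0 */
    int      plen;     /* prefix length, e.g. 16 */
    int      next_hop; /* neighbor id toward the Root Domain */
} grib_entry_t;

/* Return 1 if `inner` is covered by `outer` but points toward a
 * different next hop -- the situation in which B must ask the next
 * hop of `outer` (A in Figure 18) not to leak `inner`'s traffic. */
int loop_potential(const grib_entry_t *outer, const grib_entry_t *inner)
{
    uint32_t mask;
    if (inner->plen <= outer->plen)
        return 0;
    mask = (outer->plen == 0) ? 0 : ~0u << (32 - outer->plen);
    return ((inner->prefix & mask) == (outer->prefix & mask))
        && (inner->next_hop != outer->next_hop);
}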
We expect that mostly routers close to the edges (i.e., routers with fewer entries to aggregate) will have such overlapping G-RIB entries pointing in different directions, but even if this is quite common among backbone routers, in the worst case a router will need to install a number of extra entries (each representing a prefix of at least 256 groups), and those entries would never be deleted.
tion retains the incremental deployability of BGMP. A
similar solution may be applied for CBT.
B Pseudo Code
B.1 Leaky Aggregation Main Procedure
The main procedure used to perform leaky aggregation can be implemented entirely in software; it does not require any explicit hardware support.
while (TRUE) {
/* Select next block to de-aggregate,
* measure its groups bandwidth, and aggregate.
*/
/* Check first if some prefix is a better candidate:
* it may be creating too much bandwidth overhead, or
* may have better aggregability.
*/
curr_block = HEAD(priority_queue);
if (curr_block == NULL)
curr_block = HEAD(round_robin_queue);
/* De-aggregate. Install in the forwarding table entries
* for all groups and for the holes between them.
*/
de_aggregate(curr_block);
/* Continue below after some amount of time */
goto_label_later(READ_BW, T);
READ_BW:
read_entries_bw(curr_block); /* Read estimated bandwidth */
/* Compute the entries. See the pseudo code below */
compute_leaky_entries(curr_block);
/* Install in the forwarding table the computed
* aggregated entries
*/
install_leaky_entries(curr_block);
}
The management of the priority queue can be performed by the same process before execution continues from the label “READ_BW”.
B.2 Greedy Heuristic Aggregation Algorithm
Below is the pseudo-code of the greedy approximation algorithm we used to compute the leaky aggregated entries in our simulations. The algorithm's time complexity is O(N log N) and its space complexity is O(N), where N is the number of addresses in the block (see the complexity annotations in the code).
compute_leaky_entries(prefix_address_block)
{
/* Assign the allowed leaks per interface for this address block */
for (all_ifs)
allowed_leaks[if_id] = leaks_budget(if_id, prefix_address_block);
/* For each group, mark a node in the Radix trie as INSTALL. */
/* The lower the bandwidth, the higher the corresponding node
 * will be in the trie.
 */
for (parent = addr_block_size-1; parent >= 1; parent--) {
    left = LEFT_CHILD(parent); right = RIGHT_CHILD(parent);
    if (ENTRY(left).bw <= ENTRY(right).bw) {
        low = left; high = right;
    } else {
        low = right; high = left;
    }
    ENTRY(high).flag |= INSTALL;
    /* The parent inherits the lower-bandwidth child's entry */
    ENTRY(parent).bw = ENTRY(low).bw;
    for (all_ifs)
        ENTRY(parent).join[if_id] = ENTRY(low).join[if_id];
}
/* Sort all entries except ROOT */
#define ROOT 1
ENTRY(ROOT).flag &= ˜INSTALL;
/* Can use approximated hash buckets to sort in O(N),
* or any standard O(N*lg(N)) sorting algorithm.
*/
SORT_ALL_INSTALLED_ENTRIES_BY_THEIR_BW();
remaining_nodes = total_installed_entries;
/* Always install the root of the prefix (the default
* entry toward the Root Domain for that prefix).
*/
ENTRY(ROOT).flag |= INSTALL;
while(remaining_nodes--) {
node = POP_SMALLEST_BW_ENTRY();
/* Find first installed parent node up in the Radix trie */
parent = FIND_INSTALLED_PARENT(node); /* O(lg(N)) */
keep_entry = FALSE; /* reset the flag for this candidate node */
for (all_ifs) {
bw_overhead[if_id] =
compute_leak_after_aggreg(parent, node, if_id);
if (bw_overhead[if_id] > allowed_leaks[if_id]) {
keep_entry = TRUE;
break;
}
}
if (keep_entry == TRUE)
continue;
ENTRY(node).flag &= ˜INSTALL;
for (all_ifs) {
allowed_leaks[if_id] -=
compute_leak_after_aggreg(parent, node, if_id);
ENTRY(parent).join[if_id] |= ENTRY(node).join[if_id];
}
ENTRY(parent).bw += ENTRY(node).bw;
REINSERT_INTO_SORT_LIST(parent); /* O(lg(N)) or */
} /* O(1) for hash buckets */
}
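The helper compute_leak_after_aggreg() is used above but not defined; a plausible definition, consistent with how the algorithm uses it (this is our reading, and the bw_t/node_t types are assumptions), is:

/* Leak on interface if_id if `node` were merged into `parent`: after
 * the merge, the aggregate forwards both entries' traffic on the union
 * of their oif sets, so whichever entry had not joined if_id has its
 * traffic leaked out that interface. (Assumed definition.) */
bw_t compute_leak_after_aggreg(node_t parent, node_t node, int if_id)
{
    if (ENTRY(parent).join[if_id] && !ENTRY(node).join[if_id])
        return ENTRY(node).bw;   /* node's traffic leaks on parent's oifs */
    if (!ENTRY(parent).join[if_id] && ENTRY(node).join[if_id])
        return ENTRY(parent).bw; /* parent's traffic leaks on node's oif */
    return 0;
}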