USC Computer Science Technical Reports, no. 805 (2003)
BGP Dynamics during Route Flap Damping

Beichuan Zhang (bzhang@isi.edu), Daniel Massey (masseyd@isi.edu), Lixia Zhang (lixia@cs.ucla.edu)

Abstract: The BGP routing protocol uses a mechanism called Route Flap Damping [13] to limit the impact of connectivity instability at any individual site. Although this damping mechanism is believed to contribute to the stability of global Internet routing, its exact effects have not been thoroughly examined in a large-scale network setting. Previous work [8] has shown that damping can be falsely triggered by BGP's path exploration and can significantly extend the routing convergence time after even a single route flap. In this paper we examine the impact of damping under a range of connectivity flapping patterns and different damping parameters. Our results show that damping can confine global routing dynamics to a predictable analytical model when connectivity to a destination flaps persistently. However, when the number of flaps is small, the global routing behavior deviates from the intended analytical model and damping leads to higher dynamics, as measured by both message overhead and network convergence time. Such dynamics are largely shaped by the interaction between route reuse timers at different routers: route suppression and reuse at one router can affect the number of routing updates received by other routers and, in turn, their damping behavior. We show how this reuse timer interaction, when combined with BGP path exploration, can lead to a staged behavior of routing updates consisting of charging, suppression, releasing, and possibly additional rounds of secondary charging. We also examine the effects of flapping interval, damping parameters, and network topology on both message overhead and network convergence.

I. INTRODUCTION

As the de-facto global routing protocol for the Internet, BGP is used by thousands of Autonomous Systems (ASes) to exchange routing information for hundreds of thousands of destinations (represented by IP address prefixes). In such a large-scale distributed system, faults are bound to occur, and BGP routes dynamically adapt to changes in network topology and routing policy. However, a single unstable BGP route can result in thousands of update messages being propagated throughout the global Internet, increasing both router CPU load and link bandwidth consumption [6]. To help limit the impact of unstable routes, BGP employs two mechanisms to constrain the behavior of route changes. On the time scale of seconds, an MRAI timer at each sender enforces a minimum time interval between two successive update messages. On a longer time scale, Route Flap Damping [13] uses an adaptive timer at the receiving end to constrain routing updates of unstable routes.

It is believed that route flap damping has played an essential role in stabilizing the Internet routing infrastructure [3]. However, a systematic in-depth study of its effectiveness on large-scale networks is largely missing, and the actual routing behavior under damping is not well understood.

In route flap damping, a BGP router maintains a penalty value for each prefix advertised by each peer and suppresses unstable routes based on this penalty value. Whenever the peer announces a route change (e.g., replaces the existing route with a new one, withdraws a route, or re-announces a route after a withdrawal), the penalty is increased according to the type of the change. When the penalty value for a prefix exceeds a preset threshold, the route from the associated peer can no longer be selected as the best path to the prefix. Although further updates may be received from the suppressed peer, these updates are not propagated further. The penalty value decays exponentially over time.
When it drops to a predefined reuse threshold, the route will be considered for best path selection again. Throughout this paper, we use damping as an abbreviation for "route flap damping" to refer to the whole mechanism, and route suppression to refer to the specific action of ceasing to use a route. A more detailed description of damping and its background is given in Section II.

This paper presents a systematic study of BGP convergence delay and message overhead under various damping settings, including the number of route flaps, the flapping interval, and different sets of damping parameters. Although BGP's path exploration process is the driving force behind the routing dynamics, the specific pattern of dynamics is largely shaped by a previously unknown interaction among reuse timers at different routers. Our results show that the entire damping process consists of three distinct periods, each with its own characteristics. We provide a mathematical prediction of damping behavior that matches the simulation results well when the number of flaps is high. However, when the number of flaps is low, damping behaves contrary to its intended design. Route suppression and reuse at one router may affect the number of updates received by other routers and, in turn, their damping behavior. As a result, damping causes both higher convergence time and higher message overhead when the number of flaps is low. We analyze this interaction in detail and examine the impact of flapping interval, damping parameters, and network topology. Our results also show that, contrary to intuition, certain damping parameter tunings recommended by RIPE-229 [10] have negative effects on both convergence time and message overhead. These findings provide new insights into the damping mechanism.

The remainder of the paper is organized as follows. Section II describes damping in more detail and Section III describes the simulation methodology and setup.
Section IV presents the simulation results on damping behavior and the impact of various factors. Section V reviews related work. We conclude in Section VI with discussion and future work.

Fig. 1. A BGP router's RIBs

II. BACKGROUND ON DAMPING

BGP route flapping can be caused by various faults in hardware, software, and operations. For example, route flapping may be triggered by a faulty link that intermittently fails and then recovers. In another example, the TCP connections between BGP peers may have been disrupted by the high-volume data traffic during the recent Internet worm attacks. If the BGP route to a prefix relies on the faulty link or the unstable peering session, the route will flap as the connectivity changes, and the goal of damping is to limit the global impact of such unstable routes. The damping algorithm (described below) was first implemented in 1995 and then documented in RFC 2439 in 1998 [13]. It is supported in commercial products from all major vendors. Damping is thought to be widely deployed and to help stabilize the Internet routing infrastructure [3].

Conceptually, a BGP router stores routes received from its peers in RIB-INs, picks the best route from among all the RIB-INs, stores this best route in Local-RIB, and puts updates to be sent in RIB-OUT (Fig. 1). Damping associates a penalty value with each entry in a RIB-IN. In other words, there is a penalty value associated with each peer and prefix pair. Whenever a peer sends an update for the prefix, the RIB-IN entry is updated and its corresponding penalty is increased. Different types of updates (route change, route withdrawal, re-announcement, etc.) cause different penalty increments. If the penalty exceeds a cut-off threshold, the RIB-IN entry is suppressed and can no longer be used when selecting the best route. The penalty value also decays exponentially, and a suppressed route will be reused when the penalty drops below a reuse threshold.
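The per-entry bookkeeping described above can be sketched as follows. This is a minimal illustration, not a vendor or RFC 2439 implementation: the class `DampingState`, its method names, and the update kinds are hypothetical, the numeric values are the Cisco defaults listed in Table I, and the maximum hold-down time is deliberately left out.

```python
import math
from dataclasses import dataclass

# Assumed values: Cisco defaults from Table I (half-life converted to seconds).
HALF_LIFE = 15 * 60.0
LAMBDA = math.log(2) / HALF_LIFE
PENALTY = {"withdraw": 1000.0, "reannounce": 0.0, "attr_change": 500.0}
CUTOFF = 2000.0
REUSE = 750.0


@dataclass
class DampingState:
    """Hypothetical damping record for one (peer, prefix) RIB-IN entry."""
    penalty: float = 0.0
    last_time: float = 0.0
    suppressed: bool = False

    def _decay(self, now: float) -> None:
        # Exponential decay of the penalty since the last time it was touched.
        self.penalty *= math.exp(-LAMBDA * (now - self.last_time))
        self.last_time = now

    def on_update(self, now: float, kind: str) -> bool:
        """Apply one update from the peer; return True if the entry is suppressed."""
        self._decay(now)
        self.penalty += PENALTY[kind]
        if self.penalty > CUTOFF:
            self.suppressed = True
        return self.suppressed

    def reuse_delay(self) -> float:
        """Seconds after last_time until the penalty decays to the reuse threshold."""
        if self.penalty <= REUSE:
            return 0.0
        return math.log(self.penalty / REUSE) / LAMBDA

    def on_timer(self, now: float) -> bool:
        """Reuse-timer expiry: lift suppression once the penalty is at or below REUSE."""
        self._decay(now)
        if self.suppressed and self.penalty <= REUSE:
            self.suppressed = False
        return self.suppressed
```

In this sketch, an update received while the entry is suppressed still raises the penalty, so a later call to `reuse_delay()` returns a longer delay, which mirrors the reuse timer being reset on further route changes.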
More formally, if the penalty is $p(t_0)$ at time $t_0$ and $p(t)$ at time $t$, then

$$p(t) = p(t_0)\, e^{-\lambda (t - t_0)}$$

where $\lambda$ is often configured through the half-life $H = \ln 2 / \lambda$.

A router usually sets a reuse timer based on the current penalty value and reuses the route when the reuse timer expires. During route suppression, if more route changes are received, the RIB-IN and the penalty value are updated accordingly, and the reuse timer is reset based on the new penalty value. However, changes to the suppressed route do not enter Local-RIB or any RIB-OUT, and thus do not propagate further. There is also a maximum hold-down time, which is the upper limit on how long a route can be suppressed after it becomes stable. It is often implemented as an equivalent maximum penalty value. Table I lists the default damping parameters from two major vendors, and Fig. 2 illustrates the penalty value changing over time.

TABLE I: DEFAULT DAMPING PARAMETERS

Damping Parameter                       Cisco    Juniper
Withdrawal Penalty ($P_W$)              1000     1000
Re-announcement Penalty ($P_{RA}$)      0        1000
Attributes Change Penalty ($P_A$)       500      500
Cut-off Threshold ($P_{cut}$)           2000     3000
Half Life ($H$), minutes                15       15
Reuse Threshold ($P_{reuse}$)           750      750
Max Hold-down Time, minutes             60       60

Fig. 2. Damping Penalty ($I_{down} = I_{up} = 60$s, Cisco). Plot of penalty versus time (seconds), with the cut-off threshold and reuse threshold marked.

The operator community has long recognized that inconsistent damping settings in the network may lead to connectivity problems that are difficult to diagnose. RIPE-229 [10] recommends a set of damping parameters for router configuration. As early as 1998, Panigl [10] observed from his operational experience that one route withdrawal and one route re-announcement in Europe triggered route suppression in North America. The exact cause of this behavior was not fully explained until 2002, when Mao et al.
[8] discovered that, after a single route change, BGP path exploration can falsely trigger route suppression at remote places. Such unexpected damping phenomena can happen in a topology as small as a 5-node clique. However, our study shows that, although BGP's path exploration process is the driving force behind the routing dynamics, the specific pattern of dynamics is largely shaped by a previously unknown interaction among reuse timers at different routers.

One of our observations that plays an important role in this work is that there are two types of route reuse timer expiration events:
• Noisy expiration, which triggers some routing updates. This is because the route being reused becomes the best route and changes the Local-RIB and RIB-OUT.
• Silent expiration, which does not trigger any routing update. This is because the route being reused is not the best route and makes no change to Local-RIB or RIB-OUT.

III. SIMULATION METHODOLOGY

We use two types of network topologies in the simulation: mesh and Internet-like. A mesh topology is a 2-dimensional grid in which nodes at opposite edges are connected (Fig. 3(a)). All nodes in a mesh are topologically equal. An Internet-like topology [1] is derived from the Internet inter-AS connectivity graph and has a long-tailed distribution of node degree. Using the mesh topology enables us to vary network size and node degree independently, while using the Internet-like topology gives more confidence in the applicability of the results to the real Internet. We have tested mesh topologies ranging from 36 nodes to 900 nodes with degree 4 to 12, and Internet-like topologies ranging from 29 nodes to 830 nodes. For ease of presentation, in this paper we show the results from a 100-node mesh topology with a node degree of 4 unless otherwise stated. The impact of network size and node degree is discussed in Section IV-F.

Fig. 3. Sample topology and flapping interval: (a) 3 x 3 mesh, degree = 4; (b) flapping interval.
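As an illustration of the degree-4 mesh described above, the sketch below builds a 2-dimensional grid whose opposite edges are connected. The function names `build_mesh` and `attach_origin` are hypothetical and are not part of the simulator used in the paper; how meshes with degree higher than 4 were constructed is not specified here, so only the degree-4 case is covered.

```python
from typing import Dict, Hashable, Set, Tuple

Node = Tuple[int, int]


def build_mesh(rows: int, cols: int) -> Dict[Hashable, Set[Hashable]]:
    """Adjacency sets for a rows x cols grid with wrap-around edges (degree 4)."""
    adj: Dict[Hashable, Set[Hashable]] = {
        (r, c): set() for r in range(rows) for c in range(cols)
    }
    for r in range(rows):
        for c in range(cols):
            # Link each node to its right and lower neighbor, wrapping at the edges.
            for dr, dc in ((0, 1), (1, 0)):
                nb: Node = ((r + dr) % rows, (c + dc) % cols)
                adj[(r, c)].add(nb)
                adj[nb].add((r, c))
    return adj


def attach_origin(adj: Dict[Hashable, Set[Hashable]], host: Hashable) -> None:
    """Attach an extra 'originAS' node to the chosen hostAS node."""
    adj["originAS"] = {host}
    adj[host].add("originAS")


mesh = build_mesh(10, 10)
link_count = sum(len(neighbors) for neighbors in mesh.values()) // 2
attach_origin(mesh, (0, 0))
```

A 10 x 10 mesh built this way has 100 nodes, each with four neighbors, and 200 links before the originAS is attached.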
Given a network topology, an additional node called originAS is attached to an existing node called hostAS in the network (Fig. 3(a)). Before the simulation starts, every node has a stable route to the originAS. During the simulation, the originAS sends route withdrawals and announcements alternately to the hostAS to simulate route flapping. A withdrawal together with the announcement that follows it is called a pulse. After a certain number of pulses, the originAS stops flapping, and its final update is a route announcement. The choice of hostAS makes no difference in mesh topologies. In Internet-like topologies, it affects the results quantitatively, but the dynamics patterns are similar.

We use $I_{down}$ to denote the time interval between a withdrawal and its following announcement, $I_{up}$ to denote the time interval between an announcement and its following withdrawal, and $I$ ($= I_{down} + I_{up}$) to denote the time interval between two consecutive withdrawals or two consecutive announcements (Fig. 3(b)). In this paper, only fixed-rate flapping patterns are used, i.e., $I_{down}$ and $I_{up}$ are constant during each simulation run. The default values are $I_{down} = I_{up} = 60$ seconds. The impact of flapping interval is discussed in Section IV-D.

Two BGP performance metrics, convergence time and message overhead, are used to quantify the dynamics. The convergence time is defined as the time from when the originAS stops flapping (i.e., sends its final route announcement) to when the last BGP update message is observed in the network. The message overhead is the total number of update messages observed in the network starting from the first flap.

We model BGP as a Simple Path Vector Protocol (SPVP) [11], and conduct simulations in SSFNet [12] with our improved damping implementation. We use the default 30 seconds for the MRAI timer, 2ms for link delay, 100ms for processing delay, and the Cisco default damping parameters.
Larger link delay (e.g., 20ms) and smaller processing delay (e.g., 10ms) give similar results. The impact of damping parameters is discussed in Section IV-E. We assume that damping is enabled at all nodes and that the same damping parameters are used throughout the network. The impact of partial deployment and inconsistent parameters is part of our future work.

Fig. 4. Convergence Time. Convergence time (seconds) versus number of pulses; curves: No Damping (simulation, mesh), Full Damping (simulation, mesh), Full Damping (simulation, Internet), Full Damping (calculation).

Fig. 5. Message Overhead. Number of updates versus number of pulses; curves: No Damping (simulation, mesh), Full Damping (simulation, mesh), Full Damping (simulation, Internet).

IV. BGP DYNAMICS DURING DAMPING

Fig. 4 and Fig. 5 show the BGP convergence time and message overhead as the number of pulses, $n$, increases. The shapes of these curves are typical of most simulations, including both mesh and Internet-like topologies. In this section, we explain these curves in detail, reveal the underlying causes, and study the impact of various factors.

A. Route Flapping without Damping

We first study the case in which damping is turned off at all nodes, in order to understand the effect of route flapping alone. Labovitz et al. [5] categorize BGP convergence events into four types: $T_{down}$ (a previously available route is withdrawn), $T_{up}$ (a previously unavailable route is announced as available), $T_{long}$ (a route is replaced by another with a longer AS path), and $T_{short}$ (a route is replaced by another with a shorter AS path). Our flapping pattern is a series of alternating $T_{down}$ and $T_{up}$ events. Table II compares $T_{down}$ and $T_{up}$ against a single pulse. Confirming results from previous work, BGP's path exploration causes long convergence time and a large number of update messages after $T_{down}$.
For $T_{up}$ and flapping, the convergence time is the time for the final route announcement to propagate through the entire network. In the case of pure $T_{up}$, since the network has been quiet before the event, the MRAI timer does not apply, and the only limiting factors are link delay and processing delay. During route flapping, however, $T_{up}$ happens before the routing updates caused by the previous $T_{down}$ have finished. Therefore, the MRAI timer takes effect and limits the propagation rate, resulting in longer convergence time. This convergence time is mainly determined by the network topology and the MRAI timer, regardless of the number of flaps. This is why we see an almost flat line in Fig. 4.

To see why the message overhead of flapping increases linearly with the number of pulses $n$, we plot the update series for $n = 4$ (Fig. 6) and $n = 5$ (Fig. 7), showing the number of update messages observed in the network within each second. Each graph is composed of withdrawal-induced spikes every $I = 120$ seconds (i.e., the interval between two withdrawals) and small fluctuations clustered around every MRAI interval (i.e., 30 seconds). Comparing the two graphs, we can see that each 120-second period exhibits a similar pattern, implying that each pulse independently contributes a similar number of updates to the total message count.

Fig. 6. Update Series (no damping, $n = 4$). Number of updates versus time (seconds).

Fig. 7. Update Series (no damping, $n = 5$). Number of updates versus time (seconds).

TABLE II: FLAPPING VS. $T_{down}$ AND $T_{up}$

                      $T_{down}$    $T_{up}$    Flapping ($n = 1$)
Convergence Time      735 s         1 s         139 s
Message Overhead      12469         302         2424

B. The Intended Behavior of Damping

According to the damping algorithm, the routing dynamics can be described by a simple analytical model.
When damping is enabled, persistent flaps from the originAS will trigger the hostAS to suppress its route to the originAS. After the flapping stops, it takes time $R$ for the penalty value $p$ to drop below the reuse threshold $P_{reuse}$ before the hostAS can send out the route announcement. The announcement triggers a $T_{up}$ event, which takes time $CT_{up}$ for the network to converge. Since usually $R \gg CT_{up}$, the total convergence time is

$$CT = R + CT_{up} \approx R, \qquad R = \frac{1}{\lambda} \ln \frac{p}{P_{reuse}}$$

Let $w(i)$ be the time between the $i$th flap and the $(i-1)$th flap, and $f(i)$ be the penalty increment caused by the $i$th flap, for $i = 1, 2, \ldots, k-1, k$, with $w(1) = 0$. Right after the $k$th flap, the penalty value $p(k)$ will be

$$p(k) = p(k-1)\, e^{-\lambda w(k)} + f(k) = \sum_{i=1}^{k-1} \left[ f(i)\, e^{-\lambda \sum_{j=i+1}^{k} w(j)} \right] + f(k)$$

In our simulation, one pulse consists of two flaps: a withdrawal and the following re-announcement. Assuming the penalty increases by $P_W$ for each withdrawal and by $P_{RA}$ for each re-announcement, our fixed-rate flapping pattern can be defined as

$$w(i) = \begin{cases} I_{down} & i = 2m \\ I_{up} & i = 2m - 1 \end{cases} \qquad f(i) = \begin{cases} P_W & i = 2m - 1 \\ P_{RA} & i = 2m \end{cases}$$

where $m = 1, 2, \ldots$ and $w(1) = 0$. Given these functions, we can derive the penalty value right after the $n$th pulse:

$$p = z \sum_{i=1}^{n} e^{-\lambda I (i-1)} = \frac{1 - e^{-\lambda I n}}{1 - e^{-\lambda I}}\, z, \qquad z = P_W\, e^{-\lambda I_{down}} + P_{RA}$$

Using the Cisco default parameters (Table I) in the above equations, we calculate the convergence time and plot it in Fig. 4. When the number of pulses is $n = 1$ or $n = 2$, route suppression is not triggered (Fig. 2) and the convergence time should be the same as that without damping. When $n \geq 3$, route suppression is triggered and the convergence time should go up. This is the convergence time intended by the damping algorithm, and it is the price that damping is willing to pay for routing stability. It is determined only by the flapping pattern and damping parameters, $w(i)$ and $f(i)$, at the hostAS, regardless of anything else in the network.
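The analytical model above can be checked numerically with a short sketch. It assumes the Cisco defaults from Table I and the default $I_{down} = I_{up} = 60$ s; the function names are illustrative only, the maximum hold-down time is ignored, and the hostAS is treated as suppressing the route once any flap pushes the penalty above the cut-off.

```python
import math

# Assumed values: Cisco defaults from Table I (half-life in seconds).
H = 15 * 60.0
LAM = math.log(2) / H
P_W, P_RA = 1000.0, 0.0
P_CUT, P_REUSE = 2000.0, 750.0


def penalty_trace(n, i_down=60.0, i_up=60.0):
    """Penalty right after each of the 2n flaps: p(k) = p(k-1) e^{-lam w(k)} + f(k)."""
    p, trace = 0.0, []
    for k in range(1, 2 * n + 1):
        if k == 1:
            w = 0.0                      # w(1) = 0 by definition
        elif k % 2 == 0:
            w = i_down                   # even flap: re-announcement after I_down
        else:
            w = i_up                     # odd flap: withdrawal after I_up
        f = P_W if k % 2 == 1 else P_RA  # withdrawals are the odd-numbered flaps
        p = p * math.exp(-LAM * w) + f
        trace.append(p)
    return trace


def penalty_closed_form(n, i_down=60.0, i_up=60.0):
    """Closed-form penalty right after the n-th pulse."""
    interval = i_down + i_up
    z = P_W * math.exp(-LAM * i_down) + P_RA
    return z * (1 - math.exp(-LAM * interval * n)) / (1 - math.exp(-LAM * interval))


def intended_reuse_delay(n, i_down=60.0, i_up=60.0):
    """R = (1/lam) ln(p / P_reuse) if the cut-off was ever exceeded, else 0."""
    trace = penalty_trace(n, i_down, i_up)
    if max(trace) <= P_CUT:
        return 0.0
    return math.log(trace[-1] / P_REUSE) / LAM


delays = {n: intended_reuse_delay(n) for n in range(1, 11)}
```

Under these assumptions the recurrence and the closed form agree, the delay is zero for $n = 1$ and $n = 2$, and it is roughly 1620 seconds for $n = 3$, growing with each additional pulse.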
The message overhead intended by the damping algorithm generally cannot be obtained analytically, since it depends on the network topology and the timing of updates. Nevertheless, it is expected to be almost constant when $n \geq 3$, consisting of the messages sent before the hostAS suppresses the route and the messages sent after the hostAS reuses the route, because pulses after the first two will be suppressed and will not be able to cause any update in the network.

C. Route Flap Damping

1) The Basic Dynamics Pattern: In Fig. 4 and Fig. 5, simulations on both the mesh topology (100 nodes) and the Internet-like topology (110 nodes) exhibit the same curve shape: the simulation results match the calculated values very well when the number of pulses is large, but are substantially higher when the number of pulses is small. We call the simulation result conformal when it matches the intended value, and non-conformal when it does not. The turning point of the curve, $N_h$, is defined as the number of pulses such that the curve is conformal when $n \geq N_h$ and non-conformal when $n < N_h$. The existence of $N_h$ is the basic dynamics pattern we have observed in simulations with a wide range of flapping intervals, topologies, and damping parameters. In this subsection, we examine the underlying reason for this pattern.

2) Reuse Timer Interaction: Mao et al. [8] studied the case of $n = 1$ in full-mesh topologies. They showed convincingly that even when there is only one pulse, some routers in the network may receive multiple update messages due to the BGP path exploration caused by the route withdrawal. Therefore, route suppression may happen when $n = 1$, not at the hostAS, but somewhere remote in the network. The network will not converge until the route is reused, and this lengthens the convergence time. However, even after taking path exploration into account, the convergence time at $n = 1$, more than 3000 seconds in Fig. 4, is still too high.
Since the penalty decays exponentially, suppressing a route for longer than 3000 seconds requires a very high penalty value, that is, a large number of updates between two peers, which path exploration is unlikely to generate within $I_{down} = 60$s. Besides, path exploration alone cannot explain the curve for $n \geq 2$ either. There must be another force affecting the dynamics. We discovered that the missing factor is reuse timer interaction, a previously unknown interaction during damping.

To explain the reuse timer interaction, we plot the update series and damped link count for $n = 1, 3, 5$ (Fig. 8). The update series shows the number of update messages observed in the network within every 5 seconds; the damped link count shows the total number of links being suppressed in the network. Since there are 200 links in our 100-node mesh topology, and each link can be suppressed by either end, the upper bound on the damped link count is 400. We now discuss the cases of $n = 1, 2, 3, 4, 5$ one by one.

Fig. 9. Damping Penalty on Link 46-47 ($n = 1$). Plot of penalty versus time (seconds), with the cut-off threshold and reuse threshold marked.

$n = 1$: Fig. 8(a)(d) clearly show that there are three distinct time periods during the convergence process.
• Charging period, from the beginning of the simulation to the 120th second, when a large number of messages are being sent and the damped link count increases rapidly, indicating that path exploration is happening. The result is that many links are suppressed and more than half of the nodes lose their routes to the originAS.
• Suppression period, from the 120th second to the 1574th second, when the network is quiet, with no update messages and no change in the damped link count.
• Releasing period, from the 1574th second to the 5147th second¹, when previously suppressed routes are being reused as their reuse timers expire.
Whereas all route suppressions happen within a relatively short charging period, the release of all reuse timers takes a long time. The releasing period accounts for about 70% of the total convergence time and 30% of the total message count. This is the part that path exploration cannot explain. Further examination of the simulation data shows that it is the interaction among reuse timers that stretches the releasing period. At first, reuse timers expire quickly, as shown by the rapid drop in damped link count between the 1574th and the 2000th second. However, updates generated by noisy reuse timer expirations arrive at other nodes and increase their damping penalties. As a result, some reuse timers that have not yet expired are postponed by these updates. We call this kind of interaction the secondary charging effect. Sometimes this effect not only postpones existing reuse timers, but also causes new route suppressions.

We pick a long-lasting reuse timer and plot its penalty over time in Fig. 9. After the charging period, the penalty decays smoothly to below the reuse threshold. But soon it is pushed back above the cut-off threshold by the secondary charging effect. Before the route is eventually reused, the penalty is pushed back another three times. Secondary charging may happen more than once, causing some reuse timers to be postponed again and again. This effect stretches the releasing period and lengthens the convergence time.

$n = 2$: When the second pulse comes, more reuse timers are set in the charging period due to the extra routing updates. During the releasing period, the secondary charging effect is able to affect more nodes and makes the convergence time longer.

$n = 3$: With our simulation settings, the third pulse triggers the hostAS to suppress its route to the originAS. We use $RT_h$ to denote this special reuse timer. There are two interesting observations in Fig. 8(b) and (e) about $RT_h$.
One is that during the first part of the releasing period, i.e., between the 1575th and the 1927th second, many reuse timers that previously had noisy expirations now expire silently. This is due to the muffling effect caused by $RT_h$.

¹ Timers that expire after the 5147th second are silent and do not contribute to either convergence time or message overhead.

Fig. 8. Update Series and Damped Link Count (Full Damping). Panels (a), (b), (c): number of updates versus time (seconds) for $n = 1$, $n = 3$, $n = 5$. Panels (d), (e), (f): number of links being suppressed versus time (seconds) for $n = 1$, $n = 3$, $n = 5$. Annotations: C (charging), S (suppression), R (releasing) in (a) and (d); M (muffling), SC (strong secondary charging) in (b) and (e).

When the hostAS suppresses its route to the originAS upon receiving the third pulse, it has no alternate route and has to send a withdrawal to its other peers. In other words, damping does not stop withdrawals when the prefix becomes unreachable. For the same reason, this withdrawal propagates through the entire network, and eventually no node has a route to the prefix.
Since all future route announcements from the originAS are suppressed by the hostAS, this no-route situation does not change until $RT_h$ expires. Therefore, any reuse timer expiration before $RT_h$ becomes silent.

The other observation is the powerful secondary charging caused by $RT_h$. The hostAS reuses its route at the 1927th second, which restores the route at some nodes and removes the muffling effect. When some other reuse timers expire shortly after the 2000th second, both the message count and the damped link count surge to a high level. The impact is so powerful that a new plateau is formed in Fig. 8(e), which means that many new reuse timers are set. Since all other routers rely on the hostAS to reach the originAS, the hostAS's route reuse can potentially affect all routers and cause more powerful secondary charging.

These two types of reuse timer interaction compete against each other: the secondary charging effect stretches the convergence time, while the muffling effect reduces the convergence time by having reuse timers expire silently. The net result in this simulation run is a somewhat shorter convergence time at $n = 3$ than at $n = 2$.

$n = 4$: When the originAS flaps more, the additional flaps are experienced only by the hostAS. Therefore, the only effect of flaps beyond $n = 3$ is to postpone $RT_h$, while the reuse timers in the rest of the network stay the same. A larger $RT_h$ is able to muffle more reuse timer expirations and shorten the convergence time further.

$n = 5$: In Fig. 8(c)(f), $RT_h$ has been postponed for so long that it becomes the last timer to expire. At the time of its expiration, all other reuse timers have expired silently due to the muffling effect, leaving no secondary charging effect but a single $T_{up}$ event. From this point on ($n \geq 5$), the convergence time is solely determined by when $RT_h$ fires, which is exactly the intended behavior of the damping algorithm, and the curve becomes conformal. Comparing the graphs of damped link count in Fig. 8, (f) shows when all the reuse timers are scheduled to expire after path exploration, (d) shows that the secondary charging effect stretches the expiration time, and (e) shows that the expiration time is restored to the originally scheduled time due to the muffling effect.

Denoting the last reuse timer to fire in the network by $RT_{net}$, the curve's turning point, $N_h$, is reached when

$$RT_h > RT_{net}$$

The same reasoning can be applied to explain the results on message overhead (Fig. 5). When $n < 3$, the message overhead increases due to path exploration and secondary charging. After $n = 3$, the muffling effect reduces the message overhead, until $n = 5$, when $RT_h > RT_{net}$.

Fig. 10. Impact of Flapping Intervals. Panels (a) $I_{down}$, (b) $I_{up}$, (c) Long Interval: convergence time (seconds) versus number of pulses. Panels (d) $I_{down}$, (e) $I_{up}$, (f) Long Interval: number of updates versus number of pulses. Curves in (a) and (d): $I_{down}$ = 30s, 60s, 90s with $I_{up}$ = 60s; in (b) and (e): $I_{up}$ = 30s, 60s, 90s with $I_{down}$ = 60s; in (c) and (f): $I_{down} = I_{up}$ = 500s, 1000s, 3000s.
From that point on, the message overhead becomes almost constant because it is the sum of two relatively constant parts: the message count during the charging period, and the message count caused by the last $T_{up}$ event.

3) Summary: The convergence process during flapping and damping consists of three periods: charging, suppression, and releasing. The charging period contributes the major portion of the total message overhead, while the releasing period contributes the major portion of the convergence time. There are two types of reuse timer interaction competing against each other: the secondary charging effect stretches the convergence time, while the muffling effect reduces the number of noisy timer expirations. When the number of pulses reaches the turning point ($N_h$), the reuse timer at the hostAS ($RT_h$) outlasts the last reuse timer in the network ($RT_{net}$), making the muffling effect dominant and bringing the convergence time and message overhead into conformance with the intended values.

D. Flapping Intervals

Although the flapping interval $I_{down} = I_{up} = 60$s is chosen rather arbitrarily, the insight obtained from the simulation helps in understanding the dynamics caused by other interval values.

1) $I < I_s$: One important condition for the basic dynamics pattern is the existence of $RT_h$. According to the damping algorithm, when the flapping interval is fixed, the penalty value eventually reaches an upper limit, at which the penalty increment caused by one flap equals the decay since the last flap. This penalty limit decreases as the flapping interval increases, and when it is less than the cut-off threshold $P_{cut}$, route suppression is never triggered at the hostAS and $RT_h$ is never set. By solving

$$P_{cut} = p(k) = p(k-1)\, e^{-\lambda w(k)} + f(k) = p(k-1)$$

we can obtain the maximum interval $I_s$ for $RT_h$ to be set; with our parameters, $I_s = 900$s.

Fig. 10 (a)(b)(d)(e) show the impact of flapping intervals from 30 seconds to 90 seconds.
Results for 10 seconds and 180 seconds are not shown but are similar to Fig. 10. There are two observations. First, the convergence time and message overhead at $n = 1$ are determined by $I_{down}$ only. A longer $I_{down}$ lets path exploration proceed further, resulting in longer convergence time and more message overhead. Second, when $n \geq 5$, the convergence time is mainly determined by $I = I_{down} + I_{up}$, as more frequent flaps cause longer convergence time. For data points in between, the exact values depend on the subtle timing of message transmission and various BGP timers, but all curves follow the same basic shape. Overall, when $I \leq I_s$, different flapping intervals change data points quantitatively, but the basic dynamics pattern holds.

[Fig. 11. Update Series and Damped Link Count (Flapping Interval). Panels (a)-(c) plot number of updates and panels (d)-(f) plot number of links being suppressed, both against time (seconds). (a), (d): $I_{down} = I_{up} = 500s$, $n = 3$. (b), (e): $I_{down} = I_{up} = 500s$, $n = 7$. (c), (f): $I_{down} = I_{up} = 1000s$, $n = 5$.]

2) $I_s < I < RT^{\emptyset}_h$: Fig. 10 (c)(f) show the results for $I_{down} = I_{up} = 500s$, so $I = 1000s > I_s = 900s$. Even though it is impossible for $RT_h$ to be set, the basic shape still holds, with the turning point $N_h = 7$.
This seemingly contradictory result can also be explained by path exploration and reuse timer interaction. Although hostAS will not suppress its route to originAS, some other nodes will, because they receive more frequent updates due to path exploration. In fact, in this particular simulation run, route suppression is triggered at nodes two or more hops away from the hostAS. Therefore, the network has two areas: nodes close to hostAS without route suppression, and nodes farther from hostAS with possible route suppression. At the boundary of these two areas, there is a set of reuse timers $RT_{h1}$, $RT_{h2}$, etc., which are collectively denoted by $RT^{\emptyset}_h$. After $RT^{\emptyset}_h$ is set, additional flaps will increase $RT^{\emptyset}_h$, but will not be able to affect any other reuse timer in the network. Therefore, it is this $RT^{\emptyset}_h$ that functions similarly to the original $RT_h$ (e.g., muffling, strong secondary charging) and makes convergence time and message overhead eventually conformal.

This is better explained with Fig. 11 (a)(d)(b)(e), which show the update series and damped link count. When $n = 3$, the muffling effect is apparent during the early part of the releasing period, as most reuse timers expire silently. By the 2521st second, the remaining 16 reuse timers are all at the boundary and belong to $RT^{\emptyset}_h$. Since $RT^{\emptyset}_h$ comprises different timers with different expiration times, its release spans a long time period. In this simulation run, it starts at the 3279th second and lasts for 204 seconds. These multiple $T_{up}$ events, issued over a relatively long time period, make the MRAI timer less effective in aggregating updates, causing more update messages in the network, and many reuse timers are set again as a result. Therefore, although $RT^{\emptyset}_h$ is the last to expire, it fails to make the curve conformal.

As the originAS flaps more, additional pulses help synchronize the different timers in $RT^{\emptyset}_h$.
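This synchronization can be illustrated numerically. The sketch below rests on the standard exponential-decay relation: a penalty p decays to a reuse threshold after (1/λ) ln(p / threshold) seconds, so the gap between two timers that share the same threshold and decay rate is (1/λ) ln(p1 / p2). As a simplification it ignores decay between pulses and adds an identical increment to both penalties per pulse. The starting penalties, the increment, and the 15-minute half-life are assumed values chosen for illustration only, and the function names are hypothetical.

```python
import math


def reuse_timer_gap(p1, p2, half_life_s):
    """Gap between two reuse timers whose penalties decay at the same rate."""
    lam = math.log(2) / half_life_s
    return math.log(p1 / p2) / lam


def gaps_after_pulses(p1, p2, increment, pulses, half_life_s):
    """Add the same increment to both penalties per pulse; record each gap.

    Simplification: decay between pulses is ignored.
    """
    gaps = []
    for _ in range(pulses):
        p1 += increment
        p2 += increment
        gaps.append(reuse_timer_gap(p1, p2, half_life_s))
    return gaps


# Assumed illustrative values, not taken from the simulation.
gaps = gaps_after_pulses(p1=3000.0, p2=2200.0, increment=1500.0,
                         pulses=7, half_life_s=900.0)
for n, gap in enumerate(gaps, start=1):
    print(f"after pulse {n}: timer gap {gap:.1f} s")
```

Because both penalties grow by the same amount, their ratio moves toward 1 and the gap shrinks with every pulse, which is the qualitative behaviour the text describes for the boundary timers.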
The difference between two reuse timers, $RT_{h1}$ and $RT_{h2}$, is determined by their penalty values $p_{h1}$ and $p_{h2}$ as

$$RT_{h1} - RT_{h2} = \frac{1}{\lambda} \ln \frac{p_{h1}}{p_{h2}}$$

As shown in Fig. 6 and Fig. 7, each additional pulse causes a similar number of update messages, which adds a similar penalty increment to both $p_{h1}$ and $p_{h2}$. Therefore, regardless of their initial values, as the originAS flaps more, $p_{h1}/p_{h2}$ approaches 1 and $(RT_{h1} - RT_{h2})$ approaches 0. At $n = 7$, the expiration time period of $RT^{\emptyset}_h$ has shrunk to 15 seconds, indicated by a much narrower peak starting at the 7185th second in Fig. 11 (b), and this is short enough in our topology to have a negligible secondary charging effect. From this point on ($n \geq 7$), the convergence time and message overhead become conformal.

3) $RT^{\emptyset}_h < I < T$: In this case, the $RT^{\emptyset}_h$ timer expires before the next pulse comes. Therefore, $RT^{\emptyset}_h$ will not be able to contain flaps within the boundary, and its expiration time will not be able to accumulate to a large value outlasting other timers in the network. Since $RT^{\emptyset}_h$ is no longer effective, convergence time and message overhead will not become conformal and there is no turning point for the curve, which is clearly shown in Fig. 10 (c)(f) when $I_{down} = I_{up} = 1000s$. From its update series and damped link count (Fig.
11 (c)(f)), we can see that each pulse is relatively independent and has a similar impact on both the number of updates and the damped link count, and that its charging period overlaps with the releasing period of the previous pulse. After the flapping stops, there is no muffling effect to reduce reuse timer interaction during the last releasing period, and the convergence time remains at a high level.

[Fig. 12. Impact of Damping Parameters. Panels (a)-(c) plot convergence time (seconds) and panels (d)-(f) plot number of updates, both against number of pulses. (a), (d): Delayed Route Suppression (Default Setting, Until 4th, High-Cutoff). (b), (e): Accelerated Route Reuse (Default Setting, High-Reuse, Half-Life 10min, Hold-Down 30min). (c), (f): RIPE (Default Setting, RIPE-21, RIPE-22-23, RIPE-24).]

4) $T < I$: For a single pulse, suppose the total time of its charging period, suppression period, and releasing period is $T$. When $T < I$, the previous releasing period and the following charging period are completely separated, which means the network has converged before the next pulse comes, and each pulse acts independently of the others.
Since we start counting the convergence time from the last flap, i.e., the final announcement, and the time between the final announcement and its preceding withdrawal is $I_{down}$, the convergence time becomes $(T - I_{down})$, regardless of the number of pulses. Fig. 10 (c)(f) show a special case where $I_{down} = I_{up} = 3000s$. Note that $T$, as read from Fig. 8, is less than 3000 seconds. Therefore the convergence time is always that of a pure $T_{up}$, which is 1 second according to Table II.

5) Summary: In a network of only two nodes, damping takes effect when the flapping is frequent enough to reach the cut-off threshold; otherwise it does not. However, the behavior is far more complicated in large networks. When $I < I_s$, $RT_h$ will be set and the basic dynamics pattern holds; when $I_s$
Description
Beichuan Zhang, Daniel Massey, Lixia Zhang. "BGP dynamics during route flap damping." Computer Science Technical Reports (Los Angeles, California, USA: University of Southern California. Department of Computer Science) no. 805 (2003).
Asset Metadata
Creator
Massey, Daniel (author), Zhang, Beichuan (author), Zhang, Lixia (author)
Core Title
USC Computer Science Technical Reports, no. 805 (2003)
Alternative Title
BGP dynamics during route flap damping (title)
Publisher
Department of Computer Science,USC Viterbi School of Engineering, University of Southern California, 3650 McClintock Avenue, Los Angeles, California, 90089, USA
(publisher)
Tag
OAI-PMH Harvest
Format
12 pages
(extent),
technical reports
(aat)
Language
English
Unique identifier
UC16269737
Identifier
03-805 BGP Dynamics during Route Flap Damping (filename)
Legacy Identifier
usc-cstr-03-805
Rights
Department of Computer Science (University of Southern California) and the author(s).
Internet Media Type
application/pdf
Copyright
In copyright - Non-commercial use permitted (https://rightsstatements.org/vocab/InC-NC/1.0/)
Source
20180426-rozan-cstechreports-shoaf
(batch),
Computer Science Technical Report Archive
(collection),
University of Southern California. Department of Computer Science. Technical Reports
(series)
Access Conditions
The author(s) retain rights to their work according to U.S. copyright law. Electronic access is being provided by the USC Libraries, but does not grant the reader permission to use the work if the desired use is covered by copyright. It is the author, as rights holder, who must provide use permission if such use is covered by copyright.
Repository Name
USC Viterbi School of Engineering Department of Computer Science
Repository Location
Department of Computer Science. USC Viterbi School of Engineering. Los Angeles\, CA\, 90089
Repository Email
csdept@usc.edu
Inherited Values
Title
Computer Science Technical Report Archive
Description
Archive of computer science technical reports published by the USC Department of Computer Science from 1991 - 2017.
Coverage Temporal
1991/2017