
## Re: Updated: draft-zinin-microloop-analysis-01.txt

Subject: Re: Updated: draft-zinin-microloop-analysis-01.txt
From: Stewart Bryant
Date: Fri, 27 May 2005 19:02:12 +0100
```
Alex

Some comments on: draft-zinin-microloop-analysis-01.txt

--------------------------------

2.1 Terminology

   The following terms are used in the draft.

   Downstream neighbor
      Neighbor N of router S is considered S's downstream neighbor
      for destination D, if Dopt(N, D) < Dopt(S, D)

SB> You need to provide a definition of the function Dopt()

--------------------------------

   Primary neighbor
      Neighbor N of router S is considered S's primary neighbor for
      destination D, if N provides the shortest path to D according
      to the SPF calculation.

SB> Need to say something formal like selecting N such that
SB> Dopt(N, D) is minimised.

-------------------------------

   Loop-free neighbor
      Neighbor N of router S is considered S's loop-free neighbor
      for destination D, if Dopt(N, D) < Dopt(N, S) + Dopt(S, D).
      Note that a loop-free neighbor may be, for example, router's
      primary before or after failure.

SB> or -> and/or

--------------------------------

2.2 Next hop safety condition

   We start the analysis with the following observation:

      When router X learns about a topology change and starts using
      neighbor Y as its new primary neighbor for a given
      destination, a microloop between X and Y can only form if the
      topology before failure or topology after failure are such
      that Y uses X as its primary neighbor for the same
      destination.

SB> I don't think that this is quite right. You say that X uses
SB> Y as its new next hop, AND Y uses X as its new next hop. That
SB> would be a failure of the IGP, which is out of scope.

---------------------------------

   Indeed, if the topologies before and after failure are such that
   Y does not use X as its next hop, then there is no moment in time
   before Y learned about the failure or after it learned about it
   when it would forward traffic to X. Hence, at least one of the
   two topologies must be such that Y uses X as its next hop for a
   microloop between X and Y to form.

SB> You need to introduce the symmetry constraint earlier, or to
SB> discuss multi-hop microloops here.

-----------------------------------

   Based on the above, we can define a safety condition for neighbor
   Y of router X that has just learned about a topology change. Note
   that the condition must satisfy the topological criteria above,
   and be non-recursive, i.e. not lead to loops if both X and Y
   follow it.

   Next-hop safety condition:

      For networks with symmetric link costs, after a topology

SB> I think that symmetry needs to go up a level in the doc
SB> structure.

------------------

   For networks with asymmetric link costs, the safety condition is
   modified as follows:

      Y is X's downstream neighbor based on the topology both before
      AND after the change.

SB> I think that it would be clearer if you say that this IS the
SB> new safety criteria. It's not clear at first sight that this
SB> is the complete SC.

------------------

   Routers SHOULD use the symmetric-link safety condition by
   default, MAY attempt to dynamically determine the method that
   needs to be applied based on the topological information from the
   routing

SB> I think that we need to discuss which algorithm should
SB> be the default. Given that many networks that are thought
SB> to be symmetric turn out to be asymmetric, it's not clear
SB> which we should choose and why.

------------------

   Type A
      Routers whose new primary next-hops after the topology change
      are safe and transition to them will not create a microloop.
      Two subtypes are recognized:

      A1: Routers whose primaries haven't changed as a result of the
          topology change

      A2: Routers whose new primary satisfies the safety condition

SB> I know that you get to ECMP later, but you are setting out a
SB> taxonomy here, and I wonder whether you need to say something
SB> about them earlier.
SB> For example you might want to think about these being a property
SB> of the tuple.

---------------------

   It is clear that type-A routers can immediately switch to their
   new primary next hops once they are calculated after the topology
   change.

SB> Strictly type A2, A1 does not switch at all.

-------------------

   Type B: Without an additional delay, the route SHALL be updated
           with one or more temporary next-hops that satisfy the
           safety condition. These temporary next-hops SHALL be used
           for the duration of DELAY_TYPEB. After DELAY_TYPEB, the
           route SHALL be updated with the new primary next-hops.

SB> Should align the language with Type A above.

--------------------

   Type C: The route's old (primary) next-hops SHALL continue to be
           used for DELAY_TYPEC. After DELAY_TYPEC, the route SHALL
           be updated with the new primary next-hops.

   If, after expiration of DELAY_SPF, the router receives a topology
   update sooner than DELAY_STABLE after the previous one, the
   router MUST fall back to the regular convergence mechanisms
   (immediate installation of the new primary next-hops) aborting
   any transition processes initiated as part of procedures
   described here (i.e., if DELAY_TYPEB or DELAY_TYPEC timers are
   still running), MUST recalculate its routing table as soon as
   practical, and MUST refrain from using the mechanisms described
   here until it has seen no topological updates for at least
   DELAY_STABLE.

SB> I think that you can say this formally by saying that
SB> timers DELAY_TYPEB and DELAY_TYPEC must be immediately
SB> expired.

-------------------------

   3) None of the new next-hops satisfy the safety condition,
      however, there's at least one other neighbor that satisfies it
      (a type-B situation)

SB> more formally a combination of type-A and type-B

-------------------------

   For situation 2 (an A/B or an A/C combination), the
   implementation:

   1) SHALL update the route with the new next-hops that satisfy the
      safety condition without an additional delay

   2) SHALL add the remaining new next-hops after DELAY_TYPEB.

SB> maybe clarify type B is NOT used, although, there seems to
SB> be no protocol reason why type B cannot be used.

------------------------

3.3 IP Fast Reroute Considerations

   If the router implements [IPFRR] and performs local failure
   repair, procedures described in this document still need to be
   applied in order to prevent micro-loops while reconverging on the
   new topology.

SB> This is stricter than it should be. Say we implement basic [IPFRR]
SB> AND some other enhanced mechanism. We may wish to use some other
SB> mechanism in place of this.

----------------------

   Another difference is when the router could not repair the
   failure, the new primary next-hops do not satisfy the safety
   condition, and there's no other neighbor that does, i.e. a type-C
   situation. Unlike other routers in the network, the router
   directly connected to the network does not have the old next-hop
   any more, and cannot continue using it. In this situation, the
   router MUST revert to the regular convergence procedures, and
   update the route with the new next-hops with no additional delay.

SB> We need to think about this some more. When we have an imperfect
SB> repair we need to consider the "greater good" and that might
SB> be to control the convergence of the rest of the network.

----------------------

3.4 Architectural Constants

   The following architectural constants have been used in the
   description of the algorithm above:

   DELAY_SPF
      The delay between the moment the router receives a topology
SB> s/a/the first/
      update after a period of stability and the moment it starts
      its routing table recalculation. This delay is necessary to
      collect multiple updates originated by different routers that
      relate to the same topological event.

SB> We might want to more formally state the start/inhibit criteria

   DELAY_STABLE
      Period of time, during which the network topology is
      considered to be stable if the router receives no topological
      updates.
      When the first update after DELAY_STABLE is received, all
      other updates that fit within DELAY_SPF are considered as
      related to a single topological event.

SB> Do we need to say restarted on receipt of ANY topo event.

   DELAY_TYPEB and DELAY_TYPEC
      Periods of time used by the router to delay installation of
      new primary next-hops after a topology change when the router
      has (type-B) or has not (type-C) a safe neighbor to
      temporarily divert the traffic to in the meantime.

   While correctness and effectiveness of the algorithm described
   here do not depend on the actual values assigned to the
   architectural constants, they do depend on the relationship
   between them, and the assumption that all routers in the same
   network use the same values.

   To satisfy these constraints, and yet allow these delays to be
   decreased as implementations continue to improve towards faster
   convergence, this document defines the architectural constants as
   configurable, specifies the required relationship between the
   values, and the default values that should be used by the
   implementations.

SB> I wonder if we need to signal these, for example in the LSP/LSA
SB> I am concerned that there is little chance that all routers
SB> in the network will be correctly configured. The trouble is
SB> that if there is a mis-config it will be very hard to detect.

----------------------------

   The following equations define the relationship between the
   constants that needs to be maintained in order for the mechanism
   described here to provide desirable results:

      DELAY_SPF > update-propagation-time

      DELAY_STABLE > DELAY_TYPEB > DELAY_TYPEC > fault-propagation-time

SB> I deleted some of the text from this edit. Did you say that
SB> FIB updates MUST be completed by expiry of appropriate timer?

------------------------------

   The implementations SHOULD use the following default values for
   the architectural constants:

      Constant          Default val
      ----------------------------------------
      DELAY_SPF         500 msec
      DELAY_TYPEC       2 sec
      DELAY_TYPEB       4 sec
      DELAY_STABLE      10 sec

SB> On what basis? I imagine that you pulled these out of a hat,
SB> and I suspect that they are as good values, but I think
SB> that you need to say that is what you did.

---------------------

4 Coverage analysis

SB> Strictly what follows is not a coverage analysis

   The above algorithm minimizes the probability of loop formation.
   More specifically, loops will only be possible when two
   neighboring routers both experience the type C condition after
   the topology change. Appendix A shows that transitions between
   A-A, A-B, A-C, and B-C routers are loop-free.

   While this mechanism does not remove all possible micro-loops, it
   addresses the majority of them in topologies with a reasonable
   level of physical redundancy.

   Topologically, micro-loop coverage provided by this algorithm is

SB> Missing text

5 Security Considerations

   The mechanism described in this document does not modify any
   routing protocol messages, and hence no new threats related to
   packet modifications or replay attacks are introduced. The
   mechanism changes certain delays used in node-local algorithms
   and introduces partial event ordering after a topology change has
   occurred. This, however, does not introduce new security risks.

   For type-B situations, traffic to certain destinations can be
   temporarily routed via next-hop routers that would not be used
   with the same topology change if this mechanism wasn't employed.
   However, these next-hop routers can be used anyway when a
   different topological change occurs, and hence this can't be
   viewed as a new security threat.

SB> Isn't there a threat in which some vulnerable link is failed
SB> causing extended C-C outage?

Appendix A. Loop formation analysis

SB> NP is a dangerous term to use in an analysis :)

SB> I will look at the Appendix another time.

- Stewart

_______________________________________________
Rtgwg mailing list
[email protected]
https://www1.ietf.org/mailman/listinfo/rtgwg
```
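
The neighbor definitions quoted from section 2.1, and the asymmetric-cost safety condition from section 2.2, can be sketched in a few lines of Python. This is only an illustration under one loud assumption: the quoted text does not define `Dopt()`, so the sketch takes `Dopt(A, B)` to mean the shortest-path cost from A to B. The graph layout, node names, and function names are all hypothetical.

```python
import heapq

INF = float("inf")


def dopt(graph, src, dst):
    """Shortest-path cost from src to dst (ASSUMED meaning of Dopt).

    graph maps each node to {neighbor: link_cost}; costs may be asymmetric.
    """
    dist = {src: 0}
    heap = [(0, src)]
    while heap:
        d, node = heapq.heappop(heap)
        if node == dst:
            return d
        if d > dist.get(node, INF):
            continue  # stale heap entry
        for nbr, cost in graph.get(node, {}).items():
            nd = d + cost
            if nd < dist.get(nbr, INF):
                dist[nbr] = nd
                heapq.heappush(heap, (nd, nbr))
    return INF


def is_downstream(graph, n, s, d):
    """N is S's downstream neighbor for D: Dopt(N, D) < Dopt(S, D)."""
    return dopt(graph, n, d) < dopt(graph, s, d)


def is_loop_free(graph, n, s, d):
    """N is S's loop-free neighbor for D: Dopt(N, D) < Dopt(N, S) + Dopt(S, D)."""
    return dopt(graph, n, d) < dopt(graph, n, s) + dopt(graph, s, d)


def is_safe_asymmetric(before, after, y, x, d):
    """Asymmetric-cost condition: Y is X's downstream neighbor both
    before AND after the topology change."""
    return is_downstream(before, y, x, d) and is_downstream(after, y, x, d)


# Hypothetical four-node topology with symmetric costs.
BEFORE = {
    "S": {"N": 1, "M": 1},
    "N": {"S": 1, "D": 1},
    "M": {"S": 1, "D": 2},
    "D": {"N": 1, "M": 2},
}
# Same topology after the M-D link fails.
AFTER = {
    "S": {"N": 1, "M": 1},
    "N": {"S": 1, "D": 1},
    "M": {"S": 1},
    "D": {"N": 1},
}
```

In this topology M is a loop-free neighbor of S for D (2 < 1 + 2) but not a downstream neighbor (2 is not less than 2), which shows that the downstream condition is the stricter of the two.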
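
The type A/B/C taxonomy and the timer relationships quoted from section 3.4 can also be written down as a rough sketch. It assumes a single next hop per route (ECMP is ignored), takes the safety test as a caller-supplied predicate, and uses the default values quoted in the review. The class, function, and label names are hypothetical, and the check only covers one router's local configuration; it does nothing about the cross-router mismatch that the review worries about.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Timers:
    """Architectural constants in seconds (defaults as quoted from the draft)."""
    delay_spf: float = 0.5
    delay_typec: float = 2.0
    delay_typeb: float = 4.0
    delay_stable: float = 10.0

    def violations(self, update_propagation_time, fault_propagation_time):
        """Return the required relationships that this configuration breaks.

        Both propagation times are network-specific inputs (assumed known).
        """
        checks = [
            (self.delay_spf > update_propagation_time,
             "DELAY_SPF > update-propagation-time"),
            (self.delay_stable > self.delay_typeb,
             "DELAY_STABLE > DELAY_TYPEB"),
            (self.delay_typeb > self.delay_typec,
             "DELAY_TYPEB > DELAY_TYPEC"),
            (self.delay_typec > fault_propagation_time,
             "DELAY_TYPEC > fault-propagation-time"),
        ]
        return [name for ok, name in checks if not ok]


def classify(old_primary, new_primary, neighbors, is_safe):
    """Classify a route as A1, A2, B or C (single next hop, no ECMP)."""
    if new_primary == old_primary:
        return "A1"  # primary unchanged, nothing to switch
    if is_safe(new_primary):
        return "A2"  # new primary satisfies the safety condition
    if any(is_safe(n) for n in neighbors if n != new_primary):
        return "B"   # some other neighbor is safe
    return "C"       # no safe neighbor at all


def schedule(route_type, timers):
    """Return (next_hop_kind, delay_seconds) installation steps."""
    if route_type == "A1":
        return []
    if route_type == "A2":
        return [("new-primary", 0.0)]
    if route_type == "B":
        return [("temporary-safe", 0.0), ("new-primary", timers.delay_typeb)]
    return [("old-primary", 0.0), ("new-primary", timers.delay_typec)]
```

For example, `Timers(delay_typeb=1.5).violations(0.2, 1.0)` reports that `DELAY_TYPEB > DELAY_TYPEC` no longer holds, which is the kind of local misconfiguration a router could at least detect by itself.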