Subject: Re: Updated: draft-zinin-microloop-analysis-01.txt
From: Stewart Bryant
Date: Fri, 27 May 2005 19:02:12 +0100

Some comments on: draft-zinin-microloop-analysis-01.txt


2.1 Terminology

   The following terms are used in the draft.

     Downstream neighbor
          Neighbor N of router S is considered S's downstream neighbor
          for destination D, if Dopt(N, D) < Dopt(S, D)

SB> You need to provide a definition of the function Dopt()


     Primary neighbor
          Neighbor N of router S is considered S's primary neighbor for
          destination D, if N provides the shortest path to D according
          to the SPF calculation.

SB> Need to say something formal like selecting N such that
SB> Dopt(N, D) is minimised.


     Loop-free neighbor
          Neighbor N of router S is considered S's loop-free neighbor
          for destination D, if Dopt(N, D) < Dopt(N, S) + Dopt(S, D).
          Note that a loop-free neighbor may be, for example, router's
          primary before or after failure.
SB> or -> and/or
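
The three definitions above can be made concrete with a small sketch. It assumes that Dopt(A, B) means the cost of the shortest path from A to B, which the draft excerpt does not state. The names dopt, is_downstream, is_loop_free, and primary_neighbors, and the example topology, are illustrative only. The primary-neighbor test minimises link cost plus Dopt(N, D), which is an assumption about what the SPF calculation yields rather than wording taken from the draft.

```python
import heapq

INF = float("inf")

def dopt(graph, src, dst):
    """Assumed meaning of Dopt(src, dst): shortest-path cost from src to dst.

    graph[u][v] is the cost of the directed link u -> v (non-negative).
    """
    dist = {src: 0}
    heap = [(0, src)]
    while heap:
        d, u = heapq.heappop(heap)
        if u == dst:
            return d
        if d > dist.get(u, INF):
            continue  # stale heap entry
        for v, cost in graph.get(u, {}).items():
            nd = d + cost
            if nd < dist.get(v, INF):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return INF  # dst is unreachable

def is_downstream(graph, s, n, d):
    # Downstream neighbor: Dopt(N, D) < Dopt(S, D)
    return dopt(graph, n, d) < dopt(graph, s, d)

def is_loop_free(graph, s, n, d):
    # Loop-free neighbor: Dopt(N, D) < Dopt(N, S) + Dopt(S, D)
    return dopt(graph, n, d) < dopt(graph, n, s) + dopt(graph, s, d)

def primary_neighbors(graph, s, d):
    # Assumption: a primary neighbor N satisfies cost(S, N) + Dopt(N, D) == Dopt(S, D)
    best = dopt(graph, s, d)
    if best == INF:
        return []
    return sorted(n for n, c in graph[s].items() if c + dopt(graph, n, d) == best)

# Hypothetical four-router topology with symmetric link costs.
graph = {
    "S": {"A": 1, "B": 1},
    "A": {"S": 1, "D": 1},
    "B": {"S": 1, "D": 3},
    "D": {"A": 1, "B": 3},
}
```

In this topology A is both a downstream and a loop-free neighbor of S for destination D, while B is neither, because B's best path to D ties with the path back through S.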


 2.2 Next hop safety condition

   We start the analysis with the following observation:

     When router X learns about a topology change and starts using
     neighbor Y as its new primary neighbor for a given destination, a
     microloop between X and Y can only form if the topology before
     failure or topology after failure are such that Y uses X as its
     primary neighbor for the same destination.

SB> I don't think that this is quite right. You say that X uses
SB> Y as its new next hop, AND Y uses X as its new next hop. That
SB> would be a failure of the IGP, which is out of scope.


     Indeed, if the topologies before and after failure are such that Y
     does not use X as it's next hop, then there is no moment in time
     before Y learned about the failure or after it learned about it
     when it would forward traffic to X. Hence, at least one of the two
     topologies must be such that Y uses X as its next hop for a
     microloop between X and Y to form.

SB> You need to introduce the symmetry constraint earlier, or to
SB> discuss multi-hop microloops here.


   Based on the above, we can define a safety condition for neighbor Y
   of router X that has just learned about a topology change. Note that
   the condition must satisfy the topological criteria above, and be
   non-recursive, i.e. not lead to loops if both X and Y follow it.

     Next-hop safety condition:

          For networks with symmetric link costs, after a topology
SB> I think that symmetry needs to go up a level in the doc
SB> structure.


   For networks with asymmetric link costs, the safety condition is mod-
   ified as follows:

          Y is X's downstream neighbor based on the topology both before
          AND after the change.

SB> I think that it would be clearer if you say that this IS the
SB> new safety criterion. It's not clear at first sight that this
SB> is the complete SC.
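
Read as a complete criterion, the asymmetric-cost condition quoted above is a conjunction of two downstream tests, one per topology. The sketch below illustrates that reading only. It assumes Dopt is the shortest-path cost, and the helper names (symmetric, is_safe_asymmetric) and the example topologies are hypothetical. The symmetric-cost condition is not sketched because its text is cut off in the quote.

```python
import heapq

INF = float("inf")

def dopt(graph, src, dst):
    """Assumed Dopt: shortest-path cost from src to dst over graph[u][v] costs."""
    dist = {src: 0}
    heap = [(0, src)]
    while heap:
        d, u = heapq.heappop(heap)
        if u == dst:
            return d
        if d > dist.get(u, INF):
            continue  # stale heap entry
        for v, cost in graph.get(u, {}).items():
            if d + cost < dist.get(v, INF):
                dist[v] = d + cost
                heapq.heappush(heap, (d + cost, v))
    return INF

def symmetric(edges):
    """Build a graph from (u, v, cost) triples; symmetry is for brevity only."""
    graph = {}
    for u, v, cost in edges:
        graph.setdefault(u, {})[v] = cost
        graph.setdefault(v, {})[u] = cost
    return graph

def is_downstream(graph, x, y, dest):
    return dopt(graph, y, dest) < dopt(graph, x, dest)

def is_safe_asymmetric(before, after, x, y, dest):
    # Y must be X's downstream neighbor before AND after the change.
    return is_downstream(before, x, y, dest) and is_downstream(after, x, y, dest)

# Case 1: link X-D fails; Y routed via X before the failure, so Y is not safe.
before1 = symmetric([("X", "D", 1), ("X", "Y", 1), ("Y", "D", 3)])
after1 = symmetric([("X", "Y", 1), ("Y", "D", 3)])

# Case 2: link W-D fails; Y is closer to D than X in both topologies.
before2 = symmetric([("X", "W", 1), ("W", "D", 1), ("X", "Y", 1), ("Y", "D", 1)])
after2 = symmetric([("X", "W", 1), ("X", "Y", 1), ("Y", "D", 1)])
```

In case 1 Y is downstream only after the failure, so the conjunction rejects it; in case 2 Y passes both tests.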


   Routers SHOULD use the symmetric-link safety condition by default,
   MAY attempt to dynamically determine the method that needs to be
   applied based on the topological information from the routing

SB> I think that we need to discuss which algorithm should
SB> be the default. Given that many networks that are thought
SB> to be symmetric turn out to be asymmetric, it's not clear
SB> which we should choose and why.


     Type A

          Routers whose new primary next-hops after the topology change
          are safe and transition to them will not create a microloop.
          Two subtypes are recognized:

          A1:  Routers whose primaries haven't changed as a result of
               the topology change

          A2:  Routers whose new primary satisfies the safety condition

SB> I know that you get to ECMP later, but you are setting out a
SB> taxonomy here, and I wonder whether you need to say something
SB> about them earlier.
SB> For example you might want to think about these being a property
SB> of the <link, neighbour, Dest> tuple.


   It is clear that type-A routers can immediately switch to their new
   primary next hops once they are calculated after the topology change.

SB> Strictly type A2, A1 does not switch at all.


     Type B:
          Without an additional delay, the route SHALL be updated with
          one or more temporary next-hops that satisfy the safety condi-
          tion. These temporary next-hops SHALL be used for the duration
          of DELAY_TYPEB. After DELAY_TYPEB, the route SHALL be updated
          with the new primary next-hops.

SB> Should align the language with Type A above.


     Type C:
          The route's old (primary) next-hops SHALL continue to be used
          for DELAY_TYPEC.  After DELAY_TYPEC, the route SHALL be
          updated with the new primary next-hops.

   If, after expiration of DELAY_SPF, the router receives a topology
   update sooner than DELAY_STABLE after the previous one, the router
   MUST fall back to the regular convergence mechanisms (immediate
   installation of the new primary next-hops) aborting any transition
   processes initiated as part of procedures described here (i.e., if
   DELAY_TYPEB or DELAY_TYPEC timers are still running), MUST recalcu-
   late its routing table as soon as practical, and MUST refrain from
   using the mechanisms described here until it has seen no topological
   updates for at least DELAY_STABLE.

SB> I think that you can say this formally by saying that
SB> timers DELAY_TYPEB and DELAY_TYPEC must be immediately
SB> expired.
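
The timer behaviour for the three route types, together with the "immediately expire the timers" reading of the fallback rule, can be sketched as a function of elapsed time. This is an illustration under assumptions: the function name and arguments are hypothetical, elapsed time is measured from the moment the route is recalculated, and the timer values are the defaults quoted from the draft's table of architectural constants.

```python
DELAY_TYPEB = 4.0  # seconds, default quoted from the draft
DELAY_TYPEC = 2.0  # seconds, default quoted from the draft

def installed_next_hops(route_type, elapsed, old, temporary, new, aborted=False):
    """Return the next-hops in use `elapsed` seconds after recalculation.

    route_type: "A", "B" or "C", as classified in the draft.
    aborted:    True if a further topology update arrived within DELAY_STABLE,
                modelled here as the running timer expiring immediately.
    """
    if aborted or route_type == "A":
        return new  # regular convergence: install new primaries at once
    if route_type == "B":
        # Safe temporary next-hops for DELAY_TYPEB, then the new primaries.
        return temporary if elapsed < DELAY_TYPEB else new
    if route_type == "C":
        # Old primaries for DELAY_TYPEC, then the new primaries.
        return old if elapsed < DELAY_TYPEC else new
    raise ValueError(f"unknown route type: {route_type!r}")
```

For example, a type-B route one second after recalculation still uses its temporary next-hops, unless the transition has been aborted, in which case the new primaries are already installed.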


     3)   None of the new next-hops satisfy the safety condition, how-
          ever, there's at least one other neighbor that satisfies it (a
          type-B situation)

SB> more formally a combination of type-A and type-B


   For situation 2 (an A/B or an A/C combination), the implementation:

     1)   SHALL update the route with the new next-hops that satisfy the
          safety condition without an additional delay

     2)   SHALL add the remaining new next-hops after DELAY_TYPEB.

SB> Maybe clarify that type B is NOT used, although there seems to
SB> be no protocol reason why type B cannot be used.


3.3 IP Fast Reroute Considerations

   If the router implements [IPFRR] and performs local failure repair,
   procedures describes in this document still need to be applied in
   order to prevent micro-loops while reconverging on the new topology.

SB> This is stricter than it should be. Say we implement basic [IPFRR]
SB> AND some other enhanced mechanism. We may wish to use some other
SB> mechanism in place of this.


   Another difference is when the router could not repair the failure,
   the new primary next-hops do not satisfy the safety condition, and
   there's no other neighbor that does, i.e. a type-C situation. Unlike
   other routers in the network, the router directly connected to the
   network does not have the old next-hop any more, and cannot continue
   using it. In this situation, the router MUST revert to the regular
   convergence procedures, and update the route with the new next-hops
   with no additional delay.

SB> We need to think about this some more. When we have an imperfect
SB> repair we need to consider the "greater good" and that might
SB> be to control the convergence of the rest of the network.


3.4 Architectural Constants

   The following architectural constants have been used in the descrip-
   tion of the algorithm above:

          The delay between the moment the router receives a topology
SB> s/a/the first/
          update after a period of stability and the moment it starts
          its routing table recalculation.  This delay is necessary to
          collect multiple updates originated by different routers that
          relate to the same topological event.

SB> We might want to more formally state the start/inhibit criteria

          Period of time, during which the network topology is consid-
          ered to be stable if the router receives no topological
          updates. When the first update after DELAY_STABLE is received,
          all other updates that fit within DELAY_SPF are considered as
          related to a single topological event.

SB> Do we need to say that it is restarted on receipt of ANY topology event?

          Periods of time used by the router to delay installation of
          new primary next-hops after a topology change when the router
          has (type-B) or has not (type-C) a safe neighbor to temporary
          divert the traffic to in the meantime.

   While correctness and effectiveness of the algorithm described here
   does not depend on the actual values assigned to the architectural
   constants, it does depend on the relationship between them, and the
   assumption that all routers in the same network use the same values.

   To satisfy these constrains, and yet allow these delays to be
   decreased as implementations continue to improve towards faster con-
   vergence, this document defines the architectural constants as con-
   figurable, specifies the required relationship between the values,
   and the default values that should be used by the implementations.

SB> I wonder if we need to signal these, for example in the LSP/LSA.
SB> I am concerned that there is little chance that all routers
SB> in the network will be correctly configured. The trouble is
SB> that if there is a mis-config it will be very hard to detect.


   The following equations define the relationship between the constants
   that needs to be maintained in order for the mechanism described here
   to provide desireable results:

    DELAY_SPF > update-propagation-time

    DELAY_STABLE > DELAY_TYPEB > DELAY_TYPEC > fault-propagation-time

SB> I deleted some of the text from this edit. Did you say that
SB> FIB updates MUST be completed by expiry of the appropriate timer?


   The implementations SHOULD use the following default values for the
   architectural constants:

        Constant                   Default val
        DELAY_SPF                   500 msec
        DELAY_TYPEC                   2 sec
        DELAY_TYPEB                   4 sec
        DELAY_STABLE                 10 sec

SB> On what basis? I imagine that you pulled these out of a hat,
SB> and I suspect that they are as good as any, but I think
SB> that you need to say that is what you did.
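
Because a mis-configured router would be hard to detect, a local consistency check of the constants against the two inequalities quoted above may be useful. The sketch below is an illustration only: the Constants class and the violations function are hypothetical names, and update_propagation_time and fault_propagation_time are assumed to be operator-supplied estimates in seconds, since the draft excerpt gives no values for them.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Constants:
    # Defaults quoted from the draft's table, in seconds.
    delay_spf: float = 0.5
    delay_typec: float = 2.0
    delay_typeb: float = 4.0
    delay_stable: float = 10.0

def violations(c, update_propagation_time, fault_propagation_time):
    """Return a list of the required relationships that do not hold."""
    checks = [
        ("DELAY_SPF > update-propagation-time",
         c.delay_spf > update_propagation_time),
        ("DELAY_STABLE > DELAY_TYPEB", c.delay_stable > c.delay_typeb),
        ("DELAY_TYPEB > DELAY_TYPEC", c.delay_typeb > c.delay_typec),
        ("DELAY_TYPEC > fault-propagation-time",
         c.delay_typec > fault_propagation_time),
    ]
    return [name for name, ok in checks if not ok]
```

With the quoted defaults and assumed propagation estimates of 0.1 s and 1 s, the list is empty; lowering DELAY_TYPEB to 1 s reports the DELAY_TYPEB > DELAY_TYPEC relationship as broken. A check of this kind only covers one router's own configuration and does not address the concern that all routers must agree on the values.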


4 Coverage analysis

SB> Strictly what follows is not a coverage analysis

   The above algorithm minimizes the probability of loop formation. More
   specifically, loops will only be possible when two neighboring
   routers both experience the type C condition after the topology
   change. Appendix A shows that transitions between A-A, A-B, A-C, and
   B-C routers are loop-free.

   While this mechanism does not remove all possible micro-loops, it
   addresses the majority of them in topologies with a reasonable level
   of physical redundancy.  Topologically, micro-loop coverage provided
   by this algorithm is

SB> Missing text

5 Security Considerations

   The mechanism described in this document does not modify any routing
   protocol messages, and hence no new threats related to packet modifi-
   cations or replay attacks are introduced. The mechanism changes cer-
   tain delays used in node-local algorithms and introduces partial
   event ordering after a topology change has occured. This, however,
   does not introduce new security risks. For type-B situations, traffic
   to certain destinations can be temporarily routed via next-hop
   routers that would not be used with the same topology change if this
   mechanism wasn't employed. However, these next-hop routers can be
   used anyway when a different topological change occurs, and hence
   this can't be viewed as a new security threat.

SB> Isn't there a threat in which some vulnerable link is failed
SB> causing extended C-C outage?

Appendix A. Loop formation analysis

SB> NP is a dangerous term to use in an analysis :)
SB> I will look at the Appendix another time.

- Stewart

Rtgwg mailing list