Subject: Re: Updated: draft-zinin-microloop-analysis-01.txt
From: mike shand
Date: Fri, 03 Jun 2005 10:56:34 +0100
At 16:43 31/05/2005 -0400, Alia Atlas wrote:
Stewart & Alex,

At 02:02 PM 5/27/2005, Stewart Bryant wrote:
     Primary neighbor
          Neighbor N of router S is considered S's primary neighbor for
          destination D, if N provides the shortest path to D according
          to the SPF calculation.

SB> Need to say something formal like selecting N such that
SB> Dopt(N,D) is minimised.
AA> There's always the possibility that a potential primary neighbor isn't
AA> selected. Consider the case where a router only selects up to 4
AA> equal-cost paths, and there are 5 or more. The definition should
AA> handle this case as well.
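
To make the distinction between a potential primary neighbor and a selected one concrete, here is a minimal Python sketch. It assumes a topology given as a dict of link costs, and the function names, the `max_paths` parameter, and the alphabetical tie-break are all hypothetical choices for illustration, not anything taken from the draft.

```python
import heapq

def shortest_distances(graph, source):
    """Dijkstra over {node: {neighbor: cost}}; returns {node: distance}."""
    dist = {source: 0}
    heap = [(0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue  # stale heap entry
        for v, cost in graph.get(u, {}).items():
            nd = d + cost
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist

def potential_primary_neighbors(graph, s, d):
    """Neighbors N of S with cost(S,N) + Dopt(N,D) == Dopt(S,D)."""
    dist_s = shortest_distances(graph, s)
    if d not in dist_s:
        return []
    result = []
    for n, cost in sorted(graph[s].items()):
        dist_n = shortest_distances(graph, n)
        if d in dist_n and cost + dist_n[d] == dist_s[d]:
            result.append(n)
    return result

def selected_primary_neighbors(graph, s, d, max_paths=4):
    """Assumed ECMP cap: keep only the first max_paths candidates."""
    return potential_primary_neighbors(graph, s, d)[:max_paths]

# Five equal-cost paths from S to D, one through each of N1..N5.
graph = {"S": {f"N{i}": 1 for i in range(1, 6)}, "D": {}}
for i in range(1, 6):
    graph[f"N{i}"] = {"D": 1}

potential = potential_primary_neighbors(graph, "S", "D")
selected = selected_primary_neighbors(graph, "S", "D")
```

With five equal-cost paths and a cap of four, one neighbor is a potential primary neighbor but is not selected, which is the case the definition would need to cover.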

 2.2 Next hop safety condition

   We start the analysis with the following observation:

     When router X learns about a topology change and starts using
     neighbor Y as its new primary neighbor for a given destination, a
     microloop between X and Y can only form if the topology before
     failure or topology after failure are such that Y uses X as its
     primary neighbor for the same destination.

SB> I don't think that this is quite right. You say that X uses
SB> Y as its new next hop, AND Y uses X as its new next hop. That
SB> would be a failure of the IGP, which is out of scope.
AA> Perhaps a better way to phrase it is:

AA> "... a microloop between X and Y can only form if the topology before the failure is such that Y used
AA> X as its primary neighbor for the same destination."

AA> And then need to clarify the opposite case as well - where the roles of X and Y are reversed. I think
AA> you were trying to - but I agree that it isn't clear.
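
As an illustration of the rephrased condition, the following Python sketch flags the router pairs where a two-node microloop is possible for a single destination. It assumes the old and new primary neighbors are already available as dicts of sets; the function name and data layout are hypothetical, and it only covers the two-node case discussed here.

```python
def possible_two_node_microloops(old_next_hops, new_next_hops):
    """Return pairs (X, Y) where X's new primary neighbor is Y while
    Y's old primary neighbor was X, for one fixed destination.

    Both arguments map a router to the set of its primary neighbors.
    Iterating over every router covers the case with the roles reversed.
    """
    pairs = set()
    for x, new_nhs in new_next_hops.items():
        for y in new_nhs:
            if x in old_next_hops.get(y, set()):
                pairs.add((x, y))
    return pairs

# Destination D. Before the failure: Y -> X -> D and Z -> D.
# The X-D link fails; afterwards: X -> Y -> Z -> D.
old = {"X": {"D"}, "Y": {"X"}, "Z": {"D"}}
new = {"X": {"Y"}, "Y": {"Z"}, "Z": {"D"}}

loops = possible_two_node_microloops(old, new)
```

Here X starts forwarding to Y while Y may still be forwarding to X, so (X, Y) is the only pair reported.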

   Routers SHOULD use the symmetric-link safety condition by default,
   MAY attempt to dynamically determine the method that needs to be
   applied based on the topological information from the routing

SB> I think that we need to discuss which algorithm should
SB> be the default. Given that many networks that are thought
SB> to be symmetric turn out to be asymmetric, it's not clear
SB> which we should choose and why.
AA> How many of the symmetric networks that actually turn out to be
AA> asymmetric have multi-hop loops in them? Couldn't this be something
AA> that was flagged by a MIB - to indicate that the "symmetric" network
AA> isn't really. Surely this is something that the network operators
AA> would want to know so that it can be corrected??
Yes, I wonder how much of a problem this really is. Given that the
algorithm doesn't prevent all loops anyway, a small increment in the
number of loops caused by incorrectly handled asymmetric cost cases doesn't
seem to be much of a price to pay, especially since using the stronger
condition to handle them correctly will result in the overall coverage
being less, i.e. the total number of loops may get WORSE by using the
asymmetric cost fixing algorithm.
I know.... something for me to simulate :-)

AA> Another related question is how does PLSN work with max-cost links? How
AA> should it work? Is it acceptable to use a max-cost link to reach a safe
AA> neighbor that isn't a potential primary neighbor on either the old or
AA> new topology? That seems potentially bad to me, since it could cause
AA> additional traffic loss, depending on why the link was set to max-cost.
I think a max-cost link should be treated as unreachable, since that is
probably why it was set to max cost.
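
A rough Python sketch of that suggestion: drop neighbors reached over a max-cost link before looking for a safe neighbor. The constant `MAX_COST`, its value, and the function name are assumptions for illustration only; the actual maximum metric depends on the IGP in use.

```python
# Assumed placeholder for "max cost", loosely modeled on a 16-bit link metric.
MAX_COST = 0xFFFF

def usable_neighbors(link_costs, max_cost=MAX_COST):
    """Return only the neighbors whose link cost is below max_cost.

    link_costs maps neighbor name -> cost of the directly connected link.
    Links at max cost are treated as unreachable and filtered out.
    """
    return {n: c for n, c in link_costs.items() if c < max_cost}

candidates = usable_neighbors({"A": 10, "B": MAX_COST, "C": 25})
```

Only A and C would then be considered as safe-neighbor candidates.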


3.3 IP Fast Reroute Considerations

   If the router implements [IPFRR] and performs local failure repair,
   procedures described in this document still need to be applied in
   order to prevent micro-loops while reconverging on the new topology.

SB> This is stricter than it should be. Say we implement basic [IPFRR]
SB> AND some other enhanced mechanism. We may wish to use some other
SB> mechanism in place of this.
AA> I think that the intention should be to say that PLSN is useful to
AA> avoid micro-loops during re-convergence and this benefit is not
AA> provided simply by using basic [IPFRR] or another repair mechanism.
AA> Both a repair mechanism and a convergence control mechanism are
AA> desirable. I do think it would be useful to specify the
AA> risks/undesirability of using PLSN without a repair mechanism when
AA> the topology change includes failures.

AA> I do agree with Stewart that the phrasing should consider the possibility of future techniques being
AA> introduced.

   Another difference is when the router could not repair the failure,
   the new primary next-hops do not satisfy the safety condition, and
   there's no other neighbor that does, i.e. a type-C situation. Unlike
   other routers in the network, the router directly connected to the
   network does not have the old next-hop any more, and cannot continue
   using it. In this situation, the router MUST revert to the regular
   convergence procedures, and update the route with the new next-hops
   with no additional delay.

SB> We need to think about this some more. When we have an imperfect
SB> repair we need to consider the "greater good" and that might
SB> be to control the convergence of the rest of the network.
AA> Given that no other router in the network is aware that the router (S)
AA> doesn't have an alternate, I'm not sure what better option can exist.
AA> I think that the convergence of the rest of the network is being
AA> controlled. The micro-loops related to S are not being handled. The
AA> worst-case that I see is that S uses a neighbor N where that neighbor
AA> is type-B and is using S as its safe neighbor. I do agree that we need
AA> to think about this more.

3.4 Architectural Constants

   The following architectural constants have been used in the
   description of the algorithm above:

          The delay between the moment the router receives a topology
SB> s/a/the first/
          update after a period of stability and the moment it starts
          its routing table recalculation.  This delay is necessary to
          collect multiple updates originated by different routers that
          relate to the same topological event.

SB> We might want to more formally state the start/inhibit criteria
AA> I agree.

          Periods of time used by the router to delay installation of
          new primary next-hops after a topology change when the router
          has (type-B) or has not (type-C) a safe neighbor to temporarily
          divert the traffic to in the meantime.

   While correctness and effectiveness of the algorithm described here
   do not depend on the actual values assigned to the architectural
   constants, they do depend on the relationship between them, and the
   assumption that all routers in the same network use the same values.

   To satisfy these constraints, and yet allow these delays to be
   decreased as implementations continue to improve towards faster
   convergence, this document defines the architectural constants as
   configurable, specifies the required relationship between the values,
   and the default values that should be used by the implementations.

SB> I wonder if we need to signal these, for example in the LSP/LSA
SB> I am concerned that there is little chance that all routers
SB> in the network will be correctly configured. The trouble is
SB> that if there is a mis-config it will be very hard to detect.
AA> What would be done by the routers with this additional information?
AA> Why isn't this a management problem? These values could be in (yet
AA> another) MIB & then the values of the routers could be compared. I
AA> don't like the idea of adding signaling to check for inconsistency -
AA> when all the router could do on detecting this would be a log or, I
AA> guess, disabling the functionality in the case of mismatches.
I think I agree with Alia here. While at first sight it seems that there
might be something you could do with advertising these things in the
protocol, life can get very complicated when you start considering what
happens when various routers and/or regions of the network come and go.
There is a very real danger that an "automated" dynamic synchronization
scheme would result in more errors than a manual static one.
Simply using an advertisement in the protocol to give a warning that some
static misconfiguration has been made (as I think Stewart was suggesting)
is more workable, but seems like a poor use of the protocol, especially
since (as Alia points out) the information should be available for a
management application to check anyway.
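
The kind of check a management application could run might look like the Python sketch below. It assumes the configured values have already been collected per router (for instance from a MIB); the constant names `spf_delay`, `type_b_delay` and `type_c_delay` are hypothetical placeholders, not names from the draft.

```python
from collections import defaultdict

def find_mismatches(configs):
    """Report constants that do not have the same value on every router.

    configs maps router name -> {constant name: configured value}.
    Returns {constant name: {value: [routers using that value]}} for
    each constant that has more than one distinct value.
    """
    values = defaultdict(lambda: defaultdict(list))
    for router, constants in sorted(configs.items()):
        for name, value in constants.items():
            values[name][value].append(router)
    return {
        name: dict(by_value)
        for name, by_value in values.items()
        if len(by_value) > 1
    }

# Hypothetical collected values, in milliseconds.
configs = {
    "R1": {"spf_delay": 100, "type_b_delay": 500, "type_c_delay": 1000},
    "R2": {"spf_delay": 100, "type_b_delay": 500, "type_c_delay": 1000},
    "R3": {"spf_delay": 200, "type_b_delay": 500, "type_c_delay": 1000},
}

mismatches = find_mismatches(configs)
```

In this example only `spf_delay` is reported, with R3 as the odd one out. A fuller check would also verify the required relationship between the values, which is not reproduced here.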


Rtgwg mailing list