At 01:12 20/10/2004 -0400, Alia Atlas wrote:
Hi Mike & Stewart,
I was taking another look at the Ordered SPF idea and what you've written
in draft-bryant-shand-lf-conv-frmwk-00.txt and have a few comments/questions.
First, I'd picked up the idea (looking through emails I'm not sure where
:) that you'd calculate a reverse spanning tree from each router connected
to the failure and wait based on the minimum hops. It sounds like you
are actually calculating from the failed component itself. For a
point-to-point link failure, I don't think it matters, but for a node or
broadcast link failure, it definitely does. I think that you may need the
same thing when computing for a new link coming up. More details,
particularly on the ECMP cases would be useful.
Yes. In each case you compute the reverse SPF rooted at the far end of the
failed link as indicated in the LSP you are considering. A node failure
does not appear as a single event, but causes multiple LSPs to be sent (1
for each link failure). Each of these is considered as for a link failure.
It can be shown that in the special case of an "SRLG" which is a node
failure, there can be no conflict in the required ordering of SPFs.
There's also a bit more complexity to the reverse spanning tree in terms
of what paths are inherited in the ECMP case. If you just break ties
based on IP address or keep the first shortest path that's found, you
could end up with rather bad results. Also, you'd need to keep track of
the highest hop-count on a shortest path to know how long to wait.
Yes, exactly. You need the highest hop count of any ECMP paths. This is
because the SPT isn't a simple tree in the case of ECMP.
Note that there is a general problem with (I think) all the methods we are
talking about for loop prevention and repair, where you have routers which
are imposing different ECMP limits. Any calculation should be made assuming
the "worst" case which is the largest number of path splits.
I do see why the delay isn't computed based on destination; all the
affected traffic took the same shortest paths to the failed component (or
will take the same shortest paths to the added component).
Where I think this method may have a problem is with SRLGs. For what I've
been calling local SRLGs (all members connected to the same router), there
wouldn't be an issue, b/c you could pretend that the node had failed from
the perspective of rooting your reverse spanning tree. For more general
SRLGs, there would be a problem. At a minimum, one would need to do
destination-based delay and determine which destinations were reached via
which failed element(s). One would also have to decide how to handle
destinations which were reached via equal-cost paths, where each
equal-cost path went via a different failed element. One would need to
couple the separate reverse spanning trees at the point where the ECMP
existed and bump up the hop-count equivalently (or something
similar). Have you thought about the implications of SRLGs?
Yes. Clearly it is possible for unrelated SRLG (as opposed to the special
case of node failure), to introduce conflicting ordering requirements on
the whole FIB. As you say this would introduce a lot of extra complexity,
and require per destination (or at least per destination set) ordering of
I realize that we have not been discussing SRLGs substantially in the
context of IP FRR. I think this is just b/c of wanting to get the basics
down first, before starting to embellish with all the details. It is, of
course, completely possible to consider SRLGs. This can just be an
additional factor in the acceptance criteria of a loop-free alternate; the
same thing works for U-turn alternates. The concern would be the
additional computation such tracking would require.
Absolutely. The other problem with SRLG as far as repair is concerned is
that you cannot be sure about your SRLGs. On the one hand you may have a
failure of a single link which is part of an SRLG, and hence you may invoke
a more complex repair strategy that actually required (or worse, decide
that the failure was irreparable, when in fact the single link failure
could have been repaired). On the other hand, you may get multiple link
failures occurring as a result of a real world SRLG of which you were
Another issue, of course, is how much engineering current networks would
require to have protection if important SRLGs were considered.
What do you think?
Rtgwg mailing list
Rtgwg mailing list