[email protected]
[Top] [All Lists]

Re: node disjoint and SRLG (was: questions on draft-bryant-ipfrr-tunnels

Subject: Re: node disjoint and SRLG was: questions on draft-bryant-ipfrr-tunnels-01.txt
From: mike shand
Date: Wed, 24 Nov 2004 08:17:52 +0000
        See inline.


At 13:48 23/11/2004 -0500, Curtis Villamizar wrote:


I don't understand the algorithm(s?) that you two are discussing.

In MPLS/TE you simply remove all links from the topology that share a
common SRLG or for some types of SRLG don't remove the links but
weight those links to encourage avoiding them.  Then run the CSPF.

An interesting problem with IP FRR and SRLG is that an SRLG means that
the links *might* go down concurrently.  It does not indicate that the
links *will* go down concurrently.
True, but in IPFRR we have no knowledge of which of these has happened so
we have to assume the worst case. The whole point is that we install the
repair before the information about the failure has had time to propagate.
I suppose if we were prepared to lower our sights somewhat, and have a
mechanism which operated at the limit of propagation time we COUL wait a
while after detecting a failure to see if we saw any LSPs from anywhere
else, and then act accordingly. We already have this concept for loop
prevention, but that is somewhat different since in that case we have the
luxury of knowing that any failure has already been repaired, so we are not
loosing traffic while we are waiting.
If two links share a common
resource they will go down simulaneously if failure of that resource
is the cause of the outage.  If you use the above approach and the
failure is due to some other cause, normal hop by hop forwarding could
cause a loop.
I don't see why that would be true. Remember that in that case of the
tunnels appraoch, the whole essense of the approach is that the paths are
unaffected by the failures. i.e. Fspace (aka Pspace) is the set of nodes
which can be reached from S by paths which are unaffected by the failure,
and Gspace (aka Qspace) is the set of nodes which can reach the "target"
(proxy for destination) by paths which are unaffacted by the failure. Hence
the entire path tunneled to F/G and released to the destination is
guaranteed to be unaffected by the failure.

MPLS/TE FRR does not suffer from this problem because tunnels are used
in the backup that terminate at the egress where the path is dictated
by the ingress or the PLR (point of local repair).  Therefore MPLS/TE
FRR can select a path which avoids all links that could potentially
fail without being concerned that a routing loop could form due to an
inconsistent decision.

Alia has addressed the problem of local SRLG and has provided a clear
enough description of how that works.  There is some question whether
the more general SRLG problem can be adequately addressed because of
the above problem.

The algorithm description above doesn't seem complete and therefore
doesn't make sense to me but that might be becuase the description was
sketchy or it might be just because I'm being dense.

For example:

      . . . . X
     .        .
    a --- b   .
    |      \ .
    |       e
    |      /
    c --- d

A is trying to get to e and looking to protect against failure of a-b
but a-b and d-e share a common fiber (a, c, d may be near each other
geographicly and b, e may be near each other).  A needs to use a long
path via X.  I don't see how this "far side of the SRLG" idea fits
into an algorithm that solves this problem.
Ok from A's perspective, run a normal SPF and you find that B and D are the
far sides. No actually, assuming dashes are hops or something, then D is
not reached via CD. So in this case you would find A's F-space with respect
to AB and CD failure, and then computes B's G space with respect to AB and
CD failure (by running a reverse SPF rooted at B and excising any sub-tree
which traversed either AB or C. The resulting overlap between F sapce and
Gspace would occur somewhere in the A....X....B brnach, and that is where
the repair traffic for B,E and D would be sent.
From C's perspective the situation is similar and again the repair point
would be somewhere near X.
 Note that U-turn doesn't
address this since d-e is not local to A.  The SRLG could also be a-b
and c-d in a slightly different example.

The physical fiber involved in the SRLGs may look more like this
(where the = are two lambdas on the same segment of fiber).

      ..X1..X2..Xn-1..      ..Xn..
     /                \    /     .
    a ----=========----====---b  Xn+1
     \   /         \           \ .
       c             --- d ----- e
Yes this is the classic GMPLS case and there are well known algorithms for
calculating link and or node diverse paths. To solve this it is necessary
(as you point out) to move the primary path away from the normal shortest
path in order to avoid blocking.
However, we do not have the luxury of this sort of thing in a pure IP
network. The primary path is always the shortest path and there is nothing
we can o about it.
Its a very different problem.

Even more interesting is the case were a-b shares one SRLG with a..x
and a-b shares another SRLG with a-c.  In MPLS TE, the primary path
can be changed to a-c-d-e so that the backup path a..x..e could be
used and the primary and backup paths did not share any SRLG.  This is
generally done only with primary and standby (backup from ingress to
egress) or ingress as a PLR in FRR since the ingress and not
the PLR determines the primary path.  It would be difficult for the
ingress to select a primary path that assures that a feasible backup
is available at any PLR.

For example:

        . . . . X
       .        .
    - a --- b   .
  /   |      \ .
 z    |       e
  \   |      /
    - c --- d

If Z is now the ingress, Z is unlikely to consider whether A has a
viable repair path when selecting the path to E.
Z's primary path to E will be strictly the shortest path and hence in your
example will be an ECMP via A and C. As before A's repair for the SRLG of
AB,CD will be via X, and so will that of C. So if AB only has failed,
traffic from Z to E will be split between ZAXE and ZCDE. But if AB and CD
have failed it will be split between ZAXE and ZCZAXE (since C's shortest
path to X is via Z. Note that this is NOT a loop, since the packet when it
comes back through Z is encapped with a destination of X not E. This is
ugly, but not disastrous.
  Z would likely pick
the path z-a-b-e if the path z-c-d-e was a viable packup for failure
of z-a and then A would have no way to create a viable backup.  A
better example of this is where one SRLG is a-b and a.x and another
SRLG is a-b and a-c and the third is a-b and z-a.
This case is not very interesting from Z's perspective. IF it detected ZA
had failed it would also have to assume that AB had failed, but it would do
this anyway since it must assume that node A has failed.
But this does raise an interesting question. So far we (well at least I)
have been thinking of SRLG in terms of a set of links failing. But we know
already that we need to consider node failure as an SRLG consisting of all
the links from a neighboring node, so if we see the link to our neighbor go
down we must assume that the node itself has gone down. So does this mean
that if we have a neighbor E, and SE is an SRLG with links XY and ST, the
ACTUAL SRLG we have to consider is node E failure and links ST and XY failure?

  The path z-a-b-e
and z-c-d-e are fully disjoint but A has no backup path.  OTOH if A
picked a primary path of z-c-d-e, Z, C and D all have viable backups
when acting as the PLR.

Note: I'm using . to mean multiple hops and - to mean one hop, but it
only matters in the a.x.e case since there I just want to indicate
higher cost.


Rtgwg mailing list
[email protected]

<Prev in Thread] Current Thread [Next in Thread>