[email protected]
[Top] [All Lists]

Re: node disjoint and SRLG (was: questions on draft-bryant-ipfrr-tunnels

Subject: Re: node disjoint and SRLG was: questions on draft-bryant-ipfrr-tunnels-01.txt
From: Curtis Villamizar
Date: Wed, 24 Nov 2004 16:24:20 -0500
In message <[email protected]>
mike shand writes:
> Curtis,
>          See inline.
>          Mike

You misunderstood part of the message because I was elaborating on a
limitation of MPLS/TE in the last part of this.  Sorry for the
confusion.  I made a point about MPSL FRR and then followed with the
example.  That was the final example with a Z in the topology.

You did acknowledge another limitation of all IP FRR approaches that
hadn't been explicitly mentioned previously.  See inline.

There were a few points that maybe I didn't make clearly so I'll try
to clarify.

Part of the problem is you seem to be replying to every post as if all
the comments are directed only at your approach.

See comments inline.


> At 13:48 23/11/2004 -0500, Curtis Villamizar wrote:
> >Mike,
> >
> >I don't understand the algorithm(s?) that you two are discussing.
> >
> >In MPLS/TE you simply remove all links from the topology that share a
> >common SRLG or for some types of SRLG don't remove the links but
> >weight those links to encourage avoiding them.  Then run the CSPF.
> >
> >An interesting problem with IP FRR and SRLG is that an SRLG means that
> >the links *might* go down concurrently.  It does not indicate that the
> >links *will* go down concurrently.
> True, but in IPFRR we have no knowledge of which of these has happened so 
> we have to assume the worst case. The whole point is that we install the 
> repair before the information about the failure has had time to propagate. 
> I suppose if we were prepared to lower our sights somewhat, and have a 
> mechanism which operated at the limit of propagation time we COUL wait a 
> while after detecting a failure to see if we saw any LSPs from anywhere 
> else, and then act accordingly. We already have this concept for loop 
> prevention, but that is somewhat different since in that case we have the 
> luxury of knowing that any failure has already been repaired, so we are not 
> loosing traffic while we are waiting.
> >If two links share a common
> >resource they will go down simulaneously if failure of that resource
> >is the cause of the outage.  If you use the above approach and the
> >failure is due to some other cause, normal hop by hop forwarding could
> >cause a loop.
> I don't see why that would be true. Remember that in that case of the 
> tunnels appraoch, the whole essense of the approach is that the paths are 
> unaffected by the failures. i.e. Fspace (aka Pspace) is the set of nodes 
> which can be reached from S by paths which are unaffected by the failure, 
> and Gspace (aka Qspace) is the set of nodes which can reach the "target" 
> (proxy for destination) by paths which are unaffacted by the failure. Hence 
> the entire path tunneled to F/G and released to the destination is 
> guaranteed to be unaffected by the failure.

The first part of your tunneling (since it isn't explicit route
tunneling) relies on IP forwarding.  Plain IP with a multiple link
failure can have loops form even with symetric routes due to some
nodes reacting to the failure earlier than others.

> >MPLS/TE FRR does not suffer from this problem because tunnels are used
> >in the backup that terminate at the egress where the path is dictated
> >by the ingress or the PLR (point of local repair).  Therefore MPLS/TE
> >FRR can select a path which avoids all links that could potentially
> >fail without being concerned that a routing loop could form due to an
> >inconsistent decision.
> >
> >Alia has addressed the problem of local SRLG and has provided a clear
> >enough description of how that works.  There is some question whether
> >the more general SRLG problem can be adequately addressed because of
> >the above problem.
> >
> >The algorithm description above doesn't seem complete and therefore
> >doesn't make sense to me but that might be becuase the description was
> >sketchy or it might be just because I'm being dense.
> >
> >For example:
> >
> >       . . . . X
> >      .        .
> >     a --- b   .
> >     |      \ .
> >     |       e
> >     |      /
> >     c --- d
> >
> >A is trying to get to e and looking to protect against failure of a-b
> >but a-b and d-e share a common fiber (a, c, d may be near each other
> >geographicly and b, e may be near each other).  A needs to use a long
> >path via X.  I don't see how this "far side of the SRLG" idea fits
> >into an algorithm that solves this problem.
> Ok from A's perspective, run a normal SPF and you find that B and D are the 
> far sides. No actually, assuming dashes are hops or something, then D is 
> not reached via CD. So in this case you would find A's F-space with respect 
> to AB and CD failure, and then computes B's G space with respect to AB and 
> CD failure (by running a reverse SPF rooted at B and excising any sub-tree 
> which traversed either AB or C. The resulting overlap between F sapce and 
> Gspace would occur somewhere in the A....X....B brnach, and that is where 
> the repair traffic for B,E and D would be sent.

You didn't read down to the line that said:

  Note: I'm using . to mean multiple hops and - to mean one hop, but
  it only matters in the a.x.e case since there I just want to
  indicate higher cost.

Any path by way of X is a high cost path.  The point was that without
considering SRLG in this case IPFRR would loop.  Of course you had to
read the whole email message before replying so my fault for putting
that paragraph at the bottom of the message.  Common notation is to
use ... to mean a long (high cost) path.

A-X and A-B share an SRLG.  When a failure occurs at A-B, A does not
know if A-B alone failed, if A-B and C-D have both failed or if A-B
and A..X have failed (non-local failure).  In IP FRR if the failure at
A-B directs traffic toward X the traffic would never reach X.  There
may be a repair at that point in IP FRR that would send the traffic
back around the other way, encapsulating a second time but if the
lowest cost path to X required it to go by way of e (best path from d
to X is by way of c) then there would be a loop.

>  From C's perspective the situation is similar and again the repair point 
> would be somewhere near X.
> >  Note that U-turn doesn't
> >address this since d-e is not local to A.  The SRLG could also be a-b
> >and c-d in a slightly different example.
> >
> >The physical fiber involved in the SRLGs may look more like this
> >(where the = are two lambdas on the same segment of fiber).
> >
> >       ..X1..X2..Xn-1..      ..Xn..
> >      /                \    /     .
> >     a ----=========----====---b  Xn+1
> >      \   /         \           \ .
> >        c             --- d ----- e
> Yes this is the classic GMPLS case and there are well known algorithms for 
> calculating link and or node diverse paths. To solve this it is necessary 
> (as you point out) to move the primary path away from the normal shortest 
> path in order to avoid blocking.
> However, we do not have the luxury of this sort of thing in a pure IP 
> network. The primary path is always the shortest path and there is nothing 
> we can o about it.
> Its a very different problem.

This is a type of problem that occur in the real world.  It is not
addressed by any of the IP FRR solutions but is addressed by MPLS FRR.
This is just something that the framework document should mention.

The remainder of this message starts out but saying "In MPLS TE,
...".  Somehow you seemed to lose the fact that I was describing a
limitation of MPLS FRR below.

> >Even more interesting is the case were a-b shares one SRLG with a..x
> >and a-b shares another SRLG with a-c.  In MPLS TE, the primary path
> >can be changed to a-c-d-e so that the backup path a..x..e could be
> >used and the primary and backup paths did not share any SRLG.  This is
> >generally done only with primary and standby (backup from ingress to
> >egress) or ingress as a PLR in FRR since the ingress and not
> >the PLR determines the primary path.  It would be difficult for the
> >ingress to select a primary path that assures that a feasible backup
> >is available at any PLR.
> >
> >For example:
> >
> >         . . . . X
> >        .        .
> >     - a --- b   .
> >   /   |      \ .
> >  z    |       e
> >   \   |      /
> >     - c --- d
> >
> >If Z is now the ingress, Z is unlikely to consider whether A has a
> >viable repair path when selecting the path to E.
> Z's primary path to E will be strictly the shortest path and hence in your 
> example will be an ECMP via A and C. As before A's repair for the SRLG of 
> AB,CD will be via X, and so will that of C. So if AB only has failed, 
> traffic from Z to E will be split between ZAXE and ZCDE. But if AB and CD 
> have failed it will be split between ZAXE and ZCZAXE (since C's shortest 
> path to X is via Z. Note that this is NOT a loop, since the packet when it 
> comes back through Z is encapped with a destination of X not E. This is 
> ugly, but not disastrous.

In MPLS TE A would simply have no backup that didn't violate an SRLG
and the local-protect-available bit would not be set notifying the
ingress of the problem.  Given current algorithms the ingress would
not know what to do about it (the CSPF would still show z-a-b-e as the
best path).  When this occurs the only way to solve it would be for Z
to do an SPF rooted at each node on the path that could not get
local-protect-available and figure out which links needed to be
avoided for that reason.  

> >   Z would likely pick
> >the path z-a-b-e if the path z-c-d-e was a viable packup for failure
> >of z-a and then A would have no way to create a viable backup.  A
> >better example of this is where one SRLG is a-b and a.x and another
> >SRLG is a-b and a-c and the third is a-b and z-a.
> This case is not very interesting from Z's perspective. IF it detected ZA 
> had failed it would also have to assume that AB had failed, but it would do 
> this anyway since it must assume that node A has failed.

Since the example was illustrating a limitation of MPLS TE, it is
interesting from Z's perspective.

> But this does raise an interesting question. So far we (well at least I) 
> have been thinking of SRLG in terms of a set of links failing. But we know 
> already that we need to consider node failure as an SRLG consisting of all 
> the links from a neighboring node, so if we see the link to our neighbor go 
> down we must assume that the node itself has gone down. So does this mean 
> that if we have a neighbor E, and SE is an SRLG with links XY and ST, the 
> ACTUAL SRLG we have to consider is node E failure and links ST and XY failure?
> >   The path z-a-b-e
> >and z-c-d-e are fully disjoint but A has no backup path.  OTOH if A
> >picked a primary path of z-c-d-e, Z, C and D all have viable backups
> >when acting as the PLR.
> >
> >Note: I'm using . to mean multiple hops and - to mean one hop, but it
> >only matters in the a.x.e case since there I just want to indicate
> >higher cost.
> >
> >Curtis

Rtgwg mailing list
[email protected]

<Prev in Thread] Current Thread [Next in Thread>