You are assuming, I think, that the broadcast "link" is actually
made up of a full mesh of pt-pt links, and what you are describing is a
failure of 3 of those links simultaneously (don't forget the reverse
connectivity check), breaking connectivity of the triangle ABC. As such,
this is neither node failure nor single link failure, but probably comes in
the category of unrelated SRLG, so it is not surprising that an incorrect
ordering may occur (I haven't analyzed it yet to be sure, but it seems likely).
However, if the broadcast link were represented as it normally
would be as a pseudonode (or the OSPF equivalent), then the situation would
be rather different, and would depend on which router was DR. What you
describe would cause the loss of the adjacencies AB, AC and BC.
If E were DR, there would be no change in the LSPs, since A,B &
C's links to and from the pseudonode would still be intact. In IS-IS (I
don't know for sure about OSPF), there would be no break in connectivity
either, since if (for example) A needed to send a packet to C it would send
it to the DR(E), which would in turn send it to C. This is how IS-IS deals
with partial adjacency formation on a LAN.
If A (or B or C) were DR, then all sorts of interesting
possibilities arise. If (for example) B had the next highest priority/MAC
address to A, then when B lost its adjacency to A it would elect itself as
a second DR. The consequences are probably too horrible to imagine.
The bottom line is that the broadcast link protocols do not handle
this sort of non-transitive broadcast medium failure at all well. Indeed
they were not designed to. See the following extract from clause 6.7 of
    c) The following events are "very low probability", which
       means performance will be impacted unless they are extremely
       rare, on the order of less than one event per four
       1) Delivery of NPDUs with undetected data corruption;
       2) Nontransitive connectivity, i.e. where system A can
          receive transmissions from systems B and C, but system
          B cannot receive transmissions from system C.
So now going back to your analysis of the pt-pt link case. I think you may
have misunderstood how the reverse SPF is used. Remember that it is ONLY
the portion of the RSPF which crosses the link under consideration which is
So I think this gives the following values for delay (which is horizon hop
count - hop count):
Node | delay  | delay  | max from | delay
     | from A | from B | A & B    | from C
-----+--------+--------+----------+-------
A    | 0      | 0      | 0        | 5
B    | 3      | 0      | 3        | 5
C    | 0      | 1      | 1        | 0
D    | 0      | 0      | 0        | 0
E    | 0      | 0      | 0        | 0
F    | 2      | 0      | 2        | 4
G    | 1      | 0      | 1        | 3
H    | 0      | 0      | 0        | 2
S    | 0      | 0      | 0        | 0
I    | 0      | 0      | 0        | 1
I don't THINK there is a conflict here, so despite the link failures not
being "related" (in the sense of all applying to the same node), at least
in this case I think they don't cause a problem.
But these things are somewhat tricky to evaluate by hand, and my simulator
doesn't (yet :-) deal with such "unrelated" failures. So it is possible I
have missed something here.
I'll try to do a proper analysis when I get time.
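As a rough illustration of the delay rule used in the table above (delay =
horizon hop count - hop count), here is a minimal Python sketch. The function
name delays_for_branch is hypothetical, and the hop counts are an assumption:
they are inferred from the "from A" column, on the basis that B, F, G and H are
the only nodes whose reverse-SPF path crosses the link, with B at hop 0.

```python
def delays_for_branch(hop_counts):
    """Return {node: delay}, where delay = horizon hop count - hop count.

    hop_counts holds only the nodes in the portion of the reverse SPF
    that crosses the link under consideration; the horizon is the
    largest hop count among them.
    """
    horizon = max(hop_counts.values())
    return {node: horizon - hops for node, hops in hop_counts.items()}


# Assumed hop counts (inferred from the "from A" column, not stated in
# the thread): B is at the link, H is the farthest node upstream.
branch_from_a = {"B": 0, "F": 1, "G": 2, "H": 3}

delays_from_a = delays_for_branch(branch_from_a)
print(delays_from_a)  # {'B': 3, 'F': 2, 'G': 1, 'H': 0}
```

Nodes outside the branch are simply absent from the input and so get no
delay, which corresponds to the 0 entries in the table.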
At 12:56 21/10/2004 -0400, Alia Atlas wrote:
At 11:33 AM 10/21/2004, mike shand wrote:
At 11:25 21/10/2004 -0400, Alia Atlas wrote:
I'm thinking of the case where there's a broadcast link that
fails. Otherwise, there's a race condition that depends on when the
LSPs are received.
Could you give an example?
I'm working on it. Basically, I'm assuming that arbitrary failures of
connectivity can happen inside a broadcast link due to layer-2 equipment.
Could you give an example where the correct behavior is to increase the
delay in the event of a failure?
I believe that the correct behavior would be the other way (for a failure
case), because of the need to remove dependencies; also, all the nodes
upstream of the one whose hopCount was increased would also have increased
hopCounts, so reducing the delay is appropriate.
The example that I have so far is rather contrived and assumes arbitrary
(including 1-way) failure at layer-2. It is as follows:
----[ I ]
| 10 |
| | 3 3
| [ S ]----[ H ]---[ G ]
| | | 3
| 12 | [ F ]
| | | 3
| | |
| [ A ] [ B ]
| | 5 |
| 5 |----------------|
| | |
| | 5 | 5
| [ C ] [ E ]
|----[ D ]----| 10
Consider the above topology where D is the destination for the traffic
under consideration. A partial failure occurs on the broadcast link
between A, B and C, as could occur due to a layer-2 switch. After the
failure, the following connections exist:
A to C, A to E,
B to A, B to E,
C to E,
E to A, E to B, E to C
Before the failure, S had equal-cost paths to D via A and via H.
As a result of this failure, 3 LSPs would be sent out:
LSP 1 from A: reports connection to B is down
LSP 2 from B: reports connection to C is down
LSP 3 from C: reports connections to A and B are down
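The connection list above can be run through a two-way connectivity check to
see which adjacencies survive. The following Python sketch is only
illustrative: the names one_way and two_way_adjacencies are hypothetical, each
pair (X, Y) is read as "X to Y" from the list, and E is assumed to sit on the
same broadcast link as A, B and C, as the list implies.

```python
from itertools import combinations

# One-way connections remaining after the failure, copied from the list.
one_way = {
    ("A", "C"), ("A", "E"),
    ("B", "A"), ("B", "E"),
    ("C", "E"),
    ("E", "A"), ("E", "B"), ("E", "C"),
}
nodes = ["A", "B", "C", "E"]  # assumption: all four share the broadcast link


def two_way_adjacencies(links):
    """Keep only the pairs that are connected in both directions."""
    return {frozenset((x, y)) for (x, y) in links if (y, x) in links}


all_pairs = {frozenset(pair) for pair in combinations(nodes, 2)}
surviving = two_way_adjacencies(one_way)
lost = all_pairs - surviving

print(sorted("".join(sorted(pair)) for pair in lost))       # ['AB', 'AC', 'BC']
print(sorted("".join(sorted(pair)) for pair in surviving))  # ['AE', 'BE', 'CE']
```

Under these assumptions only the adjacencies to E pass the check; AB, AC and
BC all fail in at least one direction.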
The following are the hopCounts from doing RSPTs on A, B and C based
on the topology before the failure.
Node | hopCount | hopCount | max from | hopCount
     | from A   | from B   | A & B    | from C
-----+----------+----------+----------+---------
A    | 0        | 1        | 1        | 1
B    | 1        | 0        | 1        | 1
C    | 1        | 1        | 1        | 0
D    | 2        | 2        | 2        | 1
E    | 1        | 1        | 1        | 1
F    | 2        | 1        | 2        | 2
G    | 3        | 2        | 3        | 3
H    | 4        | 3        | 4        | 4
S    | 1        | 4        | 4        | 5
I    | 2        | 5        | 5        | 6
Assume that LSP 1 arrives before LSP 2 or LSP 3.
In that case, H and S would use the same delay and thereby cause a micro-loop.
Does this match with what you are saying?
I realize that Ordered SPFs work most of the time; I'm trying to
understand the corner cases (however improbable) better.