Thanks to John Scudder for taking the notes.
Folks, please review and send your corrections to the list.
RTGWG IETF 61
1. Agenda bashing, administrivia (chairs) [5m] 00:05
2. Document status (chairs) [5m] 00:10
RFC 3906 published (informational)
GTSM -- more comments need to be integrated, last call before
Minneapolis. Implementors please inform mailing list/authors
Framework, loopfree, MIB well along.
uloop prevention design team constituted (names already sent to
list). Desire to keep membership small (already not small, so maybe
"less big"). Goal total coverage if possible, extensible if not.
Design team to report back by December '04.
3. Basic IP FRR spec update (Alia) 00:25
Document revved to be more of a spec and less of a survey. Need to
read framework too because that's where definitions section is!
To do: Multihomed prefixes, link selection, SRLG.
Need more people to read & comment. Comments to list please.
4. IPFRR MIB (Alia) [20m] 00:45
Only the first of many MIBs.
Doesn't cover SRLGs (yet?)
Includes protected route table (with NH, alternate NH including alt NH type)
Includes unprotected route table (just route and why)
Global routing stats (various kinds of route counts)
Not covered: IGP (IPFRR enabled? local holddown time?), LDP
(protected/unprotected FECs, alt NH info including alt label). Other
(small) MIBs will probably be needed for these.
Please comment on: is this grouping of MIBs appropriate?
Alex Zinin: re protected/unprotected route tables, why use a
different table instead of augmenting an existing table?
Alia: I don't know how to do that, my understanding is you can't
really extend a MIB, this one is indexed the same as an IP routing
table MIB which I think is as good as it gets.
Alex: so how do I use these tables?
Joel Halpern: please remember that a MIB is a MIB, it's just used for
management purposes, it doesn't drive the implementation.
Alex: Are these different sets of routes, will it be recorded twice,
once in normal routing table and once in unprotected table?
Bill Fenner: I will sometimes admit to being MIB-literate. This is
the right thing to do. Indexes are dup'd but info isn't.
Stewart Bryant: How do we report dynamic info like "repair attempted
but failed"? No doubt there will be other dynamic info.
Alia: Q is what level to detect, what level to report at. Probably
will be in IGP MIB and not this one.
Stewart: I think this is really important for O&M, because these
faults are transient so we must be very attentive to this issue.
Alia: Yep. We need to make sure that we can actually detect the
errors we put in the MIB!
Stewart: Need to go to ipfix? Maybe doesn't even need to be in MIB.
Alia: We should talk about it.
Stewart: We'll try to write a draft up about it.
Don Fedyk: We did consider that. Error reason is in there but
there's no history associated with it. Take a look at what we have
and see what needs to be improved on.
Stewart: An example of what I'm talking about is we think we have a
protection path but when we try to send a packet on it, it fails.
Is the MIB grouping sensible? Are the MIBs sensible? Please read and
comment, or you will get what you deserve. Right now the draft has u-turn
alternates in it; should it include other candidate alternate types?
Stewart: First MIB should include basic, where there is common
ground, then have a different MIB for advanced.
Alia: All I mean is that there is type defined for "u-turn" for
alternate type, and a row in interface for "can I break u-turns".
Alex: Maybe we should just rename u-turn to "reserved"?
Comments to list please. Very few admit to having read it.
Alex: We'll ask on the list about making draft a WG doc.
David Ward: Who will do IGP MIBs?
Alia: Are you volunteering?
David: No. Someone from this WG should do the work and then present
it to the IGP WGs.
5. Micro-loop prevent DT report (Alia, Mike) [20m] 01:05
Discussion [20m] 01:25
Trying to bring order to chaos, we have too many partial solutions
right now. Trying to explain, divide solution space into types,
consider types, summarize.
Basic problem: Microloops resulting from conventional IGP
converge-as-fast-as-you-can behavior lose traffic, undoing IPFRR goodness.
Reason for uloops: Independent/asynchronous decisions. Loops are
temporary! Duration can be much longer than IPFRR time though.
Duration driven by relative time to update FIBs (i.e., degree of
asynchrony). No way to guarantee two routers will take similar
length of time to update FIBs (from one router's PoV the network
change may cause just a few routes to change -> fast download, from
another PoV many routes may change -> slow download).
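
The asynchrony described above can be illustrated with a small sketch. It is not from the presentation: the four-router topology, the link costs, and the helper names (next_hops_toward, without_link, trace) are all assumptions made up for illustration. Link B-D fails, B installs its new FIB first, and A is still forwarding on the old one, so traffic for D bounces between B and A until A catches up.

```python
import heapq

def next_hops_toward(graph, dest):
    """Dijkstra rooted at dest (symmetric costs); returns {router: next hop toward dest}."""
    dist = {dest: 0}
    next_hop = {}
    heap = [(0, dest)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale heap entry
        for v, cost in graph[u].items():
            nd = d + cost
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                next_hop[v] = u  # v reaches dest via u
                heapq.heappush(heap, (nd, v))
    return next_hop

def without_link(graph, a, b):
    """Copy of the topology with the bidirectional link a-b removed."""
    pruned = {u: dict(nbrs) for u, nbrs in graph.items()}
    del pruned[a][b]
    del pruned[b][a]
    return pruned

def trace(fib, src, dest):
    """Follow next hops from src; returns (path, looped)."""
    path, seen, node = [src], {src}, src
    while node != dest:
        node = fib[node]
        path.append(node)
        if node in seen:
            return path, True
        seen.add(node)
    return path, False

# Hypothetical topology: A-B 1, B-D 1, A-C 1, C-D 2.
topology = {
    "A": {"B": 1, "C": 1},
    "B": {"A": 1, "D": 1},
    "C": {"A": 1, "D": 2},
    "D": {"B": 1, "C": 2},
}

old_fib = next_hops_toward(topology, "D")
new_fib = next_hops_toward(without_link(topology, "B", "D"), "D")

# B has already installed its new next hop; A and C still use the old FIB.
mixed_fib = dict(old_fib)
mixed_fib["B"] = new_fib["B"]

print(trace(mixed_fib, "B", "D"))  # B hands the packet to A, A hands it back to B
```

How long the loop lasts in this toy model depends only on how far apart the two FIB installs are, which is the "degree of asynchrony" point made above.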
Solution: Controlled convergence. Inevitably makes convergence
slower, but this is OK because IPFRR repair covers failure allowing
leisurely convergence. But: still want to keep traditional method as
fallback in case of multiple failures.
- Controlled information flow (incremental cost change)
- Controlled distributed behavior (synchronized FIB installation,
ordered FIB changes, path locking)
(See slides for full comparison matrix, highlights follow)
- Incremental cost change -- can take hours
- Synchronized FIB install -- seems simple but isn't, and depends on NTP
- Ordered SPFs. No changes in forwarding plane. Doesn't deal with
SRLG (only single failure is supported). Need to extend algo to a
per-destination basis. Long delays if large network diameter. Worst
case can be pretty long...
- path locking. cons: complete coverage requires additional
forwarding mechanisms. pros: small delay in rib/fib installation.
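
To make the "long delays if large network diameter" remark about ordered FIB changes concrete, here is a rough sketch. It rests on an assumption that is not spelled out in these notes: for a link failure, each router that was forwarding over the failed link waits until every router upstream of it has switched, so its wait is proportional to the depth of the tree of routers that route through it. The next-hop map, the function names, and the 2-second per-step allowance are hypothetical.

```python
def uses_link(fib, node, link, dest):
    """True if node's current path to dest crosses the directed link (a, b)."""
    while node != dest:
        nxt = fib[node]
        if (node, nxt) == link:
            return True
        node = nxt
    return False

def update_ranks(fib, link, dest):
    """Rank 0 updates first; a router's rank is the depth of its upstream subtree."""
    affected = {n for n in fib if uses_link(fib, n, link, dest)}
    upstream = {n: [] for n in affected}
    for n in affected:
        if fib[n] in affected:
            upstream[fib[n]].append(n)

    def depth(n):
        if not upstream[n]:
            return 0
        return 1 + max(depth(c) for c in upstream[n])

    return {n: depth(n) for n in affected}

# Hypothetical pre-failure next hops toward destination D; link B->D fails.
fib_before = {"E": "A", "F": "A", "A": "B", "B": "D", "C": "D"}
ranks = update_ranks(fib_before, ("B", "D"), "D")

STEP_SECONDS = 2.0  # assumed worst-case FIB install time per rank
delays = {router: rank * STEP_SECONDS for router, rank in ranks.items()}
print(sorted(delays.items()))
```

In this toy case the router next to the failure (B) waits longest, and its wait grows with the number of hops upstream of it, which is why a large-diameter network pushes the worst case up.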
Detailed description of the above four methods
Ordering by signalling
Alex: Is node failure a SRLG case?
Mike, Alia: No. Node failure can be handled by any of these techniques.
Ordering by delay
"Lollipop topology" (for example) can make delay-ordered SPF slower
than needed (known techniques are more pessimistic than needed).
Can combine delay and signalling (optimization of delay-based
version, point is that signalling doesn't need to be reliable since
delay backs it up)
Backwards compatibility is a problem.
Alex: how much is it really a problem? Can't you just announce the
capability in your IGP and only start using the method when all routers
Mike: yes but that means if you infect your network with one router
that doesn't support this, you've broken the scheme.
Three epochs -- change discovery time, use transitional paths time,
lock to new topology time.
Potential transitional path types -- tunnels, safe neighbors, packet
Sorting out the possibilities -- what are the criteria? Time to be
converged (ballpark: 10 sec), simplicity, SRLG support (or really,
unpredicted multiple failure coverage), no additional mechanisms
beyond IP (may hurt coverage), common additional mechanisms for this
and other advanced methods, also work for LDP.
- Incremental cost change impractical
- Sync'd FIB swap -- skeptical about practicality
- Ordered SPF -- long delay, poor SRLG support -- enough to be an issue?
- Path locking -- seems most promising, many possibilities (ed: but,
maybe it's just that the newest toy is always the shiniest?)
- Haven't thought of any new methods this morning but we haven't been
to the bar yet
- Need more brain power on this, more discussion
Danny: Is incremental deployability a hard requirement?
Alex: Yeah, and is 100% coverage required?
Danny: Sure but is incremental really a hard requirement?
Alia: Path locking can be done incrementally. You can't have a flag day.
Danny: Well not a flag day, but it would be OK to require all routers
to have same version of code before solution becomes viable.
Alia: But still need to worry about turning it all on
Andrew Lange: That's what maintenance windows are for.
Voch Kompella: Re sync'd FIB swap -- If requirements were externally
provided (and included atomic clocks) problem would be easier. Are
we making the problem harder than we have to because we are inventing
our own requirements?
George Swallow: Are you only worried about clock skew during failure?
Mike: Skew isn't the problem, problem is skew in FIB install time.
George: So clock sync is not the biggest issue here actually.
Mike: Yes although I'm nervous about inter-layer dependencies.
Stewart: Well if you can detect that NTP isn't working then you can
just disable the loopfree thingy.
David Ward: So we've asked for a collection of requirements but have
no place to collect them.
Alex: Actually we haven't asked for requirements.
David: How do you multicast?
Mike: General thinking is that you have to get the packet to the
other side of the failure, can't just drop it off some place and use
the unicast/downstream approaches because of RPF, etc.
Bill: Two halves to problem, other half is you need state to know
where downstream neighbors are for mcast. So fast repair has to
repair that state as well. You can get the packet to the other end
of the failure OR get the join state down the repair path real fast.
Stewart: We're talking about for the repair, right? For the uloop
convergence you have lots of time to fix up the mfib?
Everyone: Nope nope.
Bill: You're moving the tree around. PIM needs to get access to the
new SPF topology before the new FIB is put into use, that might work.
Alia: At a minimum we have to not break mcast/make it worse!
Secondary question is how to protect mcast too.
Alex: So getting back to uloop prevention...
David: Design team requested requirements, how are we going to
provide them Alex?
Alex: Oh, thought you were asking about a requirements document.
Alex: In particular SPs should try to respond to presenters'
questions/strawman requirements. SRLGs? Less than full coverage?
These are important because they will drive the selection of
Danny: Where ARE we going to record the requirements?
Alex: The mailing list?
Alia: The taxonomy doc?
6. Update on draft-atlas-ip-local-protect-uturn (Alia) [20m] 01:45
- Explicitly marked packet identification (well known label?). Makes
ID'ing potential U-turn packets easier, etc.
- Example algorithm for how to look for U-turn alternates. (Worst
case is 1 additional SPF per neighbor.)
- Simplify alternate selection
- More detailed explanation considering link protection
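
The "1 additional SPF per neighbor" cost can be sketched as follows. This is not the draft's example algorithm; it only shows the commonly used loop-free test, in which neighbor N of router S is a safe alternate toward D when dist(N, D) < dist(N, S) + dist(S, D), with one SPF rooted at each neighbor supplying the distances. The topology, costs, and function names are hypothetical. A neighbor that fails the test would send the packet back through S, which is the situation U-turn alternates are meant to handle, and deciding whether such a neighbor is usable needs further checks that are not shown here.

```python
import heapq

def spf(graph, root):
    """Plain Dijkstra; returns shortest distances from root."""
    dist = {root: 0}
    heap = [(0, root)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale heap entry
        for v, cost in graph[u].items():
            nd = d + cost
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist

def loop_free_neighbors(graph, s, dest):
    """For each neighbor N of s: is dist(N, dest) < dist(N, s) + dist(s, dest)?"""
    dist_s = spf(graph, s)
    verdict = {}
    for n in graph[s]:
        dist_n = spf(graph, n)  # the one extra SPF for this neighbor
        verdict[n] = dist_n[dest] < dist_n[s] + dist_s[dest]
    return verdict

# Hypothetical topology: S-A 1, S-B 1, S-C 2, A-D 1, B-D 4, C-D 1.
topology = {
    "S": {"A": 1, "B": 1, "C": 2},
    "A": {"S": 1, "D": 1},
    "B": {"S": 1, "D": 4},
    "C": {"S": 2, "D": 1},
    "D": {"A": 1, "B": 4, "C": 1},
}

verdict = loop_free_neighbors(topology, "S", "D")
print(verdict)
```

Here B's shortest path to D runs back through S, so B fails the loop-free test, while C passes it.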