
Subject: Re: FW: composite link - candidate for respin, maybe
From: Curtis Villamizar
Date: Sun, 11 Apr 2010 02:08:29 -0400
In message <[email protected]>
Yong Lucy writes:
>  
> Hi Curtis,
>  
> Sorry, I missed this reply until now.
>  
> See inline. 
>  
> Regards,
> Lucy


Lucy,

I'm replying even though I think we have moved to Dave's text.

See inline.


> Snipped
>  
> > > [LY] IMO: in the context of this draft, flow identification identifies a
> > > flow. Not a group of flows. A flow has to be transported over a single
> > > component link in order to preserve the ordering.
> > 
> > A particular flow identification method may isolate a group of one.
> > That is neither precluded nor required.  If a group of flows is bound
> > to a component link, all the flows in that group are bound.
> > 
> > I think where you are confused is that if an LSP is self identified
> > (through signaling) or identified through management plane as requiring
> > a strict ordering over the entire LSP, then it is treated as an
> > individual LSP.
>  
> [LY] In the context of composite link, a flow or flow identification
> means the composite link needs to map it to a single component link. We
> discussed whether flow identification is just the top label on
> the label stack or can be outer label as well as outer+inner label.


The conclusion was that at a midpoint LSR the LSP defines the top link
but the entire label stack can be used for load balance.
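
A minimal sketch of that conclusion, assuming a CRC32 hash over the
label values and a simple modulo over the number of component links.
The function name and the choice of hash are hypothetical and are not
taken from the draft or from any vendor implementation.

```python
import zlib


def component_link_for(label_stack, num_links):
    """Map an MPLS label stack to a component link index.

    label_stack: list of 20-bit label values, top of stack first.
    num_links:   number of component links in the composite link.
    """
    # Hash the entire stack, not only the top label, so that distinct
    # inner labels under the same outer LSP can land on different links.
    key = b"".join(label.to_bytes(3, "big") for label in label_stack)
    return zlib.crc32(key) % num_links
```

All packets carrying the same label stack produce the same index, so
they stay on one component link and ordering within that group of
flows is preserved.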


> > [begin aside]
> > 
> > BTW- Existing techniques isolate groups of flows and do a good job of
> > load balancing.  IP traffic can have millions of flows, most of which
> > are short lived.  If IP hash is the basis for an entropy label there
> > would be millions of values (up to 20 bits worth - minus a few if
> > reserved label values are avoided, though 19 bits would be easier to
> > implement than a modulo).
> > 
> > Even if all the flows are LSP, if a customer PTP Ethernet is carried
> > as a PW there is no way to know when that mostly idle Ethernet is
> > about to start the nightly database backup and suddenly peak and stay
> > there for a while.  Lots of applications are like this.  Some grow
> > gradually, some spike suddenly.
> > 
> > [end aside]
>  
> [LY] In the meeting, we seemed to agree not to include IP in the first
> phase. If component links have different costs, composite link can't
> simply use ECMP to spread traffic. My understanding of the composite link
> requirement is to use a TE method to map flows to component links. For
> a TE LSP, it can take signaled TE parameters into account in the
> placement. For a non-TE LSP, it takes measured BW into account in the
> placement (other flows). The assumption is that, in this application,
> most LSPs are TE LSPs and there is a small number of non-TE LSPs.  It
> applies the TE method to both TE LSPs and non-TE LSPs. This is what I
> see as the main difference from the ECMP method. I hope the co-authors
> can confirm my understanding of this.


I don't think we agreed not to include IP if it took very little or no
effort to include IP relative to LDP.


> snipped
>  
> > > [LY] Does the draft aims on MPLS network? In MPLS network, there are IP
> > > packets from control plane. Should we limit to this scope for now?
> > 
> > No one has demonstrated any technical reason to limit scope to MPLS
> > only, let alone a strong reason.  This is a paragraph about what is
> > known to be "generally safe" to load split on and what existing
> > implementations do.  It is generally safe except now it is unsafe for
> > MPLS-TP OAM LM only.  I should add citations to the PWE CW and the
> > motivation for CW which is based on this behavior.
>  
> [LY] Bundling component links of different cost together makes a
> composite link differ from a link bundle. This can result in different
> ways to split IP traffic because of the different costs. Not sure if we
> need to include customer IP traffic in the current scope.


Your comment doesn't seem to be relevant to anything above.


> > > >   For example a large aggregate of IP traffic may be subdivided into a
> > > >   large number of groups of flows using a hash on the IP source and
> > > >   destination addresses.  This is as described in [diffserv
> > > >   framework].  For MPLS traffic carrying IP, a similar hash can be
> > > >   performed on the set of labels in the label stack.  These techniques
> > > >   are both examples of means to subdivide traffic into groups of flows
> > > >   for the purpose of load balancing traffic across aggregated link
> > > >   capacity.  The means of identifying a flow should not be confused
> > > >   with the definition of a flow.
> > > >
> > > >   Discussion of whether a hash based approach provides a sufficiently
> > > >   even load balance using any particular hashing algorithm or method
> > > >   of distributing traffic across a set of component links is outside
> > > >   of the scope of this document.
> > > >
> > > >   The use of three hash based approaches is defined in RFCxxxx.  The
> > > >   use of hash based approaches is mentioned as an example of an
> > > >   existing set of techniques to distribute traffic over a set of
> > > >   component links.  Other techniques are not precluded.
> > > [LY] Not sure why mention hash here.
> > 
> > I am citing what little we have in documentation on hash based methods
> > (RFC2991 and RFC2992).
> > 
> > Hash based methods that snoop past BOS to IP is why we have a PWE CW.
> > We need to acknowledge very widely deployed behaviour.  In this case
> > it is every core router/LSR in the Internet for the last 15 years
> > (unless some vendor that I don't know about has missed this but if
> > Cisco and Juniper own 90%++ of the market it is in at least 90%).
>  
> [LY] Please separate my draft for enhanced ECMP and composite link
> work.  Composite link mainly use TE based placement.


In the paragraphs immediately above I'm describing what Cisco and
Juniper and Foundry and Force10 and just about anyone's routers do
today.  That has nothing to do with any draft you wrote.


> > > > Requirements:
> > > >
> > > >   These requirements refer to link bundling solely to provide a frame
> > > >   of reference.  This requirements document does not intend to
> > > >   constrain a solution to build upon link bundling.  Meeting these
> > > >   requirements using extensions to link bundling is not precluded, if
> > > >   doing so is determined by later IETF work to be the best solution.
> > > >
> > > >   The first few requirements listed here are met or partially met by
> > > >   existing link bundling behavior including common behaviour that is
> > > >   implemented when the all ones address (for example 0xFFFFFFFF for
> > > >   IPv4) is used.  This common behaviour today makes use of a hashing
> > > >   technique as described in the introduction, though other behaviours
> > > >   are not precluded.
> > > [LY] Why mention hash here?
> > 
> > Because it is in link bundle and link bundle is used as the frame of
> > reference.  I did say that we are not bound to extending link bundle.
> > But we must consider it as a possible option because IETF requires
> > that we not reinvent the wheel.  To do so we need to discuss what link
> > bundle does in the first place so that we can decide.  If we decide
> > that we can't reuse link bundle we have to justify that decision.
> > 
> > > >   1.  Aggregated control information which summarizes multiple
> > > >       parallel links into a single advertisement is required to reduce
> > > >       information load and improve scalability.
> > > >
> > > >   2.  A means to support very large LSP is needed, including LSP whose
> > > >       total bandwidth exceeds the size of a single component link but
> > > >       whose traffic has no single flow greater than the component links.
> > > >       In link bundling this is supported by many implementations using
> > > >       the all ones address component addressing and hash based
> > > >       techniques.
> > > [LY] IMO: This is not a requirement for composite link. Original draft
> > > requires a flow BW (LSP) is less than single component link capacity.
> > > Opinion from other co-authors? Original draft requires TE based method
> > > to handle both RSVP-TE LSPs and LDP LSPs.
> > 
> > This is real world today.  Apparently Dave thinks it is a requirement.
> > Probably because his network wouldn't work today without it.  All of
> > the customers I've spoken to (at least the IP people) think supporting
> > LSP larger than a component link is a hard requirement.  This is most
> > important for IP core where (bundle/aggregate/composite) links are
> > rapidly approaching Tb/s and we haven't seen the first commercial
> > 100GbE yet (except trade show demos, and ODU4 will come even later).
>  
> [LY] This is the reason to have FAT PW. This is my fault not to make
> it clear at beginning. In the context of composite link, a flow BW is
> less than a component link capacity.


You have that wrong.  In the composite link requirements an LSP can
have a bandwidth that is greater than any one of the component links.


> > Once again, we can ask for consensus on this.  Generally we can't
> > remove widely deployed capability (or we can but we can't expect
> > support or market success if we do).
> > 
> > The new requirement is to add LSP which *don't* behave this way (don't
> > load split by separating flows that exist within an LSP).
> > 
> > BTW - this applies to both TE and LDP LSP, and TE LSP carrying LDP
> > LSP, etc.
>  
> [LY] Yes. However, what is the flow for composite link in latter case,
> TE LSP or LDP?


It doesn't matter how an LSP is set up.  The same techniques that just
look at the label stack (or optionally past the label stack if the
payload is IP) work regardless of how the LSPs were set up.
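
As a rough illustration, here is a sketch of a hash key built from the
label stack, optionally extended with the IP source and destination
addresses when the payload after the bottom of stack looks like IP.
The function names, the CRC32 hash, and the first-nibble test for
IPv4/IPv6 are assumptions made for the example and do not describe any
particular vendor's implementation.

```python
import struct
import zlib


def parse_label_stack(packet):
    """Return (labels, payload) for a packet starting at the MPLS stack."""
    labels = []
    offset = 0
    while offset + 4 <= len(packet):
        # Each label stack entry is 32 bits: label(20) TC(3) S(1) TTL(8).
        (entry,) = struct.unpack_from("!I", packet, offset)
        offset += 4
        labels.append(entry >> 12)
        if (entry >> 8) & 1:  # S bit set: bottom of stack reached
            return labels, packet[offset:]
    raise ValueError("no bottom-of-stack entry found")


def flow_hash_key(packet, look_past_stack=True):
    """Build the bytes to hash; how the LSPs were signaled plays no part."""
    labels, payload = parse_label_stack(packet)
    key = b"".join(label.to_bytes(3, "big") for label in labels)
    if look_past_stack and payload:
        version = payload[0] >> 4  # first nibble after the label stack
        if version == 4 and len(payload) >= 20:
            key += payload[12:20]  # IPv4 source + destination address
        elif version == 6 and len(payload) >= 40:
            key += payload[8:40]   # IPv6 source + destination address
    return key


def pick_component_link(packet, num_links, look_past_stack=True):
    """Choose a component link index from the hash of the flow key."""
    return zlib.crc32(flow_hash_key(packet, look_past_stack)) % num_links
```

Nothing in the key depends on whether a label was distributed by
RSVP-TE or LDP.  The first-nibble guess is also the behavior that can
misidentify a non-IP payload, which is the motivation for the PWE CW
mentioned earlier.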


Curtis
