[email protected]
[Top] [All Lists]

RE: Composite Link Requirements

Subject: RE: Composite Link Requirements
From: "Mcdysan, David E"
Date: Tue, 2 Mar 2010 09:25:08 -0500
Hi Curtis,

The co-authors of this draft reviewed your comments and decided to
respond with three separate messages, to separate the threads as follows
so that all of the issues you raise can be resolved efficiently.

        #1. Composite Link Trademark Issue (Was: Composite Link Requirements)
        #2. Acknowledgement of Prior Work (Was: Composite Link Requirements)
        #3. Proposed Resolution of Comments (Was: Composite Link Requirements)

Andy will be sending #1 and I will be sending #2 and #3.



> -----Original Message-----
> From: [email protected] [mailto:[email protected]] 
> On Behalf Of Curtis Villamizar
> Sent: Saturday, February 27, 2010 4:00 AM
> To: [email protected]
> Subject: Composite Link Requirements
> Hi there good people of RTGWG,
> This is in regard to the goals embodied in the RTGWG acceptance of a
> draft to deal with requirements for composite link, currently named
> draft-ietf-rtgwg-cl-requirement-00.txt.
> I'm bringing up two issues in this email.  One is prior 
> composite link work and the other is prior methods of 
> handling composite link, which should be acknowledged.  After 
> that I just have some comments and questions on the draft.
> First issue is prior composite link work, both under the name 
> "composite link" and under other names.
> A search of US trademarks yields:
>   Word Mark                 COMPOSITE LINKS
>   Goods and Services        (ABANDONED) IC 009. US 021 023 026 036 038.
>                             G & S: Computer software, namely, routing
>                             software for use in enabling multiple links
>                             or fiber interfaces between routers to be
>                             grouped into a single logical connection
>   Standard Characters Claimed
>   Mark Drawing Code         (4) STANDARD CHARACTER MARK
>   Serial Number             78363042
>   Filing Date               February 5, 2004
>   Current Filing Basis      1B
>   Original Filing Basis     1B
>   Published for Opposition  June 21, 2005
>   Owner                     (APPLICANT) Avici Systems, Inc. CORPORATION
>                             Billerica Avenue North Billerica
>                             MASSACHUSETTS 018621256
>   Attorney of Record        John L. DuPre'
>   Type of Mark              TRADEMARK
>   Register                  PRINCIPAL
>   Live/Dead Indicator       DEAD
>   Abandonment Date          September 14, 2007
> Since this may predate (and certainly overlaps) the ITU use 
> of the term "composite link", I think Avici should be 
> acknowledged if we continue to use the term composite link.
> The definition is nice:
>   Computer software, namely, routing software for use in enabling
>   multiple links or fiber interfaces between routers to be grouped
>   into a single logical connection.
> Note that ITU's G.800 does not define what a composite link is and
> only mentions "composite" four times in the document, including the
> uses "composite link" and "composite trail".  The figure indicates
> that a composite link is "inverse multiplexing".  For this reason, I
> don't think G.800 should be referenced, because it's a big load of
> **** with only slight mention of CL.
> Prior to the registered trademark, composite link was used by Avici
> as a plain old trademark.  Although the registered trademark does not
> predate Ethernet Link Aggregation, the product and technique to which
> the registered trademark applies does predate it (1998 vs circa 2000
> for IEEE).  The techniques are the same except that IEEE focused
> solely on Ethernet, while Avici made use of extensions to PPP (as in
> POS).
> In any case, the aspect of composite link dealt with in IEEE 802.3ad
> link aggregation or Avici's composite link was the protocol to
> negotiate the binding.
> Second issue is how CL has been handled in the past.
> Whether it was two links between two places that took completely
> different paths (trails in ITU speak, but this is the IETF where we
> say path), or two parallel links, this has been called ECMP in the
> IETF (and elsewhere) for two decades or more.  Both ISIS and OSPF use
> the term ECMP.  The techniques used for ECMP load balance were
> discussed on IETF lists quite a bit in the early to mid-1990s.  The
> three techniques applied to IP networks (in the terminology of that
> time) were:
>   1.  per packet load balance
>   2.  per bit or byte load balance aka bit striping or inverse-mux
>   3.  IP src/dst hash
> The second is applicable only to parallel links.  Using larger
> chunks, it is also the technique used in MPPP (multilink PPP).  MPPP
> is also sometimes abbreviated PPP-ML, though not in the RFC.  MPPP is
> no longer of much interest as it was only applied to low speed links.
> The per packet load balance caused packet reordering and a great deal
> of grief for service providers, hence the abundance of discussion
> within IETF at the time.  The use of IP src/dst hash, while
> widespread and widely discussed, did not get documented in an RFC
> until Chris Hopps and Dave Thaler wrote RFC 2991 "Multipath Issues in
> Unicast and Multicast Next-Hop Selection" and RFC 2992 "Analysis of
> an Equal-Cost Multi-Path Algorithm" in November 2000 (at least
> AFAIK).
> The IP src/dst technique itself is believed to have originated in the
> T1-NSFNET, which puts its use back to circa 1987.
> The OMP work predates RFC 2991 and RFC 2992 but never made it past
> the internet-draft stage.  In that work, the use of src/dst hash and
> the use of adaptive algorithms with src/dst hash are discussed.  On
> the IETF mailing lists even methods of implementation were discussed:
> table-based and parallel sets of comparator pairs (TCAM-like).
> Circa 2000 there was a lot of discussion of the use of the MPLS label
> stack to provide the entropy for ECMP vs. looking past the label
> stack at the IP payload.  Today's PW control word acknowledges this
> common practice and avoids it for PW, but the fat-pw (aka entropy
> label) work puts better entropy back into PW.
> In practice today, all core hardware uses the same IP src/dst 
> hash to provide a load balance for ECMP and LAG.
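> As a minimal sketch of that practice (illustrative only, not any
> vendor's actual algorithm), the selection amounts to something like:
>
>   import zlib
>
>   def select_component_link(src_ip, dst_ip, label_stack=None,
>                             num_links=4):
>       # Hash the flow identifiers so every packet of a microflow maps
>       # to the same component link.  Hashing the label stack, when
>       # present, is one way to get entropy for labeled traffic; real
>       # hardware hashes fixed-width header fields, not strings.
>       key = "%s|%s" % (src_ip, dst_ip)
>       if label_stack:
>           key += "|" + "|".join(str(label) for label in label_stack)
>       return zlib.crc32(key.encode()) % num_links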
> The existing internet-draft acknowledges link bundling, but does not
> accurately characterize ECMP and LAG and the src/dst hash technique
> used by both, nor does it acknowledge the prior OMP work.
> Comments on the draft:
> The following statements may be inaccurate:
>    The Link Bundle concept is somewhat limited because of the
>    requirement that all component links must have identical
>    capabilities, and because it applies only to TE links.
>      This may be inaccurate.  I don't think there is a requirement
>      that a link bundle use identical links.
>      In any case, both Avici composite links and many LAG
>      implementations allow a mix of member speeds and neither was
>      applicable to TE links only.
> The following should be replaced:
>    Traffic Flow: A set of packets that with common identifier
>    characteristics that the composite link is able to use to aggregate
>    traffic into Connections.  Identifiers can be an MPLS label stack
>    or any combination of IP addresses and protocol types for routing,
>    signaling and management packets.
> Diffserv already defined a microflow to be the same thing.  
> We should not invent new terms to mean the same thing as 
> existing terms.  We can just point out that labels can also 
> be used to identify a microflow.
> This statement is definitely inaccurate for a number of reasons:
>    o  ECMP/Hashing/LAG: IP traffic composed of a large number of flows
>       with bandwidth that is small with respect to the individual link
>       capacity can be handled relatively well using ECMP/LAG
>       approaches.  However, these approaches do not make use of MPLS
>       control plane information nor traffic volume
>       information. Distribution techniques applied only within the
>       data plane can result in less than ideal load balancing across
>       component links of a composite link.
>   Avici used feedback from the egress port to the ingress port on
>   traffic volume and queue occupancy to influence the distribution of
>   the hash.  There is nothing in the definition of ECMP to prohibit
>   this, and the OMP technique explicitly called for doing so and
>   proposed protocol extensions to be able to go beyond just a
>   decision within a single NE, as Avici did.
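>   As a rough sketch of the idea (mine, not Avici's implementation nor
>   the exact OMP algorithm), the feedback loop amounts to remapping
>   hash buckets away from overloaded component links:
>     def rebalance(bucket_to_link, link_load, threshold=0.9):
>         # bucket_to_link: list mapping each hash bucket to a
>         #                 component link index
>         # link_load: dict of component link index -> utilization
>         #            (0..1), fed back from the egress side
>         least_loaded = min(link_load, key=link_load.get)
>         for bucket, link in enumerate(bucket_to_link):
>             if link_load[link] > threshold:
>                 # Move one bucket at a time to limit reordering.
>                 bucket_to_link[bucket] = least_loaded
>                 break
>         return bucket_to_link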
> This is inaccurate:
>   o 2.  It does not support a set of component links with different
>         characteristics (e.g., different bandwidth and/or latency).
>       For example, in practice carriers commonly use link bandwidth
>       and link latency to set link TE metrics for RSVP-TE.  For
>       RSVP-TE, limiting the component links to the same TE metric has
>       the practical effect of disallowing component links with
>       different link bandwidths and latencies.
> There is no formal meaning to the link metric in ISIS or OSPF.
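> The sort of informal policy being alluded to is, as a strawman only
> (nothing either IGP defines), roughly:
>
>   # Strawman: derive a metric from a reference bandwidth plus a
>   # latency penalty.  Neither ISIS nor OSPF assigns this any meaning;
>   # it is purely operator convention.
>   def link_metric(link_bw_bps, latency_ms, ref_bw_bps=100e9):
>       return int(ref_bw_bps / link_bw_bps) + int(latency_ms)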
> Under inverse-mux: the real problem with inverse-mux is that the
> amount of bandwidth that needs to be multiplexed greatly exceeds the
> fastest single packet processing element, and therefore it doesn't
> work.  The latency argument is not really valid.
> I think that the ability of an LSR to measure latency on an LSP and
> report a latency figure, or route based on lowest latency, is almost
> orthogonal to the problem of composite link.  If latency and
> bandwidth at each holding priority are advertised, then we have a
> cross product of advertisements.  For example, you can have 1 Gb/s at
> 10 msec at pri#1, but if you can live with 12 msec you can have
> 2 Gb/s, or at 14 msec 3 Gb/s, but at pri#2 you only get ... and so on
> for 8 priorities.  Is this what we're aiming for?
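> To make the size of that cross product concrete, a hypothetical
> advertisement (illustrative only; no such TLV exists today) would
> have to carry something like:
>
>   # Hypothetical per-link advertisement: priority -> list of
>   # (available bandwidth in Gb/s, latency in msec) pairs.
>   advertisement = {
>       1: [(1, 10), (2, 12), (3, 14)],
>       2: [(0.5, 10), (1, 12)],
>       # ... and so on for all 8 holding priorities
>   }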
> The table at the beginning of section 4 is meaningless.
> In this section:
> Traffic Flow and Connection Mapping
>    The solution SHALL support operator assignment of traffic flows to
>    specific connections.
>    The solution SHALL support operator assignment of connections to
>    specific component links.
> How is this supposed to work for signaled LSPs where the component
> links are not identified in control signaling?  Is this scalable
> from a configuration standpoint, or only applicable to statically
> configured MPLS cross-connects?
>    In order to prevent packet loss, the solution must employ make-
>    before-break when a change in the mapping of a connection to a
>    component link mapping change has to occur.
> Only the ingress of an LSP can initiate make-before-break and 
> the ingress doesn't know about the component links.  In 
> RFC3209, make-before-break involves a new LSP using the same 
> tunnel-id.
> Are you using a different meaning for make-before-break?
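> For reference, a bare sketch of what I mean (RFC 3209 style, field
> names abbreviated; not a proposal, just the existing MBB
> identification):
>
>   # RFC 3209 make-before-break: the new LSP keeps the same SESSION
>   # (tunnel endpoint, tunnel ID) and differs only in the LSP ID in
>   # the SENDER_TEMPLATE, with the SE reservation style so old and
>   # new LSPs can share resources during the switchover.
>   old_lsp = {"tunnel_id": 7, "lsp_id": 1}
>   new_lsp = {"tunnel_id": 7, "lsp_id": 2}   # same tunnel, new LSP ID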
> Regarding this statement:
>    The solution SHALL support management plane controlled parameters
>    that define at least a minimum bandwidth, maximum bandwidth,
>    preemption priority, and holding priority for each connection
>    without TE information (i.e., LDP signaled LSP that does not
>    contain the same information as an RSVP-TE signaled LSP).
> Could you explain how preemption would work for LDP?  Do you plan to
> withdraw the FEC?  If so, for how long?  Forever?  If not forever,
> would the traffic periodically come back, get remeasured, and be
> withdrawn again?
> What does this mean?
>    o  Bandwidth of the highest and lowest speed
> Overall I find many of the stated requirements to be unclear.
> Perhaps some discussion and improvements to the wording will bring
> clarity.
> Or maybe I'm just dense.
> Curtis
> _______________________________________________
> rtgwg mailing list
> [email protected]
> https://www.ietf.org/mailman/listinfo/rtgwg