Re: Acknowledgement of Prior Work (Was: Composite Link Requirements)

Subject: Re: Acknowledgement of Prior Work Was: Composite Link Requirements
From: Curtis Villamizar
Date: Wed, 03 Mar 2010 01:26:45 -0500
In message 
<[email protected]xxxxxxxxx>
"Mcdysan, David E" writes:
> Hi Curtis,
> The co-authors of this draft reviewed your comments and decided to
> respond with three separate messages, to separate the threads as follows
> so that all of the issues you raise can be resolved efficiently.
>       #1 Composite Link Trademark Issue (Was: Composite Link
> Requirements)
>       #2. Acknowledgement of Prior Work (Was: Composite Link
> Requirements)
>       #3. Proposed Resolution of Comments (Was: Composite Link
> Requirements)
> This is thread #2.
> Dave  
> > -----Original Message-----
> > From: [email protected] [mailto:[email protected]] 
> > On Behalf Of Curtis Villamizar
> > Sent: Saturday, February 27, 2010 4:00 AM
> > To: [email protected]
> > Subject: Composite Link Requirements
> > 
> > 
> > Hi there good people of RTGWG,
> > 
> > This is in regards to the goals that are embodied in the 
> > RTGWG acceptance of a draft to deal with requirements for 
> > composite link, currently named draft-ietf-rtgwg-cl-requirement-00.txt
> > 
> > 
> > I'm bringing up two issues in this email.  One is prior 
> > composite link work and the 
> This is the subject of this thread: 
> > other is prior methods of 
> > handling composite link, which should be acknowledged.  
> Snipped
> > 
> > Note that ITU's G.800 does not define what a composite link 
> > is and only mentions composite four times in the document, 
> > including use of composite link and composite trail.  The 
> > figure indicates that a composite link is "inverse 
> > multiplexing".  For this reason, I don't think G.800 should 
> > be referenced because it's a big load of **** with only slight 
> > mention of CL.
> > 
> Part of the rtgwg acceptance of this as a wg draft was to include a
> reference to G.800. Lucy pointed out that the current text in section
> 2.2 is a paraphrase. We could replace with the following quote from
> section 6.9.2 of G.800, if that is wg consensus:
> "Multiple parallel links between the same subnetworks can be bundled
> together into a single composite link. Each component link of the
> composite link is independent in the sense that each component link is
> supported by a separated service layer trail. The composite link conveys
> communication information using different server layer trails thus the
> sequence of symbols cross these links may not be preserved."
> The text related to Inverse multiplexing is one of three cases in
> section 6.9.2. The text above is the first case. G.800 states that these
> are separate cases.
> The text above Figure 16 in G.800 related to concatenated server trails
> may also be relevant (at least in the framework).
> So, the choices are to replace the current text with the direct quote,
> or to remove the reference to G.800 and replace it with some other text.
> WG comments?

If I understand correctly, this definition of CL encompasses all existing
LAG (or non-Ethernet LAG-like aggregation), existing ECMP, and existing
unequal multipath techniques.

If so, then the requirements here define yet another instance of CL as
defined by ITU.

You might want to acknowledge that there was a very similar prior
registered trademark meaning of CL that is now abandoned, if for no
other reason than to say "that's not what we mean".

> Text Snipped
> > 
> > Second issue is how CL has been handled in the past.
> > 
> > Whether it was two links to two places that took completely 
> > different paths (trails in ITU speak but this is IETF where 
> > we say path), or two parallel links, this has been called 
> > ECMP in IETF (and elsewhere) for two decades or more.  Both 
> > ISIS and OSPF use the term ECMP.  The techniques used for 
> > ECMP load balance was discussed on IETF lists quite a bit in 
> > the early to mid-1990s.  The three techniques applied to IP 
> > networks (in the terminology of that time) were:
> > 
> >   1.  per packet load balance
> >   2.  per bit or byte load balance aka bit striping or inverse-mux
> >   3.  IP src/dst hash
> > 
> > The second is applicable only to parallel links.  Using 
> > larger chunks it is also the technique used in MPPP 
> > (multilink PPP).  MPPP is also sometimes abbreviated PPP-ML, 
> > though not in the RFC.  MPPP is no longer of much interest as 
> > it was only applied to low speed links.
> > 
> > The per packet load balance caused packet reorder and a great 
> > deal of grief for service providers, hence the abundance of 
> > discussion within IETF at the time.  The use of IP src/dst 
> > hash, while widespread and widely discussed, did not get 
> > documented in an RFC until Chris Hopps and Dave Thaler wrote 
> > RFC 2991 "Multipath Issues in Unicast and Multicast Next-Hop 
> > Selection" and RFC 2992 "Analysis of an Equal-Cost Multi-Path 
> > Algorithm" in November 2000.  (at least AFAIK).
> > 
> > The IP src/dst technique itself is believed to have 
> > originated in the T1-NSFNET, which puts its use back to circa 1987.
> > 
> > The OMP work predates RFC 2991 and RFC 2992 but never made it 
> > past the internet-draft stage.  In that work the use of 
> > src/dst hash and the use of adaptive algorithms with src/dst 
> > hash is discussed.  On the IETF mailing lists even methods of 
> > implementation were discussed, table based and parallel sets 
> > of comparator pairs (TCAM like).
> > 
> > Circa 2000 there was a lot of discussion of the use of the 
> > MPLS label stack to provide the entropy for ECMP vs looking 
> > past the label stack at the IP payload.  Today's PW control 
> > word acknowledges this common practice and avoids it for PW, 
> > but the fat-pw aka entropy label puts better entropy back into PW.
> > 
> > In practice today, all core hardware uses the same IP src/dst 
> > hash to provide a load balance for ECMP and LAG.
> > 
> > 
> > The existing internet-draft acknowledges link bundling, but 
> > does not accurately characterize ECMP and LAG and the src/dst 
> > hash technique used by both, nor does it acknowledge the 
> > prior OMP work.
> > 
> The existing draft states some of these points already, but there is
> certainly more background information that you provide. It seems that
> you have more specific suggestions on the text in the draft and that is
> where we propose specific changes to address your comments. Another
> approach could be to add more text on IP-related load balancing as
> compared with the MPLS-based load balancing which is the focus of the
> draft.

There are two things I'd like to see changed, though I was not clear in
that I didn't provide suggested changes to the text.  If you agree in
principle, then I can provide some text.

The two are:

  1.  Accurately characterize what exists today, what existing CL
      techniques have come before this, in use or not, and accurately
      characterize the common use cases of existing CL.

  2.  State as a requirement (we are at the requirement stage) that to
      the extent possible new CL capability will:

      1.  Continue to accommodate common use cases today, including an
          ability to carry IP traffic which MAY be omitted in an
          implementation but MUST be accommodated, at least as an
          option, by any proposed solution.

      2.  Retain backward compatibility with existing MPLS/GMPLS LSR
          with no loss of existing capability, but possibly no gain in
          functionality if the legacy LSR is anywhere on the LSP path,
          including as an LER.

If the characterization of existing CL gets too long it could be a
separate informational internet-draft that is referenced but I don't
think it will get that long.

> Could you provide a URL for the prior OMP work that we can add as
> informative reference.

Did you want one that still works?  :-)

BTW, it's a part of your current employer that very abruptly shut down
the web site that had this and some other work on it (UUNET, after I
left, shut down engr.ans.net).  Some content was never recovered, but
this is just an aside.

Data tracker is probably the best reference.


There are also IETF WG meeting minutes and lots of mailing list
archive discussion but no need to reference them.

> Also, the scope of the draft is MPLS, which is not covered specifically
> in the points above. 
> Remainder of original message snipped.

The scope of the draft being only MPLS is one of the big issues.  The
existing use cases for LAG/ECMP are:

  IP traffic:
    provider core networks:
      hash based on IP source and IP destination address
    provider non-core:
      hash based on IP source and IP destination address
      and sometimes UDP and TCP ports (but rarely)
      hash based on IP source and IP destination and UDP or TCP port

  MPLS traffic:
    hash based on a subset of the MPLS stack
    size of the subset varies with vendors
    below some total stack depth (typically 8) the BOS label is included.
    number of labels included typically varies from 3-8
    less than 3 labels doesn't work well due to inadequate diversity
    using the top label very rarely (if ever) works well
    (VPN label, LDP label typically are not diverse enough)
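As a rough illustration of the hash inputs listed above, here is a
minimal Python sketch.  Everything in it is an assumption made for the
example: the function names are hypothetical, CRC32 merely stands in
for whatever hash a given vendor uses, and the rule for picking labels
(the deepest max_labels labels within a parse window of max_depth) is
one plausible reading of the outline, not a description of any real
implementation.

```python
import struct
import zlib
from ipaddress import ip_address


def ip_hash_key(src, dst, sport=None, dport=None):
    """Hash input for IP traffic: src/dst address, optionally the ports."""
    key = ip_address(src).packed + ip_address(dst).packed
    if sport is not None and dport is not None:
        key += struct.pack("!HH", sport, dport)
    return key


def mpls_hash_key(labels, max_labels=4, max_depth=8):
    """Hash input for MPLS traffic.

    labels is ordered top of stack first, so the last entry is the
    bottom-of-stack (BOS) label.  Assumption: the parser can reach at
    most max_depth labels and hashes the deepest max_labels of those,
    so the BOS label is included whenever the stack is shallow enough.
    """
    window = labels[:max_depth]    # labels the parser can reach
    chosen = window[-max_labels:]  # deepest labels inside that window
    return b"".join(struct.pack("!I", label) for label in chosen)


def flow_hash(key):
    """Stand-in hash; real hardware uses vendor-specific functions."""
    return zlib.crc32(key)
```

With a five-label stack and the defaults, the top label is skipped and
the four labels nearest the bottom of stack (including BOS) feed the
hash, which matches the observation that the top label alone is rarely
diverse enough.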

What an implementation does with the hashed value varies:

  At one extreme is the simplicity of just doing a modulo.  This
  doesn't work if component links are not all the same bandwidth.

  A little better is a table lookup based on the hash which allows a
  more even split across component links that are not the same.

  Still better is a table lookup (or other implementation) that uses
  feedback to adjust load balance.  This includes two subcategories:

    Feedback is internal to a single NE (Avici's CL is an example), is
    transparent to other NE, and requires no signaling changes.  Avici
    calls this CL but I've also heard the term Adaptive LAG somewhere.

    Feedback is external and requires signaling extensions (OMP is an
    example but was never implemented by an equipment vendor).
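The first two options above can be sketched in a few lines of Python.
This is only an illustration under stated assumptions: the function
names, the table size of 64, and the use of per-link weights
proportional to bandwidth are all hypothetical choices, not taken from
any vendor implementation.

```python
def pick_modulo(hash_value, n_links):
    """Simplest mapping: only balances well if all links are equal."""
    return hash_value % n_links


def build_table(weights, size=64):
    """Fill a lookup table with link indices in proportion to weights.

    Rounding means the table length may differ slightly from size, and
    a link with a very small weight may receive no entries at all.
    """
    total = sum(weights)
    table = []
    for link, weight in enumerate(weights):
        table.extend([link] * round(size * weight / total))
    return table


def pick_table(hash_value, table):
    """Table lookup: the hash selects a slot, the slot names the link."""
    return table[hash_value % len(table)]
```

For two component links with weights 3 and 1 and a table of 8 slots,
six slots point at the first link and two at the second, so roughly
three quarters of the hash space lands on the larger link.  A
feedback-based scheme could be layered on top by periodically
rebuilding the table from adjusted weights; how those weights are
measured and adjusted is outside this sketch.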

The existing use of Link Bundling for MPLS is:

  Maximum LSP Bandwidth is advertised.

  Accounting is on a per component basis.

  The RRO identifies both the component link and label and therefore
  cannot be changed.

This is very rough text (an outline really).  If you agree in
principle to add something I can clean it up.

The CL requirements should specify a delta based on what exists.  We
should also acknowledge a problem with LAG that there is no way to
signal characteristics of the LAG.  Having this as a requirement makes it
easier to propose a document to fix this (though not necessarily the
same document that proposes a new, possibly MPLS-only CL).  All that
is really needed for LAG is a maximum allowed microflow size (size of
component link minus epsilon for adaptive LAG, a configured fraction
of component link size for simple LAG).  An LSP can then signal a
largest expected microflow.  For example, an LSP carrying nothing but
a set of 1GbE PW is not going to have a microflow larger than 1G.
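The check implied here is simple enough to sketch in Python.  The
names, the units (Mb/s), and the default epsilon and fraction values
are assumptions made for the example; no such signaling exists today,
which is the point of the proposed requirement.

```python
def max_microflow(component_bw, adaptive, epsilon=100, fraction=0.25):
    """Largest microflow a LAG would advertise, in Mb/s.

    Adaptive LAG: component link size minus a small epsilon.
    Simple LAG: a configured fraction of the component link size.
    """
    if adaptive:
        return component_bw - epsilon
    return component_bw * fraction


def lsp_fits(largest_expected_microflow, lag_max_microflow):
    """True if the LSP's largest expected microflow fits on the LAG."""
    return largest_expected_microflow <= lag_max_microflow
```

For a LAG of 10G components, an LSP carrying only 1GbE PWs would signal
a largest expected microflow of 1000 Mb/s.  With the example defaults
that fits both an adaptive LAG (9900 Mb/s limit) and a simple LAG
configured at one quarter of the component size (2500 Mb/s limit).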

The requirement document need not preclude proposing a CL type that is
MPLS only, but it should definitely not mandate it.

Do you agree in principle to this sort of change?  If you at least
partially agree I'll write something more concrete that you could
