
Composite Link Requirements

Subject: Composite Link Requirements
From: Curtis Villamizar
Date: Sat, 27 Feb 2010 04:00:22 -0500
Hi there good people of RTGWG,

This is in regards to the goals that are embodied in the RTGWG
acceptance of a draft to deal with requirements for composite link,
currently named draft-ietf-rtgwg-cl-requirement-00.txt

I'm bringing up two issues in this email.  One is prior composite link
work and the other is prior methods of handling composite link, which
should be acknowledged.  After that I just have some comments and
questions on the draft.

First issue is prior composite link work, both under the name
"composite link" and under other names.

A search of US trademarks yields:

  Goods and Services     (ABANDONED) IC 009. US 021 023 026 036 038. G &
  S: Computer software, namely, routing software for use in enabling
  multiple links or fiber interfaces between routers to be grouped into
  a single logical connection
  Standard Characters Claimed   
  Mark Drawing Code   (4) STANDARD CHARACTER MARK
  Serial Number     78363042
  Filing Date       February 5, 2004
  Current Filing Basis       1B
  Original Filing Basis      1B
  Published for Opposition     June 21, 2005
  Owner   (APPLICANT) Avici Systems, Inc. CORPORATION DELAWARE 101
  Billerica Avenue North Billerica MASSACHUSETTS 018621256
  Attorney of Record     John L. DuPre'
  Type of Mark   TRADEMARK
  Register       PRINCIPAL
  Live/Dead Indicator   DEAD
  Abandonment Date      September 14, 2007

Since this may predate (and certainly overlaps) the ITU use of the
term "composite link", I think Avici should be acknowledged if we
continue to use the term composite link.

The definition is nice:

  Computer software, namely, routing software for use in enabling
  multiple links or fiber interfaces between routers to be grouped
  into a single logical connection.

Note that ITU's G.800 does not define what a composite link is and
only mentions composite four times in the document, including use of
composite link and composite trail.  The figure indicates that a
composite link is "inverse multiplexing".  For this reason, I don't
think G.800 should be referenced because it's a big load of **** with
only slight mention of CL.

Prior to the registered trademark, composite link was used by Avici as
a plain old trademark.  Although the registered trademark does not
predate Ethernet Link Aggregation, the product and technique to which
the registered trademark applies do predate it (1998 vs circa 2000
for IEEE).  The techniques are the same except that IEEE focused solely
on Ethernet, while Avici made use of extensions to PPP (as in POS).

In any case the aspect of the composite link dealt with in IEEE 802.1AX
(formerly 802.3ad) or Avici's composite link was the protocol to
negotiate the binding.

Second issue is how CL has been handled in the past.

Whether it was two links to two places that took completely different
paths (trails in ITU speak but this is IETF where we say path), or two
parallel links, this has been called ECMP in IETF (and elsewhere) for
two decades or more.  Both ISIS and OSPF use the term ECMP.  The
techniques used for ECMP load balance were discussed on IETF lists
quite a bit in the early to mid-1990s.  The three techniques applied
to IP networks (in the terminology of that time) were:

  1.  per packet load balance
  2.  per bit or byte load balance aka bit striping or inverse-mux
  3.  IP src/dst hash

The second is applicable only to parallel links.  Using larger chunks
it is also the technique used in MPPP (multilink PPP).  MPPP is also
sometimes abbreviated PPP-ML, though not in the RFC.  MPPP is no
longer of much interest as it was only applied to low speed links.

The per packet load balance caused packet reorder and a great deal of
grief for service providers, hence the abundance of discussion within
IETF at the time.  The use of IP src/dst hash, while widespread and
widely discussed, did not get documented in an RFC until Chris Hopps
and Dave Thaler wrote RFC 2991 "Multipath Issues in Unicast and
Multicast Next-Hop Selection" and RFC 2992 "Analysis of an Equal-Cost
Multi-Path Algorithm" in November 2000.  (at least AFAIK).

The IP src/dst technique itself is believed to have originated in the
T1-NSFNET, which puts its use back to circa 1987.
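
A minimal sketch of the difference between techniques 1 and 3 above,
in Python.  The function names, the link count and the use of CRC32
are assumptions made for illustration only; this is not the hash used
by any particular router or by the T1-NSFNET.

```python
import ipaddress
import zlib
from itertools import count

N_LINKS = 4  # assumed number of parallel links or equal-cost next hops

_rr = count()

def per_packet(src: str, dst: str) -> int:
    """Technique 1: round robin per packet, ignoring which flow it belongs to."""
    return next(_rr) % N_LINKS

def src_dst_hash(src: str, dst: str) -> int:
    """Technique 3: hash the address pair so a given pair always maps to one link."""
    key = ipaddress.ip_address(src).packed + ipaddress.ip_address(dst).packed
    # CRC32 is only a stand-in; real forwarding hardware uses its own hash.
    return zlib.crc32(key) % N_LINKS

# Five back-to-back packets of one src/dst pair.
packets = [("10.0.0.1", "192.0.2.7")] * 5
rr_links = [per_packet(s, d) for s, d in packets]
hash_links = [src_dst_hash(s, d) for s, d in packets]
print("per packet:  ", rr_links)    # spread over several links, so reorder is possible
print("src/dst hash:", hash_links)  # one link only, so order is preserved
```

Packets of the same src/dst pair stay on one link with the hash, which
is why it avoids the reorder problem that per packet load balance
caused.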

The OMP work predates RFC 2991 and RFC 2992 but never made it past the
internet-draft stage.  In that work the use of src/dst hash and the
use of adaptive algorithms with src/dst hash is discussed.  On the IETF
mailing lists even methods of implementation were discussed, table
based and parallel sets of comparator pairs (TCAM like).
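
The two implementation styles mentioned above can be sketched roughly
as follows.  This is an illustration under assumptions, not the OMP
algorithm or any vendor's implementation: the 16-bit hash space, the
64-entry table, the weights (taken here to be link speeds) and all
function names are hypothetical.

```python
from bisect import bisect_right
from itertools import accumulate

HASH_SPACE = 1 << 16  # assumed width of the hash value

def build_bounds(weights):
    """Comparator style: exclusive upper bound of each link's hash range."""
    total = sum(weights)
    return [HASH_SPACE * c // total for c in accumulate(weights)]

def lookup_by_range(h, bounds):
    """Find the link whose range contains hash value h."""
    return bisect_right(bounds, h)

def build_table(weights, n_buckets=64):
    """Table style: each bucket of the hash space stores a link index."""
    bounds = build_bounds(weights)
    step = HASH_SPACE // n_buckets
    return [lookup_by_range(b * step, bounds) for b in range(n_buckets)]

def lookup_by_table(h, table):
    """Index the table with the high-order part of the hash value."""
    return table[h * len(table) // HASH_SPACE]

weights = [10, 10, 40]           # e.g. two slower members and one faster member
bounds = build_bounds(weights)
table = build_table(weights)
print(bounds)
print([table.count(link) for link in range(len(weights))])
```

In either style, an adaptive scheme only has to move range boundaries
or rewrite table entries; the per-packet hash itself does not change.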

Circa 2000 there was a lot of discussion of the use of the MPLS label
stack to provide the entropy for ECMP vs looking past the label stack
at the IP payload.  Today's PW control word acknowledges this common
practice and avoids it for PW, but the fat-pw aka entropy label puts
better entropy back into PW.
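
To illustrate the label stack versus payload question, here is a rough
sketch of how a hash key might be formed.  The label stack entry
layout follows RFC 3032 and the control word's leading zero nibble
follows RFC 4385; the function names and the choice of hash inputs are
assumptions, not a description of any specific LSR.

```python
import struct

def mpls_entry(label: int, bottom: bool, ttl: int = 64) -> bytes:
    """Build one 32-bit label stack entry: label(20) | TC(3) | S(1) | TTL(8)."""
    return struct.pack("!I", (label << 12) | (int(bottom) << 8) | ttl)

def parse_label_stack(pkt: bytes):
    """Return (labels, payload), walking entries until the bottom-of-stack bit."""
    labels, off = [], 0
    while True:
        (entry,) = struct.unpack_from("!I", pkt, off)
        off += 4
        labels.append(entry >> 12)
        if (entry >> 8) & 1:
            return labels, pkt[off:]

def hash_key(pkt: bytes) -> bytes:
    """Labels always contribute; IPv4 addresses are added if the payload looks like IPv4."""
    labels, payload = parse_label_stack(pkt)
    key = b"".join(struct.pack("!I", label) for label in labels)
    # Heuristic: a first nibble of 4 is taken to mean an IPv4 header follows.
    if len(payload) >= 20 and payload[0] >> 4 == 4:
        key += payload[12:20]  # IPv4 source and destination addresses
    return key

stack = mpls_entry(100, False) + mpls_entry(200, True)
ipv4 = bytes([0x45]) + bytes(11) + bytes([10, 0, 0, 1]) + bytes([192, 0, 2, 7])
raw_pw = b"\x45" * 20            # PW payload that happens to start with nibble 4
cw_pw = bytes(4) + raw_pw        # same payload behind a control word (nibble 0)

print(hash_key(stack + ipv4).hex())    # labels plus IP src/dst
print(hash_key(stack + raw_pw).hex())  # payload mistaken for IPv4
print(hash_key(stack + cw_pw).hex())   # labels only
```

With the control word present the PW is hashed on labels alone, which
is where an added entropy label would supply the per-flow variation.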

In practice today, all core hardware uses the same IP src/dst hash to
provide a load balance for ECMP and LAG.

The existing internet-draft acknowledges link bundling, but does not
accurately characterize ECMP and LAG and the src/dst hash technique
used by both, nor does it acknowledge the prior OMP work.

Comments on the draft:

The following statements may be inaccurate:

   The Link Bundle concept is somewhat limited because of the
   requirement that all component links must have identical
   capabilities, and because it applies only to TE links.

     This may be inaccurate.  I don't think there is a requirement
     that a link bundle use identical links.

     In any case, both Avici composite links and many LAG
     implementations allow a mix of member speeds and neither was
     applicable to TE links only.

The following should be replaced:

   Traffic Flow: A set of packets that with common identifier
   characteristics that the composite link is able to use to aggregate
   traffic into Connections.  Identifiers can be an MPLS label stack
   or any combination of IP addresses and protocol types for routing,
   signaling and management packets.

Diffserv already defined a microflow to be the same thing.  We should
not invent new terms to mean the same thing as existing terms.  We can
just point out that labels can also be used to identify a microflow.

This statement is definitely inaccurate for a number of reasons:

   o  ECMP/Hashing/LAG: IP traffic composed of a large number of flows
      with bandwidth that is small with respect to the individual link
      capacity can be handled relatively well using ECMP/LAG
      approaches.  However, these approaches do not make use of MPLS
      control plane information nor traffic volume
      information. Distribution techniques applied only within the
      data plane can result in less than ideal load balancing across
      component links of a composite link.

  Avici used feedback from the egress port to the ingress port on
  traffic volume and queue occupancy to influence the distribution of
  the hash.  There is nothing in the definition of ECMP to prohibit
  this and the OMP technique explicitly called for doing so and
  proposed protocol extension to be able to go beyond just a decision
  within a single NE as Avici did.
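
  As a rough illustration of hash distribution driven by traffic
  volume feedback, the sketch below moves hash buckets from the most
  utilized component link to the least utilized one.  It is a
  hypothetical example only: it is neither Avici's algorithm nor the
  OMP proposal, and the names, the bucket granularity and the move rule
  are all assumptions.

```python
def rebalance(table, bucket_load, capacity):
    """Move one hash bucket from the most utilized link to the least utilized.

    table[b] is the link for bucket b, bucket_load[b] its measured traffic,
    capacity[i] the capacity of link i (all in the same units).
    Returns a new table, or the same table object if no move helps.
    """
    n = len(capacity)
    load = [0.0] * n
    for b, link in enumerate(table):
        load[link] += bucket_load[b]
    util = [load[i] / capacity[i] for i in range(n)]
    hot = max(range(n), key=lambda i: util[i])
    cold = min(range(n), key=lambda i: util[i])
    movable = [b for b, link in enumerate(table)
               if link == hot and bucket_load[b] > 0]
    if hot == cold or not movable:
        return table
    b = min(movable, key=lambda x: bucket_load[x])  # lightest loaded bucket
    # Only move if the cold link stays less utilized than the hot link was.
    if (load[cold] + bucket_load[b]) / capacity[cold] >= util[hot]:
        return table
    new_table = list(table)
    new_table[b] = cold
    return new_table

table = [0, 0, 0, 0, 1, 1, 1, 1]        # initial even split of 8 buckets
bucket_load = [4, 3, 2, 1, 1, 1, 1, 1]  # measured traffic per bucket
capacity = [10, 10]
for _ in range(10):                     # bounded number of feedback rounds
    nxt = rebalance(table, bucket_load, capacity)
    if nxt is table:
        break
    table = nxt
print(table)
```

  Each packet is still forwarded by a plain hash lookup; only the
  mapping behind the hash is adjusted from measurements.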

This is inaccurate:

  o 2.  It does not support a set of component links with different
        characteristics (e.g., different bandwidth and/or latency).

      For example, in practice carriers commonly use link bandwidth
      and link latency to set link TE metrics for RSVP-TE.  For
      RSVP-TE, limiting the component links to same TE metric has the
      practical effect of dis-allowing component links with different
      link bandwidth and latencies.

There is no formal meaning to the link metric in ISIS or OSPF.

Under inverse-mux: the real problem with inverse-mux is that the amount
of bandwidth that needs to be multiplexed greatly exceeds the capacity
of the fastest single packet processing element, and therefore it
doesn't work.  The latency argument is not really valid.

I think that the ability of an LSR to measure latency on an LSP and
report a latency figure or route based on lowest latency is almost
orthogonal to the problem of composite link.  If latency and bandwidth
at each holding priority is advertised, then we have a cross product
of advertisements.  For example, you can have 1 Gb/s at 10msec at
pri#1, but if you can live with 12msec you can have 2Gb/s, or at 14
msec 3Gb/s, but at pri#2 you only get ... and so on for 8 priorities.
Is this what we're aiming for?
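
To put a number on the cross product, the small sketch below counts
the values that would have to be advertised per link, assuming the 8
priorities and the three example latency figures above.  The variable
names are illustrative and nothing here comes from the draft.

```python
from itertools import product

holding_priorities = range(8)      # the 8 priorities mentioned above
latency_bounds_ms = [10, 12, 14]   # the three example latency figures

# One available-bandwidth value is needed per (priority, latency) pair.
entries = list(product(holding_priorities, latency_bounds_ms))
print(len(entries))  # 8 * 3 = 24 values per link
```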

The table at the beginning of section 4 is meaningless.

In this section: Traffic Flow and Connection Mapping

   The solution SHALL support operator assignment of traffic flows to
   specific connections.

   The solution SHALL support operator assignment of connections to
   specific component links.

How is this supposed to work for signaled LSP where the component
links are not identified in control signaling?  Is this scalable from
a configuration standpoint or only applicable to statically configured
MPLS cross connect?

   In order to prevent packet loss, the solution must employ make-
   before-break when a change in the mapping of a connection to a
   component link mapping change has to occur.

Only the ingress of an LSP can initiate make-before-break and the
ingress doesn't know about the component links.  In RFC3209,
make-before-break involves a new LSP using the same tunnel-id.
Are you using a different meaning for make-before-break?

Regarding this statement:

   The solution SHALL support management plane controlled parameters
   that define at least a minimum bandwidth, maximum bandwidth,
   preemption priority, and holding priority for each connection
   without TE information (i.e., LDP signaled LSP that does not
   contain the same information as an RSVP-TE signaled LSP).

Could you explain how preemption would work for LDP?  Do you plan to
withdraw the FEC?  If so, for how long?  Forever?  If not forever,
would the traffic periodically come back, get remeasured and withdrawn
again?

What does this mean?

   o  Bandwidth of the highest and lowest speed

Overall I find many of the stated requirements to be unclear.  Perhaps
some discussion and improvements to the wording will bring clarity.
Or maybe I'm just dense.
