
composite link - candidate for respin, maybe

Subject: composite link - candidate for respin, maybe
From: Curtis Villamizar
Date: Fri, 26 Mar 2010 03:01:54 -0400
This email is to the authors of the CL drafts and the WG as a whole.

I started out writing a quick few bullets on what we talked about in
today's WG meeting that we seemed to agree on and that made some sense
to me.  The latter part may be a filter that introduces artifacts to
the signal (or noise).

That didn't seem to stand alone so I added some terms up front and
then some introductory text, trying to keep it concise.

What I would like to ask the WG is whether I'm on a reasonable track
here.  [hopefully we won't have continued dead silence on the list.]

What I would like to ask the authors is whether they would consider
this a reasonable restart point.

I've tried to keep in mind the chair's instructions and what I sensed
to be the feelings in the audience (though we didn't formally hum - is
formal WG hum an oxymoron?).

  keep it short - 5 pages including boilerplate if possible

  include clear requirements

  do not mandate implementation details

I'm not sure if the usage scenarios at the end are needed or if this
could stand alone.

There was a lot of chatter in the room so I apologize if I missed
something.  This is based on my recollection which is known to have a
fairly high bit error rate even on better days.


btw - this is obviously not an I-D as is, for more reasons than missing
boilerplate.  It's also late and I'm tired so I made no attempt to get
citations right.

Key terms:

  flow - A flow in the context of this document is an aggregate of
    traffic for which packets should not be reordered.  A flow is
    similar to a microflow or ordered aggregate as defined in
    [diffserv framework].  The term "flow" is used here for brevity.
    This definition of flow should not be interpreted to have broader
    scope than this document.

  flow identification - The means of identifying a flow or a group of
    flows may be specific to a type of payload.

  top label entry - In MPLS the top label entry contains the label on
    which an initial forwarding decision is made.  This label may be
    popped and the forwarding decision may involve further labels but
    that is immaterial to this discussion.

  label stack - In MPLS the label stack includes all of the MPLS
    labels from the top of the stack to the label marked with the
    S-bit (Bottom of Stack bit) set.

  outer and inner LSP - The LSP associated with labels in the outer
    encapsulation are called outer LSP.  Those LSP which are
    associated with inner encapsulation (closer to the label entry
    containing the S-bit) are called inner LSP.  These are not called
    top and bottom LSP since MPLS and PWE draw the label stack in
    opposite directions with PWE putting the outermost label on the
    bottom of diagrams (and confusing people in doing so).

  component link - See RFC4201.

  composite link - [pull from existing I-D]
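
  To make the label stack terms above concrete, here is a rough
  sketch that walks a stack from the top label entry down to the
  entry with the S-bit set.  It assumes the usual 4-byte label stack
  entry layout (20-bit label, 3-bit TC, 1-bit S, 8-bit TTL).  The
  function names are made up for illustration and are not taken from
  any draft or implementation.

```python
import struct


def encode_entry(label, tc, s_bit, ttl):
    """Pack one 4-byte label stack entry (hypothetical helper for the example)."""
    return struct.pack("!I", (label << 12) | (tc << 9) | (s_bit << 8) | ttl)


def parse_label_stack(data):
    """Return (label, tc, s_bit, ttl) tuples from the top label entry
    through the entry with the S-bit set.  Bytes after that entry are
    payload and are not examined here."""
    entries = []
    offset = 0
    while offset + 4 <= len(data):
        (word,) = struct.unpack_from("!I", data, offset)
        label = word >> 12          # upper 20 bits
        tc = (word >> 9) & 0x7      # 3 traffic class bits
        s_bit = (word >> 8) & 0x1   # bottom of stack bit
        ttl = word & 0xFF           # lower 8 bits
        entries.append((label, tc, s_bit, ttl))
        offset += 4
        if s_bit:
            return entries
    raise ValueError("no entry with the S-bit set was found")


# Outer LSP label 100, inner LSP label 200 (S-bit set), then payload.
packet = encode_entry(100, 0, 0, 64) + encode_entry(200, 0, 1, 64) + b"payload!"
stack = parse_label_stack(packet)
top_label = stack[0][0]
```

  In this example the top label entry carries label 100 (the outer
  LSP) and the entry marked with the S-bit carries label 200 (the
  inner LSP).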


  There is often a need to provide large aggregates of bandwidth that
  are best provided using parallel links between routers or MPLS LSR.
  In core networks there is often no alternative since the aggregate
  capacities of core networks today far exceed the capacity of a
  single physical link or single packet processing element.

  Today this requirement can be handled by Ethernet Link Aggregation
  [IEEE802.1AX], link bundling [RFC4201], or other aggregation
  techniques, some of which may be vendor specific.  Each has strengths
  and weaknesses.

  The term composite link is more general than terms such as link
  aggregate which is generally considered to be specific to Ethernet
  and its use here is consistent with the broad definition in [ITU

  Large aggregates of IP traffic do not provide explicit signaling to
  indicate the expected traffic loads.  Large aggregates of MPLS
  traffic are carried in MPLS tunnels supported by MPLS LSP.  LSP
  which are signaled using RSVP-TE extensions do provide explicit
  signaling which includes the expected traffic load for the
  aggregate.  LSP which are signaled using LDP do not provide an
  expected traffic load.

  MPLS LSP may contain other MPLS LSP arranged hierarchically.  When
  an MPLS LSR serves as a midpoint LSR in an LSP carrying other LSP as
  payload, there is no signaling associated with these inner LSP.
  Therefore even when using RSVP-TE signaling there may be
  insufficient information provided by signaling to adequately
  distribute load across a composite link.

  Generally a set of label stack entries that is unique across the
  ordered set of label numbers can safely be assumed to contain a
  group of flows.  The reordering of traffic can therefore be
  considered to be acceptable unless reordering occurs within traffic
  containing a common unique set of label stack entries.  Existing
  load splitting techniques take advantage of this property in
  addition to looking beyond the bottom of the label stack and
  determining if the payload is IPv4 or IPv6 to load balance traffic

  For example a large aggregate of IP traffic may be subdivided into a
  large number of groups of flows using a hash on the IP source and
  destination addresses.  This is as described in [diffserv
  framework].  For MPLS traffic carrying IP, a similar hash can be
  performed on the set of labels in the label stack.  These techniques
  are both examples of means to subdivide traffic into groups of flows
  for the purpose of load balancing traffic across aggregated link
  capacity.  The means of identifying a flow should not be confused
  with the definition of a flow.
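
  As a rough illustration of the two examples above, the sketch below
  maps a flow key to one of N component links.  CRC32 is only a
  stand-in for whatever hash an implementation might use, and the
  function names and key encodings are assumptions made for this
  example.  Nothing here is meant to suggest a required algorithm.

```python
import ipaddress
import zlib


def ip_flow_key(src, dst):
    """Build a key from the IP source and destination addresses."""
    return ipaddress.ip_address(src).packed + ipaddress.ip_address(dst).packed


def mpls_flow_key(labels):
    """Build a key from the ordered set of label numbers in the stack."""
    return b"".join(label.to_bytes(3, "big") for label in labels)


def select_component_link(flow_key, num_links):
    """Map a flow key to a component link index in [0, num_links).

    Packets with the same key always get the same index, so traffic
    within a group of flows is not reordered by the split."""
    return zlib.crc32(flow_key) % num_links


link_for_ip = select_component_link(ip_flow_key("192.0.2.1", "198.51.100.7"), 4)
link_for_mpls = select_component_link(mpls_flow_key([100, 200]), 4)
```

  The key only identifies a group of flows.  Many flows may share a
  key, and all of them land on the same component link.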

  Discussion of whether a hash based approach provides a sufficiently
  even load balance using any particular hashing algorithm or method
  of distributing traffic across a set of component links is outside
  of the scope of this document.

  The use of three hash based approaches is defined in RFCxxxx.  The
  use of hash based approaches is mentioned as an example of an
  existing set of techniques to distribute traffic over a set of
  component links.  Other techniques are not precluded.


  These requirements refer to link bundling solely to provide a frame
  of reference.  This requirements document does not intend to
  constrain a solution to build upon link bundling.  Meeting these
  requirements using extensions to link bundling is not precluded, if
  doing so is determined by later IETF work to be the best solution.

  The first few requirements listed here are met or partially met by
  existing link bundling behavior, including common behavior that is
  implemented when the all ones address (for example 0xFFFFFFFF for
  IPv4) is used.  This common behavior today makes use of a hashing
  technique as described in the introduction, though other behaviors
  are not precluded.

  1.  Aggregated control information which summarizes multiple
      parallel links into a single advertisement is required to reduce
      information load and improve scalability.

  2.  A means to support very large LSP is needed, including LSP whose
      total bandwidth exceeds the size of a single component link but
      whose traffic has no single flow greater than a component link.
      In link bundling this is supported by many implementations using
      the all ones address component addressing and hash based

      Note: some implementations impose further restrictions regarding
      the distribution of traffic across the set of identifiers used
      in flow identification.  Discussion of algorithms and
      limitations of existing implementations is out of scope for this
      requirements document.

  The remaining requirements are not met by existing link bundling.

  3.  In some cases more than one set of metrics is needed to accommodate a
      mix of capacity with different characteristics, particularly a
      bundle where a subset of component links have shorter delay.

  4.  A mechanism is needed to signal an LSP such that a component
      link with specific characteristics is chosen, if a preference
      exists.  For example, the shortest delay may be required for
      some LSP, but not required for others.

  5.  LSP signaling is needed to indicate a preference for placement
      on a single component link and to specifically forbid spreading
      that LSP over multiple component links based on flow
      identification beyond the outermost label entry.

  6.  A means to support non-disruptive reallocation of an existing
      LSP to another component link is needed.

  7.  A means is needed to populate the TE-LSDB with information
      regarding which links (per end) can support distribution of
      large LSP across multiple component links based on the component
      flows, and the characteristics of this capability.  Key
      characteristics are:

        a.  The largest single flow that can be supported.  This may
            or may not be related to the size of component links.

        b.  Characteristics of the flow identification method.  [These
            can be enumerated in this document or a later document. ]

        c.  Other characteristics?  [ Not sure if I got everything
            mentioned in the WG meeting. ]

  8.  Some means is needed for an LSP which allows distribution of
      flows across member links to indicate characteristics of flow
      distribution.  These characteristics include:

        a.  The largest flow expected.

        b.  Other? [ Did we identify any other?  There was some
            chatter about distribution of flows but no specific
            characteristic was called for - AFAIK ]

  9.  In some cases it may be useful to measure link parameters
      and reflect these in metrics.  Link delay is an example.

  10. Some uses require an ability to bound the sum of delay metrics
      along a path while otherwise taking the shortest path related to
      another metric.  [This was mentioned but seems a bit orthogonal
      to all but #3.]
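
  To illustrate what #10 is asking for, here is a minimal sketch of a
  path computation that minimizes one metric while bounding the sum
  of delay metrics.  It is a simple label-setting search, not a
  proposal for how an implementation should do this.  The graph
  format, function name, and metric values are all invented for the
  example, and metrics are assumed to be non-negative.

```python
import heapq


def delay_bounded_shortest_path(graph, src, dst, max_delay):
    """Return (cost, delay, path) for the lowest cost path whose summed
    delay is at most max_delay, or None if no such path exists.

    graph maps node -> list of (neighbor, cost, delay) tuples."""
    best_delay = {}                  # lowest delay already expanded per node
    heap = [(0, 0, src, [src])]      # ordered by cost, then delay
    while heap:
        cost, delay, node, path = heapq.heappop(heap)
        if node == dst:
            return cost, delay, path
        # Entries pop in cost order, so an earlier entry at this node
        # with equal or lower delay is at least as good on both metrics.
        if delay >= best_delay.get(node, float("inf")):
            continue
        best_delay[node] = delay
        for neighbor, link_cost, link_delay in graph.get(node, []):
            new_delay = delay + link_delay
            if new_delay <= max_delay:
                heapq.heappush(
                    heap, (cost + link_cost, new_delay, neighbor, path + [neighbor])
                )
    return None


# A-B-D is cheap but slow, A-C-D is expensive but fast.
example = {
    "A": [("B", 1, 10), ("C", 5, 2)],
    "B": [("D", 1, 10)],
    "C": [("D", 5, 2)],
}
loose = delay_bounded_shortest_path(example, "A", "D", 25)
tight = delay_bounded_shortest_path(example, "A", "D", 10)
```

  With a delay bound of 25 the search picks A-B-D.  With a bound of
  10 that path is excluded and it falls back to A-C-D.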


  [ A set of example scenarios was discussed.  We may want to capture
    them here (and maybe refine the examples). ]

rtgwg mailing list