composite link - candidate for respin, second try

Subject: composite link - candidate for respin, second try
From: Curtis Villamizar
Date: Mon, 29 Mar 2010 02:29:52 -0400
Good people of RTGWG,

The first round seemed to go reasonably well so I've incorporated
comments from Dave, Tony, and Lucy.

Obviously this is still just a start.

I've added editorial comments with [CV] that are not part of the text,
just reflecting what I changed.

I hope I've adequately reflected everyone's comments.  If not, please
comment on this version.

Thanks,

Curtis



Key terms:

  flow - A flow in the context of this document is an aggregate of
    traffic for which packets should not be reordered.  The term
    "flow" is used here for brevity.  This definition of flow should
    not be interpreted to have broader scope than this document.

    A flow in this context should not be confused with a microflow or
    an ordered aggregate as defined in [RFC2475].  Both share the
    requirement that reordering be avoided, but a microflow is
    specific to IP, whereas a flow as used here can be either IP or
    MPLS.

[CV] This is a rearrangement of the text in the last email.  It might
be a little more clear.  I filled in the reference to RFC2475.

  flow identification - The means of identifying a flow or a group of
    flows may be specific to a type of payload.  A particular flow
    identification method may isolate a group of one flow; however,
    that behaviour is neither precluded nor required.

[CV] Added last sentence based on Lucy's comments.

  top label entry - In MPLS the top label entry contains the label on
    which an initial forwarding decision is made.  This label may be
    popped and the forwarding decision may involve further labels, but
    that is immaterial to this discussion.

  label stack - In MPLS the label stack includes all of the MPLS
    labels from the top of the stack to the label marked with the
    S-bit (Bottom of Stack bit) set.

  outer LSP(s) and inner LSP(s) - The LSP(s) associated with labels in
    the outer encapsulation are called outer LSP(s).  The outer label
    stack entries are used for forwarding.  The remaining LSP(s), which
    are associated with the inner encapsulation (closer to the label
    entry containing the S-bit), are called inner LSP(s).  There is a
    single outermost LSP and a single innermost LSP, but there may be
    multiple outer and inner LSP.  These are not called top and bottom
    LSP since MPLS and PWE draw the label stack in opposite directions,
    with PWE putting the outermost label on the bottom of diagrams (and
    confusing people in doing so).

  component link - a physical link (e.g., Lambda, Ethernet PHY,
   SONET/SDH, OTN, etc.) with packet transport capability, or a
   logical link (e.g., MPLS LSP, Ethernet VLAN, MPLS-TP LSP, etc.).

  composite link - a group of component links, which can be considered
   as a single MPLS TE link or as a single IP link used for MPLS.  The
   ITU-T [ITU-T G.800] defines Composite Link Characteristics as those
   which make multiple parallel component links between two transport
   nodes appear as a single logical link from the network perspective.
   Each component link in a composite link can be supported by a
   separate server layer trail, i.e., the component links in a
   composite link can have the same or different properties such as
   latency and capacity.
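The label stack and outer/inner LSP terms above can be made concrete with a short sketch that walks a label stack from the outermost entry to the one with the S-bit set. It assumes the MPLS label stack entry layout of RFC 3032 (20-bit label, 3-bit Traffic Class field formerly called EXP, S-bit, 8-bit TTL). The function names are hypothetical, and treating a fixed number of leading labels as "outer" is an assumption made only for illustration, since how many labels are used for forwarding depends on the LSR.

```python
import struct
from typing import List, NamedTuple, Tuple


class LabelStackEntry(NamedTuple):
    label: int  # 20-bit label value
    tc: int     # 3-bit Traffic Class field (formerly EXP)
    s: int      # S-bit (Bottom of Stack)
    ttl: int    # 8-bit TTL


def parse_label_stack(packet: bytes) -> Tuple[List[LabelStackEntry], bytes]:
    """Split a packet into its label stack (outermost entry first) and payload.

    Entries are read until the one with the S-bit set; whatever follows
    that entry is returned as the payload.
    """
    entries: List[LabelStackEntry] = []
    offset = 0
    while offset + 4 <= len(packet):
        (word,) = struct.unpack_from("!I", packet, offset)
        offset += 4
        entry = LabelStackEntry(label=word >> 12, tc=(word >> 9) & 0x7,
                                s=(word >> 8) & 0x1, ttl=word & 0xFF)
        entries.append(entry)
        if entry.s:
            return entries, packet[offset:]
    raise ValueError("truncated label stack: no entry with the S-bit set")


def outer_and_inner(entries: List[LabelStackEntry], num_outer: int = 1):
    """Hypothetical split: the first num_outer labels are the outer LSP(s)."""
    labels = [e.label for e in entries]
    return labels[:num_outer], labels[num_outer:]
```

For example, a two-entry stack with label 100 outermost and label 200 carrying the S-bit yields outer LSP [100] and inner LSP [200] when one label is used for forwarding.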

Introduction:

  There is often a need to provide large aggregates of bandwidth that
  are best provided using parallel links between routers or MPLS LSR.
  In core networks there is often no alternative since the aggregate
  capacities of core networks today far exceed the capacity of a
  single physical link or single packet processing element.

[CV] Awaiting consensus on moving the following to an appendix.  I
think it belongs here.

  Today this requirement can be handled by Ethernet Link Aggregation
  [IEEE802.1AX], link bundling [RFC4201], or other aggregation
  techniques some of which may be vendor specific.  Each has strengths
  and weaknesses.

  The term composite link is more general than terms such as link
  aggregate, which is generally considered to be specific to Ethernet.
  Its use here is consistent with the broad definition in [ITU-T
  G.800].

  Large aggregates of IP traffic do not provide explicit signaling to
  indicate the expected traffic loads.  Large aggregates of MPLS
  traffic are carried in MPLS tunnels supported by MPLS LSP.  LSP
  which are signaled using RSVP-TE extensions do provide explicit
  signaling which includes the expected traffic load for the
  aggregate.  LSP which are signaled using LDP do not provide an
  expected traffic load.

  MPLS LSP may contain other MPLS LSP arranged hierarchically.  When
  an MPLS LSR serves as a midpoint LSR in an LSP carrying other LSP as
  payload, there is no signaling associated with these inner LSP.
  Therefore even when using RSVP-TE signaling there may be
  insufficient information provided by signaling to adequately
  distribute load across a composite link.

  Generally, a set of label stack entries that is unique across the
  ordered set of label numbers can safely be assumed to contain a
  group of flows.  Reordering of traffic can therefore be considered
  acceptable unless it occurs within traffic sharing a common unique
  set of label stack entries.  Existing load splitting techniques take
  advantage of this property and, in addition, look beyond the bottom
  of the label stack to determine whether the payload is IPv4 or IPv6
  so that traffic can be load balanced accordingly.
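As an illustration of the property described above, the sketch below builds the input to a flow hash from the whole label stack and, when the first nibble after the bottom of the stack is 4 or 6, from the IP source and destination addresses as well. A payload whose first nibble is 0, such as a pseudowire control word [RFC4385], is treated as non-IP and only the labels are used. The function names and the use of CRC-32 as the hash are assumptions chosen for illustration; real implementations differ in which fields they use and how they hash them.

```python
import struct
import zlib
from typing import Sequence


def flow_key(labels: Sequence[int], payload: bytes) -> bytes:
    """Build the bytes a hypothetical LSR would hash to identify a group of flows.

    The full label stack (outermost first) is always included.  If the first
    nibble of the payload looks like IPv4 (4) or IPv6 (6), the source and
    destination addresses are appended; otherwise only labels are used.
    """
    key = b"".join(struct.pack("!I", label) for label in labels)
    if not payload:
        return key
    version = payload[0] >> 4
    if version == 4 and len(payload) >= 20:
        key += payload[12:20]   # IPv4 source + destination addresses
    elif version == 6 and len(payload) >= 40:
        key += payload[8:40]    # IPv6 source + destination addresses
    return key


def flow_hash(labels: Sequence[int], payload: bytes) -> int:
    """Hash the flow key; CRC-32 is an arbitrary illustrative choice."""
    return zlib.crc32(flow_key(labels, payload))
```

Because every packet of a flow carries the same labels and addresses, every packet of that flow produces the same hash value, which is what keeps a flow on one component link.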

  For example a large aggregate of IP traffic may be subdivided into a
  large number of groups of flows using a hash on the IP source and
  destination addresses.  This is as described in [diffserv
  framework].  For MPLS traffic carrying IP, a similar hash can be
  performed on the set of labels in the label stack.  These techniques
  are both examples of means to subdivide traffic into groups of flows
  for the purpose of load balancing traffic across aggregated link
  capacity.  The means of identifying a flow should not be confused
  with the definition of a flow.
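A minimal sketch of subdividing an IP aggregate into groups of flows follows. It hashes the source and destination addresses into a fixed number of groups, then maps each group to a component link through a table. The group count of 256, the round-robin table, CRC-32, and all function names are assumptions for illustration, not recommendations; the point is only that the hash identifies a group of flows, not a flow.

```python
import ipaddress
import zlib
from typing import List

NUM_GROUPS = 256  # assumed number of flow groups (hash bins)


def flow_group(src: str, dst: str, num_groups: int = NUM_GROUPS) -> int:
    """Map an IP source/destination pair to one of num_groups groups of flows."""
    key = ipaddress.ip_address(src).packed + ipaddress.ip_address(dst).packed
    return zlib.crc32(key) % num_groups


def build_group_table(num_links: int, num_groups: int = NUM_GROUPS) -> List[int]:
    """Assign each group of flows to a component link index, round robin."""
    return [group % num_links for group in range(num_groups)]


def component_link(src: str, dst: str, table: List[int]) -> int:
    """Component link used for packets with this source/destination pair."""
    return table[flow_group(src, dst, len(table))]
```

Many distinct flows share each group, and all packets of any one flow land in the same group, so a flow is never split across component links.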

  Discussion of whether a hash based approach provides a sufficiently
  even load balance using any particular hashing algorithm or method
  of distributing traffic across a set of component links is outside
  of the scope of this document.

  The current load balancing techniques are referenced in [RFC4385]
  and [RFC4928].  Three hash-based approaches are described in
  [RFC2991] and [RFC2992].  A mechanism to identify flows within PW
  is described in [draft-ietf-pwe3-fat-pw].  The use of hash based
  approaches is mentioned as an example of an existing set of
  techniques to distribute traffic over a set of component links.
  Other techniques are not precluded.
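[RFC2991] describes three ways to map a hash value onto a set of next hops: modulo-N, hash-threshold, and highest random weight (HRW); [RFC2992] analyzes hash-threshold. The sketch below implements each in its simplest form, applied here to component links rather than next hops, plus a helper that measures how many keys change component link when a link is added. Using CRC-32 as the HRW weight function and all function names are assumptions.

```python
import zlib
from typing import Callable, Sequence

HASH_SPACE = 2 ** 32  # size of the key space for a 32-bit hash


def modulo_n(key: int, n: int) -> int:
    """Modulo-N: component link index is the key modulo the number of links."""
    return key % n


def hash_threshold(key: int, n: int) -> int:
    """Hash-threshold: split the key space into n equal regions."""
    return key * n // HASH_SPACE


def highest_random_weight(flow: bytes, links: Sequence[str]) -> str:
    """HRW: compute a weight for each (flow, link) pair and pick the largest."""
    return max(links, key=lambda link: zlib.crc32(flow + link.encode()))


def fraction_moved(select: Callable[[int, int], int], keys: Sequence[int],
                   n_before: int, n_after: int) -> float:
    """Fraction of keys whose component link changes when n_before -> n_after."""
    moved = sum(1 for k in keys if select(k, n_before) != select(k, n_after))
    return moved / len(keys)
```

With keys spread evenly over the hash space, going from 4 to 5 component links moves about 80% of keys under modulo-N and about 50% under hash-threshold. Under HRW, removing a link moves only the flows that were on that link, since every other flow keeps its highest-weight link.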

[CV] Added RFC references.


Requirements:

  These requirements refer to link bundling solely to provide a frame
  of reference.  This requirements document does not intend to
  constrain a solution to build upon link bundling.  Meeting these
  requirements using extensions to link bundling is not precluded, if
  doing so is determined by later IETF work to be the best solution.

  0.  The IETF imposes the following requirements on new protocol work:

      a.  New protocols should not be invented where existing
          protocols can be extended to meet the same requirements.

      b.  Protocol extensions must retain compatibility with widely
          implemented and widely deployed protocols and practices to
          the greatest extent possible.

[CV] Added this as a reminder to all of us.  If anyone has a citation
for either of these points, that would help.  The wording should be
approximately right and the spirit of this wording consistent with
IETF process.  Maybe it's just the routing area that imposes these
restrictions.  Help from chairs or ADs on this would be appreciated.

  The first few requirements listed here are met or partially met by
  existing link bundling behavior including common behaviour that is
  implemented when the all ones address (for example 0xFFFFFFFF for
  IPv4) is used.  This common behaviour today makes use of a hashing
  technique as described in the introduction, though other behaviours
  are not precluded.

  1.  Aggregated control information which summarizes multiple
      parallel links into a single advertisement is required to reduce
      information load and improve scalability.

  2.  A means to support very large LSP is needed, including LSP whose
      total bandwidth exceeds the size of a single component link but
      whose traffic has no single flow greater than a component link.
      In link bundling this is supported by many implementations using
      the all ones address in component addressing together with hash
      based techniques.

      Note: some implementations impose further restrictions regarding
      the distribution of traffic across the set of identifiers used
      in flow identification.  Discussion of algorithms and
      limitations of existing implementations is out of scope for this
      requirements document.
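Requirements 1 and 2 can be illustrated together. The first function summarizes component links into one advertisement, sketching the link bundling rules of [RFC4201] (maximum reservable and unreserved bandwidth are summed; maximum LSP bandwidth is the largest component value). This shows why an LSP larger than any single component cannot be placed on a bundle without spreading it. The second function is a hypothetical admission test for spreading such an LSP. It assumes the hash may place any flow on any component link, so the largest flow is compared against the smallest available component capacity. Neither function reflects a specific implementation, and the names are illustrative.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class ComponentTe:
    """TE parameters of one component link; lists hold one value per priority."""
    max_reservable_bw: float
    unreserved_bw: List[float]
    max_lsp_bw: List[float]


def summarize_bundle(components: Sequence[ComponentTe]) -> ComponentTe:
    """Aggregate component parameters into a single advertisement (requirement 1)."""
    if not components:
        raise ValueError("a bundle needs at least one component link")
    return ComponentTe(
        max_reservable_bw=sum(c.max_reservable_bw for c in components),
        # Unreserved bandwidth per priority: sum over component links.
        unreserved_bw=[sum(v) for v in zip(*(c.unreserved_bw for c in components))],
        # Max LSP bandwidth per priority: largest single component value.
        max_lsp_bw=[max(v) for v in zip(*(c.max_lsp_bw for c in components))],
    )


def can_spread_lsp(lsp_bw: float, largest_flow: float,
                   available_per_component: Sequence[float]) -> bool:
    """Hypothetical requirement 2 check for spreading an LSP over components."""
    if not available_per_component:
        return False
    # The total must fit the bundle, and any flow must fit any component link.
    return (lsp_bw <= sum(available_per_component)
            and largest_flow <= min(available_per_component))
```

For example, with three component links of 10 units each, a 25-unit LSP whose largest flow is 3 units passes the check, while the same LSP with a 12-unit flow does not.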

  The remaining requirements are not met by existing link bundling.

  3.  In some cases more than one set of metrics is needed to accommodate a
      mix of capacity with different characteristics, particularly a
      bundle where a subset of component links have shorter delay.

  4.  A mechanism is needed to signal an LSP such that a component
      link with specific characteristics is chosen, if a preference
      exists.  For example, the shortest delay may be required for
      some LSP, but not required for others.

  5.  LSP signaling is needed to indicate a preference for placement
      on a single component link and to specifically forbid spreading
      that LSP over multiple component links based on flow
      identification beyond the outermost label entry.

[CV] Awaiting consensus.  What I had in mind was two choices, outer
only or all.  Do others think we need to specify looking some fixed
depth into the stack for the hash for a given LSP?  If so we need to
discuss possible forwarding speed consequences (hash and lookup can't
be done in parallel with hash disposed of if not needed).
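The two choices in the comment above, outer label only or the whole stack, can be sketched as a per-LSP mode. This is a hypothetical illustration; how the mode would be signaled is exactly what requirement 5 leaves open, and the names SpreadMode and select_component are invented here.

```python
import struct
import zlib
from enum import Enum
from typing import Sequence


class SpreadMode(Enum):
    OUTER_ONLY = "outer-only"  # whole LSP stays on one component link
    ALL = "all"                # spread using the full label stack


def select_component(labels: Sequence[int], mode: SpreadMode, num_links: int) -> int:
    """Pick a component link index for a packet; labels are outermost first."""
    hashed = labels[:1] if mode is SpreadMode.OUTER_ONLY else labels
    key = b"".join(struct.pack("!I", label) for label in hashed)
    return zlib.crc32(key) % num_links
```

Note that the hash input depends on per-LSP state found by looking up the outer label. That is the serialization of lookup and hashing the comment is concerned about.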

  6.  A means to support non-disruptive reallocation of an existing
      LSP to another component link is needed.

  7.  A means is needed to populate the TE-LSDB with information
      regarding which links (per end) can support distribution of
      large LSP across multiple component links based on the component
      flows, and regarding the characteristics of this capability.  Key
      characteristics are:

        a.  The largest single flow that can be supported.  This may
            or may not be related to the size of component links.

        b.  Characteristics of the flow identification method.  [These
            can be enumerated in this document or a later document. ]

        c.  Characteristics of the flow adjustment method.  [These
            can be enumerated in this document or a later document. ]

  8.  Some means is needed to specify desired characteristics of flow
      distribution for an LSP, regardless of whether the LSP is set up
      using RSVP-TE, LDP, or management plane.  Behaviour for IP must
      be configured using the management plane.  These characteristics
      include:

[CV] Reworded above paragraph to indicate that LDP and static LSP and
IP are not omitted from this definition.  Note that LDP can give
guidance but does not support TE so it can't be rejected and go
elsewhere if the guidance can't be followed.

        a.  The largest flow expected.

        b.  Characteristics of load adjustment.  For example, a
            maximum change frequency might be specified.  [These can
            be enumerated in this document or a later document. ]

  9.  In some cases it may be useful to measure link parameters
      and reflect these in metrics.  Link delay is an example.

 10.  Some uses require an ability to bound the sum of delay metrics
      along a path while otherwise taking the shortest path related to
      another metric.  Algorithms for accomplishing this are applied
      at an ingress, PCE, or in the management system and are out of
      scope.

[CV] Limited scope above.
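Requirement 10 leaves the algorithm out of scope. The sketch below only illustrates what is being asked for, combined with pruning on hypothetical TE-LSDB attributes of the kind requirements 7 and 8 describe. It uses a naive exhaustive search over simple paths, which is exponential in the worst case. The delay-constrained least-cost path problem is NP-hard in general, and practical solutions at an ingress, PCE, or management system would use other techniques. All field and function names are invented for this example.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class Link:
    """Hypothetical TE-LSDB view of one (composite) link."""
    to: str
    cost: int                      # metric being minimized
    delay: int                     # delay metric, e.g. microseconds
    max_lsp_bw: float              # largest LSP that fits on one component link
    can_distribute: bool           # requirement 7: may spread an LSP over components
    largest_flow_supported: float  # requirement 7a


def usable(link: Link, lsp_bw: float, largest_flow: float) -> bool:
    """Prune links that cannot carry the LSP (requirements 7a and 8a)."""
    return lsp_bw <= link.max_lsp_bw or (
        link.can_distribute and largest_flow <= link.largest_flow_supported)


def delay_bounded_path(graph: Dict[str, List[Link]], src: str, dst: str,
                       max_delay: int, lsp_bw: float, largest_flow: float
                       ) -> Optional[Tuple[int, int, List[str]]]:
    """Lowest-cost path with total delay <= max_delay, as (cost, delay, path)."""
    best: Optional[Tuple[int, int, List[str]]] = None

    def dfs(node: str, cost: int, delay: int, path: List[str]) -> None:
        nonlocal best
        # Stop if the delay bound is exceeded or this branch cannot improve.
        if delay > max_delay or (best is not None and cost >= best[0]):
            return
        if node == dst:
            best = (cost, delay, list(path))
            return
        for link in graph.get(node, []):
            if link.to not in path and usable(link, lsp_bw, largest_flow):
                path.append(link.to)
                dfs(link.to, cost + link.cost, delay + link.delay, path)
                path.pop()

    dfs(src, 0, 0, [src])
    return best
```

In a small example where A-B-D has cost 2 and delay 20 while A-C-D has cost 10 and delay 2, a delay bound of 25 selects A-B-D and a bound of 5 selects A-C-D.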

 11.  Impact of load balancing on OAM and mitigation techniques
      applicable to OAM must be documented.

 12.  Load balancing techniques must not oscillate.

[CV] Added above two based on Dave's comments.

[CV] Took suggestion to move use scenarios to framework.
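Requirements 8b, 9 and 12 all concern change over time. One common way to keep measured metrics and load adjustments from oscillating is to smooth inputs and damp reactions. The sketch below assumes an exponentially weighted moving average with a relative re-advertisement threshold for measured delay, and a hold-down interval plus a hysteresis band for moving groups of flows between component links. These techniques, the class names, and all parameter values are assumptions. They reduce the tendency to oscillate but do not by themselves prove that a given load balancing scheme is stable.

```python
from typing import Optional


class SmoothedDelay:
    """Requirement 9 sketch: smooth delay samples, re-advertise on large change."""

    def __init__(self, alpha: float = 0.25, threshold: float = 0.10) -> None:
        self.alpha = alpha          # EWMA weight given to each new sample
        self.threshold = threshold  # relative change that triggers re-advertisement
        self.smoothed: Optional[float] = None
        self.advertised: Optional[float] = None

    def add_sample(self, delay: float) -> bool:
        """Fold in one sample; return True if a new metric should be advertised."""
        if self.smoothed is None:
            self.smoothed = delay
        else:
            self.smoothed += self.alpha * (delay - self.smoothed)
        if (self.advertised is None
                or abs(self.smoothed - self.advertised) > self.threshold * self.advertised):
            self.advertised = self.smoothed
            return True
        return False


class Rebalancer:
    """Requirements 8b and 12 sketch: hold-down timer plus hysteresis."""

    def __init__(self, min_interval: float, hysteresis: float) -> None:
        self.min_interval = min_interval  # minimum seconds between changes
        self.hysteresis = hysteresis      # utilization gap required to act
        self.last_change: Optional[float] = None

    def should_move(self, now: float, load_from: float, load_to: float) -> bool:
        """Decide whether to move one group of flows off a busier component link."""
        if self.last_change is not None and now - self.last_change < self.min_interval:
            return False
        if load_from - load_to <= self.hysteresis:
            return False
        self.last_change = now
        return True
```

The minimum interval plays the role of the "maximum change frequency" mentioned in requirement 8b, while the hysteresis band keeps small load differences from triggering moves in both directions.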
_______________________________________________
rtgwg mailing list
https://www.ietf.org/mailman/listinfo/rtgwg