Mon, 29 Mar 2010

We have also used the terms 'containing LSP' and 'contained LSP'.  
'Hierarchical LSP' also generally refers to the containing LSP.  For MPLS-TP, 
Adrian introduced the term 'Path Segment Tunnel', which is a hierarchical LSP 
used for segment recovery.

> The first round seemed to go reasonably well so I've incorporated
> comments from Dave, Tony, and Lucy.
> Obviously this is still just a start.
> I've added editorial comments with [CV] that are not part of the text,
> just reflecting what I changed.
> I hope I've adequately reflected everyone's comments.  If not, please
> comment on this version.
> Key terms:
>   flow - A flow in the context of this document is an aggregate of
>     traffic for which packets should not be reordered.  The term
>     "flow" is used here for brevity.  This definition of flow should
>     not be interpreted to have broader scope than this document.
>     A flow in this context should not be confused with a microflow or
>     ordered aggregate as defined in [RFC2475].  All three require that
>     reordering be avoided, but a microflow is specific to IP, whereas
>     a flow here can be either IP or MPLS.
> [CV] This is a rearrangement of the text in the last email.  It might
> be a little more clear.  I filled in the reference to RFC2475.
>   flow identification - The means of identifying a flow or a group of
>     flows may be specific to a type of payload.  A particular flow
>     identification method may isolate a group of one; however, that
>     behaviour is neither precluded nor required.
> [CV] Added last sentence based on Lucy's comments.
>   top label entry - In MPLS the top label entry contains the label on
>     which an initial forwarding decision is made.  This label may be
>     popped and the forwarding decision may involve further labels but
>     that is immaterial to this discussion.
>   label stack - In MPLS the label stack includes all of the MPLS
>     labels from the top of the stack to the label marked with the
>     S-bit (Bottom of Stack bit) set.
>   outer LSP(s) and inner LSP(s) - The LSP(s) associated with labels in
>     the outer encapsulation are called outer LSP(s).  The outer label
>     stack entries are used for forwarding.  The remaining LSP(s) which
>     are associated with inner encapsulation (closer to the label entry
>     containing the S-bit) are called inner LSP(s).  There is a single
>     outermost LSP and a single innermost LSP, but there may be
>     multiple outer and inner LSP(s).  These are not called top and
>     bottom LSP since MPLS and PWE draw the label stack in opposite
>     directions, with PWE putting the outermost label on the bottom of
>     diagrams (and confusing people in doing so).
>   component link - a physical link (e.g., Lambda, Ethernet PHY,
>    SONET/SDH, OTN, etc.) with packet transport capability, or a
>    logical link (e.g., MPLS LSP, Ethernet VLAN, MPLS-TP LSP, etc.)
>   composite link - a group of component links, which can be considered
>    as a single MPLS TE link or as a single IP link used for MPLS.  The
>    ITU-T [ITU-T G.800] defines Composite Link Characteristics as those
>    which make multiple parallel component links between two transport
>    nodes appear as a single logical link from the network perspective.
>    Each component link in a composite link can be supported by a
>    separate server layer trail, i.e., the component links in a
>    composite link can have the same or different properties such as
>    latency and capacity.
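
To make the label stack and S-bit definitions above concrete, here is a rough
sketch of walking a stack down to the bottom-of-stack entry.  It assumes the
32-bit label stack entry layout of RFC 3032 (20-bit label, 3-bit traffic
class, S-bit, 8-bit TTL).  The function name parse_label_stack and the
dictionary keys are made up for illustration only.

```python
import struct


def parse_label_stack(data: bytes):
    """Return (entries, bytes_consumed) for the MPLS label stack in data.

    Assumes each label stack entry is 4 bytes, network byte order:
    label (20 bits) | TC (3 bits) | S (1 bit) | TTL (8 bits).
    """
    entries = []
    offset = 0
    while offset + 4 <= len(data):
        (word,) = struct.unpack_from("!I", data, offset)
        entry = {
            "label": word >> 12,
            "tc": (word >> 9) & 0x7,
            "s": (word >> 8) & 0x1,
            "ttl": word & 0xFF,
        }
        entries.append(entry)
        offset += 4
        if entry["s"]:
            # The entry with the S-bit set is the bottom of the stack.
            break
    else:
        # Ran out of data without ever seeing the S-bit.
        raise ValueError("no bottom-of-stack entry found")
    return entries, offset
```

In the terms used above, entries[0] is the top label entry (outermost LSP) and
entries[-1] is the entry carrying the S-bit (innermost LSP).
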
> Introduction:
>   There is often a need to provide large aggregates of bandwidth that
>   are best provided using parallel links between routers or MPLS LSR.
>   In core networks there is often no alternative since the aggregate
>   capacities of core networks today far exceed the capacity of a
>   single physical link or single packet processing element.
> [CV] Awaiting consensus on moving following to appendix.  I think it
> belongs here.
>   Today this requirement can be handled by Ethernet Link Aggregation
>   [IEEE802.1AX], link bundling [RFC4201], or other aggregation
>   techniques some of which may be vendor specific.  Each has strengths
>   and weaknesses.
>   The term composite link is more general than terms such as link
>   aggregate, which is generally considered to be specific to
>   Ethernet, and its use here is consistent with the broad definition
>   in [ITU-T G.800].
>   Large aggregates of IP traffic do not provide explicit signaling to
>   indicate the expected traffic loads.  Large aggregates of MPLS
>   traffic are carried in MPLS tunnels supported by MPLS LSP.  LSP
>   which are signaled using RSVP-TE extensions do provide explicit
>   signaling which includes the expected traffic load for the
>   aggregate.  LSP which are signaled using LDP do not provide an
>   expected traffic load.
>   MPLS LSP may contain other MPLS LSP arranged hierarchically.  When
>   an MPLS LSR serves as a midpoint LSR in an LSP carrying other LSP as
>   payload, there is no signaling associated with these inner LSP.
>   Therefore even when using RSVP-TE signaling there may be
>   insufficient information provided by signaling to adequately
>   distribute load across a composite link.
>   Generally a set of label stack entries that is unique across the
>   ordered set of label numbers can safely be assumed to contain a
>   group of flows.  The reordering of traffic can therefore be
>   considered to be acceptable unless reordering occurs within traffic
>   containing a common unique set of label stack entries.  Existing
>   load splitting techniques take advantage of this property in
>   addition to looking beyond the bottom of the label stack and
>   determining if the payload is IPv4 or IPv6 to load balance traffic
>   accordingly.
>   For example a large aggregate of IP traffic may be subdivided into a
>   large number of groups of flows using a hash on the IP source and
>   destination addresses.  This is as described in [diffserv
>   framework].  For MPLS traffic carrying IP, a similar hash can be
>   performed on the set of labels in the label stack.  These techniques
>   are both examples of means to subdivide traffic into groups of flows
>   for the purpose of load balancing traffic across aggregated link
>   capacity.  The means of identifying a flow should not be confused
>   with the definition of a flow.
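
As an illustration of the label stack hash described above, a minimal sketch
might look like the following.  The choice of CRC32, the 3-byte packing of
each label, and the name pick_component_link are assumptions for illustration;
real implementations differ.  Note this is only a means of identifying groups
of flows, not a definition of a flow.

```python
import zlib


def pick_component_link(labels, num_links):
    """Map an ordered list of MPLS label values to a component link index.

    Packets carrying the same ordered label stack always map to the same
    component link, so they are not reordered relative to each other.
    """
    if num_links < 1:
        raise ValueError("need at least one component link")
    # Pack each 20-bit label into 3 bytes, preserving stack order.
    key = b"".join(label.to_bytes(3, "big") for label in labels)
    return zlib.crc32(key) % num_links


# Two packets with the same label stack land on the same component link.
first = pick_component_link([16001, 24005, 300], num_links=4)
second = pick_component_link([16001, 24005, 300], num_links=4)
assert first == second
```

Whether such a hash spreads load evenly enough depends on the traffic mix and
the hash function.
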
>   Discussion of whether a hash based approach provides a sufficiently
>   even load balance using any particular hashing algorithm or method
>   of distributing traffic across a set of component links is outside
>   of the scope of this document.
>   The current load balancing techniques are referenced in [RFC4385]
>   and [RFC4928].  The use of three hash based approaches is described
>   in [RFC2991] and [RFC2992].  A mechanism to identify flows within PW
>   is described in [draft-ietf-pwe3-fat-pw].  The use of hash based
>   approaches is mentioned as an example of an existing set of
>   techniques to distribute traffic over a set of component links.
>   Other techniques are not precluded.
> [CV] Added RFC references.
> Requirements:
>   These requirements refer to link bundling solely to provide a frame
>   of reference.  This requirements document does not intend to
>   constrain a solution to build upon link bundling.  Meeting these
>   requirements using extensions to link bundling is not precluded, if
>   doing so is determined by later IETF work to be the best solution.
>   0.  The IETF imposes the following requirement on new protocol work:
>       a.  New protocols should not be invented where existing
>           protocols can be extended to meet the same requirements.
>       b.  Protocol extensions must retain compatibility with widely
>           implemented and widely deployed protocols and practices to
>           the greatest extent possible.
> [CV] Added this as a reminder to all of us.  If anyone has a citation
> for either of these points, that would help.  The wording should be
> approximately right and the spirit of this wording consistent with
> IETF process.  Maybe it's just the routing area that imposes these
> restrictions.  Help from chairs or ADs on this would be appreciated.
>   The first few requirements listed here are met or partially met by
>   existing link bundling behaviour including common behaviour that is
>   implemented when the all ones address (for example 0xFFFFFFFF for
>   IPv4) is used.  This common behaviour today makes use of a hashing
>   technique as described in the introduction, though other behaviours
>   are not precluded.
>   1.  Aggregated control information which summarizes multiple
>       parallel links into a single advertisement is required to reduce
>       information load and improve scalability.
>   2.  A means to support very large LSP is needed, including LSP whose
>       total bandwidth exceeds the size of a single component link but
>       whose traffic has no single flow greater than a component link.
>       In link bundling this is supported by many implementations using
>       the all ones address component addressing and hash based
>       techniques.
>       Note: some implementations impose further restrictions regarding
>       the distribution of traffic across the set of identifiers used
>       in flow identification.  Discussion of algorithms and
>       limitations of existing implementations is out of scope for this
>       requirements document.
>   The remaining requirements are not met by existing link bundling.
>   3.  In some cases more than one set of metrics is needed to
>       accommodate a mix of capacity with different characteristics,
>       particularly a bundle where a subset of component links has
>       shorter delay.

Doesn't the T-Spec contain the necessary information?

>   4.  A mechanism is needed to signal an LSP such that a component
>       link with specific characteristics is chosen, if a preference
>       exists.  For example, the shortest delay may be required for
>       some LSP, but not required for others.

See above.

>   5.  LSP signaling is needed to indicate a preference for placement
>       on a single component link and to specifically forbid spreading
>       that LSP over multiple component links based on flow
>       identification beyond the outermost label entry.


> [CV] Awaiting consensus.  What I had in mind was two choices, outer
> only or all.  Do others think we need to specify looking some fixed
> depth into the stack for the hash for a given LSP?  If so we need to
> discuss possible forwarding speed consequences (hash and lookup can't
> be done in parallel with hash disposed of if not needed).
>   6.  A means to support non-disruptive reallocation of an existing
>       LSP to another component link is needed.

If an LSP is assigned to a specific component link, LSP stitching as in
RFC 5150 (http://datatracker.ietf.org/doc/rfc5150/), with a one hop LSP on the
assigned component link, together with make-before-break (MBB) can be used to
move the LSP non-disruptively to another component link.

>   7.  A means to populate the TE-LSDB with information regarding which
>       links (per end) can support distribution of large LSP across
>       multiple component links based on the component flows and the
>       characteristics of this capability.  Key characteristics are:
>       a.  The largest single flow that can be supported.  This may
>           or may not be related to the size of component links.
>       b.  Characteristics of the flow identification method.  [These
>           can be enumerated in this document or a later document. ]
>       c.  Characteristics of the flow adjustment method.  [These
>           can be enumerated in this document or a later document. ]
>   8.  Some means is needed to specify desired characteristics of flow
>       distribution for an LSP, regardless of whether the LSP is set up
>       using RSVP-TE, LDP, or management plane.  Behaviour for IP must
>       be configured using the management plane.  These characteristics
>       include:
> [CV] Reworded above paragraph to indicate that LDP and static LSP and
> IP are not omitted from this definition.  Note that LDP can give
> guidance but does not support TE so it can't be rejected and go
> elsewhere if the guidance can't be followed.
>       a.  The largest flow expected.
>       b.  Characteristics of load adjustment.  For example, a
>           maximum change frequency might be specified.  [These can
>           be enumerated in this document or a later document. ]
>   9.  In some cases it may be useful to measure link parameters
>       and reflect these in metrics.  Link delay is an example.
>  10.  Some uses require an ability to bound the sum of delay metrics
>       along a path while otherwise taking the shortest path related to
>       another metric.  Algorithms for accomplishing this are applied
>       at an ingress, PCE, or in the management system and are out of
>       scope.
> [CV] Limited scope above.
>  11.  Impact of load balancing on OAM and mitigation techniques
>       applicable to OAM must be documented.
>  12.  Load balancing techniques must not oscillate.
> [CV] Added above two based on Dave's comments.
> [CV] Took suggestion to move use scenarios to framework.
