Subject: RE: composite link - candidate for respin, second try
From: "Mcdysan, David E"
Date: Mon, 29 Mar 2010 14:22:37 -0400
Hi Curtis,

Catching up on this thread again. Thanks for the consolidation to ease
commenting.

First, a few general comments:
* Rewriting the existing draft does not mean dropping the intent of the
requirements it contains. In some cases, my comment is to carry over the
intent of text from the previous draft.
* You seem to have a different recollection of the meeting than I do.
I heard several wg members wanting to hear more about service provider
problems to be solved (what you called "storytelling"), while you seem
to want to focus on writing up prior work as background. We need wg
member and chair feedback in this area.

Some detailed comments in line below.

Thanks,

Dave

> -----Original Message-----
> From: [email protected] [mailto:[email protected]] 
> On Behalf Of Curtis Villamizar
> Sent: Monday, March 29, 2010 2:30 AM
> To: [email protected]
> Subject: composite link - candidate for respin, second try
> 
> 
> Good people of RTGWG,
> 
> The first round seemed to go reasonably well so I've 
> incorporated comments from Dave, Tony, and Lucy.
> 
> Obviously this is still just a start.
> 
> I've added editorial comments with [CV] that are not part of 
> the text, just reflecting what I changed.
> 
> I hope I've adequately reflected everyone's comments.  If 
> not, please comment on this version.
> 
> Thanks,
> 
> Curtis
> 
> 
> 
> Key terms:
> 
>   flow - A flow in the context of this document is an aggregate of
>     traffic for which packets should not be reordered.  The term
>     "flow" is used here for brevity.  This definition of flow should
>     not be interpreted to have broader scope than this document.
> 
>     A flow in this context should not be confused with a microflow or
>     ordered aggregate as defined in [RFC2475] which share the
>     similarity of requiring that reordering be avoided but microflow
>     is specific to IP where flow can mean either IP or MPLS.
> 
> [CV] This is a rearrangement of the text in the last email.  
> It might be a little more clear.  I filled in the reference 
> to RFC2475.

In the interest of brevity, I agree with Tony and don't find the
negative definition useful.

> 
>   flow identification - The means of identifying a flow or a group of
>     flows may be specific to a type of payload.  A particular flow
>     identification method may isolate a group of one; however, that
>     behaviour is neither precluded nor required.
> 
> [CV] Added last sentence based on Lucy's comments.
> 
>   top label entry - In MPLS the top label entry contains the label on
>     which an initial forwarding decision is made.  This label may be
>     popped and the forwarding decision may involve further labels but
>     that is immaterial to this discussion.
> 
>   label stack - In MPLS the label stack includes all of the MPLS
>     labels from the top of the stack to the label marked with the
>     S-bit (Bottom of Stack bit) set.
> 
>   outer LSP(s) and inner LSP(s) - The LSP(s) associated with labels in
>     the outer encapsulation are called outer LSP.  The outer label
>     stack entries are used for forwarding.  The remaining LSP(s) which
>     are associated with inner encapsulation (closer to the label entry
>     containing the S-bit) are called inner LSP(s).  There is a single
>     outermost LSP and innermost LSP, but may be multiple outer and
>     inner LSP.  These are not called top and bottom LSP since MPLS and
>     PWE draw the label stack in opposite directions with PWE putting
>     the outermost label on the bottom of diagrams (and confusing
>     people in doing so).

Again, brevity -- negative definition not necessary.

> 
>   component link - a physical link (e.g., Lambda, Ethernet PHY,
>    SONET/SDH, OTN, etc.) with packet transport capability, or a
>    logical link (e.g., MPLS LSP, Ethernet VLAN, MPLS-TP LSP, etc.)
> 
>   composite link - a group of component links, which can be considered
>    as a single MPLS TE link or as a single IP link used for MPLS.  The
>    ITU-T [ITU-T G.800] defines Composite Link Characteristics as those
>    which makes multiple parallel component links between two transport
>    nodes appear as a single logical link from the network perspective.
>    Each component link in a composite link can be supported by a
>    separate server layer trail, i.e., the component links in a
>    composite link can have the same or different properties such as
>    latency and capacity.

If (part of this text) is a direct quote, indicate as such.
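
For readers less familiar with the encoding behind the "top label
entry" and "label stack" terms above, here is a minimal Python sketch
that is not part of the draft. It assumes the standard 32-bit label
stack entry layout from RFC 3032 (20-bit label, 3-bit TC, 1-bit S,
8-bit TTL). The names LabelStackEntry, parse_label_stack, and
encode_entry are hypothetical.

```python
import struct
from typing import List, NamedTuple


class LabelStackEntry(NamedTuple):
    label: int  # 20-bit label value
    tc: int     # 3-bit traffic class field
    s: bool     # Bottom of Stack bit
    ttl: int    # 8-bit time to live


def encode_entry(label: int, tc: int, s: bool, ttl: int) -> bytes:
    """Pack one label stack entry into 4 bytes, network byte order."""
    return struct.pack("!I", (label << 12) | (tc << 9) | (int(s) << 8) | ttl)


def parse_label_stack(data: bytes) -> List[LabelStackEntry]:
    """Return all entries from the top of the stack through the entry
    that has the S-bit set. Bytes after that entry are payload."""
    entries = []
    offset = 0
    while True:
        if offset + 4 > len(data):
            raise ValueError("truncated label stack: no entry with S-bit set")
        (word,) = struct.unpack_from("!I", data, offset)
        entry = LabelStackEntry(
            label=word >> 12,
            tc=(word >> 9) & 0x7,
            s=bool((word >> 8) & 0x1),
            ttl=word & 0xFF,
        )
        entries.append(entry)
        offset += 4
        if entry.s:
            return entries
```

In this sketch the first element of the returned list is the top
label entry (the outermost LSP) and the last element is the entry
carrying the S-bit (the innermost LSP).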

> 
> Introduction:
> 
>   There is often a need to provide large aggregates of bandwidth that
>   is best provided using parallel links between routers or MPLS LSR.
>   In core networks there is often no alternative since the aggregate
>   capacities of core networks today far exceed the capacity of a
>   single physical link or single packet processing element.
> 
> [CV] Awaiting consensus on moving following to appendix.  I
> think it belongs here.
> 
>   Today this requirement can be handled by Ethernet Link Aggregation
>   [IEEE802.1AX], link bundling [RFC4201], or other aggregation
>   techniques some of which may be vendor specific.  Each has strengths
>   and weaknesses.
> 
>   The term composite link is more general than terms such as link
>   aggregate which is generally considered to be specific to Ethernet
>   and its use here is consistent with the broad definition in [ITU
>   G.800].
> 
>   Large aggregates of IP traffic do not provide explicit signaling to
>   indicate the expected traffic loads.  Large aggregates of MPLS
>   traffic are carried in MPLS tunnels supported by MPLS LSP.  LSP
>   which are signaled using RSVP-TE extensions do provide explicit
>   signaling which includes the expected traffic load for the
>   aggregate.  LSP which are signaled using LDP do not provide an
>   expected traffic load.
> 
>   MPLS LSP may contain other MPLS LSP arranged hierarchically.  When
>   an MPLS LSR serves as a midpoint LSR in an LSP carrying other LSP as
>   payload, there is no signaling associated with these inner LSP.
>   Therefore even when using RSVP-TE signaling there may be
>   insufficient information provided by signaling to adequately
>   distribute load across a composite link.
> 
>   Generally a set of label stack entries that is unique across the
>   ordered set of label numbers can safely be assumed to contain a
>   group of flows.  The reordering of traffic can therefore be
>   considered to be acceptable unless reordering occurs within traffic
>   containing a common unique set of label stack entries.  Existing
>   load splitting techniques take advantage of this property in
>   addition to looking beyond the bottom of the label stack and
>   determining if the payload is IPv4 or IPv6 to load balance traffic
>   accordingly.
> 
>   For example a large aggregate of IP traffic may be subdivided into a
>   large number of groups of flows using a hash on the IP source and
>   destination addresses.  This is as described in [diffserv
>   framework].  For MPLS traffic carrying IP, a similar hash can be
>   performed on the set of labels in the label stack.  These techniques
>   are both examples of means to subdivide traffic into groups of flows
>   for the purpose of load balancing traffic across aggregated link
>   capacity.  The means of identifying a flow should not be confused
>   with the definition of a flow.
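
To illustrate the two hashing examples in the paragraph above, here is
a rough Python sketch. It is an illustration only: CRC-32 is a
stand-in hash (the draft names no algorithm), the table that maps flow
groups to component links is an assumption, and all function names are
made up for this note.

```python
import zlib
from typing import Sequence


def flow_group_for_ip(src: str, dst: str, num_groups: int) -> int:
    """Map an IP source/destination address pair to a flow group."""
    key = f"{src}|{dst}".encode()
    return zlib.crc32(key) % num_groups


def flow_group_for_labels(labels: Sequence[int], num_groups: int) -> int:
    """Map an ordered MPLS label stack to a flow group.

    Each 20-bit label is encoded in 3 bytes so that the order of the
    labels affects the hash input.
    """
    key = b"".join(label.to_bytes(3, "big") for label in labels)
    return zlib.crc32(key) % num_groups


def component_link_for_group(group: int, group_to_link: Sequence[int]) -> int:
    """Look up the component link currently assigned to a flow group."""
    return group_to_link[group]


# Example: 16 flow groups spread over 3 component links (hypothetical).
group_to_link = [g % 3 for g in range(16)]
group = flow_group_for_labels([100, 200], len(group_to_link))
link = component_link_for_group(group, group_to_link)
```

Because the result depends only on the address pair or the label
stack, all packets of one flow land in the same group and are not
reordered relative to each other. Moving a group to another component
link only requires changing its table entry.
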
> 
>   Discussion of whether a hash based approach provides a sufficiently
>   even load balance using any particular hashing algorithm or method
>   of distributing traffic across a set of component links is outside
>   of the scope of this document.
> 
>   The current load balancing techniques are referenced in [RFC4385]
>   and [RFC4928].  The use of three hash based approaches is described
>   in [RFC2991] and [RFC2992].  A mechanism to identify flows within PW
>   is described in [draft-ietf-pwe3-fat-pw].  The use of hash based
>   approaches is mentioned as an example of an existing set of
>   techniques to distribute traffic over a set of component links.
>   Other techniques are not precluded.
> 
> [CV] Added RFC references.

Where is the place for text on the description of service provider
problems? I recall the wg asking for this, and the acknowledgement of
further work could be in an Appendix (not counting against the 7 page
quota).

> 
> 
> Requirements:
> 
>   These requirements refer to link bundling solely to provide a frame
>   of reference.  This requirements document does not intend to
>   constrain a solution to build upon link bundling.  Meeting these
>   requirements using extensions to link bundling is not precluded, if
>   doing so is determined by later IETF work to be the best solution.
> 

As noted previously, not all requirements necessarily fit into the link
bundling frame of reference.  

>   0.  The IETF imposes the following requirement on new protocol work:
> 
>       a.  New protocols should not be invented where existing
>           protocols can be extended to meet the same requirements.
> 
>       b.  Protocol extensions must retain compatibility with widely
>           implemented and widely deployed protocols and practices to
>           the greatest extent possible.
> 
> [CV] Added this as a reminder to all of us.  If anyone has a 
> citation for either of these points, that would help.  The 
> wording should be approximately right and the spirit of this 
> wording consistent with IETF process.  Maybe it's just the
> routing area that imposes these restrictions.  Help from 
> chairs or ADs on this would be appreciated.
> 
>   The first few requirements listed here are met or partially met by
>   existing link bundling behavior including common behaviour that is
>   implemented when the all ones address (for example 0xFFFFFFFF for
>   IPv4) is used.  This common behaviour today makes use of a hashing
>   technique as described in the introduction, though other behaviours
>   are not precluded.
> 
>   1.  Aggregated control information which summarizes multiple
>       parallel links into a single advertisement is required to reduce
>       information load and improve scaleability.

Suggest that the requirement be worded so that the wg can make more
objective decisions between candidate solution approaches. "reduce
information load and improve scaleability" is vague - could be
interpreted as message load due to flooding, storage, computation,
signaling rate, etc.

> 
>   2.  A means to support very large LSP is needed, including LSP whose
>       total bandwidth exceeds the size of a single component link but
>       whose traffic has no single flow greater [than the sum of] the
>       component links.

Still unclear. Is the insertion in brackets above what you meant?

>       In link bundling this is supported by many implementations using
>       the all ones address component addressing and hash based
>       techniques.
> 
>       Note: some implementations impose further restrictions regarding
>       the distribution of traffic across the set of identifiers used
>       in flow identification.  Discussion of algorithms and
>       limitations of existing implementations is out of scope for this
>       requirements document.

Is the Note necessary? 

> 
>   The remaining requirements are not met by existing link bundling.
> 
>   3.  In some cases more than one set of metrics is needed to accommodate a
>       mix of capacity with different characteristics, particularly a
>       bundle where a subset of component links have shorter delay.

These characteristics need to be articulated before they are mentioned
(not only in item 4, but also before item 3, where they are first
mentioned). Something like the following could be added to the
definition (possibly mentioned as extensions to 4201):

The component links in a composite link may have different
characteristics, including at least: capacity, current latency,
indication of whether latency can change, and possibly others.

> 
>   4.  A mechanism is needed to signal an LSP such that a component
>       link with specific characteristics is chosen, if a preference
>       exists.  For example, the shortest delay may be required for
>       some LSP, but not required for others.

Examples are network operator stories that are told. :) I think the
wording of this example has a particular form of solution in mind, and
what we need to identify is the underlying requirement. Possibly that is
a better organization for the document than only an ordering of
requirements that build on each other, as you clarified. 

> 
>   5.  LSP signaling is needed to indicate a preference for placement
>       on a single component link and to specifically forbid spreading
>       that LSP over multiple component links based on flow
>       identification beyond the outermost label entry.
> 
> [CV] Awaiting consensus.  What I had in mind was two choices,
> outer only or all.  Do others think we need to specify 
> looking some fixed depth into the stack for the hash for a 
> given LSP?  If so we need to discuss possible forwarding 
> speed consequences (hash and lookup can't be done in parallel 
> with hash disposed of if not needed).

It seems to me that the potential depth of MPLS/packet header
inspection would need to be known to the sender in some circumstances
to meet this requirement.

> 
>   6.  A means to support non-disruptive reallocation of an existing
>       LSP to another component link is needed.

In the storytelling section, I recommend that we describe that the LSP
is actually being moved, which could cause reordering or increased
jitter (which does cause disruption). That is WHY the change frequency
of item 8 is specified. Non-disruptive will not always be possible. I
recall the term "minimally disruptive" being used in the meeting, which
I believe is a more accurate description of the goal.

> 
>   7.  A means to populate the TE-LSDB with information regarding which
>       links (per end) can support distribution of large LSP across
>       multiple component links based on the component flows and the
>       characteristics of this capability.  Key characteristics are:
> 
>       a.  The largest single flow that can be supported.  This may
>           or may not be related to the size of component links.

It seems this is also related to whether the LSP can have the behavior
described in item 2.

> 
>       b.  Characteristics of the flow identification method.  [These
>           can be enumerated in this document or a later document. ]
> 
>       c.  Characteristics of the flow adjustment method.  [These
>           can be enumerated in this document or a later document. ]

"flow adjustment" should be added to the definition section, so that the
document is clear on use of this term. 

> 
>   8.  Some means is needed to specify desired characteristics of flow
>       distribution for an LSP, regardless of whether the LSP is set up
>       using RSVP-TE, LDP, or management plane.  Behaviour for IP must
>       be configured using the management plane.  These characteristics
>       include:
> 
> [CV] Reworded above paragraph to indicate that LDP and static 
> LSP and IP are not omitted from this definition.  Note that 
> LDP can give guidance but does not support TE so it can't be 
> rejected and go elsewhere if the guidance can't be followed.
> 
>       a.  The largest flow expected.
> 
>       b.  Characteristics of load adjustment.  For example, a
>           maximum change frequency might be specified.  [These can
>           be enumerated in this document or a later document. ]

Is "load adjustment" something different from "flow adjustment"?

> 
>   9.  In some cases it may be useful to measure link parameters
>       and reflect these in metrics.  Link delay is an example.

I believe this is stronger than the optional "may." A protocol
(extension or new) is definitely needed to report latency from a lower
(server) layer up to a higher (client) layer network. An ordering of
importance needs to be in the document.

> 
>  10.  Some uses require an ability to bound the sum of delay metrics
>       along a path while otherwise taking the shortest path related to
>       another metric.  Algorithms for accomplishing this are applied
>       at an ingress, PCE, or in the management system and are out of
>       scope.
> 
> [CV] Limited scope above.

OK, you are proposing another scope change. As I commented before, this
is an important network operator problem. The form of routing and/or
signaling extensions used to meet other requirements may be able to be
used to (help) meet this objective. I believe that declaring it out of
scope is premature. 
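
As an illustration of what item 10 asks an ingress, PCE, or management
system to compute, here is a minimal Python sketch of a delay-bounded
least-cost path search. It shows one simple approach (a cost-ordered
search that drops any partial path exceeding the delay bound), not an
algorithm proposed in this thread. The graph layout, the requirement
that costs and delays be non-negative, and the name delay_bounded_path
are assumptions.

```python
import heapq
from typing import Dict, List, Optional, Tuple

# node -> list of (neighbor, cost metric, delay metric); values >= 0
Graph = Dict[str, List[Tuple[str, float, float]]]


def delay_bounded_path(
    graph: Graph, src: str, dst: str, max_delay: float
) -> Optional[Tuple[float, float, List[str]]]:
    """Return (cost, delay, path) for the least-cost path whose summed
    delay does not exceed max_delay, or None if no such path exists."""
    heap = [(0.0, 0.0, src, [src])]
    best_delay: Dict[str, float] = {}  # lowest delay expanded so far per node
    while heap:
        cost, delay, node, path = heapq.heappop(heap)
        if node == dst:
            return cost, delay, path
        # States pop in non-decreasing cost order, so a state that does
        # not improve on delay at this node is dominated and can be skipped.
        if delay >= best_delay.get(node, float("inf")):
            continue
        best_delay[node] = delay
        for nbr, link_cost, link_delay in graph.get(node, []):
            new_delay = delay + link_delay
            if new_delay <= max_delay:
                heapq.heappush(
                    heap, (cost + link_cost, new_delay, nbr, path + [nbr])
                )
    return None
```

With a loose bound the search returns the ordinary shortest path for
the cost metric. As the bound tightens, it falls back to costlier
paths with lower delay, and it returns None when no path satisfies the
bound.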

> 
>  11.  Impact of load balancing on OAM and mitigation techniques
>       applicable to OAM must be documented.
> 
>  12.  Load balancing techniques must not oscillate.
> 
> [CV] Added above two based on Dave's comments.
> 
> [CV] Took suggestion to move use scenarios to framework.
> _______________________________________________
> rtgwg mailing list
> [email protected]
> https://www.ietf.org/mailman/listinfo/rtgwg
> 