
Subject: Re: composite link - candidate for respin, second try
From: Curtis Villamizar
Date: Tue, 30 Mar 2010 00:18:33 -0400
In message 
<[email protected]xxxxxxxxx>
"Mcdysan, David E" writes:
>  
> Hi Curtis,
>  
> Catching up on this thread again. Thanks for the consolidation to ease
> commenting.
>  
> First a few general comments
> * Rewriting the existing draft does not mean that the intent of
> requirements there should not be included in the rewrite. In some cases,
> my comment is to rewrite the intent of text from the previous draft. 
> * Second, you seem to have a different recollection from the meeting
> than I do. I heard several wg members wanting to hear more about service
> provider problems to be solved (what you called "storytelling"), while
> you seem to want to focus on writing up prior work as background. We
> need to get wg member and chair feedback in this area. 
>  
> Some detailed comments in line below.
>  
> Thanks,
>  
> Dave


Yes, this is a fast-moving thread.

I had put a placeholder at the end for use cases but you indicated in
prior email a preference to put that in the framework.  I'll put the
placeholder back and you can provide the text.

I've tried to capture what was agreed to in the meeting.  In the
meeting it seemed to me that the conclusion was that the original
draft focused too much on constraining the solution and that we should
focus on requirements with as little constraint on solutions as
possible.  If there are missed requirements that are agreed to by the
WG, we can add them.  I think what we have now reasonably captures
what we discussed at the meeting.  If we want to add further
requirements, then we need to bring them up and discuss them on the
list.

Curtis



> > -----Original Message-----
> > From: [email protected] [mailto:[email protected]] 
> > On Behalf Of Curtis Villamizar
> > Sent: Monday, March 29, 2010 2:30 AM
> > To: [email protected]
> > Subject: composite link - candidate for respin, second try
> > 
> > 
> > Good people of RTGWG,
> > 
> > The first round seemed to go reasonably well so I've 
> > incorporated comments from Dave, Tony, and Lucy.
> > 
> > Obviously this is still just a start.
> > 
> > I've added editorial comments with [CV] that are not part of 
> > the text, just reflecting what I changed.
> > 
> > I hope I've adequately reflected everyone's comments.  If 
> > not, please comment on this version.
> > 
> > Thanks,
> > 
> > Curtis
> > 
> > 
> > 
> > Key terms:
> > 
> >   flow - A flow in the context of this document is an aggregate of
> >     traffic for which packets should not be reordered.  The term
> >     "flow" is used here for brevity.  This definition of flow should
> >     not be interpreted to have broader scope than this document.
> > 
> >     A flow in this context should not be confused with a microflow or
> >     ordered aggregate as defined in [RFC2475] which share the
> >     similarity of requiring that reordering be avoided but microflow
> >     is specific to IP where flow can mean either IP or MPLS.
> > 
> > [CV] This is a rearrangement of the text in the last email.  
> > It might be a little more clear.  I filled in the reference 
> > to RFC2475.
>  
> In the interest of brevity, I agree with Tony and don't find the
> negative definition useful.


OK.  The last paragraph will be deleted.


> >   flow identification - The means of identifying a flow or a group of
> >     flows may be specific to a type of payload.  A particular flow
> >     identification method may isolate a group of one, however that
> >     behaviour is neither precluded nor required.
> > 
> > [CV] Added last sentence based on Lucy's comments.
> > 
> >   top label entry - In MPLS the top label entry contains the label on
> >     which an initial forwarding decision is made.  This label may be
> >     popped and the forwarding decision may involve further labels but
> >     that is immaterial to this discussion.
> > 
> >   label stack - In MPLS the label stack includes all of the MPLS
> >     labels from the top of the stack to the label marked with the
> >     S-bit (Bottom of Stack bit) set.
> > 
> >   outer LSP(s) and inner LSP(s) - The LSP(s) associated with labels in
> >     the outer encapsulation are called outer LSP.  The outer label
> >     stack entries are used for forwarding.  The remaining LSP(s) which
> >     are associated with inner encapsulation (closer to the label entry
> >     containing the S-bit) are called inner LSP(s).  There is a single
> >     outermost LSP and innermost LSP, but there may be multiple outer and
> >     inner LSP.  These are not called top and bottom LSP since MPLS and
> >     PWE draw the label stack in opposite directions with PWE putting
> >     the outermost label on the bottom of diagrams (and confusing
> >     people in doing so).
>  
> Again, brevity -- negative definition not necessary.


OK.  Delete from "These are not called top and bottom LSP" to the end
of the paragraph.


> >   component link - a physical link (e.g., Lambda, Ethernet PHY,
> >    SONET/SDH, OTN, etc.) with packet transport capability, or a
> >    logical link (e.g., MPLS LSP, Ethernet VLAN, MPLS-TP LSP, etc.)
> > 
> >   composite link - a group of component links, which can be considered
> >    as a single MPLS TE link or as a single IP link used for MPLS.  The
> >    ITU-T [ITU-T G.800] defines Composite Link Characteristics as those
> >    which make multiple parallel component links between two transport
> >    nodes appear as a single logical link from the network perspective.
> >    Each component link in a composite link can be supported by a
> >    separate server layer trail, i.e., the component links in a
> >    composite link can have the same or different properties such as
> >    latency and capacity.
>  
> If (part of this text) is a direct quote, indicate as such.


Thanks for pointing this out.  I took this directly from your document
but I may have lost something in making it more concise.  Starting
from "Each component" to the end of paragraph is a quote so I'll add
"ITU G.800 states" in front of it and put in quotes.


> > Introduction:
> > 
> >   There is often a need to provide large aggregates of bandwidth that
> >   is best provided using parallel links between routers or MPLS LSR.
> >   In core networks there is often no alternative since the aggregate
> >   capacities of core networks today far exceed the capacity of a
> >   single physical link or single packet processing element.
> > 
> > [CV] Awaiting consensus on moving the following to an appendix.  I
> > think it belongs here.
> > 
> >   Today this requirement can be handled by Ethernet Link Aggregation
> >   [IEEE802.1AX], link bundling [RFC4201], or other aggregation
> >   techniques some of which may be vendor specific.  Each has strengths
> >   and weaknesses.
> > 
> >   The term composite link is more general than terms such as link
> >   aggregate, which is generally considered to be specific to Ethernet,
> >   and its use here is consistent with the broad definition in
> >   [ITU-T G.800].
> > 
> >   Large aggregates of IP traffic do not provide explicit signaling to
> >   indicate the expected traffic loads.  Large aggregates of MPLS
> >   traffic are carried in MPLS tunnels supported by MPLS LSP.  LSP
> >   which are signaled using RSVP-TE extensions do provide explicit
> >   signaling which includes the expected traffic load for the
> >   aggregate.  LSP which are signaled using LDP do not provide an
> >   expected traffic load.
> > 
> >   MPLS LSP may contain other MPLS LSP arranged hierarchically.  When
> >   an MPLS LSR serves as a midpoint LSR in an LSP carrying other LSP as
> >   payload, there is no signaling associated with these inner LSP.
> >   Therefore even when using RSVP-TE signaling there may be
> >   insufficient information provided by signaling to adequately
> >   distribute load across a composite link.
> > 
> >   Generally a set of label stack entries that is unique across the
> >   ordered set of label numbers can safely be assumed to contain a
> >   group of flows.  The reordering of traffic can therefore be
> >   considered to be acceptable unless reordering occurs within traffic
> >   containing a common unique set of label stack entries.  Existing
> >   load splitting techniques take advantage of this property in
> >   addition to looking beyond the bottom of the label stack and
> >   determining if the payload is IPv4 or IPv6 to load balance traffic
> >   accordingly.
> > 
> >   For example a large aggregate of IP traffic may be subdivided into a
> >   large number of groups of flows using a hash on the IP source and
> >   destination addresses.  This is as described in [diffserv
> >   framework].  For MPLS traffic carrying IP, a similar hash can be
> >   performed on the set of labels in the label stack.  These techniques
> >   are both examples of means to subdivide traffic into groups of flows
> >   for the purpose of load balancing traffic across aggregated link
> >   capacity.  The means of identifying a flow should not be confused
> >   with the definition of a flow.
> > 
> >   Discussion of whether a hash based approach provides a sufficiently
> >   even load balance using any particular hashing algorithm or method
> >   of distributing traffic across a set of component links is outside
> >   of the scope of this document.
> > 
> >   The current load balancing techniques are referenced in [RFC4385]
> >   and [RFC4928].  The use of three hash based approaches is described
> >   in [RFC2991] and [RFC2992].  A mechanism to identify flows within PW
> >   is described in [draft-ietf-pwe3-fat-pw].  The use of hash based
> >   approaches is mentioned as an example of an existing set of
> >   techniques to distribute traffic over a set of component links.
> >   Other techniques are not precluded.
> > 
> > [CV] Added RFC references.
>  
> Where is the place for text on the description of service provider
> problems? I recall the wg asking for this, and the acknowledgement of
> further work could be in an Appendix (not counting against the 7 page
> quota).


I was going to put that at the end.  In my email to you I had a note
saying so and a placeholder at the end that use cases would be
there.  You may have missed this and thought use cases meant something
else and not the specific service provider problems.  I think the
problems should be stated in general terms, called use cases rather
than service provider problems, and at the end.

I'm hoping that is OK with you.  Use cases are intended to add to
clarity through examples and as such belong as an appendix.  The main
body of the document should be a concise statement of technical
background and requirements.


> > Requirements:
> > 
> >   These requirements refer to link bundling solely to provide a frame
> >   of reference.  This requirements document does not intend to
> >   constrain a solution to build upon link bundling.  Meeting these
> >   requirements using extensions to link bundling is not precluded, if
> >   doing so is determined by later IETF work to be the best solution.
> > 
>  
> As noted previously, not all requirements necessarily fit into the link
> bundling frame of reference.  


Once again: if the WG decides not to build on link bundling, that
decision is not part of the phase we are in, which is agreeing on
what the requirements are.



> >   0.  The IETF imposes the following requirement on new protocol work:
> > 
> >       a.  New protocols should not be invented where existing
> >           protocols can be extended to meet the same requirements.
> > 
> >       b.  Protocol extensions must retain compatibility with widely
> >           implemented and widely deployed protocols and practices to
> >           the greatest extent possible.
> > 
> > [CV] Added this as a reminder to all of us.  If anyone has a 
> > citation for either of these points, that would help.  The 
> > wording should be approximately right and the spirit of this 
wording consistent with IETF process.  Maybe it's just the 
> > routing area that imposes these restrictions.  Help from 
> > chairs or ADs on this would be appreciated.
> > 
> >   The first few requirements listed here are met or partially met by
> >   existing link bundling behavior including common behaviour that is
> >   implemented when the all ones address (for example 0xFFFFFFFF for
> >   IPv4) is used.  This common behaviour today makes use of a hashing
> >   technique as described in the introduction, though other behaviours
> >   are not precluded.
> > 
> >   1.  Aggregated control information which summarizes multiple
> >       parallel links into a single advertisement is required to reduce
> >       information load and improve scalability.
>  
> Suggest that the requirement be worded so that the wg can make more
> objective decisions between candidate solution approaches. "reduce
> information load and improve scalability" is vague - could be
> interpreted as message load due to flooding, storage, computation,
> signaling rate, etc.


Would it help if I replaced "is required" with "is desirable"?

That might squelch a lot of quibbling over whether we advertise every
component or groups of components with common attributes.

Then after we have agreement on everything else we can debate the
relative merit of this particular requirement.


> >   2.  A means to support very large LSP is needed, including LSP whose
> >       total bandwidth exceeds the size of a single component link but
>                                               than the sum of the
> >       whose traffic has no single flow greater ^  the component links.
>                                                  
> Still unclear. Is the above what you meant? 


I can see where this could be improved.  Expecting the placement of
an up arrow to survive Word/Outlook is not helping; it ends up a
jumble.

The original is:

  2.  A means to support very large LSP is needed, including LSP whose
      total bandwidth exceeds the size of a single component link but
      whose traffic has no single flow greater the component links.

This might be a lot more clear.

  2.  A means to support very large LSP is needed.  A very large LSP
      may be one whose total bandwidth exceeds the size of a single
      component link but where the traffic on the LSP contains no
      single flow greater than the capacity of a component link.

Hopefully that will do it.  If not, I'll try again.
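To make the intent of requirement 2 concrete, here is a rough sketch (purely illustrative; the link names, hash choice, and capacities are made up, not proposed protocol behaviour).  An LSP of, say, 30 Gb/s can be spread over three 10 Gb/s component links by hashing the label stack, as long as no single flow exceeds one component link:

```python
import hashlib

# Illustrative sketch only: distribute the flows of one large LSP over
# component links by hashing the label stack.  Link names and the
# choice of hash are hypothetical.
COMPONENT_LINKS = ["link-0", "link-1", "link-2"]  # e.g. 3 x 10 Gb/s

def pick_component_link(label_stack):
    """Map an MPLS label stack (a tuple of labels) to one component link.

    Every packet carrying the same label stack maps to the same link,
    so ordering within that group of flows is preserved, while the
    aggregate is spread over all of the component links."""
    digest = hashlib.sha256(repr(label_stack).encode()).digest()
    return COMPONENT_LINKS[int.from_bytes(digest[:4], "big") % len(COMPONENT_LINKS)]

# Same stack, same link -- no reordering within a flow group.
assert pick_component_link((16001, 299)) == pick_component_link((16001, 299))
```

The point is only that the mapping is deterministic per label stack; whether any particular hash spreads load evenly is, as stated in the introduction, out of scope.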


> >       In link bundling this is supported by many implementations using
> >       the all ones address component addressing and hash based
> >       techniques.
> > 
> >       Note: some implementations impose further restrictions regarding
> >       the distribution of traffic across the set of identifiers used
> >       in flow identification.  Discussion of algorithms and
> >       limitations of existing implementations is out of scope for this
> >       requirements document.
>  
> Is the Note necessary? 


Strictly no.  In a few places we are acknowledging that while we
strive to retain compatibility with existing widely deployed equipment
we are not exactly duplicating that behaviour.

Maybe the prior statement should be added up front and notes like this
removed.

I had already added:

      b.  Protocol extensions must retain compatibility with widely
          implemented and widely deployed protocols and practices to
          the greatest extent possible.

If there are no objections, I would like to add the following:

      c.  Widely implemented and widely deployed protocols and
          practices should be accommodated with no loss of the prior
          functionality to the greatest extent possible.

      d.  While compatibility with prior deployed protocols and
          practices with little or no loss of prior functionality is a
          goal, exactly duplicating the prior behaviour is not a goal.

This could allow eliminating a number of the notes.


> >   The remaining requirements are not met by existing link bundling.
> > 
> >   3.  In some more than one set of metrics is needed to accommodate a
> >       mix of capacity with different characteristics, particularly a
> >       bundle where a subset of component links have shorter delay.
>  
> These characteristics need to be articulated before they are
> mentioned (not only in item 4, but also before item 3 where they are
> first mentioned). Something like the following could be added to the
> definition (possibly mentioned as extensions to 4201)
>  
> The component links in a composite link may have different
> characteristics, including at least: capacity, current latency,
> indication of whether latency can change, and possibly others.


What I wrote is not quite a complete sentence:

  3.  In some cases more than one set of metrics is needed to
      accommodate a mix of capacity with different characteristics,
      particularly a bundle where a subset of component links have
      shorter delay.

Capacity is already covered in the reservable bandwidth and LSP
bandwidth, and so the tuple for a group of components would contain
these plus the delay parameters.

I did mention that I didn't want to put in stddev, norms, or
percentiles.  If we go for more than a single delay value, maybe a
range would be considered, but the number that matters is the high
end of the range, which would be a single value again.

Latency for preferred services is not likely to change substantially
unless protection is in use.  It does seem to be overkill to put in
the probability that the protect path will be used in the future.  Do
we need protect delay and working path MTBF and MTTR in addition to
working path delay?  I do think that would be overkill.


> >   4.  A mechanism is needed to signal an LSP such that a component
> >       link with specific characteristics are chosen, if a preference
> >       exists.  For example, the shortest delay may be required for
> >       some LSP, but not required for others.
>  
> Examples are network operator stories that are told. :) I think the
> wording of this example has a particular form of solution in mind, and
> what we need to identify is the underlying requirement. Possibly that is
> a better organization for the document than only an ordering of
> requirements that build on each other, as you clarified. 


Can you please indicate what type of solution this statement precludes?

As far as I can tell this is as general a requirement as we can make
it.

If it bothers you (and Tony) I can change this to:

  4.  A mechanism is needed to signal an LSP such that a component
      link with specific characteristics is chosen or optionally a
      specific component link is chosen, if such a preference exists.
      For example, the shortest delay may be required for some LSP,
      but not required for others.

The added phrase is "or optionally a specific component link is
chosen".  There is a little wording clean up (awkward grammar).
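As a sketch of what this requirement asks for (hypothetical names and numbers, not a proposed encoding): at setup time the selection amounts to filtering the component links by the signaled preferences.

```python
# Hypothetical sketch of requirement 4: at LSP setup, pick a component
# link whose characteristics satisfy the signaled preference (here a
# maximum delay), or a specifically named component link if one was
# requested.  Field names are made up for the example.
component_links = [
    {"name": "cl-0", "delay_ms": 5.0,  "unreserved_gbps": 10.0},
    {"name": "cl-1", "delay_ms": 30.0, "unreserved_gbps": 40.0},
]

def select_component_link(links, bandwidth_gbps, max_delay_ms=None,
                          required_name=None):
    """Return the first component link meeting the LSP's signaled
    preferences, or None if no link qualifies."""
    for link in links:
        if required_name is not None and link["name"] != required_name:
            continue  # a specific component link was requested
        if max_delay_ms is not None and link["delay_ms"] > max_delay_ms:
            continue  # delay preference not met
        if link["unreserved_gbps"] < bandwidth_gbps:
            continue  # not enough capacity
        return link
    return None

# A delay-sensitive LSP lands on the short-delay component link.
assert select_component_link(component_links, 2.0, max_delay_ms=10.0)["name"] == "cl-0"
```

An LSP with no preference would simply pass no constraints and take whatever the composite link offers.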


> >   5.  LSP signaling is needed to indicate a preference for placement
> >       on a single component link and to specifically forbid spreading
> >       that LSP over multiple component links based on flow
> >       identification beyond the outermost label entry.
> > 
> > [CV] Awaiting consensus.  What I had in mind was two choices, 
> > outer only or all.  Do others think we need to specify 
> > looking some fixed depth into the stack for the hash for a 
> > given LSP?  If so we need to discuss possible forwarding 
> > speed consequences (hash and lookup can't be done in parallel 
> > with hash disposed of if not needed).
>  
> It seems to me that the potential depth of MPLS/packet header
> inspection would need to be known to the sender in some circumstances
> to meet this requirement.


If using a control plane where the ingress is doing the
signaling, the ingress needs to indicate if the LSP can be split up.

Moving this (item #5) to just before the current #8a and then shifting
the item numbering would improve clarity.


> >   6.  A means to support non-disruptive reallocation of an existing
> >       LSP to another component link is needed.
>  
> In the storytelling section, I recommend that we describe that the LSP
> is actually being moved, which could cause reordering or increased
> jitter (which does cause disruption). That is WHY the change frequency
> of item 8 is specified. Non-disruptive will not always be possible, I
> recall in the meeting the term "minimally disruptive" being used, which
> I believe is a more accurate description of the goal. 


That wording about being minimally disruptive but not totally
non-disruptive could go right here.

  6.  A means to support minimally-disruptive reallocation of an
      existing LSP to another component link is needed.  Some
      disruption due to reordering is unavoidable.

      Control is needed of the frequency of change and/or the
      conditions under which change is permitted.  These parameters
      are considered to be included in the advertised link
      characteristics and the LSP desired characteristics described in
      the TE-LSDB and LSP requirements that follow.
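A minimal sketch of that change-frequency control (names and the 300 second default are invented for illustration): the reallocation is simply refused if it would arrive sooner than the advertised limit allows.

```python
# Hypothetical sketch of requirement 6: reallocation of an LSP to
# another component link, rate-limited so that moves (each of which
# may briefly reorder traffic) stay within a configured frequency.
class LspPlacement:
    def __init__(self, lsp_name, component_link, min_interval_s=300.0):
        self.lsp_name = lsp_name
        self.component_link = component_link
        self.min_interval_s = min_interval_s  # advertised change-frequency limit
        self.last_move_time = None

    def try_move(self, new_link, now_s):
        """Move the LSP only if the change-frequency limit allows it."""
        if (self.last_move_time is not None
                and now_s - self.last_move_time < self.min_interval_s):
            return False  # too soon; keep the current placement
        self.component_link = new_link
        self.last_move_time = now_s
        return True

p = LspPlacement("lsp-1", "cl-0", min_interval_s=300.0)
assert p.try_move("cl-1", now_s=0.0)       # first move is allowed
assert not p.try_move("cl-0", now_s=60.0)  # blocked: within the 300 s window
```

Whether the limit is a simple interval, as here, or conditions under which change is permitted is exactly what the advertised characteristics would have to express.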


> >   7.  A means to populate the TE-LSDB with information regarding which
> >       links (per end) can support distribution of large LSP across
> >       multiple component links based on the component flows and the
> >       characteristics of this capability.  Key characteristics are:
> > 
> >     a.  The largest single flow that can be supported.  This may
> >         or may not be related to the size of component links.
>  
> It seems this is also related to whether the LSP can have the behavior
> described in item 2.


Yes it could be.  But other implementations are not precluded.


> >     b.  Characteristics of the flow identification method.  [These
> >         can be enumerated in this document or a later document. ]
> > 
> >     c.  Characteristics of the flow adjustment method.  [These
> >         can be enumerated in this document or a later document. ]
>  
> "flow adjustment" should be added to the definition section, so that the
> document is clear on use of this term. 


Good catch.  Rewording would avoid adding a term.

        c.  Characteristics of the method of assignment of flows to
            component links.  [These can be enumerated in this
            document or a later document. ]

If there are no strong objections I'm going to drop the "These can be
enumerated" notes.  That would be too close to implementation.


> >   8.  Some means is needed to specify desired characteristics of flow
> >       distribution for an LSP, regardless of whether the LSP is set up
> >       using RSVP-TE, LDP, or management plane.  Behaviour for IP must
> >       be configured using the management plane.  These characteristics
> >       include:
> > 
> > [CV] Reworded above paragraph to indicate that LDP and static 
> > LSP and IP are not omitted from this definition.  Note that 
> > LDP can give guidance but does not support TE so it can't be 
> > rejected and go elsewhere if the guidance can't be followed.
> > 
> >     a.  The largest flow expected.
> > 
> >         b.  Characteristics of load adjustment.  For example, a
> >             maximum change frequency might be specified.  [These can
> >             be enumberated in this document or a later document. ]
>  
> Is "load adjustment" something different from "flow adjustment"?


Not really.  Just sloppy wording on my part.  Again I need to reword.

        b.  The desired characteristics of the method of assignment of
            flows to component links.  For example, a maximum change
            frequency might be specified.  These desired
            characteristics must match characteristics offered by the
            CL as advertised in the TE-LSDB (if a control plane is
            used).  [These can be enumerated in this document or a
            later document. ]

Again, I'd like to drop the "These can be enumerated" note.


> >   9.  In some cases it may be useful to measure link parameters
> >       and reflect these in metrics.  Link delay is an example.
>  
> I believe this is stronger than the optional "may." A protocol
> (extension or new) is definitely needed to report latency from a lower
> (server) layer up to a higher (client) layer network. An ordering of
> importance needs to be in the document.


Simple enough to change the emphasis.

  9.  In some cases it is necessary to measure link parameters
      and reflect these in metrics.  Link delay is an example.

I removed "it may be useful" and replaced with "it is necessary".


> >  10.  Some uses require an ability to bound the sum of delay metrics
> >       along a path while otherwise taking the shortest path related to
> >       another metric.  Algorithms for accomplishing this are applied
> >       at an ingress, PCE, or in the management system and are out of
> >       scope.
> > 
> > [CV] Limited scope above.
>  
> OK, you are proposing another scope change. As I commented before, this
> is an important network operator problem. The form of routing and/or
> signaling extensions used to meet other requirements may be able to be
> used to (help) meet this objective. I believe that declaring it out of
> scope is premature. 


The requirement is that some method be supported on some given
equipment.  The algorithm that is used by the ingress or PCE or
management system is not an interoperability issue and therefore is
not a matter that needs to be or should be standardized.  This is
what we always call an opportunity for innovation.
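For illustration only, one such algorithm an ingress or PCE might implement (this is not a proposed standard, just a sketch of requirement 10) is a constrained shortest path: cheapest by one metric while the summed delay stays under a bound.

```python
import heapq

# Hypothetical sketch of requirement 10: find the cheapest path by one
# metric (cost) while keeping the sum of another metric (delay) under
# a bound.  Graph shape and names are made up for the example.
def cheapest_path_within_delay(graph, src, dst, max_delay):
    """graph: {node: [(neighbor, cost, delay), ...]}.
    Returns (cost, delay, path) or None if no path fits the bound."""
    best_delay = {}  # lowest delay seen per node; prunes dominated states
    heap = [(0, 0, src, [src])]
    while heap:
        cost, delay, node, path = heapq.heappop(heap)
        if node == dst:
            # Popped in cost order, so this is the cheapest feasible path.
            return cost, delay, path
        if delay >= best_delay.get(node, float("inf")):
            continue  # a cheaper-or-equal, lower-delay state already exists
        best_delay[node] = delay
        for nbr, c, d in graph.get(node, []):
            if delay + d <= max_delay and nbr not in path:
                heapq.heappush(heap, (cost + c, delay + d, nbr, path + [nbr]))
    return None
```

With a tight delay bound the search is forced off the cheapest route onto a shorter-delay one; with no feasible route it reports failure, which is the TE reject case.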


> >  11.  Impact of load balancing on OAM and mitigation techniques
> >       applicable to OAM must be documented.
> > 
> >  12.  Load balancing techniques must not oscillate.
> > 
> > [CV] Added above two based on Dave's comments.
> > 
> > [CV] Took suggestion to move use scenarios to framework.


As mentioned above, we are moving the use cases (aka scenarios, aka
service provider problems, aka storytelling) back to this point unless
there are objections to putting it here.

Curtis
_______________________________________________
rtgwg mailing list
[email protected]
https://www.ietf.org/mailman/listinfo/rtgwg
