<?xml version="1.0"?>
<!DOCTYPE rfc SYSTEM "rfc2629.dtd"[
  <!ENTITY rfc2119 SYSTEM "http://xml2rfc.tools.ietf.org/public/rfc/bibxml/reference.RFC.2119.xml">
  <!ENTITY rfc5681 SYSTEM "http://xml2rfc.tools.ietf.org/public/rfc/bibxml/reference.RFC.5681.xml">
  <!ENTITY rfc8312 SYSTEM "http://xml2rfc.tools.ietf.org/public/rfc/bibxml/reference.RFC.8312.xml">
  <!ENTITY rfc9002 SYSTEM "http://xml2rfc.tools.ietf.org/public/rfc/bibxml/reference.RFC.9002.xml">
  <!ENTITY rfc9260 SYSTEM "http://xml2rfc.tools.ietf.org/public/rfc/bibxml/reference.RFC.9260.xml">
  <!ENTITY rfc8174 SYSTEM "http://xml2rfc.tools.ietf.org/public/rfc/bibxml/reference.RFC.8174.xml">
  <!ENTITY rfc1191 SYSTEM "http://xml2rfc.tools.ietf.org/public/rfc/bibxml/reference.RFC.1191.xml">
  <!ENTITY rfc1122 SYSTEM "http://xml2rfc.tools.ietf.org/public/rfc/bibxml/reference.RFC.1122.xml">
  <!ENTITY rfc4821 SYSTEM "http://xml2rfc.tools.ietf.org/public/rfc/bibxml/reference.RFC.4821.xml">
]>
<?rfc toc='yes' ?>
<?rfc symrefs='yes' ?>
<?rfc sortrefs='yes'?>
<?rfc compact='yes'?>
<?rfc comments="yes"?>
<?rfc inline="yes" ?>
<!-- <?rfc-ext parse-xml-in-artwork='yes' ?> -->
<!-- <?xml-stylesheet type='text/xsl' href='rfc2629.xslt' ?> -->

<rfc docName="draft-ietf-tcpm-hystartplusplus-13" category="std" ipr="trust200902">
  <front>
    <title abbrev='HyStart++'>HyStart++: Modified&nbsp;Slow&nbsp;Start&nbsp;for&nbsp;TCP</title>
    <author initials='P.' surname='Balasubramanian' fullname='Praveen Balasubramanian'>
      <organization>Confluent</organization>
      <address>
        <postal>
          <street>899 West Evelyn Ave</street>
          <city>Mountain View</city>
          <region>CA</region>
          <code>94041</code>
          <country>USA</country>
        </postal>
        <email>pravb.ietf@gmail.com</email>
      </address>
    </author>
    <author initials='Y.' surname='Huang' fullname='Yi Huang'>
      <organization>Microsoft</organization>
      <address>
        <postal>
          <street>One Microsoft Way</street>
          <city>Redmond</city>
          <region>WA</region>
          <code>98052</code>
          <country>USA</country>
        </postal>         
        <phone>+1 425 703 0447</phone>
        <email>huanyi@microsoft.com</email>
      </address>
    </author>
    <author initials='M.' surname='Olson' fullname='Matt Olson'>
      <organization>Microsoft</organization>
      <address>
        <phone>+1 425 538 8598</phone>
        <email>maolson@microsoft.com</email>
      </address>
    </author>
    <date/>
    <area>Transport</area>
    <keyword>TCP</keyword>
    <keyword>congestion control</keyword>
    <abstract>
      <t> This document describes HyStart++, a simple modification to
      the slow start phase of congestion control algorithms.
      Traditional slow start can overshoot the ideal send rate
      in many cases, causing high packet loss and poor performance. 
      HyStart++ uses a delay increase heuristic to find an exit
      point before possible overshoot.
      It also adds a mitigation to prevent jitter from causing
      premature slow start exit. 
      </t>
    </abstract>
  </front>

  <middle>
    <section title='Introduction'>
      <t> <xref target="RFC5681"/> describes the slow start
      congestion control algorithm for TCP. The slow start
      algorithm is used when the congestion window (cwnd)
      is less than the slow start threshold (ssthresh). 
      During slow start, in the absence of packet loss signals,
      TCP increases cwnd exponentially to probe the network capacity.
      This fast growth can overshoot the ideal sending rate
      and cause significant packet loss, which cannot always
      be recovered efficiently.
      </t>
      <t> HyStart++ uses delay increase as a signal to exit
      slow start before potential packet loss occurs as a result
      of overshoot. This is one of two algorithms specified in
      <xref target="HyStart"/>. 
      After the slow start exit, a novel
      Conservative Slow Start (CSS) phase is used to determine
      whether the slow start exit was premature and, if so, to resume
      slow start. This mitigation improves performance in the
      presence of jitter.
      HyStart++ reduces packet loss and retransmissions, and
      improves goodput in lab measurements and real-world
      deployments.
      </t>
      <t>  While this document describes HyStart++ for TCP, it can
      also be used for other transport protocols that use slow start,
      such as QUIC <xref target="RFC9002"/>
      or SCTP <xref target="RFC9260"/>. 
      </t>
   </section>

    <section title="Terminology" anchor="term">
      <t> The key words "MUST", "MUST NOT", "REQUIRED", "SHALL",
      "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED",
      "NOT RECOMMENDED", "MAY",
      and "OPTIONAL" in this document are to be interpreted
      as described in BCP 14
       <xref target="RFC2119"/> <xref target="RFC8174"/> when,
      and only when, they appear in all capitals, as shown here.
      </t>
    </section>

    <section title='Definitions'>
      <t> We repeat here some definitions from
      <xref target="RFC5681"/> to aid the reader. 
      </t>
      <t> SENDER MAXIMUM SEGMENT SIZE (SMSS):
      The SMSS is the size of the
      largest segment that the sender can transmit. This value can be
      based on the maximum transmission unit of the network, the path
      MTU discovery <xref target="RFC1191"/>,
      <xref target="RFC4821"/> algorithm, RMSS (see next item),
      or other factors. The size does not include the TCP/IP headers
      and options.
      </t>
      <t> RECEIVER MAXIMUM SEGMENT SIZE (RMSS): The RMSS is the
      size of the largest segment the receiver is willing to accept.
      This is the value specified in the MSS option sent by the
      receiver during connection startup.  Or, if the MSS option
      is not used, it is 536 bytes <xref target="RFC1122"/>.
      The size does not include the TCP/IP headers and
      options.
      </t>
      <t> RECEIVER WINDOW (rwnd): The most recently advertised
      receiver window.
      </t>
      <t> CONGESTION WINDOW (cwnd): A TCP state variable that
      limits the amount of data a TCP can send.
      At any given time, a TCP MUST NOT send
      data with a sequence number higher than the sum of the highest
      acknowledged sequence number and the minimum of cwnd and rwnd.
      </t>
    </section> 

    <section title='HyStart++ Algorithm'>

      <section title='Summary'>
        <t> <xref target="HyStart"/> specifies two algorithms
        (a "Delay Increase" algorithm and an "Inter-Packet Arrival"
        algorithm) to be run in parallel to detect that the sending
        rate has reached capacity. In practice, the Inter-Packet
        Arrival algorithm does not perform well and is not able
        to detect congestion early, primarily due to ACK compression.
        The idea of the Delay Increase algorithm is to look for
        spikes in RTT (round-trip time), which suggest that the
        bottleneck buffer is filling up. 
        </t>
        <t> In HyStart++, a TCP sender uses traditional slow start
        and then uses the "Delay Increase" algorithm to trigger an
        exit from slow start. But instead of going straight from
        slow start to congestion avoidance, the sender spends a
        number of RTTs in a Conservative Slow Start (CSS) phase
        to determine whether the exit from slow start was premature.
        During CSS, the congestion window grows exponentially, as
        in regular slow start, but with a smaller exponential base,
        resulting in less aggressive growth.
        If the RTT decreases during CSS, the sender concludes that the RTT
        spike was not related to congestion caused by the connection
        sending at a rate greater than the ideal send rate, and the
        connection resumes slow start. If the RTT inflation
        persists throughout CSS, the connection enters congestion
        avoidance.
        </t>
      </section>

      <section title='Algorithm Details'>

        <t> The following pseudocode uses a limit, L, to control the
        aggressiveness of the cwnd increase during both standard slow
        start and CSS. While an arriving ACK may newly acknowledge an
        arbitrary number of bytes, the HyStart++ algorithm limits the
        number of those bytes applied to increase the cwnd to L*SMSS bytes. </t>
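        <t> As an illustration only, the effect of the limit can be sketched
        in Python as follows. The function name cwnd_increase and the SMSS
        value of 1460 bytes are assumptions made for this example; they are
        not part of the algorithm. </t>
        <sourcecode type="python"><![CDATA[
```python
import math

SMSS = 1460  # bytes; illustrative value, not taken from this document


def cwnd_increase(newly_acked, limit_l, smss=SMSS):
    """Bytes credited to cwnd for one ACK in standard slow start."""
    return min(newly_acked, limit_l * smss)


# A stretch ACK that newly acknowledges 20 segments.
stretch_ack = 20 * SMSS

# Non-paced sender (L = 8): only 8 * SMSS bytes are credited.
non_paced = cwnd_increase(stretch_ack, 8)

# Paced sender (L = infinity): every acknowledged byte is credited.
paced = cwnd_increase(stretch_ack, math.inf)
```
]]></sourcecode>
        <t> In this sketch, the non-paced sender is credited with 8 segments
        for the stretch ACK, while the paced sender is credited with all 20. </t>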
     
        <t> lastRoundMinRTT and currentRoundMinRTT are initialized
        to infinity. currRTT is the RTT sampled from the latest
        incoming ACK and is also initialized to infinity. </t>

        <t>HyStart++ measures rounds using sequence numbers, as
        follows:
        Define windowEnd as a sequence number initialized to SND.NXT.
        When windowEnd is ACKed, the current round ends and windowEnd
        is set to SND.NXT.</t>
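        <t> A minimal Python sketch of this round tracking is shown below.
        It makes two assumptions that this document does not state: sequence
        numbers are plain integers that never wrap around, and windowEnd
        counts as ACKed once the cumulative ACK number is at or beyond it.
        The class name RoundTracker and its members are hypothetical. </t>
        <sourcecode type="python"><![CDATA[
```python
class RoundTracker:
    """Tracks rounds by sequence number (wraparound is ignored)."""

    def __init__(self, snd_nxt):
        self.window_end = snd_nxt  # windowEnd starts at SND.NXT
        self.rounds_completed = 0

    def on_ack(self, ack_seq, snd_nxt):
        """Return True if this ACK ends the current round."""
        if ack_seq >= self.window_end:
            self.window_end = snd_nxt  # next round ends at current SND.NXT
            self.rounds_completed += 1
            return True
        return False


tracker = RoundTracker(snd_nxt=1000)
# The sender has transmitted up to sequence 5000 when this ACK arrives.
first = tracker.on_ack(ack_seq=2000, snd_nxt=5000)   # covers 1000
second = tracker.on_ack(ack_seq=3000, snd_nxt=7000)  # below 5000
third = tracker.on_ack(ack_seq=5000, snd_nxt=9000)   # covers 5000
```
]]></sourcecode>
        <t> Only the first and third ACKs end a round here, because only they
        reach the windowEnd value recorded when the round began. </t>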

        <t> At the start of each round during standard slow start
        (<xref target="RFC5681"/>) and CSS: 
            <sourcecode type="pseudocode">
lastRoundMinRTT = currentRoundMinRTT
currentRoundMinRTT = infinity
rttSampleCount = 0
            </sourcecode> 
        </t>        

        <t> For each arriving ACK in slow start, where N is the
        number of previously unacknowledged bytes acknowledged
        in the arriving ACK: </t>
<t>Update the cwnd: </t>
<sourcecode type="pseudocode">
  cwnd = cwnd + min (N, L * SMSS)
</sourcecode>
<t> Keep track of minimum observed RTT: </t>
<sourcecode type="pseudocode">
  currentRoundMinRTT = min(currentRoundMinRTT, currRTT)
  rttSampleCount += 1
</sourcecode>
<t>For rounds where at least N_RTT_SAMPLE RTT samples have been
obtained and currentRoundMinRTT and lastRoundMinRTT are valid,
check if delay increase triggers slow start exit:</t>
<sourcecode type="pseudocode"> 
if (rttSampleCount &gt;= N_RTT_SAMPLE AND
    currentRoundMinRTT != infinity AND
    lastRoundMinRTT != infinity)
  RttThresh = clamp(MIN_RTT_THRESH,
                    lastRoundMinRTT / 8,
                    MAX_RTT_THRESH)
  if (currentRoundMinRTT &gt;= (lastRoundMinRTT + RttThresh))
    cssBaselineMinRtt = currentRoundMinRTT
    exit slow start and enter CSS
</sourcecode>
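        <t> The delay increase check can be sketched in Python as follows.
        This is an illustration under stated assumptions: RTTs are integer
        microseconds, clamp() is read as bounding lastRoundMinRTT / 8 between
        MIN_RTT_THRESH and MAX_RTT_THRESH, and the function names are
        hypothetical. The sample count and validity checks from the
        pseudocode above are omitted for brevity. </t>
        <sourcecode type="python"><![CDATA[
```python
MIN_RTT_THRESH = 4_000   # 4 msec, in microseconds
MAX_RTT_THRESH = 16_000  # 16 msec, in microseconds


def rtt_thresh(last_round_min_rtt):
    """Bound lastRoundMinRTT / 8 to [MIN_RTT_THRESH, MAX_RTT_THRESH]."""
    return max(MIN_RTT_THRESH, min(last_round_min_rtt // 8, MAX_RTT_THRESH))


def delay_increase_detected(current_round_min_rtt, last_round_min_rtt):
    """True if the round's minimum RTT rose by at least the threshold."""
    threshold = rtt_thresh(last_round_min_rtt)
    return current_round_min_rtt >= last_round_min_rtt + threshold


# Thresholds for 20 ms, 80 ms, and 200 ms paths.
thresholds = [rtt_thresh(rtt) for rtt in (20_000, 80_000, 200_000)]

# On an 80 ms path the threshold is 10 ms.
small_rise = delay_increase_detected(89_000, 80_000)
large_rise = delay_increase_detected(90_000, 80_000)
```
]]></sourcecode>
        <t> With the recommended constants, the threshold is 4 msec for the
        20 msec path, 10 msec for the 80 msec path, and 16 msec for the
        200 msec path. A 9 msec rise on the 80 msec path does not trigger an
        exit, while a 10 msec rise does. </t>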
        <t> For each arriving ACK in CSS, where N is the number
        of previously unacknowledged bytes acknowledged in
        the arriving ACK:</t>
            <t> Update the cwnd: </t>
              <sourcecode type="pseudocode">
cwnd = cwnd + (min (N, L * SMSS) / CSS_GROWTH_DIVISOR)
              </sourcecode>
            <t> Keep track of minimum observed RTT: </t>
                <sourcecode type="pseudocode">
currentRoundMinRTT = min(currentRoundMinRTT, currRTT)
rttSampleCount += 1
                </sourcecode>
            <t> For CSS rounds where at least N_RTT_SAMPLE RTT
            samples have been obtained, check whether the current round's
            minRTT has dropped below the baseline (cssBaselineMinRtt),
            indicating that the slow start exit was spurious:
            </t>
                <sourcecode type="pseudocode">
if (currentRoundMinRTT &lt; cssBaselineMinRtt)
  cssBaselineMinRtt = infinity
  resume slow start including HyStart++
                </sourcecode>

        <t> CSS lasts at most CSS_ROUNDS rounds. If the transition
        into CSS happens in the middle of a round, that partial
        round counts towards the limit. </t>
        <t> If CSS_ROUNDS rounds are complete,
        enter congestion avoidance by setting ssthresh to the current cwnd: </t>
        <sourcecode type="pseudocode">
ssthresh = cwnd
        </sourcecode> 

        <t> If loss or ECN marking is observed at any time during
        standard slow start or CSS, enter congestion avoidance by
        setting ssthresh to the current cwnd: </t>
<sourcecode type="pseudocode">
ssthresh = cwnd
</sourcecode>      
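        <t> Putting the steps of this section together, one possible reading
        of the algorithm is sketched below in Python. It is not a reference
        implementation. It assumes integer microsecond RTTs, sequence numbers
        that never wrap, integer division of the CSS credit, and a CSS round
        counter that restarts each time CSS is entered. The class HyStartPP,
        its members, and the run_round helper are hypothetical names. </t>
        <sourcecode type="python"><![CDATA[
```python
import math

# Recommended constants from this document; RTTs are integer microseconds.
MIN_RTT_THRESH = 4_000
MAX_RTT_THRESH = 16_000
N_RTT_SAMPLE = 8
CSS_GROWTH_DIVISOR = 4
CSS_ROUNDS = 5

SLOW_START, CSS, CONG_AVOID = "slow start", "CSS", "congestion avoidance"


class HyStartPP:
    """Illustrative sender state; sequence-number wraparound is ignored."""

    def __init__(self, cwnd, smss, snd_nxt, limit_l=8):
        self.cwnd, self.smss, self.limit_l = cwnd, smss, limit_l
        self.ssthresh = math.inf
        self.phase = SLOW_START
        self.window_end = snd_nxt
        self.last_round_min_rtt = math.inf
        self.current_round_min_rtt = math.inf
        self.css_baseline_min_rtt = math.inf
        self.rtt_sample_count = 0
        self.css_rounds = 0

    def enter_congestion_avoidance(self):
        """Used on loss, on ECN marking, or when CSS_ROUNDS complete."""
        if self.phase != CONG_AVOID:
            self.ssthresh = self.cwnd
            self.phase = CONG_AVOID

    def on_ack(self, ack_seq, newly_acked, rtt, snd_nxt):
        if self.phase == CONG_AVOID:
            return  # congestion avoidance is outside this sketch
        credit = min(newly_acked, self.limit_l * self.smss)
        if self.phase == CSS:
            credit //= CSS_GROWTH_DIVISOR
        self.cwnd += credit
        self.current_round_min_rtt = min(self.current_round_min_rtt, rtt)
        self.rtt_sample_count += 1
        if self.rtt_sample_count >= N_RTT_SAMPLE:
            self._check_rtt()
        if ack_seq >= self.window_end:
            self._end_round(snd_nxt)

    def _check_rtt(self):
        cur, last = self.current_round_min_rtt, self.last_round_min_rtt
        if self.phase == SLOW_START:
            if math.isinf(cur) or math.isinf(last):
                return
            thresh = max(MIN_RTT_THRESH, min(last // 8, MAX_RTT_THRESH))
            if cur >= last + thresh:
                self.css_baseline_min_rtt = cur
                self.phase, self.css_rounds = CSS, 0
        elif cur < self.css_baseline_min_rtt:
            self.css_baseline_min_rtt = math.inf
            self.phase = SLOW_START  # exit was spurious; resume slow start

    def _end_round(self, snd_nxt):
        if self.phase == CSS:
            self.css_rounds += 1  # a partial first round counts as well
            if self.css_rounds >= CSS_ROUNDS:
                self.enter_congestion_avoidance()
                return
        self.window_end = snd_nxt
        self.last_round_min_rtt = self.current_round_min_rtt
        self.current_round_min_rtt = math.inf
        self.rtt_sample_count = 0


def run_round(sender, rtt, acks=8, acked_bytes=1000):
    """Feed one round of ACKs; only the last one reaches windowEnd."""
    end = sender.window_end
    for i in range(acks):
        ack_seq = end if i == acks - 1 else end - 1
        sender.on_ack(ack_seq, acked_bytes, rtt, snd_nxt=end + 10_000)


sender = HyStartPP(cwnd=10_000, smss=1000, snd_nxt=0)
run_round(sender, rtt=50_000)      # baseline round at 50 ms
run_round(sender, rtt=60_000)      # 10 ms rise exceeds the 6.25 ms threshold
phase_after_spike = sender.phase   # the sender is now in CSS
for _ in range(CSS_ROUNDS - 1):
    run_round(sender, rtt=60_000)  # RTT inflation persists throughout CSS
```
]]></sourcecode>
        <t> In this run, the sender enters CSS at the end of the second round
        and, because the RTT stays inflated, enters congestion avoidance once
        CSS_ROUNDS rounds have been counted. Feeding a CSS round with RTT
        samples below the baseline instead returns the sketch to slow start. </t>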

      </section>

      <section title='Tuning constants and other considerations'>
      <t> It is RECOMMENDED that a HyStart++ implementation use
      the following constants:
      <sourcecode type="pseudocode">
MIN_RTT_THRESH = 4 msec
MAX_RTT_THRESH = 16 msec
N_RTT_SAMPLE = 8
CSS_GROWTH_DIVISOR = 4
CSS_ROUNDS = 5
L = infinity if paced, L = 8 if non-paced
      </sourcecode>
      </t>
      <t> These constants have been determined with lab measurements
      and real world deployments. An implementation MAY tune them for
      different network characteristics. 
      </t>
      <t> The delay increase sensitivity is determined
      by MIN_RTT_THRESH and MAX_RTT_THRESH. Smaller values of
      MIN_RTT_THRESH may cause spurious exits from slow start. Larger
      values of MAX_RTT_THRESH may result in slow start not exiting
      until loss is encountered for connections on large RTT paths.
      </t>
      <t> A TCP implementation is REQUIRED to take at least one RTT
      sample each round. Using lower values of N_RTT_SAMPLE will
      lower the accuracy of the measured RTT for the round;
      higher values will improve accuracy at the cost of more
      processing. 
      </t>  
      <t> CSS_GROWTH_DIVISOR MUST be at least 2.
      A value of 1 results in the same aggressive behavior as regular
      slow start. Values larger than 4 
      will cause the algorithm to be less aggressive and may be less
      performant.
      </t>
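      <t> To give a rough sense of scale, the Python sketch below computes the
      idealized per-round growth factor of cwnd for several divisor values.
      It assumes that a full cwnd of data is acknowledged each round and that
      the limit L never binds; real connections will deviate from this. The
      function name is hypothetical. </t>
      <sourcecode type="python"><![CDATA[
```python
def per_round_growth_factor(divisor):
    """Idealized cwnd growth per round: each acked byte adds 1/divisor."""
    return 1 + 1 / divisor


# A divisor of 1 corresponds to regular slow start (cwnd doubles per round).
factors = {d: per_round_growth_factor(d) for d in (1, 2, 4, 8)}

# Cumulative growth over CSS_ROUNDS = 5 rounds with the recommended divisor.
growth_over_css = per_round_growth_factor(4) ** 5
```
]]></sourcecode>
      <t> Under these assumptions, the recommended divisor of 4 grows cwnd by
      25% per round, or roughly a factor of 3 over five CSS rounds, compared
      with a factor of 32 for five rounds of regular slow start. </t>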
      <t> Smaller values of CSS_ROUNDS may fail to detect jitter,
      and larger values may limit performance.
      </t>
      <t> A paced TCP implementation SHOULD use L = infinity. Burst 
      concerns are mitigated by pacing and this setting allows for optimal 
      cwnd growth on modern networks. 
      </t>
      <t> For TCP implementations that pace to mitigate burst concerns, L
      values smaller than infinity may suffer performance problems due to slow
      cwnd growth in high-speed networks. For non-paced TCP implementations, L values
      smaller than 8 may likewise suffer from slow cwnd growth in high-speed
      networks, while L values larger than 8 may increase burstiness
      and thereby loss rates, resulting in poor performance.
      </t>  
      <t> An implementation SHOULD use HyStart++ only for the
      initial slow start (when ssthresh is at its initial,
      arbitrarily high value per <xref target="RFC5681"/>) and fall
      back to using traditional slow start for the remainder of
      the connection lifetime. This is acceptable because subsequent
      slow starts will use the discovered ssthresh value to exit slow
      start and avoid the overshoot problem. An implementation MAY
      use HyStart++ to grow the restart window
      (<xref target="RFC5681"/>) after a long idle period. 
      </t>
      <t>
      In application-limited scenarios, the amount of data in
      flight could fall below the bandwidth-delay product (BDP) and
      result in smaller RTT samples, which can trigger a return from
      CSS to slow start. A connection might therefore oscillate between
      CSS and slow start in such scenarios. This behavior
      will neither cause the connection to enter congestion
      avoidance prematurely nor cause overshoot compared to
      slow start.
      </t>
      </section>

    </section>

    <section title='Deployments and Performance Evaluations'>

        <t> As of the time of writing, HyStart++ as described
        in this document has been enabled by default for all TCP
        connections in the Windows operating system for over
        two years, with pacing disabled and L = 8.
        </t>
        <t> In lab measurements with Windows TCP, HyStart++ shows
        both goodput improvements and reductions in packet
        loss and retransmissions compared to traditional slow start.
        For example, across a variety of tests on a 100 Mbps link
        with a bottleneck buffer sized to the bandwidth-delay product,
        HyStart++ reduces bytes retransmitted by 50% and
        retransmission timeouts (RTOs) by 36%.
        </t>
        <t> In an A/B test comparing HyStart++ draft 01 to
        traditional slow start across a large Windows device
        population, out of 52 billion TCP connections, 0.7% of
        connections move from 1 RTO to 0 RTOs and another 0.7% of
        connections move from 2 RTOs to 1 RTO with HyStart++.
        This test did not focus on send-heavy connections, and
        the impact on send-heavy connections is likely much
        higher. We plan to conduct more such production
        experiments to gather more data in the future. 
        </t>

    </section>

    <section title='Security Considerations'>
      <t> HyStart++ enhances slow start and inherits the general
      security considerations discussed in <xref target="RFC5681"/>.
      </t>

      <t>An attacker can cause HyStart++ to exit slow start prematurely
      and impair the performance of a TCP connection by, for example,
      dropping data packets or their acknowledgments.</t>

      <t>The ACK division attack outlined in <xref target="SCWA99"/> does not affect
      HyStart++ because the congestion window increase in HyStart++ is based
      on the number of bytes newly acknowledged in each arriving ACK rather than
      on a fixed increment per arriving ACK.
      </t>
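      <t> The difference can be illustrated with the following Python sketch,
      which compares a hypothetical sender that adds one SMSS to cwnd per ACK
      with the byte-counting rule used in the slow start phase of HyStart++.
      The function names, the SMSS value of 1460 bytes, and the split of one
      segment into ten ACKs are assumptions chosen for the example. </t>
      <sourcecode type="python"><![CDATA[
```python
SMSS = 1460  # bytes; illustrative value, not taken from this document


def growth_per_ack_constant(ack_sizes, smss=SMSS):
    """Hypothetical vulnerable sender: cwnd grows by one SMSS per ACK."""
    return len(ack_sizes) * smss


def growth_byte_counting(ack_sizes, limit_l=8, smss=SMSS):
    """Byte counting: cwnd grows by min(N, L * SMSS) per ACK."""
    return sum(min(n, limit_l * smss) for n in ack_sizes)


honest = [SMSS]       # one ACK covering a whole segment
divided = [146] * 10  # the same segment acknowledged in ten pieces

per_ack_gain = growth_per_ack_constant(divided) / growth_per_ack_constant(honest)
byte_count_gain = growth_byte_counting(divided) / growth_byte_counting(honest)
```
]]></sourcecode>
      <t> In this example, dividing the ACK yields ten times the cwnd growth
      for the per-ACK sender but no extra growth under byte counting. </t>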
    </section>

    <section title='IANA Considerations'>
      <t> This document has no actions for IANA.
      </t>
    </section>
    
    <section title='Acknowledgements'>
      <t> During the discussions of this work on the TCPM mailing list and in working group meetings, 
        helpful comments, critiques, and reviews were received from (listed alphabetically by last name): 
        Mark Allman, Bob Briscoe, Neal Cardwell, Yuchung Cheng, Junho Choi, Martin Duke, Reese Enghardt, 
        Christian Huitema, Ilpo Järvinen, Yoshifumi Nishida, Randall Stewart, and Michael Tuexen.        
      </t>
    </section>  

  </middle>

  <back>
    <references title='Normative References'>
      &rfc2119;
      &rfc5681;     
    </references>

    <references title='Informative References'>
      <reference anchor='HyStart' target='https://doi.org/10.1016/j.comnet.2011.01.014'>
        <front>
          <title>Taming the elephants: New TCP slow start</title>
          <author initials="S." surname="Ha">
          </author>
          <author initials="I." surname="Rhee">
          </author>
          <date year="2011"/>
        </front>
        <seriesInfo name="" value="Computer Networks vol. 55, no. 9, pp. 2092-2110"/>
        <seriesInfo name="DOI" value="10.1016/j.comnet.2011.01.014"/>
      </reference>
      &rfc9002;
      &rfc9260;
      &rfc8174;
      &rfc1191;
      &rfc4821;
      &rfc1122;
      <reference anchor='SCWA99' target='https://doi.org/10.1145/505696.505704'>
        <front>
          <title>TCP congestion control with a misbehaving receiver</title>
          <author initials="S." surname="Savage">
          </author>
          <author initials="N." surname="Cardwell">
          </author>
          <author initials="D." surname="Wetherall">
          </author>
          <author initials="T." surname="Anderson">
          </author>
          <date year="1999"/>
        </front>
        <seriesInfo name="" value="ACM Computer Communication Review, 29(5)"/>
        <seriesInfo name="DOI" value="10.1145/505696.505704"/>
      </reference>
    </references>
  </back>
</rfc>
