<?xml version="1.0" encoding="US-ASCII"?>
<!DOCTYPE rfc SYSTEM "rfc2629.dtd">
<?rfc toc="yes"?>
<?rfc tocompact="yes"?>
<?rfc tocdepth="3"?>
<?rfc tocindent="yes"?>
<?rfc symrefs="yes"?>
<?rfc sortrefs="yes"?>
<?rfc comments="yes"?>
<?rfc inline="yes"?>
<?rfc compact="yes"?>
<?rfc subcompact="no"?>
<?rfc strict='yes'?>
<?rfc iprnotified='no'?>
<rfc category="std" docName="draft-templin-intarea-parcels-16"
     ipr="trust200902" updates="RFC2675">
  <front>
    <title abbrev="IP Parcels">IP Parcels</title>

    <author fullname="Fred L. Templin" initials="F. L." role="editor"
            surname="Templin">
      <organization>Boeing Research &amp; Technology</organization>

      <address>
        <postal>
          <street>P.O. Box 3707</street>

          <city>Seattle</city>

          <region>WA</region>

          <code>98124</code>

          <country>USA</country>
        </postal>

        <email>fltemplin@acm.org</email>
      </address>
    </author>

    <date day="6" month="October" year="2022"/>

    <keyword>I-D</keyword>

    <keyword>Internet-Draft</keyword>

    <abstract>
      <t>IP packets (both IPv4 and IPv6) contain a single unit of upper layer
      protocol data which becomes the retransmission unit in case of loss.
      Upper layer protocols including the Transmission Control Protocol (TCP)
      and transports over the User Datagram Protocol (UDP) prepare data units
      known as "segments", with traditional arrangements including a single
      segment per IP packet. This document presents a new construct known as
      the "IP Parcel" which permits a single packet to carry multiple upper
      layer protocol segments, essentially creating a "packet-of-packets". IP
      parcels provide an essential building block for improved performance and
      efficiency by supporting larger Maximum Transmission Units (MTUs) in the
      Internet as discussed in this document.</t>
    </abstract>
  </front>

  <middle>
    <section anchor="intro" title="Introduction">
      <t>IP packets (both IPv4 <xref target="RFC0791"/> and IPv6 <xref
      target="RFC8200"/>) contain a single unit of upper layer protocol data
      which becomes the retransmission unit in case of loss. Upper layer
      protocols such as the Transmission Control Protocol (TCP) <xref
      target="RFC0793"/> and transports over the User Datagram Protocol (UDP)
      <xref target="RFC0768"/> (including QUIC <xref target="RFC9000"/>, LTP
      <xref target="RFC5326"/> and others) prepare data units known as
      "segments", with traditional arrangements including a single segment per
      IP packet. This document presents a new construct known as the "IP
      Parcel" which permits a single packet to carry multiple upper layer
      protocol segments. This essentially creates a "packet-of-packets" with
      the IP layer and full {TCP,UDP} headers appearing only once but with
      possibly multiple segments included.</t>

      <t>Parcels are formed when an upper layer protocol entity identified by
      the "5-tuple" (source address, destination address, source port,
      destination port, protocol number) prepares a data buffer beginning with
      a checksum header block followed by the concatenation of up to 64
      properly-formed segments that can be broken out into smaller sub-parcels
      and/or individual packets using a copy of the {TCP,UDP}/IP headers if
      necessary. All segments except the final one must be equal in length and
      no larger than 65535 octets (minus headers), while the final segment
      must be no larger than the others but may be smaller. The upper layer
      protocol entity then delivers the buffer, number of segments and
      non-final segment size to lower layers which append a {TCP,UDP} header
      and an IP header plus extensions that identify this as a parcel and not
      an ordinary packet.</t>

      <t>Parcels can be forwarded over consecutive parcel-capable links in
      the path until arriving at a router with a next hop link that does not
      support parcels or an ingress middlebox OMNI interface <xref
      target="I-D.templin-6man-omni"/> that spans intermediate Internetworks
      using adaptation layer encapsulation and fragmentation. In the former
      case, the router transforms the parcel into individual IP packets or
      smaller sub-parcels then forwards each via the next hop link. In the 
      latter case, the OMNI interface breaks the parcel out into smaller
      sub-parcels if necessary then encapsulates each (sub-)parcel in headers
      suitable for traversing the Internetworks while applying fragmentation
      if necessary.</t>

      <t>These sub-parcels may then be recombined into one or more larger
      parcels by an egress middlebox OMNI interface which either delivers them
      locally or forwards them over additional parcel-capable links on the
      path to the final destination. Reordering, loss or damage of
      individual segments in the network is therefore possible. What
      matters is that the number of parcels delivered to the final
      destination is kept to a minimum for the sake of efficiency, and
      that the loss or receipt of individual segments (and not parcel size)
      determines the retransmission unit.</t>

      <t>The following sections discuss rationale for creating and shipping
      parcels as well as the actual protocol constructs and procedures
      involved. IP parcels provide an essential building block for improved
      performance and efficiency while supporting larger Maximum Transmission
      Units (MTUs) in the Internet. It is further expected that the parcel
      concept will drive future innovation in applications, operating systems,
      network equipment and data links.</t>
    </section>

    <section anchor="terms" title="Terminology">
      <t>The Oxford Languages dictionary defines a "parcel" as "a thing or
      collection of things wrapped in paper in order to be carried or sent by
      mail". Indeed, there are many examples of parcel delivery services
      worldwide that provide an essential transit backbone for efficient
      business and consumer transactions.</t>

      <t>In this same spirit, an "IP parcel" is simply a collection of up to
      64 upper layer protocol segments wrapped in an efficient package for
      transmission and delivery (i.e., a "packet-of-packets") while a
      "singleton IP parcel" is simply a parcel that contains a single segment.
      IP parcels are distinguished from ordinary packets through the special
      header constructions discussed in this document.</t>

      <t>The IP parcel construct is defined for both IPv4 and IPv6. Where the
      document refers to "IPv4 header length", it means the total length of
      the base IPv4 header plus all included options, i.e., as determined by
      consulting the Internet Header Length (IHL) field. Where the document
      refers to "IPv6 header length", however, it means only the length of the
      base IPv6 header (i.e., 40 octets), while the length of any extension
      headers is referred to separately as the "IPv6 extension header length".
      Finally, the term "IP header plus extensions" refers generically to an
      IPv4 header plus all included options or an IPv6 header plus all
      included extension headers.</t>

      <t>Where the document refers to "upper layer header length", it means
      the length of either the UDP header (8 octets) or the TCP header plus
      options (20 octets or more). It is important to note that only a single
      IP header and a single full {TCP,UDP} header appear in each parcel
      regardless of the number of segments included. This distinction often
      provides significant savings in overhead made possible only by IP
      parcels.</t>

      <t>Where the document refers to checksum calculations, it means the
      standard Internet checksum unless otherwise specified. The same as for
      TCP <xref target="RFC0793"/>, UDP <xref target="RFC0768"/> and IPv4
      <xref target="RFC0791"/>, the standard Internet checksum is defined as
      (sic) "the 16-bit one's complement of the one's complement sum of all
      (pseudo-)headers plus data, padded with zero octets at the end (if
      necessary) to make a multiple of two octets". A notional Internet
      checksum algorithm can be found in <xref target="RFC1071"/>, with
      the understanding that practical implementations require special
      attention to byte ordering "endianness" to ensure interoperability
      between diverse architectures.</t>

      <t>Where the document refers to "parcel path MTU", it means the
      maximum-sized IP parcel that can traverse the forward path to the
      destination as determined through parcel path qualification (see:
      <xref target="probe"/>). Note that this size may be larger than
      the maximum-sized singleton IP packet that can traverse the same
      path, since intermediate nodes can break oversized IP parcels into
      smaller sub-parcels but cannot do so for singleton IP packets.</t>

      <t>Finally, the term "parcel-capable link" refers to any data link
      medium (physical or virtual) capable of transiting a {TCP,UDP}/IP
      packet that employs the parcel-specific constructions specified in
      this document. The link MUST be capable of forwarding parcels with
      at least one segment of maximum size, therefore each parcel-capable
      link MUST configure an MTU of at least 64KB and SHOULD configure a
      larger MTU. Currently, only the OMNI link satisfies these properties.</t>

      <t>The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
      "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and
      "OPTIONAL" in this document are to be interpreted as described in BCP 14
      <xref target="RFC2119"/><xref target="RFC8174"/> when, and only when,
      they appear in all capitals, as shown here.</t>
    </section>

    <section anchor="aero-omni" title="Background and Motivation">
      <t>Studies have shown that applications can improve their performance by
      sending and receiving larger packets due to reduced numbers of system
      calls and interrupts as well as larger atomic data copies between kernel
      and user space. Larger packets also result in reduced numbers of network
      device interrupts and better network utilization (e.g., due to header
      overhead reduction) in comparison with smaller packets.</t>

      <t>A first study <xref target="QUIC"/> involved performance enhancement
      of the QUIC protocol <xref target="RFC9000"/> using the Linux Generic
      Segmentation Offload / Generic Receive Offload (GSO/GRO) facility.
      GSO/GRO provides a robust
      (but non-standard) service very similar in nature to the IP parcel
      service described here, and its application has shown significant
      performance increases due to the increased transfer unit size between
      the operating system kernel and QUIC application. Unlike IP parcels,
      however, GSO/GRO perform fragmentation and reassembly at the transport
      layer with the transport segment size limited by the path MTU.</t>

      <t>A second study <xref target="I-D.templin-dtn-ltpfrag"/> showed that
      GSO/GRO also improves performance for the Licklider Transmission
      Protocol (LTP) <xref target="RFC5326"/> used for the Delay Tolerant
      Networking (DTN) Bundle Protocol <xref target="RFC9171"/> for segments
      larger than the actual path MTU through the use of fragmentation.
      Historically, the NFS protocol also saw significant performance
      increases using larger (single-segment) UDP datagrams even when IP
      fragmentation was invoked, and LTP still follows
      this profile today. Moreover, LTP shows this (single-segment)
      performance increase profile extending to the largest possible segment
      size which suggests that additional performance gains are possible using
      (multi-segment) IP parcels that approach or even exceed 65535 octets.</t>

      <t>TCP also benefits from larger packet sizes and efforts have
      investigated TCP performance using jumbograms internally with changes to
      the Linux GSO/GRO facilities <xref target="BIG-TCP"/>. The idea is to
      use the jumbo payload option internally and to allow GSO/GRO to use
      buffer sizes larger than 65535 octets, but with the understanding that
      links that support jumbos natively are not yet widely available. Hence,
      IP parcels provide a packaging that can be considered in the near term
      under current deployment limitations.</t>

      <t>A limiting consideration for sending large packets is that they are
      often lost at links with smaller MTUs, and the resulting Packet Too Big
      (PTB) message may be lost somewhere in the path back to the original
      source. This "Path MTU black hole" condition can degrade performance
      unless robust path probing techniques are used; however, the best-case
      performance always occurs when no packets are lost due to size
      restrictions.</t>

      <t>These considerations therefore motivate a design where transport
      protocols should employ a maximum segment size no larger than 65535
      octets (minus headers), while parcels that carry multiple segments may
      themselves be significantly larger. Then, even if the network needs to
      sub-divide the parcels into smaller sub-parcels to forward further
      toward the final destination, an important performance optimization for
      the original source, final destination and network path as a whole can
      be realized.</t>

      <t>An analogy: when a consumer orders 50 small items from a major online
      retailer, the retailer does not ship the order in 50 separate small
      boxes. Instead, the retailer packs as many of the small items as
      possible into one or a few larger boxes (i.e., parcels) then places the
      parcels on a semi-truck or airplane. The parcels may then pass through
      one or more regional distribution centers where they may be repackaged
      into different parcel configurations and forwarded further until they
      are finally delivered to the consumer. But most often, the consumer will
      only find one or a few parcels at their doorstep and not 50 separate
      small boxes. This flexible parcel delivery service greatly reduces
      shipping and handling cost for all including the retailer, regional
      distribution centers and finally the consumer.</t>
    </section>

    <section anchor="parcels" title="IP Parcel Formation">
      <t>IP parcels are formed by an upper layer protocol entity (identified
      by the 5-tuple described above) when it prepares a data buffer
      containing the concatenation of a block of up to 64 Checksums followed
      by their corresponding upper layer protocol segments (with each TCP
      non-first segment preceded by a 4-octet Sequence Number). All non-final
      segments MUST be equal in length while the final segment MUST NOT be
      larger and MAY be smaller. Each non-final segment MUST NOT be larger
      than 65535 octets minus the length of the UDP header or TCP header and
      its options, minus the length of the IPv4/IPv6 header and its
      options/extensions minus 2 octets for the per-segment Checksum. (Note
      that this also satisfies the case of ingress middlebox OMNI interfaces
      in the path that would regard the headers as upper layer protocol
      payload during IPv6 encapsulation/fragmentation.)</t>
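
      <t>As a non-normative illustration of the size limit above, the
      following Python sketch computes the largest permitted non-final
      segment. The function name max_segment_size and its parameters are
      hypothetical and are not defined by this document:</t>

      <figure><artwork><![CDATA[
```python
# Illustrative sketch only; names are assumptions, not part of the spec.
MAX_IP_LEN = 65535        # upper bound stated in the text above
PER_SEGMENT_CHECKSUM = 2  # one 2-octet Checksum per segment


def max_segment_size(ip_hdr_len: int, upper_hdr_len: int) -> int:
    """Return the largest permitted non-final segment, in octets.

    ip_hdr_len:    IP header plus options/extensions
    upper_hdr_len: UDP header (8) or TCP header plus options (20+)
    """
    size = (MAX_IP_LEN - upper_hdr_len - ip_hdr_len
            - PER_SEGMENT_CHECKSUM)
    if size < 1:
        raise ValueError("headers leave no room for segment data")
    return size
```
]]></artwork></figure>

      <t>Under these assumptions, a UDP parcel over IPv6 with a 40-octet
      base header and an 8-octet Hop-by-Hop Options header carrying only the
      Jumbo Payload option (48 octets in total) would be limited to
      non-final segments of 65477 octets.</t>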

      <t>The upper layer protocol entity then presents the buffer and
      non-final segment size to lower layers, noting that the buffer may be
      larger than 65535 octets if it includes sufficient segments of a large
      enough size to exceed that value. If the buffer plus headers would
      together be no larger than the parcel path MTU, the lower
      layers then append a single full {TCP,UDP} header (plus options) followed
      by a single IPv4/IPv6 header plus options/extensions. If the buffer would
      cause a single parcel to exceed the parcel path MTU, lower layers instead
      break the buffer up into multiple smaller buffers (each with an integral
      number of segments) and append separate {TCP,UDP}/IP headers for each.</t>

      <t>The IP layer then presents each parcel to a network interface attachment
      to either an ordinary parcel-capable link or an OMNI link that performs
      adaptation layer encapsulation and fragmentation (see: <xref target="xmit"/>).
      The IP layer includes a Jumbo Payload option in the IP header formed as shown
      in <xref target="jumbo-fmt"/>:<figure anchor="jumbo-fmt"
              title="Jumbo Payload Option Format">
          <artwork><![CDATA[                                   |<------- Option Header ------->|
                                   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
                                   |  Option Type  |  Opt Data Len |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |     Nsegs     |             Jumbo Payload Length              |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |<------------------------ Option Data ------------------------>|]]></artwork>
        </figure>For IPv4, the Jumbo Payload option format follows from <xref
      target="RFC2675"/> except that the IP layer sets option type to
      '00001011' and option length to '00000110' noting that the length
      distinguishes this type from its obsoleted use as the "IPv4 Probe MTU"
      option <xref target="RFC1063"/>. The IP layer also interprets the most
      significant option data octet as an "Nsegs" field that encodes a value J
      between 1 and 64 and interprets the following 3 octets as a
      "Jumbo Payload Length" field that encodes the length of the IPv4 header
      plus the length of the {TCP,UDP} header (plus options) plus the combined
      length of the checksum block plus all concatenated segments. The IP layer
      next sets the IPv4 header DF bit to 1, then sets the IPv4 header Total
      Length field to the length of the first segment only. Note that the IP layer
      can form true IPv4 jumbograms (as opposed to parcels) by instead setting
      the IPv4 header Total Length field to 0 and treating the entire 4 octets
      of the option data as the Jumbo Payload Length (see: <xref target="jumbo"/>).</t>

      <t>For IPv6, the IP layer includes a Jumbo Payload option in an IPv6
      Hop-by-Hop Options extension header formatted the same as for IPv4
      above, but with option type set to '11000010' and option length set to
      '00000100'. The IP layer sets the option data "Nsegs" field to a 1-octet
      value J between 1 and 64 and sets the "Jumbo Payload Length" field to a
      3-octet value that encodes the lengths of all IPv6 extension headers
      present plus the length of the {TCP,UDP} header (plus options) plus the
      combined length of the checksum block plus all concatenated segments.
      The IP layer next sets the IPv6 header Payload Length field to the
      length of the first segment only. Note that the IP layer can form true
      IPv6 jumbograms (as opposed to parcels) by instead setting the IPv6
      header Payload Length field to 0 and treating the entire 4 octets of
      the option data as the Jumbo Payload Length (see: <xref target="RFC2675"/>).</t>
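
      <t>The option encodings described above for IPv4 and IPv6 can be
      sketched as follows. This is a non-normative Python illustration; the
      function and constant names are hypothetical, and the sketch covers
      only the 6 octets of the option itself (not the enclosing IPv4 header
      or IPv6 Hop-by-Hop Options header):</t>

      <figure><artwork><![CDATA[
```python
import struct

# (option type, option length) as given in the text above.
IPV4_OPTION = (0b00001011, 0b00000110)
IPV6_OPTION = (0b11000010, 0b00000100)


def encode_parcel_option(nsegs: int, jumbo_len: int,
                         ipv6: bool) -> bytes:
    """Return type, length, Nsegs and the 3-octet length field."""
    if not 1 <= nsegs <= 64:
        raise ValueError("Nsegs must be between 1 and 64")
    if not 0 < jumbo_len < (1 << 24):
        raise ValueError("Jumbo Payload Length must fit in 3 octets")
    opt_type, opt_len = IPV6_OPTION if ipv6 else IPV4_OPTION
    # One 32-bit word in network byte order: Nsegs occupies the most
    # significant octet and the length occupies the low 24 bits.
    word = (nsegs << 24) | jumbo_len
    return struct.pack("!BBI", opt_type, opt_len, word)


def decode_parcel_option(opt: bytes) -> tuple[int, int]:
    """Return (Nsegs, Jumbo Payload Length) from a 6-octet option."""
    _type, _len, word = struct.unpack("!BBI", opt)
    return word >> 24, word & 0xFFFFFF
```
]]></artwork></figure>

      <t>For example, under these assumptions an IPv6 parcel with Nsegs = 3
      and a Jumbo Payload Length of 4096 would carry the option octets
      c2 04 03 00 10 00 (hexadecimal).</t>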

      <t>{TCP,UDP}/IP parcel formats are shown in <xref target="struct"/>:</t>

      <t><figure anchor="struct"
              title="{TCP,UDP}/IP Parcel Structure">
          <artwork><![CDATA[        TCP/IP Parcel Structure            UDP/IP Parcel Structure
   +------------------------------+   +------------------------------+
   |IP Hdr plus options/extensions|   |IP Hdr plus options/extensions|
   ~ {Total, Payload} Length = L  ~   ~ {Total, Payload} Length = L  ~
   | Nsegs = J; Jumbo Length = M  |   | Nsegs = J; Jumbo Length = M  |
   +------------------------------+   +------------------------------+
   |                              |   |                              |
   ~   TCP header (plus options)  ~   ~         UDP header           ~
   | (Includes Sequence Number 1) |   |                              |
   +------------------------------+   +------------------------------+
   |   Checksum 1    Checksum 2   |   |   Checksum 1    Checksum 2   |
   +------------------------------+   +------------------------------+
   |   Checksum 3        ...      ~   |   Checksum 3        ...      ~
   +----------------     ...      ~   +---------------      ...      ~
   ~     ...             ...      ~   ~    ...              ...      ~
   ~     ...      ----------------+   ~    ...      -----------------+
   ~     ...         Checksum J   |   ~    ...          Checksum J   |
   +------------------------------+   +------------------------------+
   ~                              ~   ~                              ~
   ~    Segment 1 (L-4 octets)    ~   ~     Segment 1 (L octets)     ~
   +------------------------------+   +------------------------------+
   ~Sequence Number 2 followed by ~   ~                              ~
   ~     Segment 2 (L octets)     ~   ~     Segment 2 (L octets)     ~
   +------------------------------+   +------------------------------+
   ~Sequence Number 3 followed by ~   ~                              ~
   ~     Segment 3 (L octets)     ~   ~     Segment 3 (L octets)     ~
   +------------------------------+   +------------------------------+
   ~             ...              ~   ~             ...              ~
   ~             ...              ~   ~             ...              ~
   +------------------------------+   +------------------------------+
   ~Sequence Number J followed by ~   ~                              ~
   ~     Segment J (K octets)     ~   ~     Segment J (K octets)     ~
   +------------------------------+   +------------------------------+]]></artwork>
        </figure>where J is the total number of segments (between 1 and 64), L
      is the length of each non-final segment which MUST NOT be larger than
      65535 octets (minus headers) and K is the length of the final segment
      which MUST NOT be larger than L. For both TCP and UDP, the {TCP,UDP}
      header is immediately followed by a block of J 2-octet Checksums which
      are then followed by J upper layer protocol segments. For TCP, the TCP
      header sequence number field encodes a 4-octet starting sequence number
      for the first segment only, while each non-first segment is preceded by
      its own 4-octet Sequence Number field. For this reason, the length of
      the first TCP segment is only (L-4) octets since the 4-octet TCP header
      sequence number field applies to that segment.</t>

      <t>The {Total, Payload} Length is then set to L if there are multiple
      segments or K if there is only a single segment. Next, the 1-octet Nsegs
      field is set to J and the 3-octet Jumbo Payload Length "M" is set to the
      length of the IP header plus extensions for IPv4 (or to the length of
      the extension headers only for IPv6), plus the length of the UDP header
      or TCP header plus options, plus the lengths of the Checksum block and
      all concatenated Segments that follow.</t>

      <t>The Nsegs value unambiguously determines the number of 2-octet fields
      present in the checksum block and jointly determines the number of parcel
      data segments in conjunction with the Jumbo Payload Length. Receivers
      therefore observe the following robustness considerations:<list
          style="symbols">
          <t>if Nsegs is less than 1 or greater than 64, or if the Jumbo
          Payload Length indicates insufficient space for the full checksum
          block plus at least one octet of data, the receiver discards the
          parcel.</t>

          <t>if the data length following the checksum block is less than
          (((Nsegs - 1) * L) + 1) the receiver processes all initial fields
          of the checksum block along with their corresponding segments up
          to the end of the data and ignores any remaining checksums.</t>

          <t>if the data length following the checksum block is greater than
          (Nsegs * L) the receiver processes all checksums with their
          corresponding segments and ignores any remaining data beyond the end
          of the final segment.</t>
        </list></t>
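
      <t>The robustness considerations above can be sketched as follows for
      the UDP case, where every non-final segment is L octets long. This is
      a non-normative Python illustration: the function name, the hdr_len
      parameter (the octets counted by the Jumbo Payload Length that precede
      the checksum block) and the choice to discard a parcel with L less
      than 1 are assumptions made for the example:</t>

      <figure><artwork><![CDATA[
```python
def usable_segments(nsegs: int, seg_len: int, jumbo_len: int,
                    hdr_len: int) -> list[int]:
    """Return the lengths of the segments a receiver would process.

    nsegs:     Nsegs (J)
    seg_len:   non-final segment length (L)
    jumbo_len: Jumbo Payload Length (M)
    hdr_len:   octets counted by M ahead of the checksum block
    An empty list means the parcel is discarded.
    """
    if not 1 <= nsegs <= 64 or seg_len < 1:
        return []
    # Octets that follow the block of nsegs 2-octet Checksums.
    data_len = jumbo_len - hdr_len - 2 * nsegs
    if data_len < 1:
        return []  # no room for the checksum block plus one data octet
    if data_len > nsegs * seg_len:
        data_len = nsegs * seg_len  # ignore data past the final segment
    full, rest = divmod(data_len, seg_len)
    # If fewer than nsegs segments are present, the checksums that have
    # no corresponding segment are simply never consulted.
    return [seg_len] * full + ([rest] if rest else [])
```
]]></artwork></figure>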

      <t>Note: per-segment checksums appear in a contiguous data block
      immediately following the {TCP,UDP}/IP headers instead of inline with
      the parcel segments to greatly increase the likelihood that they will
      appear in the contiguous head of a kernel receive buffer even if the
      parcel was subject to OMNI interface IPv6 fragmentation. This condition
      may not always hold if the IPv6 fragments also incur IPv4 encapsulation
      and fragmentation over paths that traverse slow IPv4 links with small
      MTUs. In that case, performance is bounded by the unavoidable slow link
      traversal and not the overhead for pulling the fragmented checksum block
      into the contiguous head of a kernel receive buffer.</t>
    </section>

    <section anchor="tcp-parcel" title="TCP Parcels">
      <t>A TCP Parcel is an IP Parcel that includes an IP header plus
      extensions with a Jumbo Payload option encoding the number of
      segments (Nsegs) and Jumbo Payload length up to 4MB. The IP header plus
      extensions is then followed by a 20-octet TCP header plus any additional
      TCP option octets. The TCP header is then followed by J consecutive
      2-octet Checksum fields followed by J consecutive segments, where the
      final segment is K octets in length, each non-first segment is L octets
      in length and includes a 4-octet Sequence Number and the first segment
      is (L-4) octets in length and uses the sequence number found in the TCP
      header. The segment length "L" is encoded in the IP header {Total, Payload}
      Length field while the number of segments J is encoded in the Nsegs octet.
      The overall length of the parcel as well as final segment length are
      determined by the Jumbo Payload length "M" as discussed above.</t>

      <t>The source prepares TCP Parcels in a similar fashion as for TCP
      jumbograms <xref target="RFC2675"/>. The source calculates the checksum
      of the TCP header plus IP pseudo-header only, but with the TCP header
      sequence number field temporarily set to 0 during the calculation since
      the true sequence number will be included as a pseudo header for the
      first segment. The source then writes the calculated value in the TCP
      header checksum field as-is (i.e., without converting calculated '0'
      values to 'ffff') and finally re-writes the actual sequence number
      back into the sequence number field. (Nodes that verify the header
      checksum first perform the same operation of temporarily setting
      the sequence number field to 0 and then resetting to the actual
      value after checksum verification.)</t>

      <t>The source then calculates the checksum of the first segment
      beginning with the sequence number found in the full TCP header as a
      4-octet pseudo-header and extending over the (L-4) octet length of the
      segment. The source next calculates the checksum for each non-first
      segment independently over the L octet length of the segment beginning
      with the per-segment Sequence Number. As the source calculates each
      per-segment checksum for segments i=(1 thru J), it writes the value into
      the corresponding segment Checksum(i) field with calculated '0' values
      written as 'ffff'.</t>

      <t>See: <xref target="integrity"/> for further discussion.</t>
    </section>

    <section anchor="udp-parcel" title="UDP Parcels">
      <t>A UDP Parcel is an IP Parcel that includes an IP header plus
      extensions with a Jumbo Payload option encoding the number of
      segments (Nsegs) and Jumbo Payload length up to 4MB. The IP header
      plus extensions is then followed by an 8-octet UDP header followed
      by J consecutive 2-octet
      Checksum fields followed by J upper layer protocol segments. Each
      segment must begin with a transport-specific start delimiter (e.g., a
      segment identifier) included by the transport layer user of UDP. The
      length "L" of each non-final segment is encoded in the IP {Total,
      Payload} Length field while the number of segments J is encoded
      in the Nsegs octet. The overall length of the parcel as well as
      the final segment length are determined by the Jumbo Payload
      length "M" as discussed above.</t>

      <t>The source prepares UDP Parcels in a similar fashion as for UDP
      jumbograms <xref target="RFC2675"/> and MUST set the UDP header
      length field to 0. The source then calculates the checksum of the UDP
      header plus IP pseudo-header and writes the calculated value in the UDP
      header checksum field as-is (i.e., without converting calculated '0'
      values to 'ffff').</t>

      <t>The source then calculates a separate checksum for each segment
      independently over the length of the segment. As the source calculates
      each per-segment checksum for segments i=(1 thru J), it writes the value
      into the corresponding Checksum(i) field with calculated '0' values
      written as 'ffff'.</t>
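
      <t>The per-segment checksum procedure for UDP parcels can be sketched
      as follows. This is a non-normative Python illustration based on the
      standard Internet checksum; the function names are hypothetical, and
      the sketch covers only the checksum block (not the UDP header
      checksum):</t>

      <figure><artwork><![CDATA[
```python
def internet_checksum(data: bytes) -> int:
    """16-bit one's complement of the one's complement sum."""
    if len(data) % 2:
        data += b"\x00"  # pad to a multiple of two octets
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]  # network byte order
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)  # fold carries
    return ~total & 0xFFFF


def checksum_block(segments: list[bytes]) -> bytes:
    """Build the block of 2-octet Checksums, one per segment."""
    if not 1 <= len(segments) <= 64:
        raise ValueError("a parcel carries between 1 and 64 segments")
    block = bytearray()
    for seg in segments:
        # A calculated value of 0 is written as ffff.
        csum = internet_checksum(seg) or 0xFFFF
        block += csum.to_bytes(2, "big")
    return bytes(block)
```
]]></artwork></figure>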

      <t>See: <xref target="integrity"/> for further discussion.</t>
    </section>

    <section anchor="xmit" title="Transmission of IP Parcels">
      <t>The IP layer of the source next presents each parcel to a network
      interface for transmission over a parcel-capable link. For ordinary
      IP interface attachments to parcel-capable links, the interface simply
      admits each parcel into the link the same as for any IP packet after
      which it may then be forwarded by any number of routers over additional
      consecutive parcel-capable links possibly even traversing the entire
      forward path to the final destination itself. If any router in the path
      does not recognize the parcel construct, it drops the parcel and may
      return an ICMP "Parameter Problem" message.</t>

      <t>If the router recognizes parcels but the next hop link in the path
      does not, or if the parcel would exceed the parcel path MTU, the router
      instead opens the parcel. The router then forwards each enclosed segment
      in singleton IP packets or in a set of smaller sub-parcels that each
      contain a subset of all segments. The router prepares each singleton
      IP packet or smaller sub-parcel for transmission to the next hop as
      follows.</t>

      <t>For transmission of singleton IP packets over links that do not
      support parcels, the router removes the Jumbo Payload option and the
      per-segment Checksum and Sequence Number fields then sets the IP
      {Total, Payload} Length according
      to the standards <xref target="RFC0791"/><xref target="RFC8200"/>. For
      TCP, the router then sets the TCP header Sequence Number field based on
      the starting sequence number for the segment according to <xref
      target="RFC0793"/> and also clears the ACK flag in all but the first
      packet. For UDP, the router then sets the UDP length field according to
      <xref target="RFC0768"/>. For both TCP and UDP, the router next
      calculates the checksum over the length of the packet according to the
      native {TCP,UDP} protocol specification, then writes the value in the
      {TCP,UDP} header checksum field and finally forwards the packet.</t>

      <t>For transmission of smaller sub-parcels over parcel-capable links,
      the router breaks the original parcel into smaller groups of segments
      that would fit within the parcel path MTU by determining the number of
      segments of length L that can fit into each sub-parcel under the size
      constraints. For example, if the router determines that a sub-parcel
      can contain 3 segments of length L, it creates sub-parcels with the
      first containing segments 1-3, the second containing segments 4-6,
      etc., and with the final containing any remaining segments. The router
      then appends identical {TCP,UDP}/IP headers plus extensions to each
      sub-parcel while resetting L and M in each according to the above
      equations with Nsegs (J) set to 3 (and K = L) for each non-final
      sub-parcel and with Nsegs set to the remaining number of segments
      for the final sub-parcel. For TCP, the router then sets the TCP
      Sequence Number field to the value that appears in the first sub-parcel
      segment while omitting the first segment Sequence Number header (if present)
      and also clearing the ACK flag in all sub-parcels except the first. For both
      TCP and UDP, the router finally resets the {TCP,UDP} header checksum
      according to ordinary parcel formation procedures (see above) then
      forwards each (sub-)parcel over the outgoing parcel-capable link.</t>
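      <t>The grouping step above can be sketched as follows. This is a
      minimal illustration rather than a normative procedure; the function
      names and the header_len and per_seg_overhead parameters are
      hypothetical and must be supplied by the caller according to the
      actual header and per-segment field sizes.</t>

      <figure><artwork><![CDATA[
```python
def segments_per_sub_parcel(path_mtu, header_len, seg_len, per_seg_overhead=2):
    """How many segments of length seg_len fit in one sub-parcel.

    header_len is the size of the {TCP,UDP}/IP headers plus extensions and
    per_seg_overhead is the per-segment Checksum (and, for TCP, Sequence
    Number) overhead; both are assumptions supplied by the caller.
    """
    room = path_mtu - header_len
    count = room // (seg_len + per_seg_overhead)
    if count < 1:
        raise ValueError("path MTU too small for even one segment")
    return min(count, 64)  # Nsegs is limited to the range 1..64


def subdivide(segments, per_sub_parcel):
    """Split a parcel's segment list into consecutive groups.

    Every non-final group holds exactly per_sub_parcel segments; the final
    group holds whatever remains.
    """
    if per_sub_parcel < 1:
        raise ValueError("a sub-parcel must hold at least one segment")
    return [segments[i:i + per_sub_parcel]
            for i in range(0, len(segments), per_sub_parcel)]
```
]]></artwork></figure>

      <t>For example, eight segments with three segments per sub-parcel
      yield sub-parcels containing segments 1-3, 4-6 and 7-8.</t>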
      
      <t>If the outgoing network interface for the original parcel or sub-parcel
      is an OMNI interface <xref target="I-D.templin-6man-omni"/>, the OMNI
      Adaptation Layer (OAL) of this First Hop Segment (FHS) OAL source node
      then forwards the parcel to the next OAL hop which may be either an OAL
      intermediate node or a Last Hop Segment (LHS) OAL destination node. Note
      that OMNI interface upper layer protocol processing procedures are specified
      in detail in the remainder of this section, while lower layer encapsulation
      and fragmentation procedures are specified in detail in
      <xref target="I-D.templin-6man-omni"/>.</t>

      <t>When the OAL source forwards a parcel (whether generated by a local
      application or generated by another node then forwarded over one or more
      parcel-capable links), it first assigns a monotonically-incrementing
      (modulo 127) "Parcel ID" for adaptation layer processing. If necessary,
      the OAL source then subdivides the parcel into sub-parcels following
      the IP layer parcel subdivision procedures discussed
      above. The OAL source next selects a monotonically-incrementing
      Identification value for each sub-parcel then performs adaptation
      layer encapsulation and fragmentation and finally forwards them to
      the next OAL hop which forwards further toward the OAL destination
      as necessary.</t>

      <t>When the sub-parcels arrive at the OAL destination, the node can
      optionally retain them along with their Parcel ID and Identifications
      for a brief time to support re-combining with peer sub-parcels of the 
      same original parcel identified by the adaptation layer 4-tuple
      consisting of the (source, destination, Identification, Parcel ID)
      fields. This re-combining entails the concatenation of segments included
      in sub-parcels with the same Parcel ID and with Identification values
      within 64 of one another to create a larger sub-parcel possibly even as
      large as the entire original parcel. Order of concatenation need not be
      strictly enforced, with the exception that the sub-parcel containing the
      final segment must occur as a final concatenation and not as an
      intermediate. The OAL destination then appends a common {TCP,UDP}/IP
      header plus extensions to each re-combined sub-parcel while resetting J,
      K, L and M in each according to the above equations. For TCP, if any
      sub-parcels have the ACK bit set the OAL destination also sets the ACK
      bit in the re-combined sub-parcel TCP header. The OAL destination then
      resets the {TCP,UDP}/IP header checksum for each re-combined sub-parcel.
      If the OAL destination is also the final destination, it then delivers
      the sub-parcels to the IP layer which processes them according to the
      5-tuple information supplied by the original source. Otherwise, the OAL
      destination forwards each sub-parcel toward the final destination the
      same as for an ordinary IP packet as discussed above.</t>
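      <t>The re-combining rules above might be sketched as follows. The
      SubParcel record and its field names are hypothetical, "within 64 of
      one another" is read here as a maximum difference of 64, and
      Identification wraparound is ignored for brevity; header rewriting
      and checksum recalculation are also omitted.</t>

      <figure><artwork><![CDATA[
```python
from dataclasses import dataclass
from typing import List


@dataclass
class SubParcel:
    source: str
    destination: str
    parcel_id: int
    identification: int
    segments: List[bytes]
    ack: bool = False        # TCP ACK flag from this sub-parcel's header
    has_final: bool = False  # True if it carries the parcel's final segment


def recombine(sub_parcels):
    """Concatenate peer sub-parcels of one original parcel.

    Returns (segments, ack). The sub-parcel holding the final segment is
    always placed last; the others keep their order of arrival.
    """
    if not sub_parcels:
        raise ValueError("nothing to re-combine")
    first = sub_parcels[0]
    key = (first.source, first.destination, first.parcel_id)
    for sp in sub_parcels:
        if (sp.source, sp.destination, sp.parcel_id) != key:
            raise ValueError("sub-parcels belong to different parcels")
    idents = [sp.identification for sp in sub_parcels]
    if max(idents) - min(idents) > 64:
        raise ValueError("Identification values too far apart")
    # sorted() is stable, so only the final sub-parcel moves (to the end).
    ordered = sorted(sub_parcels, key=lambda sp: sp.has_final)
    segments = [seg for sp in ordered for seg in sp.segments]
    ack = any(sp.ack for sp in sub_parcels)  # ACK set if set in any input
    return segments, ack
```
]]></artwork></figure>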

      <t>Note: sub-dividing a larger parcel into two or more sub-parcels
      entails replication of the {TCP,UDP}/IP headers. For TCP, the process
      entails copying the full TCP/IP header from the original parcel while
      writing the sequence number of the first sub-parcel segment into the TCP
      Sequence Number field, clearing the ACK bit if necessary as discussed
      above and truncating the (new) first segment Sequence Number field. For
      UDP, the process entails copying the full UDP/IP header
      from the original parcel into each sub-parcel. For both TCP and UDP, the
      process finally includes recalculating and resetting Nsegs and Jumbo
      Payload Length then recalculating the {TCP,UDP} header checksum. Note
      that the per-segment Checksum values in the sub-parcel segments
      themselves are still valid and need not be recalculated.</t>

      <t>Note: re-combining two or more sub-parcels into a larger parcel
      entails a reverse process of the above in which the {TCP,UDP}/IP
      headers of non-first sub-parcels are discarded and their included segments
      concatenated following those of a first sub-parcel.  For TCP, the process
      includes setting the ACK bit in the TCP header only if ACK was set in any of
      the original sub-parcels. For both TCP and UDP, the process finally
      includes recalculating and resetting Nsegs and Jumbo Payload Length then
      recalculating the {TCP,UDP} header checksum as discussed above. Note
      that the per-segment Checksum values in the combined parcel segments
      themselves are still valid and need not be recalculated. (This process
      should not be performed by the OAL destination if it would negatively
      impact performance, noting that it is always acceptable to forward
      individual sub-parcels without attempting to re-combine them and
      without delay.)</t>

      <t>Note: while the OAL destination and/or final destination could
      theoretically re-combine the sub-parcels of multiple different parcels
      with identical upper layer protocol 5-tuples and non-final segment
      lengths, this process could become complicated when the different
      parcels each have differing final segment lengths. Since this might
      interfere with any perceived performance advantage, the decision of
      whether and how to perform inter-parcel concatenation is an
      implementation matter.</t>

      <t>Note: sub-dividing of IP parcels occurs only at OMNI link ingress
      nodes while re-combining of IP parcels occurs only at OMNI link egress
      nodes. Therefore, intermediate OAL nodes do not participate in the
      sub-dividing or re-combining processes. For TCP, the ACK bit must be
      managed as specified above to avoid confusing receivers with spurious
      duplicate ACK indications.</t>
    </section>

    <section anchor="probe" title="Parcel Path Qualification">
      <t>To determine whether parcels are supported over at least a leading
      portion of the forward path up to and including the final destination,
      the original source can send IP parcels that contain Jumbo Payload
      options formatted as "Parcel Probes". The purpose of the probe is to
      elicit a "Parcel Reply" and possibly also an ordinary upper layer
      protocol probe reply from the final destination. The former is
      used to establish the parcel path MTU, while the latter determines
      the (transport layer) maximum segment size.</t>

      <t>If the original source receives a positive Parcel Reply, it marks
      the path as "parcels supported" and ignores any ordinary ICMP <xref
      target="RFC0792"/><xref target="RFC4443"/> and/or Packet Too Big (PTB)
      messages <xref target="RFC1191"/><xref target="RFC8201"/> concerning the
      probe. If the original source instead receives a negative Parcel Reply
      or no reply, it marks the path as "parcels not supported" and may regard
      any ordinary ICMP and/or PTB messages concerning the probe (or its
      contents) as indications of a possible MTU restriction.</t>

      <t>The original source can therefore send Parcel Probes in parallel with
      sending real data as ordinary IP packets/parcels. The Parcel Probes will
      traverse parcel-capable links joined by routers on the forward path
      possibly extending all the way to the destination. If the original
      source receives a Parcel Reply, it can continue using IP parcels.</t>

      <t>Parcel Probes use the same Jumbo Payload option type used for
      ordinary parcels (see: <xref target="parcels"/>) but set a different
      option length and include a 4-octet "(Parcel) Path MTU" field into which
      conformant routers write the minimum link MTU observed in a similar
      fashion as described in <xref target="RFC1063"/><xref
      target="I-D.ietf-6man-mtu-option"/>. Parcel Probes include one or more
      upper layer protocol segments corresponding to the 5-tuple for the flow,
      which may also include {TCP,UDP} segment size probes used for
      packetization layer path MTU discovery <xref target="RFC4821"/>
      <xref target="RFC8899"/>.</t>

      <t>The original source sends Parcel Probes unidirectionally in the
      forward path toward the final destination to elicit a Parcel Reply,
      since it will often be the case that IP parcels are supported only in
      the forward path and not in the return path. Parcel Probes may be
      filtered in the forward path by any node that does not recognize IP
      parcels, but Parcel Replies must be packaged to avoid filtering since
      parcels may not be recognized along portions of the return path. For
      this reason, the Jumbo Payload options included in Parcel Probes are
      always packaged as IPv4 header options or IPv6 Hop-by-Hop options while
      Parcel Replies are returned as UDP/IP encapsulated ICMPv6 PTB messages
      with a "Parcel Reply" Code value (see: <xref
      target="I-D.templin-6man-omni"/>).</t>

      <t>Original sources send Parcel Probes that include a Jumbo Payload
      option coded in an alternate format as shown in <xref target="probe-fmt"/>:
      <figure anchor="probe-fmt"
              title="Parcel Probe Jumbo Payload Option Format">
          <artwork><![CDATA[   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |  Option Type  |  Opt Data Len |    Nonce-1    |     Check     |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |     Nsegs     |             Jumbo Payload Length              |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                    (Parcel) Path MTU (PMTU)                   |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                                                               |
   +-+-+-                       Nonce-2                       -+-+-+
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+]]></artwork>
        </figure>For IPv4, the original source includes the option as an IPv4
      option with Type set to '00001011' the same as for an ordinary IPv4
      parcel (see: <xref target="parcels"/>) but with Length set to '00010100'
      to distinguish this as a probe. The original source sets Nonce-1 to
      '11111111', sets Check to the same value that will appear in the TTL
      of the outgoing IPv4 header, sets PMTU
      to the MTU of the outgoing IPv4 interface and sets Nonce-2 to a 64-bit
      random number. The source next includes a {TCP,UDP} header followed by
      upper layer protocol segments in the same format as for an ordinary
      parcel. The source then sets {Nsegs, Jumbo Payload Length, IPv4 Total
      Length} and calculates the header and per-segment checksums the same as
      for an ordinary parcel. The source finally sends the Parcel Probe via the
      outbound IPv4 interface. According to <xref target="RFC7126"/>,
      middleboxes (i.e., routers, security gateways, firewalls, etc.) that do
      not observe this specification SHOULD drop IP packets that contain
      option type '00001011' ("IPv4 Probe MTU") but some might instead either
      attempt to implement <xref target="RFC1063"/> or ignore the option
      altogether. IPv4 middleboxes that observe this specification instead
      MUST process the option as a Parcel Probe as specified below.</t>

      <t>For IPv6, the original source includes the probe option as an IPv6
      Hop-by-Hop option with Type set to '11000010' the same as for an
      ordinary IPv6 parcel (see: <xref target="parcels"/>) but with Length set
      to '00010010' to distinguish this as a probe. The original source sets
      Nonce-1 to '11111111', sets Check to the same value that will appear in
      the Hop Limit of the outgoing IPv6 header, sets PMTU to the MTU of the
      outgoing IPv6 interface and sets Nonce-2 to a 64-bit random number. The
      source next includes a {TCP,UDP} header followed by one or more upper
      layer protocol segments in the same format as for an ordinary parcel.
      The source then sets {Nsegs, Jumbo Payload Length, IPv6 Payload Length}
      and calculates the header and per-segment checksums the same as for an
      ordinary parcel, then finally sends the Parcel Probe via the outbound
      IPv6 interface. According to <xref target="RFC2675"/>, middleboxes
      (i.e., routers, security gateways, firewalls, etc.) that recognize the
      IPv6 Jumbo Payload option but do not observe this specification SHOULD
      return an ICMPv6 Parameter Problem message (and presumably also drop the
      packet) due to the different option length. IPv6 middleboxes that
      observe this specification instead MUST process the option as a Parcel
      Probe as specified below.</t>
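      <t>As an illustration of the option layout described above, the
      following sketch packs the 20 octets of a Parcel Probe Jumbo Payload
      option for either IP version. The function and constant names are
      hypothetical; the field values are those given in the text.</t>

      <figure><artwork><![CDATA[
```python
import os
import struct

IPV4_OPTION_TYPE = 0b00001011   # same type as an ordinary IPv4 parcel
IPV6_OPTION_TYPE = 0b11000010   # same type as an ordinary IPv6 parcel
IPV4_PROBE_LENGTH = 0b00010100  # 20: IPv4 length counts Type and Length
IPV6_PROBE_LENGTH = 0b00010010  # 18: IPv6 length counts option data only
NONCE1_PARCEL = 0b11111111


def build_probe_option(ip_version, hop_count, nsegs, jumbo_len, link_mtu,
                       nonce2=None):
    """Return the Parcel Probe option as 20 octets in network byte order.

    hop_count is the TTL (IPv4) or Hop Limit (IPv6) of the outgoing header
    and is copied into Check; link_mtu seeds the PMTU field.
    """
    if ip_version == 4:
        opt_type, opt_len = IPV4_OPTION_TYPE, IPV4_PROBE_LENGTH
    elif ip_version == 6:
        opt_type, opt_len = IPV6_OPTION_TYPE, IPV6_PROBE_LENGTH
    else:
        raise ValueError("ip_version must be 4 or 6")
    if not 1 <= nsegs <= 64:
        raise ValueError("Nsegs must be between 1 and 64")
    if not 0 <= jumbo_len < (1 << 24):
        raise ValueError("Jumbo Payload Length must fit in 3 octets")
    if nonce2 is None:
        nonce2 = os.urandom(8)  # 64-bit random Nonce-2
    if len(nonce2) != 8:
        raise ValueError("Nonce-2 must be 8 octets")
    return (struct.pack("!BBBB", opt_type, opt_len, NONCE1_PARCEL, hop_count)
            + bytes([nsegs]) + jumbo_len.to_bytes(3, "big")
            + struct.pack("!I", link_mtu)
            + nonce2)
```
]]></artwork></figure>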

      <t>When a router that observes this specification receives either an
      IPv4 or IPv6 Parcel Probe it first compares Nonce-1 with '11111111' and
      Check with the IP header TTL/Hop Limit; if either value differs, the router
      MUST drop the probe and return a negative Parcel Reply (see below). Otherwise,
      if the next hop link is non-parcel-capable or configures an MTU that is
      too small to pass the probe, the router compares the PMTU value with
      the MTU of the inbound link for the probe and MUST (re)set PMTU to
      the lower MTU. The router then MUST return a positive Parcel Reply (see
      below) and convert the probe into an ordinary IP packet(s) the same as was
      described previously for routers forwarding to non-parcel-capable links.
      If the next hop IP link configures a sufficiently large MTU to pass the
      packet(s), the router then MUST forward each packet to the next hop;
      otherwise, it MUST drop each packet and return a suitable PTB. If the
      next hop IP link both supports parcels and configures an MTU that is
      large enough to pass the probe, the router instead compares the probe
      PMTU value with the MTUs of both the inbound and outbound links for the
      probe and MUST (re)set PMTU to the lower MTU. The router then MUST reset
      Check to the same value that will appear in the TTL/Hop Limit of the
      outgoing IP header, and MUST forward the Parcel Probe to the next hop.</t>
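      <t>The router decision logic above can be summarized in the following
      sketch. The Probe record, the action strings and the function name
      are hypothetical. The sketch assumes the usual decrement of the
      TTL/Hop Limit by one on forwarding, handles only the parcel
      (Nonce-1 = '11111111') case, and leaves out the subsequent
      forward-or-PTB handling of converted packets.</t>

      <figure><artwork><![CDATA[
```python
from dataclasses import dataclass

NONCE1_PARCEL = 0xFF


@dataclass
class Probe:
    nonce1: int
    check: int
    pmtu: int
    length: int  # total size of the probe in octets


def router_process_probe(probe, ip_ttl, in_mtu, out_mtu, out_parcel_capable):
    """Update the probe in place and return the list of actions to take."""
    # Integrity of the first two octets: Nonce-1 and Check must be intact.
    if probe.nonce1 != NONCE1_PARCEL or probe.check != ip_ttl:
        return ["drop", "negative-reply"]
    # Next hop cannot carry the probe as a parcel: reply and convert.
    if not out_parcel_capable or out_mtu < probe.length:
        probe.pmtu = min(probe.pmtu, in_mtu)
        return ["positive-reply", "convert-to-packets"]
    # Next hop supports parcels and is large enough: keep probing.
    probe.pmtu = min(probe.pmtu, in_mtu, out_mtu)
    probe.check = ip_ttl - 1  # value that will appear in the outgoing header
    return ["forward-probe"]
```
]]></artwork></figure>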

      <t>The final destination may therefore receive either one or more
      ordinary IP packets or an intact Parcel Probe. If the final destination
      receives ordinary IP packets, it performs any necessary integrity checks
      then delivers the packets to upper layers which will return an upper layer
      probe response if necessary. If the final destination receives a Parcel
      Probe, it first compares Nonce-1 with '11111111' and Check with the IP
      header TTL/Hop Limit; if either value differs, the final destination
      MUST drop the probe and return a negative Parcel Reply. Otherwise, the
      final destination compares the probe PMTU value with the MTU of the
      inbound link and MUST (re)set PMTU to the lower MTU. The final destination
      then MUST return a positive Parcel Reply and deliver the probe contents
      to upper layers the same as for an ordinary IP parcel.</t>

      <t>When a router or final destination returns a Parcel Reply, it
      prepares an ICMPv6 PTB message <xref target="RFC4443"/> with Code set to
      "Parcel Reply" <xref target="I-D.templin-6man-omni"/> and with MTU set
      to either the PMTU value reported in the Parcel Probe for a positive
      reply or to the value 0 for a negative reply. The node then writes its
      own IP address as the Parcel Reply source and writes the source of the
      Parcel Probe as the Parcel Reply destination (for IPv4 Parcel Probes,
      the node writes the Parcel Reply addresses as IPv4-Compatible IPv6
      addresses <xref target="RFC4291"/>). The node next copies as much of
      the leading portion of the Parcel Probe (beginning with the IP header)
      as possible into the "packet in error" field without causing the Parcel
      Reply to exceed 512 octets in length, then calculates the ICMPv6 header
      checksum. Since IPv6 packets cannot traverse IPv4 paths, and since
      middleboxes often filter ICMPv6 messages as they traverse IPv6 paths,
      the node next wraps the Parcel Reply in UDP/IP headers of the correct
      IP version with the IP source and destination addresses copied from
      the Parcel Reply and with UDP port numbers set to the UDP port number
      for OMNI <xref target="I-D.templin-6man-omni"/>. In the process, the
      node either calculates or omits the UDP checksum as appropriate and
      (for IPv4) clears the DF bit. The node finally sends the prepared
      Parcel Reply to the original source of the probe.</t>

      <t>After sending a Parcel Probe the original source may therefore
      receive a UDP/IP encapsulated Parcel Reply (see above) and/or an upper
      layer protocol probe reply. If the source receives a Parcel Reply, it
      first verifies the checksum(s) then matches the enclosed PTB message
      with the original Parcel Probe by examining the Nonce-2 field echoed in
      the ICMPv6 "packet in error" field containing the leading portion of the
      probe. If the PTB does not match, the source discards the Parcel Reply;
      otherwise, it continues to process. If the Parcel Reply MTU is 0,
      the source marks the path as "parcels not supported"; otherwise, it
      marks the path as "parcels supported" and also records the MTU value
      as the parcel path MTU for the forward path to this destination. (Note
      that this size may be larger than the maximum-sized singleton
      jumbogram that can traverse the path.)</t>
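      <t>A minimal sketch of the source-side handling described above
      follows. It assumes checksum verification has already succeeded and
      that the caller has extracted the echoed Nonce-2 from the "packet in
      error" field; the function name and the path_state dictionary are
      hypothetical.</t>

      <figure><artwork><![CDATA[
```python
def handle_parcel_reply(reply_mtu, echoed_nonce2, sent_nonce2, path_state):
    """Apply a Parcel Reply to path_state; return False if it is discarded."""
    if echoed_nonce2 != sent_nonce2:
        return False  # does not match the outstanding probe
    if reply_mtu == 0:
        # Negative reply.
        path_state["parcels_supported"] = False
        path_state.pop("parcel_path_mtu", None)
    else:
        # Positive reply: record the parcel path MTU for the forward path.
        path_state["parcels_supported"] = True
        path_state["parcel_path_mtu"] = reply_mtu
    return True
```
]]></artwork></figure>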

      <t>After receiving a positive Parcel Reply, the original source can
      continue sending IP parcels addressed to the final destination up to
      the size of the parcel path MTU; any upper layer protocol probe replies
      will determine the maximum segment size that can be included in the
      parcel as an upper layer consideration. After receiving a negative
      Parcel Reply (or no reply) the original source should refrain from
      sending parcels. In both cases, the original source should then
      periodically re-initiate Parcel Path Qualification as long as it
      continues to prefer to use the IP parcel service. If at any time
      performance appears to degrade, the original source should cease
      sending IP parcels and/or reduce the size of the parcels it sends.</t>

      <t>Nodes can also use this Parcel Path Qualification procedure to
      qualify the path for ordinary IP jumbograms simply by setting Nonce-1
      to the value '11111110' and formatting the probe body as an ordinary
      jumbogram no larger than the maximum size that can be represented in the
      32-bit Jumbo Payload Length. Nodes that forward the (Jumbogram) Parcel
      Probe will recognize the Nonce-1 value as an indication that the probe
      is a true Jumbogram (i.e., not a parcel). The node then sets PMTU
      to the largest possible Jumbogram size and forwards the probe to the
      next hop. Nodes that return (Jumbogram) Parcel Replies will then return
      the resulting PMTU value. In particular, the largest possible
      Jumbogram size may be less than the largest possible parcel size,
      since forwarding nodes can sub-divide parcels but cannot sub-divide
      singleton Jumbograms.</t>

      <t>Note: when a Parcel Probe forwarded into an ingress OMNI interface is
      broken into sub-parcels, each sub-parcel includes its own copy of the
      Parcel Probe header. When multiple sub-parcels of the same Parcel Probe
      arrive at an egress OMNI interface, the interface optionally re-combines
      the sub-parcels while retaining the Parcel Probe header. It is therefore
      possible that a single Parcel Probe with multiple upper layer protocol
      segments could generate multiple Parcel Replies.</t>

      <t>Note: The original source includes Nonce-1 and Check fields as the
      first two octets of Parcel Probes in case a router on the path
      overwrites the values in a wayward attempt to implement <xref
      target="RFC1063"/>. Parcel Probe recipients should therefore regard a
      Nonce-1 value other than '11111111' or '11111110' as an indication that
      the field was either intentionally or accidentally altered by a previous
      hop node that does not recognize parcels.</t>

      <t>Note: The MTU value returned in a Parcel Reply determines only the
      maximum IP parcel size for the path, while the maximum upper layer
      protocol segment size may be significantly smaller. The upper layer
      protocol segment size is instead determined separately according
      to any upper layer protocol probing.</t>

      <t>Note: When the OMNI interface of an ingress middlebox receives a
      Parcel Probe with PMTU larger than 64KB, it can optionally leave PMTU
      unchanged (i.e., if it intends to support parcel subdivision internally)
      or rewrite PMTU to 64KB to disable adaptation layer parcel sub-division.
      Regardless of the decision taken by the ingress middlebox, correct
      behavior will be observed by the final destination whether or not
      the egress middlebox elects to recombine sub-parcels.</t>

      <t>Note: If a router or final destination receives a Parcel Probe but
      does not recognize the parcel construct, it simply drops the probe. The
      original source will then deem the probe lost and conclude that parcels
      cannot be used.</t>
    </section>

    <section anchor="integrity" title="Integrity">
      <t>The {TCP,UDP}/IP header plus each segment of a (multi-segment) IP
      parcel includes its own integrity check. This means that IP parcels can
      support stronger and more discrete integrity checks for the same amount
      of upper layer protocol data compared to an ordinary IP packet or
      Jumbogram. The {TCP,UDP} header integrity checks can be verified at
      each hop to ensure that parcels with errored headers are dropped to
      avoid mis-delivery (nodes that set and verify TCP parcel header
      checksums must honor the sequence number discipline discussed in
      <xref target="tcp-parcel"/>). The header and per-segment integrity
      checks must then be verified at the final destination.</t>

      <t>IP parcels can range in length from as small as only the {TCP,UDP}/IP
      headers plus a single checksum with a segment of length 1 to as large as
      the headers plus (64 * (65535 minus headers)) octets. Although 32-bit
      link layer integrity checks provide sufficient protection for contiguous
      data blocks up to approximately 9KB, reliance on link-layer integrity
      checks may be inadvisable for links with significantly larger MTUs and
      may not be possible at all for links such as tunnels over IPv4 that invoke
      fragmentation. Moreover, the segment contents of a received parcel may
      arrive in an incomplete and/or rearranged order with respect to their
      original packaging.</t>

      <t>Lower layer protocol entities always calculate and verify the
      {TCP,UDP}/IP parcel header checksum at their layer, since an errored
      header could result in mis-delivery to the wrong upper layer protocol
      entity. If the lower layer protocol entity of the destination detects
      an incorrect {TCP,UDP}/IP checksum it discards the entire IP parcel.</t>

      <t>To support the parcel header checksum calculation, lower layer
      protocol entities use modified versions of the {TCP,UDP}/IPv4
      "pseudo-header" found in <xref target="RFC0768"/><xref target="RFC0793"/>,
      or the {TCP,UDP}/IPv6 "pseudo-header" found in Section 8.1 of
      <xref target="RFC8200"/>. Note that while the contents of the
      two IP protocol version-specific pseudo-headers beyond the address
      fields are the same, the order in which the contents are arranged
      differs and must be honored according to the specific IP protocol
      version as shown in <xref target="pseudo"/>. This allows for maximum
      reuse of widely deployed code while ensuring interoperability.</t>

      <t><figure anchor="pseudo"
              title="{TCP,UDP}/IP Parcel Pseudo-Header Formats">
        <artwork><![CDATA[                       IPv4 Parcel Pseudo-Header
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                      IPv4 Source Address                      |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                    IPv4 Destination Address                   |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |      zero     |  Next Header  |        Segment Length         |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |     Nsegs     |          Upper-Layer Packet Length            |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+


                       IPv6 Parcel Pseudo-Header
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                                                               |
   ~                      IPv6 Source Address                      ~
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                                                               |
   ~                   IPv6 Destination Address                    ~
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |     Nsegs     |          Upper-Layer Packet Length            |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |        Segment Length         |      zero     |  Next Header  |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
]]></artwork>
        </figure>where:<list style="symbols">
          <t>Source Address is the 4-octet IPv4 or 16-octet IPv6 source
          address of the prepared parcel.</t>

          <t>Destination Address is the 4-octet IPv4 or 16-octet IPv6
          destination address of the prepared parcel.</t>

          <t>zero encodes the constant value '0'.</t>

          <t>Next Header is the IP protocol number corresponding to the upper
          layer protocol, i.e., TCP or UDP.</t>

          <t>Segment Length is the value that appears in the IPv4 Total
          Length or IPv6 Payload Length field of the prepared parcel.</t>

          <t>Nsegs is the 1-octet number of segments included, and must
          contain a number between 1 and 64 (this is the same value
          that appears in the Jumbo Payload Option).</t>

          <t>Upper-Layer Packet Length is the 3-octet length of the
          {TCP,UDP} header plus data (this value can be derived from
          the Jumbo Payload Length by subtracting the IPv4 header length
          for IPv4 or IPv6 extension header length for IPv6).</t>
        </list></t>
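      <t>The two field orderings can be illustrated with the following
      sketch, which packs either pseudo-header from the same inputs. The
      function name and parameter names are hypothetical; the field widths
      and ordering follow the figure above.</t>

      <figure><artwork><![CDATA[
```python
import ipaddress
import struct


def parcel_pseudo_header(src, dst, next_header, segment_length, nsegs,
                         upper_len):
    """Return the IPv4 (16-octet) or IPv6 (40-octet) parcel pseudo-header."""
    src_ip = ipaddress.ip_address(src)
    dst_ip = ipaddress.ip_address(dst)
    if src_ip.version != dst_ip.version:
        raise ValueError("source and destination IP versions differ")
    if not 1 <= nsegs <= 64:
        raise ValueError("Nsegs must be between 1 and 64")
    if not 0 <= upper_len < (1 << 24):
        raise ValueError("Upper-Layer Packet Length must fit in 3 octets")
    # Nsegs (1 octet) followed by Upper-Layer Packet Length (3 octets).
    nsegs_len = bytes([nsegs]) + upper_len.to_bytes(3, "big")
    if src_ip.version == 4:
        # IPv4: zero, Next Header, Segment Length, then Nsegs/Length.
        return (src_ip.packed + dst_ip.packed
                + struct.pack("!BBH", 0, next_header, segment_length)
                + nsegs_len)
    # IPv6: Nsegs/Length first, then Segment Length, zero, Next Header.
    return (src_ip.packed + dst_ip.packed + nsegs_len
            + struct.pack("!HBB", segment_length, 0, next_header))
```
]]></artwork></figure>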

      <t>Upper layer protocol entities use socket options to coordinate
      per-segment checksum processing with lower layers. If the upper layer
      sets a SO_NO_CHECK(TX) socket option, the upper layer is responsible for
      supplying per-segment checksums on transmission and the lower layer
      forwards the IP parcel to the next hop without further processing;
      otherwise, the lower layer calculates the per-segment checksums before
      forwarding. If the upper layer sets a SO_NO_CHECK(RX) socket option, the
      upper layer is responsible for verifying per-segment checksums on
      reception and the lower layer delivers each received parcel body to
      the upper layer without further processing; otherwise, the lower
      layer verifies the per-segment parcel checksums before delivering.</t>

      <t>When the upper layer protocol entity of the source sends a parcel
      body to lower layers, it prepends a block of Nsegs 2-octet Checksum
      fields and includes a 4-octet Sequence Number field with each TCP
      non-first segment. If the SO_NO_CHECK(TX) socket option is set, the
      upper layer protocol either calculates each segment checksum and writes
      the value into the checksum field (with '0' values written as 'ffff') or
      writes the value '0' to disable checksums for specific segments. If the
      SO_NO_CHECK(TX) socket option is clear, the upper layer instead writes
      the value '0' to disable or any non-zero value to enable checksums for
      specific segments.</t>

      <t>When the lower layer protocol entity of the source receives the
      parcel body from upper layers, if the SO_NO_CHECK(TX) socket option is
      set the lower layer appends the {TCP,UDP}/IP headers and forwards the
      parcel to the next hop without further processing. If the
      SO_NO_CHECK(TX) socket option is clear, the lower layer instead
      calculates the checksum for each segment with a non-zero value in the
      Checksum field and overwrites the calculated value into the Checksum
      field (with '0' values written as 'ffff').</t>

      <t>When the lower layer protocol entity of the destination receives a
      parcel from the source, if the SO_NO_CHECK(RX) socket option is set the
      lower layer delivers the parcel body to the upper layer without further
      processing, and the upper layer is responsible for per-segment checksum
      verification. If the SO_NO_CHECK(RX) socket option is clear, the lower
      layer instead calculates the checksum for each segment with a non-zero
      value in the Checksum field and overwrites the field with the value '1'
      if the checksum is correct or any other non-zero value if the checksum
      is incorrect. The lower layer then delivers the parcel body (beginning
      with the Checksum block) to the upper layer, which will see the value
      '0' for checksums disabled, '1' for checksum correct or any other value
      for checksum incorrect in each segment Checksum.</t>
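      <t>The lower layer transmit and receive rules above (for the case
      where the respective SO_NO_CHECK option is clear) can be sketched as
      follows. This is an illustration under stated assumptions: the
      per-segment checksum is taken to be the ordinary Internet checksum
      computed over the segment octets alone, and the function names and
      the bad_marker parameter are hypothetical.</t>

      <figure><artwork><![CDATA[
```python
def internet_checksum(data):
    """Ones-complement Internet checksum of data, zero-padded to even length."""
    if len(data) % 2:
        data += b"\x00"
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def wire_checksum(segment):
    """Checksum as written to the field: a computed 0 is written as 0xffff."""
    value = internet_checksum(segment)
    return value if value != 0 else 0xFFFF


def tx_fill(checksums, segments):
    """Source lower layer: compute checksums for fields marked non-zero."""
    return [wire_checksum(seg) if field != 0 else 0
            for field, seg in zip(checksums, segments)]


def rx_mark(checksums, segments, bad_marker=2):
    """Destination lower layer: 0 = disabled, 1 = correct, else incorrect."""
    if bad_marker in (0, 1):
        raise ValueError("bad_marker must differ from 0 and 1")
    return [0 if field == 0
            else (1 if field == wire_checksum(seg) else bad_marker)
            for field, seg in zip(checksums, segments)]
```
]]></artwork></figure>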

      <t>Note: The checksum block itself is intentionally omitted from the IP
      Parcel {TCP,UDP} header checksum calculation. This permits destinations
      to accept as many intact segments as possible from received parcels with
      checksum block bit errors, whereas the entire parcel would need to be
      discarded if the header checksum also covered the checksum block.</t>

      <t>Note: Implementations may provide a configuration option that allows
      lower layers to deliver the actual checksum received in an errored parcel
      segment to upper layers instead of a random value other than '0' or '1',
      e.g., for logging purposes. If so, the lower layer should rewrite an
      actual received checksum of '1' to 'ffff' so that upper layers can
      still distinguish correct from errored checksums.</t>
    </section>

    <section anchor="issues" title="RFC2675 Updates">
      <t>Section 3 of <xref target="RFC2675"/> provides a list of certain
      conditions to be considered as errors. In particular:<list style="empty">
          <t>error: IPv6 Payload Length != 0 and Jumbo Payload option
          present</t>

          <t>error: Jumbo Payload option present and Jumbo Payload Length &lt;
          65,536</t>
        </list></t>

      <t>Implementations that obey this specification ignore these conditions
      and do not consider them as errors.</t>
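
      <t>The effect of this update on the two quoted checks can be sketched
      as follows. This is a non-normative illustration: the function name
      rfc2675_errors and the "relaxed" flag are hypothetical, and the sketch
      models only the two conditions listed above, not the other error
      conditions in Section 3 of <xref target="RFC2675"/>.</t>

      <figure>
        <artwork><![CDATA[
```python
def rfc2675_errors(payload_length, jumbo_length, relaxed):
    """Return the quoted RFC 2675 errors that apply to a packet.

    payload_length: value of the IPv6 Payload Length field
    jumbo_length:   Jumbo Payload Length, or None if the option
                    is absent
    relaxed:        True for an implementation that obeys this
                    specification (hypothetical flag)
    """
    errors = []
    if jumbo_length is None or relaxed:
        # No Jumbo Payload option, or the conditions are ignored.
        return errors
    if payload_length != 0:
        errors.append("Payload Length != 0 with Jumbo Payload option")
    if jumbo_length < 65536:
        errors.append("Jumbo Payload Length < 65,536")
    return errors
```
]]></artwork>
      </figure>

      <t>For example, a packet with Payload Length 1280 and Jumbo Payload
      Length 4000 triggers both errors when "relaxed" is False and none when
      it is True.</t>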
    </section>

    <section anchor="jumbo" title="IPv4 Jumbograms">
      <t>By defining a new IPv4 Jumbo Payload option, this document also
      implicitly enables a true IPv4 jumbogram service defined as an IPv4
      packet with a Jumbo Payload option included and with Total Length set to
      0. All other aspects of IPv4 jumbograms are the same as for IPv6
      jumbograms <xref target="RFC2675"/>.</t>
    </section>

    <section anchor="implement" title="Implementation Status">
      <t>Common widely-deployed implementations include services such as TCP
      Segmentation Offload (TSO) and Generic Segmentation/Receive Offload
      (GSO/GRO). These facilities provide a robust (but non-standardized)
      service that has been shown to improve performance in many
      instances.</t>

      <t>UDP/IPv4 parcels have been implemented in the linux-5.10.67 kernel and
      ION-DTN ion-open-source-4.1.0 source distributions. The patch
      distribution is available at:
      "https://github.com/fltemplin/ip-parcels.git".</t>

      <t>Testing with a single-threaded receiver has shown that including
      larger numbers of segments in a single parcel produces modest
      performance gains over smaller numbers of segments due to more
      efficient packaging and reduced system calls/interrupts. Since
      performance is strongly bounded by single segment receiver processing
      time (with larger segments producing dramatic performance increases),
      it is expected that parcels with larger numbers of segments will
      provide a performance multiplier on multi-threaded receivers running
      in parallel processing environments.</t>
    </section>

    <section anchor="iana" title="IANA Considerations">
      <t>The IANA is instructed to change the "MTUP - MTU Probe" entry in the
      'ip option numbers' registry to the "JUMBO - IPv4 Jumbo Payload" option.
      The Copy and Class fields must both be set to 0, and the Number and
      Value fields must both be set to '11'. The reference must be changed to
      this document [RFCXXXX].</t>
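
      <t>As a worked check of these values, the IPv4 option-type octet
      defined in <xref target="RFC0791"/> consists of a 1-bit copied flag,
      a 2-bit option class and a 5-bit option number. The following
      non-normative sketch (the function name is hypothetical) shows that
      Copy 0, Class 0 and Number 11 yield the option-type Value 11.</t>

      <figure>
        <artwork><![CDATA[
```python
def ipv4_option_type(copy, option_class, number):
    """Compose an IPv4 option-type octet (RFC 791 layout)."""
    if copy not in (0, 1):
        raise ValueError("copy flag is 1 bit")
    if not 0 <= option_class <= 3:
        raise ValueError("option class is 2 bits")
    if not 0 <= number <= 31:
        raise ValueError("option number is 5 bits")
    return (copy << 7) | (option_class << 5) | number


# Copy=0, Class=0, Number=11 as requested in this section
jumbo_option_type = ipv4_option_type(0, 0, 11)
```
]]></artwork>
      </figure>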
    </section>

    <section anchor="secure" title="Security Considerations">
      <t>Original sources match the Nonce values in received Parcel Replies
      with their corresponding Parcel Probes. If the values match, the reply
      is likely an authentic response to the probe. In environments where
      stronger authentication is necessary, nodes that send Parcel Replies
      can apply the message authentication services of OMNI and Automatic
      Extended Route Optimization (AERO) <xref target="I-D.templin-6man-aero"/>.</t>

      <t>Multi-layer security solutions may be necessary to ensure
      confidentiality, integrity and availability in some environments.</t>
    </section>

    <section anchor="ack" title="Acknowledgements">
      <t>This work was inspired by ongoing AERO/OMNI/DTN investigations. The
      concepts were further motivated through discussions on the intarea and
      6man lists.</t>

      <t>A considerable body of work over recent years has produced useful
      "segmentation offload" facilities available in widely-deployed
      implementations.</t>
    </section>
  </middle>

  <back>
    <references title="Normative References">
      <?rfc include="reference.RFC.2119"?>

      <?rfc include="reference.RFC.8174"?>

      <?rfc include="reference.RFC.2675"?>

      <?rfc include="reference.RFC.0768"?>

      <?rfc include="reference.RFC.0791"?>

      <?rfc include="reference.RFC.0792"?>

      <?rfc include="reference.RFC.0793"?>

      <?rfc include="reference.RFC.4443"?>

      <?rfc include="reference.RFC.4291"?>

      <?rfc include="reference.RFC.8200" ?>
    </references>

    <references title="Informative References">
      <?rfc include="reference.I-D.templin-6man-fragrep"?>

      <?rfc include="reference.I-D.templin-6man-aero"?>

      <?rfc include="reference.I-D.templin-6man-omni"?>

      <?rfc include="reference.RFC.9000"?>

      <?rfc include="reference.RFC.1071"?>

      <?rfc include="reference.RFC.5326"?>

      <?rfc include="reference.RFC.1063"?>

      <?rfc include="reference.RFC.7126"?>

      <?rfc include="reference.RFC.1191"?>

      <?rfc include="reference.RFC.4821"?>

      <?rfc include="reference.RFC.8201"?>

      <?rfc include="reference.RFC.8899"?>

      <?rfc include="reference.RFC.9171"?>

      <?rfc include="reference.I-D.ietf-tcpm-rfc793bis"?>

      <?rfc include="reference.I-D.templin-dtn-ltpfrag"?>

      <?rfc include="reference.I-D.ietf-6man-mtu-option"?>

      <reference anchor="QUIC">
        <front>
          <title>Accelerating UDP packet transmission for QUIC,
          https://blog.cloudflare.com/accelerating-udp-packet-transmission-for-quic/</title>

          <author fullname="Alessandro Ghedini" initials="A."
                  surname="Ghedini">
            <organization/>
          </author>

          <date day="8" month="January" year="2020"/>
        </front>
      </reference>

      <reference anchor="BIG-TCP">
        <front>
          <title>BIG TCP, Netdev 0x15 Conference (virtual),
          https://netdevconf.info/0x15/session.html?BIG-TCP</title>

          <author fullname="Eric Dumazet" initials="E." surname="Dumazet">
            <organization/>
          </author>

          <date day="31" month="August" year="2021"/>
        </front>
      </reference>
    </references>

    <section anchor="futures" title="IP Parcel Futures">
      <t>Historic and current-day data links configure Maximum Transmission
      Units (MTUs) that are far smaller than the desired state for the future
      of IP parcel transmission. When the first Ethernet data links were
      deployed many decades ago, their 1500 octet MTU set a strong precedent
      that was widely adopted. This same size now appears as the predominant
      MTU limit for most paths in the Internet today, although modern link
      deployments with larger MTUs up to 9KB have begun to emerge.</t>

      <t>In the late 1980s, the Fiber Distributed Data Interface (FDDI)
      standard defined a new link type with an MTU slightly larger than 4500
      octets. The goal of the larger MTU was to increase performance by a
      factor of 10 over the ubiquitous 10Mbps and 1500 octet MTU Ethernet
      technologies of the time. Many factors, including a failure to harmonize
      MTU diversity and an Ethernet performance increase to 100Mbps, led to
      the demise of FDDI. Moving into the next decade, the 1990s saw new
      initiatives including ATM/AAL5 (9KB MTU) and HiPPI (64KB MTU) which
      offered high-speed data link alternatives with larger MTUs, but again
      the inability to harmonize diversity derailed their momentum. By the
      end of the 1990s and leading into the 2000s, the emergence of the 1Gbps
      and faster Ethernet performance levels seen today has obscured the fact
      that the modern Internet of the 21st century is still operating with
      20th century MTUs!</t>

      <t>To bridge this gap, increased OMNI interface deployment in the
      near future will provide an unlimited MTU virtual link type that can
      pass IP parcels over paths that traverse traditional data links with
      small MTUs. Experiments have shown that (single-threaded) receive-side
      performance is bounded by upper layer protocol segment size, with
      performance increasing in direct proportion to segment size.
      Experiments have also shown that (single-threaded) performance increases
      moderately by including larger numbers of segments per parcel. However,
      parallel receive-side processing will provide performance multiplier
      benefits since the multiple segments that arrive in a single parcel
      can be processed simultaneously instead of serially.</t>

      <t>However, the true power of IP parcels will become evident as future
      parcel-capable links with extremely large MTUs begin to emerge. These
      links will provide MTUs far in excess of 64KB (up
      to as large as 4MB). With such large MTUs, the traditional CRC-32 (or
      even CRC-64) error checking with errored packet discard discipline will
      no longer apply for large parcels. Instead, parcels larger than a
      link-specific threshold will include Forward Error Correction (FEC)
      codes so that errored parcels can be repaired at the receiver's data
      link layer then delivered to upper layers rather than being discarded
      and triggering retransmission of large amounts of data. Even if the
      FEC repairs are incomplete or imperfect, all parcels can still be
      delivered to upper layers where the individual segment checksums
      will detect and discard any damaged data not repaired by lower layers.</t>

      <t>These new "super-links" will most likely occur in the network edges
      (e.g., high-speed data centers) and would not be expected to occur in
      the middle of the Internet. (However, some space-domain links that
      extend over enormous distances may also benefit.) For this reason, a
      common use case will include parcel-capable super-links in the edge
      networks of both parties of an end-to-end session with an OMNI link
      connecting the two over wide area Internetworks. Small- to medium-sized
      IP parcels over OMNI links will already provide considerable performance
      benefits for wide-area end-to-end communications while truly large IP
      parcels over super-links can provide boundless increases for localized
      bulk transfers in edge networks or for deep space long haul
      transmissions. The ability to grow and adapt without practical bound
      enabled by IP parcels will inevitably encourage new data link
      development leading to future innovations in new markets that will
      revolutionize the Internet.</t>

      <t>Until these new links begin to emerge, however, parcels will already
      provide a tremendous benefit to end systems by allowing applications to
      send and receive segment buffers larger than 65535 octets in a single
      system call. By expanding the operating system call data copy limit
      from its current 16-bit length to a 32-bit length, applications
      will be able to send and receive maximum-length parcel buffers even if
      lower layers need to break them into multiple parcels to fit within the
      underlying interface MTU. For applications such as the Delay Tolerant
      Networking (DTN) Bundle Protocol <xref target="RFC9171"/>, this will
      make it possible to send and receive entire large DTN bundles in
      a single system call.</t>
    </section>
  </back>
</rfc>
