<?xml version='1.0' encoding='utf-8'?>
<!DOCTYPE rfc [
  <!ENTITY nbsp    "&#160;">
  <!ENTITY zwsp   "&#8203;">
  <!ENTITY nbhy   "&#8209;">
  <!ENTITY wj     "&#8288;">
]>
<?xml-stylesheet type="text/xsl" href="rfc2629.xslt" ?>
<!-- generated by https://github.com/cabo/kramdown-rfc version 1.6.7 (Ruby 3.1.2) -->
<rfc xmlns:xi="http://www.w3.org/2001/XInclude" ipr="trust200902" docName="draft-bormann-jsonpath-iregexp-04" category="std" consensus="true" submissionType="IETF" tocInclude="true" sortRefs="true" symRefs="true" version="3">
  <!-- xml2rfc v2v3 conversion 3.12.3 -->
  <front>
    <title abbrev="I-Regexp">I-Regexp: An Interoperable Regexp Format</title>
    <seriesInfo name="Internet-Draft" value="draft-bormann-jsonpath-iregexp-04"/>
    <author initials="C." surname="Bormann" fullname="Carsten Bormann">
      <organization>Universität Bremen TZI</organization>
      <address>
        <postal>
          <street>Postfach 330440</street>
          <city>Bremen</city>
          <code>D-28359</code>
          <country>Germany</country>
        </postal>
        <phone>+49-421-218-63921</phone>
        <email>cabo@tzi.org</email>
      </address>
    </author>
    <author initials="T." surname="Bray" fullname="Tim Bray">
      <organization>Textuality</organization>
      <address>
        <email>tbray@textuality.com</email>
      </address>
    </author>
    <date year="2022" month="April" day="25"/>
    <keyword>Internet-Draft</keyword>
    <abstract>
      <t>This document specifies I-Regexp, a flavor of regular expressions that is
limited in scope with the goal of interoperation across many different
regular-expression libraries.</t>
    </abstract>
    <note removeInRFC="true">
      <name>About This Document</name>
      <t>
        Status information for this document may be found at <eref target="https://datatracker.ietf.org/doc/draft-bormann-jsonpath-iregexp/"/>.
      </t>
      <t>
        Discussion of this document takes place on the
        JSONpath Working Group mailing list (<eref target="mailto:JSONpath@ietf.org"/>),
        which is archived at <eref target="https://mailarchive.ietf.org/arch/browse/JSONpath/"/>.
      </t>
      <t>Source for this draft and an issue tracker can be found at
        <eref target="https://github.com/cabo/iregexp"/>.</t>
    </note>
  </front>
  <middle>
    <section anchor="intro">
      <name>Introduction</name>
      <t>This specification describes an interoperable regular expression flavor, I-Regexp.</t>
      <t>This document uses the abbreviation "regexp" for what are usually
called regular expressions in programming.
"I-Regexp" is used as a noun meaning a character string which conforms to the requirements
in this specification; the plural is "I-Regexps".</t>
      <t>I-Regexp does not provide advanced regexp features such as capture groups, lookahead, or backreferences.
It supports only a Boolean matching capability, i.e., testing whether a given regexp matches a given piece of text.</t>
      <t>I-Regexp supports the entire repertoire of Unicode characters.</t>
      <t>I-Regexp is a subset of XSD regexps <xref target="XSD-2"/>.</t>
      <t>This document includes rules for converting I-Regexps for use with several well-known regexp libraries.</t>
      <section anchor="terminology">
        <name>Terminology</name>
        <t>The key words "<bcp14>MUST</bcp14>", "<bcp14>MUST NOT</bcp14>", "<bcp14>REQUIRED</bcp14>", "<bcp14>SHALL</bcp14>", "<bcp14>SHALL
NOT</bcp14>", "<bcp14>SHOULD</bcp14>", "<bcp14>SHOULD NOT</bcp14>", "<bcp14>RECOMMENDED</bcp14>", "<bcp14>NOT RECOMMENDED</bcp14>",
"<bcp14>MAY</bcp14>", and "<bcp14>OPTIONAL</bcp14>" in this document are to be interpreted as
described in BCP 14 <xref target="RFC2119"/> <xref target="RFC8174"/> when, and only when, they
appear in all capitals, as shown here.</t>
        <t>The grammatical rules in this document are to be interpreted as ABNF,
as described in <xref target="RFC5234"/> and <xref target="RFC7405"/>.</t>
      </section>
    </section>
    <section anchor="requirements">
      <name>Requirements</name>
      <t>I-Regexps should handle the vast majority of practical cases where a
matching regexp is needed in a data model specification or a query
language expression.</t>
      <t>A brief survey of published RFCs yielded the regexp patterns in
<xref target="rfcs"/> (with no attempt at completeness).
With certain exceptions as discussed there,
these should be covered by I-Regexps, both syntactically and with
their intended semantics.</t>
    </section>
    <section anchor="defn">
      <name>I-Regexp Syntax</name>
      <t>An I-Regexp <bcp14>MUST</bcp14> conform to the ABNF specification in
<xref target="iregexp-abnf"/>.</t>
      <figure anchor="iregexp-abnf">
        <name>I-Regexp Syntax in ABNF</name>
        <sourcecode type="abnf"><![CDATA[
i-regexp = branch *( "|" branch )
branch = *piece
piece = atom [ quantifier ]
quantifier = ( %x2A-2B ; '*'-'+'
 / "?" ) / ( "{" quantity "}" )
quantity = QuantExact [ "," [ QuantExact ] ]
QuantExact = 1*%x30-39 ; '0'-'9'

atom = NormalChar / charClass / ( "(" i-regexp ")" )
NormalChar = ( %x00-27 / %x2C-2D ; ','-'-'
 / %x2F-3E ; '/'-'>'
 / %x40-5A ; '@'-'Z'
 / %x5E-7A ; '^'-'z'
 / %x7E-10FFFF )
charClass = "." / SingleCharEsc / charClassEsc / charClassExpr
SingleCharEsc = "\" ( %x28-2B ; '('-'+'
 / %x2D-2E ; '-'-'.'
 / "?" / %x5B-5E ; '['-'^'
 / %s"n" / %s"r" / %s"t" / %x7B-7D ; '{'-'}'
 )
charClassEsc = catEsc / complEsc
charClassExpr = "[" [ "^" ] ( "-" / CCE1 ) *CCE1 [ "-" ] "]"
CCE1 = ( CCchar [ "-" CCchar ] ) / charClassEsc
CCchar = ( %x00-2C / %x2E-5A ; '.'-'Z'
 / %x5E-10FFFF ) / SingleCharEsc
catEsc = %s"\p{" charProp "}"
complEsc = %s"\P{" charProp "}"
charProp = IsCategory / IsBlock
IsCategory = Letters / Marks / Numbers / Punctuation / Separators /
    Symbols / Others
Letters = %s"L" [ ( %x6C-6D ; 'l'-'m'
 / %s"o" / %x74-75 ; 't'-'u'
 ) ]
Marks = %s"M" [ ( %s"c" / %s"e" / %s"n" ) ]
Numbers = %s"N" [ ( %s"d" / %s"l" / %s"o" ) ]
Punctuation = %s"P" [ ( %x63-66 ; 'c'-'f'
 / %s"i" / %s"o" / %s"s" ) ]
Separators = %s"Z" [ ( %s"l" / %s"p" / %s"s" ) ]
Symbols = %s"S" [ ( %s"c" / %s"k" / %s"m" / %s"o" ) ]
Others = %s"C" [ ( %s"c" / %s"f" / %x6E-6F ; 'n'-'o'
 ) ]
IsBlock = %s"Is" 1*( "-" / %x30-39 ; '0'-'9'
 / %x41-5A ; 'A'-'Z'
 / %x61-7A ; 'a'-'z'
 )
]]></sourcecode>
      </figure>
      <t>As an additional restriction, <tt>charClassExpr</tt> is not allowed to
match <tt>[^]</tt>, which according to this grammar would parse as a
positive character class containing the single character <tt>^</tt>.</t>
      <t>This is essentially XSD regexp without character class
subtraction and multi-character escapes such as <tt>\s</tt>,
<tt>\S</tt>, and <tt>\w</tt>.</t>
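      <t>As an illustration only, the following Python sketch checks whether a string conforms to the ABNF above, including the additional <tt>[^]</tt> restriction. It is a hand-written recursive-descent recognizer, not a reference implementation. The names <tt>is_iregexp</tt> and <tt>_Parser</tt> are invented for this example, and Python's <tt>re</tt> module is used only for small lexical pieces (quantifiers and escapes). Like the ABNF, the sketch does not check that a block name such as <tt>IsCoptic</tt> actually exists, nor that the minimum of a quantity does not exceed its maximum.</t>
      <sourcecode type="python"><![CDATA[
import re

# Lexical pieces of the ABNF, expressed as Python sets and regexes.
_NOT_NORMAL = set("()*+.?[\\]{|}")  # never NormalChar ("." is a charClass)
_QUANTIFIER = re.compile(r"[*+?]|\{[0-9]+(?:,[0-9]*)?\}")  # [0-9], not \d
_ESCAPE = re.compile(
    r"\\(?:[()*+\-.?\[\\\]^nrt{|}]"                           # SingleCharEsc
    r"|[pP]\{(?:L[lmotu]?|M[cen]?|N[dlo]?|P[cdefios]?|Z[lps]?"
    r"|S[ckmo]?|C[cfno]?|Is[-0-9A-Za-z]+)\})")                # catEsc/complEsc


class _Parser:
    def __init__(self, s: str) -> None:
        self.s, self.i = s, 0

    def peek(self) -> str:
        return self.s[self.i:self.i + 1]  # "" at the end of the input

    def eat(self, ch: str) -> bool:
        if self.peek() == ch:
            self.i += 1
            return True
        return False

    def regexp(self) -> None:  # i-regexp = branch *( "|" branch )
        self.branch()
        while self.eat("|"):
            self.branch()

    def branch(self) -> None:  # branch = *piece ; piece = atom [ quantifier ]
        while self.peek() not in ("", "|", ")"):
            self.atom()
            m = _QUANTIFIER.match(self.s, self.i)
            if m:
                self.i = m.end()

    def atom(self) -> None:
        c = self.peek()
        if c == "(":
            self.i += 1
            self.regexp()
            if not self.eat(")"):
                raise ValueError("missing ')'")
        elif c == "[":
            self.char_class_expr()
        elif c == "\\":
            self.escape()
        elif c == "." or c not in _NOT_NORMAL:
            self.i += 1  # "." or NormalChar
        else:
            raise ValueError(f"unexpected {c!r} at {self.i}")

    def escape(self) -> bool:  # True: SingleCharEsc; False: \p{...} / \P{...}
        m = _ESCAPE.match(self.s, self.i)
        if m is None:
            raise ValueError(f"invalid escape at {self.i}")
        self.i = m.end()
        return m.end() - m.start() == 2

    def class_atom(self):  # CCchar -> "char", charClassEsc -> "class", else None
        c = self.peek()
        if c in ("", "-", "[", "]"):
            return None
        if c == "\\":
            return "char" if self.escape() else "class"
        self.i += 1
        return "char"

    def cce1(self) -> bool:  # CCE1 = ( CCchar [ "-" CCchar ] ) / charClassEsc
        kind = self.class_atom()
        if kind == "char" and self.peek() == "-":
            save = self.i
            self.i += 1
            if self.class_atom() != "char":
                self.i = save  # not a range; "-" may be the final "-"
        return kind is not None

    def char_class_expr(self) -> None:
        self.i += 1  # "["
        self.eat("^")  # taken greedily, which is what rules out "[^]"
        if not self.eat("-") and not self.cce1():
            raise ValueError("empty character class")
        while self.cce1():
            pass
        self.eat("-")
        if not self.eat("]"):
            raise ValueError("expected ']'")


def is_iregexp(pattern: str) -> bool:
    parser = _Parser(pattern)
    try:
        parser.regexp()
    except ValueError:
        return False
    return parser.i == len(pattern)
]]></sourcecode>
      <t>For instance, <tt>is_iregexp("[0-9a-fA-F]{2}(:[0-9a-fA-F]{2}){5}")</tt> is expected to return true, while <tt>is_iregexp(r"\S+")</tt> and <tt>is_iregexp("[^]")</tt> are expected to return false. Taking the optional <tt>^</tt> greedily, without backtracking, is what makes the sketch reject <tt>[^]</tt>, which the ABNF alone would accept.</t>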
      <t>An I-Regexp implementation <bcp14>MUST</bcp14> be a complete implementation of this
limited subset.
In particular, full Unicode support is <bcp14>REQUIRED</bcp14>; the implementation
<bcp14>MUST NOT</bcp14> limit itself to 7- or 8-bit character sets such as ASCII and
<bcp14>MUST</bcp14> support the Unicode character property set in character classes.</t>
    </section>
    <section anchor="i-regexp-semantics">
      <name>I-Regexp Semantics</name>
      <t>This syntax is a subset of that of <xref target="XSD-2"/>.
Implementations which interpret I-Regexps <bcp14>MUST</bcp14>
yield Boolean results as specified in <xref target="XSD-2"/>.
(See also <xref target="xsd-regexps"/>.)</t>
    </section>
    <section anchor="mapping-i-regexp-to-regexp-dialects">
      <name>Mapping I-Regexp to Regexp Dialects</name>
      <t>(TBD; these mappings need to be further verified in implementation work.)</t>
      <section anchor="xsd-regexps">
        <name>XSD Regexps</name>
        <t>Any I-Regexp is also an XSD regexp <xref target="XSD-2"/>, so the mapping is an identity
function.</t>
        <t>Note that a few errata for <xref target="XSD-2"/> have been fixed in <xref target="XSD11-2"/>, which
is therefore also included as a normative reference.
XSD 1.1 is less widely implemented than XSD 1.0, and implementations
of XSD 1.0 are likely to include these bug fixes, so for the purposes
of this specification an implementation of XSD 1.0
regexps is equivalent to an implementation of XSD 1.1 regexps.</t>
      </section>
      <section anchor="toESreg">
        <name>ECMAScript Regexps</name>
        <t>Perform the following steps on an I-Regexp to obtain an ECMAScript
regexp <xref target="ECMA-262"/>:</t>
        <ul spacing="normal">
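          <li>For any unescaped dots (<tt>.</tt>) outside character classes (first alternative
of the <tt>charClass</tt> production): replace the dot with <tt>[^\n\r]</tt>.</li>
          <li>Envelope the result in <tt>^(?:</tt> and <tt>)$</tt>; the non-capturing group
ensures that the anchors apply to all top-level alternatives.</li>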
        </ul>
        <t>Note that where a regexp literal is required,
the actual regexp needs to be enclosed in <tt>/</tt>.</t>
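        <t>The two steps above can be sketched in Python as follows. The function name <tt>to_ecmascript</tt> is made up for this illustration, and the sketch assumes that its input has already been checked to be a valid I-Regexp; it only rewrites the pattern text and does not run an ECMAScript engine.</t>
        <sourcecode type="python"><![CDATA[
def to_ecmascript(iregexp: str) -> str:
    """Apply both ECMAScript mapping steps to an already validated I-Regexp."""
    out, in_class, i = [], False, 0
    while i < len(iregexp):
        c = iregexp[i]
        if c == "\\":
            # Copy an escape unchanged, so "\." stays a literal dot.  The
            # body of \p{...} or \P{...} never contains ".", "[" or "]",
            # so copying the backslash and the next character is enough.
            out.append(iregexp[i:i + 2])
            i += 2
            continue
        if in_class:
            in_class = c != "]"  # an unescaped "]" always closes the class
        elif c == "[":
            in_class = True
        elif c == ".":
            c = "[^\\n\\r]"  # XSD "." excludes only LF and CR
        out.append(c)
        i += 1
    # Group before anchoring, so "^" and "$" bind to every top-level branch.
    return "^(?:" + "".join(out) + ")$"
]]></sourcecode>
        <t>For example, <tt>to_ecmascript("a.c|b")</tt> should yield <tt>^(?:a[^\n\r]c|b)$</tt>, while an escaped dot (<tt>\.</tt>) or a dot inside a class (<tt>[.]</tt>) is left unchanged. Without the group, <tt>^a|b$</tt> would also match <tt>xb</tt>, which the I-Regexp <tt>a|b</tt> does not. The sketch does not try to address any further differences between the dialects; for instance, ECMAScript 2020 only interprets <tt>\p{...}</tt> as a property escape when the <tt>u</tt> flag is set.</t>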
      </section>
      <section anchor="pcre-re2-ruby-regexps">
        <name>PCRE, RE2, Ruby Regexps</name>
        <t>Perform the same steps as in <xref target="toESreg"/> to obtain a valid regexp in
PCRE <xref target="PCRE2"/>, the Go programming language <xref target="RE2"/>, and the Ruby
programming language, except that the last step is:</t>
        <ul spacing="normal">
          <li>Enclose the regexp in <tt>\A(?:</tt> and <tt>)\z</tt>.</li>
        </ul>
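        <t>A corresponding Python sketch for this mapping follows; <tt>to_pcre_family</tt> is a hypothetical name, and the input is again assumed to be a valid I-Regexp. The asserts at the end use Python's own <tt>re</tt> module, which is not one of the dialects listed above, only to show why <tt>^</tt> and <tt>$</tt> are avoided as anchors here: in PCRE and Ruby, as in Python, <tt>$</tt> can also match just before a final newline, and in Ruby <tt>^</tt> and <tt>$</tt> always match at line boundaries.</t>
        <sourcecode type="python"><![CDATA[
import re


def to_pcre_family(iregexp: str) -> str:
    """Rewrite a valid I-Regexp for PCRE, RE2 (Go) or Ruby (sketch)."""
    out, in_class, i = [], False, 0
    while i < len(iregexp):
        c = iregexp[i]
        if c == "\\":
            out.append(iregexp[i:i + 2])  # escapes such as "\." stay as they are
            i += 2
            continue
        if in_class:
            in_class = c != "]"  # an unescaped "]" always closes the class
        elif c == "[":
            in_class = True
        elif c == ".":
            c = "[^\\n\\r]"  # XSD "." excludes only LF and CR
        out.append(c)
        i += 1
    return "\\A(?:" + "".join(out) + ")\\z"


# With "^...$", a trailing newline can slip through ("$" matches before it).
assert re.search(r"^(?:abc)$", "abc\n") is not None
# Python's "\Z" matches only at the very end, like "\z" in PCRE, RE2 and Ruby.
assert re.search(r"\A(?:abc)\Z", "abc\n") is None
]]></sourcecode>
        <t>For example, <tt>to_pcre_family("a.c")</tt> should yield <tt>\A(?:a[^\n\r]c)\z</tt>.</t>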
      </section>
    </section>
    <section anchor="background">
      <name>Motivation and Background</name>
      <t>While regular expressions originally were intended to describe a
formal language to support a Boolean matching function, they
have been enhanced with parsing functions that support the extraction
and replacement of arbitrary portions of the matched text. With this
accretion of features, parsing regexp libraries have become
more susceptible to bugs and surprising performance degradations that
can be exploited in Denial-of-Service attacks by
an attacker who controls the regexp submitted for
processing. I-Regexp is designed to offer interoperability, and to be
less vulnerable to such attacks, with the trade-off that its only
function is to offer a Boolean response as to whether a character
sequence is matched by a regexp.</t>
      <section anchor="subsetting">
        <name>Implementing I-Regexp</name>
        <t>XSD regexps are relatively easy to implement or map to widely
implemented parsing regexp dialects, with these notable
exceptions:</t>
        <ul spacing="normal">
          <li>Character class subtraction.  This is a very useful feature in many
specifications, but it is unfortunately mostly absent from parsing
regexp dialects. Thus, it is omitted from I-Regexp.</li>
          <li>Multi-character escapes.  <tt>\d</tt>, <tt>\w</tt>, <tt>\s</tt> and their uppercase
complement classes exhibit a
large amount of variation between regexp flavors.  Thus, they are
omitted from I-Regexp.</li>
          <li>Not all regexp implementations
have access to the Unicode tables that are needed to
execute constructs such as <tt>\p{IsCoptic}</tt>,
although the <tt>\p</tt>/<tt>\P</tt> feature in general is now quite
widely available. While in principle it is possible to
translate these into code-point-range matches, this also requires
access to those tables. Thus, regexp libraries in severely
constrained environments may not be able to achieve I-Regexp
conformance.</li>
        </ul>
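        <t>The translation into code-point ranges mentioned above can be sketched as follows, assuming the translated pattern is handed to an engine that accepts <tt>\uXXXX</tt> escapes in character classes (Python's <tt>re</tt> module is used here). The table <tt>BLOCKS</tt> and the function <tt>expand_blocks</tt> are hypothetical: the table deliberately holds only two entries, with ranges as listed in the Unicode Character Database file Blocks.txt, whereas a real implementation would need the complete table for the Unicode version it supports. The sketch also assumes that the block escapes occur outside character classes.</t>
        <sourcecode type="python"><![CDATA[
import re

# Hypothetical excerpt of a block table: XSD block name (without "Is")
# -> (first, last) code point.  Real code needs the complete table.
BLOCKS = {
    "BasicLatin": (0x0000, 0x007F),
    "Coptic": (0x2C80, 0x2CFF),
}

# \p{IsName} or \P{IsName}; group 1 is "p"/"P", group 2 the block name.
_BLOCK_ESC = re.compile(r"\\([pP])\{Is([-0-9A-Za-z]+)\}")


def expand_blocks(pattern: str) -> str:
    """Replace block escapes (outside classes) by code-point range classes."""

    def to_range(m):
        first, last = BLOCKS[m.group(2)]  # KeyError: block not in the table
        negate = "^" if m.group(1) == "P" else ""
        return f"[{negate}\\u{first:04X}-\\u{last:04X}]"

    return _BLOCK_ESC.sub(to_range, pattern)


assert re.fullmatch(expand_blocks(r"\p{IsCoptic}+"), "\u2C80\u2CFF")
assert re.fullmatch(expand_blocks(r"\P{IsBasicLatin}"), "a") is None
]]></sourcecode>
        <t>Category escapes such as <tt>\p{Lu}</tt> are left untouched by this sketch; expanding them in the same way would require the General_Category data from the same database.</t>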
      </section>
    </section>
    <section anchor="iana-considerations">
      <name>IANA Considerations</name>
      <t>This document makes no requests of IANA.</t>
    </section>
    <section anchor="security-considerations">
      <name>Security Considerations</name>
      <t>As discussed in <xref target="background"/>, more complex regexp libraries may
contain exploitable bugs leading to crashes and remote code
execution.  There is also the problem that such libraries often have
hard-to-predict performance characteristics, leading to attacks
that overload an implementation by matching against an expensive
attacker-controlled regexp.</t>
      <t>I-Regexps have been designed to allow implementation in a way that is
resilient to both threats; this objective needs to be addressed
throughout the implementation effort.</t>
    </section>
  </middle>
  <back>
    <references>
      <name>References</name>
      <references>
        <name>Normative References</name>
        <reference anchor="XSD-2" target="https://www.w3.org/TR/2004/REC-xmlschema-2-20041028">
          <front>
            <title>XML Schema Part 2: Datatypes Second Edition</title>
            <author fullname="Paul V. Biron" initials="P." surname="Biron">
              <organization/>
            </author>
            <author fullname="Ashok Malhotra" initials="A." surname="Malhotra">
              <organization/>
            </author>
            <date day="28" month="October" year="2004"/>
          </front>
          <seriesInfo name="World Wide Web Consortium Recommendation" value="REC-xmlschema-2-20041028"/>
        </reference>
        <reference anchor="XSD11-2" target="https://www.w3.org/TR/2012/REC-xmlschema11-2-20120405">
          <front>
            <title>W3C XML Schema Definition Language (XSD) 1.1 Part 2: Datatypes</title>
            <author fullname="David Peterson" initials="D." surname="Peterson">
              <organization/>
            </author>
            <author fullname="Sandy Gao" initials="S." surname="Gao">
              <organization/>
            </author>
            <author fullname="Ashok Malhotra" initials="A." surname="Malhotra">
              <organization/>
            </author>
            <author fullname="Michael Sperberg-McQueen" initials="M." surname="Sperberg-McQueen">
              <organization/>
            </author>
            <author fullname="Henry Thompson" initials="H." surname="Thompson">
              <organization/>
            </author>
            <author fullname="Paul V. Biron" initials="P." surname="Biron">
              <organization/>
            </author>
            <date day="5" month="April" year="2012"/>
          </front>
          <seriesInfo name="World Wide Web Consortium Recommendation" value="REC-xmlschema11-2-20120405"/>
        </reference>
        <reference anchor="RFC5234" target="https://www.rfc-editor.org/info/rfc5234">
          <front>
            <title>Augmented BNF for Syntax Specifications: ABNF</title>
            <author fullname="D. Crocker" initials="D." role="editor" surname="Crocker">
              <organization/>
            </author>
            <author fullname="P. Overell" initials="P." surname="Overell">
              <organization/>
            </author>
            <date month="January" year="2008"/>
            <abstract>
              <t>Internet technical specifications often need to define a formal syntax.  Over the years, a modified version of Backus-Naur Form (BNF), called Augmented BNF (ABNF), has been popular among many Internet specifications.  The current specification documents ABNF. It balances compactness and simplicity with reasonable representational power.  The differences between standard BNF and ABNF involve naming rules, repetition, alternatives, order-independence, and value ranges.  This specification also supplies additional rule definitions and encoding for a core lexical analyzer of the type common to several Internet specifications.  [STANDARDS-TRACK]</t>
            </abstract>
          </front>
          <seriesInfo name="STD" value="68"/>
          <seriesInfo name="RFC" value="5234"/>
          <seriesInfo name="DOI" value="10.17487/RFC5234"/>
        </reference>
        <reference anchor="RFC7405" target="https://www.rfc-editor.org/info/rfc7405">
          <front>
            <title>Case-Sensitive String Support in ABNF</title>
            <author fullname="P. Kyzivat" initials="P." surname="Kyzivat">
              <organization/>
            </author>
            <date month="December" year="2014"/>
            <abstract>
              <t>This document extends the base definition of ABNF (Augmented Backus-Naur Form) to include a way to specify US-ASCII string literals that are matched in a case-sensitive manner.</t>
            </abstract>
          </front>
          <seriesInfo name="RFC" value="7405"/>
          <seriesInfo name="DOI" value="10.17487/RFC7405"/>
        </reference>
        <reference anchor="RFC2119" target="https://www.rfc-editor.org/info/rfc2119">
          <front>
            <title>Key words for use in RFCs to Indicate Requirement Levels</title>
            <author fullname="S. Bradner" initials="S." surname="Bradner">
              <organization/>
            </author>
            <date month="March" year="1997"/>
            <abstract>
              <t>In many standards track documents several words are used to signify the requirements in the specification.  These words are often capitalized. This document defines these words as they should be interpreted in IETF documents.  This document specifies an Internet Best Current Practices for the Internet Community, and requests discussion and suggestions for improvements.</t>
            </abstract>
          </front>
          <seriesInfo name="BCP" value="14"/>
          <seriesInfo name="RFC" value="2119"/>
          <seriesInfo name="DOI" value="10.17487/RFC2119"/>
        </reference>
        <reference anchor="RFC8174" target="https://www.rfc-editor.org/info/rfc8174">
          <front>
            <title>Ambiguity of Uppercase vs Lowercase in RFC 2119 Key Words</title>
            <author fullname="B. Leiba" initials="B." surname="Leiba">
              <organization/>
            </author>
            <date month="May" year="2017"/>
            <abstract>
              <t>RFC 2119 specifies common key words that may be used in protocol  specifications.  This document aims to reduce the ambiguity by clarifying that only UPPERCASE usage of the key words have the  defined special meanings.</t>
            </abstract>
          </front>
          <seriesInfo name="BCP" value="14"/>
          <seriesInfo name="RFC" value="8174"/>
          <seriesInfo name="DOI" value="10.17487/RFC8174"/>
        </reference>
      </references>
      <references>
        <name>Informative References</name>
        <reference anchor="RE2" target="https://github.com/google/re2">
          <front>
            <title>RE2 is a fast, safe, thread-friendly alternative to backtracking regular expression engines like those used in PCRE, Perl, and Python. It is a C++ library.</title>
            <author>
              <organization/>
            </author>
            <date>n.d.</date>
          </front>
        </reference>
        <reference anchor="PCRE2" target="http://pcre.org/current/doc/html/">
          <front>
            <title>Perl-compatible Regular Expressions (revised API: PCRE2)</title>
            <author>
              <organization/>
            </author>
            <date>n.d.</date>
          </front>
        </reference>
        <reference anchor="ECMA-262" target="https://www.ecma-international.org/wp-content/uploads/ECMA-262.pdf">
          <front>
            <title>ECMAScript 2020 Language Specification</title>
            <author>
              <organization>Ecma International</organization>
            </author>
            <date year="2020" month="June"/>
          </front>
          <seriesInfo name="ECMA" value="Standard ECMA-262, 11th Edition"/>
        </reference>
        <reference anchor="RFC7493" target="https://www.rfc-editor.org/info/rfc7493">
          <front>
            <title>The I-JSON Message Format</title>
            <author fullname="T. Bray" initials="T." role="editor" surname="Bray">
              <organization/>
            </author>
            <date month="March" year="2015"/>
            <abstract>
              <t>I-JSON (short for "Internet JSON") is a restricted profile of JSON designed to maximize interoperability and increase confidence that software can process it successfully with predictable results.</t>
            </abstract>
          </front>
          <seriesInfo name="RFC" value="7493"/>
          <seriesInfo name="DOI" value="10.17487/RFC7493"/>
        </reference>
      </references>
    </references>
    <section anchor="rfcs">
      <name>Regexps and Similar Constructs in Recent Published RFCs</name>
      <t>This appendix contains a number of regular expressions that have been
extracted from some recently published RFCs based on some ad-hoc matching.
Multi-line constructions were not reassembled; for a regexp that continues
over several lines, only its first line is shown.
With the exception of some (often surprisingly dubious) usage of multi-character
escapes, all regular expressions validate against the ABNF in <xref target="iregexp-abnf"/>.</t>
      <figure anchor="iregexp-examples">
        <name>Example regular expressions extracted from RFCs</name>
        <artwork><![CDATA[
rfc6021.txt  459 (([0-1](\.[1-3]?[0-9]))|(2\.(0|([1-9]\d*))))
rfc6021.txt  513 \d*(\.\d*){1,127}
rfc6021.txt  529 \d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}(\.\d+)?
rfc6021.txt  631 ([0-9a-fA-F]{2}(:[0-9a-fA-F]{2})*)?
rfc6021.txt  647 [0-9a-fA-F]{2}(:[0-9a-fA-F]{2}){5}
rfc6021.txt  933 ((:|[0-9a-fA-F]{0,4}):)([0-9a-fA-F]{0,4}:){0,5}
rfc6021.txt  938 (([^:]+:){6}(([^:]+:[^:]+)|(.*\..*)))|
rfc6021.txt 1026 ((:|[0-9a-fA-F]{0,4}):)([0-9a-fA-F]{0,4}:){0,5}
rfc6021.txt 1031 (([^:]+:){6}(([^:]+:[^:]+)|(.*\..*)))|
rfc6020.txt 6647 [0-9a-fA-F]*
rfc6095.txt 2544 \S(.*\S)?
rfc6110.txt 1583 [aeiouy]*
rfc6110.txt 3222 [A-Z][a-z]*
rfc6536.txt 1583 \*
rfc6536.txt 1632 [^\*].*
rfc6643.txt  524 \p{IsBasicLatin}{0,255}
rfc6728.txt 3480 \S+
rfc6728.txt 3500 \S(.*\S)?
rfc6991.txt  477 (([0-1](\.[1-3]?[0-9]))|(2\.(0|([1-9]\d*))))
rfc6991.txt  525 \d*(\.\d*){1,127}
rfc6991.txt  541 [a-zA-Z_][a-zA-Z0-9\-_.]*
rfc6991.txt  542 .|..|[^xX].*|.[^mM].*|..[^lL].*
rfc6991.txt  571 \d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}(\.\d+)?
rfc6991.txt  665 ([0-9a-fA-F]{2}(:[0-9a-fA-F]{2})*)?
rfc6991.txt  693 [0-9a-fA-F]{2}(:[0-9a-fA-F]{2}){5}
rfc6991.txt  725 ([0-9a-fA-F]{2}(:[0-9a-fA-F]{2})*)?
rfc6991.txt  743 [0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-
rfc6991.txt 1041 ((:|[0-9a-fA-F]{0,4}):)([0-9a-fA-F]{0,4}:){0,5}
rfc6991.txt 1046 (([^:]+:){6}(([^:]+:[^:]+)|(.*\..*)))|
rfc6991.txt 1099 [0-9\.]*
rfc6991.txt 1109 [0-9a-fA-F:\.]*
rfc6991.txt 1164 ((:|[0-9a-fA-F]{0,4}):)([0-9a-fA-F]{0,4}:){0,5}
rfc6991.txt 1169 (([^:]+:){6}(([^:]+:[^:]+)|(.*\..*)))|
rfc7407.txt  933 ([0-9a-fA-F]){2}(:([0-9a-fA-F]){2}){0,254}
rfc7407.txt 1494 ([0-9a-fA-F]){2}(:([0-9a-fA-F]){2}){4,31}
rfc7758.txt  703 \d{2}:\d{2}:\d{2}(\.\d+)?
rfc7758.txt 1358 \d{2}:\d{2}:\d{2}(\.\d+)?
rfc7895.txt  349 \d{4}-\d{2}-\d{2}
rfc7950.txt 8323 [0-9a-fA-F]*
rfc7950.txt 8355 [a-zA-Z_][a-zA-Z0-9\-_.]*
rfc7950.txt 8356 [xX][mM][lL].*
rfc8040.txt 4713 \d{4}-\d{2}-\d{2}
rfc8049.txt 6704 [A-Z]{2}
rfc8194.txt  629 \*
rfc8194.txt  637 [0-9]{8}\.[0-9]{6}
rfc8194.txt  905 Z|[\+\-]\d{2}:\d{2}
rfc8194.txt  963 (2((2[4-9])|(3[0-9]))\.).*
rfc8194.txt  974 (([fF]{2}[0-9a-fA-F]{2}):).*
rfc8299.txt 7986 [A-Z]{2}
rfc8341.txt 1878 \*
rfc8341.txt 1927 [^\*].*
rfc8407.txt 1723 [0-9\.]*
rfc8407.txt 1749 [a-zA-Z_][a-zA-Z0-9\-_.]*
rfc8407.txt 1750 .|..|[^xX].*|.[^mM].*|..[^lL].*
rfc8525.txt  550 \d{4}-\d{2}-\d{2}
rfc8776.txt  838 /?([a-zA-Z0-9\-_.]+)(/[a-zA-Z0-9\-_.]+)*
rfc8776.txt  874 ([a-zA-Z0-9\-_.]+:)*
rfc8819.txt  311 [\S ]+
rfc8944.txt  596 [0-9a-fA-F]{2}(:[0-9a-fA-F]{2}){7}
]]></artwork>
      </figure>
      <t>The multi-character escapes (MCEs) used in these examples, and the character
classes built around them, can be replaced by the substitutes shown in <xref target="tbl-sub"/>.</t>
      <table anchor="tbl-sub">
        <name>Substitutes for multi-character escapes in examples</name>
        <thead>
          <tr>
            <th align="left">MCE/class</th>
            <th align="left">Substitute class</th>
          </tr>
        </thead>
        <tbody>
          <tr>
            <td align="left">
              <tt>\S</tt></td>
            <td align="left">
              <tt>[^ \t\n\r]</tt></td>
          </tr>
          <tr>
            <td align="left">
              <tt>[\S ]</tt></td>
            <td align="left">
              <tt>[^\t\n\r]</tt></td>
          </tr>
          <tr>
            <td align="left">
              <tt>\d</tt></td>
            <td align="left">
              <tt>[0-9]</tt></td>
          </tr>
        </tbody>
      </table>
      <t>Note that the semantics of <tt>\d</tt> in XSD regular expressions are those of
<tt>\p{Nd}</tt>; this includes all Unicode characters that are decimal digits
in various writing systems, which certainly is not what is meant
in the RFCs listed.</t>
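      <t>The substitutions in <xref target="tbl-sub"/> can be applied mechanically, as in the following Python sketch. The names <tt>SUBSTITUTES</tt> and <tt>substitute_mces</tt> are invented for this illustration. The sketch only covers the forms that occur in the examples; an MCE inside some other character class, such as <tt>[\d.]</tt>, would need a different rewrite. The final asserts use Python's <tt>re</tt> module to illustrate the note on <tt>\d</tt>: as in XSD, Python's <tt>\d</tt> (for string patterns) matches any Unicode decimal digit, whereas <tt>[0-9]</tt> matches only the ASCII digits.</t>
      <sourcecode type="python"><![CDATA[
import re

# Substitutes from the table, longest first: "[\S ]" must be rewritten
# as a whole before "\S" is considered on its own.
SUBSTITUTES = {
    r"[\S ]": r"[^\t\n\r]",
    r"\S": r"[^ \t\n\r]",
    r"\d": r"[0-9]",
}


def substitute_mces(pattern: str) -> str:
    """Rewrite the MCE forms used in the examples above (sketch)."""
    out, i = [], 0
    while i < len(pattern):
        for mce, sub in SUBSTITUTES.items():  # dicts keep insertion order
            if pattern.startswith(mce, i):
                out.append(sub)
                i += len(mce)
                break
        else:
            # Copy any other escape as a unit, so an escaped backslash
            # followed by "d" is not mistaken for the MCE "\d".
            step = 2 if pattern[i] == "\\" else 1
            out.append(pattern[i:i + step])
            i += step
    return "".join(out)


# Python's "\d" on str patterns means \p{Nd}, as in XSD; "[0-9]" is ASCII only.
assert re.fullmatch(r"\d", "\u0663")  # ARABIC-INDIC DIGIT THREE
assert re.fullmatch(r"[0-9]", "\u0663") is None
]]></sourcecode>
      <t>For example, <tt>substitute_mces(r"\S(.*\S)?")</tt> should yield <tt>[^ \t\n\r](.*[^ \t\n\r])?</tt>, which conforms to the ABNF in <xref target="iregexp-abnf"/>.</t>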
    </section>
    <section numbered="false" anchor="acknowledgements">
      <name>Acknowledgements</name>
      <t>This draft has been motivated by the discussion in the IETF JSONPATH
WG about whether to include a regexp mechanism in the JSONPath query
expression specification, as well as by previous discussions about the
YANG <tt>pattern</tt> and CDDL <tt>.regexp</tt> features.</t>
      <t>The basic approach for this draft was inspired by <xref target="RFC7493">The
I-JSON Message Format</xref>.</t>
    </section>
  </back>

</rfc>
