<?xml version='1.0' encoding='utf-8'?>
<!DOCTYPE rfc [
  <!ENTITY nbsp    "&#160;">
  <!ENTITY zwsp   "&#8203;">
  <!ENTITY nbhy   "&#8209;">
  <!ENTITY wj     "&#8288;">
]>
<?xml-stylesheet type="text/xsl" href="rfc2629.xslt" ?>
<!-- generated by https://github.com/cabo/kramdown-rfc version 1.6.7 (Ruby 3.1.2) -->
<rfc xmlns:xi="http://www.w3.org/2001/XInclude" ipr="trust200902" docName="draft-ietf-jsonpath-iregexp-00" category="std" consensus="true" submissionType="IETF" tocInclude="true" sortRefs="true" symRefs="true" version="3">
  <!-- xml2rfc v2v3 conversion 3.12.3 -->
  <front>
    <title abbrev="I-Regexp">I-Regexp: An Interoperable Regexp Format</title>
    <seriesInfo name="Internet-Draft" value="draft-ietf-jsonpath-iregexp-00"/>
    <author initials="C." surname="Bormann" fullname="Carsten Bormann">
      <organization>Universität Bremen TZI</organization>
      <address>
        <postal>
          <street>Postfach 330440</street>
          <city>Bremen</city>
          <code>D-28359</code>
          <country>Germany</country>
        </postal>
        <phone>+49-421-218-63921</phone>
        <email>cabo@tzi.org</email>
      </address>
    </author>
    <author initials="T." surname="Bray" fullname="Tim Bray">
      <organization>Textuality</organization>
      <address>
        <email>tbray@textuality.com</email>
      </address>
    </author>
    <date year="2022" month="April" day="28"/>
    <keyword>Internet-Draft</keyword>
    <abstract>
      <t>This document specifies I-Regexp, a flavor of regular expressions that is
limited in scope with the goal of interoperation across many different
regular-expression libraries.</t>
    </abstract>
    <note removeInRFC="true">
      <name>About This Document</name>
      <t>
        Status information for this document may be found at <eref target="https://datatracker.ietf.org/doc/draft-ietf-jsonpath-iregexp/"/>.
      </t>
      <t>
        Discussion of this document takes place on the
        JSONPath Working Group mailing list (<eref target="mailto:JSONPath@ietf.org"/>),
        which is archived at <eref target="https://mailarchive.ietf.org/arch/browse/JSONPath/"/>.
      </t>
      <t>Source for this draft and an issue tracker can be found at
        <eref target="https://github.com/cabo/iregexp"/>.</t>
    </note>
  </front>
  <middle>
    <section anchor="intro">
      <name>Introduction</name>
      <t>This specification describes an interoperable regular expression flavor, I-Regexp.</t>
      <t>This document uses the abbreviation "regexp" for what are usually
called regular expressions in programming.
"I-Regexp" is used as a noun meaning a character string which conforms to the requirements
in this specification; the plural is "I-Regexps".</t>
      <t>I-Regexp does not provide advanced regexp features such as capture groups, lookahead, or backreferences.
It supports only a Boolean matching capability, i.e., testing whether a given regexp matches a given piece of text.</t>
      <t>I-Regexp supports the entire repertoire of Unicode characters.</t>
      <t>I-Regexp is a subset of XSD regexps <xref target="XSD-2"/>.</t>
      <t>This document includes rules for converting I-Regexps for use with several well-known regexp libraries.</t>
      <section anchor="terminology">
        <name>Terminology</name>
        <t>The key words "<bcp14>MUST</bcp14>", "<bcp14>MUST NOT</bcp14>", "<bcp14>REQUIRED</bcp14>", "<bcp14>SHALL</bcp14>", "<bcp14>SHALL
NOT</bcp14>", "<bcp14>SHOULD</bcp14>", "<bcp14>SHOULD NOT</bcp14>", "<bcp14>RECOMMENDED</bcp14>", "<bcp14>NOT RECOMMENDED</bcp14>",
"<bcp14>MAY</bcp14>", and "<bcp14>OPTIONAL</bcp14>" in this document are to be interpreted as
described in BCP 14 <xref target="RFC2119"/> <xref target="RFC8174"/> when, and only when, they
appear in all capitals, as shown here.</t>
        <t>The grammatical rules in this document are to be interpreted as ABNF,
as described in <xref target="RFC5234"/> and <xref target="RFC7405"/>.</t>
      </section>
    </section>
    <section anchor="requirements">
      <name>Requirements</name>
      <t>I-Regexps should handle the vast majority of practical cases where a
matching regexp is needed in a data model specification or a query
language expression.</t>
      <t>A brief survey of published RFCs yielded the regexp patterns in
<xref target="rfcs"/> (with no attempt at completeness).
With certain exceptions as discussed there,
these should be covered by I-Regexps, both syntactically and with
their intended semantics.</t>
    </section>
    <section anchor="defn">
      <name>I-Regexp Syntax</name>
      <t>An I-Regexp <bcp14>MUST</bcp14> conform to the ABNF specification in
<xref target="iregexp-abnf"/>.</t>
      <figure anchor="iregexp-abnf">
        <name>I-Regexp Syntax in ABNF</name>
        <sourcecode type="abnf"><![CDATA[
i-regexp = branch *( "|" branch )
branch = *piece
piece = atom [ quantifier ]
quantifier = ( %x2A-2B ; '*'-'+'
 / "?" ) / ( "{" quantity "}" )
quantity = QuantExact [ "," [ QuantExact ] ]
QuantExact = 1*%x30-39 ; '0'-'9'

atom = NormalChar / charClass / ( "(" i-regexp ")" )
NormalChar = ( %x00-27 / %x2C-2D ; ','-'-'
 / %x2F-3E ; '/'-'>'
 / %x40-5A ; '@'-'Z'
 / %x5E-7A ; '^'-'z'
 / %x7E-10FFFF )
charClass = "." / SingleCharEsc / charClassEsc / charClassExpr
SingleCharEsc = "\" ( %x28-2B ; '('-'+'
 / %x2D-2E ; '-'-'.'
 / "?" / %x5B-5E ; '['-'^'
 / %s"n" / %s"r" / %s"t" / %x7B-7D ; '{'-'}'
 )
charClassEsc = catEsc / complEsc
charClassExpr = "[" [ "^" ] ( "-" / CCE1 ) *CCE1 [ "-" ] "]"
CCE1 = ( CCchar [ "-" CCchar ] ) / charClassEsc
CCchar = ( %x00-2C / %x2E-5A ; '.'-'Z'
 / %x5E-10FFFF ) / SingleCharEsc
catEsc = %s"\p{" charProp "}"
complEsc = %s"\P{" charProp "}"
charProp = IsCategory / IsBlock
IsCategory = Letters / Marks / Numbers / Punctuation / Separators /
    Symbols / Others
Letters = %s"L" [ ( %x6C-6D ; 'l'-'m'
 / %s"o" / %x74-75 ; 't'-'u'
 ) ]
Marks = %s"M" [ ( %s"c" / %s"e" / %s"n" ) ]
Numbers = %s"N" [ ( %s"d" / %s"l" / %s"o" ) ]
Punctuation = %s"P" [ ( %x63-66 ; 'c'-'f'
 / %s"i" / %s"o" / %s"s" ) ]
Separators = %s"Z" [ ( %s"l" / %s"p" / %s"s" ) ]
Symbols = %s"S" [ ( %s"c" / %s"k" / %s"m" / %s"o" ) ]
Others = %s"C" [ ( %s"c" / %s"f" / %x6E-6F ; 'n'-'o'
 ) ]
IsBlock = %s"Is" 1*( "-" / %x30-39 ; '0'-'9'
 / %x41-5A ; 'A'-'Z'
 / %x61-7A ; 'a'-'z'
 )
]]></sourcecode>
      </figure>
      <t>As an additional restriction, <tt>charClassExpr</tt> is not allowed to
match <tt>[^]</tt>, which according to this grammar would parse as a
positive character class containing the single character <tt>^</tt>.</t>
      <t>The grammar above is essentially XSD regexp without character class
subtraction and multi-character escapes such as <tt>\s</tt>,
<tt>\S</tt>, and <tt>\w</tt>.</t>
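      <t>As a non-normative illustration, the grammar in <xref target="iregexp-abnf"/>,
together with the additional restriction on <tt>[^]</tt>, can be transcribed
into a small syntax checker. The following Python sketch is only one possible
reading of the ABNF: the function name <tt>is_iregexp</tt> and all helper
constants are hypothetical names introduced for this example, and the sketch
checks syntax only (it does not perform matching).</t>
      <sourcecode type="python"><![CDATA[
```python
import re

# Building blocks transcribed from the ABNF; all names are illustrative.
SINGLE_ESC = r"\\[()*+\-.?\[\\\]^nrt{|}]"
CHAR_PROP = (r"(?:L[lmotu]?|M[cen]?|N[dlo]?|P[c-fios]?|Z[lps]?"
             r"|S[ckmo]?|C[cfno]?|Is[-0-9A-Za-z]+)")
CLASS_ESC = r"\\[pP]\{" + CHAR_PROP + r"\}"
CC_CHAR = r"(?:[^\-\[\\\]]|" + SINGLE_ESC + r")"
CCE1 = (r"(?:" + CC_CHAR + r"(?:-" + CC_CHAR + r")?|"
        + CLASS_ESC + r")")
# The lookahead enforces the additional restriction that rejects "[^]".
CLASS_EXPR = r"\[(?!\^\])\^?(?:-|" + CCE1 + r")" + CCE1 + r"*-?\]"
NORMAL_CHAR = r"[^()*+.?\[\\\]{|}]"
SIMPLE_ATOM = "(?:" + "|".join(
    [CLASS_ESC, SINGLE_ESC, CLASS_EXPR, r"\.", NORMAL_CHAR]) + ")"
QUANTIFIER = r"(?:[*+?]|\{[0-9]+(?:,[0-9]*)?\})"
# A token is "(", "|", or an atom (or ")") with an optional quantifier.
TOKEN = re.compile(
    r"(?P<open>\()|(?P<bar>\|)|(?:(?P<close>\))|"
    + SIMPLE_ATOM + ")" + QUANTIFIER + "?")


def is_iregexp(text: str) -> bool:
    """Return True if text is syntactically an I-Regexp."""
    pos, depth = 0, 0
    while pos < len(text):
        m = TOKEN.match(text, pos)
        if m is None:
            return False          # no production matches here
        if m.group("open"):
            depth += 1
        elif m.group("close"):
            depth -= 1
            if depth < 0:
                return False      # ")" without a matching "("
        pos = m.end()
    return depth == 0             # every "(" must be closed
```
]]></sourcecode>
      <t>Under these assumptions, <tt>is_iregexp</tt> accepts
<tt>[0-9a-fA-F]{2}(:[0-9a-fA-F]{2})*</tt> and <tt>\p{IsBasicLatin}{0,255}</tt>,
and rejects <tt>\d+</tt> (a multi-character escape), <tt>[^]</tt>, and
<tt>a{,3}</tt> (a quantity without a lower bound).</t>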
      <t>An I-Regexp implementation <bcp14>MUST</bcp14> be a complete implementation of this
limited subset.
In particular, full Unicode support is <bcp14>REQUIRED</bcp14>; the implementation
<bcp14>MUST NOT</bcp14> limit itself to 7- or 8-bit character sets such as ASCII and
<bcp14>MUST</bcp14> support the Unicode character property set in character classes.</t>
    </section>
    <section anchor="i-regexp-semantics">
      <name>I-Regexp Semantics</name>
      <t>This syntax is a subset of that of <xref target="XSD-2"/>.
Implementations which interpret I-Regexps <bcp14>MUST</bcp14>
yield Boolean results as specified in <xref target="XSD-2"/>.
(See also <xref target="xsd-regexps"/>.)</t>
    </section>
    <section anchor="mapping-i-regexp-to-regexp-dialects">
      <name>Mapping I-Regexp to Regexp Dialects</name>
      <t>(TBD; these mappings need to be further verified in implementation work.)</t>
      <section anchor="xsd-regexps">
        <name>XSD Regexps</name>
        <t>Any I-Regexp also is an XSD Regexp <xref target="XSD-2"/>, so the mapping is an identity
function.</t>
        <t>Note that a few errata for <xref target="XSD-2"/> have been fixed in <xref target="XSD11-2"/>, which
is therefore also included as a normative reference.
XSD 1.1 is less widely implemented than XSD 1.0, and implementations
of XSD 1.0 are likely to include these bug fixes; for the purposes of
this specification, an implementation of XSD 1.0 regexps is therefore
equivalent to an implementation of XSD 1.1 regexps.</t>
      </section>
      <section anchor="toESreg">
        <name>ECMAScript Regexps</name>
        <t>Perform the following steps on an I-Regexp to obtain an ECMAScript
regexp <xref target="ECMA-262"/>:</t>
        <ul spacing="normal">
          <li>For any dots (<tt>.</tt>) outside character classes (first alternative
of <tt>charClass</tt> production): replace dot by <tt>[^\n\r]</tt>.</li>
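          <li>Enclose the result in <tt>^(?:</tt> and <tt>)$</tt>, so that the anchors apply to the whole expression rather than only to the first and last branch of a top-level alternation.</li>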
        </ul>
        <t>Note that where a regexp literal is required,
the actual regexp needs to be enclosed in <tt>/</tt>.</t>
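        <t>The steps above can be sketched as a string transformation. The
following Python fragment is a minimal illustration, not a verified mapping:
the function name <tt>to_ecmascript</tt> is hypothetical, and the input is
assumed to be a syntactically valid I-Regexp already. The sketch anchors the
result inside a non-capturing group so that a top-level alternation such as
<tt>a|b</tt> is anchored at both ends; with a bare <tt>^</tt> and <tt>$</tt>
each anchor would bind to a single branch only.</t>
        <sourcecode type="python"><![CDATA[
```python
def to_ecmascript(iregexp: str) -> str:
    """Translate a valid I-Regexp into an ECMAScript pattern (sketch)."""
    out = []
    in_class = False  # True while inside a [...] class expression
    i = 0
    while i < len(iregexp):
        ch = iregexp[i]
        if ch == "\\" and i + 1 < len(iregexp):
            # Copy an escape pair verbatim so that "\." and "\]"
            # are never mistaken for a dot or a closing bracket.
            out.append(iregexp[i:i + 2])
            i += 2
            continue
        if ch == "[":
            in_class = True
        elif ch == "]":
            in_class = False
        if ch == "." and not in_class:
            out.append(r"[^\n\r]")  # dot outside a character class
        else:
            out.append(ch)
        i += 1
    # Anchor the whole expression, not just its first and last branch.
    return "^(?:" + "".join(out) + ")$"
```
]]></sourcecode>
        <t>For example, <tt>to_ecmascript("a.b")</tt> yields
<tt>^(?:a[^\n\r]b)$</tt>, while the dots in <tt>[.]\.</tt> are left
untouched because one is inside a character class and the other is escaped.</t>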
      </section>
      <section anchor="pcre-re2-ruby-regexps">
        <name>PCRE, RE2, Ruby Regexps</name>
        <t>Perform the same steps as in <xref target="toESreg"/> to obtain a valid regexp in
PCRE <xref target="PCRE2"/>, the Go programming language <xref target="RE2"/>, and the Ruby
programming language, except that the last step is:</t>
        <ul spacing="normal">
          <li>Enclose the regexp in <tt>\A(?:</tt> and <tt>)\z</tt>.</li>
        </ul>
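        <t>A compact way to sketch this variant is to tokenize the I-Regexp with
a helper pattern and rewrite only the bare dots. In the Python fragment below,
the names <tt>to_pcre_like</tt> and <tt>_PIECE</tt> are hypothetical, the input
is assumed to be a valid I-Regexp, and the non-capturing group is used so that
the anchors cover every branch of a top-level alternation. The fragment only
builds the pattern string; it does not run it against any of the libraries
named above.</t>
        <sourcecode type="python"><![CDATA[
```python
import re

# One alternative per construct that matters here:
# an escape pair, a bracketed character class, or a bare dot.
_PIECE = re.compile(r"\\.|\[(?:\\.|[^\]\\])*\]|\.", re.DOTALL)


def to_pcre_like(iregexp: str) -> str:
    """Translate a valid I-Regexp for PCRE, RE2, or Ruby (sketch)."""
    def swap(match):
        piece = match.group(0)
        # Only a bare dot is exactly "."; escapes and classes are longer.
        return r"[^\n\r]" if piece == "." else piece

    return r"\A(?:" + _PIECE.sub(swap, iregexp) + r")\z"
```
]]></sourcecode>
        <t>With these assumptions, <tt>to_pcre_like("a.b")</tt> yields
<tt>\A(?:a[^\n\r]b)\z</tt>.</t>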
      </section>
    </section>
    <section anchor="background">
      <name>Motivation and Background</name>
      <t>While regular expressions originally were intended to describe a
formal language to support a Boolean matching function, they
have been enhanced with parsing functions that support the extraction
and replacement of arbitrary portions of the matched text. With this
accretion of features, parsing regexp libraries have become
more susceptible to bugs and surprising performance degradations that
can be exploited in denial-of-service attacks by
an attacker who controls the regexp submitted for
processing. I-Regexp is designed to offer interoperability and to be
less vulnerable to such attacks, with the trade-off that its only
function is to offer a Boolean response as to whether a character
sequence is matched by a regexp.</t>
      <section anchor="subsetting">
        <name>Implementing I-Regexp</name>
        <t>XSD regexps are relatively easy to implement or map to widely
implemented parsing regexp dialects, with these notable
exceptions:</t>
        <ul spacing="normal">
          <li>Character class subtraction.  This is a very useful feature in many
specifications, but it is unfortunately mostly absent from parsing
regexp dialects. Thus, it is omitted from I-Regexp.</li>
          <li>Multi-character escapes.  <tt>\d</tt>, <tt>\w</tt>, <tt>\s</tt> and their uppercase
complement classes exhibit a
large amount of variation between regexp flavors.  Thus, they are
omitted from I-Regexp.</li>
          <li>Not all regexp implementations
support access to the Unicode tables needed to execute
constructs such as <tt>\p{IsCoptic}</tt>,
although the <tt>\p</tt>/<tt>\P</tt> feature in general is now quite
widely available. While in principle it's possible to
translate these into codepoint-range matches, this also requires
access to those tables. Thus, regexp libraries in severely
constrained environments may not be able to support I-Regexp
conformance.</li>
        </ul>
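        <t>The dependency on Unicode tables mentioned in the last item can be
made concrete with a small sketch. The Python fragment below derives
codepoint ranges for a general category from the tables shipped with Python's
<tt>unicodedata</tt> module; the function name <tt>category_ranges</tt> is
hypothetical, and only general categories such as <tt>Nd</tt> are handled,
not block names such as <tt>IsCoptic</tt>. It is meant to show that the
translation needs the tables, not to recommend a particular implementation.</t>
        <sourcecode type="python"><![CDATA[
```python
import sys
import unicodedata


def category_ranges(category: str) -> list[tuple[int, int]]:
    """List inclusive code point ranges whose general category
    starts with the given prefix, e.g. "N" or "Nd"."""
    ranges = []
    start = None
    # One step past the last code point closes a trailing range.
    for cp in range(sys.maxunicode + 2):
        inside = (cp <= sys.maxunicode and
                  unicodedata.category(chr(cp)).startswith(category))
        if inside and start is None:
            start = cp                      # a new range begins
        elif not inside and start is not None:
            ranges.append((start, cp - 1))  # the current range ends
            start = None
    return ranges
```
]]></sourcecode>
        <t>The first range returned by <tt>category_ranges("Nd")</tt> is
U+0030 to U+0039; the remaining ranges depend on the Unicode version of the
tables in use.</t>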
      </section>
    </section>
    <section anchor="iana-considerations">
      <name>IANA Considerations</name>
      <t>This document makes no requests of IANA.</t>
    </section>
    <section anchor="security-considerations">
      <name>Security Considerations</name>
      <t>As discussed in <xref target="background"/>, more complex regexp libraries may
contain exploitable bugs leading to crashes and remote code
execution.  There is also the problem that such libraries often have
hard-to-predict performance characteristics, leading to attacks
that overload an implementation by matching against an expensive
attacker-controlled regexp.</t>
      <t>I-Regexps have been designed to allow implementation in a way that is
resilient to both threats; this objective needs to be addressed
throughout the implementation effort.</t>
    </section>
  </middle>
  <back>
    <references>
      <name>References</name>
      <references>
        <name>Normative References</name>
        <reference anchor="XSD-2" target="https://www.w3.org/TR/2004/REC-xmlschema-2-20041028">
          <front>
            <title>XML Schema Part 2: Datatypes Second Edition</title>
            <author fullname="Paul V. Biron" initials="P." surname="Biron">
              <organization/>
            </author>
            <author fullname="Ashok Malhotra" initials="A." surname="Malhotra">
              <organization/>
            </author>
            <date day="28" month="October" year="2004"/>
          </front>
          <seriesInfo name="World Wide Web Consortium Recommendation" value="REC-xmlschema-2-20041028"/>
        </reference>
        <reference anchor="XSD11-2" target="https://www.w3.org/TR/2012/REC-xmlschema11-2-20120405">
          <front>
            <title>W3C XML Schema Definition Language (XSD) 1.1 Part 2: Datatypes</title>
            <author fullname="David Peterson" initials="D." surname="Peterson">
              <organization/>
            </author>
            <author fullname="Sandy Gao" initials="S." surname="Gao">
              <organization/>
            </author>
            <author fullname="Ashok Malhotra" initials="A." surname="Malhotra">
              <organization/>
            </author>
            <author fullname="Michael Sperberg-McQueen" initials="M." surname="Sperberg-McQueen">
              <organization/>
            </author>
            <author fullname="Henry Thompson" initials="H." surname="Thompson">
              <organization/>
            </author>
            <author fullname="Paul V. Biron" initials="P." surname="Biron">
              <organization/>
            </author>
            <date day="5" month="April" year="2012"/>
          </front>
          <seriesInfo name="World Wide Web Consortium Recommendation" value="REC-xmlschema11-2-20120405"/>
        </reference>
        <reference anchor="RFC5234" target="https://www.rfc-editor.org/info/rfc5234">
          <front>
            <title>Augmented BNF for Syntax Specifications: ABNF</title>
            <author fullname="D. Crocker" initials="D." role="editor" surname="Crocker">
              <organization/>
            </author>
            <author fullname="P. Overell" initials="P." surname="Overell">
              <organization/>
            </author>
            <date month="January" year="2008"/>
            <abstract>
              <t>Internet technical specifications often need to define a formal syntax.  Over the years, a modified version of Backus-Naur Form (BNF), called Augmented BNF (ABNF), has been popular among many Internet specifications.  The current specification documents ABNF. It balances compactness and simplicity with reasonable representational power.  The differences between standard BNF and ABNF involve naming rules, repetition, alternatives, order-independence, and value ranges.  This specification also supplies additional rule definitions and encoding for a core lexical analyzer of the type common to several Internet specifications.  [STANDARDS-TRACK]</t>
            </abstract>
          </front>
          <seriesInfo name="STD" value="68"/>
          <seriesInfo name="RFC" value="5234"/>
          <seriesInfo name="DOI" value="10.17487/RFC5234"/>
        </reference>
        <reference anchor="RFC7405" target="https://www.rfc-editor.org/info/rfc7405">
          <front>
            <title>Case-Sensitive String Support in ABNF</title>
            <author fullname="P. Kyzivat" initials="P." surname="Kyzivat">
              <organization/>
            </author>
            <date month="December" year="2014"/>
            <abstract>
              <t>This document extends the base definition of ABNF (Augmented Backus-Naur Form) to include a way to specify US-ASCII string literals that are matched in a case-sensitive manner.</t>
            </abstract>
          </front>
          <seriesInfo name="RFC" value="7405"/>
          <seriesInfo name="DOI" value="10.17487/RFC7405"/>
        </reference>
        <reference anchor="RFC2119" target="https://www.rfc-editor.org/info/rfc2119">
          <front>
            <title>Key words for use in RFCs to Indicate Requirement Levels</title>
            <author fullname="S. Bradner" initials="S." surname="Bradner">
              <organization/>
            </author>
            <date month="March" year="1997"/>
            <abstract>
              <t>In many standards track documents several words are used to signify the requirements in the specification.  These words are often capitalized. This document defines these words as they should be interpreted in IETF documents.  This document specifies an Internet Best Current Practices for the Internet Community, and requests discussion and suggestions for improvements.</t>
            </abstract>
          </front>
          <seriesInfo name="BCP" value="14"/>
          <seriesInfo name="RFC" value="2119"/>
          <seriesInfo name="DOI" value="10.17487/RFC2119"/>
        </reference>
        <reference anchor="RFC8174" target="https://www.rfc-editor.org/info/rfc8174">
          <front>
            <title>Ambiguity of Uppercase vs Lowercase in RFC 2119 Key Words</title>
            <author fullname="B. Leiba" initials="B." surname="Leiba">
              <organization/>
            </author>
            <date month="May" year="2017"/>
            <abstract>
              <t>RFC 2119 specifies common key words that may be used in protocol  specifications.  This document aims to reduce the ambiguity by clarifying that only UPPERCASE usage of the key words have the  defined special meanings.</t>
            </abstract>
          </front>
          <seriesInfo name="BCP" value="14"/>
          <seriesInfo name="RFC" value="8174"/>
          <seriesInfo name="DOI" value="10.17487/RFC8174"/>
        </reference>
      </references>
      <references>
        <name>Informative References</name>
        <reference anchor="RE2" target="https://github.com/google/re2">
          <front>
            <title>RE2 is a fast, safe, thread-friendly alternative to backtracking regular expression engines like those used in PCRE, Perl, and Python. It is a C++ library.</title>
            <author>
              <organization/>
            </author>
            <date>n.d.</date>
          </front>
        </reference>
        <reference anchor="PCRE2" target="http://pcre.org/current/doc/html/">
          <front>
            <title>Perl-compatible Regular Expressions (revised API: PCRE2)</title>
            <author>
              <organization/>
            </author>
            <date>n.d.</date>
          </front>
        </reference>
        <reference anchor="ECMA-262" target="https://www.ecma-international.org/wp-content/uploads/ECMA-262.pdf">
          <front>
            <title>ECMAScript 2020 Language Specification</title>
            <author>
              <organization>Ecma International</organization>
            </author>
            <date year="2020" month="June"/>
          </front>
          <seriesInfo name="ECMA" value="Standard ECMA-262, 11th Edition"/>
        </reference>
        <reference anchor="RFC7493" target="https://www.rfc-editor.org/info/rfc7493">
          <front>
            <title>The I-JSON Message Format</title>
            <author fullname="T. Bray" initials="T." role="editor" surname="Bray">
              <organization/>
            </author>
            <date month="March" year="2015"/>
            <abstract>
              <t>I-JSON (short for "Internet JSON") is a restricted profile of JSON designed to maximize interoperability and increase confidence that software can process it successfully with predictable results.</t>
            </abstract>
          </front>
          <seriesInfo name="RFC" value="7493"/>
          <seriesInfo name="DOI" value="10.17487/RFC7493"/>
        </reference>
      </references>
    </references>
    <section anchor="rfcs">
      <name>Regexps and Similar Constructs in Recent Published RFCs</name>
      <t>This appendix contains a number of regular expressions that have been
extracted from recently published RFCs using ad hoc matching.
Multi-line constructions were not included.
With the exception of some (often surprisingly dubious) usage of multi-character
escapes, all regular expressions validate against the ABNF in <xref target="iregexp-abnf"/>.</t>
      <figure anchor="iregexp-examples">
        <name>Example regular expressions extracted from RFCs</name>
        <artwork><![CDATA[
rfc6021.txt  459 (([0-1](\.[1-3]?[0-9]))|(2\.(0|([1-9]\d*))))
rfc6021.txt  513 \d*(\.\d*){1,127}
rfc6021.txt  529 \d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}(\.\d+)?
rfc6021.txt  631 ([0-9a-fA-F]{2}(:[0-9a-fA-F]{2})*)?
rfc6021.txt  647 [0-9a-fA-F]{2}(:[0-9a-fA-F]{2}){5}
rfc6021.txt  933 ((:|[0-9a-fA-F]{0,4}):)([0-9a-fA-F]{0,4}:){0,5}
rfc6021.txt  938 (([^:]+:){6}(([^:]+:[^:]+)|(.*\..*)))|
rfc6021.txt 1026 ((:|[0-9a-fA-F]{0,4}):)([0-9a-fA-F]{0,4}:){0,5}
rfc6021.txt 1031 (([^:]+:){6}(([^:]+:[^:]+)|(.*\..*)))|
rfc6020.txt 6647 [0-9a-fA-F]*
rfc6095.txt 2544 \S(.*\S)?
rfc6110.txt 1583 [aeiouy]*
rfc6110.txt 3222 [A-Z][a-z]*
rfc6536.txt 1583 \*
rfc6536.txt 1632 [^\*].*
rfc6643.txt  524 \p{IsBasicLatin}{0,255}
rfc6728.txt 3480 \S+
rfc6728.txt 3500 \S(.*\S)?
rfc6991.txt  477 (([0-1](\.[1-3]?[0-9]))|(2\.(0|([1-9]\d*))))
rfc6991.txt  525 \d*(\.\d*){1,127}
rfc6991.txt  541 [a-zA-Z_][a-zA-Z0-9\-_.]*
rfc6991.txt  542 .|..|[^xX].*|.[^mM].*|..[^lL].*
rfc6991.txt  571 \d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}(\.\d+)?
rfc6991.txt  665 ([0-9a-fA-F]{2}(:[0-9a-fA-F]{2})*)?
rfc6991.txt  693 [0-9a-fA-F]{2}(:[0-9a-fA-F]{2}){5}
rfc6991.txt  725 ([0-9a-fA-F]{2}(:[0-9a-fA-F]{2})*)?
rfc6991.txt  743 [0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-
rfc6991.txt 1041 ((:|[0-9a-fA-F]{0,4}):)([0-9a-fA-F]{0,4}:){0,5}
rfc6991.txt 1046 (([^:]+:){6}(([^:]+:[^:]+)|(.*\..*)))|
rfc6991.txt 1099 [0-9\.]*
rfc6991.txt 1109 [0-9a-fA-F:\.]*
rfc6991.txt 1164 ((:|[0-9a-fA-F]{0,4}):)([0-9a-fA-F]{0,4}:){0,5}
rfc6991.txt 1169 (([^:]+:){6}(([^:]+:[^:]+)|(.*\..*)))|
rfc7407.txt  933 ([0-9a-fA-F]){2}(:([0-9a-fA-F]){2}){0,254}
rfc7407.txt 1494 ([0-9a-fA-F]){2}(:([0-9a-fA-F]){2}){4,31}
rfc7758.txt  703 \d{2}:\d{2}:\d{2}(\.\d+)?
rfc7758.txt 1358 \d{2}:\d{2}:\d{2}(\.\d+)?
rfc7895.txt  349 \d{4}-\d{2}-\d{2}
rfc7950.txt 8323 [0-9a-fA-F]*
rfc7950.txt 8355 [a-zA-Z_][a-zA-Z0-9\-_.]*
rfc7950.txt 8356 [xX][mM][lL].*
rfc8040.txt 4713 \d{4}-\d{2}-\d{2}
rfc8049.txt 6704 [A-Z]{2}
rfc8194.txt  629 \*
rfc8194.txt  637 [0-9]{8}\.[0-9]{6}
rfc8194.txt  905 Z|[\+\-]\d{2}:\d{2}
rfc8194.txt  963 (2((2[4-9])|(3[0-9]))\.).*
rfc8194.txt  974 (([fF]{2}[0-9a-fA-F]{2}):).*
rfc8299.txt 7986 [A-Z]{2}
rfc8341.txt 1878 \*
rfc8341.txt 1927 [^\*].*
rfc8407.txt 1723 [0-9\.]*
rfc8407.txt 1749 [a-zA-Z_][a-zA-Z0-9\-_.]*
rfc8407.txt 1750 .|..|[^xX].*|.[^mM].*|..[^lL].*
rfc8525.txt  550 \d{4}-\d{2}-\d{2}
rfc8776.txt  838 /?([a-zA-Z0-9\-_.]+)(/[a-zA-Z0-9\-_.]+)*
rfc8776.txt  874 ([a-zA-Z0-9\-_.]+:)*
rfc8819.txt  311 [\S ]+
rfc8944.txt  596 [0-9a-fA-F]{2}(:[0-9a-fA-F]{2}){7}
]]></artwork>
      </figure>
      <t>The multi-character escapes (MCE) or the character classes built
around them used here can be substituted as shown in <xref target="tbl-sub"/>.</t>
      <table anchor="tbl-sub">
        <name>Substitutes for multi-character escapes in examples</name>
        <thead>
          <tr>
            <th align="left">MCE/class</th>
            <th align="left">Substitute class</th>
          </tr>
        </thead>
        <tbody>
          <tr>
            <td align="left">
              <tt>\S</tt></td>
            <td align="left">
              <tt>[^ \t\n\r]</tt></td>
          </tr>
          <tr>
            <td align="left">
              <tt>[\S ]</tt></td>
            <td align="left">
              <tt>[^\t\n\r]</tt></td>
          </tr>
          <tr>
            <td align="left">
              <tt>\d</tt></td>
            <td align="left">
              <tt>[0-9]</tt></td>
          </tr>
        </tbody>
      </table>
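      <t>The substitutions in <xref target="tbl-sub"/> can be applied mechanically.
The Python fragment below is a minimal sketch under stated assumptions: the
names <tt>SUBSTITUTES</tt> and <tt>substitute_mce</tt> are hypothetical, and
only the three forms listed in the table are handled, that is, <tt>\S</tt> and
<tt>\d</tt> outside character classes and the exact class <tt>[\S ]</tt>. A
multi-character escape inside any other character class would need a
different rewrite.</t>
      <sourcecode type="python"><![CDATA[
```python
import re

# Substitutes taken from the table; keys are the exact source forms.
SUBSTITUTES = {
    r"[\S ]": r"[^\t\n\r]",
    r"\S": r"[^ \t\n\r]",
    r"\d": "[0-9]",
}

# Match the special class "[\S ]" first, then any escape pair, so that
# an escaped backslash followed by "d" is not mistaken for "\d".
_TOKEN = re.compile(r"\[\\S \]|\\.", re.DOTALL)


def substitute_mce(pattern: str) -> str:
    """Replace the multi-character escapes listed in the table."""
    return _TOKEN.sub(
        lambda m: SUBSTITUTES.get(m.group(0), m.group(0)), pattern)
```
]]></sourcecode>
      <t>For example, <tt>\d{4}-\d{2}-\d{2}</tt> becomes
<tt>[0-9]{4}-[0-9]{2}-[0-9]{2}</tt>, and <tt>[\S ]+</tt> becomes
<tt>[^\t\n\r]+</tt>.</t>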
      <t>Note that the semantics of <tt>\d</tt> in XSD regular expressions is that of
<tt>\p{Nd}</tt>; however, this would include all Unicode characters that are
digits in various writing systems, which is almost certainly not what
the RFCs listed intend.</t>
    </section>
    <section numbered="false" anchor="acknowledgements">
      <name>Acknowledgements</name>
      <t>This draft was motivated by the discussion in the IETF JSONPATH
WG about whether to include a regexp mechanism in the JSONPath query
expression specification, as well as by previous discussions about the
YANG <tt>pattern</tt> and CDDL <tt>.regexp</tt> features.</t>
      <t>The basic approach for this draft was inspired by <xref target="RFC7493">The
I-JSON Message Format</xref>.</t>
    </section>
  </back>

</rfc>
