Why does .NET Regex with the ECMAScript option support \A?
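I have a .NET Standard 2.1 C# application that needs to run Regex in the ECMAScript flavor.
According to the MSDN documentation, I can use RegexOptions.ECMAScript:
Enables ECMAScript-compliant behavior for the expression.
I know that ECMAScript does not support the \A anchor (according to this link, and from trying it on Regex101 with the ECMAScript option). But it seems .NET does support it. Example: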
Regex emcaRegex = new Regex(@"\A\d{3}", RegexOptions.ECMAScript);
var matches = emcaRegex.Matches("901-333-");
Console.WriteLine($"number of matches: {matches.Count}"); // number of matches: 1
Console.WriteLine($"The match: {matches[0]}"); // The match: 901
I expected no match at all. What am I missing?
You need to look further into the "ECMAScript Matching Behavior" article for the answer.
This option does not redefine the meaning of the .NET-specific anchors; they remain supported.
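To make that concrete, here is a minimal sketch showing that the .NET-specific anchors keep working under RegexOptions.ECMAScript. The class name `EcmaAnchorDemo` and the helper `FirstMatch` are made up for illustration; only `Regex.Match` and `RegexOptions` come from the framework.

```csharp
using System;
using System.Text.RegularExpressions;

public static class EcmaAnchorDemo
{
    // Hypothetical helper: returns the first match value, or null when nothing matches.
    public static string FirstMatch(string pattern, string input, RegexOptions options)
    {
        Match m = Regex.Match(input, pattern, options);
        return m.Success ? m.Value : null;
    }

    public static void Main()
    {
        // \A still anchors at the start of the input under RegexOptions.ECMAScript.
        Console.WriteLine(FirstMatch(@"\A\d{3}", "901-333-", RegexOptions.ECMAScript)); // 901

        // \z still anchors at the very end of the input.
        Console.WriteLine(FirstMatch(@"\d{3}\z", "901-333", RegexOptions.ECMAScript)); // 333

        // With Multiline, ^ also matches after a newline, but \A does not.
        RegexOptions multi = RegexOptions.ECMAScript | RegexOptions.Multiline;
        Console.WriteLine(FirstMatch(@"^\d{3}", "abc\n123", multi) ?? "(no match)");  // 123
        Console.WriteLine(FirstMatch(@"\A\d{3}", "abc\n123", multi) ?? "(no match)"); // (no match)
    }
}
```

If the goal is to reject patterns that a real ECMAScript engine would not accept, the option alone will not do that; the pattern would have to be validated separately before it is passed to Regex.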
The behavior of ECMAScript and canonical regular expressions differs in three areas: character class syntax, self-referencing capturing groups, and octal versus backreference interpretation.
Character class syntax. Because canonical regular expressions support Unicode whereas ECMAScript does not, character classes in ECMAScript have a more limited syntax, and some character class language elements have a different meaning. For example, ECMAScript does not support language elements such as the Unicode category or block elements `\p` and `\P`. Similarly, the `\w` element, which matches a word character, is equivalent to the `[a-zA-Z_0-9]` character class when using ECMAScript and `[\p{Ll}\p{Lu}\p{Lt}\p{Lo}\p{Nd}\p{Pc}\p{Lm}]` when using canonical behavior. For more information, see Character Classes.
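The character class difference can be observed with a short sketch like the one below. The class `EcmaCharClassDemo` and its two helpers are hypothetical names. The `\d` comparison goes slightly beyond the quoted passage and relies on the .NET documentation stating that `\d` is limited to `[0-9]` under the ECMAScript option.

```csharp
using System;
using System.Text.RegularExpressions;

public static class EcmaCharClassDemo
{
    // Hypothetical helpers: test whether the whole string is a single \w or \d character.
    public static bool IsWordChar(string s, RegexOptions options) =>
        Regex.IsMatch(s, @"^\w$", options);

    public static bool IsDigit(string s, RegexOptions options) =>
        Regex.IsMatch(s, @"^\d$", options);

    public static void Main()
    {
        string eAcute = "\u00E9";      // LATIN SMALL LETTER E WITH ACUTE (category Ll)
        string arabicThree = "\u0663"; // ARABIC-INDIC DIGIT THREE (category Nd)

        // Canonical \w covers Unicode letters; ECMAScript \w is ASCII-oriented.
        Console.WriteLine(IsWordChar(eAcute, RegexOptions.None));       // True
        Console.WriteLine(IsWordChar(eAcute, RegexOptions.ECMAScript)); // False

        // Canonical \d covers Unicode decimal digits; ECMAScript \d is [0-9].
        Console.WriteLine(IsDigit(arabicThree, RegexOptions.None));       // True
        Console.WriteLine(IsDigit(arabicThree, RegexOptions.ECMAScript)); // False

        // Plain ASCII behaves the same in both modes.
        Console.WriteLine(IsWordChar("a", RegexOptions.ECMAScript)); // True
    }
}
```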
Self-referencing capturing groups. A regular expression capture class with a backreference to itself must be updated with each capture iteration.
Resolution of ambiguities between octal escapes and backreferences.
Regular expression | Canonical behavior | ECMAScript behavior |
---|---|---|
`\0` followed by 0 to 2 octal digits | Interpret as an octal. For example, `\044` is always interpreted as an octal value and means "$". | Same behavior. |
`\` followed by a digit from 1 to 9, followed by no additional decimal digits | Interpret as a backreference. For example, `\9` always means backreference 9, even if a ninth capturing group does not exist. If the capturing group does not exist, the regular expression parser throws an [ArgumentException](https://docs.microsoft.com/en-us/dotnet/api/system.argumentexception). | If a single decimal digit capturing group exists, backreference to that digit. Otherwise, interpret the value as a literal. |
`\` followed by a digit from 1 to 9, followed by additional decimal digits | Interpret the digits as a decimal value. If that capturing group exists, interpret the expression as a backreference. Otherwise, interpret the leading octal digits up to octal 377; that is, consider only the low 8 bits of the value. Interpret the remaining digits as literals. For example, in the expression `\3000`, if capturing group 300 exists, interpret as backreference 300; if capturing group 300 does not exist, interpret as octal 300 followed by 0. | Interpret as a backreference by converting as many digits as possible to a decimal value that can refer to a capture. If no digits can be converted, interpret as an octal by using the leading octal digits up to octal 377; interpret the remaining digits as literals. |
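The first two rows of the table can be checked with a small sketch along these lines. `EcmaBackrefDemo` and `ThrowsOnParse` are hypothetical names, and the output comments reflect the behavior the table describes rather than anything specific to one runtime version.

```csharp
using System;
using System.Text.RegularExpressions;

public static class EcmaBackrefDemo
{
    // Hypothetical helper: true when constructing the pattern throws ArgumentException.
    public static bool ThrowsOnParse(string pattern, RegexOptions options)
    {
        try
        {
            _ = new Regex(pattern, options);
            return false;
        }
        catch (ArgumentException)
        {
            return true;
        }
    }

    public static void Main()
    {
        // Row 1: \044 is octal 44 (decimal 36), the '$' character, in both modes.
        Console.WriteLine(Regex.IsMatch("$", @"\044", RegexOptions.None));       // True
        Console.WriteLine(Regex.IsMatch("$", @"\044", RegexOptions.ECMAScript)); // True

        // Row 2: \9 without a ninth group is a parse error in canonical mode...
        Console.WriteLine(ThrowsOnParse(@"\9", RegexOptions.None));       // True
        // ...but is read as a literal under the ECMAScript option.
        Console.WriteLine(ThrowsOnParse(@"\9", RegexOptions.ECMAScript)); // False
        Console.WriteLine(Regex.IsMatch("9", @"\9", RegexOptions.ECMAScript)); // True

        // When the group exists, \1 is a backreference in both modes.
        Console.WriteLine(Regex.IsMatch("aa", @"(a)\1", RegexOptions.None));       // True
        Console.WriteLine(Regex.IsMatch("aa", @"(a)\1", RegexOptions.ECMAScript)); // True
    }
}
```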