.Net Regex Lookahead Group with \Z (end of line) exception

Question

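I think I have found a bug in the .NET Regex engine, and I would like to know whether anyone else has run into it or whether it is somehow expected behavior.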

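It happens when the end-of-input assertion \Z is placed inside a set of alternatives [] (a character class) within a lookahead group (?=). The following example expression throws an exception on construction: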

Regex test = new Regex(@"(?=[\Z])");

The exception returned is: parsing "(?=[\Z])" - Unrecognized escape sequence \Z.

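~~However, the regular expressions [\Z] and (?=\Z) both work fine.~~

~~The workaround is simple: use (?=[]|\Z), with whatever other alternative characters are needed in the set, but it is still odd.~~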

Edit: I think there must have been a typo in my original tests because, as nhahtdh pointed out, the patterns above do in fact throw an exception.

Tested in .NET 4.5 using C#.

Answer 1

I don't know why you claim that @"[\Z]" works, but according to my test on ideone (currently running .NET 4.0.30319.17020), it throws the same exception as @"(?=[\Z])":

System.ArgumentException: parsing '[\Z]' - Unrecognized escape sequence Z.
  at System.Text.RegularExpressions.RegexParser.ScanCharEscape () [0x00000] in <filename unknown>:0 
  at System.Text.RegularExpressions.RegexParser.ScanCharClass (Boolean caseInsensitive, Boolean scanOnly) [0x00000] in <filename unknown>:0 
  [...]

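By the way, (?=[]|\Z) also throws an exception, because the parser tries to read a character class that starts with ] and |, and then runs into the invalid escape sequence \Z.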

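Looking at the code of RegexParser.ScanCharEscape: except in ECMAScript mode (the !UseOptionE() check), the code throws an exception when it encounters a \ followed by a word character that does not form a known escape sequence. (Note that in .NET, word characters are not limited to A-Za-z0-9_; they also include other Unicode word characters.)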

            default:
                if (!UseOptionE() && RegexCharClass.IsWordChar(ch))
                    throw MakeException(SR.Format(SR.UnrecognizedEscape, ch.ToString()));
                return ch;
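
To make the effect of that default branch concrete, here is a minimal sketch that compiles the same pattern with and without RegexOptions.ECMAScript. The class name EscapeModeDemo and the helper Compiles are hypothetical names introduced only for this illustration, and the ECMAScript behavior is inferred from the snippet quoted above, so treat it as something to verify on your own runtime.

```csharp
using System;
using System.Text.RegularExpressions;

public static class EscapeModeDemo
{
    // Hypothetical helper: reports whether a pattern can be constructed
    // under the given options instead of letting the exception escape.
    public static bool Compiles(string pattern, RegexOptions options)
    {
        try
        {
            new Regex(pattern, options);
            return true;
        }
        catch (ArgumentException)
        {
            // Regex parse errors surface as ArgumentException.
            return false;
        }
    }

    public static void Main()
    {
        // Default mode: \Z inside a character class is an unrecognized escape.
        Console.WriteLine(Compiles(@"[\Z]", RegexOptions.None));

        // ECMAScript mode: per the quoted default branch, the escape is
        // accepted and should be read as the literal character 'Z'.
        Console.WriteLine(Compiles(@"[\Z]", RegexOptions.ECMAScript));
        Console.WriteLine(Regex.IsMatch("Z", @"[\Z]", RegexOptions.ECMAScript));
    }
}
```

Even where it compiles, [\Z] in ECMAScript mode is a class containing the letter Z, not an end-of-input assertion.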

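This is probably a design decision that allows the escape syntax to be extended in the future without breaking existing code bases when people move to a newer version of the .NET Framework. Java follows the same design principle in its Pattern class, but it only throws an exception for unrecognized escape sequences in A-Za-z. JavaScript/ECMAScript, on the other hand, has no such restriction and interprets an unrecognized escape sequence as the character that follows the \.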

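Back to the problem in the question: note that \Z is an end-of-input assertion, i.e. it matches an empty string. An assertion is not a character, so it makes no sense to put it inside a character class. If you want to specify it alongside a character class, use alternation |.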
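
As a rough illustration of the alternation approach, the sketch below contrasts a class that contains \Z (which fails to parse) with a lookahead that alternates between a class and \Z. The delimiter set [,;], the class name LookaheadDemo, and the helper Words are assumptions made up for this example, not part of the question.

```csharp
using System;
using System.Text.RegularExpressions;

public static class LookaheadDemo
{
    // Hypothetical pattern: a run of letters that is followed by a comma,
    // a semicolon, or the end of the input. \Z sits outside the class,
    // joined to it with alternation.
    public static readonly Regex WordBeforeDelimiter =
        new Regex(@"[A-Za-z]+(?=[,;]|\Z)");

    // Collects every match value into an array.
    public static string[] Words(string input)
    {
        MatchCollection matches = WordBeforeDelimiter.Matches(input);
        var result = new string[matches.Count];
        for (int i = 0; i < matches.Count; i++)
        {
            result[i] = matches[i].Value;
        }
        return result;
    }

    public static void Main()
    {
        try
        {
            // \Z inside the character class: rejected by the parser.
            new Regex(@"(?=[,;\Z])");
        }
        catch (ArgumentException ex)
        {
            Console.WriteLine(ex.Message);
        }

        // \Z as an alternative next to the class: parses and matches.
        Console.WriteLine(string.Join(" ", Words("alpha,beta;gamma")));
    }
}
```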

Answer 2

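You have a misunderstanding of what \Z is: it is a pattern anchor escape, not an actual character, so the exception is valid when you try to use it inside a character set ([ ]).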

It can match just before a \n if one exists at the end of the data, but it does not match the \n character itself.

Quoting MSDN (Anchors in Regular Expressions):

The \Z anchor specifies that a match must occur at the end of the input string, or before \n at the end of the input string. It is identical to the $ anchor, except that \Z ignores the RegexOptions.Multiline option. Therefore, in a multiline string, it can only match the end of the last line, or the last line before \n.

Note that \Z matches \n but does not match \r\n (the CR/LF character combination). To match CR/LF, include \r?\Z in the regular expression pattern.
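
The quoted behavior can be checked with a short sketch like the one below. The sample strings, the class name EndAnchorDemo, and the helper Matches are illustrative assumptions; only Regex.IsMatch and RegexOptions.Multiline come from the framework.

```csharp
using System;
using System.Text.RegularExpressions;

public static class EndAnchorDemo
{
    // Hypothetical thin wrapper so the calls below read uniformly.
    public static bool Matches(string input, string pattern,
                               RegexOptions options = RegexOptions.None)
    {
        return Regex.IsMatch(input, pattern, options);
    }

    public static void Main()
    {
        // \Z matches at the very end, or just before a final \n.
        Console.WriteLine(Matches("abc", @"abc\Z"));
        Console.WriteLine(Matches("abc\n", @"abc\Z"));

        // A trailing CR/LF is not covered unless \r is allowed explicitly.
        Console.WriteLine(Matches("abc\r\n", @"abc\Z"));
        Console.WriteLine(Matches("abc\r\n", @"abc\r?\Z"));

        // Multiline changes $ but not \Z: only $ matches at the end
        // of the first line here.
        Console.WriteLine(Matches("abc\ndef", @"abc$", RegexOptions.Multiline));
        Console.WriteLine(Matches("abc\ndef", @"abc\Z", RegexOptions.Multiline));
    }
}
```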


.net

regex

c#-4.0