How to match the last pattern in Regex, using .NET?

Question

I want to extract the number that is closest to a given section. This is my regex: \d+?[\r\n]+(.*)3.2.P.4.4.\s+Justification\s+of\s+Specifications

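Objective - find the part of the text that starts with a number and ends with the given section name. In this case the section name is "3.2.P.4.4. Justification of Specifications".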

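Actual result - the regex matches everything, because the pattern starts with a number and the very first number in the text already qualifies. Expected result - the match should start at 29, the number closest to the section. I tried many options, such as non-greedy quantifiers, but none of them seems to work.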

Answer 1

In .NET, you can use the RegexOptions.RightToLeft option to parse the text from the end to the start, so you get the last match faster and with a simpler pattern.
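To see what RightToLeft changes in isolation, here is a minimal sketch on a hypothetical three-line input; the helper name FirstFound is made up for this illustration. With default options the engine reports the leftmost match; with RegexOptions.RightToLeft it starts scanning at the end of the input, so the rightmost match is the one found first.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper: returns the match the engine reports first,
// scanning left-to-right by default or right-to-left when requested.
static string FirstFound(string input, string pattern, bool rightToLeft)
{
    RegexOptions options = rightToLeft ? RegexOptions.RightToLeft : RegexOptions.None;
    Match m = Regex.Match(input, pattern, options);
    return m.Success ? m.Value : "";
}

// Hypothetical input: three lines, each starting with a number.
string sample = "26 foo\n28 bar\n29 baz";

// Default scanning starts at the beginning of the input: prints 26
Console.WriteLine(FirstFound(sample, @"\d+", rightToLeft: false));
// RightToLeft scanning starts at the end of the input: prints 29
Console.WriteLine(FirstFound(sample, @"\d+", rightToLeft: true));
```

The pattern is identical in both calls; only the scan direction decides which number is reported.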

Use:

using System;
using System.Text.RegularExpressions;

var text = " 26\r\nData related to the point SP-WFI-21-Room process fluids  \r\nSampling Date:16/04/2007 \r\n 28\r\nData related to pint SP-WFI-21-Room process fluids  \r\nSampling Date: 20/04/2007 \r\nTEST SPECIFICATIONS RESULTS \r\n 29\r\n3.2.P.4.2 Analytical Procedures \r\nAll the analytical procedures \r\n3.2.P.4.3 Validation of Analytical Procedures \r\nAll the analytical procedures proposed to control the excipients are those reported in Ph. Eur. \r\n− 3AQ13A: Validation of Analytical Procedures: Methodology - EUDRALEX Volume 3A \r\n3.2.P.4.4. Justification of Specifications";
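// A line holding only a number, then a lazy capture up to the section heading
var pattern = @"^\s*\d+\s*[\r\n]+(.*?)3\.2\.P\.4\.4\.\s+Justification\s+of\s+Specifications";
// RightToLeft: scan from the end; Singleline: '.' also matches newlines; Multiline: '^' matches at every line start
var regEx = new Regex(pattern, RegexOptions.RightToLeft | RegexOptions.Singleline | RegexOptions.Multiline);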

var m = regEx.Match(text);
if (m.Success)
{
    Console.WriteLine(m.Groups[1].Value);
}

See the C# demo and the .NET regex demo.

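Compared with the original pattern, I basically only added ^ (the start of a line in Multiline mode) before \d+, plus \s* on both sides of \d+ (in case there is whitespace before the number or before the line break). Note the escaped dots: an unescaped . matches any character.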

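Note that .NET regex does not support the U modifier (which swaps greedy and lazy quantifiers), so you must turn +? into + and .* into .*?. In fact, the original regex contains + quantifiers that were supposed to be +?, which may lead to other bugs or unexpected behavior. Do not use the U modifier in PCRE unless you are 100% sure what you are doing.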
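The following sketch illustrates why .*? matters under RightToLeft, using a shortened, hypothetical version of the question's input (the helper CaptureBeforeHeading is also hypothetical). In RightToLeft mode the capture is matched from the heading leftward, so a lazy quantifier stops at the nearest number line, while a greedy one extends as far left as it can and ends up at the first number line.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper: captures the text between a number line and the heading,
// scanning right to left; 'lazy' selects (.*?) instead of (.*) for the capture.
static string CaptureBeforeHeading(string input, bool lazy)
{
    string body = lazy ? "(.*?)" : "(.*)";
    string pattern = @"^\s*\d+\s*[\r\n]+" + body +
                     @"3\.2\.P\.4\.4\.\s+Justification\s+of\s+Specifications";
    var regex = new Regex(pattern,
        RegexOptions.RightToLeft | RegexOptions.Singleline | RegexOptions.Multiline);
    Match m = regex.Match(input);
    return m.Success ? m.Groups[1].Value : "";
}

// Shortened, hypothetical version of the question's input.
string text = " 26\r\nData A\r\n 28\r\nData B\r\n 29\r\n" +
              "3.2.P.4.2 Analytical Procedures\r\n" +
              "3.2.P.4.4. Justification of Specifications";

// Lazy: the capture grows leftward only until the nearest number line (29).
Console.WriteLine(CaptureBeforeHeading(text, lazy: true));
// Greedy: the capture grows as far left as possible, back to the first number line (26).
Console.WriteLine(CaptureBeforeHeading(text, lazy: false));
```

With lazy: true the capture begins at "3.2.P.4.2 Analytical Procedures"; with lazy: false it reaches back to the "26" line and also includes "Data A" and "Data B".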

Answer 2

You can use a negative lookahead to assert that the next line does not start with whitespace characters followed by digits and a newline:

^ \d+[\r\n](?:(?!\s+\d+[\r\n]).*[\r\n])*3\.2\.P\.4\.4\.\sJustification\s+of\s+Specifications

See a .NET regex demo | C# demo

Explanation

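^ Start of a line (requires RegexOptions.Multiline)
\d+[\r\n] Match a space (the literal space after ^), 1+ digits and a newline character
(?: Non-capturing group
- (?! Negative lookahead, assert that what follows is not
  - \s+\d+[\r\n] Match 1+ whitespace characters, 1+ digits and a newline character
- ) Close the negative lookahead
- .*[\r\n] Match the rest of the line followed by a newline character
)* Close the non-capturing group and repeat it 0+ times
3\.2\.P\.4\.4\.\sJustification\s+of\s+Specifications Match the section name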
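The pattern above can be used from C# with an ordinary left-to-right Match call, as in this sketch. It assumes RegexOptions.Multiline (so ^ matches at every line start) and no Singleline (so .* stays within one line). The input is a shortened, hypothetical version of the question's text, and NearestBlock is a made-up helper name.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper: returns the block from the nearest " <digits>" line
// down to the section heading, or "" if nothing matches.
static string NearestBlock(string input)
{
    // Multiline: ^ matches after every '\n'. No Singleline: '.' does not cross '\n'.
    var regex = new Regex(
        @"^ \d+[\r\n](?:(?!\s+\d+[\r\n]).*[\r\n])*3\.2\.P\.4\.4\.\sJustification\s+of\s+Specifications",
        RegexOptions.Multiline);
    Match m = regex.Match(input);
    return m.Success ? m.Value : "";
}

// Shortened, hypothetical version of the question's input.
string text = " 26\r\nData A\r\n 28\r\nData B\r\n 29\r\n" +
              "3.2.P.4.2 Analytical Procedures\r\n" +
              "3.2.P.4.4. Justification of Specifications";

Console.WriteLine(NearestBlock(text));
```

The match starts at the " 29" line and ends with the section heading: the attempts starting at " 26" and " 28" fail because the negative lookahead refuses to step over another number line.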

