Negative lookbehind on a capture group

Question

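I'm trying to write a regular expression that applies a negative lookbehind to a capture group, so that I can extract possible references from emails. I need to know how to look back from a given point to the first whitespace character. If a digit is found along the way, I don't want to extract the reference.

Here is what I have so far. There are two capture groups, 'PreRef' and 'Ref'. If 'PreRef' contains a digit, I don't want a 'Ref' match to be found. At the moment I'm only checking whether the character immediately before the colon is a digit.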

(?<PreRef>\S+)(?<![\d]):(?<Ref>\d{5})

A 'Ref' match of 12345 should be found here:

This is a reference:12345

But not here (the word 'reference' has a 5 in it):

This is not a ref5rence:12345
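
For anyone who wants to reproduce the problem, here is a minimal JavaScript sketch. It assumes an engine that supports named groups and lookbehind (for example a recent Node.js); the names `current`, `samples` and `currentResults` are only for illustration. It shows that the pattern above still finds a 'Ref' in both sample lines, because only the single character before the colon is checked.

```javascript
// The pattern from the question: only the character right before ':' is tested.
const current = /(?<PreRef>\S+)(?<![\d]):(?<Ref>\d{5})/;

const samples = [
  "This is a reference:12345",
  "This is not a ref5rence:12345",
];

// Collect the 'Ref' group for each sample, or null when nothing matches.
const currentResults = samples.map((line) => {
  const m = current.exec(line);
  return m ? m.groups.Ref : null;
});

// Both lines produce a match, which is the unwanted behaviour.
console.log(currentResults);
```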

Answer 1

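Do you need a negative lookbehind at all? It's easier to exclude digits from the PreRef capture. [^\W\d] matches word characters but not digits. Then you just need to add a \b or a similar word-boundary assertion to make sure the match covers a whole word.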

\b(?<PreRef>[^\W\d]+):(?<Ref>\d{5})
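
A quick way to check this against the two sample lines, sketched in JavaScript on the assumption that your engine supports named groups (the helper name `extractRef` is made up for this example):

```javascript
// Word characters minus digits, anchored at a word boundary.
const wordOnly = /\b(?<PreRef>[^\W\d]+):(?<Ref>\d{5})/;

// Hypothetical helper: returns the 'Ref' group, or null if there is no match.
function extractRef(line) {
  const m = wordOnly.exec(line);
  return m ? m.groups.Ref : null;
}

const goodRef = extractRef("This is a reference:12345");
const badRef = extractRef("This is not a ref5rence:12345");

console.log(goodRef); // "12345"
console.log(badRef);  // null
```

The second line is rejected because digits count as word characters, so there is no \b between the 5 and "rence". The match cannot start in the middle of the word.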

Answer 2

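I agree. If no digits are allowed before the :, we can use a simple expression such as the one below. Note that it rejects a digit anywhere between the start of the line and the colon, not only in the last word: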

^\D+:(\d{5})

or:

^\D+:(\d{5})$

If we want to add more boundaries, we can certainly do that too.

Test

const regex = /^\D+:(\d{5})/gm;
const str = `This is a reference:12345
This is not a ref5rence:12345`;
let m;

while ((m = regex.exec(str)) !== null) {
    // This is necessary to avoid infinite loops with zero-width matches
    if (m.index === regex.lastIndex) {
        regex.lastIndex++;
    }
    
    // The result can be accessed through the `m`-variable.
    m.forEach((match, groupIndex) => {
        console.log(`Found match, group ${groupIndex}: ${match}`);
    });
}

Answer 3

You can exclude digits from the \S class, then wrap the expression in whitespace boundaries, and voilà.

(?<!\S)(?<PreRef>[^\s\d]+):(?<Ref>\d{5})(?!\S)

https://regex101.com/r/JrU7Kd/1

Explained

 (?<! \S )                     # Whitespace boundary
 (?<PreRef> [^\s\d]+ )         # (1), Not whitespace nor digit
 :                             # Colon
 (?<Ref> \d{5} )               # (2), Five digits
 (?! \S )                      # Whitespace boundary
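
As a rough sketch of how this could be used to pull every reference out of a block of text, here is a JavaScript version. It assumes an engine with lookbehind support, and the sample text and variable names (`bounded`, `text`, `found`) are invented for illustration. The free-spacing layout above is not available in JavaScript, so the pattern is written on one line.

```javascript
// Same pattern as above, with the global flag so matchAll can be used.
const bounded = /(?<!\S)(?<PreRef>[^\s\d]+):(?<Ref>\d{5})(?!\S)/g;

// Made-up sample text: the last two candidates should be rejected.
const text = [
  "This is a reference:12345",
  "This is not a ref5rence:12345",
  "ticket:67890 and bad:1234567",
].join("\n");

// Collect [PreRef, Ref] pairs for every match.
const found = Array.from(text.matchAll(bounded), (m) => [
  m.groups.PreRef,
  m.groups.Ref,
]);

console.log(found); // [ [ 'reference', '12345' ], [ 'ticket', '67890' ] ]
```

The trailing (?!\S) is what rejects bad:1234567, since the five digits must be followed by whitespace or the end of the text.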

regex

regex-group

regex-lookarounds
