How to catch a pattern that's not in the non-capturing group? - Python

Given the string:

I'll be going home I've the 'v ' isn't want I want to split but I want to catch tokens like 'v and 'w ' .

The goal is to catch:

'v 
'v
'w

but to avoid 've, 'll and 't.

I tried catching 've, 'll and 't with (?i)\'(?:ve|ll|t)\b, e.g.

>>> import re
>>> x = "I'll be going home I've the 'v ' isn't want I want to split but I want to catch tokens like 'v and 'w ' ."
>>> pattern = r"(?i)\'(?:ve|ll|t)\b"
>>> re.findall(pattern, x)
["'ll", "'ve", "'t"]

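I also tried negating the non-capturing group in (?i)\'(?:ve|ll|t)\b like this: (?i)\'[^(?:ve|ll|t)]\b, but it doesn't capture 'v and 'w, which are the desired targets.

How do I capture substrings that follow a single quote but are not from a predefined list of substrings, i.e. 'll, 've and 't?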


I also tried this, which didn't work:

pattern = "(?i)\'(?:[^ve|ll|t|\s])\b"

But [^...] only matches a single character, not a substring.
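To make that concrete, here is a minimal sketch of what the negated character class actually does. The sample string is made up for illustration; it is not the string from the question.

```python
import re

# Inside [...] the group syntax is taken literally, so "[^(?:ve|ll|t)]"
# means "one character that is none of ( ? : v e | l t )".
negated_class = re.compile(r"(?i)'[^(?:ve|ll|t)]\b")

# Illustrative sample (an assumption, not the question's string).
sample = "'v 'w 'x 've"
class_matches = negated_class.findall(sample)
print(class_matches)  # 'v and 've are skipped because "v" is in the excluded set
```

So 'v can never match with this pattern: the letter v is excluded on its own, regardless of what follows it.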

The negative lookahead is written (?!...) and, like a non-capturing group, it captures nothing, so it would be something like (?i)\'(?!ve|ll|t)\w\b:

>>> pattern = r"(?i)\'(?!ve|ll|t)\w\b"
>>> x = "I'll be going home I've the 'v ' isn't want I want to split but I want to catch tokens like 'v and 'w ' ."
>>> re.findall(pattern, x)
["'v", "'v", "'w"]
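If the list of suffixes to exclude changes often, the same pattern could be built from a list. This is only a sketch; build_pattern is a hypothetical helper name, not something from the re module.

```python
import re

def build_pattern(excluded):
    """Hypothetical helper: build the negative-lookahead pattern from a list."""
    # Escape each suffix so characters such as "." are treated literally.
    alternation = "|".join(re.escape(s) for s in excluded)
    # Apostrophe, not followed by an excluded suffix, then a single word
    # character that ends at a word boundary.
    return re.compile(rf"(?i)'(?!{alternation})\w\b")

x = ("I'll be going home I've the 'v ' isn't want I want to split "
     "but I want to catch tokens like 'v and 'w ' .")
tokens = build_pattern(["ve", "ll", "t"]).findall(x)
print(tokens)  # ["'v", "'v", "'w"]
```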

Maybe this one works?

\'(?!ve|ll|t|\s)\w+

You can use a lookahead assertion to filter out what you don't want.
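One thing to watch with the \w+ version: the lookahead rejects anything that merely starts with ve, ll or t. A possible refinement, offered as an assumption about what is wanted rather than as part of the original answer, is to put \b inside the lookahead so that only the exact suffixes are excluded. The sample string below is made up for illustration.

```python
import re

# Pattern from the answer: rejects any token that starts with ve, ll or t.
prefix_filter = re.compile(r"'(?!ve|ll|t|\s)\w+")

# Variant (an assumption, not from the answer): \b inside the lookahead
# means only the exact suffixes 've, 'll and 't are rejected.
exact_filter = re.compile(r"'(?!(?:ve|ll|t)\b)\w+")

# Illustrative sample, not the string from the question.
sample = "I've seen 'til and 'vw but isn't 'w"

prefix_matches = prefix_filter.findall(sample)
exact_matches = exact_filter.findall(sample)
print(prefix_matches)  # ["'vw", "'w"]
print(exact_matches)   # ["'til", "'vw", "'w"]
```

Which of the two is right depends on whether tokens like 'til should count as matches.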

Update

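In some other regex engines, a lookaround pattern must be fixed-length. Strictly speaking, that restriction usually applies to lookbehind rather than lookahead.

There, something like (?<!ve|t) would be invalid, because ve and t have two different lengths.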
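For what it's worth, in Python's standard-library re module the fixed-width requirement applies to lookbehind, not lookahead. A quick check, assuming only the stdlib re module:

```python
import re

# Lookahead with alternatives of different lengths compiles without complaint.
re.compile(r"'(?!ve|t)\w")
lookahead_ok = True

# The equivalent lookbehind is rejected, because "ve" and "t" differ in length.
try:
    re.compile(r"(?<!ve|t)\w")
    lookbehind_rejected = False
except re.error as exc:
    lookbehind_rejected = True
    print("lookbehind rejected:", exc)
```

So the patterns above are safe in Python; the length restriction only matters if you move the check to a lookbehind or port the regex to a stricter engine.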