Extracting a word before a character

Question

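I am trying to extract any word that comes before Y, where Y is delimited by word boundaries. I tried using the (?m) flag to treat each line as a separate record and capturing \w+ with a \s+Y lookahead, but I can only print the first match, not the second one (IMP1).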

print(foo)
this is IMP Y text
and this is also IMP1 Y text
this is not so IMP2 N text
Y is not important

Attempts so far, without success:

>>> m = re.search('(?m).*?(\w+)(?=\s+Y)',foo)
>>> m.groups()
('IMP',)
>>>
>>> m = re.search('(?m)(?<=\s)(\w+)(?=\s+Y)',foo)
>>> m.groups()
('IMP',)
>>>

Expected result:

('IMP','IMP1')

Answer 1

You can use

\w+(?=[^\S\r\n]+Y\b)

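See the regex demo. Details:

\w+ - one or more letters, digits, or underscores
(?=[^\S\r\n]+Y\b) - a positive lookahead that requires one or more whitespace characters other than CR and LF immediately to the right, followed by Y as a whole word (\b is a word boundary).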
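To illustrate why the lookahead excludes CR and LF, here is a minimal sketch that runs the question's sample text through both a plain \s+ lookahead and the [^\S\r\n]+ version. The variable names any_ws and no_cr_lf are only illustrative. Because \s also matches line breaks, the plain version can reach across a line end to the Y that starts the last line.

```python
import re

# Sample text from the question.
foo = (
    "this is IMP Y text\n"
    "and this is also IMP1 Y text\n"
    "this is not so IMP2 N text\n"
    "Y is not important"
)

# \s matches line breaks too, so "text" at the end of line 3
# is followed by "\nY" and gets picked up as well.
any_ws = re.findall(r'\w+(?=\s+Y\b)', foo)

# [^\S\r\n] is "whitespace, but not CR or LF", so the lookahead
# cannot cross a line break.
no_cr_lf = re.findall(r'\w+(?=[^\S\r\n]+Y\b)', foo)

print(any_ws)    # ['IMP', 'IMP1', 'text']
print(no_cr_lf)  # ['IMP', 'IMP1']
```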

See a Python demo:

import re
foo = "this is IMP Y text\nand this is also IMP1 Y text\nthis is not so IMP2 N text\nY is not important"
print(re.findall(r'\w+(?=[^\S\r\n]+Y\b)', foo))
# => ['IMP', 'IMP1']
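The attempts in the question return only IMP because re.search stops at the first match, whatever flags are set; (?m) only changes where ^ and $ match. As a small sketch of the difference, the snippet below applies the same pattern with search, findall, and finditer. The names pattern, first, all_words, and spans are illustrative.

```python
import re

foo = (
    "this is IMP Y text\n"
    "and this is also IMP1 Y text\n"
    "this is not so IMP2 N text\n"
    "Y is not important"
)

pattern = re.compile(r'\w+(?=[^\S\r\n]+Y\b)')

# search() returns a single match object for the first match only.
first = pattern.search(foo)
print(first.group())  # IMP

# findall() returns every non-overlapping match as a list of strings.
all_words = pattern.findall(foo)
print(all_words)  # ['IMP', 'IMP1']

# finditer() yields match objects, useful when positions are needed too.
spans = [(m.group(), m.span()) for m in pattern.finditer(foo)]
print(spans)
```

If a tuple is needed, as in the expected result, tuple(all_words) converts the list.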

Answer 2

Try using:

(\w+)(?=.Y)

You can test it here.

So the complete code is:

import re

a="""this is IMP Y text
and this is also IMP1 Y text
this is not so IMP2 N text
Y is not important"""


print(re.findall(r"(\w+)(?=.Y)", a))

Output:

['IMP', 'IMP1']
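This pattern gives the expected output for the sample text, but it is looser than a whole-word check: the dot accepts exactly one character of any kind (other than a line break), and Y is not anchored with \b. The sketch below compares it with the pattern from Answer 1 on a made-up input; the sample string and variable names are hypothetical and not from the question.

```python
import re

# Hypothetical input: "Yes" starts with Y, and "beta" is
# separated from Y by two spaces instead of one.
sample = "alpha Yes\nbeta  Y\ngamma Y"

# One arbitrary character, then Y with no word boundary:
# matches before "Yes", misses the double-space case.
loose = re.findall(r"(\w+)(?=.Y)", sample)

# One or more non-line-break whitespace characters, then Y as a whole word.
strict = re.findall(r"\w+(?=[^\S\r\n]+Y\b)", sample)

print(loose)   # ['alpha', 'gamma']
print(strict)  # ['beta', 'gamma']
```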

python

regex

regex-lookarounds

positive-lookahead