Python regex selection of verbs with present perfect

In a given string, I am trying to capture verbs in the present perfect tense. I do this in Python with the following regular expression:

import re
sentence = "The Batman has never shown his true identity but has done so much good for Gotham City"

verb = re.findall(r'has\s[^\,\.\"]{0,50}done', sentence)

The result is:

>>> print(verb)

['has never shown his true identity but has done']

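Here the correct answer should be 'has done', but the match starts at the 'has' in 'has never shown', which is the wrong 'has'. The [^\,\.\"]{0,50} part allows some freedom in what may appear between 'has' and 'done'; that freedom is not needed in this example, but it is useful for my real data. The problem is that the match begins at the first 'has' the regex finds, which is not always what I want. Is it possible to take the last 'has' instead?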
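To make the problem concrete, here is a minimal sketch that checks where the match begins. The variable names `first_has` and `last_has` are only illustrative. It relies on the fact that a regex engine scans from left to right and reports the leftmost position at which the pattern can match, so the match starts at the first 'has' as long as 'done' is reachable within fifty allowed characters.

```python
import re

sentence = "The Batman has never shown his true identity but has done so much good for Gotham City"
pattern = r'has\s[^\,\.\"]{0,50}done'

m = re.search(pattern, sentence)

# Positions of the first and last literal "has" in the sentence
first_has = sentence.index("has")
last_has = sentence.rindex("has")

print(m.start() == first_has)  # True: the match begins at the first "has"
print(m.start() == last_has)   # False: the wanted "has" comes later
```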

You can use the following solution here:

\bhas\s(?:(?!\bhas\b)[^,."]){0,50}?\bdone\b

See the regex demo.

Details

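  • \bhas - the whole word has (the pattern has only a leading \b; the \s that follows guarantees the word ends there)
  • \s - a single whitespace character
  • (?:(?!\bhas\b)[^,."]){0,50}? - any character other than , . and ", zero to fifty times but as few as possible, where none of the consumed characters is the start of the whole word has
  • \bdone\b - the whole word done.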
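As a rough illustration of what the negative lookahead contributes, the sketch below compares the pattern with and without it. The names `without_guard` and `with_guard` are made up for this comparison. Making the quantifier lazy is not enough by itself, because the engine still starts at the leftmost 'has'. The lookahead is what makes that first attempt fail, so the search resumes at the next 'has'.

```python
import re

sentence = "The Batman has never shown his true identity but has done so much good for Gotham City"

# Lazy quantifier only: the match still starts at the first "has"
without_guard = r'\bhas\s[^,."]{0,50}?\bdone\b'

# Lazy quantifier plus lookahead: the gap may not run across another "has"
with_guard = r'\bhas\s(?:(?!\bhas\b)[^,."]){0,50}?\bdone\b'

print(re.findall(without_guard, sentence))
# ['has never shown his true identity but has done']
print(re.findall(with_guard, sentence))
# ['has done']
```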

See a Python demo:

import re
sentence = "The Batman has never shown his true identity but has done so much good for Gotham City"
verb = re.findall(r'\bhas\s(?:(?!\bhas\b)[^,."]){0,50}?\bdone\b', sentence)
print(verb)
# => ['has done']
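If the same idea is needed for other participles, the pattern can be built from parameters. This is only a sketch under assumptions: `find_present_perfect` and its arguments `participle`, `auxiliary` and `max_gap` are hypothetical names, not part of the `re` module, and the excluded characters are the same three as above.

```python
import re

def find_present_perfect(sentence, participle, auxiliary="has", max_gap=50):
    """Return matches of auxiliary ... participle without crossing another auxiliary."""
    # Escape the words so they are treated literally inside the pattern
    aux = re.escape(auxiliary)
    part = re.escape(participle)
    # Doubled braces produce a literal {0,N} quantifier inside the f-string
    pattern = rf'\b{aux}\s(?:(?!\b{aux}\b)[^,."]){{0,{max_gap}}}?\b{part}\b'
    return re.findall(pattern, sentence)

sentence = "The Batman has never shown his true identity but has done so much good for Gotham City"
print(find_present_perfect(sentence, "done"))   # ['has done']
print(find_present_perfect(sentence, "shown"))  # ['has never shown']
```

The pattern contains no capturing group, so `re.findall` returns the full matched text for each hit.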