Match the last occurrence of a name from a list before a quoted text

Question

I'm trying to extract quotes and their respective authors from a long text.

Example: Paul […] Jane says G_quoted text_R

How can I capture Jane and her quoted text as two groups, rather than Paul, etc.?

I tried a positive lookahead like the one below, but I get all the names instead of just Jane. Any help is much appreciated.

i?(Paul|Jane|Robert|John)(?=[^.]*?G_(.*)_R)

https://regex101.com/r/mx0JgV/1
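A quick reproduction of the behaviour described above may help. The `[…]` is replaced with hypothetical filler text that contains no period. Because the lookahead `(?=...)` consumes nothing, the scan resumes right after each name. Every listed name that is followed by a `G_..._R` with no `.` in between therefore yields its own match. The leading `i?` is an optional literal `i`, possibly intended as the inline flag `(?i)`; that is a guess.

```python
import re

# The pattern from the question, unchanged. "i?" is an optional literal "i",
# not an inline case-insensitivity flag.
ATTEMPT = re.compile(r"i?(Paul|Jane|Robert|John)(?=[^.]*?G_(.*)_R)")

# Hypothetical filler standing in for "[…]"; it contains no ".".
text = "Paul walked in and Jane says G_quoted text_R"

# The lookahead does not consume text, so both Paul and Jane match,
# each paired with the same quote.
matches = ATTEMPT.findall(text)
print(matches)  # [('Paul', 'quoted text'), ('Jane', 'quoted text')]
```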

Answer 1

What's wrong with:

import re

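# Match a listed name (case-insensitively), then lazily skip ahead to the
# first G_ marker and capture the text up to the next _R.
QUOTE_FINDER = re.compile(r"(paul|jane|robert|john).*?G_(.*?)_R", re.IGNORECASE | re.DOTALL)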

data = """dfdsf Jane […] Paul […] Jane says G_quoted text_R
and Paul says G_some other text_R while Robert prefers to say G_nothing_R..."""

quotes = QUOTE_FINDER.findall(data)
# [('Jane', 'quoted text'), ('Paul', 'some other text'), ('Robert', 'nothing')]
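There is a caveat worth checking against your own data. The lazy `.*?` in this pattern starts each match at the *earliest* listed name, not the last one. On the question's own example, `Paul […] Jane says G_quoted text_R`, it returns `('Paul', 'quoted text')`. The sample above only looks right because its first name is also Jane.

Below is a minimal sketch of one workaround, assuming the same four names and the `G_`/`_R` markers. It uses a "tempered" token: after the name, characters are consumed only while no other listed name starts at that position. As a result, a match can only begin at the last name before the quote. `NAMES`, `NAME_ALT` and `find_quotes` are hypothetical names used for illustration.

```python
import re

# Hypothetical author list, taken from the question's example.
NAMES = ["Paul", "Jane", "Robert", "John"]
NAME_ALT = "|".join(re.escape(name) for name in NAMES)

# Group 1: a listed name as a whole word.
# Tempered token: advance one character at a time, but only while no other
# listed name starts at the current position, so the match begins at the
# last name before the quote.
# Group 2: the text between the G_ and _R markers.
QUOTE_FINDER = re.compile(
    rf"\b({NAME_ALT})\b(?:(?!\b(?:{NAME_ALT})\b).)*?G_(.*?)_R",
    re.IGNORECASE | re.DOTALL,
)


def find_quotes(text):
    """Return (name, quote) pairs, using the last listed name before each quote."""
    return QUOTE_FINDER.findall(text)


data = """dfdsf Jane […] Paul […] Jane says G_quoted text_R
and Paul says G_some other text_R while Robert prefers to say G_nothing_R..."""

print(find_quotes("Paul […] Jane says G_quoted text_R"))
# [('Jane', 'quoted text')]
print(find_quotes(data))
# [('Jane', 'quoted text'), ('Paul', 'some other text'), ('Robert', 'nothing')]
```

The tempered token re-checks the whole name list at every character, so it is likely somewhat slower than a plain `.*?` on long texts. If that matters, another option is to locate each `G_..._R` first and then look for the last listed name in the text preceding it.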


Tags: python, regex, lookahead, python-3.x, python-3.6