Using Python re, need to match a string that starts and ends with one of two possible patterns

Question

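The | symbol in a regular expression seems to split the entire pattern, but I need it to split only a smaller part of it. I want it to find matches that start with "Q: " or "A: " and then end before the next "Q: " or "A: ". Anything can be in between, including newlines.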

My attempt:

import re

string = "Q: This is a question. \nQ: This is a 2nd question \non two lines. \n\nA: This is an answer. \nA: This is a 2nd answer \non two lines.\nQ: Here's another question. \nA: And another answer."

pattern = re.compile(r"(A: |Q: )[\w\W]*(A: |Q: |$)")

matches = pattern.finditer(string)
for match in matches:
    print('-', match.group(0))

The regex I'm using is (A: |Q: )[\w\W]*(A: |Q: |$).

Here is the same string over multiple lines, just for reference:

Q: This is a question. 
Q: This is a 2nd question 
on two lines. 

A: This is an answer. 
A: This is a 2nd answer 
on two lines.
Q: Here's another question. 
A: And another answer.

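So I was hoping the parentheses would isolate the two possible patterns at the start and the three at the end, but instead it treats them as 4 separate patterns. It would also include the next A: or Q: at the end, but hopefully you get the idea. I was planning to just not use that group or something.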

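If it helps, this is a simple study program that takes questions and answers from a text file to quiz the user. I got it working when each question and answer was on a single line, but I can't get it to capture "A:" or "Q:" entries that span several lines.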
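For reference, here is a quick sketch of what the attempt above actually returns. The variable name `text` is just for this sketch. The parentheses do group the alternatives as intended. The real problem is that the greedy `[\w\W]*` runs to the very end of the input and only then lets `$` match, so `finditer` yields a single match covering everything:

```python
import re

# Sample text from the question.
text = ("Q: This is a question. \nQ: This is a 2nd question \non two lines. \n\n"
        "A: This is an answer. \nA: This is a 2nd answer \non two lines.\n"
        "Q: Here's another question. \nA: And another answer.")

# The original pattern: greedy [\w\W]* consumes the rest of the string,
# then the closing group is satisfied by $ at the very end.
pattern = re.compile(r"(A: |Q: )[\w\W]*(A: |Q: |$)")

matches = [m.group(0) for m in pattern.finditer(text)]
print(len(matches))         # 1
print(matches[0] == text)   # True: one match spanning the whole input
```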

Answer 1

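I'd suggest using a for loop for this, since it's easier, at least for me. To answer your question: why not just match up to the period instead of up to the next Q:? Otherwise you may have to use a lookahead.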

(A: |Q: )[\s\S]*?\.

[\s\S] (commonly used to match any character, although [\w\W] works too)

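*? is a lazy quantifier: it matches as few characters as possible. If we had only (A: |Q: )[\s\S]*?, it would match just the (A: |Q: ) part, but the trailing \. forces it to extend up to the next period.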

\. matches a literal period.
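Here is a minimal sketch of the lazy pattern on the sample text from the question. It assumes every entry ends at its first period. The last two lines show the weak spot: an entry with a period inside it gets cut short.

```python
import re

# Sample text from the question.
text = ("Q: This is a question. \nQ: This is a 2nd question \non two lines. \n\n"
        "A: This is an answer. \nA: This is a 2nd answer \non two lines.\n"
        "Q: Here's another question. \nA: And another answer.")

# Lazy [\s\S]*? stops at the first literal period after each "Q: " or "A: ".
pattern = re.compile(r"(A: |Q: )[\s\S]*?\.")

entries = [m.group(0) for m in pattern.finditer(text)]
for entry in entries:
    print("-", repr(entry))

# Caveat: a period inside an entry ends the match early.
cut = pattern.search("Q: What is 1.5 times 2?").group(0)
print(cut)  # Q: What is 1.
```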

The for loop:

# `string` is the sample text from the question
questions_and_answers = []
for line in string.splitlines():
    if line.startswith(("Q: ", "A: ")):
        # a new question or answer starts on this line
        questions_and_answers.append(line)
    else:
        # continuation (or blank) line: append it to the previous entry
        questions_and_answers[-1] += line

print(questions_and_answers)
# ['Q: This is a question. ', 'Q: This is a 2nd question on two lines. ', 'A: This is an answer. ', 'A: This is a 2nd answer on two lines.', "Q: Here's another question. ", 'A: And another answer.']
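Below is a self-contained variant of the loop, wrapped in a helper I've called `split_entries` (the name is hypothetical). It also does three things the loop above does not:

- It skips any text before the first "Q: "/"A: ". In the loop above, that text would make `questions_and_answers[-1]` raise `IndexError`.
- It drops blank lines.
- It joins continuation lines with a single space.

```python
# Sample text from the question.
text = ("Q: This is a question. \nQ: This is a 2nd question \non two lines. \n\n"
        "A: This is an answer. \nA: This is a 2nd answer \non two lines.\n"
        "Q: Here's another question. \nA: And another answer.")


def split_entries(text):
    """Group lines into Q:/A: entries (hypothetical helper for this sketch)."""
    entries = []
    for line in text.splitlines():
        if line.startswith(("Q: ", "A: ")):
            entries.append(line.rstrip())        # start of a new entry
        elif entries and line.strip():
            entries[-1] += " " + line.strip()    # continuation line
        # anything else (blank lines, text before the first marker) is skipped
    return entries


result = split_entries(text)
for entry in result:
    print(entry)
```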

Answer 2

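One way is to use a negative lookahead, (?!...), so that a newline is matched only when it is not followed by an A: or Q: block, like this: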

^([AQ]):(?:.|\n(?![AQ]:))+

You can also try it out on Regex Demo.
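In Python, `^` matches at the start of every line only when `re.MULTILINE` is passed. Using this pattern therefore looks roughly like the sketch below. `group(1)` tells you whether the entry is a question or an answer. An entry followed by a blank line keeps a trailing `\n`, so you may want to `.strip()` it.

```python
import re

# Sample text from the question.
text = ("Q: This is a question. \nQ: This is a 2nd question \non two lines. \n\n"
        "A: This is an answer. \nA: This is a 2nd answer \non two lines.\n"
        "Q: Here's another question. \nA: And another answer.")

pattern = re.compile(r"^([AQ]):(?:.|\n(?![AQ]:))+", re.MULTILINE)

matches = list(pattern.finditer(text))
kinds = [m.group(1) for m in matches]     # "Q" or "A"
entries = [m.group(0) for m in matches]   # the full, possibly multi-line entry
for kind, entry in zip(kinds, entries):
    print(kind, repr(entry))

# Without re.MULTILINE, ^ only matches at position 0, so only one entry is found.
without_flag = re.findall(pattern.pattern, text)
print(len(without_flag))  # 1
```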

Here is another approach suggested by @Wiktor, which should be a bit faster:

^[AQ]:.*(?:\n+(?![AQ]:).+)*
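Here is a sketch of this faster pattern, again with `re.MULTILINE`. The pattern has no capturing groups, so `findall` returns the whole matches. Each `\n+` must be followed by at least one character (`.+`), so an entry never ends with a trailing newline:

```python
import re

# Sample text from the question.
text = ("Q: This is a question. \nQ: This is a 2nd question \non two lines. \n\n"
        "A: This is an answer. \nA: This is a 2nd answer \non two lines.\n"
        "Q: Here's another question. \nA: And another answer.")

pattern = re.compile(r"^[AQ]:.*(?:\n+(?![AQ]:).+)*", re.MULTILINE)

entries = pattern.findall(text)
for entry in entries:
    print(repr(entry))
```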

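A slight modification matches a single \n followed by .* instead of \n+ followed by .+ (but note that this also captures blank lines at the end of an entry):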

^[AQ]:.*(?:\n(?![AQ]:).*)*
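A small side-by-side sketch of the two variants on the same text. It shows the trailing blank line that the modified pattern picks up for the entry followed by an empty line. The names `strict` and `loose` are just labels for this sketch.

```python
import re

# Sample text from the question.
text = ("Q: This is a question. \nQ: This is a 2nd question \non two lines. \n\n"
        "A: This is an answer. \nA: This is a 2nd answer \non two lines.\n"
        "Q: Here's another question. \nA: And another answer.")

strict = re.compile(r"^[AQ]:.*(?:\n+(?![AQ]:).+)*", re.MULTILINE)
loose = re.compile(r"^[AQ]:.*(?:\n(?![AQ]:).*)*", re.MULTILINE)

strict_entries = strict.findall(text)
loose_entries = loose.findall(text)

print(repr(strict_entries[1]))  # 'Q: This is a 2nd question \non two lines. '
print(repr(loose_entries[1]))   # 'Q: This is a 2nd question \non two lines. \n'
```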


Tags: python, regex, python-re