How would I make this Regex expression more inclusive and accurate?

Question

I am using Python 2.7 to find text within a larger body of text. The following is part of the text I am extracting from:

Item 1 for Product A: Flour
Solution 1 for Product A: Water
Items 2 for Product B: Milk
Solution 2 for Product B: Oil
Item 3 for Product C: Onions

Method

I have the following Python code to extract the specific information I want:

extract = re.findall(r"(?<=Item|s\s).*(?=\sSolution)", page_content)

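While this extracts some of the information, I am unable to extract everything I need. The result has to include the word "Item", and I cannot extract the last item because the word that follows it is not "Solution" but "Method".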

My desired output is:

Item 1 for Product A: Flour
Items 2 for Product B: Milk
Item 3 for Product C: Onions

Any help improving the regular expression would be greatly appreciated.

Thanks

Answer 1

If your input looks like

Item 1 for Product A: FlourSolution 1 for Product A: WaterItems 2 for Product B: MilkSolution 2 for Product B: OilItem 3 for Product C: Onions

Method

the following pattern gives you the desired output.

r'(Item[s]{0,1}.*?\:\s[A-Z][a-z]*[^A-Z])'

See it here: https://regex101.com/r/ucPdcV/2
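As a rough sketch of how the pattern might be used from Python, the snippet below applies it with re.findall to the run-together input shown above. Two things here are assumptions rather than part of the answer: each match is stripped, because the trailing [^A-Z] can consume a newline after the last entry, and the second pattern is a hypothetical line-anchored alternative for the case where every entry sits on its own line, as in the question. Note that the answer's pattern expects the value after the colon to be a single capitalized word.

```python
import re

# The pattern from the answer, unchanged.
PATTERN = r'(Item[s]{0,1}.*?\:\s[A-Z][a-z]*[^A-Z])'

# Assumed input: entries run together, followed by a blank line and "Method".
page_content = (
    "Item 1 for Product A: Flour"
    "Solution 1 for Product A: Water"
    "Items 2 for Product B: Milk"
    "Solution 2 for Product B: Oil"
    "Item 3 for Product C: Onions\n\nMethod"
)

# Strip each match: the final [^A-Z] may have matched a trailing newline.
extract = [m.strip() for m in re.findall(PATTERN, page_content)]
print(extract)

# Hypothetical alternative (not from the answer): if each entry is on its
# own line, keep every line that starts with "Item" or "Items".
line_content = "\n".join([
    "Item 1 for Product A: Flour",
    "Solution 1 for Product A: Water",
    "Items 2 for Product B: Milk",
    "Solution 2 for Product B: Oil",
    "Item 3 for Product C: Onions",
    "",
    "Method",
])
line_extract = re.findall(r'^Items?\b.*$', line_content, flags=re.MULTILINE)
print(line_extract)
```

The line-anchored version does not depend on what word follows an entry, so it should not matter whether the next line begins with "Solution" or "Method".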


python

regex

regex-group

regex-lookarounds