Regex: Exclude from group, string that may or may not exist

Question

I have this text:

##### PRIORITY
- Priority 1
- Priority 2

##### ISSUE TYPE
<!--- comment -->
- Problem / Case
- Requirement

I am trying to get only the options for each category, based on its heading (PRIORITY, ISSUE TYPE).

My regex looks like this:

(?:#####\s?issue type.*?)(?:<!---.*?-->)?(.*?)(?:#####|$)

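I have three groups (heading, comment, content).

The regex works when there is no comment block, but when there is one, it gets captured by my third (content) group. How can I exclude the comment from the third group when it is present?

I tried a negative lookahead like this: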

(?:#####\s?issue type.*?)(?:<!---.*?-->)?(?!(<!---.*?-->).*?)(?:#####|$)

But it doesn't seem to work.

A link to pythex for help.

Answer 1

You can use this regex with a lookahead:

(?:#####\s*issue type.*\s+)(?:<!---.*?-->\s+)?([\s\S]*?)(?=\s*(?:#####|$))

RegEx Demo

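(?=\s*(?:#####|$)) is a lookahead that asserts that, after optional whitespace, either ##### or the end of the input follows the current position. Because a lookahead consumes no characters, the next ##### heading stays available for the following match, which lets you find multiple matches in the given input.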
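To illustrate why the lookahead matters for multiple matches, here is a minimal sketch that compares a consuming terminator with a lookahead terminator. The input text, the single-letter headings, and the simplified patterns are made up for this illustration and are not the pattern from the answer.

```python
import re

# Hypothetical input: three adjacent sections with made-up headings.
text = "##### A\n- a1\n\n##### B\n- b1\n\n##### C\n- c1"

# Consuming terminator: the closing ##### becomes part of the match,
# so it is no longer available as the start of the next section.
consuming = re.compile(r'#####\s*(\w+)\s+([\s\S]*?)\s*(?:#####|$)')

# Lookahead terminator: the closing ##### is only checked, not consumed.
lookahead = re.compile(r'#####\s*(\w+)\s+([\s\S]*?)(?=\s*(?:#####|$))')

consumed_result = consuming.findall(text)
lookahead_result = lookahead.findall(text)

print(consumed_result)   # section B is skipped
print(lookahead_result)  # all three sections are found
```

With the consuming version, the marker that opens section B is used up as the end of section A, so findall resumes after it and only finds A and C.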

Code:

>>> import re
>>> # test_str holds the input text used in the demo (not shown here)
>>> reg = re.compile(r'(?:#####\s*issue type.*\s+)(?:<!---.*?-->\s+)?([\s\S]*?)(?=\s*(?:#####|$))', re.I)
>>> print(reg.findall(test_str))
['- Problem / Case\n- Requirement', '- Problem / Case\n- Requirement']
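As a self-contained sketch, the same pattern can be applied to the sample text from the question and parameterized by heading. The helper name options_for and the SAMPLE constant are hypothetical names introduced here, and splitting the captured block into a list is an extra step that is not part of the answer.

```python
import re

# Sample text from the question.
SAMPLE = """##### PRIORITY
- Priority 1
- Priority 2

##### ISSUE TYPE
<!--- comment -->
- Problem / Case
- Requirement
"""


def options_for(heading, text):
    """Return the list items under `heading`, skipping an optional comment.

    Hypothetical helper: it builds the answer's pattern around the given
    heading instead of the hard-coded "issue type".
    """
    pattern = re.compile(
        r'(?:#####\s*' + re.escape(heading) + r'.*\s+)'  # heading line
        r'(?:<!---.*?-->\s+)?'                           # optional comment
        r'([\s\S]*?)'                                    # content (group 1)
        r'(?=\s*(?:#####|$))',                           # next heading or end
        re.I,
    )
    match = pattern.search(text)
    if match is None:
        return []
    # Strip the leading list marker characters ('-' and spaces) per line.
    return [line.lstrip('- ').strip()
            for line in match.group(1).splitlines()
            if line.strip()]


print(options_for("ISSUE TYPE", SAMPLE))
print(options_for("PRIORITY", SAMPLE))
```

The optional comment group sits after the whitespace that follows the heading line, so the comment is consumed before the capturing group starts. When no comment is present, the optional group matches nothing and the capture is unchanged.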


python

regex

regex-group

regex-lookarounds