How to capture identical groups along with non identical groups

Question

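I am trying to capture two types of comments in code. The comment types are /*...*/ and //...

My pattern: r'(/\*.*?\*/)|(//.*?)/'

The pattern above does not capture two similar multi-line comment groups (/*...*/) when they are separated by a single-line comment group (//...).

What is wrong with this pattern?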

import re
p=re.compile(r'(/\*.*?\*/)|(//.*?)/')
s='/*first multiline*/ //other comment /*second multiline*/'

Expected:
["/*first multiline*/",""]
["","//other comment "]
["/*second multiline*/",""]

Actual:
["/*first multiline*/",""]
["","//other comment "]

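Note: I know this does not work for comments that span multiple lines. I only want to understand what is wrong with the pattern above for the given input.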

Answer 1

Use this:

re.compile(r'(/\*.*?\*/)|(//[^/\r\n]+)', re.DOTALL)

Demo: https://regex101.com/r/T4cM98/3
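As a quick check, here is a minimal sketch that runs the suggested pattern against the input string from the question. The variable names fixed and matches are only illustrative. Note that findall returns tuples rather than lists, one (group 1, group 2) pair per match.

```python
import re

# Input string taken from the question.
s = '/*first multiline*/ //other comment /*second multiline*/'

# Pattern suggested in this answer.
fixed = re.compile(r'(/\*.*?\*/)|(//[^/\r\n]+)', re.DOTALL)

# With two capture groups, findall yields one (group1, group2) tuple per match;
# the group that did not take part in the match is an empty string.
matches = fixed.findall(s)
for m in matches:
    print(m)
```

All three comments are found, with the second multi-line comment now reported as its own match.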

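The problem is the last slash in your pattern, after the second capture group: it consumes the first slash of the match you expected to come next: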

r'(/\*.*?\*/)|(//.*?)/'
                     ^ here
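One way to see this is to print the span of each overall match for the original pattern. This is only a sketch for the input from the question; the names original and spans are illustrative.

```python
import re

s = '/*first multiline*/ //other comment /*second multiline*/'

# Original pattern from the question, including the trailing slash.
original = re.compile(r'(/\*.*?\*/)|(//.*?)/')

# group(0) is the whole match, not just the captured group.
spans = [(m.span(), m.group(0)) for m in original.finditer(s)]
for span, text in spans:
    print(span, repr(text))
```

The second overall match is '//other comment /', so the scan resumes at the '*' of the second multi-line comment. From there, /\* can no longer match, because its opening slash has already been consumed.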

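However, once that slash is removed, the non-greedy ? makes .*? match as little as possible, so the second group captures nothing beyond the double //. Instead, match everything after the // that is not a slash or a line break, which is what the character class [^/\r\n] does.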
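The sketch below illustrates both points on the input from the question: with the trailing slash removed the lazy group stops right after //, and a greedy .* (shown only for comparison, it is not part of the original answer) would instead swallow the second multi-line comment. The names lazy and greedy are illustrative.

```python
import re

s = '/*first multiline*/ //other comment /*second multiline*/'

# Trailing slash removed, lazy quantifier kept: nothing forces .*? to expand,
# so group 2 matches only the two slashes.
lazy = re.compile(r'(/\*.*?\*/)|(//.*?)')
lazy_matches = lazy.findall(s)
print(lazy_matches)

# Greedy variant for comparison: .* runs to the end of the line and takes
# the second multi-line comment with it.
greedy = re.compile(r'(/\*.*?\*/)|(//.*)')
greedy_matches = greedy.findall(s)
print(greedy_matches)
```

Excluding the slash in the character class is what makes the single-line group stop before the next /* on the same line. The trade-off is that a single-line comment containing a slash of its own would be cut short.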

Edit: updated to //[^/\r\n]+ and re.DOTALL, following Wiktor's suggestion.


python

regex

python-2.x