Why does the caret not behave greedily when used with an asterisk in a regular expression?

Question

We all know that * means 0 or more, and that it is as greedy as possible unless it is combined with a non-greedy modifier such as ?.

>>> re.search('.*hello','hai hello there, hello again').group()
'hai hello there, hello'
>>> re.search('.*?hello','hai hello there, hello again').group()
'hai hello'

I just came across the code below and was surprised by its behavior.

>>> re.search(r'\^*','abc^').group()
''
>>> re.search('a*','abc^').group()
'a'

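For the pattern \^*, I expected it to match the one caret that appears in the string.

But why does it behave non-greedily, that is, stop at 0 occurrences of the caret and return an empty-string match?

Is it because ^ is special in regular expressions? If so, how can we match that ^ with the * symbol?

Note: of course, with \^+ as the pattern, it obviously matches the literal caret.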

Answer 1

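As @Wiktor Stribiżew explained, re.search returns only the first match. So:

re.search(r'\^*','abc^').group() returns the empty string; that is, it matches 0 carets at the start of the string and returns.
re.search('a*','abc^').group() matches the single a at the start of the string and returns this a.
re.search('b*','abc^').group() matches the empty string, for the same reason as the caret (case 1).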
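The three cases above can be checked by looking at where each match starts and ends. This is a minimal sketch using only the standard re module; the sample string is the one from the question, and the names text and results are just for illustration.

```python
import re

text = 'abc^'

# For each pattern, record the matched text and its (start, end) span.
results = {}
for pattern in (r'\^*', r'a*', r'b*'):
    m = re.search(pattern, text)
    results[pattern] = (m.group(), m.span())
    print(pattern, repr(m.group()), m.span())

# \^*  ''   (0, 0)   zero carets at position 0
# a*   'a'  (0, 1)   one a at position 0
# b*   ''   (0, 0)   zero b's at position 0
```

In all three cases the match starts at position 0. The quantifier is still greedy; there is simply nothing more to consume at that position.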

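To answer your question "how we can match that ^ with * symbol?":
You can let the pattern first consume everything that is not a caret and capture the carets in a group, for example [^^]*(\^*), then read the result of the group. (The group (\^+)* on its own would still match the empty string at position 0.)

re.search(r'[^^]*(\^*)','abc^^ab').group(1)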

Answer 2

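The regex engine scans the input string from left to right, so your \^* matches the empty string at the very start, and re.search returns only the first match.

When searching for something, you should avoid patterns that can match an empty string, and \^* is a pattern that matches 0 or more ^ symbols. So the best solution is to use + instead of *.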
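A short sketch of the difference, assuming the same sample string as in the question. It uses only re.search and re.finditer from the standard library; the variable names are illustrative.

```python
import re

text = 'abc^'

# With *, the first successful match is the empty string at position 0.
star = re.search(r'\^*', text)

# With +, at least one caret is required, so the engine keeps scanning.
plus = re.search(r'\^+', text)

print(repr(star.group()), star.span())  # '' (0, 0)
print(repr(plus.group()), plus.span())  # '^' (3, 4)

# Listing every match of \^* shows that the caret is matched greedily,
# just not by the first match that re.search returns.
all_star = [(m.group(), m.span()) for m in re.finditer(r'\^*', text)]
print(all_star)
```

The finditer list contains empty matches at the positions before the caret and a '^' match at span (3, 4), which is why + is the simpler fix when only the caret itself is wanted.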

python

regex

regex-greedy