Non-greedy regex acts greedy based on the position of atoms in the regex

I ran into a situation where I wanted to use the non-greedy atom .*? in a regex pattern.

set input "Device ID: HOST1
Interface: GigabitEthernet0/1,  Port ID (outgoing port): GigabitEthernet2/43
Device ID: HOST2
Entry address(es):
Interface: GigabitEthernet0/2,  Port ID (outgoing port): GigabitEthernet2/43
"

puts "======== Non-Greedy regex starting with some other patterns ========"
puts [ regexp -inline {Device\s+ID:.*?outgoing\s+port\):\s+} $input]
puts "======== Non-Greedy regex at first ========"
puts [ regexp -inline {.*?outgoing\s+port\):\s+} $input]

Output:

======== Non-Greedy regex starting with some other patterns ========
{Device ID: HOST1
Interface: GigabitEthernet0/1,  Port ID (outgoing port): GigabitEthernet2/43
Device ID: HOST2
Entry address(es):
Interface: GigabitEthernet0/2,  Port ID (outgoing port): }
======== Non-Greedy regex at first ========
{Device ID: HOST1
Interface: GigabitEthernet0/1,  Port ID (outgoing port): }

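While .*?outgoing\s+port\):\s+ matches only up to the first occurrence, the pattern Device\s+ID:.*?outgoing\s+port\):\s+ does not stop at the first occurrence and runs on to the last one.

Why is the behavior of a non-greedy match affected by where the atoms are placed?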

It's not well documented (IMO), but the re_syntax man page says this about greedy/non-greedy preference:

A branch has the same preference as the first quantified atom in it which has a preference.

(emphasis mine)

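So if you have .* as the first quantifier, the whole RE will be greedy,
and if you have .*? as the first quantifier, the whole RE will be non-greedy.
In your first pattern the first quantified atom is the greedy \s+ in Device\s+ID:, so the whole branch is greedy and the later .*? is treated as greedy too.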
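
One possible workaround, following from the rule quoted above, is to make the first quantified atom non-greedy as well. This is only a sketch: it reuses the sample input from the question, and the variable name match is my own. Changing \s+ to \s+? after Device should give the whole branch a non-greedy preference, so the match is expected to stop at the first "outgoing port):" rather than the last.

```tcl
# Sample input copied from the question.
set input "Device ID: HOST1
Interface: GigabitEthernet0/1,  Port ID (outgoing port): GigabitEthernet2/43
Device ID: HOST2
Entry address(es):
Interface: GigabitEthernet0/2,  Port ID (outgoing port): GigabitEthernet2/43
"

# \s+? is now the first quantified atom, and it is non-greedy,
# so the branch as a whole takes a non-greedy preference.
set match [lindex [regexp -inline {Device\s+?ID:.*?outgoing\s+port\):\s+} $input] 0]
puts $match
```

Note that with a non-greedy branch every quantifier in it prefers the shortest match, including the trailing \s+, which will then consume only as much whitespace as it must. That is harmless here because only one space follows the colon.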