Non-greedy regex acts greedy depending on the position of atoms in the regex
I ran into a situation where I wanted to use the non-greedy atom .*? in a regex pattern.
set input "Device ID: HOST1
Interface: GigabitEthernet0/1, Port ID (outgoing port): GigabitEthernet2/43
Device ID: HOST2
Entry address(es):
Interface: GigabitEthernet0/2, Port ID (outgoing port): GigabitEthernet2/43
"
puts "======== Non-Greedy regex starting with some other patterns ========"
puts [ regexp -inline {Device\s+ID:.*?outgoing\s+port\):\s+} $input]
puts "======== Non-Greedy regex at first ========"
puts [ regexp -inline {.*?outgoing\s+port\):\s+} $input]
Output:
======== Non-Greedy regex starting with some other patterns ========
{Device ID: HOST1
Interface: GigabitEthernet0/1, Port ID (outgoing port): GigabitEthernet2/43
Device ID: HOST2
Entry address(es):
Interface: GigabitEthernet0/2, Port ID (outgoing port): }
======== Non-Greedy regex at first ========
{Device ID: HOST1
Interface: GigabitEthernet0/1, Port ID (outgoing port): }
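The pattern .*?outgoing\s+port\):\s+ matches only up to the first occurrence, but the pattern Device\s+ID:.*?outgoing\s+port\):\s+ does not stop at the first occurrence and runs on to the last one.
Why does the placement of atoms affect the behavior of non-greedy matching?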
It's not very well documented (IMO), but the re_syntax man page says this about greedy/non-greedy preference:
A branch has the same preference as the first quantified atom in it which has a preference.
(Emphasis mine.)
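So if the first quantifier in the RE is greedy, such as .*, the whole RE will be greedy, and if the first quantifier is non-greedy, such as .*?, the whole RE will be non-greedy. In Device\s+ID:.*?outgoing\s+port\):\s+ the first quantified atom is the greedy \s+ after Device, so the branch is greedy and the later .*? does not make the overall match short.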
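Following that rule, one possible fix is to make the first quantifier in the pattern non-greedy, so the whole branch takes the non-greedy preference. The snippet below is a minimal sketch of that idea, not taken from the man page: it reuses the input from the question and changes only the first \s+ to \s+?. The variable names greedy and nongreedy are just illustrative. With this change the match should stop at the first "outgoing port):", the same place where the .*?-first pattern stopped in the question.

```tcl
# Same sample text as in the question.
set input "Device ID: HOST1
Interface: GigabitEthernet0/1, Port ID (outgoing port): GigabitEthernet2/43
Device ID: HOST2
Entry address(es):
Interface: GigabitEthernet0/2, Port ID (outgoing port): GigabitEthernet2/43
"

# Original pattern: the first quantified atom is the greedy \s+,
# so the whole branch prefers the longest match.
set greedy [lindex [regexp -inline {Device\s+ID:.*?outgoing\s+port\):\s+} $input] 0]

# Variant (an assumption-based sketch): the first quantified atom is now
# the non-greedy \s+?, so the whole branch should prefer the shortest match.
set nongreedy [lindex [regexp -inline {Device\s+?ID:.*?outgoing\s+port\):\s+} $input] 0]

puts "greedy match length:     [string length $greedy]"
puts "non-greedy match length: [string length $nongreedy]"
puts [list $nongreedy]
```

Because the preference applies to the whole branch, every other quantifier can stay as written; only the first one needs the ? suffix.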