正则表达式:多行,非贪婪匹配直到可选字符串
regexp: multiline, non-greedy match until optional string
使用 Go 的正则表达式,我试图从原始文本中提取一组预定义的有序键值(多行)对,其最后一个元素可能是可选的,例如,
Key1:
SomeValue1
MoreValue1
Key2:
SomeValue2
MoreValue2
OptionalKey3:
SomeValue3
MoreValue3
(在这里,我想提取所有值作为命名组)
如果我使用默认的贪婪模式 (?s:Key1:\n(?P<Key1>.*)Key2:\n(?P<Key2>.*)(?:OptionalKey3:\n(?P<OptionalKey3>.*))?)
,它永远不会看到 OptionalKey3 并将其余文本匹配为 Key2。
如果我使用非贪婪模式 (?s:Key1:\n(?P<Key1>.*)Key2:\n(?P<Key2>.*?)(?:OptionalKey3:\n(?P<OptionalKey3>.*))?)
,它甚至看不到 SomeValue2 并立即停止:https://regex101.com/r/QE2g3o/1
有没有一种方法可以选择性地匹配 OptionalKey3,同时还能捕获所有其他的?
使用
(?s)\AKey1:\n(?P<Key1>.*)Key2:\n(?P<Key2>.*?)(?:OptionalKey3:\n(?P<OptionalKey3>.*))?\z
参见regex proof。
解释
--------------------------------------------------------------------------------
(?s) set flags for this block (with . matching
\n) (case-sensitive) (with ^ and $
matching normally) (matching whitespace
and # normally)
--------------------------------------------------------------------------------
\A the beginning of the string
--------------------------------------------------------------------------------
Key1: 'Key1:'
--------------------------------------------------------------------------------
\n '\n' (newline)
--------------------------------------------------------------------------------
(?P<Key1> group and capture to "Key1":
--------------------------------------------------------------------------------
.* any character (0 or more times (matching
the most amount possible))
--------------------------------------------------------------------------------
) end of "Key1"
--------------------------------------------------------------------------------
Key2: 'Key2:'
--------------------------------------------------------------------------------
\n '\n' (newline)
--------------------------------------------------------------------------------
(?P<Key2> group and capture to "Key2":
--------------------------------------------------------------------------------
.*? any character (0 or more times (matching
the least amount possible))
--------------------------------------------------------------------------------
) end of "Key2"
--------------------------------------------------------------------------------
(?: group, but do not capture (optional
(matching the most amount possible)):
--------------------------------------------------------------------------------
OptionalKey3: 'OptionalKey3:'
--------------------------------------------------------------------------------
\n '\n' (newline)
--------------------------------------------------------------------------------
(?P<OptionalKey3> group and capture to "OptionalKey3":
--------------------------------------------------------------------------------
.* any character (0 or more times
(matching the most amount possible))
--------------------------------------------------------------------------------
) end of "OptionalKey3"
--------------------------------------------------------------------------------
)? end of grouping
--------------------------------------------------------------------------------
\z the end of the string
使用 Go 的正则表达式,我试图从原始文本中提取一组预定义的有序键值(多行)对,其最后一个元素可能是可选的,例如,
Key1:
SomeValue1
MoreValue1
Key2:
SomeValue2
MoreValue2
OptionalKey3:
SomeValue3
MoreValue3
(在这里,我想提取所有值作为命名组)
如果我使用默认的贪婪模式 (?s:Key1:\n(?P<Key1>.*)Key2:\n(?P<Key2>.*)(?:OptionalKey3:\n(?P<OptionalKey3>.*))?)
,它永远不会看到 OptionalKey3 并将其余文本匹配为 Key2。
如果我使用非贪婪模式 (?s:Key1:\n(?P<Key1>.*)Key2:\n(?P<Key2>.*?)(?:OptionalKey3:\n(?P<OptionalKey3>.*))?)
,它甚至看不到 SomeValue2 并立即停止:https://regex101.com/r/QE2g3o/1
有没有一种方法可以选择性地匹配 OptionalKey3,同时还能捕获所有其他的?
使用
(?s)\AKey1:\n(?P<Key1>.*)Key2:\n(?P<Key2>.*?)(?:OptionalKey3:\n(?P<OptionalKey3>.*))?\z
参见regex proof。
解释
--------------------------------------------------------------------------------
(?s) set flags for this block (with . matching
\n) (case-sensitive) (with ^ and $
matching normally) (matching whitespace
and # normally)
--------------------------------------------------------------------------------
\A the beginning of the string
--------------------------------------------------------------------------------
Key1: 'Key1:'
--------------------------------------------------------------------------------
\n '\n' (newline)
--------------------------------------------------------------------------------
(?P<Key1> group and capture to "Key1":
--------------------------------------------------------------------------------
.* any character (0 or more times (matching
the most amount possible))
--------------------------------------------------------------------------------
) end of "Key1"
--------------------------------------------------------------------------------
Key2: 'Key2:'
--------------------------------------------------------------------------------
\n '\n' (newline)
--------------------------------------------------------------------------------
(?P<Key2> group and capture to "Key2":
--------------------------------------------------------------------------------
.*? any character (0 or more times (matching
the least amount possible))
--------------------------------------------------------------------------------
) end of "Key2"
--------------------------------------------------------------------------------
(?: group, but do not capture (optional
(matching the most amount possible)):
--------------------------------------------------------------------------------
OptionalKey3: 'OptionalKey3:'
--------------------------------------------------------------------------------
\n '\n' (newline)
--------------------------------------------------------------------------------
(?P<OptionalKey3> group and capture to "OptionalKey3":
--------------------------------------------------------------------------------
.* any character (0 or more times
(matching the most amount possible))
--------------------------------------------------------------------------------
) end of "OptionalKey3"
--------------------------------------------------------------------------------
)? end of grouping
--------------------------------------------------------------------------------
\z the end of the string