Get the second match by regex

Question

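I want to use a regular expression to get the second occurrence of the matching pattern (inside the square brackets). Here is the text: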

[2019-07-29 09:48:11,928] @hr.com [2] [AM] WARN

I want to extract 2 from it. I tried using

(?<Ten ID>((^)*((?<=\[).+?(?=\]))))

But it matches 2019-07-29 09:48:11,928, 2 and AM. How can I get only 2?

Answer 1

You can use

/\[[^\]]*\][^\[]*\[([^\]]*)\]/

The contents of the second bracketed substring are in capture group 1.
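Here is a minimal sketch of Answer 1's pattern in use. It assumes the log line is already in a Ruby string. The constant name SECOND_BRACKET is only an illustrative choice. String#[] with a group index returns just that group, or nil when nothing matches.

```ruby
# Hypothetical constant name; the pattern is Answer 1's, with the '[' in
# the middle character class escaped.
SECOND_BRACKET = /\[[^\]]*\][^\[]*\[([^\]]*)\]/

line = "[2019-07-29 09:48:11,928] @hr.com [2] [AM] WARN"

# String#[](regexp, capture) returns the given capture group, or nil.
value = line[SECOND_BRACKET, 1]
puts value # prints 2

# Regexp#match returns a MatchData object (or nil) when you need more detail.
m = SECOND_BRACKET.match(line)
p m.captures if m # prints ["2"]
```

A line with only one bracketed part, such as "[only one]", gives nil instead of a value.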

Answer 2

If you know it is always the second match, you can use scan and take the second result:

"[2019-07-29 09:48:11,928] @hr.com [2] [AM] WARN".scan(/\[([^\]]*)\]/)[1].first
# => "2"
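The expression scan(...)[1].first raises NoMethodError when the string has fewer than two bracketed parts, because [1] is then nil. A safer variant can use Array#dig, which returns nil instead. The sketch below does this; the method name nth_bracket is hypothetical.

```ruby
# Hypothetical helper: contents of the nth (1-based) [...] substring, or nil.
def nth_bracket(str, n)
  return nil unless n.is_a?(Integer) && n >= 1

  # With one capture group, scan returns an array of one-element arrays,
  # e.g. [["2019-07-29 09:48:11,928"], ["2"], ["AM"]] for the sample line.
  # dig(n - 1, 0) walks into that structure and returns nil if it is too short.
  str.scan(/\[([^\]]*)\]/).dig(n - 1, 0)
end

line = "[2019-07-29 09:48:11,928] @hr.com [2] [AM] WARN"
p nth_bracket(line, 2) # prints "2"
p nth_bracket(line, 5) # prints nil
```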

Answer 3

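To get a substring between [ and ] (square brackets) that contains no other square brackets, you can use the /\[([^\]\[]*)\]/ regex:

\[ - a [ character
([^\]\[]*) - Capturing group 1: zero or more characters other than [ and ]
\] - a ] character.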

To get the second match, you can use

str = '[2019-07-29 09:48:11,928] @hr.com [2] [AM] WARN'
p str[/\[[^\]\[]*\].*?\[([^\]\[]*)\]/m, 1]

See this Ruby demo. Here,

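\[[^\]\[]*\] - matches the first [...] substring
.*? - matches any 0+ characters, as few as possible
\[([^\]\[]*)\] - matches the second [...] substring and captures its inner contents, which are returned thanks to the second argument, 1.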
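The m modifier only matters when the two bracketed parts are on different lines, because in Ruby m lets . match a newline. The small check below uses a made-up two-line string (hypothetical input, not from the question).

```ruby
# Same pattern as above, with and without the m (dot-matches-newline) flag.
pattern_m    = /\[[^\]\[]*\].*?\[([^\]\[]*)\]/m
pattern_no_m = /\[[^\]\[]*\].*?\[([^\]\[]*)\]/

# Hypothetical input: the two [...] parts are separated by a line break.
text = "[first]\n[second]"

p text[pattern_m, 1]    # prints "second": .*? may cross the line break
p text[pattern_no_m, 1] # prints nil: without m, . cannot match "\n"
```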

To get the Nth match, you may also consider using

str = '[2019-07-29 09:48:11,928] @hr.com [2] [AM] WARN'
result = ''
cnt = 0
str.scan(/\[([^\]\[]*)\]/) do |match|
  result = match[0]   # contents of the current [...] substring
  cnt += 1
  break if cnt >= 2   # stop scanning after the second match
end
puts result #=> 2

See the Ruby demo.

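Note that if there are fewer matches than expected, this solution returns the last matched substring (or an empty string if there are no matches at all).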
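If you would rather get nil than the last match, one option is to return from inside the scan block. The sketch below does this; the helper name nth_bracket_content is hypothetical. Because return in a block exits the enclosing method, scanning also stops as soon as the nth match is found.

```ruby
# Hypothetical helper: like the loop above, but returns nil (instead of
# the last match) when there are fewer than n bracketed substrings.
def nth_bracket_content(str, n)
  count = 0
  str.scan(/\[([^\]\[]*)\]/) do |groups|
    count += 1
    # `return` inside the block exits the method, so scanning stops early.
    return groups.first if count == n
  end
  nil
end

str = '[2019-07-29 09:48:11,928] @hr.com [2] [AM] WARN'
p nth_bracket_content(str, 2) # prints "2"
p nth_bracket_content(str, 4) # prints nil
```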

Another solution, which is not generic and works only in this specific case: extract the first integer that appears inside square brackets:

s = "[2019-07-29 09:48:11,928] @hr.com [2] [AM] WARN"
puts s[/\[(\d+)\]/, 1] # => 2

See the Ruby demo.

To use the regex in Fluentd, use

\[(?<val>\d+)\]

The value you need is in the val named group. \[ matches [, (?<val>\d+) is a named capturing group that matches 1+ digits, and \] matches ].
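Fluentd is written in Ruby, and its regex-based parsing uses Ruby regular expressions. So one way to sanity-check the pattern before putting it in the config is in plain Ruby. Here is a minimal sketch using the sample line from the question.

```ruby
# The named-group pattern from above.
pattern = /\[(?<val>\d+)\]/

line = "[2019-07-29 09:48:11,928] @hr.com [2] [AM] WARN"

# The first "[" is followed by "2019-", not by digits and "]", so the
# match lands on "[2]".
m = pattern.match(line)

puts m[:val]        # prints 2
p m.named_captures  # a Hash mapping "val" to "2"
```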

Fluentular shows:

Copy and paste to fluent.conf or td-agent.conf

     
      type tail 
      path /var/log/foo/bar.log 
      pos_file /var/log/td-agent/foo-bar.log.pos 
      tag foo.bar 
      format /\[(?\d+)\]/

Records

 Key    Value
 val    2

Answer 4

def nth_match(str, n)
  str[/(?:[^\[]*\[){#{n}}([^\]]*)\]/, 1]
end

str = "Little [Miss] Muffet [sat] on a [tuffet] eating [pie]."

nth_match(str, 1)  #=> "Miss" 
nth_match(str, 2)  #=> "sat" 
nth_match(str, 3)  #=> "tuffet" 
nth_match(str, 4)  #=> "pie" 
nth_match(str, 5)  #=> nil

We can write the regular expression in free-spacing mode to document it.

/
(?:       # begin a non-capture group
  [^\[]*  # match zero or more characters other than '['
  \[      # match '['
){#{n}}   # end non-capture group and execute it n times
(         # start capture group 1,
  [^\]]*  # match zero or more characters other than ']' 
)         # end capture group 1
\]        # match ']'
/x        # free-spacing regex definition mode

This is equivalent to the compact form used in nth_match:

/(?:[^\[]*\[){#{n}}([^\]]*)\]/
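The documented free-spacing pattern can also be used directly inside a method. The name nth_match_x and the check on n are additions for illustration, not part of the answer. The check matters because with n = 0 the {0} quantifier matches the empty string. The pattern then captures from the start of the string instead (for the sample string, "Little [Miss").

```ruby
# Hypothetical variant of nth_match built from the free-spacing pattern,
# rejecting n < 1 (and non-Integer n, which would be interpolated as-is).
def nth_match_x(str, n)
  raise ArgumentError, "n must be a positive Integer" unless n.is_a?(Integer) && n >= 1

  re = /
    (?:       # non-capture group
      [^\[]*  # zero or more characters other than '['
      \[      # a '['
    ){#{n}}   # repeat the group n times
    (         # capture group 1
      [^\]]*  # zero or more characters other than ']'
    )
    \]        # a ']'
  /x
  str[re, 1]
end

str = "Little [Miss] Muffet [sat] on a [tuffet] eating [pie]."
p nth_match_x(str, 2) # prints "sat"
p nth_match_x(str, 5) # prints nil
```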
