Grok: extracting data from a matched pattern
I have this message as input:
Feb 18 04:35:46 xxxx zzzz-nginx_error 2016/02/18 04:35:39 [error] 28585#0: *3120 FastCGI sent in stderr: "Primary script unknown" while reading response header from upstream, client: xx.xx.xx.xx, server: xxxxxx, request: "HEAD / HTTP/1.1", upstream: "fastcgi://unix:/var/run/default.sock:", host: "xxxxxx"
I am parsing it with:
    grok {
      match => {
        "message" => [
          "(?<logstamp>\h{3} \d{2} \d{2}:\d{2}:\d{2}) (?<hostname>[^\s]+) (?<source>[^\s]+) (?<ngxstamp>[^\s]+ [^\s]+) %{GREEDYDATA:log}"
        ]
      }
    }
This works well, but I also want to extract client: xx.xx.xx.xx while still keeping it inside %{GREEDYDATA:log}.
I tried:
"(?<logstamp>\h{3} \d{2} \d{2}:\d{2}:\d{2}) (?<hostname>[^\s]+) (?<source>[^\s]+) (?<ngxstamp>[^\s]+ [^\s]+) %{DATA:log} (?<client>%{IP})%{GREEDYDATA:log}"
but that just breaks the output into:
log: [error] 28585#0: *3120 FastCGI sent in stderr: "Primary script unknown" while reading response header from upstream, client:, , server: xxxxxx, request: "HEAD / HTTP/1.1", upstream: "fastcgi://unix:/var/run/default.sock:", host: "xxxxxx"
client: xx.xx.xx.xx
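(note that the IP has been cut out of log)
Can I extract just the data I need and leave log intact, or do I have to join the pieces back together with something like: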
    mutate {
      replace => {
        "log" => "%{DATA:log} (?<client>%{IP})%{GREEDYDATA:log}"
      }
    }
?
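I just realized the answer was right in front of me. Here is the pattern, with the trailing part captured into a separate field, log2: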
"(?<logstamp>\h{3} \d{2} \d{2}:\d{2}:\d{2}) (?<hostname>[^\s]+) (?<source>[^\s]+) (?<ngxstamp>[^\s]+ [^\s]+) %{DATA:log} (?<client>%{IP})%{GREEDYDATA:log2}"
And here is the join, which rebuilds log from the three captured pieces:
    mutate {
      replace => {
        "log" => "%{log} %{client}%{log2}"
      }
    }
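
For comparison, here is a sketch of an alternative that avoids the rejoin step. It is not the approach used above, just another option: keep the original %{GREEDYDATA:log} pattern and run a second grok filter against the log field. Grok patterns are not anchored unless you add ^ or $ yourself, so a short pattern can pick the client address out of log without altering it. The tag_on_failure setting is my assumption about how lines without a client should be treated.

```
filter {
  # First pass: same pattern as in the question; log keeps the whole remainder.
  grok {
    match => {
      "message" => [
        "(?<logstamp>\h{3} \d{2} \d{2}:\d{2}:\d{2}) (?<hostname>[^\s]+) (?<source>[^\s]+) (?<ngxstamp>[^\s]+ [^\s]+) %{GREEDYDATA:log}"
      ]
    }
  }

  # Second pass: unanchored match against log. It only adds the client field
  # and leaves log itself unchanged.
  grok {
    match => { "log" => "client: %{IP:client}" }
    # Assumption: a line with no client address is not a parse failure,
    # so do not add the default _grokparsefailure tag.
    tag_on_failure => []
  }
}
```

With this layout there is no log2 field to clean up afterwards. One thing worth checking in either version: as far as I know, \h in the Oniguruma-style regex that grok uses matches a hexadecimal digit, so \h{3} happens to match "Feb" but would not match "Jan" or "Mar". The stock %{SYSLOGTIMESTAMP} pattern is probably a safer fit for that leading timestamp.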