Creating a custom GROK pattern
I'm currently trying to create a grok pattern for this log line:
2020-03-11 05:54:26,174 JMXINSTRUMENTS-Threading [{"timestamp":"1583906066","label":"Threading","ObjectName":"java.lang:type\u003dThreading","attributes":[{"name":"CurrentThreadUserTime","value":18600000000},{"name":"ThreadCount","value":152},{"name":"TotalStartedThreadCount","value":1138},{"name":"CurrentThreadCpuTime","value":20804323112},{"name":"PeakThreadCount","value":164},{"name":"DaemonThreadCount","value":136}]}]
So far I can match correctly up to JMXINSTRUMENTS-Threading using this pattern:
%{TIMESTAMP_ISO8601:timestamp} (?<instrument>[^\ ]*) ?%{GREEDYDATA:log_message}
But I can't seem to match any of the values after that. Does anyone know what pattern I should use?
I tried your pattern in https://grokdebug.herokuapp.com/ (an online grok debugger). It does match everything after "JMXINSTRUMENTS-Threading", capturing it in one large field named log_message, like this:
{
"timestamp": [
[
"2020-03-11 05:54:26,174"
]
],
"YEAR": [
[
"2020"
]
],
"MONTHNUM": [
[
"03"
]
],
"MONTHDAY": [
[
"11"
]
],
"HOUR": [
[
"05",
null
]
],
"MINUTE": [
[
"54",
null
]
],
"SECOND": [
[
"26,174"
]
],
"ISO8601_TIMEZONE": [
[
null
]
],
"instrument": [
[
"JMXINSTRUMENTS-Threading"
]
],
"log_message": [
[
"[{"timestamp":"1583906066","label":"Threading","ObjectName":"java.lang:type\u003dThreading","attributes":[{"name":"CurrentThreadUserTime","value":18600000000},{"name":"ThreadCount","value":152},{"name":"TotalStartedThreadCount","value":1138},{"name":"CurrentThreadCpuTime","value":20804323112},{"name":"PeakThreadCount","value":164},{"name":"DaemonThreadCount","value":136}]}]"
]
]
}
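For anyone who wants to check this split outside Logstash, here is a rough Python approximation of what the pattern captures. The timestamp regex is a simplified stand-in for %{TIMESTAMP_ISO8601} (an assumption, not the real grok definition), and the sample line is shortened for readability.

```python
import re

# Simplified stand-in for %{TIMESTAMP_ISO8601}; the real grok pattern
# accepts more variants (time zones, "T" separator, and so on).
LINE_RE = re.compile(
    r"(?P<timestamp>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) "
    r"(?P<instrument>[^ ]*) ?"   # same idea as (?<instrument>[^\ ]*) ?
    r"(?P<log_message>.*)"       # same idea as %{GREEDYDATA:log_message}
)

# Shortened version of the log line from the question.
line = (
    '2020-03-11 05:54:26,174 JMXINSTRUMENTS-Threading '
    '[{"timestamp":"1583906066","label":"Threading",'
    '"attributes":[{"name":"ThreadCount","value":152}]}]'
)

fields = LINE_RE.match(line).groupdict()
for name, value in fields.items():
    print(f"{name}: {value}")
```

The whole JSON payload ends up in a single string, which is why a second parsing step is needed for the individual values.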
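If you want to match all the fields contained in log_message, you should use the json filter in the filter section of your Logstash pipeline, right below the grok filter.
For example: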
grok {
  match => { "message" => "%{TIMESTAMP_ISO8601:timestamp} (?<instrument>[^\ ]*) ?%{GREEDYDATA:log_message}" }
  tag_on_failure => ["no_match"]
}
if "no_match" not in [tags] {
  json {
    source => "log_message"
  }
}
This way, your JSON will be parsed and split into key: value pairs.
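As a rough illustration of what JSON parsing yields for this payload, here is a minimal Python sketch using the standard json module. It only mimics the idea and is not how the Logstash plugin is implemented; the payload is shortened to two attributes.

```python
import json

# Shortened payload from the question; the raw string keeps the
# literal \u003d escape exactly as it appears in the log.
log_message = (
    r'[{"timestamp":"1583906066","label":"Threading",'
    r'"ObjectName":"java.lang:type\u003dThreading",'
    r'"attributes":[{"name":"CurrentThreadUserTime","value":18600000000},'
    r'{"name":"ThreadCount","value":152}]}]'
)

parsed = json.loads(log_message)

print(type(parsed).__name__)        # top level is an array, not an object
print(parsed[0]["ObjectName"])      # \u003d is decoded to "="
print(parsed[0]["attributes"][1])   # nested objects keep their structure
```

Note that the top level of the payload is an array containing one object, so the named fields sit one level down.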
Edit:
You could also try the kv filter instead of json; the documentation is here: https://www.elastic.co/guide/en/logstash/current/plugins-filters-kv.html
grok {
  match => { "message" => "%{TIMESTAMP_ISO8601:timestamp} (?<instrument>[^\ ]*) ?%{GREEDYDATA:log_message}" }
  tag_on_failure => ["no_match"]
}
if "no_match" not in [tags] {
  kv {
    source => "log_message"
    value_split => ":"
    include_brackets => true  # remove brackets
    remove_char_key => "\""
    remove_char_value => "\""
    field_split => ","
  }
}
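One caveat with a comma/colon split on nested JSON: the pairing between each "name" and its "value" is easy to lose. The Python sketch below uses a hypothetical helper, naive_kv, that splits on "," and ":" and strips quotes and brackets. It is only a crude stand-in for the idea and not the kv plugin's actual behavior, so treat the result as an illustration rather than a prediction of what Logstash produces.

```python
# Shortened payload from the question.
payload = (
    r'[{"timestamp":"1583906066","label":"Threading",'
    r'"ObjectName":"java.lang:type\u003dThreading",'
    r'"attributes":[{"name":"ThreadCount","value":152},'
    r'{"name":"PeakThreadCount","value":164}]}]'
)

def naive_kv(text, field_split=",", value_split=":"):
    """Hypothetical helper: split into fields, then into key/value at the first separator."""
    strip_chars = '"[]{}'
    pairs = []
    for chunk in text.split(field_split):
        key, _, value = chunk.partition(value_split)
        pairs.append((key.strip(strip_chars), value.strip(strip_chars)))
    return pairs

pairs = naive_kv(payload)
for key, value in pairs:
    print(f"{key!r} -> {value!r}")
```

In this sketch the first attribute name is swallowed into the "attributes" value, and "value" shows up as a repeated flat key with no link to the name it belongs to. A real JSON parser keeps that nesting intact.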
It worked for me after I defined a different source and a target in the json filter. Thanks for your help!
filter {
  if "atlassian-jira-perf" in [tags] {
    grok {
      match => { "message" => "%{TIMESTAMP_ISO8601:timestamp} (?<instrument>[^\ ]*) ?%{GREEDYDATA:log_message_raw}" }
      tag_on_failure => ["no_match"]
      add_tag => ["bananas"]
    }
    if "no_match" not in [tags] {
      json {
        source => "log_message_raw"
        target => "parsed"
      }
    }
    mutate {
      remove_field => ["message"]
    }
  }
}
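With the result stored under a target field, the attribute list can be turned into a flat name-to-value mapping. The Python sketch below shows the idea; the event dict is a hypothetical stand-in for the Logstash event, and the payload is shortened. As far as I understand, the json filter needs a target when the top level of the JSON is an array rather than an object, which may be why adding one helped here, though nothing in this thread confirms that.

```python
import json

# Shortened payload, as captured in log_message_raw by the grok filter.
log_message_raw = (
    r'[{"timestamp":"1583906066","label":"Threading",'
    r'"attributes":[{"name":"ThreadCount","value":152},'
    r'{"name":"PeakThreadCount","value":164},'
    r'{"name":"DaemonThreadCount","value":136}]}]'
)

# Hypothetical stand-in for the event after json { target => "parsed" }.
event = {"parsed": json.loads(log_message_raw)}

# The payload is a one-element array, so the attributes live under index 0.
metrics = {attr["name"]: attr["value"] for attr in event["parsed"][0]["attributes"]}

print(metrics)
```

In Logstash field-reference syntax the same list would be addressed as [parsed][0][attributes].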