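How to change the “message” value in an index
In a Logstash pipeline or an index pattern, how can I change the following part of a CDN log in the “message” field so that I can separate or extract some of the data and then aggregate it?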
<40> 2022-01-17T08:31:22Z logserver-5 testcdn[1]: {"method":"GET","scheme":"https","domain":"www.123.com","uri":"/product/10809350","ip":"66.249.65.174","ua":"Mozilla/5.0 (Linux; Android 6.0.1; Nexus 5X Build/MMB29P) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/97.0.4692.71 Mobile Safari/537.36 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)","country":"US","asn":15169,"content_type":"text/html; charset=utf-8","status":200,"server_port":443,"bytes_sent":1892,"bytes_received":1371,"upstream_time":0.804,"cache":"MISS","request_id":"b017d78db4652036250148216b0a290c"}
Expected result:
{"method":"GET","scheme":"https","domain":"www.123.com","uri":"/product/10809350","ip":"66.249.65.174","ua":"Mozilla/5.0 (Linux; Android 6.0.1; Nexus 5X Build/MMB29P) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/97.0.4692.71 Mobile Safari/537.36 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)","country":"US","asn":15169,"content_type":"text/html; charset=utf-8","status":200,"server_port":443,"bytes_sent":1892,"bytes_received":1371,"upstream_time":0.804,"cache":"MISS","request_id":"b017d78db4652036250148216b0a290c"}
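Because of the prefix “<40> 2022-01-17T08:31:22Z logserver-5 testcdn[1]:”, the message is not parsed as JSON, so I cannot build visualizations or dashboards based on fields such as country, ASN, etc.
The raw document as indexed by Logstash is: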
{
  "_index": "logstash-2022.01.17-000001",
  "_type": "_doc",
  "_id": "Qx8pZ34BhloLEkDviGxe",
  "_version": 1,
  "_score": 1,
  "_source": {
    "message": "<40> 2022-01-17T08:31:22Z logserver-5 testcdn[1]: {\"method\":\"GET\",\"scheme\":\"https\",\"domain\":\"www.123.com\",\"uri\":\"/product/10809350\",\"ip\":\"66.249.65.174\",\"ua\":\"Mozilla/5.0 (Linux; Android 6.0.1; Nexus 5X Build/MMB29P) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/97.0.4692.71 Mobile Safari/537.36 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)\",\"country\":\"US\",\"asn\":15169,\"content_type\":\"text/html; charset=utf-8\",\"status\":200,\"server_port\":443,\"bytes_sent\":1892,\"bytes_received\":1371,\"upstream_time\":0.804,\"cache\":\"MISS\",\"request_id\":\"b017d78db4652036250148216b0a290c\"}",
    "port": 39278,
    "@timestamp": "2022-01-17T08:31:22.100Z",
    "@version": "1",
    "host": "93.115.150.121"
  },
  "fields": {
    "@timestamp": [
      "2022-01-17T08:31:22.100Z"
    ],
    "port": [
      39278
    ],
    "@version": [
      "1"
    ],
    "host": [
      "93.115.150.121"
    ],
    "message": [
      "<40> 2022-01-17T08:31:22Z logserver-5 testcdn[1]: {\"method\":\"GET\",\"scheme\":\"https\",\"domain\":\"www.123.com\",\"uri\":\"/product/10809350\",\"ip\":\"66.249.65.174\",\"ua\":\"Mozilla/5.0 (Linux; Android 6.0.1; Nexus 5X Build/MMB29P) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/97.0.4692.71 Mobile Safari/537.36 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)\",\"country\":\"US\",\"asn\":15169,\"content_type\":\"text/html; charset=utf-8\",\"status\":200,\"server_port\":443,\"bytes_sent\":1892,\"bytes_received\":1371,\"upstream_time\":0.804,\"cache\":\"MISS\",\"request_id\":\"b017d78db4652036250148216b0a290c\"}"
    ],
    "host.keyword": [
      "93.115.150.121"
    ]
  }
}
Thanks
Add these settings to the filter section of your Logstash configuration:
# Parse the message field: split off the syslog prefix and keep the rest
grok {
  match => { "message" => "<%{NONNEGINT:syslog_pri}>\s+%{TIMESTAMP_ISO8601:syslog_timestamp}\s+%{DATA:sys_host}\s+%{NOTSPACE:sys_module}\s+%{GREEDYDATA:syslog_message}" }
}
# Replace the message field with the contents of syslog_message
mutate {
  replace => { "message" => "%{syslog_message}" }
}
Once the message field has been replaced with syslog_message, you can add the json filter below to parse the JSON into separate fields:
json {
  source => "syslog_message"
}
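To make it easier to see what the grok and json filters do to one event, here is a minimal Python sketch of the same two steps. It is only an approximation: the regular expression stands in for the grok pattern (it uses `\S+` where grok uses `DATA`), the function name `parse_cdn_line` is made up for illustration, and the sample line is a shortened version of the log from the question.

```python
import json
import re

# Rough stand-in for the grok pattern above: syslog priority, timestamp,
# host, module, then everything else as the JSON payload.
SYSLOG_PREFIX = re.compile(
    r"<(?P<syslog_pri>\d+)>\s+"
    r"(?P<syslog_timestamp>\S+)\s+"
    r"(?P<sys_host>\S+)\s+"
    r"(?P<sys_module>\S+)\s+"
    r"(?P<syslog_message>.*)"
)


def parse_cdn_line(line: str) -> dict:
    """Split off the syslog prefix, then decode the JSON payload."""
    match = SYSLOG_PREFIX.match(line)
    if match is None:
        raise ValueError("line does not start with the expected syslog prefix")
    event = match.groupdict()
    # Like the json filter, merge the decoded keys into the top level of the event.
    event.update(json.loads(event["syslog_message"]))
    return event


# Shortened sample; the real payload carries more keys.
sample = (
    "<40> 2022-01-17T08:31:22Z logserver-5 testcdn[1]: "
    '{"method":"GET","country":"US","asn":15169,"status":200}'
)
event = parse_cdn_line(sample)
print(event["country"], event["asn"], event["status"])  # US 15169 200
```

After this step, country, asn and status are ordinary top-level keys, which is what makes them usable for aggregations once the json filter has run.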
Thanks, this was useful. Your suggestion for this specific scenario gave me an idea, and the following edited logstash.conf solved the problem:
input {
  tcp {
    port => 5000
    codec => json
  }
}
filter {
  grok {
    match => { "message" => "%{TIMESTAMP_ISO8601:timestamp} %{GREEDYDATA:Junk}: %{GREEDYDATA:request}" }
  }
  json { source => "request" }
}
output {
  stdout { codec => rubydebug }
  elasticsearch {
    hosts => ["elasticsearch:9200"]
    manage_template => false
    ecs_compatibility => disabled
    index => "logs-%{[@metadata][beat]}-%{[@metadata][version]}-%{+YYYY.MM.dd}"
  }
}
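One thing that may be worth checking in the grok pattern above: GREEDYDATA is `.*`, so `%{GREEDYDATA:Junk}: ` splits at the last ": " in the line, not the first. That is fine for the sample log, where the JSON contains no colon followed by a space, but it could cut the payload short if a value ever does. The Python sketch below approximates the behaviour with plain regular expressions; the timestamp regex is a simplified stand-in for TIMESTAMP_ISO8601, and the "tricky" line is a hypothetical example, not real log data.

```python
import re

# Simplified stand-in for grok's TIMESTAMP_ISO8601 (assumption).
TS = r"\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}Z"

# GREEDYDATA is ".*"; DATA is the lazy ".*?".
GREEDY = re.compile(rf"(?P<timestamp>{TS}) (?P<junk>.*): (?P<request>.*)")
LAZY = re.compile(rf"(?P<timestamp>{TS}) (?P<junk>.*?): (?P<request>.*)")


def request_part(pattern, line):
    """Return what would end up in the 'request' field, or None on no match."""
    match = pattern.search(line)
    return match.group("request") if match else None


prefix = "<40> 2022-01-17T08:31:22Z logserver-5 testcdn[1]: "
plain = prefix + '{"uri":"/product/10809350","status":200}'
tricky = prefix + '{"uri":"/search?q=a: b","status":200}'  # hypothetical line

print(request_part(GREEDY, plain))   # {"uri":"/product/10809350","status":200}
print(request_part(GREEDY, tricky))  # b","status":200}
print(request_part(LAZY, tricky))    # {"uri":"/search?q=a: b","status":200}
```

If this matters for the real traffic, using `%{DATA:Junk}` in place of `%{GREEDYDATA:Junk}` should give the lazy behaviour shown in the last line.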
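However, my main concern is having to edit the configuration file. I would prefer to make changes like this in the Kibana web UI rather than in logstash.conf, because we use ELK for several different scenarios in our organization, and changes of this kind tie the ELK server to one special purpose instead of keeping it general-purpose.
How can I get the same result without changing the Logstash configuration file?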