can't force GROK parser to enforce integer/float types on haproxy logs
It doesn't matter whether I use integer/long or float: fields like time_duration (all the time_* fields, really) get mapped as strings in the Kibana logstash index.
I tried using mutate (https://www.elastic.co/blog/little-logstash-lessons-part-using-grok-mutate-type-data), roughly along the lines sketched below, and that did not work either.
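(A sketch of what I mean; the field names are the ones from my HAPROXYHTTP pattern further down, and the exact filter I used may have differed slightly:)

filter {
  mutate {
    # convert the time_* captures to integers after grok has run
    convert => [ "time_request", "integer",
                 "time_queue", "integer",
                 "time_backend_connect", "integer",
                 "time_backend_response", "integer",
                 "time_duration", "integer" ]
  }
}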
How can I correctly enforce a numeric type instead of a string on these fields?
My /etc/logstash/conf.d/haproxy.conf:
input {
  syslog {
    type => haproxy
    port => 5515
  }
}
filter {
  if [type] == "haproxy" {
    grok {
      patterns_dir => "/usr/local/etc/logstash/patterns"
      match => ["message", "%{HAPROXYHTTP}"]
      named_captures_only => true
    }
    geoip {
      source => "client_ip"
      target => "geoip"
      database => "/etc/logstash/GeoLiteCity.dat"
      add_field => [ "[geoip][coordinates]", "%{[geoip][longitude]}" ]
      add_field => [ "[geoip][coordinates]", "%{[geoip][latitude]}" ]
    }
    mutate {
      convert => [ "[geoip][coordinates]", "float"]
    }
  }
}
My HAPROXYHTTP pattern:
HAPROXYHTTP %{IP:client_ip}:%{INT:client_port} \[%{HAPROXYDATE:accept_date}\] %{NOTSPACE:frontend_name} %{NOTSPACE:backend_name}/%{NOTSPACE:server_name} %{INT:time_request:int}/%{INT:time_queue:int}/%{INT:time_backend_connect:int}/%{INT:time_backend_response:int}/%{NOTSPACE:time_duration:int} %{INT:http_status_code} %{NOTSPACE:bytes_read:int} %{DATA:captured_request_cookie} %{DATA:captured_response_cookie} %{NOTSPACE:termination_state} %{INT:actconn:int}/%{INT:feconn:int}/%{INT:beconn:int}/%{INT:srvconn:int}/%{NOTSPACE:retries:int} %{INT:srv_queue:int}/%{INT:backend_queue:int} (\{%{HAPROXYCAPTUREDREQUESTHEADERS}\})?( )?(\{%{HAPROXYCAPTUREDRESPONSEHEADERS}\})?( )?"(<BADREQ>|(%{WORD:http_verb} (%{URIPROTO:http_proto}://)?(?:%{USER:http_user}(?::[^@]*)?@)?(?:%{URIHOST:http_host})?(?:%{URIPATHPARAM:http_request})?( HTTP/%{NUMBER:http_version})?))?"
It's quite likely that Logstash is doing the right thing here (your configuration looks correct), but how Elasticsearch maps the fields is another matter. If a field in an Elasticsearch document has been dynamically mapped as a string, subsequent documents added to the same index will also be mapped as strings, even if they are integers or floats in the source document. To change this you have to reindex, but with time-series-based Logstash indexes you can just wait until the next day, when you get a new index.
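You can verify this by looking at how the current index actually mapped one of the fields, and, if you don't want to wait for the next index, pin the numeric types for future logstash-* indices with an index template. A rough sketch only, assuming the pre-5.x template API and the default logstash-* index naming; the template name haproxy_timings is made up:

# check how the existing index mapped the field
curl -s 'localhost:9200/logstash-*/_mapping/field/time_duration?pretty'

# pin the time_* fields to integer for future logstash-* indices
# (mapping type "haproxy" matches the Logstash "type" field in your config)
curl -XPUT 'localhost:9200/_template/haproxy_timings' -d '{
  "template": "logstash-*",
  "mappings": {
    "haproxy": {
      "properties": {
        "time_request":          { "type": "integer" },
        "time_queue":            { "type": "integer" },
        "time_backend_connect":  { "type": "integer" },
        "time_backend_response": { "type": "integer" },
        "time_duration":         { "type": "integer" }
      }
    }
  }
}'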