Logstash grok not working for some of the dataset
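This relates to an earlier post, where I use the grok filter below to parse data for visualization in Kibana. It is what I have in my Logstash conf file, and it has been filtering the data as required. Today, however, I ran into a case where it does not.
The correct view in Kibana looks like this: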
received_at:February 1st 2019, 21:00:04.105 float:0.5, 0.0 type:rmlog Hostname:dba- foxon93 Date:19/02/01 User_1:dv_vxehw @version:1 Hour_since:06 Command:rm -rf /data/rg/log
The grok filter in the Logstash config file:
match => { "message" => "%{HOSTNAME:Hostname},%{DATE:Date},%{HOUR:Hour_since}:%{MINUTE:Mins_since},%{NUMBER}-%{WORD},%{USER:User_1},%{USER:User_2} %{NUMBER:Pid} %{NUMBER:float} %{NUMBER:float} %{NUMBER:Num_1} %{NUMBER:Num_2} %{DATA} %{HOUR:hour2}:%{MINUTE:minute2} %{HOUR:hour3}:%{MINUTE:minute3} %{GREEDYDATA:Command}" }
My Logstash config file:
input {
  file {
    path => [ "/data/mylogs/*.txt" ]
    start_position => beginning
    sincedb_path => "/dev/null"
    type => "tac"
  }
}
filter {
  if [type] == "tac" {
    grok {
      match => { "message" => "%{HOSTNAME:Hostname},%{DATE:Date},%{HOUR:Hour_since}:%{MINUTE:Mins_since},%{NUMBER}-%{WORD},%{USER:User_1},%{USER:User_2} %{NUMBER:Pid} %{NUMBER:float} %{NUMBER:float} %{NUMBER:Num_1} %{NUMBER:Num_2} %{DATA} %{HOUR:hour2}:%{MINUTE:minute2} %{HOUR:hour3}:%{MINUTE:minute3} %{GREEDYDATA:Command}" }
      add_field => [ "received_at", "%{@timestamp}" ]
      remove_field => [ "@version", "host", "message", "_type", "_index", "_score" ]
    }
  }
}
output {
  if [type] == "rmlog" {
    elasticsearch {
      hosts => ["localhost:9200"]
      manage_template => false
      index => "tac-%{+YYYY.MM.dd}"
    }
  }
}
Below is the new data being processed. For this data I do not get fields such as Hostname and Command.
dbproj01,19/02/01,00:04,23-hrs,cvial,cvial 120804 0.0 0.0 106096 1200 pts/90 S Jan30 0:00 /bin/sh -c /bin/rm -f ../../../../../../tools.lnx86/dfII/etc/context/64bit/hBrowser.cxt ../../../../../../
tools.lnx86/dfII/etc/context/64bit/hBrowser.toc ../../../../../../tools.lnx86/dfII/etc/context/64bit/hBrowser.aux ../../../../../../tools.lnx86/dfII/etc/context/64bit/hBrowser.ini ; (CUR_DIR=`pwd` ;
cd ../../../../obj/linux-x86-64/optimize/bin/virtuoso ; ${CUR_DIR}/../../../../../../tools.lnx86/dfII/bin/virtuoso -ilLoadIL hBrowserBuildContext.il -log hBrowserBuildContext.log -nograph && [ `/bi
n/grep -c Error hBrowserBuildContext.log` = 0 ]) || (echo '*** Error: Failed to build hBrowser context.' ; /bin/rm -f ../../../../../../tools.lnx86/dfII/etc/context/64bit/hBrowser.cxt ../../../../..
/../tools.lnx86/dfII/etc/context/64bit/hBrowser.toc ../../../../../../tools.lnx86/dfII/etc/context/64bit/hBrowser.aux ../../../../../../tools.lnx86/dfII/etc/context/64bit/hBrowser.ini ; exit 1),/pro
j/cvial/WS/BUNGEE/REBASE_190120-138_2/tools.lnx86/dfII/group/bin/src
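I see your problem in the %{HOUR:hour2}:%{MINUTE:minute2} part. In this line that position holds the date Jan30 rather than a time, and the date ends up inside the %{DATA} part. Only one time value (0:00) is left where the pattern requires two, so the match fails and no fields are extracted.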
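To make the failure concrete, here is a minimal sketch in Python that mimics only the tail of the original pattern with plain regular expressions. The `HOUR` and `MINUTE` expressions are simplified stand-ins that I believe mirror the stock grok definitions, `good_tail` is a hypothetical line made up for comparison, and `bad_tail` is the start of the failing sample with the command cut short. It is an approximation of grok's behavior, not grok itself.

```python
import re

# Simplified stand-ins for the grok HOUR and MINUTE patterns (assumption).
HOUR = r"(?:2[0123]|[01]?[0-9])"
MINUTE = r"(?:[0-5][0-9])"

# Tail of the original pattern:
# %{DATA} %{HOUR}:%{MINUTE} %{HOUR}:%{MINUTE} %{GREEDYDATA:Command}
OLD_TAIL = re.compile(
    rf"(?P<data>.*?) (?P<hour2>{HOUR}):(?P<minute2>{MINUTE}) "
    rf"(?P<hour3>{HOUR}):(?P<minute3>{MINUTE}) (?P<Command>.*)"
)

# Hypothetical line where the start column is a time (not from the question).
good_tail = "pts/1 S 06:12 0:00 rm -rf /data/rg/log"
# Start of the failing sample from the question, command shortened.
bad_tail = "pts/90 S Jan30 0:00 /bin/sh -c /bin/rm -f"

good = OLD_TAIL.search(good_tail)
bad = OLD_TAIL.search(bad_tail)

print(good.group("Command") if good else None)  # command is captured
print(bad)  # no match: only one H:MM value follows "Jan30"
```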
The pattern below handles it:
%{HOSTNAME:Hostname},%{DATE:Date},%{HOUR:Hour_since}:%{MINUTE:Mins_since},%{NUMBER}-%{WORD},%{USER:User_1},%{USER:User_2} %{NUMBER:Pid} %{NUMBER:float} %{NUMBER:float} %{NUMBER:Num_1} %{NUMBER:Num_2} %{DATA} (?:%{HOUR:hour2}:|)(?:%{MINUTE:minute2}|) (?:%{HOUR:hour3}:|)(?:%{MINUTE:minute3}|)%{GREEDYDATA:Command}
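As a rough check of the optional groups, the tail of this pattern can be approximated in Python as well. Again, `HOUR` and `MINUTE` are simplified stand-ins for the grok definitions (an assumption), and `bad_tail` is the start of the failing sample with the command shortened; this is a sketch, not a replacement for testing in Logstash.

```python
import re

# Simplified stand-ins for the grok HOUR and MINUTE patterns (assumption).
HOUR = r"(?:2[0123]|[01]?[0-9])"
MINUTE = r"(?:[0-5][0-9])"

# Tail of the revised pattern: each time component is optional.
NEW_TAIL = re.compile(
    rf"(?P<data>.*?) (?:(?P<hour2>{HOUR}):|)(?:(?P<minute2>{MINUTE})|) "
    rf"(?:(?P<hour3>{HOUR}):|)(?:(?P<minute3>{MINUTE})|)(?P<Command>.*)"
)

# Start of the failing sample from the question, command shortened.
bad_tail = "pts/90 S Jan30 0:00 /bin/sh -c /bin/rm -f"

match = NEW_TAIL.search(bad_tail)
fields = match.groupdict() if match else {}

# "Jan30" is absorbed by the lazy DATA part; the single time lands in hour2/minute2.
for name in ("data", "hour2", "minute2", "hour3", "minute3", "Command"):
    print(name, "=", fields.get(name))
```

One thing to keep in mind: this pattern has no space between the last minute group and %{GREEDYDATA:Command}, so on lines where both time values are present, Command will likely start with a leading space.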
You can also use Grok Debug to test your patterns.