Correct ELK multiline regular expression?
I am new to ELK and I am writing a config file that uses the multiline codec. We need a pattern for input data like this:
110000|read|<soapenv:Envelope>
<head>hello<head>
<body></body>
</soapenv:Envelope>|<soapenv:Envelope>
<body></body>
</soapenv:Envelope>
210000|read|<soapenv:Envelope>
<head>hello<head>
<body></body>
</soapenv:Envelope>|<soapenv:Envelope>
<body></body>
</soapenv:Envelope>
370000|read|<soapenv:Envelope>
<head>hello<head>
<body></body>
</soapenv:Envelope>|<soapenv:Envelope>
<body></body>
</soapenv:Envelope>
The config file I am using is:
input {
  file {
    path => "/opt/test5/practice_new/xml_input.dat"
    start_position => "beginning"
    codec => multiline {
      pattern => "^%{INT}\|%{WORD}\|<soapenv:Envelope*>\|<soapenv"
      negate => true
      what => "previous"
    }
  }
}
filter {
  grok {
    match => [ "message", "%{DATA:method_id}\|%{WORD:method_type}\|%{GREEDYDATA:request}\|%{GREEDYDATA:response}" ]
  }
}
output {
  elasticsearch {
    hosts => "http://localhost:9200"
    index => "xml"
  }
  stdout {}
}
However, the pattern used there does not do what I need.
Please suggest the correct pattern.
Expected output:
First log
method_id- 110000
method_type-
request-
response-
Second log
method_id- 210000
method_type-
request-
response-
And so on for the rest.
First, you have to fix your multiline pattern:
codec => multiline {
  pattern => "^%{NUMBER:method_id}\|%{DATA:method_type}\|<soapenv:Envelope>"
  negate => true
  what => "previous"
}
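To see why this pattern works together with negate => true and what => previous, here is a rough Python sketch of the grouping the codec performs. It is not Logstash code: `group_events` and `START_OF_EVENT` are names made up for this illustration, and `\d+` and `\w+` only approximate the NUMBER and DATA grok patterns.

```python
import re

# Python stand-in for the codec pattern (an approximation for this sketch):
# a line that starts with "<digits>|<word>|<soapenv:Envelope>" opens a new event.
START_OF_EVENT = re.compile(r"^\d+\|\w+\|<soapenv:Envelope>")


def group_events(lines):
    """Mimic negate => true, what => previous: any line that does NOT
    match the pattern is appended to the event before it."""
    events = []
    for line in lines:
        if START_OF_EVENT.match(line) or not events:
            events.append(line)          # matching line starts a new event
        else:
            events[-1] += "\n" + line    # non-matching line joins the previous one
    return events


SAMPLE = """110000|read|<soapenv:Envelope>
<head>hello<head>
<body></body>
</soapenv:Envelope>|<soapenv:Envelope>
<body></body>
</soapenv:Envelope>
210000|read|<soapenv:Envelope>
<head>hello<head>
<body></body>
</soapenv:Envelope>|<soapenv:Envelope>
<body></body>
</soapenv:Envelope>
370000|read|<soapenv:Envelope>
<head>hello<head>
<body></body>
</soapenv:Envelope>|<soapenv:Envelope>
<body></body>
</soapenv:Envelope>"""

events = group_events(SAMPLE.splitlines())
print(len(events))  # 3
```

Only the first line of each record matches, so the six physical lines of a record end up in one event. The line `</soapenv:Envelope>|<soapenv:Envelope>` does not match because it does not start with an id, which is what keeps the request and response together.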
After that, you can use the pattern Wiktor suggested in the comments:
(?m)^(?<method_id>\d+)\|(?<method_type>\w+)\|(?<request><soapenv:Envelope>.*?</soapenv:Envelope>)\|(?<response><soapenv:Envelope>.*?</soapenv:Envelope>)
I tested it on http://grokconstructor.appspot.com against the three log entries from your post.
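The same expression can also be checked outside Logstash. The sketch below is a Python translation, not the grok filter itself: Python spells named groups as `(?P<name>...)`, and `re.DOTALL` is used here in place of `(?m)`, which in the Ruby-style regex syntax grok relies on makes `.` match newlines. The names `EVENT`, `event` and `fields` are made up for the example.

```python
import re

# Python translation of the grok expression (an assumption-laden sketch):
# lazy .*? stops at the first closing tag, so request and response split
# at the "|" that sits between the two envelopes.
EVENT = re.compile(
    r"^(?P<method_id>\d+)\|(?P<method_type>\w+)\|"
    r"(?P<request><soapenv:Envelope>.*?</soapenv:Envelope>)\|"
    r"(?P<response><soapenv:Envelope>.*?</soapenv:Envelope>)",
    re.DOTALL,
)

# One event as the multiline codec would hand it to the filter.
event = "\n".join([
    "110000|read|<soapenv:Envelope>",
    "<head>hello<head>",
    "<body></body>",
    "</soapenv:Envelope>|<soapenv:Envelope>",
    "<body></body>",
    "</soapenv:Envelope>",
])

fields = EVENT.match(event).groupdict()
print(fields["method_id"], fields["method_type"])  # 110000 read
```

Here `fields["request"]` holds the first envelope including its `<head>` line, and `fields["response"]` holds the second one.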
Your whole configuration could look like this:
input {
  file {
    path => "/opt/test5/practice_new/xml_input.dat"
    start_position => "beginning"
    codec => multiline {
      # A line starting with "<id>|<type>|<soapenv:Envelope>" begins a new event;
      # every other line is appended to the previous one.
      pattern => "^%{NUMBER:method_id}\|%{DATA:method_type}\|<soapenv:Envelope>"
      negate => true
      what => "previous"
    }
  }
}
filter {
  grok {
    # (?m) lets "." match newlines, so each envelope can span several lines.
    match => [ "message", "(?m)^(?<method_id>\d+)\|(?<method_type>\w+)\|(?<request><soapenv:Envelope>.*?</soapenv:Envelope>)\|(?<response><soapenv:Envelope>.*?</soapenv:Envelope>)" ]
  }
}
output {
  elasticsearch {
    hosts => "http://localhost:9200"
    index => "xml"
  }
  stdout {}
}