Logstash 中的节拍输入正在丢失字段
Beat input in Logstash is losing fields
我有以下基础设施:
ELK 安装为 docker 个容器,每个容器都在自己的容器中。在虚拟机 运行 CentOS 上,我安装了 nginx 网络服务器和 Filebeat 来收集日志。
我在filebeat中启用了nginx模块。
> filebeat modules enable nginx
在启动 filebeat 之前,我使用 elasticsearch 对其进行了设置,并将其仪表板安装在 kibana 上。
配置文件(我已经从文件中删除了不必要的注释):
filebeat.config.modules:
path: ${path.config}/modules.d/*.yml
reload.enabled: false
setup.kibana:
host: "172.17.0.1:5601"
output.elasticsearch:
hosts: ["172.17.0.1:9200"]
然后在 elasticsearch 和 kibana 中设置它
> filebeat setup -e --dashboards
This works fine. In fact if I keep it this way everything works perfectly. I can use the collected logs in kibana and use the dashboards for NGinX I installed with the above command.
我想将日志传递给 Logstash。
这是我的 Logstash 配置使用以下管道:
- pipeline.id: filebeat
path.config: "config/filebeat.conf"
filebeat.conf:
input {
beats {
port => 5044
}
}
#filter {
# mutate {
# add_tag => ["filebeat"]
# }
#}
output {
elasticsearch {
hosts => ["elasticsearch0:9200"]
index => "%{[@metadata][beat]}-%{[@metadata][version]}-%{+YYYY.MM.dd}"
}
stdout { }
}
使日志通过 Logstash 生成的日志只是:
{
"offset" => 6655,
"@version" => "1",
"@timestamp" => 2019-02-20T13:34:06.886Z,
"message" => "10.0.2.2 - - [20/Feb/2019:08:33:58 -0500] \"GET / HTTP/1.1\" 304 0 \"-\" \"Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Ubuntu Chromium/71.0.3578.98 Chrome/71.0.3578.98 Safari/537.36\" \"-\"",
"beat" => {
"version" => "6.5.4",
"name" => "localhost.localdomain",
"hostname" => "localhost.localdomain"
},
"source" => "/var/log/nginx/access.log",
"host" => {
"os" => {
"version" => "7 (Core)",
"codename" => "Core",
"family" => "redhat",
"platform" => "centos"
},
"name" => "localhost.localdomain",
"id" => "18e7cb2506624fb6ae2dc3891d5d7172",
"containerized" => true,
"architecture" => "x86_64"
},
"fileset" => {
"name" => "access",
"module" => "nginx"
},
"tags" => [
[0] "beats_input_codec_plain_applied"
],
"input" => {
"type" => "log"
},
"prospector" => {
"type" => "log"
}
}
我的对象中缺少很多字段。应该有更多的结构化信息
UPDATE: This is what I'm expecting instead
{
"_index": "filebeat-6.5.4-2019.02.20",
"_type": "doc",
"_id": "ssJPC2kBLsya0HU-3uwW",
"_version": 1,
"_score": null,
"_source": {
"offset": 9639,
"nginx": {
"access": {
"referrer": "-",
"response_code": "404",
"remote_ip": "10.0.2.2",
"method": "GET",
"user_name": "-",
"http_version": "1.1",
"body_sent": {
"bytes": "3650"
},
"remote_ip_list": [
"10.0.2.2"
],
"url": "/access",
"user_agent": {
"patch": "3578",
"original": "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Ubuntu Chromium/71.0.3578.98 Chrome/71.0.3578.98 Safari/537.36",
"major": "71",
"minor": "0",
"os": "Ubuntu",
"name": "Chromium",
"os_name": "Ubuntu",
"device": "Other"
}
}
},
"prospector": {
"type": "log"
},
"read_timestamp": "2019-02-20T14:29:36.393Z",
"source": "/var/log/nginx/access.log",
"fileset": {
"module": "nginx",
"name": "access"
},
"input": {
"type": "log"
},
"@timestamp": "2019-02-20T14:29:32.000Z",
"host": {
"os": {
"codename": "Core",
"family": "redhat",
"version": "7 (Core)",
"platform": "centos"
},
"containerized": true,
"name": "localhost.localdomain",
"id": "18e7cb2506624fb6ae2dc3891d5d7172",
"architecture": "x86_64"
},
"beat": {
"hostname": "localhost.localdomain",
"name": "localhost.localdomain",
"version": "6.5.4"
}
},
"fields": {
"@timestamp": [
"2019-02-20T14:29:32.000Z"
]
},
"sort": [
1550672972000
]
}
从您的 logstash 配置来看,您似乎没有在解析日志消息。
logstash 文档中有 example 介绍如何解析 nginx 日志:
Nginx Logs
The Logstash pipeline configuration in this example shows how to ship and parse access and error logs collected by the nginx Filebeat module.
input {
beats {
port => 5044
host => "0.0.0.0"
}
}
filter {
if [fileset][module] == "nginx" {
if [fileset][name] == "access" {
grok {
match => { "message" => ["%{IPORHOST:[nginx][access][remote_ip]} - %{DATA:[nginx][access][user_name]} \[%{HTTPDATE:[nginx][access][time]}\] \"%{WORD:[nginx][access][method]} %{DATA:[nginx][access][url]} HTTP/%{NUMBER:[nginx][access][http_version]}\" %{NUMBER:[nginx][access][response_code]} %{NUMBER:[nginx][access][body_sent][bytes]} \"%{DATA:[nginx][access][referrer]}\" \"%{DATA:[nginx][access][agent]}\""] }
remove_field => "message"
}
mutate {
add_field => { "read_timestamp" => "%{@timestamp}" }
}
date {
match => [ "[nginx][access][time]", "dd/MMM/YYYY:H:m:s Z" ]
remove_field => "[nginx][access][time]"
}
useragent {
source => "[nginx][access][agent]"
target => "[nginx][access][user_agent]"
remove_field => "[nginx][access][agent]"
}
geoip {
source => "[nginx][access][remote_ip]"
target => "[nginx][access][geoip]"
}
}
else if [fileset][name] == "error" {
grok {
match => { "message" => ["%{DATA:[nginx][error][time]} \[%{DATA:[nginx][error][level]}\] %{NUMBER:[nginx][error][pid]}#%{NUMBER:[nginx][error][tid]}: (\*%{NUMBER:[nginx][error][connection_id]} )?%{GREEDYDATA:[nginx][error][message]}"] }
remove_field => "message"
}
mutate {
rename => { "@timestamp" => "read_timestamp" }
}
date {
match => [ "[nginx][error][time]", "YYYY/MM/dd H:m:s" ]
remove_field => "[nginx][error][time]"
}
}
}
}
我知道它没有解决为什么 filebeat 不将完整对象发送到 logstash 的问题,但它应该开始介绍如何在 logstash 中解析 nginx 日志。
@baudsp 提供的答案大部分正确,但不完整。我遇到了完全相同的问题,而且我也遇到了完全相同的 filter mentioned in the documentation(在@baudsp 的回答中),但是 Elastic Search 中的文档仍然不包含任何预期的字段。
我终于找到了问题:因为我将 Filebeat 配置为通过 Nginx module and not the Log input 发送 Nginx 日志,来自 Logbeat 的数据与示例 Logstash 过滤器所期望的不完全匹配。
示例中的条件是 if [fileset][module] == "nginx"
,如果 Filebeat 从 Log 输入发送数据,这是正确的。但是,由于日志数据来自 Nginx 模块,因此 fileset
属性 不包含 module
属性.
要使过滤器与来自 Nginx 模块的 Logstash 数据一起工作,需要修改条件以查找其他内容。我发现 [event][module]
可以代替 [fileset][module]
。
工作过滤器:
filter {
if [event][module] == "nginx" {
if [fileset][name] == "access" {
grok {
match => { "message" => ["%{IPORHOST:[nginx][access][remote_ip]} - %{DATA:[nginx][access][user_name]} \[%{HTTPDATE:[nginx][access][time]}\] \"%{WORD:[nginx][access][method]} %{DATA:[nginx][access][url]} HTTP/%{NUMBER:[nginx][access][http_version]}\" %{NUMBER:[nginx][access][response_code]} %{NUMBER:[nginx][access][body_sent][bytes]} \"%{DATA:[nginx][access][referrer]}\" \"%{DATA:[nginx][access][agent]}\""] }
remove_field => "message"
}
mutate {
add_field => { "read_timestamp" => "%{@timestamp}" }
}
date {
match => [ "[nginx][access][time]", "dd/MMM/YYYY:H:m:s Z" ]
remove_field => "[nginx][access][time]"
}
useragent {
source => "[nginx][access][agent]"
target => "[nginx][access][user_agent]"
remove_field => "[nginx][access][agent]"
}
geoip {
source => "[nginx][access][remote_ip]"
target => "[nginx][access][geoip]"
}
}
else if [fileset][name] == "error" {
grok {
match => { "message" => ["%{DATA:[nginx][error][time]} \[%{DATA:[nginx][error][level]}\] %{NUMBER:[nginx][error][pid]}#%{NUMBER:[nginx][error][tid]}: (\*%{NUMBER:[nginx][error][connection_id]} )?%{GREEDYDATA:[nginx][error][message]}"] }
remove_field => "message"
}
mutate {
rename => { "@timestamp" => "read_timestamp" }
}
date {
match => [ "[nginx][error][time]", "YYYY/MM/dd H:m:s" ]
remove_field => "[nginx][error][time]"
}
}
}
}
现在,Elastic Search 中的文档具有所有预期的字段:
Note: You'll have the same problem with other Filebeat modules, too. Just use [event][module]
in place of [fileset][module]
.
我有以下基础设施:
ELK 安装为 docker 个容器,每个容器都在自己的容器中。在虚拟机 运行 CentOS 上,我安装了 nginx 网络服务器和 Filebeat 来收集日志。 我在filebeat中启用了nginx模块。
> filebeat modules enable nginx
在启动 filebeat 之前,我使用 elasticsearch 对其进行了设置,并将其仪表板安装在 kibana 上。
配置文件(我已经从文件中删除了不必要的注释):
filebeat.config.modules:
path: ${path.config}/modules.d/*.yml
reload.enabled: false
setup.kibana:
host: "172.17.0.1:5601"
output.elasticsearch:
hosts: ["172.17.0.1:9200"]
然后在 elasticsearch 和 kibana 中设置它
> filebeat setup -e --dashboards
This works fine. In fact if I keep it this way everything works perfectly. I can use the collected logs in kibana and use the dashboards for NGinX I installed with the above command.
我想将日志传递给 Logstash。 这是我的 Logstash 配置使用以下管道:
- pipeline.id: filebeat
path.config: "config/filebeat.conf"
filebeat.conf:
input {
beats {
port => 5044
}
}
#filter {
# mutate {
# add_tag => ["filebeat"]
# }
#}
output {
elasticsearch {
hosts => ["elasticsearch0:9200"]
index => "%{[@metadata][beat]}-%{[@metadata][version]}-%{+YYYY.MM.dd}"
}
stdout { }
}
使日志通过 Logstash 生成的日志只是:
{
"offset" => 6655,
"@version" => "1",
"@timestamp" => 2019-02-20T13:34:06.886Z,
"message" => "10.0.2.2 - - [20/Feb/2019:08:33:58 -0500] \"GET / HTTP/1.1\" 304 0 \"-\" \"Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Ubuntu Chromium/71.0.3578.98 Chrome/71.0.3578.98 Safari/537.36\" \"-\"",
"beat" => {
"version" => "6.5.4",
"name" => "localhost.localdomain",
"hostname" => "localhost.localdomain"
},
"source" => "/var/log/nginx/access.log",
"host" => {
"os" => {
"version" => "7 (Core)",
"codename" => "Core",
"family" => "redhat",
"platform" => "centos"
},
"name" => "localhost.localdomain",
"id" => "18e7cb2506624fb6ae2dc3891d5d7172",
"containerized" => true,
"architecture" => "x86_64"
},
"fileset" => {
"name" => "access",
"module" => "nginx"
},
"tags" => [
[0] "beats_input_codec_plain_applied"
],
"input" => {
"type" => "log"
},
"prospector" => {
"type" => "log"
}
}
我的对象中缺少很多字段。应该有更多的结构化信息
UPDATE: This is what I'm expecting instead
{
"_index": "filebeat-6.5.4-2019.02.20",
"_type": "doc",
"_id": "ssJPC2kBLsya0HU-3uwW",
"_version": 1,
"_score": null,
"_source": {
"offset": 9639,
"nginx": {
"access": {
"referrer": "-",
"response_code": "404",
"remote_ip": "10.0.2.2",
"method": "GET",
"user_name": "-",
"http_version": "1.1",
"body_sent": {
"bytes": "3650"
},
"remote_ip_list": [
"10.0.2.2"
],
"url": "/access",
"user_agent": {
"patch": "3578",
"original": "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Ubuntu Chromium/71.0.3578.98 Chrome/71.0.3578.98 Safari/537.36",
"major": "71",
"minor": "0",
"os": "Ubuntu",
"name": "Chromium",
"os_name": "Ubuntu",
"device": "Other"
}
}
},
"prospector": {
"type": "log"
},
"read_timestamp": "2019-02-20T14:29:36.393Z",
"source": "/var/log/nginx/access.log",
"fileset": {
"module": "nginx",
"name": "access"
},
"input": {
"type": "log"
},
"@timestamp": "2019-02-20T14:29:32.000Z",
"host": {
"os": {
"codename": "Core",
"family": "redhat",
"version": "7 (Core)",
"platform": "centos"
},
"containerized": true,
"name": "localhost.localdomain",
"id": "18e7cb2506624fb6ae2dc3891d5d7172",
"architecture": "x86_64"
},
"beat": {
"hostname": "localhost.localdomain",
"name": "localhost.localdomain",
"version": "6.5.4"
}
},
"fields": {
"@timestamp": [
"2019-02-20T14:29:32.000Z"
]
},
"sort": [
1550672972000
]
}
从您的 logstash 配置来看,您似乎没有在解析日志消息。
logstash 文档中有 example 介绍如何解析 nginx 日志:
Nginx Logs
The Logstash pipeline configuration in this example shows how to ship and parse access and error logs collected by the nginx Filebeat module.
input { beats { port => 5044 host => "0.0.0.0" } } filter { if [fileset][module] == "nginx" { if [fileset][name] == "access" { grok { match => { "message" => ["%{IPORHOST:[nginx][access][remote_ip]} - %{DATA:[nginx][access][user_name]} \[%{HTTPDATE:[nginx][access][time]}\] \"%{WORD:[nginx][access][method]} %{DATA:[nginx][access][url]} HTTP/%{NUMBER:[nginx][access][http_version]}\" %{NUMBER:[nginx][access][response_code]} %{NUMBER:[nginx][access][body_sent][bytes]} \"%{DATA:[nginx][access][referrer]}\" \"%{DATA:[nginx][access][agent]}\""] } remove_field => "message" } mutate { add_field => { "read_timestamp" => "%{@timestamp}" } } date { match => [ "[nginx][access][time]", "dd/MMM/YYYY:H:m:s Z" ] remove_field => "[nginx][access][time]" } useragent { source => "[nginx][access][agent]" target => "[nginx][access][user_agent]" remove_field => "[nginx][access][agent]" } geoip { source => "[nginx][access][remote_ip]" target => "[nginx][access][geoip]" } } else if [fileset][name] == "error" { grok { match => { "message" => ["%{DATA:[nginx][error][time]} \[%{DATA:[nginx][error][level]}\] %{NUMBER:[nginx][error][pid]}#%{NUMBER:[nginx][error][tid]}: (\*%{NUMBER:[nginx][error][connection_id]} )?%{GREEDYDATA:[nginx][error][message]}"] } remove_field => "message" } mutate { rename => { "@timestamp" => "read_timestamp" } } date { match => [ "[nginx][error][time]", "YYYY/MM/dd H:m:s" ] remove_field => "[nginx][error][time]" } } } }
我知道它没有解决为什么 filebeat 不将完整对象发送到 logstash 的问题,但它应该开始介绍如何在 logstash 中解析 nginx 日志。
@baudsp 提供的答案大部分正确,但不完整。我遇到了完全相同的问题,而且我也遇到了完全相同的 filter mentioned in the documentation(在@baudsp 的回答中),但是 Elastic Search 中的文档仍然不包含任何预期的字段。
我终于找到了问题:因为我将 Filebeat 配置为通过 Nginx module and not the Log input 发送 Nginx 日志,来自 Logbeat 的数据与示例 Logstash 过滤器所期望的不完全匹配。
示例中的条件是 if [fileset][module] == "nginx"
,如果 Filebeat 从 Log 输入发送数据,这是正确的。但是,由于日志数据来自 Nginx 模块,因此 fileset
属性 不包含 module
属性.
要使过滤器与来自 Nginx 模块的 Logstash 数据一起工作,需要修改条件以查找其他内容。我发现 [event][module]
可以代替 [fileset][module]
。
工作过滤器:
filter {
if [event][module] == "nginx" {
if [fileset][name] == "access" {
grok {
match => { "message" => ["%{IPORHOST:[nginx][access][remote_ip]} - %{DATA:[nginx][access][user_name]} \[%{HTTPDATE:[nginx][access][time]}\] \"%{WORD:[nginx][access][method]} %{DATA:[nginx][access][url]} HTTP/%{NUMBER:[nginx][access][http_version]}\" %{NUMBER:[nginx][access][response_code]} %{NUMBER:[nginx][access][body_sent][bytes]} \"%{DATA:[nginx][access][referrer]}\" \"%{DATA:[nginx][access][agent]}\""] }
remove_field => "message"
}
mutate {
add_field => { "read_timestamp" => "%{@timestamp}" }
}
date {
match => [ "[nginx][access][time]", "dd/MMM/YYYY:H:m:s Z" ]
remove_field => "[nginx][access][time]"
}
useragent {
source => "[nginx][access][agent]"
target => "[nginx][access][user_agent]"
remove_field => "[nginx][access][agent]"
}
geoip {
source => "[nginx][access][remote_ip]"
target => "[nginx][access][geoip]"
}
}
else if [fileset][name] == "error" {
grok {
match => { "message" => ["%{DATA:[nginx][error][time]} \[%{DATA:[nginx][error][level]}\] %{NUMBER:[nginx][error][pid]}#%{NUMBER:[nginx][error][tid]}: (\*%{NUMBER:[nginx][error][connection_id]} )?%{GREEDYDATA:[nginx][error][message]}"] }
remove_field => "message"
}
mutate {
rename => { "@timestamp" => "read_timestamp" }
}
date {
match => [ "[nginx][error][time]", "YYYY/MM/dd H:m:s" ]
remove_field => "[nginx][error][time]"
}
}
}
}
现在,Elastic Search 中的文档具有所有预期的字段:
Note: You'll have the same problem with other Filebeat modules, too. Just use
[event][module]
in place of[fileset][module]
.