grok filter fails for ISO8601 timestamps since 5.2

Since I upgraded our ELK stack from 5.0.2 to 5.2, our grok filter fails and I don't know why. Maybe I overlooked something in the changelog?

Filter

filter {
  if [type] == "nginx_access" {
    grok {
      match => { "message" => "%{IPORHOST:remote_addr} - %{USERNAME:remote_user} \[%{TIMESTAMP_ISO8601:timestamp}\] \"%{WORD:method} %{URIPATHPARAM:request} HTTP/%{NUMBER:httpversion}\" %{INT:status} %{INT:body_bytes_sent} %{QS:http_referer} %{QS:http_user_agent} \"%{DATA:host_uri}\" \"%{DATA:proxy}\" \"%{DATA:upstream_addr}\" \"%{WORD:cache_status}\" \[%{NUMBER:request_time}\] \[(?:%{NUMBER:proxy_response_time}|-)\]" }
      add_field => [ "received_at", "%{@timestamp}" ]
    }
    mutate {
      convert => {
        "proxy_response_time" => "float"
        "request_time" => "float"
        "body_bytes_sent" => "integer"
      }
    }
  }
}

Error

Invalid format: \"2017-02-05T15:55:38+01:00\" is malformed at \"-02-05T15:55:38+01:00\"

Full error

[2017-02-05T15:55:49,500][WARN ][logstash.outputs.elasticsearch] Failed action. {:status=>400, :action=>["index", {:_id=>nil, :_index=>"filebeat-2017.02.05", :_type=>"nginx_access", :_routing=>nil}, 2017-02-05T14:55:38.000Z proxy2 4.3.2.1 - - [2017-02-05T15:55:38+01:00] "HEAD / HTTP/1.1" 200 0 "-" "Zabbix" "example.com" "host1:10040" "1.2.3.4:10040" "MISS" [0.095] [0.095]], :response=>{"index"=>{"_index"=>"filebeat-2017.02.05", "_type"=>"nginx_access", "_id"=>"AVoOxh7p5p68dsalXDFX", "status"=>400, "error"=>{"type"=>"mapper_parsing_exception", "reason"=>"failed to parse [timestamp]", "caused_by"=>{"type"=>"illegal_argument_exception", "reason"=>"Invalid format: \"2017-02-05T15:55:38+01:00\" is malformed at \"-02-05T15:55:38+01:00\""}}}}}
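Note that the `mapper_parsing_exception` in the response means the error is raised by Elasticsearch while indexing, not by grok itself, so a useful first step is to check what date `format` the existing index mapping declares for the `timestamp` field (a sketch, assuming Elasticsearch is reachable on localhost:9200):

```
curl -s 'localhost:9200/filebeat-2017.02.05/_mapping?pretty'
```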

The whole thing works perfectly on http://grokconstructor.appspot.com, and TIMESTAMP_ISO8601 still seems like the right choice (https://github.com/logstash-plugins/logstash-patterns-core/blob/master/patterns/grok-patterns).

Techstack

Any ideas?

Cheers, Finn

Update

So this version works, for whatever reason:

filter {
  if [type] == "nginx_access" {
    grok {
      match => { "message" => "%{IPORHOST:remote_addr} - %{USERNAME:remote_user} \[%{TIMESTAMP_ISO8601:timestamp}\] \"%{WORD:method} %{URIPATHPARAM:request} HTTP/%{NUMBER:httpversion}\" %{INT:status} %{INT:body_bytes_sent} %{QS:http_referer} %{QS:http_user_agent} \"%{DATA:host_uri}\" \"%{DATA:proxy}\" \"%{DATA:upstream_addr}\" \"%{WORD:cache_status}\" \[%{NUMBER:request_time}\] \[(?:%{NUMBER:proxy_response_time}|-)\]" }
      add_field => [ "received_at", "%{@timestamp}" ]
    }
    date {
        match => [ "timestamp" , "yyyy-MM-dd'T'HH:mm:ssZ" ]
        target => "timestamp"
    }
    mutate {
      convert => {
        "proxy_response_time" => "float"
        "request_time" => "float"
        "body_bytes_sent" => "integer"
      }
    }
  }
}

I'd be glad if anyone could shed some light on why I have to redefine a valid ISO8601 date.
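For what it's worth, the date filter above is doing two things: parsing the offset-bearing string, then normalizing it to UTC before the event is shipped to Elasticsearch. A minimal Python sketch of that same transformation, using the sample timestamp from the error message:

```python
from datetime import datetime, timezone

# The timestamp exactly as nginx logged it, with a +01:00 offset.
ts = "2017-02-05T15:55:38+01:00"

# %z accepts offsets containing a colon ("+01:00") on Python 3.7+.
dt = datetime.strptime(ts, "%Y-%m-%dT%H:%M:%S%z")

# Normalize to UTC, which is how Logstash stores @timestamp.
utc = dt.astimezone(timezone.utc)
print(utc.isoformat())  # 2017-02-05T14:55:38+00:00
```

The UTC result matches the `2017-02-05T14:55:38.000Z` value visible in the failed-action log above.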

Make sure the timestamp format is specified in the mapping of your documents; the mapping could look like this (use the format your timestamps actually have):

PUT index
{
  "mappings": {
    "your_index_type": {
      "properties": {
        "date": {
          "type":   "date",
          "format": "yyyy-MM-ddTHH:mm:ss+01:SS" <-- make sure to give the correct one
        }
      }
    }
  }
}
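Alternatively, Elasticsearch ships with built-in named date formats; `date_time_no_millis` corresponds to `yyyy-MM-dd'T'HH:mm:ssZZ` and already matches timestamps like `2017-02-05T15:55:38+01:00`, so you can reference it by name instead of spelling the pattern out (a sketch; substitute your own index and type names):

```
PUT index
{
  "mappings": {
    "your_index_type": {
      "properties": {
        "timestamp": {
          "type":   "date",
          "format": "date_time_no_millis"
        }
      }
    }
  }
}
```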

If you don't specify it correctly, Elasticsearch will expect timestamp values in ISO format. You can also do a date match for your timestamp field, which would look like this in your filter:

date {
    # ZZ matches a numeric timezone offset with a colon, such as +01:00
    match    => [ "timestamp" , "yyyy-MM-dd'T'HH:mm:ssZZ" ]
    target   => "timestamp"
    locale   => "en"
    timezone => "UTC"
}

Or, if you prefer, you can add a new field, match it against the timestamp, and then remove the original timestamp field if you don't actually use it, since the new field now holds the parsed value. Hope this helps.
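That last suggestion could look roughly like this inside the filter (a sketch; `parsed_timestamp` is just an illustrative field name, and `ISO8601` is the date filter's built-in shortcut for timestamps of this shape):

```
date {
  # Parse into a separate field instead of overwriting "timestamp".
  match  => [ "timestamp", "ISO8601" ]
  target => "parsed_timestamp"
}
mutate {
  # Drop the raw string field once the parsed copy exists.
  remove_field => [ "timestamp" ]
}
```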