ELK data ingestion: Elasticsearch treats text as boolean even though the mapping says the type is 'text'

Question

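I am using ELK 7.4.1 with the docker from here, and I need to ingest data from a MySQL database. One of the tables has this 'status' field defined as varchar(128). I used the logstash jdbc plugin for that purpose, but when I start the docker images I see a lot of warning messages saying "org.elasticsearch.index.mapper.MapperParsingException: failed to parse field [status] of type [boolean] in document with id '34ZXb24BsfR1FhttyYWt'. Preview of field's value: 'Success'". What puzzles me, however, is that the mapping seems to be correct: "status": { "type": "text", ... } and the data appears to have been ingested successfully.

I even tried creating the index manually and putting the mapping in place before ingesting any data, but that did not help either.

Any idea why?

Adding more information:

Table definition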

CREATE TABLE records (
  id int(11) NOT NULL AUTO_INCREMENT,
  ...
  status varchar(128) NOT NULL DEFAULT '',
  ...
)

Elasticsearch mapping

{
  "properties": {
    ...
    "status": {
      "type": "text",
      "fields": {
        "keyword": {
          "type": "keyword",
          "ignore_above": 256
        }
      }
    },
    ...
  }
}

Sample data

+-----------+-----------+
| id        | status    |
+-----------+-----------+
| 452172830 | success   |
| 452172835 | other     |
| 452172840 | success   |
...

More information: Elasticsearch mapping template

PUT /_template/records_template
{
  "index_patterns": ["records"],
  "mappings": {
    "_source": {
      "enabled": false
    },
    "properties": {
      "status": {
        "type": "text",
        "fields": {
          "keyword": {
            "type": "keyword",
            "ignore_above": 256
          }
        }
      }
    }
  }
}

Logstash configuration file

input {
    jdbc {
        tags => "records"
        jdbc_connection_string => "jdbc:mysql://10.0.2.15:3306/esd"
        jdbc_user => "dbuser"
        jdbc_password => "dbpass"
        schedule => "* * * * *"
        jdbc_validate_connection => true
        jdbc_paging_enabled => true
        jdbc_page_size => 100000
        jdbc_driver_class => "com.mysql.cj.jdbc.Driver"
        statement => "select * from records order by id asc"
    }
    ...
}
output {
    if "records" in [tags] {
        elasticsearch {
                hosts => "elasticsearch:9200"
                user => "elastic"
                password => "changeme"
                index => "records"
                template_name => "records_template"
                document_id => "%{id}"
        }
    }
    ...

Answer 1

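It looks like the field name matters somehow. If I change the select clause to something like select status as rd_status, ..., all the errors go away. I am not sure whether I am missing something in the Elasticsearch mapping or whether Logstash internally tries to guess the data type from the name status (I would be surprised if it were the latter).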
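
For illustration, the jdbc input from the question could be adjusted roughly as follows to apply that rename. This is only a sketch: rd_status is the alias suggested in this answer, the column list shows just id and status because the remaining columns are elided in the question, and the connection settings are copied unchanged from the question's config.

```
input {
    jdbc {
        tags => "records"
        jdbc_connection_string => "jdbc:mysql://10.0.2.15:3306/esd"
        jdbc_user => "dbuser"
        jdbc_password => "dbpass"
        schedule => "* * * * *"
        jdbc_validate_connection => true
        jdbc_paging_enabled => true
        jdbc_page_size => 100000
        jdbc_driver_class => "com.mysql.cj.jdbc.Driver"
        # Columns are listed explicitly because "select *" would still emit
        # the original status column alongside the alias.
        # Only id and status are shown; the other columns are elided in the question.
        statement => "select id, status as rd_status from records order by id asc"
    }
}
```

With this change the documents carry rd_status rather than status, so the status entry in records_template would presumably need to be renamed to match. That follow-up is an assumption, not something confirmed in this thread.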

mysql

jdbc

elasticsearch

logstash

logstash-jdbc