Elasticsearch Terms aggregation with unknown datatype

Question

I am indexing data of unknown schema in Elasticsearch using dynamic mapping, i.e. we don't know the shape, datatypes, etc. of much of the data ahead of time. In queries, I want to be able to aggregate on any field. Strings are (by default) mapped as both text and keyword types, and only the latter can be aggregated on. So for strings, my terms aggregations must look like this:

"aggs": {
    "something": {
        "terms": {
            "field": "something.keyword"
        }
    }
}

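But other types such as numbers and booleans do not have this .keyword sub-field, so aggregations on those types must look like this (which would fail for text fields):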

"aggs": {
    "something": {
        "terms": {
            "field": "something"
        }
    }
}

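Is there any way to specify a terms aggregation that basically says "if something.keyword exists, use that, otherwise just use something", without taking a significant performance hit?

Requiring datatype information to be provided at query time might be an option for me, but ideally I would like to avoid it if possible.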
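To illustrate what supplying datatype information at query time could look like, here is a minimal Python sketch that picks the field name from the index mapping instead of asking the caller for it. It assumes the `properties` dictionary has already been fetched with the get mapping API (`GET /<index>/_mapping`) and unwrapped by the caller; the function name `resolve_agg_field` is hypothetical and not part of any Elasticsearch client.

```python
from typing import Any, Dict, Optional


def resolve_agg_field(properties: Dict[str, Any], path: str) -> Optional[str]:
    """Return the field name to use in a terms aggregation.

    `properties` is assumed to be the "properties" object of an index
    mapping. Returns None when the field is unmapped, is an object, or is
    a text field without a keyword sub-field.
    """
    node: Optional[Dict[str, Any]] = {"properties": properties}
    # Walk dotted paths such as "user.name" through nested object mappings.
    for part in path.split("."):
        node = node.get("properties", {}).get(part)
        if node is None:
            return None  # field has not been mapped (yet)

    field_type = node.get("type")
    if field_type is None:
        return None  # object mappings carry "properties" but no "type"

    if field_type == "text":
        # text fields cannot be aggregated on by default, so look for a
        # keyword multi-field (named "keyword" under the default mapping).
        for name, spec in node.get("fields", {}).items():
            if spec.get("type") == "keyword":
                return f"{path}.{name}"
        return None

    # Numbers, booleans, dates, keywords, etc. aggregate on the bare name.
    return path
```

With the default dynamic mapping, a string field `something` would resolve to `something.keyword`, while a numeric field would resolve to its own name. The mapping lookup is an extra request, so in practice it would probably need to be cached and refreshed as dynamic mapping adds new fields.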

Answer 1

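If the primary use case is aggregations, it may be worth changing the dynamic mapping for string properties so that they are indexed as the keyword datatype, with a multi-field sub-field indexed as the text datatype, i.e. in dynamic_templates: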

{
  "strings": {
    "match_mapping_type": "string",
    "mapping": {
      "type": "keyword",
      "ignore_above": 256,
      "fields": {
        "text": {
          "type": "text"
        }
      }
    }
  }
},
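As a rough sketch of how this template might be used end to end, the Python snippet below builds an index-creation body containing it and a search body that aggregates on the bare field name. It assumes Elasticsearch 7 or later (typeless mappings); the helper names `build_index_body` and `build_terms_search` are hypothetical, and sending the bodies to a cluster is left to whichever client is in use.

```python
from typing import Any, Dict

# Dynamic template from the answer: strings become keyword fields, with a
# "text" multi-field for full-text search.
STRINGS_AS_KEYWORD: Dict[str, Any] = {
    "strings": {
        "match_mapping_type": "string",
        "mapping": {
            "type": "keyword",
            "ignore_above": 256,
            "fields": {"text": {"type": "text"}},
        },
    }
}


def build_index_body() -> Dict[str, Any]:
    """Body for creating an index (assumes ES 7+ typeless mappings)."""
    return {"mappings": {"dynamic_templates": [STRINGS_AS_KEYWORD]}}


def build_terms_search(field: str) -> Dict[str, Any]:
    """Search body with a terms aggregation on the bare field name.

    With the template above, the same shape should work for strings,
    numbers and booleans, since strings are keyword at the top level.
    """
    return {
        "size": 0,  # only aggregation buckets are needed, not hits
        "aggs": {field: {"terms": {"field": field}}},
    }
```

Under this mapping, full-text queries would target `something.text` rather than `something`. Note that with `ignore_above` set to 256, string values longer than 256 characters are not indexed in the keyword field and so would not appear in the aggregation buckets.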

