How to return the actual value (not lowercase) when performing a search with a terms aggregation?

Question

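I am working on an ElasticSearch (6.2) project where the index has many keyword fields, and they are normalized with a lowercase filter so that searches are case-insensitive. Search works well and returns the actual values (not lowercased) of the normalized fields. However, aggregations do not return the actual values of the fields; they return the lowercased values.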

The following example is taken from the ElasticSearch documentation.

https://www.elastic.co/guide/en/elasticsearch/reference/master/normalizer.html

Creating the index:

PUT index
{
  "settings": {
    "analysis": {
      "normalizer": {
        "my_normalizer": {
          "type": "custom",
          "char_filter": [],
          "filter": ["lowercase", "asciifolding"]
        }
      }
    }
  },
  "mappings": {
    "_doc": {
      "properties": {
        "foo": {
          "type": "keyword",
          "normalizer": "my_normalizer"
        }
      }
    }
  }
}

Inserting the documents:

PUT index/_doc/1
{
  "foo": "Bar"
}

PUT index/_doc/2
{
  "foo": "Baz"
}

Search with aggregation:

GET index/_search
{
  "size": 0,
  "aggs": {
    "foo_terms": {
      "terms": {
        "field": "foo"
      }
    }
  }
}

Result:

{
  "took": 43,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "skipped" : 0,
    "failed": 0
  },
  "hits": {
    "total": 2,
    "max_score": 0.47000363,
    "hits": [
      {
        "_index": "index",
        "_type": "_doc",
        "_id": "1",
        "_score": 0.47000363,
        "_source": {
          "foo": "Bar"
        }
      },
      {
        "_index": "index",
        "_type": "_doc",
        "_id": "2",
        "_score": 0.47000363,
        "_source": {
          "foo": "Baz"
        }
      }
    ]
  },
  "aggregations": {
    "foo_terms": {
      "doc_count_error_upper_bound": 0,
      "sum_other_doc_count": 0,
      "buckets": [
        {
          "key": "bar",
          "doc_count": 1
        },
        {
          "key": "baz",
          "doc_count": 1
        }
      ]
    }
  }
}

If you check the aggregation, you will see that the lowercased values were returned, e.g. "key": "bar".

Is there any way to change the aggregation so that it returns the actual values?

e.g. "key": "Bar"
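
To make the symptom concrete, here is a minimal Python sketch of what seems to be going on. It rests on a simplified model, not on the actual Lucene code: the normalizer is applied once at index time, the terms aggregation counts those stored normalized values, and `_source` keeps the original JSON untouched. The `normalize` helper is a hypothetical stand-in that only approximates the `lowercase` and `asciifolding` filters.

```python
import unicodedata
from collections import Counter

def normalize(value: str) -> str:
    """Rough stand-in for a normalizer with lowercase + asciifolding filters."""
    decomposed = unicodedata.normalize("NFKD", value.lower())
    # Drop combining marks so that an accented letter folds to its base letter.
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

docs = [{"foo": "Bar"}, {"foo": "Baz"}]

# What search hits return: the untouched original documents (_source).
source_values = [doc["foo"] for doc in docs]

# What a terms aggregation on the normalized field counts: the indexed terms.
indexed_terms = [normalize(doc["foo"]) for doc in docs]
buckets = Counter(indexed_terms)

print(source_values)   # ['Bar', 'Baz']
print(dict(buckets))   # {'bar': 1, 'baz': 1}
```

Under this model the original casing is simply not available to the aggregation, which matches the buckets shown in the result above.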

Answer 1

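If you want case-insensitive search yet still return the exact values in your aggregations, you don't need any normalizer. You can simply use a text field (which lowercases the tokens and allows case-insensitive search by default) with a keyword sub-field. You would use the former for search and the latter for aggregations. It goes like this: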

PUT index
{
  "mappings": {
    "_doc": {
      "properties": {
        "foo": {
          "type": "text",
          "fields": {
            "keyword": {
              "type": "keyword"
            }
          }
        }
      }
    }
  }
}

After indexing your two documents, you can issue a terms aggregation on foo.keyword:

GET index/_search
{
  "size": 2,
  "aggs": {
    "foo_terms": {
      "terms": {
        "field": "foo.keyword"
      }
    }
  }
}

The result will look like this:

{
  "took": 0,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 2,
    "max_score": 1,
    "hits": [
      {
        "_index": "index",
        "_type": "_doc",
        "_id": "2",
        "_score": 1,
        "_source": {
          "foo": "Baz"
        }
      },
      {
        "_index": "index",
        "_type": "_doc",
        "_id": "1",
        "_score": 1,
        "_source": {
          "foo": "Bar"
        }
      }
    ]
  },
  "aggregations": {
    "foo_terms": {
      "doc_count_error_upper_bound": 0,
      "sum_other_doc_count": 0,
      "buckets": [
        {
          "key": "Bar",
          "doc_count": 1
        },
        {
          "key": "Baz",
          "doc_count": 1
        }
      ]
    }
  }
}
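
The split between the two fields can also be illustrated with a small Python simulation. This is only a sketch under simplifying assumptions: the text field is modeled as lowercased, whitespace-separated tokens (roughly what the default analyzer produces for simple ASCII input), the keyword sub-field as the untouched string, and the helper names `index_doc`, `match` and `terms_agg` are made up for illustration; they are not Elasticsearch APIs.

```python
from collections import Counter

def index_doc(doc_id, value):
    """Index one value two ways, like a text field with a keyword sub-field."""
    return {
        "_id": doc_id,
        "foo": value.lower().split(),   # analyzed tokens, used for search
        "foo.keyword": value,           # exact value, used for aggregations
    }

def match(index, query):
    """Case-insensitive match: the query is analyzed like the text field."""
    query_tokens = set(query.lower().split())
    return [d["_id"] for d in index if query_tokens & set(d["foo"])]

def terms_agg(index, field="foo.keyword"):
    """Count documents per exact value, like a terms aggregation."""
    return Counter(d[field] for d in index)

index = [index_doc("1", "Bar"), index_doc("2", "Baz")]

print(match(index, "BAR"))        # ['1']
print(dict(terms_agg(index)))     # {'Bar': 1, 'Baz': 1}
```

One consequence of aggregating on the exact value: "Bar" and "bar" would end up in separate buckets, because the keyword sub-field is not normalized.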


elasticsearch

elasticsearch-aggregation