How to return the actual value (not lowercase) when performing a search with a terms aggregation?

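I'm working on an ElasticSearch (6.2) project where the index has many keyword fields, and they are normalized with a lowercase filter so that searches are case-insensitive. Search works great and returns the actual values of the normalized fields (not lowercased). However, the aggregations do not return the actual values of the fields; they return the lowercased values.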

The following example is taken from the ElasticSearch documentation.

https://www.elastic.co/guide/en/elasticsearch/reference/master/normalizer.html

Creating the index:

PUT index
{
  "settings": {
    "analysis": {
      "normalizer": {
        "my_normalizer": {
          "type": "custom",
          "char_filter": [],
          "filter": ["lowercase", "asciifolding"]
        }
      }
    }
  },
  "mappings": {
    "_doc": {
      "properties": {
        "foo": {
          "type": "keyword",
          "normalizer": "my_normalizer"
        }
      }
    }
  }
}

Indexing documents:

PUT index/_doc/1
{
  "foo": "Bar"
}

PUT index/_doc/2
{
  "foo": "Baz"
}

Search with aggregation:

GET index/_search
{
  "size": 0,
  "aggs": {
    "foo_terms": {
      "terms": {
        "field": "foo"
      }
    }
  }
}

Result:

{
  "took": 43,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "skipped" : 0,
    "failed": 0
  },
  "hits": {
    "total": 2,
    "max_score": 0.47000363,
    "hits": [
      {
        "_index": "index",
        "_type": "_doc",
        "_id": "1",
        "_score": 0.47000363,
        "_source": {
          "foo": "Bar"
        }
      },
      {
        "_index": "index",
        "_type": "_doc",
        "_id": "2",
        "_score": 0.47000363,
        "_source": {
          "foo": "Baz"
        }
      }
    ]
  },
  "aggregations": {
    "foo_terms": {
      "doc_count_error_upper_bound": 0,
      "sum_other_doc_count": 0,
      "buckets": [
        {
          "key": "bar",
          "doc_count": 1
        },
        {
          "key": "baz",
          "doc_count": 1
        }
      ]
    }
  }
}

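If you check the aggregation, you will see that the lowercased value has been returned, e.g. "key": "bar".

Is there any way to change the aggregation so that it returns the actual value?

e.g. "key": "Bar"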

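If you want case-insensitive search but exact values returned in the aggregation, you don't need any normalizer. You can simply use a text field (which lowercases the tokens and allows case-insensitive search by default) with a keyword sub-field. You would use the former for search and the latter for aggregations. It goes like this: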

PUT index
{
  "mappings": {
    "_doc": {
      "properties": {
        "foo": {
          "type": "text",
          "fields": {
            "keyword": {
              "type": "keyword"
            }
          }
        }
      }
    }
  }
}
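
One way to check why this mapping behaves as described is the _analyze API, which shows the terms a field would produce for a given text. The sketch below assumes the index above has been created with default analysis settings; the sample text "Bar" is only an illustration. The first request should yield the lowercased token bar (what searches on foo match against), while the second should yield Bar unchanged (what a terms aggregation on foo.keyword buckets on):

```json
GET index/_analyze
{
  "field": "foo",
  "text": "Bar"
}

GET index/_analyze
{
  "field": "foo.keyword",
  "text": "Bar"
}
```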

After indexing your two documents, you can issue a terms aggregation on foo.keyword:

GET index/_search
{
  "size": 2,
  "aggs": {
    "foo_terms": {
      "terms": {
        "field": "foo.keyword"
      }
    }
  }
}

The result will look like this:

{
  "took": 0,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 2,
    "max_score": 1,
    "hits": [
      {
        "_index": "index",
        "_type": "_doc",
        "_id": "2",
        "_score": 1,
        "_source": {
          "foo": "Baz"
        }
      },
      {
        "_index": "index",
        "_type": "_doc",
        "_id": "1",
        "_score": 1,
        "_source": {
          "foo": "Bar"
        }
      }
    ]
  },
  "aggregations": {
    "foo_terms": {
      "doc_count_error_upper_bound": 0,
      "sum_other_doc_count": 0,
      "buckets": [
        {
          "key": "Bar",
          "doc_count": 1
        },
        {
          "key": "Baz",
          "doc_count": 1
        }
      ]
    }
  }
}
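
For completeness, a case-insensitive search against the text field could be sketched as follows. This assumes the mapping above and the two sample documents; the lowercase query string bar is just an illustrative input. Because the match query analyzes its input with the same analyzer as the field, it should find the document whose _source contains Bar:

```json
GET index/_search
{
  "query": {
    "match": {
      "foo": "bar"
    }
  }
}
```

Keep in mind that a text field is tokenized, so a multi-word value is matched on its individual words rather than on the whole string.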