Elasticsearch aggregation splits hyphenated values into separate values

I am trying to retrieve an aggregation of tags (with counts) from Elasticsearch, but wherever a tag contains a hyphen, it is split into separate tags.

For example, given documents tagged like this:

{
    "tags": ["foo", "foo-bar", "cheese"]
}

I get back (abridged):

{
  'foo': 8,
  'bar': 3,
  'cheese' : 2
}

when I expected to get:

{
  'foo': 5,
  'foo-bar': 3,
  'cheese' : 2
}

My mapping is:

{
    "asset" : {
        "properties" : {
            "name" : {"type" : "string"},
            "path" : {"type" : "string", "index" : "not_analyzed"},
            "url": {"type" : "string"},
            "tags" : {"type" : "string", "index_name" : "tag"},
            "created": {"type" : "date"},
            "updated": {"type" : "date"},
            "usages": {"type" : "string", "index_name" : "usage"},
            "meta": {"type": "object"}
        }
    }
}

Can anyone point me in the right direction?

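Try a different analyzer in place of the default standard analyzer, which splits text into separate terms when it encounters certain characters, a hyphen among them. That is why "foo-bar" is indexed as the two terms "foo" and "bar", and the aggregation counts each one. The analyzer below uses the keyword tokenizer, which keeps the whole value as a single term, then lowercases and trims it: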

{
  "settings": {
    "analysis": {
      "analyzer": {
        "my_keyword_lowercase": {
          "tokenizer": "keyword",
          "filter": [
            "lowercase",
            "trim"
          ]
        }
      }
    }
  },
  "mappings": {
    "asset" : {
        "properties" : {
            "name" : {"type" : "string"},
            "path" : {"type" : "string", "index" : "not_analyzed"},
            "url": {"type" : "string"},
            "tags" : {"type" : "string", "index_name" : "tag", "analyzer":"my_keyword_lowercase"},
            "created": {"type" : "date"},
            "updated": {"type" : "date"},
            "usages": {"type" : "string", "index_name" : "usage"},
            "meta": {"type": "object"}
        }
    }
  }
}
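
To see why the counts change, here is a rough Python simulation of the two analysis strategies. It is only a sketch: the function names (standard_like_tokens, keyword_lowercase_tokens, terms_counts) are made up for illustration, the regex split is a simplified stand-in for the standard analyzer rather than its real tokenization rules, and the sample data assumes 5 documents tagged "foo", 3 tagged "foo-bar" and 2 tagged "cheese", which is consistent with the counts in the question.

```python
import re
from collections import Counter


def standard_like_tokens(value):
    # Simplified stand-in for the standard analyzer (an assumption, not the
    # real algorithm): lowercase, then split on runs of non-alphanumerics.
    return [t for t in re.split(r"[^0-9a-z]+", value.lower()) if t]


def keyword_lowercase_tokens(value):
    # Mimics keyword tokenizer + lowercase + trim: one term per value.
    return [value.strip().lower()]


def terms_counts(docs, analyze):
    # Like a terms aggregation, count each distinct term once per document.
    counts = Counter()
    for tags in docs:
        terms = set()
        for tag in tags:
            terms.update(analyze(tag))
        counts.update(terms)
    return dict(counts)


# Hypothetical sample data matching the counts in the question.
docs = [["foo"]] * 5 + [["foo-bar"]] * 3 + [["cheese"]] * 2

print(terms_counts(docs, standard_like_tokens))
# {'foo': 8, 'bar': 3, 'cheese': 2}
print(terms_counts(docs, keyword_lowercase_tokens))
# {'foo': 5, 'foo-bar': 3, 'cheese': 2}
```

Note that analysis happens at index time, so documents that are already indexed need to be reindexed into an index created with the new settings before the aggregation results change. If lowercasing and trimming are not needed, setting "index" : "not_analyzed" on tags (as the mapping already does for path) should also keep each tag as a single term.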