ElasticSearch group by document field and count occurrences

My ElasticSearch 6.5.2 index looks like this:

  {
    "_index" : "searches",
    "_type" : "searches",
    "_id" : "cCYuHW4BvwH6Y3jL87ul",
    "_score" : 1.0,
    "_source" : {
      "querySearched" : "telecom"
    }
  },
  {
    "_index" : "searches",
    "_type" : "searches",
    "_id" : "cSYuHW4BvwH6Y3jL_Lvt",
    "_score" : 1.0,
    "_source" : {
      "querySearched" : "telecom"
    }
  },
  {
    "_index" : "searches",
    "_type" : "searches",
    "_id" : "eCb6O24BvwH6Y3jLP7tM",
    "_score" : 1.0,
    "_source" : {
      "querySearched" : "industry"
    }
  }

I want the query to return this result:

"result": [
  {
    "querySearched" : "telecom",
    "number" : 2
  },
  {
    "querySearched" : "industry",
    "number" : 1
  }
]

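I just want to group by value and get the number of occurrences of each, limited to the ten largest counts. I tried using an aggregation, but the buckets are empty. Thanks!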
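To make the goal concrete, here is a minimal local sketch of what a terms aggregation computes: count each distinct value and keep the most frequent ones. It runs in plain Python with no Elasticsearch involved; the `docs` list is hypothetical sample data copied from the three documents above, and `top_terms` is an illustrative helper name, not part of any Elasticsearch API.

```python
from collections import Counter

# Hypothetical sample data: the querySearched values of the three documents above.
docs = [
    {"querySearched": "telecom"},
    {"querySearched": "telecom"},
    {"querySearched": "industry"},
]


def top_terms(documents, field, size=10):
    """Count how often each value of `field` occurs and keep the `size` most frequent."""
    counts = Counter(doc[field] for doc in documents if field in doc)
    # most_common() sorts by count, highest first, which matches the
    # default bucket order of a terms aggregation (by document count).
    return [{field: value, "number": n} for value, n in counts.most_common(size)]


result = top_terms(docs, "querySearched")
print(result)
# [{'querySearched': 'telecom', 'number': 2}, {'querySearched': 'industry', 'number': 1}]
```

In Elasticsearch the same counting happens server-side, per shard, which is what the answers below rely on.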

What have you tried?

POST /searches/_search
{
  "size": 0,
  "aggs": {
    "byquerySearched": {
      "terms": {
        "field": "querySearched",
        "size": 10
      }
    }
  }
}

If your mapping looks like this:

PUT /index
{
  "mappings": {
    "doc": {
      "properties": {
        "querySearched": {
          "type": "text",
          "fielddata": true
        }
      }
    }
  }
}

your query should look like this:

GET index/_search
{
  "size": 0,
  "aggs": {
    "result": {
      "terms": {
        "field": "querySearched",
        "size": 10
      }
    }
  }
}
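The response does not come back in the exact "result" shape from the question: the counts are in aggregations → result → buckets, where each bucket has a "key" (the term) and a "doc_count". A small client-side sketch can reshape it. The `response` dict below is hypothetical, written to match the sample data rather than captured from a real cluster, and `buckets_to_result` is an illustrative helper name.

```python
# Hypothetical response body, shaped like a terms aggregation response
# for the three sample documents (not output from a real cluster).
response = {
    "hits": {"total": 3, "max_score": 0.0, "hits": []},
    "aggregations": {
        "result": {
            "doc_count_error_upper_bound": 0,
            "sum_other_doc_count": 0,
            "buckets": [
                {"key": "telecom", "doc_count": 2},
                {"key": "industry", "doc_count": 1},
            ],
        }
    },
}


def buckets_to_result(resp, agg_name="result", field="querySearched"):
    """Turn terms-aggregation buckets into [{field: term, "number": count}, ...]."""
    buckets = resp["aggregations"][agg_name]["buckets"]
    return [{field: b["key"], "number": b["doc_count"]} for b in buckets]


result = buckets_to_result(response)
print(result)
# [{'querySearched': 'telecom', 'number': 2}, {'querySearched': 'industry', 'number': 1}]
```

The bucket order is preserved, so the list stays sorted by count as Elasticsearch returned it.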

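You need to add "fielddata": true to enable aggregations on a text field (see the Elasticsearch fielddata documentation for more on that).

"size": 10 limits the output to the 10 terms with the highest document counts.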

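After a short discussion with @Kamal, I feel obligated to point out that if you choose to enable fielddata: true, you should be aware that it can consume a lot of heap space.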

From the link I shared:

Fielddata can consume a lot of heap space, especially when loading high cardinality text fields. Once fielddata has been loaded into the heap, it remains there for the lifetime of the segment. Also, loading fielddata is an expensive process which can cause users to experience latency hits. This is why fielddata is disabled by default.

Another option (the more efficient one):

PUT /index
{
  "mappings": {
    "doc": {
      "properties": {
        "querySearched": {
          "type": "text",
          "fields": {
            "keyword": {
              "type": "keyword",
              "ignore_above": 256
            }
          }
        }
      }
    }
  }
}
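One side effect of "ignore_above": 256 is worth knowing when you aggregate on the keyword sub-field: strings longer than 256 characters are not indexed into querySearched.keyword, so they never appear as buckets (they are still indexed by the text field and kept in _source). The sketch below only simulates that filtering locally; `keyword_values` and the sample values are hypothetical.

```python
IGNORE_ABOVE = 256  # mirrors "ignore_above": 256 in the mapping above


def keyword_values(values, ignore_above=IGNORE_ABOVE):
    """Simulate which values would be indexed into querySearched.keyword.

    Values longer than ignore_above characters are skipped for the keyword
    sub-field, so a terms aggregation on it never counts them.
    """
    return [v for v in values if len(v) <= ignore_above]


values = ["telecom", "telecom", "industry", "x" * 300]
print(keyword_values(values))
# ['telecom', 'telecom', 'industry']
```

For search queries of ordinary length this limit should not matter, but a very long query string would silently drop out of the counts.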

Then your aggregation query becomes:

GET index/_search
{
  "size": 0,
  "aggs": {
    "result": {
      "terms": {
        "field": "querySearched.keyword",
        "size": 10
      }
    }
  }
}
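If you build this request from application code, a small helper keeps the field name and the top-N limit in one place. This is only a sketch: `terms_agg_body` is a hypothetical helper, and the explicit "order": {"_count": "desc"} just spells out the terms aggregation's default ordering. The resulting dict can be sent as the search request body with whichever Elasticsearch client you use.

```python
import json


def terms_agg_body(field, size=10, agg_name="result"):
    """Build a search body that returns only the top `size` terms of `field`."""
    return {
        "size": 0,  # no hits, only the aggregation buckets
        "aggs": {
            agg_name: {
                "terms": {
                    "field": field,
                    "size": size,
                    # Highest document count first (the default, made explicit).
                    "order": {"_count": "desc"},
                }
            }
        },
    }


body = terms_agg_body("querySearched.keyword")
print(json.dumps(body, indent=2))
```

Aggregating on querySearched.keyword instead of querySearched is what lets this work without enabling fielddata.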

Both solutions work, but you should keep the fielddata heap cost quoted above in mind.

Hope this helps.