Elasticsearch top_hits aggregation result differs from doc_count

Query

GET /someindex/_search
{
   "size": 0,
   "query": {
      "ids": {
         "types": [],
         "values": ["08a2","08a3","03a2","03a3","84a1"]
      }
   },
   "aggregations": {
      "498": {
         "terms": {
            "field": "holderInfo.raw",
            "size": 50
         },
         "aggregations": {
            "tops": {
               "top_hits": {
                  "_source": {
                     "includes": ["uid"]
                  }
               }
            }
         }
      }
   }
}

Result

{
   ...
   "hits": {
      "total": 5,
      "max_score": 0,
      "hits": []
   },
   "aggregations": {
      "498": {
         "doc_count_error_upper_bound": 0,
         "sum_other_doc_count": 0,
         "buckets": [
            {
               "key": "MATSUSHITA ELECTRIC INDUSTRIAL",
               "doc_count": 5,
               "tops": {
                  "hits": {
                     "total": 5,
                     "max_score": 1,
                     "hits": [
                        {
                           "_index": "someindex",
                           "_id": "03a3",
                           "_score": 1,
                           "_source": {
                              "uid": "03a3"
                           }
                        },
                        {
                           "_index": "someindex",
                           "_id": "08a2",
                           "_score": 1,
                           "_source": {
                              "uid": "08a2"
                           }
                        },
                        {
                           "_index": "someindex",
                           "_id": "84a1",
                           "_score": 1,
                           "_source": {
                              "uid": "84a1"
                           }
                        }
                     ]
                  }
               }
            }
         ]
      }
   }
}

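The documents "08a2", "08a3", "03a2", "03a3", and "84a1" all clearly have 'MATSUSHITA ELECTRIC INDUSTRIAL' in the holderInfo.raw field.

Accordingly, doc_count is 5 for that bucket, but the top_hits result only contains "03a3", "08a2", and "84a1"; "08a3" and "03a2" are missing.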

Query

GET /someindex/_search
{
   "size": 0,
   "query": {
      "ids": {
         "types": [],
         "values": ["08a2","08a3","03a2","03a3","84a1"]
      }
   },
   "aggregations": {
      "498": {
         "terms": {
            "script": {
               "inline": "doc['holderInfo.raw'].value"
            },
            "size": 50
         }
      }
   }
}

Result

{
   ...
   "hits": {
      "total": 5,
      "max_score": 0,
      "hits": []
   },
   "aggregations": {
      "498": {
         "doc_count_error_upper_bound": 0,
         "sum_other_doc_count": 0,
         "buckets": [
            {
               "key": "MATSUSHITA ELECTRIC INDUSTRIAL",
               "doc_count": 3
            }
         ]
      }
   }
}

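Two documents are also missing when I aggregate with a script: doc_count is 3 instead of 5.

I would like to know why some uids are missing.

I am in a situation where I have to use Elasticsearch 2.2. Is this a bug in that older version of Elasticsearch, or a mistake on my side?

Thanks!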

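By default, the top_hits aggregation returns only the top 3 hits per bucket. Note that hits.total inside "tops" is still 5, matching doc_count; only the returned hits list is cut short. You just need to increase the size parameter of top_hits (set to 5 below):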

GET /someindex/_search
{
   "size": 0,
   "query": {
      "ids": {
         "types": [],
         "values": ["08a2","08a3","03a2","03a3","84a1"]
      }
   },
   "aggregations": {
      "498": {
         "terms": {
            "field": "holderInfo.raw",
            "size": 50
         },
         "aggregations": {
            "tops": {
               "top_hits": {
                  "size": 5,
                  "_source": {
                     "includes": ["uid"]
                  }
               }
            }
         }
      }
   }
}
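
If the list of ids changes from request to request, hard-coding "size": 5 is easy to forget to update. The following is only a rough Python sketch of one way to handle that, not an official client API: the helper names build_body and truncated_buckets are hypothetical, the optional "types" entry of the ids query is left out, and hits.total is assumed to be a plain integer as in the 2.x response shown in the question. It builds the request body with the top_hits size derived from the number of ids, and checks a response for buckets whose hits list is shorter than hits.total.

```python
from typing import Any, Dict, List


def build_body(ids: List[str], field: str = "holderInfo.raw",
               agg: str = "498") -> Dict[str, Any]:
    """Build the search body with top_hits sized to cover every requested id."""
    return {
        "size": 0,
        "query": {"ids": {"values": ids}},
        "aggregations": {
            agg: {
                "terms": {"field": field, "size": 50},
                "aggregations": {
                    "tops": {
                        "top_hits": {
                            # The query matches at most len(ids) documents,
                            # so no bucket can hold more hits than this.
                            "size": len(ids),
                            "_source": {"includes": ["uid"]},
                        }
                    }
                },
            }
        },
    }


def truncated_buckets(response: Dict[str, Any], agg: str = "498",
                      sub: str = "tops") -> List[str]:
    """Return keys of buckets whose top_hits list is shorter than hits.total.

    Assumes hits.total is an integer (as in the 2.x response above).
    """
    keys = []
    for bucket in response["aggregations"][agg]["buckets"]:
        hits = bucket[sub]["hits"]
        if len(hits["hits"]) < hits["total"]:
            keys.append(bucket["key"])
    return keys
```

The dictionary returned by build_body can be serialized with json.dumps and sent as the body of the same GET /someindex/_search request. Running truncated_buckets on the response from the question would flag the 'MATSUSHITA ELECTRIC INDUSTRIAL' bucket, since it reports a total of 5 but lists only 3 hits.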