Count and sort by the number of occurrences in an array

I have a type called account with the following mapping:

        "country" : {
          "type" : "text",
          "fields" : {
            "keyword" : {
              "type" : "keyword",
              "ignore_above" : 256
            }
          }
        },
        "followingClientIds" : {
          "type" : "text",
          "fields" : {
            "keyword" : {
              "type" : "keyword",
              "ignore_above" : 256
            }
          },
          "fielddata" : true
        },

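followingClientIds is an array of string IDs of the other accounts that I follow.

I want to build a query that fetches every account from one country and sorts them by the number of accounts we both follow.

Here is the query I have so far: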


GET account/_search
{
  "size": 20,
  "query": {
    "bool": {
      "filter": {
        "term": {
          "country.keyword": "AT"
        }
      }
    }
  },
  "sort": [
    {
      "followingClientIds.keyword": {
        "order": "asc",
        "nested_filter": {
          "terms": {
            "followingClientIds.keyword": [
              "dFbEW23hVZ3w8jhH9LeCw3QG33UjuF5C"
            ]
          }
        }
      }
    }
  ]
}

For example, I have these 3 documents in the account type:

{
    "username": "user2",
    "country": "AT",
    "followingClientIds": ["abc"]
},
{
    "username": "user3",
    "country": "AT",
    "followingClientIds": ["abc", "bcd", "cde"]
},
{
    "username": "user4",
    "country": "AT",
    "followingClientIds": ["abc"]
}

Suppose I send the following country and followingClientIds to the query for sorting:

{
    "country": "AT",
    "followingClientIds": ["abc", "bcd", "cde"]
}

I would like the result to look like this:

{
    "username": "user3",
    "country": "AT",
    "followingClientIds": ["abc", "bcd", "cde"],
    "fields": [ // don't really need this custom field, but it would be cool
        "mutual_following_count": 3
    ]
},
{
    "username": "user2",
    "country": "AT",
    "followingClientIds": ["abc"],
    "fields": [
        "mutual_following_count": 1
    ]
},
{
    "username": "user4",
    "country": "AT",
    "followingClientIds": ["abc"],
    "fields": [
        "mutual_following_count": 1
    ]
}

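If you are looking for a standalone, computed field called mutual_following_count, you can get exactly that by using the script below as a script field. But you won't be able to sort on it.

The only other option is a script sort, which first computes a value and then sorts by it. The resulting query could look like this: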
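As a rough sketch of the script-field variant, assuming the same index and parameters as in the question: the request body below computes mutual_following_count per hit through script_fields. The loop is a simplified rewrite of the counting logic, not a tested drop-in, and it reads the keyword sub-field so that IDs are compared unanalyzed. "_source": true is included because hits otherwise come back without _source once script_fields is used.

```json
{
  "size": 20,
  "_source": true,
  "query": {
    "bool": {
      "filter": {
        "term": {
          "country.keyword": "AT"
        }
      }
    }
  },
  "script_fields": {
    "mutual_following_count": {
      "script": {
        "lang": "painless",
        "params": {
          "followingClientIds": ["abc", "bcd", "cde"]
        },
        "source": "int n = 0; for (def id : doc['followingClientIds.keyword']) { if (params.followingClientIds.contains(id)) { n++; } } return n;"
      }
    }
  }
}
```

Each hit should then carry the value under fields.mutual_following_count, but the hits themselves stay in the default order.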

{
  "size": 20,
  "query": {
    "bool": {
      "filter": {
        "term": {
          "country.keyword": "AT"
        }
      }
    }
  },
  "sort": [
    {
      "_script": {
        "type": "number",
        "order": "desc", 
        "script": {
          "lang": "painless", 
          "params": {
            "followingClientIds": ["abc", "bcd", "cde"]
          },
          "source": """
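            // read the keyword sub-field: the analyzed text field holds lowercased tokens, not the original IDs
            // deduplicate
            def fromSource = doc['followingClientIds.keyword']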
                                .stream()
                                .distinct()
                                .collect(Collectors.toList());
            def fromParams = params.followingClientIds
                                   .stream()
                                   .distinct()
                                   .collect(Collectors.toList());
            
            // count the requested IDs that this account also follows (size() already returns an int)
            return fromParams.findAll(x -> fromSource.contains(x)).size();
          """
        }
      }
    }
  ]
}

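The downside is that you can't 'name' the sort value like that. The hits won't contain a mutual_following_count field or anything similar; the computed number only shows up in each hit's sort array.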
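If you need both the ordering and a named value, one possible workaround is to repeat the script in script_fields next to the script sort. This is only a sketch under that assumption: the script runs twice per hit, the inline loop is a simplified rewrite of the counting logic, and the keyword sub-field is used so that IDs are compared exactly.

```json
{
  "size": 20,
  "_source": true,
  "query": {
    "bool": {
      "filter": {
        "term": {
          "country.keyword": "AT"
        }
      }
    }
  },
  "script_fields": {
    "mutual_following_count": {
      "script": {
        "lang": "painless",
        "params": {
          "followingClientIds": ["abc", "bcd", "cde"]
        },
        "source": "int n = 0; for (def id : doc['followingClientIds.keyword']) { if (params.followingClientIds.contains(id)) { n++; } } return n;"
      }
    }
  },
  "sort": [
    {
      "_script": {
        "type": "number",
        "order": "desc",
        "script": {
          "lang": "painless",
          "params": {
            "followingClientIds": ["abc", "bcd", "cde"]
          },
          "source": "int n = 0; for (def id : doc['followingClientIds.keyword']) { if (params.followingClientIds.contains(id)) { n++; } } return n;"
        }
      }
    }
  ]
}
```

With the three sample documents from the question, user3 should come first with a count of 3. user2 and user4 tie at 1, so their relative order is not guaranteed unless you add a secondary sort.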