Elasticsearch: Finding the overlap of one field against another

Question

I'm trying to find a way to do this in Elasticsearch without making multiple queries, or by using _mget if I have to.

I have a lot of documents structured like this:

{
  'location': 'Orlando',
  'agent_id': 395205, 
},
{
  'location': 'Miami',
  'agent_id': 391773,
},
{
  'location': 'Miami',
  'agent_id': 391773,
},
{
  'location': 'Tampa',
  'agent_id': 395205,
}

There is a fixed number of location values, but many unique agent_id values.

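My end goal is, given a list of locations, to find the agent_ids that exist in every one of those locations. So in the example above, given ['Orlando', 'Tampa'], we get [395205], because it exists in both. A location can contain duplicate agent_ids (this is expected behavior), so I can't rely on counts (e.g. show me the agent_ids that appear n times, where n = len(locations)).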
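To pin down the intended semantics, here is a minimal in-memory sketch in Python of the result being asked for. The function name `agents_in_all_locations` is made up for illustration and is not part of any Elasticsearch client. It compares sets of locations per agent rather than document counts, so duplicate documents in one location do not affect the outcome.

```python
from collections import defaultdict


def agents_in_all_locations(docs, locations):
    """Return the sorted agent_ids that have at least one doc in every requested location."""
    wanted = set(locations)
    seen = defaultdict(set)  # agent_id -> requested locations the agent appears in
    for doc in docs:
        if doc["location"] in wanted:
            seen[doc["agent_id"]].add(doc["location"])
    # Compare sets, not counts, so repeated docs for one location are harmless.
    return sorted(agent for agent, locs in seen.items() if locs == wanted)


docs = [
    {"location": "Orlando", "agent_id": 395205},
    {"location": "Miami", "agent_id": 391773},
    {"location": "Miami", "agent_id": 391773},
    {"location": "Tampa", "agent_id": 395205},
]

print(agents_in_all_locations(docs, ["Orlando", "Tampa"]))  # [395205]
```

Agent 391773 appears twice in Miami, yet it does not match ['Miami', 'Tampa'], which is exactly the case a plain occurrence count would get wrong.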

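The other key point is that, if possible, I'd like to actually return the hits rather than get them back in aggregation buckets. So ideally a top_hits could be nested in there somewhere.

I think this could be done with some clever filtering or some strict scoring, but I'm not sure how to approach either. I've already done this with multiple queries, but I've found that process too expensive and would like to simplify it if possible. I accept that this may not actually be possible, but I'd love to hear what others think.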

Answer 1

The count of unique locations under each agent can be used to find the common agents.

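Query:

The terms query selects only the documents in the given locations. The "agents" aggregation creates one bucket per agent_id, and the "location" sub-aggregation lists the unique locations under each agent. The bucket_selector then keeps an agent only when its number of location buckets equals the number of requested locations, so replace 2 with that number. Note that both "size" values are caps: raise them if more than 10 agents can match or more than 10 locations are requested.

{
  "query": {
    "terms": {
      "location.keyword": [
        "Orlando",
        "Tampa"
      ]
    }
  },
  "aggs": {
    "agents": {
      "terms": {
        "field": "agent_id",
        "size": 10
      },
      "aggs": {
        "location": {
          "terms": {
            "field": "location.keyword",
            "size": 10
          }
        },
        "my_bucket": {
          "bucket_selector": {
            "buckets_path": {
              "count": "location._bucket_count"
            },
            "script": "params.count==2"
          }
        }
      }
    }
  }
}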
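Since the expected count has to track the number of requested locations, the request body can be generated rather than edited by hand. The following is a rough Python sketch, not part of the original answer: `build_overlap_query`, `max_agents`, `hits_per_agent` and the sub-aggregation name "docs" are hypothetical names chosen here. It also adds two things the query above does not have, both assumptions about what the asker wants: a top_hits sub-aggregation so each surviving agent bucket carries its documents, and "size": 0 to suppress the top-level hits, which a bucket_selector never filters.

```python
import json


def build_overlap_query(locations, max_agents=1000, hits_per_agent=10):
    """Build a search body that keeps only agents present in every requested location."""
    wanted = sorted(set(locations))  # duplicates must not inflate the expected count
    if not wanted:
        raise ValueError("at least one location is required")
    return {
        "size": 0,  # assumption: the per-agent top_hits below replace the top-level hits
        "query": {"terms": {"location.keyword": wanted}},
        "aggs": {
            "agents": {
                # max_agents is a hypothetical cap; agents beyond it are not examined
                "terms": {"field": "agent_id", "size": max_agents},
                "aggs": {
                    "location": {
                        "terms": {"field": "location.keyword", "size": len(wanted)}
                    },
                    # addition to the answer: return documents inside each agent bucket
                    "docs": {"top_hits": {"size": hits_per_agent}},
                    "my_bucket": {
                        "bucket_selector": {
                            "buckets_path": {"count": "location._bucket_count"},
                            "script": f"params.count == {len(wanted)}",
                        }
                    },
                },
            }
        },
    }


body = build_overlap_query(["Tampa", "Orlando", "Tampa"])
print(json.dumps(body, indent=2))
```

The resulting dictionary can be passed as the request body to whichever client or HTTP library is already in use. With many distinct agents, the `max_agents` cap and the cluster's bucket limits still apply, so this is not guaranteed to scale to every data set.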

Result:

 [
      {
        "_index" : "index30",
        "_type" : "_doc",
        "_id" : "LXuksHABg1vns4B5FWL5",
        "_score" : 1.0,
        "_source" : {
          "location" : "Orlando",
          "agent_id" : 395205
        }
      },
      {
        "_index" : "index30",
        "_type" : "_doc",
        "_id" : "MHuksHABg1vns4B5OmKC",
        "_score" : 1.0,
        "_source" : {
          "location" : "Tampa",
          "agent_id" : 395205
        }
      }
    ]
  },
  "aggregations" : {
    "agents" : {
      "doc_count_error_upper_bound" : 0,
      "sum_other_doc_count" : 0,
      "buckets" : [
        {
          "key" : 395205,
          "doc_count" : 2,
          "location" : {
            "doc_count_error_upper_bound" : 0,
            "sum_other_doc_count" : 0,
            "buckets" : [
              {
                "key" : "Orlando",
                "doc_count" : 1
              },
              {
                "key" : "Tampa",
                "doc_count" : 1
              }
            ]
          }
        }
      ]
    }
  }
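Reading the matching agents out of a response shaped like the one above takes only a few lines. This is an illustrative Python sketch: `matching_agents` is a made-up helper name, and the `response` dictionary is a trimmed copy of the aggregations section shown in the result.

```python
def matching_agents(response):
    """Map each surviving agent_id to the locations listed in its bucket."""
    buckets = response["aggregations"]["agents"]["buckets"]
    return {
        bucket["key"]: [loc["key"] for loc in bucket["location"]["buckets"]]
        for bucket in buckets
    }


# Trimmed copy of the aggregations part of the result above.
response = {
    "aggregations": {
        "agents": {
            "buckets": [
                {
                    "key": 395205,
                    "doc_count": 2,
                    "location": {
                        "buckets": [
                            {"key": "Orlando", "doc_count": 1},
                            {"key": "Tampa", "doc_count": 1},
                        ]
                    },
                }
            ]
        }
    }
}

print(matching_agents(response))  # {395205: ['Orlando', 'Tampa']}
```

Only the buckets reflect the bucket_selector. The top-level hits array contains every document matched by the terms query, so in a larger index it would also include agents that are present in only some of the locations.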
