Elasticsearch - Distinct Values, Not Counts

I am trying to do something similar to this SQL query:

SELECT * FROM table WHERE fileContent LIKE '%keyword%' AND company_id = '1' GROUP BY email

Having read the post, I have this:

{
    "query": {
        "bool": {
            "must": [{
                "match": {
                    "fileContent": {
                        "query": "keyword"
                    }
                }
            }],
            "filter": [{
                "terms": {
                    "company_id": [1]
                }
            }]
        }
    },
    "aggs": {
        "group_by_email": {
            "terms": {
                "field": "email",
                "size": 1000
            }
        }
    },
    "size": 0
}

The field mapping is:

{
  "cvs" : {
    "mappings" : {
      "application" : {
        "_meta" : {
          "model" : "Acme\\AppBundle\\Entity\\Application"
        },
        "dynamic_date_formats" : [ ],
        "properties" : {
          "email" : {
            "type" : "keyword"
          },
          "fileContent" : {
            "type" : "text"
          },
          "company_id" : {
            "type" : "text"
          }
        }
      }
    }
  }
}

... generated from the Symfony config.yml:

fos_elastica:
    clients:
        default:
            host: "%elastica.host%"
            port: "%elastica.port%"
    indexes:
        cvs:
            client: default
            types:
               application:
                    properties:
                        fileContent: ~
                        email:
                          index: not_analyzed
                        company_id: ~
                    persistence:
                        driver: orm
                        model: Acme\AppBundle\Entity\Application
                        provider: ~
                        finder: ~

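The filter works fine, but hits:hits returns no items (or, if I remove size:0, every result that matches the search), and aggregations:group_by_email:buckets contains a count for each group but not the records themselves. The grouped records are not returned, and those are what I need.

In case that is the style you prefer, I have also tried the query builder with FOSElasticaBundle (this works, but has no grouping/aggregation):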

$boolQuery = new \Elastica\Query\BoolQuery();

$filterKeywords = new \Elastica\Query\Match();
$filterKeywords->setFieldQuery('fileContent', 'keyword');
$boolQuery->addMust($filterKeywords);

$filterUser = new \Elastica\Query\Terms();
$filterUser->setTerms('company_id', array('1'));
$boolQuery->addFilter($filterUser);

$finder = $this->get('fos_elastica.finder.cvs.application');

Thanks.

For this you need a top_hits aggregation nested inside the terms aggregation you are already using:

  "aggs": {
    "group_by_email": {
      "terms": {
        "field": "email",
        "size": 1000
      },
      "aggs": {
        "sample_docs": {
          "top_hits": {
            "size": 100
          }
        }
      }
    }
  }
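
With this in place the documents come back inside each bucket rather than under the top-level hits. A minimal Python sketch of reading them out, assuming the response has already been parsed into a dict and the aggregations are named group_by_email and sample_docs as above. The helper name docs_by_email and the sample response are made up for illustration:

```python
def docs_by_email(response, terms_agg="group_by_email", hits_agg="sample_docs"):
    """Map each bucket key (an email) to the _source documents returned under it."""
    buckets = response["aggregations"][terms_agg]["buckets"]
    return {
        bucket["key"]: [hit["_source"] for hit in bucket[hits_agg]["hits"]["hits"]]
        for bucket in buckets
    }


# Hypothetical, trimmed response in the shape returned for terms + top_hits.
sample_response = {
    "hits": {"hits": []},  # empty because the request used "size": 0
    "aggregations": {
        "group_by_email": {
            "buckets": [
                {
                    "key": "a@example.com",
                    "doc_count": 2,
                    "sample_docs": {"hits": {"hits": [
                        {"_id": "1", "_source": {"email": "a@example.com"}},
                        {"_id": "2", "_source": {"email": "a@example.com"}},
                    ]}},
                },
                {
                    "key": "b@example.com",
                    "doc_count": 1,
                    "sample_docs": {"hits": {"hits": [
                        {"_id": "3", "_source": {"email": "b@example.com"}},
                    ]}},
                },
            ]
        }
    },
}

grouped = docs_by_email(sample_response)
for email, docs in grouped.items():
    print(email, len(docs))
```

Note that doc_count is the number of matching documents in the bucket, while the list under hits.hits holds at most the top_hits size.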

After playing with Andrei's answer, top_hits:{size:1} seems to be what I needed. This returns one record for each bucket in the aggregation:

  "aggs": {
    "group_by_email": {
      "terms": {
        "field": "email",
        "size": 1000
      },
      "aggs": {
        "sample_docs": {
          "top_hits": {
            "size": 1
          }
        }
      }
    }
  }
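
For completeness, here is a sketch of the whole request body with the query from the question and this aggregation combined, built as a Python dict and serialized to JSON. The function name build_grouped_search and its parameters are my own, not part of any client library:

```python
import json


def build_grouped_search(keyword, company_id, max_groups=1000, docs_per_group=1):
    """Build a search body that returns up to docs_per_group documents per distinct email."""
    return {
        # Suppress the flat top-level hits; the documents arrive inside the buckets.
        "size": 0,
        "query": {
            "bool": {
                "must": [{"match": {"fileContent": {"query": keyword}}}],
                "filter": [{"terms": {"company_id": [company_id]}}],
            }
        },
        "aggs": {
            "group_by_email": {
                # At most max_groups distinct emails are returned.
                "terms": {"field": "email", "size": max_groups},
                "aggs": {"sample_docs": {"top_hits": {"size": docs_per_group}}},
            }
        },
    }


body = build_grouped_search("keyword", 1)
print(json.dumps(body, indent=2))
```

The printed JSON can be sent as the body of the search request in place of the original one.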

Reference: top_hits

top_hits helped me too. I ran into some trouble as well, but eventually worked out a solution. Here it is:

{
    "query": {
        "nested": {
            "path": "placedOrders",
            "query": {
                "bool": {
                    "must": [
                        {
                            "term": {
                                "placedOrders.ownerId": "0a9fdef0-4508-4f9c-aa8c-b3984e39ad1e"
                            }
                        }
                    ]
                }
            }
        }
    },
    "aggs": {
        "custom_name1": {
            "nested": {
                "path": "placedOrders"
            },
            "aggs": {
                "custom_name2": {
                    "terms": {
                        "field": "placedOrders.propertyId"
                    },
                    "aggs": {
                        "custom_name3": {
                            "top_hits": {
                                "size": 1,
                                "sort": [
                                    {
                                        "placedOrders.propertyId": {
                                            "order": "desc"
                                        }
                                    }
                                ]
                            }
                        }
                    }
                }
            }
        }
    }
}
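
With a nested aggregation the buckets sit one level deeper in the response. A rough Python sketch of pulling one record per propertyId out of it, assuming the aggregation names above and a response already parsed into a dict. The helper name first_hit_per_bucket and the sample data are hypothetical; as far as I know, the _source of a hit under a nested aggregation is the nested object itself:

```python
def first_hit_per_bucket(response,
                         path=("custom_name1", "custom_name2"),
                         hits_agg="custom_name3"):
    """Walk down the named aggregations and return the first hit's _source per bucket key."""
    node = response["aggregations"]
    for name in path:
        node = node[name]
    result = {}
    for bucket in node["buckets"]:
        hits = bucket[hits_agg]["hits"]["hits"]
        if hits:  # skip buckets whose top_hits list is empty
            result[bucket["key"]] = hits[0]["_source"]
    return result


# Hypothetical, trimmed response for the nested + terms + top_hits request.
sample_response = {
    "aggregations": {
        "custom_name1": {
            "doc_count": 2,
            "custom_name2": {
                "buckets": [
                    {
                        "key": "property-a",
                        "doc_count": 1,
                        "custom_name3": {"hits": {"hits": [
                            {"_source": {"propertyId": "property-a"}},
                        ]}},
                    },
                    {
                        "key": "property-b",
                        "doc_count": 1,
                        "custom_name3": {"hits": {"hits": [
                            {"_source": {"propertyId": "property-b"}},
                        ]}},
                    },
                ]
            },
        }
    }
}

orders = first_hit_per_bucket(sample_response)
print(orders)
```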