Get all different values for a certain field in elasticsearch with spring data

I'm trying to get all the distinct values of a specific field (e.g. "name") in Elasticsearch using Spring Data.

As a first approach, I have this JSON query that does what I want:

{
    "aggs" : {
        "nameAgg" : {
            "terms" : { "field" : "name.key", "size":10000 }
        }
    },
    "size":0
}

If I run a GET against the index with it, it works fine and returns data like this:

"aggregations": {
        "nameAgg": {
            "doc_count_error_upper_bound": 0,
            "sum_other_doc_count": 0,
            "buckets": [
                {
                    "key": "Paul",
                    "doc_count": 12
                },
                {
                    "key": "John",
                    "doc_count": 7
                }
]
}
}

Since all I need are the distinct values of that field, this fits my needs. Now I'm trying to achieve the same thing in Java with spring-data-elasticsearch.

My closest attempt so far is this:

AbstractAggregationBuilder<TermsAggregationBuilder> agBuilder =
        AggregationBuilders.terms("name.key").field("name.key").size(10000);

Query query = new NativeSearchQueryBuilder()
        .withQuery(QueryBuilders.matchAllQuery())
        .addAggregation(agBuilder)
        .build();

But the output contains all the data of every indexed document, which is not what I expect. How can I perform this kind of request with spring-data-elasticsearch?

Update: I tried to pass a Pageable parameter to the query, like this:

AbstractAggregationBuilder<TermsAggregationBuilder> agBuilder =
        AggregationBuilders.terms("name.key").field("name.key");

Query query = new NativeSearchQueryBuilder()
        .withPageable(PageRequest.of(0, 0))
        .withQuery(QueryBuilders.matchAllQuery())
        .addAggregation(agBuilder)
        .build();

But then I got this exception:

org.springframework.web.util.NestedServletException: Request processing failed; nested exception is java.lang.IllegalArgumentException: Page size must not be less than one!
    at org.springframework.web.servlet.FrameworkServlet.processRequest(FrameworkServlet.java:1014)
    at org.springframework.web.servlet.FrameworkServlet.doGet(FrameworkServlet.java:898)
    at javax.servlet.http.HttpServlet.service(HttpServlet.java:497)
Caused by: java.lang.IllegalArgumentException: Page size must not be less than one!
    at org.springframework.data.domain.AbstractPageRequest.<init>(AbstractPageRequest.java:50)
    at org.springframework.data.domain.PageRequest.<init>(PageRequest.java:43)
    at org.springframework.data.domain.PageRequest.of(PageRequest.java:70)
    at org.springframework.data.domain.PageRequest.of(PageRequest.java:58)
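For context, the failure happens inside Spring Data Commons itself, before any Elasticsearch request is built: PageRequest validates its arguments eagerly. A minimal snippet reproducing it (the class name is purely illustrative):

import org.springframework.data.domain.PageRequest;

public class PageSizeCheckDemo {
    public static void main(String[] args) {
        // PageRequest.of rejects a page size of 0 up front,
        // so the query is never even sent to Elasticsearch.
        PageRequest.of(0, 0); // throws IllegalArgumentException: Page size must not be less than one!
    }
}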

You have set size: 0 in your JSON, so you need the same in Spring:

Query query = new NativeSearchQueryBuilder()
        .withPageable(PageRequest.of(0, 0))   //  <-- (page = 0, size = 0) <--
        .withQuery(QueryBuilders.matchAllQuery())
        .addAggregation(agBuilder)
        .build();

I found a solution.

Since Spring Data throws an exception if you try to use a page of size 0, I made a workaround and created a class that implements the Pageable interface:

import org.springframework.data.domain.Pageable;
import org.springframework.data.domain.Sort;

public class CustomBlankPage implements Pageable {
  
  public static final CustomBlankPage PAGE = new CustomBlankPage();

  @Override
  public int getPageNumber() {
    return 0;
  }

  @Override
  public int getPageSize() {
    return 0;
  }

  @Override
  public long getOffset() {
    return 0;
  }

  @Override
  public Sort getSort() {
    return null;
  }

  @Override
  public Pageable next() {
    return null;
  }

  @Override
  public Pageable previousOrFirst() {
    return null;
  }

  @Override
  public Pageable first() {
    return null;
  }

  @Override
  public boolean hasPrevious() {
    return false;
  }
}
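One hedged side note on the class above: Pageable.getSort() is normally expected to return Sort.unsorted() rather than null, so if anything downstream ever reads the sort, a variant like this may be safer (an assumption on my part, not something I actually ran into):

@Override
public Sort getSort() {
    // Sort.unsorted() avoids potential NullPointerExceptions in callers
    // that iterate or inspect the sort.
    return Sort.unsorted();
}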

And the search is then done like this:

AbstractAggregationBuilder<TermsAggregationBuilder> agBuilder =
        AggregationBuilders.terms("name.key").field("name.key").size(10000);

Query query = new NativeSearchQueryBuilder()
        .withPageable(CustomBlankPage.PAGE)
        .withQuery(QueryBuilders.matchAllQuery())
        .addAggregation(agBuilder)
        .build();

Then I go through the retrieved results with this code:

SearchHits<Person> hits = operations.search(query, Person.class);
Aggregations aggs = hits.getAggregations();
ParsedStringTerms namesTerm = (ParsedStringTerms) aggs.get("name.key");
List<String> keys = namesTerm.getBuckets()
      .stream()
      .map(b -> b.getKeyAsString())
      .collect(Collectors.toList());

NativeSearchQueryBuilder has a withMaxResults(Integer) method that does what you need.
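For example, something like this could replace the custom Pageable (a sketch only, assuming withMaxResults(0) ends up as the request's size parameter; I haven't verified it against every spring-data-elasticsearch version):

AbstractAggregationBuilder<TermsAggregationBuilder> agBuilder =
        AggregationBuilders.terms("name.key").field("name.key").size(10000);

Query query = new NativeSearchQueryBuilder()
        .withQuery(QueryBuilders.matchAllQuery())
        .addAggregation(agBuilder)
        .withMaxResults(0)   // assumption: mapped to "size": 0 in the search request
        .build();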