ElasticSearch DateHistogram Aggregation Fill Missing Data
I'm trying to run some aggregations using Spring Data Elasticsearch.
Here is my query:
final FilteredQueryBuilder filteredQuery = QueryBuilders.filteredQuery(QueryBuilders.matchAllQuery(),
        FilterBuilders.andFilter(FilterBuilders.termFilter("gender", "F"),
                FilterBuilders.termFilter("place", "Arizona"),
                FilterBuilders.rangeFilter("dob").from(from).to(to)));
final MetricsAggregationBuilder<?> aggregateArtifactcount = AggregationBuilders.sum("delivery")
        .field("birth");
final AggregationBuilder<?> dailyDateHistogarm =
        AggregationBuilders.dateHistogram(AggregationConstants.DAILY).field("dob")
                .interval(DateHistogram.Interval.DAY).subAggregation(aggregateArtifactcount);
final SearchQuery query = new NativeSearchQueryBuilder().withIndices(index).withTypes(type)
        .withQuery(filteredQuery).addAggregation(dailyDateHistogarm).build();
return elasticsearchTemplate.query(query, new DailyDeliveryAggregation());
And here is my results extractor for the aggregation:
public class DailyDeliveryAggregation implements ResultsExtractor<List<DailyDeliverySum>> {
    @SuppressWarnings("unchecked")
    @Override
    public List<DailyDeliverySum> extract(final SearchResponse response) {
        final List<DailyDeliverySum> dailyDeliverySum = new ArrayList<DailyDeliverySum>();
        final Aggregations aggregations = response.getAggregations();
        final DateHistogram daily = aggregations.get(AggregationConstants.DAILY);
        final List<DateHistogram.Bucket> buckets = (List<DateHistogram.Bucket>) daily.getBuckets();
        for (final DateHistogram.Bucket bucket : buckets) {
            final Sum sum = (Sum) bucket.getAggregations().getAsMap().get("delivery");
            final int deliverySum = (int) sum.getValue();
            final int delivery = (int) bucket.getDocCount();
            final String dateString = bucket.getKeyAsText().string();
            dailyDeliverySum.add(new DailyDeliverySum(deliverySum, delivery, dateString));
        }
        return dailyDeliverySum;
    }
}
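It gives me the correct data, but it doesn't cover everything I need.
Suppose I query a 10-day range: if a date within that range has no data, that date is missing from the date histogram buckets. I would like 0 to be the default value for both the aggregation and the document count when no data is available.
Is there any way to do this?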
Yes, you can use the "minimum document count" feature of the date_histogram aggregation and set it to 0. That way, you will also get the buckets that contain no data:
final AggregationBuilder<?> dailyDateHistogarm =
        AggregationBuilders.dateHistogram(AggregationConstants.DAILY)
                .field("dob")
                .minDocCount(0) // <--- add this line
                .interval(DateHistogram.Interval.DAY)
                .subAggregation(aggregateArtifactcount);
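The example from @Val did not work for me on its own (I'm using the high-level API and ElasticSearch 6.2.x). What did work was telling the aggregation to treat missing values as 0: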
final AggregationBuilder<?> dailyDateHistogarm =
        AggregationBuilders.dateHistogram(AggregationConstants.DAILY)
                .field("dob")
                .minDocCount(0)
                .missing(0)
                .interval(DateHistogram.Interval.DAY)
                .subAggregation(aggregateArtifactcount);
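One caveat worth adding: with min_doc_count set to 0, the date_histogram only creates empty buckets between the first and the last bucket that actually contain documents, so empty days at the very start or end of the queried range can still be absent unless the aggregation's extended_bounds option is also set. If the server-side options are not enough, the gaps can be filled on the client after the extractor has run. Below is a minimal sketch of that fallback in plain Java. The names DailyGapFiller, DailyRow and fill are hypothetical (DailyRow is a simplified stand-in for the DailyDeliverySum class from the question), and the dates in main are made-up sample values.

```java
import java.time.LocalDate;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of Elasticsearch or Spring Data.
final class DailyGapFiller {

    // Simplified stand-in for the DailyDeliverySum class used in the question.
    static final class DailyRow {
        final LocalDate day;
        final long deliverySum;
        final long docCount;

        DailyRow(final LocalDate day, final long deliverySum, final long docCount) {
            this.day = day;
            this.deliverySum = deliverySum;
            this.docCount = docCount;
        }
    }

    // Returns one row per day in [from, to] (both inclusive), in ascending order.
    // Days that have no row in 'returned' get a row with sum 0 and doc count 0.
    static List<DailyRow> fill(final List<DailyRow> returned, final LocalDate from, final LocalDate to) {
        final Map<LocalDate, DailyRow> byDay = new HashMap<>();
        for (final DailyRow row : returned) {
            byDay.put(row.day, row);
        }
        final List<DailyRow> filled = new ArrayList<>();
        for (LocalDate day = from; !day.isAfter(to); day = day.plusDays(1)) {
            final DailyRow row = byDay.get(day);
            filled.add(row != null ? row : new DailyRow(day, 0L, 0L));
        }
        return filled;
    }

    public static void main(final String[] args) {
        // Made-up sample: only two of the five requested days have data.
        final List<DailyRow> returned = new ArrayList<>();
        returned.add(new DailyRow(LocalDate.of(2015, 1, 2), 7, 3));
        returned.add(new DailyRow(LocalDate.of(2015, 1, 4), 2, 1));

        final List<DailyRow> filled = fill(returned, LocalDate.of(2015, 1, 1), LocalDate.of(2015, 1, 5));
        for (final DailyRow row : filled) {
            System.out.println(row.day + " sum=" + row.deliverySum + " docs=" + row.docCount);
        }
    }
}
```

This keeps the query untouched and guarantees exactly one entry per day of the requested range, at the cost of doing the zero-filling in application code rather than in Elasticsearch.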