Druid counts differ when we run the same query on daily and raw data

When I run a query against the ABS datasource in Druid, I get certain counts, but the same query against the ABS_DAILY datasource returns different counts. ABS_DAILY is built from ABS.

{
  "queryType" : "groupBy",
  "dataSource" : "ABS",
  "granularity" : "all",
  "intervals" : [  "2018-07-12T00:00:00.000Z/2018-07-13T00:00:00.000Z" ],
  "descending" : "false",
  "aggregations" : [ {
    "type" : "count",
    "name" : "COUNT",
    "fieldName" : "COUNT"
  } ],
  "postAggregations" : [ ],

  "dimensions" : [ "event_id" ]
}

The JSON below is submitted to Druid as the daily job; it creates the ABS_DAILY segments for the given interval.

{
  "spec": {
    "ioConfig": {
      "firehose": {
        "dataSource": "ABS",                                   
        "interval": "2018-07-12T00:00:00.000Z/2018-07-13T00:00:00.000Z",
        "metrics": null,
        "dimensions": null,
        "type": "ingestSegment"
      },
      "type": "index"
    },
    "dataSchema": {
      "granularitySpec": {
        "queryGranularity": "day",
        "intervals": [
          "2018-07-12T00:00:00.000Z/2018-07-13T00:00:00.000Z"           
        ],
        "segmentGranularity": "day",
        "type": "uniform"
      },
      "dataSource": "ABS_DAILY",                            
      "metricsSpec": [],
      "parser": {
        "parseSpec": {
          "timestampSpec": {
            "column": "server_timestamp",
            "format": "dd MMMM, yyyy (HH:mm:ss)"
          },
          "dimensionsSpec": {
            "dimensionExclusions": [
              "server_timestamp"
            ],
            "dimensions": []
          },
          "format": "json"
        },
        "type": "string"
      }
    }
  },
  "type": "index"
}

I query ABS_DAILY with the query below, and it returns counts that differ from the ABS counts. It should not.

{
  "queryType" : "groupBy",
  "dataSource" : "ABS_DAILY",
  "granularity" : "all",
  "intervals" : [ "2018-07-12T00:00:00.000Z/2018-07-13T00:00:00.000Z" ],
  "descending" : "false",
  "aggregations" : [ {
    "type" : "count",
    "name" : "COUNT",
    "fieldName" : "COUNT"
  } ],
  "postAggregations" : [ ],

  "dimensions" : [ "event_id" ]
}

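You are counting the rows of the daily rollup, not the underlying events: the count aggregator returns the number of stored rows, and ABS_DAILY stores fewer rows than ABS after rolling up to day granularity.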

To total the pre-aggregated counts, you now need to sum the count column (see "type" in the query below):

{
  "queryType" : "groupBy",
  "dataSource" : "ABS_DAILY",
  "granularity" : "all",
  "intervals" : [ "2018-07-12T00:00:00.000Z/2018-07-13T00:00:00.000Z" ],
  "descending" : "false",
  "aggregations" : [ {
    "type" : "longSum",
    "name" : "COUNT",
    "fieldName" : "COUNT"
  } ],
  "postAggregations" : [ ],

  "dimensions" : [ "event_id" ]
}
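
To make the difference concrete, here is a minimal Python sketch that imitates the rollup outside of Druid. The event rows, the helper names (rollup_daily, count_rows, long_sum) and the idea that the daily datasource carries a COUNT metric column are assumptions for illustration only; this is not Druid code, just the same arithmetic on a toy dataset.

```python
from collections import Counter, defaultdict

# Hypothetical raw events for one day, standing in for the ABS datasource.
raw_rows = [
    {"timestamp": "2018-07-12T01:00:00Z", "event_id": "login"},
    {"timestamp": "2018-07-12T09:30:00Z", "event_id": "login"},
    {"timestamp": "2018-07-12T17:45:00Z", "event_id": "login"},
    {"timestamp": "2018-07-12T03:15:00Z", "event_id": "logout"},
    {"timestamp": "2018-07-12T22:05:00Z", "event_id": "logout"},
]


def rollup_daily(rows):
    """Collapse rows sharing (day, event_id) into one row with a COUNT metric.

    Assumption: the daily datasource keeps a COUNT column holding the number
    of raw rows that were merged into each stored row.
    """
    merged = Counter((r["timestamp"][:10], r["event_id"]) for r in rows)
    return [
        {"timestamp": day, "event_id": event_id, "COUNT": n}
        for (day, event_id), n in merged.items()
    ]


def count_rows(rows, dimension):
    """Imitates the "count" aggregator: number of stored rows per group."""
    return dict(Counter(r[dimension] for r in rows))


def long_sum(rows, dimension, field):
    """Imitates the "longSum" aggregator: sum of a metric column per group."""
    totals = defaultdict(int)
    for r in rows:
        totals[r[dimension]] += r[field]
    return dict(totals)


daily_rows = rollup_daily(raw_rows)

raw_counts = count_rows(raw_rows, "event_id")          # count on raw data
daily_row_counts = count_rows(daily_rows, "event_id")  # count on the rollup
daily_sums = long_sum(daily_rows, "event_id", "COUNT") # longSum on the rollup

print(raw_counts)        # {'login': 3, 'logout': 2}
print(daily_row_counts)  # {'login': 1, 'logout': 1}
print(daily_sums)        # {'login': 3, 'logout': 2}
```

Counting rows on the rolled-up data gives one per group, while summing the COUNT column recovers the raw totals, which is why the query against the daily datasource needs longSum rather than count.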