AggregatingMergeTree not aggregating inserts properly

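I have a table that aggregates the number of sales for various products by minute/hour/day and computes various metrics.

The table below holds 1-minute aggregates computed from core_product_tbl. Other tables are then computed from product_agg_tbl at hourly, daily, weekly, etc. granularity.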

CREATE TABLE product_agg_tbl (
  product String,
  minute DateTime,
  high Nullable(Float32),
  low Nullable(Float32),
  average AggregateFunction(avg, Nullable(Float32)),
  first Nullable(Float32),
  last Nullable(Float32),
  total_sales Nullable(UInt64)
)
ENGINE = AggregatingMergeTree
PARTITION BY toYYYYMM(minute)
ORDER BY (product, minute);

CREATE MATERIALIZED VIEW product_agg_mv TO product_agg_tbl AS
SELECT 
  product,
  minute,
  max(price) AS high,
  min(price) AS low,
  avgState(price) AS average,
  argMin(price, sales_timestamp) AS first,
  argMax(price, sales_timestamp) AS last,
  sum(batch_size) as total_sales
FROM  core_product_tbl
WHERE minute >= today()
GROUP BY product, toStartOfMinute(sales_timestamp) AS minute;

CREATE VIEW product_agg_1w AS
SELECT
    product,
    toStartOfHour(minute) AS minute,
    max(high) AS high,
    min(low) AS low,
    avgMerge(average) AS average_price,
    argMin(first, minute) AS first,
    argMax(last, minute) AS last,
    sum(total_sales) as total_sales
FROM product_agg_tbl
WHERE minute >= date_sub(today(), interval 7 + 7 day)
GROUP BY  product, minute;

The problem I'm running into is that when I run the query below directly against core_product_tbl, I get numbers that differ a lot from product_agg_1w. What could be happening?

SELECT 
  product,
  toStartOfHour(minute) AS minute,
  max(price) AS high,
  min(price) AS low,
  avgState(price) AS average,
  argMin(price, sales_timestamp) AS first,
  argMax(price, sales_timestamp) AS last,
  sum(batch_size) as total_sales
FROM  core_product_tbl
WHERE minute >= today()
GROUP BY product, toStartOfMinute(sales_timestamp) AS minute;

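Every non-key column in the AggregatingMergeTree table should be declared as SimpleAggregateFunction or AggregateFunction.

AggregatingMergeTree knows nothing about the materialized view or about the SELECT inside the materialized view. It merges rows using only the column types of the target table. https://den-crane.github.io/Everything_you_should_know_about_materialized_views_commented.pdf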
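A materialized view runs its SELECT on each inserted block separately, so the target table can receive several rows for the same (product, minute), and those rows are combined later by background merges. The following is a minimal sketch of how such a merge treats a plain column versus an aggregate-typed one. The table `amt_demo` and its columns are hypothetical and are not part of the question's schema.

```sql
-- Hypothetical demo table: plain_max is an ordinary column,
-- agg_max carries its merge rule (max) in the column type.
CREATE TABLE amt_demo (
    k String,
    plain_max Float32,
    agg_max SimpleAggregateFunction(max, Float32)
)
ENGINE = AggregatingMergeTree
ORDER BY k;

-- Two separate inserts create two parts with the same sorting key.
INSERT INTO amt_demo VALUES ('a', 1, 1);
INSERT INTO amt_demo VALUES ('a', 5, 5);

-- Force the merge that would otherwise happen in the background
-- at an unspecified time.
OPTIMIZE TABLE amt_demo FINAL;

-- One row per key remains. agg_max is combined with max();
-- plain_max is not aggregated and simply keeps the value of one of the rows.
SELECT k, plain_max, agg_max FROM amt_demo;
```

The corrected definitions for the tables in the question follow the same pattern.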

CREATE TABLE product_agg_tbl (
  product String,
  minute DateTime,
  high SimpleAggregateFunction(max, Nullable(Float32)),
  low SimpleAggregateFunction(min, Nullable(Float32)),
  average AggregateFunction(avg, Nullable(Float32)),
  first AggregateFunction(argMin, Nullable(Float32), DateTime),
  last AggregateFunction(argMax, Nullable(Float32), DateTime),
  total_sales SimpleAggregateFunction(sum, Nullable(UInt64))
)
ENGINE = AggregatingMergeTree
PARTITION BY toYYYYMM(minute)
ORDER BY (product, minute);

CREATE MATERIALIZED VIEW product_agg_mv TO product_agg_tbl AS
SELECT 
  product,
  minute,
  max(price) AS high,
  min(price) AS low,
  avgState(price) AS average,
  argMinState(price, sales_timestamp) AS first,
  argMaxState(price, sales_timestamp) AS last,
  sum(batch_size) as total_sales
FROM  core_product_tbl
WHERE minute >= today()
GROUP BY product, toStartOfMinute(sales_timestamp) AS minute;

CREATE VIEW product_agg_1w AS
SELECT
    product,
    toStartOfHour(minute) AS minute,
    max(high) AS high,
    min(low) AS low,
    avgMerge(average) AS average_price,
    argMinMerge(first) AS first,
    argMaxMerge(last) AS last,
    sum(total_sales) as total_sales
FROM product_agg_tbl
WHERE minute >= date_sub(today(), interval 7 + 7 day)
GROUP BY  product, minute;

Don't use the view (product_agg_1w); it is counterproductive because it reads too much data. Run a SELECT directly against product_agg_tbl instead.
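A minimal sketch of such a direct query, assuming product_agg_tbl uses the aggregate-function column types from the answer. The product value 'some_product' and the 7-day window are placeholders, not values from the question.

```sql
-- Hourly rollup read straight from the 1-minute aggregate table.
-- Filtering on the sorting key (product, minute) lets ClickHouse
-- skip data that does not match.
SELECT
    product,
    toStartOfHour(minute) AS hour,
    max(high) AS high,
    min(low) AS low,
    avgMerge(average) AS average_price,
    argMinMerge(first) AS first,   -- -Merge takes only the state column
    argMaxMerge(last) AS last,
    sum(total_sales) AS total_sales
FROM product_agg_tbl
WHERE product = 'some_product'          -- placeholder value
  AND minute >= now() - INTERVAL 7 DAY  -- placeholder window
GROUP BY product, hour
ORDER BY product, hour;
```

The GROUP BY is still required here because parts that have not been merged yet can hold several rows for the same (product, minute).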