AggregatingMergeTree not aggregating inserts properly
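I have a table that aggregates the sales counts of various products by minute/hour/day and computes various metrics.
The table below, product_agg_tbl, holds the 1-minute incremental aggregates computed from core_product_tbl. Once product_agg_tbl is populated, other tables are computed from it by hour, day, week, and so on.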
CREATE TABLE product_agg_tbl (
product String,
minute DateTime,
high Nullable(Float32),
low Nullable(Float32),
average AggregateFunction(avg, Nullable(Float32)),
first Nullable(Float32),
last Nullable(Float32),
total_sales Nullable(UInt64)
)
ENGINE = AggregatingMergeTree
PARTITION BY toYYYYMM(minute)
ORDER BY (product, minute);
CREATE MATERIALIZED VIEW product_agg_mv TO product_agg_tbl AS
SELECT
product,
minute,
max(price) AS high,
min(price) AS low,
avgState(price) AS average,
argMin(price, sales_timestamp) AS first,
argMax(price, sales_timestamp) AS last,
sum(batch_size) as total_sales
FROM core_product_tbl
WHERE minute >= today()
GROUP BY product, toStartOfMinute(sales_timestamp) AS minute;
CREATE VIEW product_agg_1w AS
SELECT
product,
toStartOfHour(minute) AS minute,
max(high) AS high,
min(low) AS low,
avgMerge(average) AS average_price,
argMin(first, minute) AS first,
argMax(last, minute) AS last,
sum(total_sales) as total_sales
FROM product_agg_tbl
WHERE minute >= date_sub(today(), interval 7 + 7 day)
GROUP BY product, minute;
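The problem I'm having is that when I run the query below directly against core_product_tbl, I get numbers that are very different from those in product_agg_1w. What could be happening?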
SELECT
product,
toStartOfHour(minute) AS minute,
max(price) AS high,
min(price) AS low,
avgState(price) AS average,
argMin(price, sales_timestamp) AS first,
argMax(price, sales_timestamp) AS last,
sum(batch_size) as total_sales
FROM core_product_tbl
WHERE minute >= today()
GROUP BY product, toStartOfMinute(sales_timestamp) AS minute;
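Every aggregated column in an AggregatingMergeTree table should be declared as SimpleAggregateFunction or AggregateFunction.
AggregatingMergeTree knows nothing about the materialized view or about the SELECT inside it. When it merges rows that share the same sorting key, it combines only columns of those two types; a plain column such as high Nullable(Float32) is not aggregated, and the merge keeps just one of the stored values. See https://den-crane.github.io/Everything_you_should_know_about_materialized_views_commented.pdf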
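The effect described above can be reproduced in isolation. The following is a minimal sketch, not part of the original answer: demo_agg and its columns are hypothetical names, and the two INSERT statements stand in for two separate blocks written by the materialized view.

```sql
-- Hypothetical demo table: one plain column and one SimpleAggregateFunction column.
CREATE TABLE demo_agg (
    k String,
    plain_max Nullable(Float32),
    agg_max SimpleAggregateFunction(max, Nullable(Float32))
)
ENGINE = AggregatingMergeTree
ORDER BY k;

-- Two separate inserts create two parts with the same sorting key.
INSERT INTO demo_agg VALUES ('a', 1, 1);
INSERT INTO demo_agg VALUES ('a', 5, 5);

-- Force a merge so that rows with the same key are collapsed into one.
OPTIMIZE TABLE demo_agg FINAL;

-- agg_max is merged with max() and ends up as 5.
-- plain_max is not aggregated: it holds whichever single value the merge kept.
SELECT k, plain_max, agg_max FROM demo_agg;
```

The corrected definitions for the tables in the question follow.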
CREATE TABLE product_agg_tbl (
product String,
minute DateTime,
high SimpleAggregateFunction(max, Nullable(Float32)),
low SimpleAggregateFunction(min, Nullable(Float32)),
average AggregateFunction(avg, Nullable(Float32)),
first AggregateFunction(argMin, Nullable(Float32), DateTime),
last AggregateFunction(argMax, Nullable(Float32), DateTime),
total_sales SimpleAggregateFunction(sum, Nullable(UInt64))
)
ENGINE = AggregatingMergeTree
PARTITION BY toYYYYMM(minute)
ORDER BY (product, minute);
CREATE MATERIALIZED VIEW product_agg_mv TO product_agg_tbl AS
SELECT
product,
minute,
max(price) AS high,
min(price) AS low,
avgState(price) AS average,
argMinState(price, sales_timestamp) AS first,
argMaxState(price, sales_timestamp) AS last,
sum(batch_size) as total_sales
FROM core_product_tbl
WHERE minute >= today()
GROUP BY product, toStartOfMinute(sales_timestamp) AS minute;
CREATE VIEW product_agg_1w AS
SELECT
product,
toStartOfHour(minute) AS minute,
max(high) AS high,
min(low) AS low,
avgMerge(average) AS average_price,
argMinMerge(first) AS first,
argMaxMerge(last) AS last,
sum(total_sales) as total_sales
FROM product_agg_tbl
WHERE minute >= date_sub(today(), interval 7 + 7 day)
GROUP BY product, minute;
Don't use the view (product_agg_1w): it is counterproductive because it reads too much data. Run your SELECT directly against product_agg_tbl instead.
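A direct query might look like the sketch below. It assumes the corrected product_agg_tbl schema from this answer; the 7-day window and the output aliases (hour, hour_high, and so on) are illustrative choices, not taken from the question.

```sql
-- Hourly rollup read straight from the 1-minute aggregate table.
SELECT
    product,
    toStartOfHour(minute) AS hour,
    max(high) AS hour_high,
    min(low) AS hour_low,
    avgMerge(average) AS average_price,      -- finalize the avg state
    argMinMerge(first) AS first_price,       -- price at the earliest sales_timestamp
    argMaxMerge(last) AS last_price,         -- price at the latest sales_timestamp
    sum(total_sales) AS total_sales
FROM product_agg_tbl
WHERE minute >= today() - INTERVAL 7 DAY     -- filter on the stored column
GROUP BY product, hour
ORDER BY product, hour;
```

Filtering on the stored minute column, rather than on an alias derived from it, gives ClickHouse the chance to use the partition key and the (product, minute) sorting key to skip data. Distinct output aliases also avoid any ambiguity between a column and an alias of the same name.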