"Year-on-year" 具有 window 功能
"Year-on-year" with window functions
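I'm doing a year-on-year analysis using joins. I join the same table once for each year, but because I build my SQL with another tool, that approach isn't very 'dynamic'. It would be better if I could solve this with window functions, so any suggestions are appreciated :D
The idea is to compare on an hourly basis. That is, I want to compare the sales for 2022-04-05 hour 8 with the sales for 2021-04-05 hour 8 and 2020-04-05 hour 8.
My data is aggregated by hour: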
Store | Timestamp | Sales |
---|---|---|
1 | 2019-04-05T08:00:00Z | 10000 |
1 | 2020-04-05T08:00:00Z | 12000 |
1 | 2021-04-05T08:00:00Z | 15000 |
1 | 2022-04-05T08:00:00Z | 20000 |
2 | 2019-04-05T08:00:00Z | 13000 |
2 | 2020-04-05T08:00:00Z | 16000 |
2 | 2021-04-05T08:00:00Z | 19000 |
2 | 2022-04-05T08:00:00Z | 22000 |
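Desired result (the order could just as well start from the current year). The timestamps aren't needed; I only added them for clarity: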
Store | Timestamp_1 | Sales_1 | Timestamp_2 | Sales_2 | Timestamp_3 | Sales_3 |
---|---|---|---|---|---|---|
1 | 2019-04-05T08:00:00Z | 10000 | 2020-04-05T08:00:00Z | 12000 | 2021-04-05T08:00:00Z | 15000 |
2 | 2019-04-05T08:00:00Z | 13000 | 2020-04-05T08:00:00Z | 16000 | 2021-04-05T08:00:00Z | 19000 |
Any ideas? Thanks in advance.
This isn't really your answer, but if you only have one day of data, this:
SELECT
    store
    ,hour(date)
    ,array_agg(object_construct(date::text, sales)) within group (order by date) as hour_history
FROM data_table
GROUP BY 1,2
ORDER BY 1,2;
gives:
STORE | HOUR(DATE) | HOUR_HISTORY |
---|---|---|
1 | 8 | [ { "2019-04-05 08:00:00.000": 10000 }, { "2020-04-05 08:00:00.000": 12000 }, { "2021-04-05 08:00:00.000": 15000 }, { "2022-04-05 08:00:00.000": 20000 } ] |
2 | 8 | [ { "2019-04-05 08:00:00.000": 13000 }, { "2020-04-05 08:00:00.000": 16000 }, { "2021-04-05 08:00:00.000": 19000 }, { "2022-04-05 08:00:00.000": 22000 } ] |
So if each array element is instead built with named 'date' and 'sales' keys, the years can be pulled out by index:
SELECT store
    ,hour_history[0].date::timestamp as Timestamp_1
    ,hour_history[0].sales::number as Sales_1
    ,hour_history[1].date::timestamp as Timestamp_2
    ,hour_history[1].sales::number as Sales_2
    ,hour_history[2].date::timestamp as Timestamp_3
    ,hour_history[2].sales::number as Sales_3
FROM (
    SELECT
        store
        ,hour(date)
        ,array_agg(object_construct('date', date::text, 'sales', sales)) within group (order by date) as hour_history
    FROM data_table
    GROUP BY 1,2
)
ORDER BY 1;
which gives:
STORE | TIMESTAMP_1 | SALES_1 | TIMESTAMP_2 | SALES_2 | TIMESTAMP_3 | SALES_3 |
---|---|---|---|---|---|---|
1 | 2019-04-05 08:00:00.000 | 10,000 | 2020-04-05 08:00:00.000 | 12,000 | 2021-04-05 08:00:00.000 | 15,000 |
2 | 2019-04-05 08:00:00.000 | 13,000 | 2020-04-05 08:00:00.000 | 16,000 | 2021-04-05 08:00:00.000 | 19,000 |
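If the columns should start from the most recent year instead (the question mentions that either order is fine), one option is to sort the array in descending order. The following is a minimal sketch, assuming the same `data_table` shape as above (the inline sample rows are copied from the question) and changing only the `order by` inside `within group`:

```sql
-- Sample rows copied from the question; replace the CTE with the real table.
WITH data_table AS (
    SELECT * FROM VALUES
        (1,'2019-04-05T08:00:00Z'::timestamp,10000),
        (1,'2020-04-05T08:00:00Z'::timestamp,12000),
        (1,'2021-04-05T08:00:00Z'::timestamp,15000),
        (1,'2022-04-05T08:00:00Z'::timestamp,20000),
        (2,'2019-04-05T08:00:00Z'::timestamp,13000),
        (2,'2020-04-05T08:00:00Z'::timestamp,16000),
        (2,'2021-04-05T08:00:00Z'::timestamp,19000),
        (2,'2022-04-05T08:00:00Z'::timestamp,22000)
        t(store, date, sales)
)
SELECT store
    ,hour_history[0].date::timestamp as Timestamp_1
    ,hour_history[0].sales::number as Sales_1
    ,hour_history[1].date::timestamp as Timestamp_2
    ,hour_history[1].sales::number as Sales_2
    ,hour_history[2].date::timestamp as Timestamp_3
    ,hour_history[2].sales::number as Sales_3
FROM (
    SELECT
        store
        ,hour(date)
        -- "order by date desc" puts the newest year at index 0
        ,array_agg(object_construct('date', date::text, 'sales', sales)) within group (order by date desc) as hour_history
    FROM data_table
    GROUP BY 1,2
)
ORDER BY 1;
```

With this ordering, index 0 holds the latest year for each store and hour, so the 2022 values land in Timestamp_1 and Sales_1, followed by 2021 and 2020.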
If you have data for many months, days, and hours, group by those as well in the inner query:
SELECT
    store
    ,month(date)
    ,day(date)
    ,hour(date)
    ,array_agg(object_construct('date', date::text, 'sales', sales)) within group (order by date) as hour_history
FROM data_table
GROUP BY 1,2,3,4
Put together with sample data:
WITH data_table AS (
    SELECT * FROM VALUES
        (1,'2019-04-05T08:00:00Z'::timestamp,10000),
        (1,'2020-04-05T08:00:00Z'::timestamp,12000),
        (1,'2021-04-05T08:00:00Z'::timestamp,15000),
        (1,'2022-04-05T08:00:00Z'::timestamp,20000),
        (1,'2019-03-05T08:00:00Z'::timestamp,10001),
        (1,'2020-03-05T08:00:00Z'::timestamp,12001),
        (1,'2021-03-05T08:00:00Z'::timestamp,15001),
        (1,'2022-03-05T08:00:00Z'::timestamp,20001),
        (1,'2019-04-04T08:00:00Z'::timestamp,10002),
        (1,'2020-04-04T08:00:00Z'::timestamp,12002),
        (1,'2021-04-04T08:00:00Z'::timestamp,15002),
        (1,'2022-04-04T08:00:00Z'::timestamp,20002),
        (2,'2019-04-05T08:00:00Z'::timestamp,13000),
        (2,'2020-04-05T08:00:00Z'::timestamp,16000),
        (2,'2021-04-05T08:00:00Z'::timestamp,19000),
        (2,'2022-04-05T08:00:00Z'::timestamp,22000)
        t(store, date, sales)
)
SELECT store
    ,hour_history[0].date::timestamp as Timestamp_1
    ,hour_history[0].sales::number as Sales_1
    ,hour_history[1].date::timestamp as Timestamp_2
    ,hour_history[1].sales::number as Sales_2
    ,hour_history[2].date::timestamp as Timestamp_3
    ,hour_history[2].sales::number as Sales_3
FROM (
    SELECT
        store
        ,month(date)
        ,day(date)
        ,hour(date)
        ,array_agg(object_construct('date', date::text, 'sales', sales)) within group (order by date) as hour_history
    FROM data_table
    GROUP BY 1,2,3,4
)
ORDER BY 1;
gives:
STORE | TIMESTAMP_1 | SALES_1 | TIMESTAMP_2 | SALES_2 | TIMESTAMP_3 | SALES_3 |
---|---|---|---|---|---|---|
1 | 2019-04-05 08:00:00.000 | 10,000 | 2020-04-05 08:00:00.000 | 12,000 | 2021-04-05 08:00:00.000 | 15,000 |
1 | 2019-03-05 08:00:00.000 | 10,001 | 2020-03-05 08:00:00.000 | 12,001 | 2021-03-05 08:00:00.000 | 15,001 |
1 | 2019-04-04 08:00:00.000 | 10,002 | 2020-04-04 08:00:00.000 | 12,002 | 2021-04-04 08:00:00.000 | 15,002 |
2 | 2019-04-05 08:00:00.000 | 13,000 | 2020-04-05 08:00:00.000 | 16,000 | 2021-04-05 08:00:00.000 | 19,000 |
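Note that your sample data covers four years, but your desired result has only three timestamp/sales pairs, so the 2022 data is discarded.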
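Since the question asks specifically about window functions, here is a minimal sketch of that route using `LAG`, offered as an alternative to the array approach rather than as part of the answer above. It assumes the same `data_table` columns (the inline rows are copied from the question) and assumes that only the most recent row per store, month, day, and hour should be kept, which is done here with Snowflake's `QUALIFY` clause:

```sql
-- Sample rows copied from the question; replace the CTE with the real table.
WITH data_table AS (
    SELECT * FROM VALUES
        (1,'2019-04-05T08:00:00Z'::timestamp,10000),
        (1,'2020-04-05T08:00:00Z'::timestamp,12000),
        (1,'2021-04-05T08:00:00Z'::timestamp,15000),
        (1,'2022-04-05T08:00:00Z'::timestamp,20000),
        (2,'2019-04-05T08:00:00Z'::timestamp,13000),
        (2,'2020-04-05T08:00:00Z'::timestamp,16000),
        (2,'2021-04-05T08:00:00Z'::timestamp,19000),
        (2,'2022-04-05T08:00:00Z'::timestamp,22000)
        t(store, date, sales)
)
SELECT
    store
    ,date as Timestamp_1
    ,sales as Sales_1
    -- one row back in the same store/month/day/hour slot
    ,lag(date, 1) over (partition by store, month(date), day(date), hour(date) order by date) as Timestamp_2
    ,lag(sales, 1) over (partition by store, month(date), day(date), hour(date) order by date) as Sales_2
    -- two rows back in the same slot
    ,lag(date, 2) over (partition by store, month(date), day(date), hour(date) order by date) as Timestamp_3
    ,lag(sales, 2) over (partition by store, month(date), day(date), hour(date) order by date) as Sales_3
FROM data_table
-- assumption: keep only the newest row of each slot as the reporting row
QUALIFY row_number() over (partition by store, month(date), day(date), hour(date) order by date desc) = 1
ORDER BY store, date;
```

For store 1 in the sample data this returns the 2022 row with Sales_1 = 20000, Sales_2 = 15000, and Sales_3 = 12000. One design caveat: `LAG` counts rows, not calendar years, so if a year is missing for a given slot (or the slot is February 29), Sales_2 comes from the nearest earlier year that has data rather than from exactly one year before.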