"Year-on-year" 具有 window 功能

"Year-on-year" with window functions

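I'm doing a year-on-year analysis using joins. I join the same table once per year, but because I build my SQL with another tool, that approach isn't very 'dynamic'. It would be better if I could solve this with window functions. So any suggestions are appreciated :D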

The idea is to do it by hour. That is, I want to compare the sales for 2022-04-05 hour 8 with the sales for 2021-04-05 hour 8 and 2020-04-05 hour 8.

My data is aggregated by hour:

Store Timestamp Sales
1 2019-04-05T08:00:00Z 10000
1 2020-04-05T08:00:00Z 12000
1 2021-04-05T08:00:00Z 15000
1 2022-04-05T08:00:00Z 20000
2 2019-04-05T08:00:00Z 13000
2 2020-04-05T08:00:00Z 16000
2 2021-04-05T08:00:00Z 19000
2 2022-04-05T08:00:00Z 22000

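Desired result (the ordering could start from the current year instead). The timestamps are not required; I added them only for clarity: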

Store Timestamp_1 Sales_1 Timestamp_2 Sales_2 Timestamp_3 Sales_3
1 2019-04-05T08:00:00Z 10000 2020-04-05T08:00:00Z 12000 2021-04-05T08:00:00Z 15000
2 2019-04-05T08:00:00Z 13000 2020-04-05T08:00:00Z 16000 2021-04-05T08:00:00Z 19000

Any ideas? Thanks in advance.

Not exactly the answer you asked for, but if you only had one day of data:

SELECT 
    store
    ,hour(date)
    ,array_agg(object_construct(date::text, sales)) within group (order by date) as hour_history
FROM data_table
GROUP BY 1,2
ORDER BY 1,2;

gives:

STORE HOUR(DATE) HOUR_HISTORY
1 8 [ { "2019-04-05 08:00:00.000": 10000 }, { "2020-04-05 08:00:00.000": 12000 }, { "2021-04-05 08:00:00.000": 15000 }, { "2022-04-05 08:00:00.000": 20000 } ]
2 8 [ { "2019-04-05 08:00:00.000": 13000 }, { "2020-04-05 08:00:00.000": 16000 }, { "2021-04-05 08:00:00.000": 19000 }, { "2022-04-05 08:00:00.000": 22000 } ]

Thus:

SELECT store
    ,hour_history[0].date::timestamp as Timestamp_1
    ,hour_history[0].sales::number as Sales_1
    ,hour_history[1].date::timestamp as Timestamp_2
    ,hour_history[1].sales::number as Sales_2
    ,hour_history[2].date::timestamp as Timestamp_3
    ,hour_history[2].sales::number as Sales_3
FROM (
SELECT 
    store
    ,hour(date)
    ,array_agg(object_construct('date', date::text, 'sales', sales)) within group (order by date) as hour_history
FROM data_table
GROUP BY 1,2
)
ORDER BY 1;

does give:

STORE TIMESTAMP_1 SALES_1 TIMESTAMP_2 SALES_2 TIMESTAMP_3 SALES_3
1 2019-04-05 08:00:00.000 10,000 2020-04-05 08:00:00.000 12,000 2021-04-05 08:00:00.000 15,000
2 2019-04-05 08:00:00.000 13,000 2020-04-05 08:00:00.000 16,000 2021-04-05 08:00:00.000 19,000

If you have many months, days, and hours of data, this works for the inner query:

SELECT 
    store
    ,month(date)
    ,day(date)
    ,hour(date)
    ,array_agg(object_construct('date', date::text, 'sales', sales)) within group (order by date) as hour_history
FROM data_table
GROUP BY 1,2,3,4

That is, in full:

WITH data_table AS (
    SELECT * FROM VALUES
        (1,'2019-04-05T08:00:00Z'::timestamp,10000),
        (1,'2020-04-05T08:00:00Z'::timestamp,12000),
        (1,'2021-04-05T08:00:00Z'::timestamp,15000),
        (1,'2022-04-05T08:00:00Z'::timestamp,20000),

        (1,'2019-03-05T08:00:00Z'::timestamp,10001),
        (1,'2020-03-05T08:00:00Z'::timestamp,12001),
        (1,'2021-03-05T08:00:00Z'::timestamp,15001),
        (1,'2022-03-05T08:00:00Z'::timestamp,20001),

        (1,'2019-04-04T08:00:00Z'::timestamp,10002),
        (1,'2020-04-04T08:00:00Z'::timestamp,12002),
        (1,'2021-04-04T08:00:00Z'::timestamp,15002),
        (1,'2022-04-04T08:00:00Z'::timestamp,20002),

    
        (2,'2019-04-05T08:00:00Z'::timestamp,13000),
        (2,'2020-04-05T08:00:00Z'::timestamp,16000),
        (2,'2021-04-05T08:00:00Z'::timestamp,19000),
        (2,'2022-04-05T08:00:00Z'::timestamp,22000)
    t(store, date, sales)
)
SELECT store
    ,hour_history[0].date::timestamp as Timestamp_1
    ,hour_history[0].sales::number as Sales_1
    ,hour_history[1].date::timestamp as Timestamp_2
    ,hour_history[1].sales::number as Sales_2
    ,hour_history[2].date::timestamp as Timestamp_3
    ,hour_history[2].sales::number as Sales_3
FROM (
SELECT 
    store
    ,month(date)
    ,day(date)
    ,hour(date)
    ,array_agg(object_construct('date', date::text, 'sales', sales)) within group (order by date) as hour_history
FROM data_table
GROUP BY 1,2,3,4
)
ORDER BY 1;

gives:

STORE TIMESTAMP_1 SALES_1 TIMESTAMP_2 SALES_2 TIMESTAMP_3 SALES_3
1 2019-04-05 08:00:00.000 10,000 2020-04-05 08:00:00.000 12,000 2021-04-05 08:00:00.000 15,000
1 2019-03-05 08:00:00.000 10,001 2020-03-05 08:00:00.000 12,001 2021-03-05 08:00:00.000 15,001
1 2019-04-04 08:00:00.000 10,002 2020-04-04 08:00:00.000 12,002 2021-04-04 08:00:00.000 15,002
2 2019-04-05 08:00:00.000 13,000 2020-04-05 08:00:00.000 16,000 2021-04-05 08:00:00.000 19,000

You will notice that your example has 4 years of data, and your desired result discards the 2022 data.
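
Since the question asked about window functions, here is a minimal sketch of the same comparison using LAG, with the newest year first. It assumes the same data_table(store, date, sales) as above, and that every store has a row for each year at a given month/day/hour. LAG counts rows back, not years, so a missing year would shift the earlier values into the wrong column. The column aliases are only illustrative.

```sql
-- Sketch only: one output row per store and month/day/hour slot,
-- newest year in the *_1 columns, one and two rows back in *_2 and *_3.
-- Assumption: no year is missing within a slot (LAG steps by row, not by year).
SELECT
    store
    ,date  AS timestamp_1
    ,sales AS sales_1
    ,LAG(date, 1)  OVER (PARTITION BY store, MONTH(date), DAY(date), HOUR(date) ORDER BY date) AS timestamp_2
    ,LAG(sales, 1) OVER (PARTITION BY store, MONTH(date), DAY(date), HOUR(date) ORDER BY date) AS sales_2
    ,LAG(date, 2)  OVER (PARTITION BY store, MONTH(date), DAY(date), HOUR(date) ORDER BY date) AS timestamp_3
    ,LAG(sales, 2) OVER (PARTITION BY store, MONTH(date), DAY(date), HOUR(date) ORDER BY date) AS sales_3
FROM data_table
-- Keep only the most recent row of each slot; the LAG values are computed before this filter.
QUALIFY ROW_NUMBER() OVER (PARTITION BY store, MONTH(date), DAY(date), HOUR(date) ORDER BY date DESC) = 1
ORDER BY 1, 2;
```

On the sample data, store 1 at 2022-04-05 08:00 would come out with sales 20000, 15000, and 12000 in sales_1 to sales_3. If years can be missing, the ARRAY_AGG version above has the same row-offset limitation, so a join or a pivot on the year may be safer in that case.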