Presto lag dates, group/partitioned by id
Suppose I want to find every time a client updates their budget.
My data looks like this:
datetime,   client_id, new_budget
__________, _________, __________
2022-01-01, 1,         100
2022-01-01, 2,         300
2022-01-02, 1,         80
2022-01-02, 2,         80
And this is the code I am running:
SELECT datetime AS dt_1,
LAG(datetime) OVER (ORDER BY client_id, datetime) AS dt_2,
client_id,
new_budget
FROM budget_table
What I expect to get back is:
dt_1,       dt_2,       client_id, new_budget
__________, __________, _________, __________
2022-01-01, NULL,       1,         100
2022-01-02, 2022-01-01, 1,         80
2022-01-01, NULL,       2,         300
2022-01-02, 2022-01-01, 2,         80
So dt_2 should be NULL for the first entry of each client_id. I am not sure what code achieves this, or whether it needs a GROUP BY clause (or a partition/over clause).
But this is the output of the SQL I ran:
dt_1,       dt_2,       client_id, new_budget
__________, __________, _________, __________
2022-01-01, NULL,       1,         100
2022-01-02, 2022-01-01, 1,         80
2022-01-01, 2022-01-02, 2,         300
2022-01-02, 2022-01-01, 2,         80
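So the main problem is that when the previous row belongs to a different client_id, the query does not recognize that dt_2 should be NULL.
What syntax is recommended to achieve this?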
You need to PARTITION BY client_id:
The PARTITION BY clause separates the input rows into different partitions. This is analogous to how the GROUP BY clause separates rows into different groups for aggregate functions. If PARTITION BY is not specified, the entire input is treated as a single partition.
SELECT datetime AS dt_1,
LAG(datetime) OVER (PARTITION BY client_id ORDER BY datetime) AS dt_2,
client_id,
new_budget
FROM budget_table
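To check the behaviour without touching the real table, here is a minimal self-contained sketch of the same query. It assumes the four sample rows from the question, inlined through a VALUES list standing in for budget_table, and it assumes the datetime column is a DATE. The final ORDER BY is only there to make the row order predictable; it is not part of the fix.

```sql
-- Inline sample rows stand in for budget_table (values copied from the question).
-- Assumption: datetime is a DATE column.
WITH budget_table AS (
    SELECT *
    FROM (
        VALUES
            (DATE '2022-01-01', 1, 100),
            (DATE '2022-01-01', 2, 300),
            (DATE '2022-01-02', 1, 80),
            (DATE '2022-01-02', 2, 80)
    ) AS t (datetime, client_id, new_budget)
)
SELECT datetime AS dt_1,
       -- LAG only looks back within the current client_id partition,
       -- so the first row of each client has no previous row and gets NULL.
       LAG(datetime) OVER (PARTITION BY client_id ORDER BY datetime) AS dt_2,
       client_id,
       new_budget
FROM budget_table
ORDER BY client_id, dt_1;
```

With the sample rows, each client's 2022-01-01 row should come back with dt_2 = NULL and each 2022-01-02 row with dt_2 = 2022-01-01, which matches the expected output in the question. In the original query, ORDER BY client_id, datetime inside OVER only sorted one big partition, so the first row of client 2 picked up the last date of client 1.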