How to change a column based on the difference between dates within a group?
This is probably a simple question, but I'm a SQL noob. I'm using Impala. I have data like this:
New_ID | Date | Old_ID |
---|---|---|
1 | 2020-11-14 12:41:21 | 0 |
1 | 2020-11-14 12:50:40 | 1 |
2 | 2020-10-14 15:22:00 | 1.5 |
2 | 2020-12-18 11:31:05 | 2 |
3 | 2020-11-14 12:42:25 | 3 |
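Grouping by New_ID, I need to check whether the difference between each date and the date immediately after it (if there is one) is less than 2 months (say, 60 days). If the difference is more than 2 months, I need to change New_ID to Old_ID; if it is less than or equal to 2 months, New_ID can stay the same. Essentially, I want the new table to look like this:
New_ID | Date | Old_ID |
---|---|---|
1 | 2020-11-14 12:41:21 | 0 |
1 | 2020-11-14 12:50:40 | 1 |
1.5 | 2020-10-14 15:22:00 | 1.5 |
2 | 2020-12-18 11:31:05 | 2 |
3 | 2020-11-14 12:42:25 | 3 |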
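To make the rule concrete, here is a minimal plain-Python sketch (not Impala) that applies it to the sample rows: sort each New_ID group by date, compare every row with the next one, and swap in Old_ID when the gap exceeds 60 days. The names `reassign_ids` and `MAX_GAP_DAYS` are hypothetical, and counting whole calendar days (ignoring the time of day) is an assumption meant to mirror Impala's `datediff()`.

```python
from datetime import datetime
from itertools import groupby

# Sample rows from the question: (New_ID, Date, Old_ID)
ROWS = [
    (1, "2020-11-14 12:41:21", 0),
    (1, "2020-11-14 12:50:40", 1),
    (2, "2020-10-14 15:22:00", 1.5),
    (2, "2020-12-18 11:31:05", 2),
    (3, "2020-11-14 12:42:25", 3),
]

MAX_GAP_DAYS = 60  # hypothetical name; "2 months" taken as 60 days


def parse(ts):
    return datetime.strptime(ts, "%Y-%m-%d %H:%M:%S")


def reassign_ids(rows, max_gap_days=MAX_GAP_DAYS):
    """Replace New_ID with Old_ID when the next date in the same
    New_ID group is more than max_gap_days later. The last row of a
    group has no next date and keeps its New_ID."""
    # Sort by (New_ID, Date) so each group is contiguous and in date order.
    ordered = sorted(rows, key=lambda r: (r[0], parse(r[1])))
    result = []
    for new_id, group in groupby(ordered, key=lambda r: r[0]):
        group = list(group)
        for i, (_, ts, old_id) in enumerate(group):
            if i + 1 < len(group):
                # Whole calendar days between the date parts (assumed
                # to match Impala's datediff()).
                gap = (parse(group[i + 1][1]).date() - parse(ts).date()).days
            else:
                gap = None  # no following date, like LEAD() returning NULL
            too_far = gap is not None and gap > max_gap_days
            result.append((old_id if too_far else new_id, ts, old_id))
    return result


for row in reassign_ids(ROWS):
    print(*row, sep=" | ")
```

Only the 2020-10-14 row changes: its next date in group 2 is 65 days later, so it takes its Old_ID of 1.5.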
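I've tried this snippet and variations of it, but 1. I'm not sure how to handle nulls, and 2. I keep getting the syntax error "Could not resolve column/field reference: 'day'".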
SELECT New_ID, Old_ID, Date,
LAG(Date) OVER(partition by New_ID ORDER BY Date) as previous_date,
case when datediff(day, previous_date, Date)/30.0 >= 2 then Old_ID
else New_ID end as 'new_identifier'
From MYTABLE;
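Any pointers/suggestions would be greatly appreciated.
The Impala date function you want is months_between(). Impala's datediff() takes only two arguments, an end date and a start date, so the SQL Server-style datediff(day, ...) makes Impala look for a column named day. Impala also does not recognize the alias previous_date inside the same SELECT list, so you need to repeat the window expression. Because the rule compares each date with the following date, use LEAD() rather than LAG(). On the last row of each group LEAD() returns NULL, the comparison is not true, and the CASE falls through to New_ID, which takes care of the nulls. (If you want exactly 60 days instead of 2 months, use datediff(LEAD(...), Date) > 60.)
SELECT New_ID, Old_ID, Date,
LEAD(Date) OVER (partition by New_ID ORDER BY Date) as next_date,
-- NULL for the last row of a group, so the CASE returns New_ID there
(case when months_between(LEAD(Date) OVER (partition by New_ID ORDER BY Date), Date) > 2 then Old_ID
else New_ID
end) as new_identifier
From MYTABLE;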
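Impala is not easy to try locally, so as a rough check of the LEAD() approach here is a sketch that runs the same query shape against an in-memory SQLite table from Python. This is a stand-in, not Impala: it assumes SQLite 3.25 or later (for window functions), and it uses `julianday(date(...))` differences with the question's 60-day threshold in place of Impala's `months_between()`/`datediff()`.

```python
import sqlite3

# Sample rows from the question: (New_ID, Date, Old_ID)
ROWS = [
    (1, "2020-11-14 12:41:21", 0),
    (1, "2020-11-14 12:50:40", 1),
    (2, "2020-10-14 15:22:00", 1.5),
    (2, "2020-12-18 11:31:05", 2),
    (3, "2020-11-14 12:42:25", 3),
]

# SQLite stand-in for the Impala query: LEAD() fetches the next date in
# the New_ID group; julianday(date(...)) compares calendar dates only.
QUERY = """
SELECT New_ID, Date, Old_ID,
       CASE WHEN julianday(date(LEAD(Date) OVER (PARTITION BY New_ID ORDER BY Date)))
                 - julianday(date(Date)) > 60
            THEN Old_ID
            ELSE New_ID  -- also taken when LEAD() is NULL (last row of a group)
       END AS new_identifier
FROM MYTABLE
ORDER BY New_ID, Date
"""

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE MYTABLE (New_ID NUMERIC, Date TEXT, Old_ID NUMERIC)")
conn.executemany("INSERT INTO MYTABLE VALUES (?, ?, ?)", ROWS)
rows = conn.execute(QUERY).fetchall()
for new_id, date, old_id, new_identifier in rows:
    print(new_identifier, date, old_id, sep=" | ")
conn.close()
```

On the sample data the new_identifier column comes out as 1, 1, 1.5, 2, 3, matching the expected table. Because the dates are stored as ISO-8601 text, ordering by Date as a string is also chronological.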