根据 Hive 中的组类别移动日期列的行
Shift rows of date column according to a group category in Hive
我有这些包含位置和日期的列。我想使用 Hive HQL 在每个位置而不是整个列中向下移动日期列。我试过使用 LAG()
但它不支持 Hive,而且 datediff()
但它不会给我带来变化的日期,只有日期差异。
Location Dates
0000076-0 01/01/2020
0000076-0 28/01/2020
0000076-0 28/02/2020
0000076-0 01/04/2020
0000076-0 28/04/2020
0000076-0 28/05/2020
0000076-0 01/07/2020
000478-2 01/01/2020
000478-2 01/02/2020
000478-2 26/02/2020
000478-2 01/04/2020
000478-2 26/04/2020
000478-2 26/05/2020
000478-2 01/07/2020
0005655-3 01/01/2020
0005655-3 07/02/2020
0005655-3 09/03/2020
0005655-3 07/04/2020
0005655-3 07/05/2020
0005655-3 08/06/2020
0005655-3 01/07/2020
0010828-6 01/01/2020
0010828-6 31/01/2020
0010828-6 29/02/2020
0010828-6 01/04/2020
0010828-6 01/05/2020
期望输出
Location Dates Dates_shifted_down_per_location
0000076-0 01/01/2020 null
0000076-0 28/01/2020 01/01/2020
0000076-0 28/02/2020 28/01/2020
0000076-0 01/04/2020 28/02/2020
0000076-0 28/04/2020 01/04/2020
0000076-0 28/05/2020 28/04/2020
0000076-0 01/07/2020 28/05/2020
000478-2 01/01/2020 null
000478-2 01/02/2020 01/01/2020
000478-2 26/02/2020 01/02/2020
000478-2 01/04/2020 26/02/2020
000478-2 26/04/2020 01/04/2020
000478-2 26/05/2020 26/04/2020
000478-2 01/07/2020 26/05/2020
0005655-3 01/01/2020 null
0005655-3 07/02/2020 01/01/2020
0005655-3 09/03/2020 07/02/2020
0005655-3 07/04/2020 09/03/2020
0005655-3 07/05/2020 07/04/2020
0005655-3 08/06/2020 07/05/2020
0005655-3 01/07/2020 08/06/2020
0010828-6 01/01/2020 null
0010828-6 31/01/2020 01/01/2020
0010828-6 29/02/2020 31/01/2020
0010828-6 01/05/2020 29/02/2020
0010828-6 01/06/2020 01/05/2020
Hive 支持 lag()
。您想要的代码是:
select t.*,
lag(date) over (partition by location order by date) as prev_date
from t;
我有这些包含位置和日期的列。我想使用 Hive HQL 在每个位置而不是整个列中向下移动日期列。我试过使用 LAG()
但它不支持 Hive,而且 datediff()
但它不会给我带来变化的日期,只有日期差异。
Location Dates
0000076-0 01/01/2020
0000076-0 28/01/2020
0000076-0 28/02/2020
0000076-0 01/04/2020
0000076-0 28/04/2020
0000076-0 28/05/2020
0000076-0 01/07/2020
000478-2 01/01/2020
000478-2 01/02/2020
000478-2 26/02/2020
000478-2 01/04/2020
000478-2 26/04/2020
000478-2 26/05/2020
000478-2 01/07/2020
0005655-3 01/01/2020
0005655-3 07/02/2020
0005655-3 09/03/2020
0005655-3 07/04/2020
0005655-3 07/05/2020
0005655-3 08/06/2020
0005655-3 01/07/2020
0010828-6 01/01/2020
0010828-6 31/01/2020
0010828-6 29/02/2020
0010828-6 01/04/2020
0010828-6 01/05/2020
期望输出
Location Dates Dates_shifted_down_per_location
0000076-0 01/01/2020 null
0000076-0 28/01/2020 01/01/2020
0000076-0 28/02/2020 28/01/2020
0000076-0 01/04/2020 28/02/2020
0000076-0 28/04/2020 01/04/2020
0000076-0 28/05/2020 28/04/2020
0000076-0 01/07/2020 28/05/2020
000478-2 01/01/2020 null
000478-2 01/02/2020 01/01/2020
000478-2 26/02/2020 01/02/2020
000478-2 01/04/2020 26/02/2020
000478-2 26/04/2020 01/04/2020
000478-2 26/05/2020 26/04/2020
000478-2 01/07/2020 26/05/2020
0005655-3 01/01/2020 null
0005655-3 07/02/2020 01/01/2020
0005655-3 09/03/2020 07/02/2020
0005655-3 07/04/2020 09/03/2020
0005655-3 07/05/2020 07/04/2020
0005655-3 08/06/2020 07/05/2020
0005655-3 01/07/2020 08/06/2020
0010828-6 01/01/2020 null
0010828-6 31/01/2020 01/01/2020
0010828-6 29/02/2020 31/01/2020
0010828-6 01/05/2020 29/02/2020
0010828-6 01/06/2020 01/05/2020
Hive 支持 lag()
。您想要的代码是:
select t.*,
lag(date) over (partition by location order by date) as prev_date
from t;