Hive - sql max number with multiple rows

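For the raw data below, how can I get, for each customer_id, the maximum quantity across the whole row, with null in the rest of that row? I can get the max of the data, but I can't get it in the form shown under #Results.
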
#Raw data                           
customer_id  name       location    itemno_1    itemno_2    itemno_3    itemno_4    itemno_5
123          Ashley M   CA          10          null        10          null        null
123          Ashley M   CA          null        12          null        12          null
143          Donald P   FL          15          15          0           1           10
187          Alicia P   GA          15          9           null        null        null
1736         Mike H     CT          null        8           8           9           null
1736         Mike H     CT          null        null        null        null        null
1876         David M    CA          null        null        null        null        null
532          Matthew T  CA          null        9           10          10          null

#Results

customer_id  name       location    itemno_1    itemno_2    itemno_3    itemno_4    itemno_5
123          Ashley M   CA          null        12          null        null        null
143          Donald P   FL          15          null        null        null        null
187          Alicia P   GA          15          null        null        null        null
1736         Mike H     CT          null        null        null        null        null
1876         David M    CA          null        null        null        null        null
532          Matthew T  CA          null        null        null        10          null

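The query below produces the expected result (I have tested it and it works). I am assuming that if two item_nos share the same maximum value, we keep the one with the lowest item_no. For example, for customer_id = 123 both itemno_2 and itemno_4 have the value 12, so itemno_2 is kept as 12 and itemno_4 is set to null.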

-- Outer query: keep only the largest of i1..i5 per customer and null out the rest.
-- The test table default.stack2 names the location column location1.
select customer_id, name, location1
      -- itemno_1 is kept when it is >= every other non-null column
      ,CASE WHEN (i1 >= i2 or i2 is null)
            AND  (i1 >= i3 or i3 is null)
            AND  (i1 >= i4 or i4 is null)
            AND  (i1 >= i5 or i5 is null)
            THEN i1
            ELSE null
       END as itemno_1
      -- from itemno_2 on, the extra <> checks reject a tie with a lower-numbered column
      ,CASE WHEN (i2 >= i1 or i1 is null)
            AND  (i2 >= i3 or i3 is null)
            AND  (i2 >= i4 or i4 is null)
            AND  (i2 >= i5 or i5 is null)
            AND  (i1 <> i2 or i1 is null)
            THEN i2
            ELSE null
       END as itemno_2
      ,CASE WHEN (i3 >= i1 or i1 is null)
            AND  (i3 >= i2 or i2 is null)
            AND  (i3 >= i4 or i4 is null)
            AND  (i3 >= i5 or i5 is null)
            AND  (i1 <> i3 or i1 is null)
            AND  (i2 <> i3 or i2 is null)
            THEN i3
            ELSE null
       END as itemno_3
      ,CASE WHEN (i4 >= i1 or i1 is null)
            AND  (i4 >= i2 or i2 is null)
            AND  (i4 >= i3 or i3 is null)
            AND  (i4 >= i5 or i5 is null)
            AND  (i1 <> i4 or i1 is null)
            AND  (i2 <> i4 or i2 is null)
            AND  (i3 <> i4 or i3 is null)
            THEN i4
            ELSE null
       END as itemno_4
      ,CASE WHEN (i5 >= i1 or i1 is null)
            AND  (i5 >= i2 or i2 is null)
            AND  (i5 >= i3 or i3 is null)
            AND  (i5 >= i4 or i4 is null)
            AND  (i1 <> i5 or i1 is null)
            AND  (i2 <> i5 or i2 is null)
            AND  (i3 <> i5 or i3 is null)
            AND  (i4 <> i5 or i4 is null)
            THEN i5
            ELSE null
       END as itemno_5

from (
-- Inner query: collapse each customer's rows into one row of per-column maxima
-- (max() ignores nulls and returns null only if every value is null)
select customer_id, name, location1
      ,max(itemno_1) as i1
      ,max(itemno_2) as i2
      ,max(itemno_3) as i3
      ,max(itemno_4) as i4
      ,max(itemno_5) as i5
from default.stack2
group by customer_id, name, location1) a
order by customer_id;

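The same thing can also be achieved by writing a UDF, instead of the CASE statements, that finds the maximum of the 5 columns and returns the expected output.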
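
As a rough illustration of the UDF idea, here is a minimal sketch in plain Java of the logic such a UDF might wrap. The class name MaxItemKeeper and the method keepFirstMax are hypothetical, and the Hive wiring (registering the function, mapping Hive types) is deliberately left out. It assumes the five per-customer maxima (i1..i5 from the inner query) arrive as an array in which null marks a missing value, and it applies the same tie rule as above: the lowest item_no wins.

```java
import java.util.Arrays;

// Hypothetical helper: not a Hive class, just the core logic a UDF could call.
class MaxItemKeeper {

    // Returns an array of the same length where only the first (lowest-index)
    // occurrence of the largest non-null value is kept; all other slots are null.
    public static Integer[] keepFirstMax(Integer[] items) {
        Integer[] result = new Integer[items.length]; // every slot starts as null
        int maxIndex = -1;                            // -1 means "no non-null value seen yet"
        for (int i = 0; i < items.length; i++) {
            // Strict > means a later equal value never replaces an earlier one,
            // so ties resolve to the lowest item_no.
            if (items[i] != null && (maxIndex < 0 || items[i] > items[maxIndex])) {
                maxIndex = i;
            }
        }
        if (maxIndex >= 0) {
            result[maxIndex] = items[maxIndex];
        }
        return result;
    }

    public static void main(String[] args) {
        // Aggregated values for customer_id = 123: i1..i5 = 10, 12, 10, 12, null
        Integer[] row = {10, 12, 10, 12, null};
        System.out.println(Arrays.toString(keepFirstMax(row)));
        // prints [null, 12, null, null, null]
    }
}
```

A single pass with a strict comparison replaces the five CASE expressions, and it extends to more item columns without adding further conditions.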