Select the max timestamp in Hive

Question

I have a table, Customer, that holds two records with different timestamps. I want to select the record with the max timestamp: 2014-08-15 15:54:07.379.

Select Customer_ID, Account_ID, max(ProcessTimeStamp)
from Customer
group by Customer_ID,   Account_ID

I should get one record, but the actual result is two records.

How do I get the record with the max ProcessTimeStamp?

Answer 1

You can use window functions here: either dense_rank() or row_number().

1. Using DENSE_RANK()

select customer_id, account_id, processTimeStamp
from (select *
             -- rnk = 1 marks the latest timestamp per customer; ties share rnk = 1
             ,dense_rank() over (partition by customer_id order by processTimeStamp desc) as rnk
      from Customer
     ) temp
where rnk = 1

2. Using ROW_NUMBER()

select customer_id, account_id, processTimeStamp
from (select *
             -- rnk = 1 marks exactly one row per customer, even when timestamps tie
             ,row_number() over (partition by customer_id order by processTimeStamp desc) as rnk
      from Customer
     ) temp
where rnk = 1

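Note, however, that row_number() gives every row a unique number. If several records tie on the latest timestamp, filtering on row number = 1 returns only one of them (in the case above), whereas dense_rank() returns all of the tied records.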
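To see the tie behaviour without a Hive cluster, here is a minimal sketch that runs the same two queries against an in-memory SQLite database from Python. SQLite is only a stand-in for Hive here (it needs version 3.25 or later for window functions), and the three sample rows, the `rnk` alias, and the `latest_rows` helper are made up for illustration.

```python
import sqlite3

# In-memory SQLite database standing in for Hive (assumption: SQLite >= 3.25,
# which added window functions). The rows below are invented sample data.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Customer ("
    "Customer_ID INTEGER, Account_ID INTEGER, ProcessTimeStamp TEXT)"
)
conn.executemany(
    "INSERT INTO Customer VALUES (?, ?, ?)",
    [
        (1, 10, "2014-08-15 15:54:07.379"),  # latest timestamp
        (1, 11, "2014-08-15 15:54:07.379"),  # ties with the row above
        (1, 12, "2014-08-14 09:00:00.000"),  # older record
    ],
)

# Same shape as the queries in the answer; {fn} is swapped for the
# window function under test.
QUERY = """
SELECT Customer_ID, Account_ID, ProcessTimeStamp
FROM (
    SELECT c.*,
           {fn}() OVER (PARTITION BY Customer_ID
                        ORDER BY ProcessTimeStamp DESC) AS rnk
    FROM Customer c
) ranked
WHERE rnk = 1
ORDER BY Account_ID
"""


def latest_rows(fn):
    """Run the query with the given ranking function and return all rows."""
    return conn.execute(QUERY.format(fn=fn)).fetchall()


dense_rows = latest_rows("dense_rank")   # both tied rows have rank 1
rownum_rows = latest_rows("row_number")  # exactly one row has number 1

print(len(dense_rows), len(rownum_rows))  # prints "2 1"
```

Which of the two tied rows row_number() keeps is not determined by the query, so add a tie-breaking column to the ORDER BY if that choice matters.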

conditional

hive

max

datetime-format