Select the max timestamp in Hive
I have a table, Customer, that holds two records with different timestamps. I want to select the record with the max timestamp: 2014-08-15 15:54:07.379.
Select Customer_ID, Account_ID, max(ProcessTimeStamp)
from Customer
group by Customer_ID, Account_ID
I should get one record back, but the actual result is two records.
How can I get the record with the max ProcessTimeStamp?
You can use window functions here.
Either dense_rank() or row_number() will work.
1. Using dense_rank()
select customer_id, account_id, processTimeStamp
from (select *
      -- rank rows within each customer, newest timestamp first
      , dense_rank() over (partition by customer_id order by processTimeStamp desc) as rnk
      from Customer
     ) temp
where rnk = 1
2. Using row_number()
select customer_id, account_id, processTimeStamp
from (select *
      -- number rows within each customer, newest timestamp first
      , row_number() over (partition by customer_id order by processTimeStamp desc) as rn
      from Customer
     ) temp
where rn = 1
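Note that row_number() gives every row in a partition a unique number. If several records share the same max timestamp, filtering on row number = 1 returns only one of them (as in the query above), whereas dense_rank() returns all of the tied rows.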
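To make the difference on ties concrete, here is a small sketch that computes both functions side by side. The inline `sample` rows and the account IDs 10 and 11 are made up for illustration; only the column names come from the question. It assumes a Hive version that supports common table expressions and SELECT without FROM.

```sql
-- Hypothetical sample data: customer 1 has two rows that share the
-- latest timestamp, plus one older row.
with sample as (
  select 1 as customer_id, 10 as account_id,
         cast('2014-08-15 15:54:07.379' as timestamp) as processTimeStamp
  union all
  select 1, 11, cast('2014-08-15 15:54:07.379' as timestamp)
  union all
  select 1, 10, cast('2014-08-14 09:00:00.000' as timestamp)
)
select customer_id, account_id, processTimeStamp
     -- tied rows receive the same dense_rank value
     , dense_rank() over (partition by customer_id order by processTimeStamp desc) as dr
     -- row_number is always unique within the partition
     , row_number() over (partition by customer_id order by processTimeStamp desc) as rn
from sample;
```

Both rows with the latest timestamp get dr = 1, but only one of them gets rn = 1, and which one is not deterministic unless the ORDER BY includes a tie-breaker such as account_id. Filter on dr = 1 if you want every row at the max timestamp, or on rn = 1 if you want exactly one row per customer.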