SQL Query in Hive to get top 2 values in 2 columns

I have a table like access(url, access_time), and each url may have many accesses.

I also have a table asset(url, foo).

I want to write a query that turns these into joined_data(url, first_access_time, second_access_time).

first_access_time should be NULL if the url has no accesses, and second_access_time should be NULL if it has no second access.
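To pin down the expected semantics, here is a plain-Python sketch of the same transformation on made-up sample rows. The URLs and timestamps are hypothetical, and timestamps are assumed to be strings that sort chronologically. It is only a reference for checking a query's output, not a Hive solution.

```python
from collections import defaultdict


def joined_data(asset_rows, access_rows):
    """Return {url: (first_access_time, second_access_time)} for every asset url.

    asset_rows: iterable of (url, foo); access_rows: iterable of (url, access_time).
    None plays the role of SQL NULL when a url has fewer than two accesses.
    """
    # Collect every access time per url.
    times = defaultdict(list)
    for url, access_time in access_rows:
        times[url].append(access_time)

    result = {}
    for url, _foo in asset_rows:
        ordered = sorted(times.get(url, []))  # earliest access first
        first = ordered[0] if len(ordered) > 0 else None
        second = ordered[1] if len(ordered) > 1 else None
        result[url] = (first, second)
    return result


# Hypothetical sample data: a.com has three accesses, b.com one, c.com none.
assets = [("a.com", 1), ("b.com", 2), ("c.com", 3)]
accesses = [
    ("a.com", "2020-01-03"), ("a.com", "2020-01-01"), ("a.com", "2020-01-02"),
    ("b.com", "2020-02-01"),
]
print(joined_data(assets, accesses))
```

Here a.com keeps only its two earliest times, b.com gets NULL (None) for the second access, and c.com, which was never accessed, still appears with both columns NULL.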

How can I do this in Hive?

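You can do this with row_number(): number each url's accesses in time order, then collapse the rows with rn = 1 and rn = 2 into a single row per url.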

with twotimes as (
  select ast.url, a.access_time,
         -- partition on ast.url, not a.url: for assets with no access, a.url is
         -- NULL, and all of those rows would otherwise share one partition
         row_number() over (partition by ast.url order by a.access_time) as rn
  from asset ast
  left join access a on a.url = ast.url
)
-- conditional aggregation picks the 1st and 2nd times; max() ignores NULLs,
-- so a missing access stays NULL without needing a typed NULL in a UNION
select url,
       max(case when rn = 1 then access_time end) as first_access_time,
       max(case when rn = 2 then access_time end) as second_access_time
from twotimes
group by url
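A query of this shape can be tried locally without a Hive cluster. The sketch below assumes SQLite 3.25 or newer (for window functions) as a stand-in for Hive, and the table contents are hypothetical. The query partitions by ast.url and picks the first and second times with conditional aggregation (max(case when rn = ... end)). The sample includes two never-accessed assets so you can see that each still gets its own row with NULLs.

```python
import sqlite3

# SQLite stands in for Hive here (assumption: SQLite >= 3.25 for row_number()).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table asset (url text, foo integer);
    create table access (url text, access_time text);
    insert into asset values ('a.com', 1), ('b.com', 2), ('c.com', 3), ('d.com', 4);
    insert into access values
        ('a.com', '2020-01-03'), ('a.com', '2020-01-01'), ('a.com', '2020-01-02'),
        ('b.com', '2020-02-01');
""")

QUERY = """
with twotimes as (
  select ast.url, a.access_time,
         -- number accesses per asset url, earliest first
         row_number() over (partition by ast.url order by a.access_time) as rn
  from asset ast
  left join access a on a.url = ast.url
)
select url,
       max(case when rn = 1 then access_time end) as first_access_time,
       max(case when rn = 2 then access_time end) as second_access_time
from twotimes
group by url
order by url
"""

rows = conn.execute(QUERY).fetchall()
for row in rows:
    print(row)
```

The ordering clause is only there to make the output deterministic. In Hive, the same statement without the final order by should behave the same way, although the exact NULL ordering inside row_number() is an engine detail that does not matter here, because unmatched assets have only one row each.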