I get different results every time I run my code that uses the lead function in SQL Impala

I have the following code:

select *, lead(session_end_type) over (partition by user_id, session_id order by user_id, session_id, log_time) as next_session_end_type
from table_name;

However, it seems to produce different results every time I run it.

What could be causing the difference?

Thanks in advance!

(I confirmed that the query produces different results with the following code:

create table t1 as
select *, lead(session_end_type) over (partition by user_id, session_id order by user_id, session_id, log_time) as next_session_end_type
from table_name;

create table t2 as
select *, lead(session_end_type) over (partition by user_id, session_id order by user_id, session_id, log_time) as next_session_end_type
from table_name;

select count (*) from
(
    select * from t1
    union
    select * from t2
) as t;

The resulting row count differs from the row count of t1 and from that of t2, which means t1 and t2 contain different results.)

First, there is no need to repeat the partition by columns in the order by. You can simplify it to:

lead(session_end_type) over (partition by user_id, session_id order by log_time) as next_session_end_type

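Second, if log_time is not unique for a given user_id/session_id, the results are unstable. Remember that SQL tables represent unordered sets, so when the sort key has ties there is no "natural" order to fall back on: rows with the same log_time can come out in either order, and lead can pick a different "next" row on each run.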
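To make the effect of ties concrete, here is a minimal sketch using made-up inline data (the sample CTE and its values are hypothetical, and log_time is kept as a string for simplicity). Two rows share the same log_time, so the engine is free to order them either way:

```sql
-- Hypothetical sample data: the first two rows tie on log_time.
with sample as (
    select 1 as user_id, 10 as session_id,
           '2020-01-01 00:00:00' as log_time, 'timeout' as session_end_type
    union all
    select 1, 10, '2020-01-01 00:00:00', 'logout'
    union all
    select 1, 10, '2020-01-01 00:05:00', 'close'
)
select *,
       -- The order of the two tied rows is not defined, so the value
       -- returned by lead for them may differ between runs.
       lead(session_end_type) over (partition by user_id, session_id
                                    order by log_time) as next_session_end_type
from sample;
```

Depending on how the tied rows happen to be ordered, the 'timeout' row may get 'logout' as its next value and the 'logout' row 'close', or the other way around ('logout' followed by 'timeout', and 'timeout' followed by 'close'). Both are valid results for this query.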

You can check for such ties with:

select user_id, session_id, log_time, count(*)
from table_name
group by user_id, session_id, log_time
having count(*) > 1
order by count(*) desc;

If you have a column that uniquely identifies each row (or each row within a user/session), include it in the order by:

lead(session_end_type) over (partition by user_id, session_id
                             order by log_time, <make it stable column>
                            ) as next_session_end_type
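Before relying on a tiebreaker, it may be worth confirming that it really removes all ties. The sketch below assumes a hypothetical column named event_id as the candidate. Substitute whatever column your table actually has.

```sql
-- event_id is a hypothetical tiebreaker column, not part of the original table.
-- If this query returns no rows, (log_time, event_id) is unique within each
-- user_id/session_id and the window ordering is fully determined.
select user_id, session_id, log_time, event_id, count(*) as cnt
from table_name
group by user_id, session_id, log_time, event_id
having count(*) > 1
order by cnt desc;
```

If rows still come back, the ordering remains ambiguous for those groups and another column needs to be added to the order by.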