I get different results every time I run my query that uses the lead function in Impala SQL
I have the following code:
select *, lead(session_end_type) over (partition by user_id, session_id order by user_id, session_id, log_time) as next_session_end_type
from table_name;
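However, it seems to produce different results every time I run it.
What could be causing the difference?
Thanks in advance!
(I verified that the query produces different results with the following code: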
create table t1 as
select *, lead(session_end_type) over (partition by user_id, session_id order by user_id, session_id, log_time) as next_session_end_type
from table_name;
create table t2 as
select *, lead(session_end_type) over (partition by user_id, session_id order by user_id, session_id, log_time) as next_session_end_type
from table_name;
select count(*) from
(
select * from t1
union
select * from t2
) as t;
The resulting row count differs from the row counts of t1 and t2, which means the results in t1 and t2 are different.)
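First, there is no need to repeat the partition by columns in the order by. Within a partition those columns are constant, so they do not affect the ordering. You can simplify it to: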
lead(session_end_type) over (partition by user_id, session_id order by log_time) as next_session_end_type
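Second, if log_time is not unique for a given user_id/session_id, then the result is unstable. Remember that SQL tables represent unordered sets, so when the sort key has ties there is no "natural" order to fall back on. The engine may return the tied rows in any order, and that changes which row lead treats as the next one.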
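To make the effect of ties concrete, here is a minimal sketch. It uses Python's built-in sqlite3 module as a stand-in for Impala (this assumes the bundled SQLite is 3.25 or newer, which is when window functions were added). The sample rows and the row_id column are hypothetical and not part of the original table. Two rows share the same log_time, and both orderings below are consistent with order by log_time alone, yet they yield different lead values.

```python
import sqlite3

# Hypothetical sample data: rows 2 and 3 share the same log_time.
rows = [
    (1, "u1", "s1", "2020-01-01 10:00:00", "start"),
    (2, "u1", "s1", "2020-01-01 10:05:00", "timeout"),
    (3, "u1", "s1", "2020-01-01 10:05:00", "logout"),
    (4, "u1", "s1", "2020-01-01 10:09:00", "close"),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "create table table_name ("
    "row_id integer, user_id text, session_id text, "
    "log_time text, session_end_type text)"
)
conn.executemany("insert into table_name values (?, ?, ?, ?, ?)", rows)


def lead_by(tie_breaker):
    """Return {row_id: next_session_end_type} for a given tie-breaking rule."""
    sql = f"""
        select row_id,
               lead(session_end_type) over (
                   partition by user_id, session_id
                   order by log_time, {tie_breaker}
               ) as next_session_end_type
        from table_name
    """
    return dict(conn.execute(sql).fetchall())


# Both orderings satisfy "order by log_time"; they only differ on the tie.
asc = lead_by("row_id asc")
desc = lead_by("row_id desc")

for row_id in sorted(asc):
    print(row_id, asc[row_id], desc[row_id])
```

Rows 1, 2 and 3 get a different next_session_end_type depending on how the tie is resolved. When the query orders by log_time only, the engine is free to pick either resolution on each run.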
You can check whether such ties exist with:
select user_id, session_id, log_time, count(*)
from table_name
group by user_id, session_id, log_time
having count(*) > 1
order by count(*) desc;
If you have a column that uniquely identifies each row (or each row within a user/session), include it in the order by:
lead(session_end_type) over (partition by user_id, session_id
                             order by log_time, <make it stable column>) as next_session_end_type
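As a rough check of the fix, the sketch below loads the same rows in two different physical orders and computes lead with a unique tie-breaker. It again uses Python's sqlite3 module as a stand-in for Impala (assuming SQLite 3.25 or newer). The row_id column plays the role of the stable column and is a hypothetical name, as is the sample data.

```python
import sqlite3

# Hypothetical sample data: rows 2 and 3 share the same log_time.
ROWS = [
    (1, "u1", "s1", "2020-01-01 10:00:00", "start"),
    (2, "u1", "s1", "2020-01-01 10:05:00", "timeout"),
    (3, "u1", "s1", "2020-01-01 10:05:00", "logout"),
    (4, "u1", "s1", "2020-01-01 10:09:00", "close"),
]


def next_types(rows):
    """Load rows in the given physical order and compute lead with a tie-breaker."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "create table table_name ("
        "row_id integer, user_id text, session_id text, "
        "log_time text, session_end_type text)"
    )
    conn.executemany("insert into table_name values (?, ?, ?, ?, ?)", rows)
    sql = """
        select row_id,
               lead(session_end_type) over (
                   partition by user_id, session_id
                   order by log_time, row_id  -- row_id makes the order total
               ) as next_session_end_type
        from table_name
    """
    result = dict(conn.execute(sql).fetchall())
    conn.close()
    return result


forward = next_types(ROWS)
backward = next_types(list(reversed(ROWS)))
print(forward == backward)
```

Because log_time plus row_id defines a total order within each partition, the result no longer depends on how the rows happen to be stored or scanned.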