Nested window function not working in Snowflake
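I am working on migrating Spark SQL to SnowSQL.
At one point I hit a case where the Spark SQL uses nested window functions. I want to migrate that query to Snowflake, but Snowflake does not support nested window functions.
Spark SQL query -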
SELECT
    *,
    (case when (
        (
            lead(timestamp - lag(timestamp)
                over (partition by session_id order by timestamp))
            over (partition by session_id order by timestamp)
        ) is not null)
    then
        (
            lead(timestamp - lag(timestamp)
                over (partition by session_id order by timestamp))
            over (partition by session_id order by timestamp)
        )
    else 0 end)/1000 as pg_to_pg
FROM dwell_time_step2
Output -
I tried converting the above query to Snowflake as follows.
Snowflake SQL -
with lagsession as (
    SELECT
        a.*,
        lag(timestamp) over (partition BY session_id order by timestamp asc) lagsession
    FROM mktg_web_wi.dwell_time_step2 a
)
select
    a.*,
    nvl(lead(a.timestamp - b.lagsession) over (partition BY a.session_id order by a.timestamp),0)/1000 pg_to_pg
FROM mktg_web_wi.dwell_time_step2 a,
    lagsession b
WHERE a.key=b.key
order by timestamp;
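Output -
The problem is in the Snowflake output: the dwell time values are being assigned to different URLs.
I expect the Spark SQL query to work in Snowflake, with the same output in both cases.
If anyone knows how to solve this, please let me know.
Thanks!!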
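I think changing it from a nested window function to a CTE has changed which records the lag and lead refer to, but I am having a hard time pinning down exactly how.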
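For what it's worth, the nested expression can be staged as two steps that read from each other directly, with no join back to the base table. The sketch below is only an illustration: it runs the SQL through Python's sqlite3 module as a stand-in for Snowflake, and the sample rows, the url column, the step1 and gap_from_prev names, and COALESCE in place of nvl are all assumptions of mine, not taken from the question. I cannot tell from the post whether key is unique per row; if it is not, the join in the question could pair rows differently from what the nested form does.

```python
import sqlite3

# Hypothetical sample rows: (session_id, url, timestamp in epoch milliseconds).
ROWS = [
    ("s1", "/home", 1000),
    ("s1", "/search", 3000),
    ("s1", "/item", 6000),
    ("s2", "/home", 2000),
    ("s2", "/cart", 2500),
]

# Stage 1 computes the inner "timestamp - lag(timestamp)" per row.
# Stage 2 applies lead() to that column. Stage 2 reads stage 1 directly,
# so there is no self-join that could re-pair rows.
# 1000.0 is used because SQLite would otherwise do integer division.
STAGED_SQL = """
WITH step1 AS (
    SELECT
        a.*,
        a.timestamp - LAG(a.timestamp)
            OVER (PARTITION BY a.session_id ORDER BY a.timestamp) AS gap_from_prev
    FROM dwell_time_step2 a
)
SELECT
    session_id,
    url,
    COALESCE(
        LEAD(gap_from_prev) OVER (PARTITION BY session_id ORDER BY timestamp),
        0
    ) / 1000.0 AS pg_to_pg
FROM step1
ORDER BY session_id, timestamp
"""


def staged_dwell_times(rows):
    """Load rows into an in-memory table and run the two-stage query."""
    # Window functions need SQLite 3.25 or newer.
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE dwell_time_step2 (session_id TEXT, url TEXT, timestamp INTEGER)"
    )
    conn.executemany("INSERT INTO dwell_time_step2 VALUES (?, ?, ?)", rows)
    result = conn.execute(STAGED_SQL).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    for row in staged_dwell_times(ROWS):
        print(row)
```

Each row ends up with the time until the next page in the same session, and the last page of a session gets 0.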
Anyway, if I have understood your code correctly, I think there is a simpler way that needs only one window function.
select
    a.*,
    nvl((lead(a.timestamp) over (partition BY a.session_id order by a.timestamp) - a.timestamp)/1000, 0) pg_to_pg
FROM mktg_web_wi.dwell_time_step2 a
order by timestamp;
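The reason one window function is enough: for a given row, lead(timestamp - lag(timestamp)) is the next row's timestamp minus the row before the next row, which is the current row, so it reduces to lead(timestamp) - timestamp. Here is a rough check of that on made-up data. It uses Python's sqlite3 module as a stand-in for Snowflake, and the sample rows, the url column, and COALESCE in place of nvl are assumptions for illustration only.

```python
import sqlite3

# Hypothetical sample rows: (session_id, url, timestamp in epoch milliseconds).
ROWS = [
    ("s1", "/home", 1000),
    ("s1", "/search", 3000),
    ("s1", "/item", 6000),
    ("s2", "/home", 2000),
    ("s2", "/cart", 2500),
]

# Single-window form: time until the next page in the same session, 0 for the last page.
# 1000.0 is used because SQLite would otherwise do integer division.
SINGLE_WINDOW_SQL = """
SELECT
    session_id,
    url,
    COALESCE(
        LEAD(timestamp) OVER (PARTITION BY session_id ORDER BY timestamp) - timestamp,
        0
    ) / 1000.0 AS pg_to_pg
FROM dwell_time_step2
ORDER BY session_id, timestamp
"""


def single_window_dwell_times(rows):
    """Run the single-window query against an in-memory copy of the rows."""
    # Window functions need SQLite 3.25 or newer.
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE dwell_time_step2 (session_id TEXT, url TEXT, timestamp INTEGER)"
    )
    conn.executemany("INSERT INTO dwell_time_step2 VALUES (?, ?, ?)", rows)
    result = conn.execute(SINGLE_WINDOW_SQL).fetchall()
    conn.close()
    return result


def nested_reference(rows):
    """Literal lead(ts - lag(ts)) per session, spelled out in plain Python."""
    by_session = {}
    for session_id, url, ts in sorted(rows, key=lambda r: (r[0], r[2])):
        by_session.setdefault(session_id, []).append((url, ts))
    out = []
    for session_id, pages in by_session.items():
        # Inner step: ts - lag(ts); None for the first row of a session.
        inner = [
            None if i == 0 else pages[i][1] - pages[i - 1][1]
            for i in range(len(pages))
        ]
        for i, (url, _) in enumerate(pages):
            # Outer step: lead(inner); missing for the last row, replaced by 0.
            nxt = inner[i + 1] if i + 1 < len(pages) else None
            out.append((session_id, url, (nxt if nxt is not None else 0) / 1000.0))
    return out


if __name__ == "__main__":
    print(single_window_dwell_times(ROWS) == nested_reference(ROWS))
```

One caveat: if two rows in a session can share the same timestamp, the window ordering is not deterministic, so adding a tiebreaker column to the order by may be needed to get stable results.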