Query with conditional lag statement
I am trying to find the previous value of a column, taken from an earlier row that satisfies some condition. Consider the table:
| user_id | session_id | time | referrer |
|---------|------------|------------|------------|
| 1 | 1 | 2018-01-01 | [NULL] |
| 1 | 2 | 2018-02-01 | google.com |
| 1 | 3 | 2018-03-01 | google.com |
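For each session, I want to find the most recent earlier session_id whose referrer is NULL. So for the second and third rows, parent_session_id should be 1.
However, using just lag(session_id) over (partition by user_id order by time), I get parent_session_id = 2 for the third row.
I suspect this can be done with a combination of window functions, but I just can't figure it out.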
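You could even do this with a correlated subquery (it assumes session_id increases over time):
SELECT
    session_id,
    -- Latest earlier session of the same user that has a NULL referrer
    (SELECT MAX(t2.session_id)
     FROM yourTable t2
     WHERE t2.referrer IS NULL
       AND t2.user_id = t1.user_id
       AND t2.session_id < t1.session_id) AS prev_session_id
FROM yourTable t1
ORDER BY
    session_id;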
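Here is an approach using analytic functions that may work:
WITH cte AS (
    SELECT *,
        -- Running count of NULL-referrer sessions per user; rows sharing a count form one group
        SUM(CASE WHEN referrer IS NULL THEN 1 ELSE 0 END)
            OVER (PARTITION BY user_id ORDER BY session_id) AS cnt
    FROM yourTable
)
SELECT
    session_id,
    -- Note: a NULL-referrer row starts its group, so it returns its own session_id here
    CASE WHEN cnt = 0
         THEN NULL
         ELSE MIN(session_id) OVER (PARTITION BY user_id, cnt) END AS prev_session_id
FROM cte
ORDER BY
    session_id;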
I would combine last_value() with if():
WITH t AS (
    SELECT * FROM UNNEST([
        struct<user_id int64, session_id int64, time date, referrer string>(1, 1, date('2018-01-01'), NULL),
        (1, 2, date('2018-02-01'), 'google.com'),
        (1, 3, date('2018-03-01'), 'google.com')
    ])
)
SELECT
    *,
    -- Keep session_id only where referrer is NULL, then take the last such value before the current row
    last_value(IF(referrer IS NULL, session_id, NULL) IGNORE NULLS)
        OVER (PARTITION BY user_id ORDER BY time
              ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING) AS lastNullrefSession
FROM t
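To make the window frame in the last_value() query more concrete, here is a rough Python sketch of the same logic: within each user, walk the sessions in time order and carry forward the most recent session_id whose referrer was NULL, looking only at rows strictly before the current one. The names rows and last_null_referrer_session are made up for this illustration and are not part of any answer above; the data is the sample table from the question.

```python
from itertools import groupby
from operator import itemgetter

# Sample rows from the question: (user_id, session_id, time, referrer)
rows = [
    (1, 1, "2018-01-01", None),
    (1, 2, "2018-02-01", "google.com"),
    (1, 3, "2018-03-01", "google.com"),
]


def last_null_referrer_session(rows):
    """Return (session_id, parent_session_id) pairs.

    Hypothetical helper: mimics PARTITION BY user_id ORDER BY time with a
    frame that ends one row before the current row.
    """
    result = []
    # Sort by user_id, then time (ISO date strings sort chronologically)
    ordered = sorted(rows, key=itemgetter(0, 2))
    for _, sessions in groupby(ordered, key=itemgetter(0)):
        parent = None  # no NULL-referrer session seen yet for this user
        for _, session_id, _, referrer in sessions:
            # Emit before updating, so the current row never sees itself
            result.append((session_id, parent))
            if referrer is None:
                parent = session_id
    return result


print(last_null_referrer_session(rows))
```

For the sample data this gives NULL for session 1 and 1 for sessions 2 and 3, which matches the parent_session_id the question asks for.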