Query with conditional lag statement

I'm trying to find the previous value of a column, taken from an earlier row that meets certain conditions. Consider the table:

| user_id | session_id | time       | referrer   |  
|---------|------------|------------|------------|  
| 1       | 1          | 2018-01-01 | [NULL]     |  
| 1       | 2          | 2018-02-01 | google.com |  
| 1       | 3          | 2018-03-01 | google.com |

For each session, I want to find the session_id of the most recent earlier session whose referrer is NULL. So for the second and third rows, parent_session_id should be 1.

However, using just lag(session_id) over (partition by user_id order by time), I get parent_session_id = 2 for the third row.
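The behaviour is easy to reproduce. Here is a minimal sketch using Python's built-in sqlite3 module with an in-memory copy of the table above. The table name `sessions` is made up for illustration, and window functions need SQLite 3.25 or newer.

```python
import sqlite3

# In-memory copy of the sample table; the table name "sessions" is hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sessions (user_id INT, session_id INT, time TEXT, referrer TEXT)"
)
conn.executemany(
    "INSERT INTO sessions VALUES (?, ?, ?, ?)",
    [
        (1, 1, "2018-01-01", None),
        (1, 2, "2018-02-01", "google.com"),
        (1, 3, "2018-03-01", "google.com"),
    ],
)

# Plain lag() returns the immediately preceding session, whatever its referrer.
lag_rows = conn.execute(
    """
    SELECT session_id,
           lag(session_id) OVER (PARTITION BY user_id ORDER BY time) AS parent_session_id
    FROM sessions
    ORDER BY session_id
    """
).fetchall()

print(lag_rows)  # [(1, None), (2, 1), (3, 2)]
```

Session 3 comes back with 2 instead of the wanted 1, because lag() cannot skip the rows that have a referrer.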

I suspect this can be done with a combination of window functions, but I just can't figure it out.

You could even do this with a correlated subquery:

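SELECT
    session_id,
    -- Latest earlier session of the same user that has no referrer.
    -- Assumes session_id increases with time.
    (SELECT MAX(t2.session_id) FROM yourTable t2
     WHERE t2.referrer IS NULL
       AND t2.user_id = t1.user_id
       AND t2.session_id < t1.session_id) prev_session_id
FROM yourTable t1
ORDER BY
    session_id;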

Here is an approach using analytic functions that might work:

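WITH cte AS (
    SELECT *,
        -- Running count of NULL-referrer rows per user; rows that share a
        -- count sit at or after the same NULL-referrer session.
        SUM(CASE WHEN referrer IS NULL THEN 1 ELSE 0 END)
            OVER (PARTITION BY user_id ORDER BY session_id) cnt
    FROM yourTable
)

SELECT
    session_id,
    CASE WHEN cnt = 0
         THEN NULL
         ELSE MIN(session_id) OVER (PARTITION BY user_id, cnt) END prev_session_id
FROM cte
ORDER BY
    session_id;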
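To see why this works, it helps to look at the intermediate cnt column. Below is a rough sketch that runs the query against the sample rows with Python's built-in sqlite3 module (SQLite 3.25 or newer is assumed for window functions). The in-memory table and the extra cnt column in the output are only there for illustration, and the windows are partitioned by user_id so that counts stay within one user.

```python
import sqlite3

# In-memory stand-in for yourTable, loaded with the rows from the question.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE yourTable (user_id INT, session_id INT, time TEXT, referrer TEXT)"
)
conn.executemany(
    "INSERT INTO yourTable VALUES (?, ?, ?, ?)",
    [
        (1, 1, "2018-01-01", None),
        (1, 2, "2018-02-01", "google.com"),
        (1, 3, "2018-03-01", "google.com"),
    ],
)

# cnt is a running count of NULL-referrer rows, so each row shares its cnt
# with the most recent NULL-referrer row at or before it.
cnt_rows = conn.execute(
    """
    WITH cte AS (
        SELECT *,
            SUM(CASE WHEN referrer IS NULL THEN 1 ELSE 0 END)
                OVER (PARTITION BY user_id ORDER BY session_id) AS cnt
        FROM yourTable
    )
    SELECT
        session_id,
        cnt,
        CASE WHEN cnt = 0
             THEN NULL
             ELSE MIN(session_id) OVER (PARTITION BY user_id, cnt) END AS prev_session_id
    FROM cte
    ORDER BY session_id
    """
).fetchall()

print(cnt_rows)  # [(1, 1, 1), (2, 1, 1), (3, 1, 1)]
```

Note that session 1, which is itself a NULL-referrer row, reports its own session_id here rather than NULL. Whether that is acceptable depends on what the parent of such a row is supposed to be.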

I would use last_value() combined with if():

WITH t AS (SELECT * FROM UNNEST([ 
    struct<user_id int64, session_id int64, time date, referrer string>(1, 1, date('2018-01-01'), NULL),
    (1,2,date('2018-02-01'), 'google.com'),
    (1,3,date('2018-03-01'), 'google.com')
  ]) )

SELECT
  *,
  last_value(IF(referrer is null, session_id, NULL) ignore nulls) 
    over (partition by user_id order by time rows between unbounded preceding and 1 preceding) lastNullrefSession
FROM t
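If the engine at hand has no IGNORE NULLS (SQLite, as far as I know, does not), a rough substitute is MAX over the same frame. This is a different technique from last_value() and only gives the same answer under the assumption that session_id increases with time within a user. A sketch with Python's built-in sqlite3 module (SQLite 3.25 or newer assumed), using an in-memory table named `t` to mirror the CTE above:

```python
import sqlite3

# In-memory table "t" holding the same three sample rows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (user_id INT, session_id INT, time TEXT, referrer TEXT)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?, ?)",
    [
        (1, 1, "2018-01-01", None),
        (1, 2, "2018-02-01", "google.com"),
        (1, 3, "2018-03-01", "google.com"),
    ],
)

# MAX skips NULLs, so only NULL-referrer sessions contribute a value.
# The frame ends one row before the current row, so a row never matches itself.
carry_rows = conn.execute(
    """
    SELECT session_id,
           MAX(CASE WHEN referrer IS NULL THEN session_id END)
               OVER (PARTITION BY user_id ORDER BY time
                     ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING) AS lastNullrefSession
    FROM t
    ORDER BY session_id
    """
).fetchall()

print(carry_rows)  # [(1, None), (2, 1), (3, 1)]
```

Sessions 2 and 3 both resolve to session 1, and session 1 itself gets NULL because its frame is empty.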