SQL Server LEAD function

-- FIRST LOGIN DATE
WITH CTE_FIRST_LOGIN AS 
(
    SELECT 
        PLAYER_ID, EVENT_DATE, 
        ROW_NUMBER() OVER (PARTITION BY PLAYER_ID ORDER BY EVENT_DATE ASC) AS RN
    FROM 
        ACTIVITY
),
-- CONSECUTIVE LOGINS
CTE_CONSEC_PLAYERS AS 
(
    SELECT 
        PLAYER_ID, 
        LEAD(EVENT_DATE,1) OVER (PARTITION BY EVENT_DATE ORDER BY EVENT_DATE) NEXT_DATE 
    FROM 
        ACTIVITY A
    JOIN 
        CTE_FIRST_LOGIN C ON A.PLAYER_ID = C.PLAYER_ID
    WHERE  
        NEXT_DATE = DATEADD(DAY, 1, A.EVENT_DATE) AND C.RN = 1
    GROUP BY 
        A.PLAYER_ID
)
-- FRACTION
SELECT 
    NULLIF(ROUND(1.00 * COUNT(CTE_CONSEC.PLAYER_ID) / COUNT(DISTINCT PLAYER_ID), 2), 0) AS FRACTION 
FROM 
    ACTIVITY 
JOIN 
    CTE_CONSEC_PLAYERS CTE_CONSEC ON CTE_CONSEC.PLAYER_ID = ACTIVITY.PLAYER_ID

I get the following error when I run this query.

[42S22] [Microsoft][ODBC Driver 17 for SQL Server][SQL Server]Invalid column name 'NEXT_DATE'. (207) (SQLExecDirectW)

This is LeetCode medium problem 550, Game Play Analysis IV. I would like to know why the column NEXT_DATE is not recognized here and what I am missing. Thanks!

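You gave each table an alias (for example, JOIN CTE_FIRST_LOGIN C assigns the alias C), and every column is accessed through an alias. You need to prefix NEXT_DATE with the correct alias from the correct table.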

The problem is in this CTE:

-- CONSECUTIVE LOGINS prep
CTE_CONSEC_PLAYERS AS (
  SELECT 
    PLAYER_ID, 
    LEAD(EVENT_DATE,1) OVER (PARTITION BY EVENT_DATE ORDER BY EVENT_DATE) NEXT_DATE 
  FROM ACTIVITY A
  JOIN CTE_FIRST_LOGIN C  ON A.PLAYER_ID = C.PLAYER_ID
  WHERE  NEXT_DATE = DATEADD(DAY, 1, A.EVENT_DATE) AND C.RN = 1
  GROUP BY A.PLAYER_ID
)

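Note that you are creating NEXT_DATE as a column alias in this CTE, but you are also referencing it in the WHERE clause of the same CTE. That is invalid. Under SQL's logical clause-evaluation order, WHERE is evaluated before the SELECT list, and a column alias does not exist until the ORDER BY clause is reached, which is the last clause evaluated in a query or subquery. This subquery has no ORDER BY clause, so technically the NEXT_DATE alias exists only in the [sub]queries that come after this CTE and reference it.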
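The rule above can be seen in isolation with a small sketch. The table variable @Activity and its three rows are hypothetical sample data, not part of the original question; the point is only that the alias becomes usable once it is defined one query level down.

```sql
-- Hypothetical sample data, for illustration only.
DECLARE @Activity TABLE (PLAYER_ID int, EVENT_DATE date);
INSERT INTO @Activity (PLAYER_ID, EVENT_DATE)
VALUES (1, '2016-03-01'), (1, '2016-03-02'), (2, '2017-06-25');

-- Fails with "Invalid column name 'NEXT_DATE'": WHERE is evaluated
-- before the SELECT list, so the alias does not exist yet.
-- SELECT PLAYER_ID,
--        LEAD(EVENT_DATE) OVER (PARTITION BY PLAYER_ID ORDER BY EVENT_DATE) AS NEXT_DATE
-- FROM @Activity
-- WHERE NEXT_DATE = DATEADD(DAY, 1, EVENT_DATE);

-- Works: the alias is defined in the inner query and filtered in the outer one.
SELECT t.PLAYER_ID, t.EVENT_DATE, t.NEXT_DATE
FROM (
    SELECT PLAYER_ID,
           EVENT_DATE,
           LEAD(EVENT_DATE) OVER (PARTITION BY PLAYER_ID ORDER BY EVENT_DATE) AS NEXT_DATE
    FROM @Activity
) AS t
WHERE t.NEXT_DATE = DATEADD(DAY, 1, t.EVENT_DATE);
```

A derived table is used here instead of a CTE purely to keep the sketch short; a CTE defined before the filtering query behaves the same way.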

To fix this, you will probably need two CTEs, something like this (untested):

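-- CONSECUTIVE LOGINS prep: compute each player's next login date.
-- Reads from CTE_FIRST_LOGIN, which already carries PLAYER_ID, EVENT_DATE and RN,
-- so no join back to ACTIVITY (and no ambiguous column names) is needed.
CTE_CONSEC_PLAYERS_pre AS (
  SELECT 
    PLAYER_ID, 
    RN,
    EVENT_DATE,
    LEAD(EVENT_DATE, 1) OVER (PARTITION BY PLAYER_ID ORDER BY EVENT_DATE) AS NEXT_DATE 
  FROM CTE_FIRST_LOGIN
),
-- CONSECUTIVE LOGINS: NEXT_DATE now exists as a real column, so WHERE can use it
CTE_CONSEC_PLAYERS AS (
  SELECT
    PLAYER_ID, 
    MAX(NEXT_DATE) AS NEXT_DATE
  FROM CTE_CONSEC_PLAYERS_pre
  WHERE NEXT_DATE = DATEADD(DAY, 1, EVENT_DATE) AND RN = 1
  GROUP BY PLAYER_ID
)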

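Your main problem is that NEXT_DATE is the alias of a window function result. Window functions are evaluated as part of the SELECT list, which comes after WHERE in SQL's logical order of operations, so the alias cannot be referenced in WHERE.

But this query seems overly complicated.

The problem to solve appears to be: how many players logged in again on the day after their first login, as a fraction of all players.

This can be done in one pass over the table (no joins) by using several window functions at the same time: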

WITH CTE_FIRST_LOGIN AS (
    SELECT
      PLAYER_ID,
      EVENT_DATE,
      ROW_NUMBER() OVER (PARTITION BY PLAYER_ID ORDER BY EVENT_DATE) AS RN,
-- if EVENT_DATE is a datetime and can have multiple per day then group by CAST(EVENT_DATE AS date) first
      LEAD(EVENT_DATE, 1) OVER (PARTITION BY PLAYER_ID ORDER BY EVENT_DATE) AS NextDate
  FROM ACTIVITY
),
BY_PLAYERS AS (
    SELECT
      c.PLAYER_ID,
      SUM(CASE WHEN c.RN = 1 AND c.NextDate = DATEADD(DAY, 1, c.EVENT_DATE)
        THEN 1 END) AS IsConsecutive
    FROM CTE_FIRST_LOGIN AS c
    GROUP BY c.PLAYER_ID
)
SELECT ROUND(
    1.00 *
    COUNT(c.IsConsecutive) /
    NULLIF(COUNT(*), 0)
  ,2) AS FRACTION
FROM BY_PLAYERS AS c;

In theory you could merge BY_PLAYERS into the outer query and use COUNT(DISTINCT ...), but splitting them feels cleaner.
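For comparison, here is a sketch of what that merged variant might look like. It is untested against LeetCode and assumes the same ACTIVITY table, with LEAD partitioned by PLAYER_ID so that NextDate is each player's following login date.

```sql
WITH CTE_FIRST_LOGIN AS (
    SELECT
      PLAYER_ID,
      EVENT_DATE,
      ROW_NUMBER() OVER (PARTITION BY PLAYER_ID ORDER BY EVENT_DATE) AS RN,
      LEAD(EVENT_DATE, 1) OVER (PARTITION BY PLAYER_ID ORDER BY EVENT_DATE) AS NextDate
    FROM ACTIVITY
)
SELECT ROUND(
    1.00 *
    -- players whose first login (RN = 1) is followed by a login on the next day
    COUNT(DISTINCT CASE WHEN c.RN = 1 AND c.NextDate = DATEADD(DAY, 1, c.EVENT_DATE)
                        THEN c.PLAYER_ID END) /
    -- all players; NULLIF guards against division by zero on an empty table
    NULLIF(COUNT(DISTINCT c.PLAYER_ID), 0)
  , 2) AS FRACTION
FROM CTE_FIRST_LOGIN AS c;
```

The CASE expression yields NULL for non-matching rows, and COUNT ignores NULLs, so only qualifying players are counted in the numerator.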