Cohort Analysis using SQL (Snowflake)

I am doing a cohort analysis on the TRANSACTIONS table. The table schema is below:

USER_ID              NUMBER,
PAYMENT_DATE_UTC     DATE,
IS_PAYMENT_ADDED     BOOLEAN

Below is a quick query showing how USER_ID 12345 (an example) falls into different cohorts depending on the date filter provided:

WITH RESULT AS (
SELECT
USER_ID,
TO_DATE(PAYMENT_DATE_UTC) AS PAYMENT_DATE,
SUM(CASE WHEN IS_PAYMENT_ADDED=TRUE THEN 1 ELSE 0 END) AS PAYMENT_ADDED_COUNT
FROM TRANSACTIONS
GROUP BY 1,2
HAVING PAYMENT_ADDED_COUNT>=1
ORDER BY 2
)
SELECT
COUNT(DISTINCT r.USER_ID),
SUM(r.PAYMENT_ADDED_COUNT)
FROM RESULT r
WHERE r.USER_ID=12345
AND (r.PAYMENT_DATE>='2021-02-01' AND r.PAYMENT_DATE<'2021-02-15')

The query result for this time range (two weeks) is:

| 1 | 55 |

With this date filter, the USER_ID is classified into the regular user cohort (more than 10 payments).

If the same query is run with a time range of a single day, say '2021-02-07', the result is:

| 1 | 10 |

With this date filter, the USER_ID is classified into the occasional user cohort (between 1 and 10 payments).
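The two thresholds described above can be written down as a small function. This is only an illustrative Python sketch: the name `classify_cohort` and the label strings are hypothetical, and the boundaries (1 to 10 is occasional, more than 10 is regular) are taken from the cohort definitions in this question.

```python
def classify_cohort(payment_added_count: int) -> str:
    """Map a payment count within some date window to a cohort label."""
    if payment_added_count > 10:
        return "regular"        # more than 10 payments in the window
    if payment_added_count >= 1:
        return "occasional"     # between 1 and 10 payments in the window
    return "none"               # no payments added in the window


# The same user lands in different cohorts depending on the window:
two_week_label = classify_cohort(55)  # count for 2021-02-01 to 2021-02-14
one_day_label = classify_cohort(10)   # count for 2021-02-07 only
print(two_week_label, one_day_label)
```

The point is that the label is a function of the count inside the chosen window, so the count has to be computed with the date filter already applied.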

I have the query below, which classifies USER_IDs into two different cohorts based on the sum of payments added:

WITH
ALL_USER_COHORT AS 
(SELECT
USER_ID,
SUM(CASE WHEN IS_PAYMENT_ADDED=TRUE THEN 1 ELSE 0 END ) AS PAYMENT_ADDED_COUNT
FROM TRANSACTIONS
GROUP BY USER_ID
),
OCASSIONAL_USER_COHORT AS 
(SELECT
USER_ID,
SUM(CASE WHEN IS_PAYMENT_ADDED=TRUE THEN 1 ELSE 0 END ) AS PAYMENT_ADDED_COUNT
FROM TRANSACTIONS
GROUP BY USER_ID
HAVING (PAYMENT_ADDED_COUNT>=1 AND PAYMENT_ADDED_COUNT<=10)
),
REGULAR_USER_COHORT AS 
(SELECT
USER_ID,
SUM(CASE WHEN IS_PAYMENT_ADDED=TRUE THEN 1 ELSE 0 END ) AS PAYMENT_ADDED_COUNT
FROM TRANSACTIONS
GROUP BY USER_ID
HAVING PAYMENT_ADDED_COUNT>10
)
SELECT
COUNT(DISTINCT ou.USER_ID) AS "OCCASIONAL USERS",
COUNT(DISTINCT ru.USER_ID) AS "REGULAR USERS"
FROM ALL_USER_COHORT au
LEFT JOIN OCASSIONAL_USER_COHORT ou ON au.USER_ID=ou.USER_ID
LEFT JOIN REGULAR_USER_COHORT ru ON au.USER_ID=ru.USER_ID
LEFT JOIN TRANSACTIONS t ON au.USER_ID=t.USER_ID
WHERE au.USER_ID=12345
AND TO_DATE(t.PAYMENT_DATE_UTC)>='2021-02-07'

Ideally, USER_ID 12345 should be bucketed as an "occasional user" given the date filter provided, but the query buckets it as a "regular user".

For starters, the redundancy in your CTEs can be removed like this:

WITH all_user_cohort AS (
    SELECT
        USER_ID,
        SUM(IFF(is_payment_added=TRUE, 1,0)) AS payment_added_count
    FROM transactions
    GROUP BY user_id
), ocassional_user_cohort AS (
    SELECT * FROM all_user_cohort
    WHERE PAYMENT_ADDED_COUNT between 1 AND 10
), regular_user_cohort AS (
    SELECT * FROM all_user_cohort
    WHERE PAYMENT_ADDED_COUNT > 10
)
SELECT
COUNT(DISTINCT ou.user_id) AS "OCCASIONAL USERS",
COUNT(DISTINCT ru.user_id) AS "REGULAR USERS"
FROM all_user_cohort AS au
LEFT JOIN ocassional_user_cohort ou ON au.user_id=ou.user_id
LEFT JOIN regular_user_cohort ru ON au.user_id=ru.user_id
LEFT JOIN transactions t ON au.user_id=t.user_id
WHERE au.user_id=12345
AND TO_DATE(t.payment_date_utc)>='2021-03-01'

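But the reason you are hitting this problem is that the cohort CTEs count each user's payments across all dates. The date filter on the joined transactions rows is applied only after those counts have been computed, so it never changes which cohort the user falls into.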

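What you want is to move the date filter into all_user_cohort. And rather than building separate cohort tables, you can just sum the rows that satisfy each condition.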

WITH all_user_cohort AS (
    SELECT
        USER_ID,
        SUM(IFF(is_payment_added=TRUE, 1,0)) AS payment_added_count
    FROM transactions
    WHERE TO_DATE(payment_date_utc)>='2021-03-01'
    GROUP BY user_id
)
SELECT
    SUM(IFF(payment_added_count between 1 AND 10, 1,0)) AS "OCCASIONAL USERS",
    SUM(IFF(payment_added_count > 10, 1,0)) AS "REGULAR USERS"
FROM all_user_cohort AS au
WHERE au.user_id=12345
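The difference between filtering after and before the aggregation can be reproduced on toy data. The sketch below is an assumption-laden stand-in, not Snowflake itself: it uses Python's built-in sqlite3 module, stores dates as ISO text, stores the boolean as 0/1, and writes CASE where Snowflake would use IFF. The rows are hypothetical and chosen only so that user 12345 has 10 payments on 2021-02-07 and 55 in the two-week window, matching the numbers in the question.

```python
import sqlite3

# Hypothetical toy data: 10 payments on 2021-02-07 plus 9 payments on each
# of 2021-02-01 .. 2021-02-05, i.e. 55 payments in total for user 12345.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE transactions ("
    "user_id INTEGER, payment_date_utc TEXT, is_payment_added INTEGER)"
)
rows = [(12345, "2021-02-07", 1)] * 10
for day in range(1, 6):
    rows += [(12345, f"2021-02-{day:02d}", 1)] * 9
conn.executemany("INSERT INTO transactions VALUES (?, ?, ?)", rows)

# Pattern from the question: counts cover all dates; the date filter only
# restricts the joined transaction rows after the counts already exist.
FILTER_AFTER = """
WITH all_user_cohort AS (
    SELECT user_id,
           SUM(CASE WHEN is_payment_added = 1 THEN 1 ELSE 0 END) AS payment_added_count
    FROM transactions
    GROUP BY user_id
)
SELECT
    COUNT(DISTINCT CASE WHEN au.payment_added_count BETWEEN 1 AND 10 THEN au.user_id END),
    COUNT(DISTINCT CASE WHEN au.payment_added_count > 10 THEN au.user_id END)
FROM all_user_cohort au
JOIN transactions t ON au.user_id = t.user_id
WHERE au.user_id = :uid
  AND t.payment_date_utc >= :lo AND t.payment_date_utc < :hi
"""

# Suggested pattern: apply the date filter before aggregating.
FILTER_BEFORE = """
WITH all_user_cohort AS (
    SELECT user_id,
           SUM(CASE WHEN is_payment_added = 1 THEN 1 ELSE 0 END) AS payment_added_count
    FROM transactions
    WHERE payment_date_utc >= :lo AND payment_date_utc < :hi
    GROUP BY user_id
)
SELECT
    SUM(CASE WHEN payment_added_count BETWEEN 1 AND 10 THEN 1 ELSE 0 END),
    SUM(CASE WHEN payment_added_count > 10 THEN 1 ELSE 0 END)
FROM all_user_cohort
WHERE user_id = :uid
"""


def cohort_counts(sql, lo, hi, uid=12345):
    """Return (occasional_users, regular_users) for the window [lo, hi)."""
    return conn.execute(sql, {"uid": uid, "lo": lo, "hi": hi}).fetchone()


one_day = ("2021-02-07", "2021-02-08")
two_weeks = ("2021-02-01", "2021-02-15")
print("filter after, one day:  ", cohort_counts(FILTER_AFTER, *one_day))
print("filter before, one day: ", cohort_counts(FILTER_BEFORE, *one_day))
print("filter before, two weeks:", cohort_counts(FILTER_BEFORE, *two_weeks))
```

With the one-day window, the first query still reports the user as regular because the count of 55 was fixed before the filter ran, while the second query reports the user as occasional. The half-open window (`>= lo` and `< hi`) is a choice made for this sketch, mirroring the two-week filter in the question.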

If it suits your needs better for other reasons, it can also be done a different way:

WITH all_user_cohort AS (
    SELECT
        USER_ID,
        SUM(IFF(is_payment_added=TRUE, 1,0)) AS payment_added_count
    FROM transactions
    WHERE TO_DATE(payment_date_utc)>='2021-03-01'
    GROUP BY user_id
), classify_users AS (
    SELECT user_id
        ,CASE 
            WHEN payment_added_count between 1 AND 10 THEN 'OCCASIONAL USERS'
            WHEN payment_added_count > 10 THEN 'REGULAR USERS'
            ELSE 'users with zero payments'
        END AS classified
    FROM all_user_cohort
)
SELECT classified
    ,count(*)
FROM classify_users
WHERE user_id=12345
GROUP BY 1
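To see how the CASE-based classification behaves across several users, here is a minimal sketch under stated assumptions: Python's built-in sqlite3 module stands in for Snowflake, dates are stored as ISO text, the boolean is stored as 0/1, the user IDs 1 to 4 and their rows are hypothetical, and the `WHERE user_id=12345` filter is dropped so that cohort sizes are visible.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE transactions ("
    "user_id INTEGER, payment_date_utc TEXT, is_payment_added INTEGER)"
)
rows = (
    [(1, "2021-03-02", 1)] * 12    # 12 payments in the window
    + [(2, "2021-03-05", 1)] * 3   # 3 payments in the window
    + [(3, "2021-03-06", 0)] * 2   # rows in the window, no payment added
    + [(4, "2021-02-10", 1)] * 20  # payments only before the window
)
conn.executemany("INSERT INTO transactions VALUES (?, ?, ?)", rows)

CLASSIFY = """
WITH all_user_cohort AS (
    SELECT user_id,
           SUM(CASE WHEN is_payment_added = 1 THEN 1 ELSE 0 END) AS payment_added_count
    FROM transactions
    WHERE payment_date_utc >= :lo
    GROUP BY user_id
), classify_users AS (
    SELECT user_id,
           CASE
               WHEN payment_added_count BETWEEN 1 AND 10 THEN 'OCCASIONAL USERS'
               WHEN payment_added_count > 10 THEN 'REGULAR USERS'
               ELSE 'users with zero payments'
           END AS classified
    FROM all_user_cohort
)
SELECT classified, COUNT(*)
FROM classify_users
GROUP BY classified
ORDER BY classified
"""

# Each result row is (label, number_of_users); collect them into a dict.
cohort_sizes = dict(conn.execute(CLASSIFY, {"lo": "2021-03-01"}).fetchall())
print(cohort_sizes)
```

One thing this makes visible: a user with no rows at all inside the window (user 4 here) is not counted under any label, because the date filter removes that user before the GROUP BY. The 'users with zero payments' label only covers users who have rows in the window with is_payment_added false.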