New and Retained duplicates in BigQuery

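I am using BigQuery and DataStudio to show retention plotted week by week. I run into trouble when a user is new and then uses the app again in the same week: he is counted as both New and Retained. In my calculations I want him to be New only in the first week he uses the app, and then, if he uses the app again within 2 weeks, to be "Retained".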

Here is my query:

SELECT
UserID,
DATE,
DATE_DIFF(DATE,PreviousSessionDATE, DAY) as DaysBetweenSessions,
(SELECT
 CASE
WHEN DaysBetweenSessions <= 14 THEN 'Retained'
WHEN DaysBetweenSessions >14 THEN 'Returned'
WHEN DaysBetweenSessions IS NULL AND FirstSessionDATE = DATE THEN 'New'
WHEN DaysBetweenSessions IS NULL THEN 'User has an old version without Retention Parameters'
END) as User_Type
FROM
app_project.analytics_*********.events_*
GROUP BY
1,2,3,4
ORDER BY
DATE DESC,
DaysBetweenSessions DESC,
1,2,3,4

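The results are generally fine, except when a user uses the app more than once and gets a DaysBetweenSessions between 1 and 14; that user is then counted as both New and Retained in the same week.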

Then in DataStudio, I apply YEARWEEK(DATE) to visualize by week, with Count_Distinct(UserID) as my metric.

Any ideas on how to make a new user count as New only in the first week, even if that user is also retained within that week?

Current output in BQ:
UserID     DATE        DaysBetweenSessions     User_Type
123        20180801    NULL                    "New"
123        20180801    0                       "Retained"

And the desired output:

UserID     DATE        DaysBetweenSessions     User_Type
123        20180801    NULL                    "New"

There may be a neater way to do this, but...

WITH CTE AS
(SELECT
UserID,
DATE,
DATE_DIFF(DATE,PreviousSessionDATE, DAY) as DaysBetweenSessions,
(SELECT
CASE
WHEN DaysBetweenSessions <= 14 THEN 'Retained'
WHEN DaysBetweenSessions >14 THEN 'Returned'
WHEN DaysBetweenSessions IS NULL AND FirstSessionDATE = DATE THEN 'New'
WHEN DaysBetweenSessions IS NULL THEN 'User has an old version without Retention Parameters'
END) as User_Type,
(SELECT
CASE
WHEN DaysBetweenSessions <= 7 THEN 0
WHEN DaysBetweenSessions >7 THEN 1
WHEN DaysBetweenSessions IS NULL AND FirstSessionDATE = DATE THEN 0
WHEN DaysBetweenSessions IS NULL THEN 2
END) as DaysBetween
FROM
app_project.analytics_*********.events_*
GROUP BY
1,2,3,4,5
ORDER BY
DATE DESC,
DaysBetweenSessions DESC,
1,2,3,4),

Result as 
(SELECT *, min(User_Type) OVER (PARTITION BY UserID, DaysBetween) minUser_Type
FROM CTE)

SELECT UserID,
DATE,
DaysBetweenSessions,
User_type 
FROM Result 
WHERE NOT (User_Type <> 'New' AND minUser_Type = 'New')

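The second part should add a dimension that is the alphabetically lowest User_Type for that week (so this won't work if you rename anything to something that sorts before 'New' alphabetically; it might be best to go with numerics).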
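A minimal sketch of the "numerics" idea, in BigQuery standard SQL. The inline rows and the names `sample`, `ranked`, `type_rank`, `min_by_name` and `min_by_rank` are made up for illustration and are not part of the original query. It maps each User_Type to an explicit rank, so the window MIN no longer depends on how the labels happen to sort.

```sql
-- Hypothetical sample rows standing in for the CTE output.
WITH sample AS (
  SELECT 123 AS UserID, 'New' AS User_Type UNION ALL
  SELECT 123, 'Retained' UNION ALL
  SELECT 456, 'Returned'
),
ranked AS (
  SELECT
    *,
    -- Explicit priority: lower number wins, regardless of label spelling.
    CASE User_Type
      WHEN 'New' THEN 1
      WHEN 'Retained' THEN 2
      WHEN 'Returned' THEN 3
      ELSE 4
    END AS type_rank
  FROM sample
)
SELECT
  UserID,
  User_Type,
  -- String MIN: works only while 'New' sorts before every other label.
  MIN(User_Type) OVER (PARTITION BY UserID) AS min_by_name,
  -- Numeric MIN: keeps working if the labels are renamed.
  MIN(type_rank) OVER (PARTITION BY UserID) AS min_by_rank
FROM ranked
```

With these rows, both of user 123's rows get min_by_name = 'New' and min_by_rank = 1, while user 456 gets 'Returned' and 3. The final filter could then compare against the rank instead of the label.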

The last part should then get rid of the rows where there is a 'New' in that week but the row's own User_Type isn't 'New'.
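As a rough, self-contained variant of the same filter, the sketch below partitions by calendar week instead of the DaysBetween bucket. This is an assumption on my part rather than what the query above does: `sessions`, `flagged`, `session_date`, `week_start` and `new_in_week` are illustrative names, the rows are invented, and DATE_TRUNC(..., WEEK) uses Sunday-based weeks, which may or may not match how YEARWEEK groups dates in DataStudio.

```sql
-- Hypothetical sample rows: user 123 is New and Retained on the same day,
-- then comes back the following week.
WITH sessions AS (
  SELECT 123 AS UserID, DATE '2018-08-01' AS session_date, 'New' AS User_Type UNION ALL
  SELECT 123, DATE '2018-08-01', 'Retained' UNION ALL
  SELECT 123, DATE '2018-08-09', 'Retained' UNION ALL
  SELECT 456, DATE '2018-08-02', 'Returned'
),
flagged AS (
  SELECT
    *,
    DATE_TRUNC(session_date, WEEK) AS week_start,
    -- TRUE when this user has at least one 'New' row in the same week.
    COUNTIF(User_Type = 'New') OVER (
      PARTITION BY UserID, DATE_TRUNC(session_date, WEEK)
    ) > 0 AS new_in_week
  FROM sessions
)
SELECT UserID, session_date, week_start, User_Type
FROM flagged
-- Keep 'New' rows, and keep other rows only in weeks without a 'New' row.
WHERE User_Type = 'New' OR NOT new_in_week
ORDER BY UserID, session_date
```

On this sample the 2018-08-01 'Retained' row for user 123 is dropped, while the 2018-08-09 'Retained' row survives because it falls in the next week.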