Count unique ids in a rolling time frame
I have a simple table, shown below, that contains many IDs and dates.
ID Date
10R46 2014-11-23
10R46 2016-04-11
100R9 2016-12-21
10R91 2013-05-03
... ...
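I would like to write a query that counts unique IDs over a rolling time frame of dates, for example ten days. That is, for each date it should give me the number of unique IDs between that date and 10 days before it. The result should look something like this.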
UniqueTenDays Date
200 2014-11-23
324 2014-11-24
522 2014-11-25
532 2014-11-26
... ...
Something like the query below, but I realize I need to apply a WHERE clause and somehow count the IDs for each date.
SELECT Date, COUNT(DISTINCT ID)
FROM T
WHERE Date BETWEEN DATE_SUB(Date, INTERVAL 10 DAY) AND Date
GROUP BY Date
ORDER BY Date
Thanks in advance.
The following is for BigQuery Standard SQL
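#standardSQL
WITH temp1 AS (
  -- One row per date, holding that date's distinct ids as a comma-separated string
  SELECT dt, STRING_AGG(DISTINCT id) AS users
  FROM `project.dataset.yourtable`
  GROUP BY dt
), temp2 AS (
  -- For each date, concatenate the id strings of every date from 10 days before up to that date
  SELECT
    dt,
    STRING_AGG(users) OVER(ORDER BY UNIX_DATE(dt) RANGE BETWEEN 10 PRECEDING AND CURRENT ROW) AS users
  FROM temp1
)
-- Split the combined string back into ids and count the distinct ones
SELECT dt,
  (SELECT COUNT(DISTINCT id) FROM UNNEST(SPLIT(users)) AS id) AS UniqueTenDays
FROM temp2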
You can test / play with it using the dummy data below
#standardSQL
WITH `project.dataset.yourtable` AS (
  SELECT '10R46' id, DATE '2014-11-23' dt UNION ALL
  SELECT '10R46', DATE '2016-04-11' UNION ALL
  SELECT '10R46', DATE '2016-04-12' UNION ALL
  SELECT '10R47', DATE '2016-04-13' UNION ALL
  SELECT '10R48', DATE '2016-04-14' UNION ALL
  SELECT '100R9', DATE '2016-12-21' UNION ALL
  SELECT '10R91', DATE '2013-05-03'
), temp1 AS (
  SELECT dt, STRING_AGG(DISTINCT id) AS users
  FROM `project.dataset.yourtable`
  GROUP BY dt
), temp2 AS (
  SELECT
    dt,
    STRING_AGG(users) OVER(ORDER BY UNIX_DATE(dt) RANGE BETWEEN 10 PRECEDING AND CURRENT ROW) AS users
  FROM temp1
)
SELECT dt,
  (SELECT COUNT(DISTINCT id) FROM UNNEST(SPLIT(users)) AS id) AS UniqueTenDays
FROM temp2
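As a cross-check on the window approach, here is a minimal sketch of the same count written as a self-join, assuming the same `id` and `dt` columns and the same dummy data. It is not part of the answer above, and the `days` CTE name is only illustrative. It avoids packing ids into a comma-separated string, so ids that themselves contain commas would not be miscounted, but on a large table the range join may be noticeably more expensive than the window version.

```sql
#standardSQL
WITH `project.dataset.yourtable` AS (
  SELECT '10R46' id, DATE '2014-11-23' dt UNION ALL
  SELECT '10R46', DATE '2016-04-11' UNION ALL
  SELECT '10R46', DATE '2016-04-12' UNION ALL
  SELECT '10R47', DATE '2016-04-13' UNION ALL
  SELECT '10R48', DATE '2016-04-14' UNION ALL
  SELECT '100R9', DATE '2016-12-21' UNION ALL
  SELECT '10R91', DATE '2013-05-03'
), days AS (
  -- Hypothetical helper CTE: one row per date that appears in the table
  SELECT DISTINCT dt
  FROM `project.dataset.yourtable`
)
SELECT
  d.dt,
  -- Distinct ids seen from 10 days before d.dt up to and including d.dt
  COUNT(DISTINCT t.id) AS UniqueTenDays
FROM days AS d
JOIN `project.dataset.yourtable` AS t
  ON t.dt BETWEEN DATE_SUB(d.dt, INTERVAL 10 DAY) AND d.dt
GROUP BY d.dt
ORDER BY d.dt
```

On the dummy data, this should return 1 for 2013-05-03, 2014-11-23, 2016-04-11, 2016-04-12 and 2016-12-21, 2 for 2016-04-13, and 3 for 2016-04-14, which matches what the window query yields for the same rows.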