Use BigQuery to find the running total number of users per day
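I have some user data like the sample below, and I would like to know the running total of unique users I have seen, per day. Starting from a basic query: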
SELECT
day, user_id, COUNT(DISTINCT(user_id)) AS cnt
FROM
(select "A" user_id, "2015-02-01" day),
(select "A" user_id, "2015-02-01" day),
(select "A" user_id, "2015-02-01" day),
(select "A" user_id, "2015-02-01" day),
(select "A" user_id, "2015-02-01" day),
(select "A" user_id, "2015-02-01" day),
(select "B" user_id, "2015-02-01" day),
(select "B" user_id, "2015-02-02" day),
(select "B" user_id, "2015-02-02" day),
(select "B" user_id, "2015-02-02" day),
(select "C" user_id, "2015-02-01" day),
(select "C" user_id, "2015-02-02" day),
(select "D" user_id, "2015-02-04" day)
GROUP BY
day, user_id
This returns:
Row day user_id cnt
1 2015-02-01 A 1
2 2015-02-01 B 1
3 2015-02-02 B 1
4 2015-02-01 C 1
5 2015-02-02 C 1
6 2015-02-04 D 1
I can see that there are three unique users on 2015-02-01, and no new users until 2015-02-04, when there is just one (user D).
I need a result like this:
Row day running_count
1 2015-02-01 3
2 2015-02-02 3
3 2015-02-03 3
4 2015-02-04 4
where running_count is the running count of new users per day. For example, 2015-02-02 adds zero, because only user_ids B and C appear on that day and both were already counted on 2015-02-01.
Thanks in advance for any help.
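Just take MIN(day) per user to find the day each user first appears, then use SUM() OVER() for the running count. The result will be missing the in-between dates that have no new users (2015-02-03 here), but you can fill those in with a LEFT JOIN against a list of dates.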
SELECT day, SUM(c) OVER(ORDER BY day) AS running_count
FROM (
SELECT day, COUNT(DISTINCT user_id) c
FROM (
SELECT MIN(day) day, user_id
FROM
(select "A" user_id, "2015-02-01" day),
(select "A" user_id, "2015-02-01" day),
(select "A" user_id, "2015-02-01" day),
(select "A" user_id, "2015-02-01" day),
(select "A" user_id, "2015-02-01" day),
(select "A" user_id, "2015-02-01" day),
(select "B" user_id, "2015-02-01" day),
(select "B" user_id, "2015-02-02" day),
(select "B" user_id, "2015-02-02" day),
(select "B" user_id, "2015-02-02" day),
(select "C" user_id, "2015-02-01" day),
(select "C" user_id, "2015-02-02" day),
(select "D" user_id, "2015-02-04" day)
GROUP BY user_id
)
GROUP BY day
)
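To make the logic of the query above easier to check outside BigQuery, here is a minimal Python sketch of the same steps: first-seen day per user (the MIN(day) step), new users per day (the inner COUNT), and a running sum over every calendar day, which also covers the gap the LEFT JOIN would fill. The function name `running_new_users` and the `rows` list are illustrative assumptions, not part of the original answer.

```python
from datetime import date, timedelta

# Sample (user_id, day) rows mirroring the data in the question.
rows = [
    ("A", "2015-02-01"), ("A", "2015-02-01"), ("A", "2015-02-01"),
    ("A", "2015-02-01"), ("A", "2015-02-01"), ("A", "2015-02-01"),
    ("B", "2015-02-01"), ("B", "2015-02-02"), ("B", "2015-02-02"),
    ("B", "2015-02-02"), ("C", "2015-02-01"), ("C", "2015-02-02"),
    ("D", "2015-02-04"),
]


def running_new_users(rows):
    """Return [(day, running_count)] for every calendar day in the data's range.

    Hypothetical helper for illustration only.
    """
    if not rows:
        return []

    # Step 1: earliest day each user appears (MIN(day) ... GROUP BY user_id).
    first_seen = {}
    for user_id, day in rows:
        d = date.fromisoformat(day)
        if user_id not in first_seen or d < first_seen[user_id]:
            first_seen[user_id] = d

    # Step 2: number of new users on each first-seen day (COUNT ... GROUP BY day).
    new_per_day = {}
    for d in first_seen.values():
        new_per_day[d] = new_per_day.get(d, 0) + 1

    # Step 3: walk every calendar day so days without new users still appear,
    # accumulating the running total (SUM(c) OVER (ORDER BY day)).
    start = min(first_seen.values())
    end = max(date.fromisoformat(day) for _, day in rows)
    result = []
    total = 0
    current = start
    while current <= end:
        total += new_per_day.get(current, 0)
        result.append((current.isoformat(), total))
        current += timedelta(days=1)
    return result


for day, running_count in running_new_users(rows):
    print(day, running_count)
```

On the sample data this yields 3, 3, 3, 4 for 2015-02-01 through 2015-02-04, matching the result the question asks for, including the 2015-02-03 row that the SQL above leaves out.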