BigQuery: all-time uniqueness by row
I have a problem very similar to one I had last week:
I have a table like this:
ID | Day | Value |
---|---|---|
1 | 2021-09-01 | a |
2 | 2021-09-01 | b |
3 | 2021-09-01 | c |
4 | 2021-09-02 | d |
5 | 2021-09-02 | a |
6 | 2021-09-02 | a |
7 | 2021-09-02 | e |
8 | 2021-09-03 | c |
9 | 2021-09-03 | f |
10 | 2021-09-03 | a |
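I'd like to count how many distinct values I have for each day and for all time, but the all-time uniqueness should only take earlier dates into account (the business logic behind this is deciding whether a user is new). Unlike in the previous question, I want to keep every row and see the uniqueness per row, as a new column. This is almost the same as the new-user/returning-user distinction in Google Analytics: if a user visits the site on 2021-09-02 and again on 2021-09-03, I'd like to see New for the first visit and Returning for the visit on 2021-09-03.
So I'd like to see this output: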
ID | Day | Value | Type |
---|---|---|---|
1 | 2021-09-01 | a | New |
2 | 2021-09-01 | b | New |
3 | 2021-09-01 | c | New |
4 | 2021-09-02 | d | New |
5 | 2021-09-02 | a | Returning |
6 | 2021-09-02 | a | Returning |
7 | 2021-09-02 | e | New |
8 | 2021-09-03 | c | Returning |
9 | 2021-09-03 | f | New |
10 | 2021-09-03 | a | Returning |
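If I only check a single day I can do this, but I can't do it across the whole table, because every row has to be checked against the earlier dates.
It looks like you want analytic (window) functions, as detailed in the doc. By using an analytic function with an OVER clause and PARTITION BY, you can partition the data by value and then sort each partition by date with ORDER BY. Then check whether a row is the first one in its partition and assign the type accordingly.
This query should get you what you want: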
WITH data as(
SELECT "2021-09-01" day,"a" value
UNION ALL ( SELECT "2021-09-01", "b" )
UNION ALL ( SELECT "2021-09-01", "c" )
UNION ALL ( SELECT "2021-09-02", "d" )
UNION ALL ( SELECT "2021-09-02", "a" )
UNION ALL ( SELECT "2021-09-02", "a" )
UNION ALL ( SELECT "2021-09-02", "e" )
UNION ALL ( SELECT "2021-09-03", "c" )
UNION ALL ( SELECT "2021-09-03", "f" )
UNION ALL ( SELECT "2021-09-03", "a" )
)
-- The earliest row of each value (by day) is 'New'; every later row is 'Returning'.
SELECT day, value,
  IF(ROW_NUMBER() OVER (PARTITION BY value ORDER BY day) = 1, 'New', 'Returning') AS type
FROM data
Result
Row | day | value | type |
---|---|---|---|
1 | 2021-09-01 | a | New |
2 | 2021-09-02 | a | Returning |
3 | 2021-09-02 | a | Returning |
4 | 2021-09-03 | a | Returning |
5 | 2021-09-01 | b | New |
6 | 2021-09-01 | c | New |
7 | 2021-09-03 | c | Returning |
8 | 2021-09-02 | d | New |
9 | 2021-09-02 | e | New |
10 | 2021-09-03 | f | New |
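To try the first-row idea without a BigQuery project, here is a minimal sketch that runs the same window logic in SQLite through Python's `sqlite3` module. Several things here are assumptions rather than part of the answer: SQLite (3.25 or newer, for window functions) stands in for BigQuery, the table name `visits` is hypothetical, BigQuery's `IF(...)` is written as `CASE ... END`, and the question's `ID` column is added to the `ORDER BY` as a tie-breaker so that `ROW_NUMBER()` is deterministic when a value appears twice on the same day.

```python
import sqlite3

# Sample rows from the question: (id, day, value).
ROWS = [
    (1, "2021-09-01", "a"), (2, "2021-09-01", "b"), (3, "2021-09-01", "c"),
    (4, "2021-09-02", "d"), (5, "2021-09-02", "a"), (6, "2021-09-02", "a"),
    (7, "2021-09-02", "e"), (8, "2021-09-03", "c"), (9, "2021-09-03", "f"),
    (10, "2021-09-03", "a"),
]

# Same idea as the BigQuery query: the first row of each value partition is 'New'.
QUERY = """
SELECT id, day, value,
       CASE WHEN ROW_NUMBER() OVER (PARTITION BY value ORDER BY day, id) = 1
            THEN 'New' ELSE 'Returning' END AS type
FROM visits
ORDER BY id
"""


def label_first_row(rows):
    """Load rows into an in-memory SQLite table and label each one."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE visits (id INTEGER, day TEXT, value TEXT)")
    conn.executemany("INSERT INTO visits VALUES (?, ?, ?)", rows)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    for row in label_first_row(ROWS):
        print(row)
```

With this rule only one row per value is ever labelled New, so a second row on the value's first day would come out as Returning.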
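Edited for the additional requirement
To give the New type to every row of a value that falls on the same date as that value's first occurrence, you can use another analytic function, FIRST_VALUE, and compare its result with the current row's day.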
WITH data as
(SELECT "2021-09-01" day,"a" value
UNION ALL ( SELECT "2021-09-01","b")
UNION ALL ( SELECT "2021-09-01","c")
UNION ALL ( SELECT "2021-09-02","d")
UNION ALL ( SELECT "2021-09-02","a")
UNION ALL ( SELECT "2021-09-01","a")
UNION ALL ( SELECT "2021-09-02","a")
UNION ALL ( SELECT "2021-09-02","e")
UNION ALL ( SELECT "2021-09-03","c")
UNION ALL ( SELECT "2021-09-03","f")
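UNION ALL ( SELECT "2021-09-03","a"))
-- Every row on a value's first day is 'New', including same-day duplicates.
-- (A separate ROW_NUMBER() = 1 test is not needed: that row always has day = FIRST_VALUE(day).)
SELECT *,
  IF(FIRST_VALUE(day) OVER (PARTITION BY value ORDER BY day) = day, 'New', 'Returning') AS type
FROM data
Result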
Row | day | value | type |
---|---|---|---|
1 | 2021-09-01 | a | New |
2 | 2021-09-01 | a | New |
3 | 2021-09-02 | a | Returning |
4 | 2021-09-02 | a | Returning |
5 | 2021-09-03 | a | Returning |
6 | 2021-09-01 | b | New |
7 | 2021-09-01 | c | New |
8 | 2021-09-03 | c | Returning |
9 | 2021-09-02 | d | New |
10 | 2021-09-02 | e | New |
11 | 2021-09-03 | f | New |
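The FIRST_VALUE comparison amounts to asking "is this row's day the earliest day seen for its value?", which can also be phrased as `MIN(day) OVER (PARTITION BY value)`. The sketch below checks that reading locally. It is an illustration under assumptions, not the answer's own query: SQLite (3.25 or newer) stands in for BigQuery, the table name `visits` is hypothetical, and `CASE ... END` replaces BigQuery's `IF(...)`.

```python
import sqlite3

# The eleven sample rows used in the edited answer: (day, value).
ROWS = [
    ("2021-09-01", "a"), ("2021-09-01", "b"), ("2021-09-01", "c"),
    ("2021-09-02", "d"), ("2021-09-02", "a"), ("2021-09-01", "a"),
    ("2021-09-02", "a"), ("2021-09-02", "e"), ("2021-09-03", "c"),
    ("2021-09-03", "f"), ("2021-09-03", "a"),
]

# A row is 'New' when its day equals the earliest day of its value partition.
QUERY = """
SELECT day, value,
       CASE WHEN day = MIN(day) OVER (PARTITION BY value)
            THEN 'New' ELSE 'Returning' END AS type
FROM visits
ORDER BY value, day
"""


def label_first_day(rows):
    """Load rows into an in-memory SQLite table and label each one."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE visits (day TEXT, value TEXT)")
    conn.executemany("INSERT INTO visits VALUES (?, ?)", rows)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    for row in label_first_day(ROWS):
        print(row)
```

Because the ISO `YYYY-MM-DD` strings sort in date order, comparing them as text is enough here; a real table with a DATE column would not need that assumption.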
Also consider the approach below:
select *, if(0 = count(*) over prev_days, 'New', 'Returning') as type
from your_table
window prev_days as (
partition by value order by unix_date(date(day))
range between unbounded preceding and 1 preceding
)
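The window in this query counts, for each row, the rows of the same value whose day number is at least one day smaller, so a count of zero means the value has not been seen on any earlier day. As a rough illustration of that semantics (plain Python rather than BigQuery, with a hypothetical function name and `date.toordinal()` playing the role of `unix_date`), the logic can be sketched like this:

```python
from datetime import date


def label_by_previous_days(rows):
    """Label (day, value) rows; day is an ISO 'YYYY-MM-DD' string.

    A row is 'New' when no row with the same value has a strictly earlier day.
    """
    rows = list(rows)
    labelled = []
    for day, value in rows:
        today = date.fromisoformat(day).toordinal()
        # Emulates RANGE BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING:
        # same-value rows whose day number is at most today - 1.
        earlier = sum(
            1
            for other_day, other_value in rows
            if other_value == value
            and date.fromisoformat(other_day).toordinal() <= today - 1
        )
        labelled.append((day, value, "New" if earlier == 0 else "Returning"))
    return labelled


if __name__ == "__main__":
    sample = [
        ("2021-09-01", "a"), ("2021-09-01", "a"),
        ("2021-09-02", "a"), ("2021-09-02", "e"),
    ]
    for row in label_by_previous_days(sample):
        print(row)
```

Since rows from the current day are excluded from the count, same-day duplicates on a value's first day are all labelled New. The nested loop is quadratic and only meant to make the frame definition concrete.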