BigQuery all-time uniqueness by rows

I have a question very similar to one I asked last week:

I have a table like this:

ID Day Value
1 2021-09-01 a
2 2021-09-01 b
3 2021-09-01 c
4 2021-09-02 d
5 2021-09-02 a
6 2021-09-02 a
7 2021-09-02 e
8 2021-09-03 c
9 2021-09-03 f
10 2021-09-03 a

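I would like to calculate how many distinct rows I have per day and over all time, but the all-time uniqueness should consider only the previous dates (the business logic behind this is determining whether a user is new). Unlike my previous question, I want to keep the rows and see the uniqueness per row, as a new column. This is almost the same as what Google Analytics shows as new or returning users. So if a user visits the website on 2021-09-02 and again on 2021-09-03, I would like to see New on the first visit and Returning on 2021-09-03. This is the output I would like to see: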

ID Day Value Type
1 2021-09-01 a New
2 2021-09-01 b New
3 2021-09-01 c New
4 2021-09-02 d New
5 2021-09-02 a Returning
6 2021-09-02 a Returning
7 2021-09-02 e New
8 2021-09-03 c Returning
9 2021-09-03 f New
10 2021-09-03 a Returning

I can do this if I check only a single day, but not across the whole table, because the previous dates have to be checked.

It looks like you want to use analytic functions, as detailed in the BigQuery documentation.

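Using an analytic function with OVER and PARTITION BY, you can partition the data by value and then use ORDER BY to sort each partition by day. Then check whether a row is the first one in its partition and assign the type accordingly.

This query should get you what you want: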

WITH data AS (
  SELECT "2021-09-01" AS day, "a" AS value
  UNION ALL ( SELECT "2021-09-01", "b" )
  UNION ALL ( SELECT "2021-09-01", "c" )
  UNION ALL ( SELECT "2021-09-02", "d" )
  UNION ALL ( SELECT "2021-09-02", "a" )
  UNION ALL ( SELECT "2021-09-02", "a" )
  UNION ALL ( SELECT "2021-09-02", "e" )
  UNION ALL ( SELECT "2021-09-03", "c" )
  UNION ALL ( SELECT "2021-09-03", "f" )
  UNION ALL ( SELECT "2021-09-03", "a" )
)
SELECT
  day,
  value,
  -- The first row per value (ordered by day) is New; every later row is Returning.
  IF(ROW_NUMBER() OVER (PARTITION BY value ORDER BY day) = 1, 'New', 'Returning') AS type
FROM data

Result

Row day value type
1 2021-09-01 a New
2 2021-09-02 a Returning
3 2021-09-02 a Returning
4 2021-09-03 a Returning
5 2021-09-01 b New
6 2021-09-01 c New
7 2021-09-03 c Returning
8 2021-09-02 d New
9 2021-09-02 e New
10 2021-09-03 f New
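
The result above omits the ID column and comes back grouped by value rather than in the original row order. A minimal sketch of the same ROW_NUMBER idea applied directly to the question's table might look like the following. Here `your_table` is a placeholder name, and the `id` tie-breaker in ORDER BY is an addition (not part of the original answer) that makes the choice of the New row deterministic when a value appears more than once on its first day.

```sql
-- Sketch only: `your_table` is a placeholder for the real table name.
-- Columns id, day, value are taken from the question's sample table.
SELECT
  id,
  day,
  value,
  -- Assumption: ties on the same day are broken by id, so exactly one row
  -- per value (the lowest id on the earliest day) is labelled New.
  IF(ROW_NUMBER() OVER (PARTITION BY value ORDER BY day, id) = 1,
     'New', 'Returning') AS type
FROM your_table
ORDER BY id  -- restore the original row order
```

Without the final ORDER BY, BigQuery does not guarantee any particular row order in the output.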

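Edit for the additional requirement

To give the New type to every row of a value that falls on the same date as that value's first event, you can use another analytic function, FIRST_VALUE, and compare its result with the current row's day.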

WITH data AS (
  SELECT "2021-09-01" AS day, "a" AS value
  UNION ALL ( SELECT "2021-09-01", "b" )
  UNION ALL ( SELECT "2021-09-01", "c" )
  UNION ALL ( SELECT "2021-09-02", "d" )
  UNION ALL ( SELECT "2021-09-02", "a" )
  UNION ALL ( SELECT "2021-09-01", "a" )
  UNION ALL ( SELECT "2021-09-02", "a" )
  UNION ALL ( SELECT "2021-09-02", "e" )
  UNION ALL ( SELECT "2021-09-03", "c" )
  UNION ALL ( SELECT "2021-09-03", "f" )
  UNION ALL ( SELECT "2021-09-03", "a" )
)
SELECT
  *,
  -- New if this is the first row for the value, or if it shares the day of that first row.
  IF(ROW_NUMBER() OVER (PARTITION BY value ORDER BY day) = 1
     OR FIRST_VALUE(day) OVER (PARTITION BY value ORDER BY day) = day,
     'New', 'Returning') AS type
FROM data

Result

Row day value type
1 2021-09-01 a New
2 2021-09-01 a New
3 2021-09-02 a Returning
4 2021-09-02 a Returning
5 2021-09-03 a Returning
6 2021-09-01 b New
7 2021-09-01 c New
8 2021-09-03 c Returning
9 2021-09-02 d New
10 2021-09-02 e New
11 2021-09-03 f New
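
Because the first row of each partition always carries the partition's earliest day, the ROW_NUMBER check is already implied by the FIRST_VALUE comparison. One possible simplification, offered as a sketch rather than as the answer's own method, compares each row's day with MIN(day) over the whole partition. It assumes day is never NULL and uses a shortened copy of the sample data.

```sql
-- Sketch only: a shortened copy of the sample data, for illustration.
WITH data AS (
  SELECT "2021-09-01" AS day, "a" AS value
  UNION ALL SELECT "2021-09-01", "a"
  UNION ALL SELECT "2021-09-02", "a"
  UNION ALL SELECT "2021-09-02", "e"
  UNION ALL SELECT "2021-09-03", "a"
)
SELECT
  *,
  -- Assumption: day is non-NULL. Every row on the value's earliest day is New.
  IF(day = MIN(day) OVER (PARTITION BY value), 'New', 'Returning') AS type
FROM data
ORDER BY value, day
```

Since the window has no ORDER BY, MIN is evaluated over the entire partition, so no tie-breaking between rows of the same day is needed.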

Also consider the approach below

select *, if(0 = count(*) over prev_days, 'New', 'Returning') as type
from your_table
window prev_days as (
  partition by value order by unix_date(date(day)) 
  range between unbounded preceding and 1 preceding 
)
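
To illustrate how the window above behaves, here is a self-contained sketch that runs the same logic over a few rows from the question. The inline `your_table` CTE and the extra `earlier_rows` column are assumptions added for illustration. The ORDER BY key is converted with UNIX_DATE because a RANGE frame with a numeric offset needs a numeric ordering expression. The frame then covers only rows from strictly earlier days, so a count of 0 means the value has not been seen before.

```sql
-- Sketch only: inline sample rows stand in for the real table.
WITH your_table AS (
  SELECT 1 AS id, "2021-09-01" AS day, "a" AS value
  UNION ALL SELECT 3, "2021-09-01", "c"
  UNION ALL SELECT 4, "2021-09-02", "d"
  UNION ALL SELECT 5, "2021-09-02", "a"
  UNION ALL SELECT 6, "2021-09-02", "a"
  UNION ALL SELECT 8, "2021-09-03", "c"
)
SELECT
  *,
  -- Number of rows with the same value on strictly earlier days.
  COUNT(*) OVER prev_days AS earlier_rows,
  IF(COUNT(*) OVER prev_days = 0, 'New', 'Returning') AS type
FROM your_table
WINDOW prev_days AS (
  PARTITION BY value
  ORDER BY UNIX_DATE(DATE(day))  -- days since 1970-01-01, so offsets are in days
  RANGE BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING
)
ORDER BY id
```

In this sample, both "a" rows on 2021-09-02 see exactly one earlier row (the one from 2021-09-01) and are labelled Returning, while rows 1, 3 and 4 see none and are labelled New. Rows that share a value's first day are all New, which matches the additional requirement without a separate FIRST_VALUE check.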