SQL: Running total count of distinct values
I am trying to get a rolling count of distinct values over a window.
This is what my table looks like:
SELECT
user_id
, order_date
, product
FROM example_table
WHERE user_id = 1
ORDER BY order_date ASC
user_id | order_date | product |
---|---|---|
1 | 2021-01-01 | A |
1 | 2021-01-02 | B |
1 | 2021-01-04 | A |
1 | 2021-01-07 | C |
1 | 2021-01-09 | C |
1 | 2021-01-20 | A |
This is what I want to achieve:
user_id | order_date | product | cum_dist_count |
---|---|---|---|
1 | 2021-01-01 | A | 1 |
1 | 2021-01-02 | B | 2 |
1 | 2021-01-04 | A | 2 |
1 | 2021-01-07 | C | 3 |
1 | 2021-01-09 | C | 3 |
1 | 2021-01-20 | A | 3 |
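In other words, I want to see how many distinct products a customer has bought so far, and to be able to read that number off for any given date (in the example above, by 2021-01-04 they had bought 2 distinct products, and by 2021-01-07 the number was 3).
I tried grouping by user_id and product in a CTE while selecting min(order_date), and then applying ROW_NUMBER over that CTE. It partly worked: I can see the dates on which the number of distinct products changed (in this example 2021-01-01, 2021-01-02 and 2021-01-07), but I lose the rows "in between", which I still want to have access to.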
with cte as (
-- first date on which each user ordered each product
SELECT
user_id
, product
, min(order_date) as first_order
FROM example_table
GROUP BY 1,2
)
SELECT
user_id
, first_order
, product
-- number each user's products in order of first purchase
, ROW_NUMBER() OVER (PARTITION BY user_id ORDER BY first_order) AS number_of_unique_products
FROM cte
WHERE user_id = 1
With the above, I get:
user_id | order_date | product | cum_dist_count |
---|---|---|---|
1 | 2021-01-01 | A | 1 |
1 | 2021-01-02 | B | 2 |
1 | 2021-01-07 | C | 3 |
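The database is BigQuery (Standard SQL).
Any help is much appreciated!
For each product, you can flag the earliest date on which it appears, and then add up those flags: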
select et.* except (seqnum),
countif(seqnum = 1) over (partition by user_id order by order_date) as running_distinct_count
from (select et.*,
row_number() over (partition by user_id, product order by order_date) as seqnum
from example_table et
) et
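To make the query above easier to try, here is a self-contained sketch that inlines the six sample rows from the question as a CTE, so it can be pasted into the BigQuery console as-is. The `flagged` CTE name, the explicit column list, the `cum_dist_count` alias, and the final `order by` are additions for illustration and are not part of the answer. The counts in the trailing comment were worked out by hand from the sample rows.

```sql
-- Sample rows from the question, inlined so the statement runs on its own.
with example_table as (
  select 1 as user_id, date '2021-01-01' as order_date, 'A' as product union all
  select 1, date '2021-01-02', 'B' union all
  select 1, date '2021-01-04', 'A' union all
  select 1, date '2021-01-07', 'C' union all
  select 1, date '2021-01-09', 'C' union all
  select 1, date '2021-01-20', 'A'
),
-- seqnum = 1 marks the first time a user orders a given product.
flagged as (
  select
    et.*,
    row_number() over (partition by user_id, product order by order_date) as seqnum
  from example_table as et
)
select
  user_id,
  order_date,
  product,
  -- Running sum of the "first occurrence" flags = distinct products so far.
  countif(seqnum = 1) over (partition by user_id order by order_date) as cum_dist_count
from flagged
order by user_id, order_date, product;
-- cum_dist_count, in date order: 1, 2, 2, 3, 3, 3
```

Because the window has an `order by` but no explicit frame, the default frame is `range between unbounded preceding and current row`, so rows that share an `order_date` all receive the count as of the end of that date.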
The following works in BigQuery:
select * except(cum_products),
(select count(distinct product) from t.cum_products product) as cum_dist_count
from (
select *,
array_agg(product) over prev_rows as cum_products
from example_table
window prev_rows as (partition by user_id order by order_date)
) t
If applied to the sample data in your question
with example_table as (
select 1 user_id, '2021-01-01' order_date, 'A' product union all
select 1, '2021-01-02', 'B' union all
select 1, '2021-01-04', 'A' union all
select 1, '2021-01-07', 'C' union all
select 1, '2021-01-09', 'C' union all
select 1, '2021-01-20', 'A'
)
the output is
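For reference, the sample CTE and the array-based query can be combined into one statement, sketched below. The final `order by` and the explicit `unnest` are additions for illustration (the answer's shorter `from t.cum_products product` form is kept out only for readability). The counts in the trailing comment were derived by hand from the six sample rows rather than copied from a BigQuery run.

```sql
-- Sample rows from the question, as given in the answer's CTE.
with example_table as (
  select 1 user_id, '2021-01-01' order_date, 'A' product union all
  select 1, '2021-01-02', 'B' union all
  select 1, '2021-01-04', 'A' union all
  select 1, '2021-01-07', 'C' union all
  select 1, '2021-01-09', 'C' union all
  select 1, '2021-01-20', 'A'
)
select * except(cum_products),
  -- Count the distinct products collected in the running array.
  (select count(distinct product) from unnest(t.cum_products) as product) as cum_dist_count
from (
  select *,
    -- Every product the user has ordered up to and including this row's date.
    array_agg(product) over prev_rows as cum_products
  from example_table
  window prev_rows as (partition by user_id order by order_date)
) t
order by user_id, order_date;
-- cum_dist_count, in date order: 1, 2, 2, 3, 3, 3
```

Note that this approach builds an array per row that grows with the user's order history, so for users with many orders the flag-and-sum approach is likely to be lighter.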