How to count distinct values with PARTITION BY and ORDER BY in Snowflake SQL?
My data looks like this:
| user | eventorder| postal|
|:---- |:---------:| -----:|
| A | 1 | 60616 |
| A | 2 | 10000 |
| A | 3 | 60616 |
| B | 1 | 20000 |
| B | 2 | 30000 |
| B | 3 | 40000 |
| B | 4 | 30000 |
| B | 5 | 20000 |
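The question I need to answer: for each user, how many distinct stops have they travelled to by each event order?
The ideal result would look like this: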
| user | eventorder| postal| travelledStop|
|:---- |:---------:| -----:| ------------:|
| A | 1 | 60616 | 1 |
| A | 2 | 10000 | 2 |
| A | 3 | 60616 | 2 |
| B | 1 | 20000 | 1 |
| B | 2 | 30000 | 2 |
| B | 3 | 40000 | 3 |
| B | 4 | 30000 | 3 |
| B | 5 | 20000 | 3 |
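Taking user A as an example: at event order 1, the user has only been to 60616, so 1 stop.
At event order 2, the user has been to 60616 and 10000, so 2 stops.
At event order 3, the distinct stops the user has been to are still 60616 and 10000, so 2 stops.
I am not allowed to combine count distinct with partition by ... order by. I wanted to write something like count(distinct(postal)) over (partition by user order by eventorder), but Snowflake rejects it.
Does anyone know how to solve this? Thanks a lot!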
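I used the sample data you provided (only the subset for user A, but it should scale out to the rest). The goal here is essentially to build, for each row, an array that accumulates the postals of all events up to that row, and then count the distinct values in that array.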
    with _temp as (
        select 'A' as usr, 1 as EventOrder, '60616' as Postal
        union all
        select 'A' as usr, 2 as EventOrder, '10000' as Postal
        union all
        select 'A' as usr, 3 as EventOrder, '60616' as Postal
    ),
    _intermediate as (
        -- For each row, keep the user's postals up to and including this event.
        -- Using eventorder as the slice end assumes it runs 1, 2, 3, ... without gaps.
        select usr
             , eventorder
             , postal
             , array_slice(
                   array_agg(postal) within group (order by eventorder)
                       over (partition by usr)
                 , 0, eventorder) as full_array
        from _temp
        group by usr, eventorder, postal
    )
    -- Explode each accumulated array and count its distinct values.
    select usr, eventorder, postal, count(distinct f.value) as travelledStop
    from _intermediate i, lateral flatten(input => i.full_array) f
    group by usr, eventorder, postal
    order by usr, eventorder
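To make the logic of the query above easier to follow, here is a minimal Python sketch of the same idea: collect each user's postals in event order, take the prefix up to the current row, and count its distinct values. The function name `travelled_stops_by_prefix` and the tuple layout are assumptions made for illustration only. Unlike the SQL, the sketch slices by row position rather than by the `eventorder` value, so it does not depend on `eventorder` being gap-free.

```python
from collections import defaultdict


def travelled_stops_by_prefix(rows):
    """rows: iterable of (usr, eventorder, postal) tuples (hypothetical layout)."""
    by_user = defaultdict(list)
    for usr, eventorder, postal in rows:
        by_user[usr].append((eventorder, postal))

    result = []
    for usr in sorted(by_user):
        events = sorted(by_user[usr])  # order by eventorder
        postals = [postal for _, postal in events]  # plays the role of array_agg
        for position, (eventorder, postal) in enumerate(events, start=1):
            prefix = postals[:position]  # plays the role of array_slice(arr, 0, n)
            # len(set(...)) plays the role of count(distinct value) after flatten
            result.append((usr, eventorder, postal, len(set(prefix))))
    return result


sample_a = [("A", 1, "60616"), ("A", 2, "10000"), ("A", 3, "60616")]
for row in travelled_stops_by_prefix(sample_a):
    print(row)
```

Because every row rebuilds and deduplicates its own prefix, the work grows roughly quadratically with the number of events per user, which is worth keeping in mind for long event histories.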
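Perhaps the simplest approach is to use a subquery that numbers each user's visits to a postal, and then count the rows numbered 1 (the first visit to each postal) as a running sum: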
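    select t.*,
           -- running total of first-time visits = distinct postals seen so far
           sum(case when seqnum = 1 then 1 else 0 end)
               over (partition by usr order by eventorder) as num_postals
    from (select t.*,
                 -- seqnum = 1 marks the first time this user visits this postal
                 row_number() over (partition by usr, postal order by eventorder) as seqnum
          from t
         ) t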
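The reasoning behind this query is that a running count of distinct values equals a running sum of "first occurrence" flags. A small Python sketch of that equivalence follows; the function name `running_distinct` and the tuple layout are hypothetical and only serve the illustration.

```python
def running_distinct(rows):
    """rows: iterable of (usr, eventorder, postal) tuples (hypothetical layout)."""
    ordered = sorted(rows, key=lambda r: (r[0], r[1]))  # partition by usr, order by eventorder
    seen = set()  # (usr, postal) pairs already visited
    totals = {}   # running sum of first-visit flags per user
    out = []
    for usr, eventorder, postal in ordered:
        first_visit = (usr, postal) not in seen  # corresponds to seqnum = 1
        seen.add((usr, postal))
        totals[usr] = totals.get(usr, 0) + (1 if first_visit else 0)
        out.append((usr, eventorder, postal, totals[usr]))
    return out


sample_all = [
    ("A", 1, "60616"), ("A", 2, "10000"), ("A", 3, "60616"),
    ("B", 1, "20000"), ("B", 2, "30000"), ("B", 3, "40000"),
    ("B", 4, "30000"), ("B", 5, "20000"),
]
for row in running_distinct(sample_all):
    print(row)
```

Each row is touched once after sorting, which is why this formulation tends to be cheaper than rebuilding a set of prior postals for every row.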
I like @Daniel Zagales's answer, but here is a workaround that uses dense_rank and sum:
    with temp as (
        select 'A' as usr, 1 as EventOrder, '60616' as Postal
        union all
        select 'A' as usr, 2 as EventOrder, '10000' as Postal
        union all
        select 'A' as usr, 3 as EventOrder, '60616' as Postal
        union all
        select 'B' as usr, 1 as EventOrder, '20000' as Postal
        union all
        select 'B' as usr, 2 as EventOrder, '30000' as Postal
        union all
        select 'B' as usr, 3 as EventOrder, '40000' as Postal
        union all
        select 'B' as usr, 4 as EventOrder, '30000' as Postal
        union all
        select 'B' as usr, 5 as EventOrder, '20000' as Postal
    ),
    temp2 as (
        select temp.*
             , dense_rank() over (partition by usr, Postal order by EventOrder) as rks
        from temp
    )
    select usr, eventorder, postal
         , sum(case when rks = 1 then 1 else 0 end)
               over (partition by usr order by EventOrder) as travelledStop
    from temp2
    order by usr, EventOrder
Basically, use dense_rank to flag the first appearance of each stop (rks = 1), then sum those flags up as a running total.
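For anyone who wants to sanity-check this query without a Snowflake session, the sketch below runs an equivalent statement against an in-memory SQLite database from Python. This rests on two assumptions: that the local SQLite build is 3.25 or newer (needed for window functions), and that SQLite's `dense_rank` and windowed `sum` behave the same as Snowflake's for this particular query. The table name `events` is hypothetical.

```python
import sqlite3

QUERY = """
with temp2 as (
    select usr, eventorder, postal,
           dense_rank() over (partition by usr, postal order by eventorder) as rks
    from events
)
select usr, eventorder, postal,
       sum(case when rks = 1 then 1 else 0 end)
           over (partition by usr order by eventorder) as travelledStop
from temp2
order by usr, eventorder
"""

SAMPLE_ROWS = [
    ("A", 1, "60616"), ("A", 2, "10000"), ("A", 3, "60616"),
    ("B", 1, "20000"), ("B", 2, "30000"), ("B", 3, "40000"),
    ("B", 4, "30000"), ("B", 5, "20000"),
]


def run_dense_rank_query():
    """Load the sample rows into SQLite (not Snowflake) and run the query."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "create table events (usr text, eventorder integer, postal text)"
        )
        conn.executemany("insert into events values (?, ?, ?)", SAMPLE_ROWS)
        return conn.execute(QUERY).fetchall()
    finally:
        conn.close()


for row in run_dense_rank_query():
    print(row)
```

The last column of each returned row should line up with the travelledStop column in the question's expected result table.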