Count only first unique combination in grouped query
I have a table that looks like this:
| date | user_id | event_id | message_id |
|------------|---------|----------|------------|
| 2021-08-04 | 1 | 1 | 1 |
| 2021-08-04 | 1 | 1 | 2 |
| 2021-08-04 | 1 | 2 | 3 |
| 2021-08-04 | 2 | 1 | 4 |
| 2021-08-05 | 1 | 1 | 1 |
| 2021-08-05 | 2 | 2 | 5 |
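I want to group everything by user_id, date, and event_id. The catch: I want to count unique (date-user-event-message) combinations, and add each one only to the date row where it first appeared. In other words, if the same message_id occurs with the same user_id and the same event_id on different dates, I want to count it only once, in the date-user-event row where that message first appeared. This is what I want to get: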
| date | user_id | event_id | count | count_unique |
|------------|---------|----------|-------|--------------|
| 2021-08-04 | 1 | 1 | 2 | 2 | <--- Unique count is 2 because this is the first date when two unique combinations of user+event+message found
| 2021-08-04 | 1 | 2 | 1 | 1 |
| 2021-08-04 | 2 | 1 | 1 | 1 |
| 2021-08-05 | 1 | 1 | 1 | 0 | <--- Unique count is 0, because this message_id for the same user and event already exists for previous date
| 2021-08-05 | 2 | 2 | 1 | 1 |
This is a bit tricky, and I'm fairly sure it isn't possible, but I still need to make sure.
I came up with this query:
SELECT
date,
user_id,
event_id,
COUNT(*) as count,
COUNT(DISTINCT message_id) as count_unique
FROM events
GROUP BY user_id, event_id, date
But the result I get is obviously not what I want:
| date | user_id | event_id | count | count_unique |
|------------|---------|----------|-------|--------------|
| 2021-08-04 | 1 | 1 | 2 | 2 |
| 2021-08-04 | 1 | 2 | 1 | 1 |
| 2021-08-04 | 2 | 1 | 1 | 1 |
| 2021-08-05 | 1 | 1 | 1 | 1 | <--- Unique count is 1, because it counts distinct message_ids within the group (row).
| 2021-08-05 | 2 | 2 | 1 | 1 |
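So basically I need to somehow ignore the date for the distinct count (i.e. count outside the group), and add the count only to the row (group) whose date is the first date on which that combination was found.
This query keeps only the first date on which each user_id/event_id/message_id combination appears (using the row_number window function), then aggregates over the filtered set: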
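select
date
, user_id
, event_id
, count(distinct message_id) as count_messages
from
(
-- number the rows of each user/event/message combination by date; 1 = first appearance
select date
, user_id
, event_id
, message_id
, row_number() over
(
partition by user_id,event_id,message_id
order by date asc
) as rank_date
from events
) as DT
where rank_date = 1
-- the aggregate needs a group by over the non-aggregated columns
group by date, user_id, event_id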
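In other words, this should count each user_id/event_id/message_id combination only on the first date it appears. Note that a date/user/event group containing no first-time messages is filtered out entirely, so it will not show up with a count of 0.
To compute count_unique, you only want to keep the first time a user sent a given message for an event.
To get that dataset, run this query: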
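To check the row_number approach against the sample data from the question, here is a minimal sketch using Python's built-in sqlite3 module. The choice of SQLite, the in-memory database, and the names `FIRST_SEEN_SQL` and `first_seen_rows` are assumptions for illustration only. The question does not say which database is in use, and the query is written here with an explicit GROUP BY.

```python
import sqlite3

# In-memory SQLite database holding the sample rows from the question.
# SQLite is an assumption; window functions need SQLite 3.25 or newer.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE events (date TEXT, user_id INTEGER, event_id INTEGER, message_id INTEGER)"
)
conn.executemany(
    "INSERT INTO events VALUES (?, ?, ?, ?)",
    [
        ("2021-08-04", 1, 1, 1),
        ("2021-08-04", 1, 1, 2),
        ("2021-08-04", 1, 2, 3),
        ("2021-08-04", 2, 1, 4),
        ("2021-08-05", 1, 1, 1),
        ("2021-08-05", 2, 2, 5),
    ],
)

# Keep only the first appearance of each user/event/message combination,
# then count what is left per date/user/event.
FIRST_SEEN_SQL = """
SELECT date, user_id, event_id, COUNT(*) AS count_unique
FROM (
    SELECT date, user_id, event_id, message_id,
           ROW_NUMBER() OVER (
               PARTITION BY user_id, event_id, message_id
               ORDER BY date
           ) AS rank_date
    FROM events
) AS dt
WHERE rank_date = 1
GROUP BY date, user_id, event_id
ORDER BY date, user_id, event_id
"""

first_seen_rows = conn.execute(FIRST_SEEN_SQL).fetchall()
for row in first_seen_rows:
    print(row)
```

On this data the group (2021-08-05, user 1, event 1) does not appear at all, because its only message was first seen on 2021-08-04. If that row has to be reported with a count of 0, the filtered result still needs to be joined back to the full list of groups.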
-- first date on which each user/event/message combination appears
select min(date) as date, user_id, event_id, message_id
from events
group by user_id, event_id, message_id
After that, count_unique is easy to compute:
select count(*) as count_unique, date, user_id, event_id
from (
select min(date) as date, user_id, event_id, message_id
from events
group by user_id, event_id, message_id ) e
group by date, user_id, event_id;
Now you can left join it to the query that counts messages by user ID, event ID, and date:
select a.*, coalesce(b.count_unique, 0) as count_unique
from (
-- total messages per date/user/event
select date, user_id, event_id, count(*) as cnt from events
group by date, user_id, event_id
) a left join (
-- messages counted only on the first date they appear
select count(*) as count_unique, date, user_id, event_id
from (
select min(date) as date, user_id, event_id, message_id
from events
group by user_id, event_id, message_id ) e
group by date, user_id, event_id
) b on a.date = b.date and
a.user_id = b.user_id and
a.event_id = b.event_id;
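As a possible alternative to the join, both counts can be produced in one pass by tagging each row with the first date of its user/event/message combination and counting only the rows whose own date matches it. This is a sketch under assumptions, not part of either answer above: it uses Python's built-in sqlite3 module with an in-memory database, and the names `SINGLE_PASS_SQL`, `first_date`, and `result_rows` are made up for the example.

```python
import sqlite3

# In-memory SQLite database holding the sample rows from the question.
# SQLite is an assumption; window functions need SQLite 3.25 or newer.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE events (date TEXT, user_id INTEGER, event_id INTEGER, message_id INTEGER)"
)
conn.executemany(
    "INSERT INTO events VALUES (?, ?, ?, ?)",
    [
        ("2021-08-04", 1, 1, 1),
        ("2021-08-04", 1, 1, 2),
        ("2021-08-04", 1, 2, 3),
        ("2021-08-04", 2, 1, 4),
        ("2021-08-05", 1, 1, 1),
        ("2021-08-05", 2, 2, 5),
    ],
)

# first_date is the earliest date of each user/event/message combination.
# The CASE expression yields NULL for later repeats, and COUNT(DISTINCT ...)
# ignores NULLs, so repeats contribute 0 to count_unique.
SINGLE_PASS_SQL = """
SELECT date, user_id, event_id,
       COUNT(*) AS cnt,
       COUNT(DISTINCT CASE WHEN date = first_date THEN message_id END) AS count_unique
FROM (
    SELECT date, user_id, event_id, message_id,
           MIN(date) OVER (
               PARTITION BY user_id, event_id, message_id
           ) AS first_date
    FROM events
) AS tagged
GROUP BY date, user_id, event_id
ORDER BY date, user_id, event_id
"""

result_rows = conn.execute(SINGLE_PASS_SQL).fetchall()
for row in result_rows:
    print(row)
```

On the sample data this gives the table the question asks for, including the (2021-08-05, user 1, event 1) row with a count of 1 and a count_unique of 0.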