PostgreSQL: select r.* by MIN() with GROUP BY on two columns
Example schema of a table named results
id | user_id | activity_id | activity_type_id | start_date_local | elapsed_time |
---|---|---|---|---|---|
1 | 100 | 11111 | 1 | 2014-01-07 04:34:38 | 4444 |
2 | 100 | 22222 | 1 | 2015-04-14 06:44:42 | 5555 |
3 | 100 | 33333 | 1 | 2015-04-14 06:44:42 | 7777 |
4 | 100 | 44444 | 2 | 2014-01-07 04:34:38 | 12345 |
5 | 200 | 55555 | 1 | 2015-12-22 16:32:56 | 5023 |
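To reproduce the example locally, the sample table can be created along these lines. This is only a sketch: the question shows column names and sample values but no DDL, so the column types, the NOT NULL constraints, and the primary key are all assumptions.

```sql
-- Assumed column types; the question does not include the actual DDL.
CREATE TABLE results (
    id               integer PRIMARY KEY,
    user_id          integer   NOT NULL,
    activity_id      integer   NOT NULL,
    activity_type_id integer   NOT NULL,
    start_date_local timestamp NOT NULL,
    elapsed_time     integer   NOT NULL
);

-- Sample rows copied from the table in the question.
INSERT INTO results
    (id, user_id, activity_id, activity_type_id, start_date_local, elapsed_time)
VALUES
    (1, 100, 11111, 1, '2014-01-07 04:34:38', 4444),
    (2, 100, 22222, 1, '2015-04-14 06:44:42', 5555),
    (3, 100, 33333, 1, '2015-04-14 06:44:42', 7777),
    (4, 100, 44444, 2, '2014-01-07 04:34:38', 12345),
    (5, 200, 55555, 1, '2015-12-22 16:32:56', 5023);
```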
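Question
Select each user's fastest activity (i.e. the one with the shortest elapsed time) per activity_type_id and year.
(Basically, in this simplified example, the record with id=3 should be excluded from the selection, because the record with id=2 is the fastest one for user 100 given activity_type_id 1 and the year 2015.)
What I tried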
SELECT user_id,
activity_type_id,
EXTRACT(year FROM start_date_local) AS year,
MIN(elapsed_time) AS fastest_time
FROM results
GROUP BY activity_type_id, user_id, year
ORDER BY activity_type_id, user_id, year;
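Actual
This selects the correct set of results, but it only contains the grouped-by columns:
user_id | activity_type_id | year | fastest_time |
---|---|---|---|
100 | 1 | 2014 | 4444 |
100 | 1 | 2015 | 5555 |
100 | 2 | 2014 | 12345 |
200 | 1 | 2015 | 5023 |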
Goal
To get the actual full records with all columns, i.e. results.* + year:
id | user_id | activity_id | activity_type_id | start_date_local | year | elapsed_time |
---|---|---|---|---|---|---|
1 | 100 | 11111 | 1 | 2014-01-07 04:34:38 | 2014 | 4444 |
2 | 100 | 22222 | 1 | 2015-04-14 06:44:42 | 2015 | 5555 |
4 | 100 | 44444 | 2 | 2014-01-07 04:34:38 | 2014 | 12345 |
5 | 200 | 55555 | 1 | 2015-12-22 16:32:56 | 2015 | 5023 |
You can use a window function for this:
select id, user_id, activity_id, activity_type_id, start_date_local, year, elapsed_time
from (
SELECT id,
user_id,
activity_id,
activity_type_id,
start_date_local,
EXTRACT(year FROM start_date_local) AS year,
elapsed_time,
min(elapsed_time) over (partition by user_id, activity_type_id, EXTRACT(year FROM start_date_local)) as fastest_time
FROM results
) t
where elapsed_time = fastest_time
order by activity_type_id, user_id, year;
Or use DISTINCT ON ():
select distinct on (activity_type_id, user_id, extract(year from start_date_local))
id,
user_id,
activity_id,
activity_type_id,
start_date_local,
extract(year from start_date_local) as year,
elapsed_time
from results
order by activity_type_id, user_id, year, elapsed_time;
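The two queries above differ in how they treat ties. The window-function query returns every row whose elapsed_time equals the group minimum, so two equally fast activities in the same group would both appear, whereas DISTINCT ON keeps a single row per group. If the window-function form should also return exactly one row per group, a sketch using row_number() instead of min() could look like this. Using id as the tie-breaker is an assumption; the question does not say which of two tied rows should win.

```sql
SELECT id, user_id, activity_id, activity_type_id, start_date_local, year, elapsed_time
FROM (
    SELECT r.*,
           EXTRACT(year FROM start_date_local) AS year,
           -- Number the rows of each (user, activity type, year) group,
           -- fastest first; id breaks ties (assumed rule, not from the question).
           row_number() OVER (
               PARTITION BY user_id, activity_type_id, EXTRACT(year FROM start_date_local)
               ORDER BY elapsed_time, id
           ) AS rn
    FROM results r
) t
WHERE rn = 1
ORDER BY activity_type_id, user_id, year;
```

With the sample data from the question this keeps the rows with id 1, 2, 4 and 5 and drops id 3.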
I think you want this:
SELECT DISTINCT ON (user_id, activity_type_id, EXTRACT(year FROM start_date_local))
*, EXTRACT(year FROM start_date_local) AS year
FROM results
ORDER BY user_id, activity_type_id, year, elapsed_time;
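For comparison, the GROUP BY query from the question can also be kept as is and joined back to results to recover the remaining columns. This is a sketch of that alternative rather than something either answer proposes; the alias f is made up for illustration. Like the min() window-function query, it returns all tied rows when several activities in a group share the minimum elapsed_time.

```sql
SELECT r.*, f.year
FROM results r
JOIN (
    -- The aggregate from the question: one row per user, activity type and year.
    SELECT user_id,
           activity_type_id,
           EXTRACT(year FROM start_date_local) AS year,
           MIN(elapsed_time) AS fastest_time
    FROM results
    GROUP BY user_id, activity_type_id, EXTRACT(year FROM start_date_local)
) f
  ON  f.user_id          = r.user_id
  AND f.activity_type_id = r.activity_type_id
  AND f.year             = EXTRACT(year FROM r.start_date_local)
  AND f.fastest_time     = r.elapsed_time   -- keep only the fastest row(s)
ORDER BY r.activity_type_id, r.user_id, f.year;
```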