Get the maximum value from rows in Postgres records and group by multiple columns
I have a table like this:
p_id | createdat | pagetitle | sessionid | text | device | deviceserial
------+---------------------+-----------+-----------+-----------------+---------+--------------
| 2020-11-27 08:07:39 | | | App launch | android | 636363636890
| 2020-09-01 08:08:18 | | | search | Android | 636363636890
| 2020-09-02 08:10:10 | | | scan | Android | 636363636890
| 2020-09-02 08:12:10 | | | destroy | Android | 636363636890
| 2020-09-02 08:40:11 | | | hi | IOS | 6625839827
| 2020-09-02 08:45:11 | | | hi | IOS | 6625839827
| 2020-09-02 08:43:10 | | | launchComponent | Android | 636363636890
| 2020-09-02 08:50:11 | | | hi | IOS | 6625839827
| 2020-09-02 08:47:10 | | | launchComponent | Android | 636363636890
| 2020-09-02 08:53:11 | | | hi | IOS | 6625839827
| 2020-09-02 08:50:10 | | | launchComponent | Android | 636363636890
| 2020-09-02 08:55:11 | | | hi | IOS | 6625839827
| 2020-09-02 08:52:10 | | | launchComponent | Android | 636363636890
| 2020-09-02 09:00:11 | | | hi | IOS | 6625839827
| 2020-09-02 08:55:10 | | | launchComponent | Android | 636363636890
| 2020-09-02 09:05:11 | | | hi | IOS | 6625839827
| 2020-09-02 08:59:10 | | | launchComponent | Android | 636363636890
| 2020-09-02 09:07:11 | | | hi | Android | 6625839827
| 2020-09-02 09:01:10 | | | launchComponent | Android | 636363636890
| 2020-09-02 09:09:11 | | | hi | IOS | 6625839827
| 2020-09-02 09:03:10 | | | launchComponent | Android | 636363636890
| 2020-09-02 09:09:11 | | | hi | Android | 6625839828
| 2020-09-02 09:03:10 | | | launchComponent | IOS | 636363636891
| 2020-09-02 09:13:11 | | | hi | Android | 6625839828
| 2020-09-02 09:06:10 | | | launchComponent | IOS | 636363636891
From this table, I want to get a result like this:
 deviceserial | event_count |         hr          | device
--------------+-------------+---------------------+---------
 6625839828   |           2 | 2020-09-02 09:00:00 | Android
 636363636890 |           8 | 2020-09-02 08:00:00 | Android
 636363636891 |           2 | 2020-09-02 09:00:00 | IOS
 6625839827   |           5 | 2020-09-02 08:00:00 | IOS
These are my steps: I group the records by deviceserial, by hour (as hr), and by device, and then take the maximum count (event_count).
I tried this query:
select deviceserial,max(event_count) as event_count,hr,device
from (
select deviceserial,count(*) as event_count,
date_trunc('hour', createdat) as hr,device
from devices
group by deviceserial,hr,device
) t
group by deviceserial,hr,device
This is my result:
 deviceserial | event_count |         hr          | device
--------------+-------------+---------------------+---------
 636363636890 |           1 | 2020-11-27 08:00:00 | android
 636363636891 |           2 | 2020-09-02 09:00:00 | IOS
 6625839827   |           4 | 2020-09-02 09:00:00 | IOS
 6625839827   |           5 | 2020-09-02 08:00:00 | IOS
 636363636890 |           8 | 2020-09-02 08:00:00 | Android
 636363636890 |           1 | 2020-09-01 08:00:00 | Android
 636363636890 |           2 | 2020-09-02 09:00:00 | Android
 6625839828   |           2 | 2020-09-02 09:00:00 | Android
If I follow you correctly, you can use distinct on:
select distinct on (deviceserial)
deviceserial,
count(*) as event_count,
date_trunc('hour', createdat) as hr,
device
from devices
group by deviceserial, hr, device
order by deviceserial, event_count desc
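This gives you the hour and device with the most events for each device serial. Note, however, that it does not handle ties: it returns exactly one row per device serial, and which of the tied rows you get is not deterministic. If you want to allow top ties, you can use rank() instead: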
select *
from (
select deviceserial,
count(*) as event_count,
date_trunc('hour', createdat) as hr,
device,
rank() over(partition by deviceserial order by count(*) desc) rn
from devices
group by deviceserial, hr, device
) t
where rn = 1
order by deviceserial
Or, in Postgres 13:
select deviceserial,
count(*) as event_count,
date_trunc('hour', createdat) as hr,
device
from devices
group by deviceserial, hr, device
order by rank() over(partition by deviceserial order by count(*) desc)
fetch first row with ties
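To see what "handling ties" means here, the following is a minimal sketch on made-up data. The `hourly` CTE, its serials 'A' and 'B', and the counts are hypothetical and stand in for the already-aggregated per-hour rows; `row_number()` is not part of the queries above and is used only to mimic the one-row-per-serial behaviour of distinct on.

```sql
-- Hypothetical pre-aggregated rows: serial 'A' has two hours tied at 3 events.
with hourly (deviceserial, hr, device, event_count) as (
    values
        ('A', timestamp '2020-09-02 08:00:00', 'Android', 3),
        ('A', timestamp '2020-09-02 09:00:00', 'Android', 3),
        ('A', timestamp '2020-09-02 10:00:00', 'Android', 1),
        ('B', timestamp '2020-09-02 08:00:00', 'IOS',     2)
)
select deviceserial, hr, device, event_count,
       -- rank() gives tied rows the same number, so both 3-event hours of 'A' get 1
       rank() over (partition by deviceserial order by event_count desc) as rn,
       -- row_number() numbers every row differently; hr is added as a tie-breaker,
       -- so only the earliest of the tied hours gets 1
       row_number() over (partition by deviceserial
                          order by event_count desc, hr) as pick_one
from hourly
order by deviceserial, hr;
```

Filtering on rn = 1 keeps both tied hours of 'A', while filtering on pick_one = 1 keeps only the 08:00 row. If a single row per serial is what you want, adding a tie-breaker such as hr to the order by of the distinct on query should make the chosen row predictable.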
You can use the window function rank() as follows:
select * from
(select deviceserial,count(*) as event_count,
date_trunc('hour', createdat) as hr, device,
rank() over (partition by deviceserial order by count(*) desc) as rn
from devices
group by deviceserial,hr,device) t
where rn = 1
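If you want to try this without touching the real table, here is a small self-contained sketch. The CTE named `devices` is a stand-in that shadows the real table and holds only a handful of rows copied from the question (three columns, the rest omitted), so the numbers it produces apply to this subset only.

```sql
-- Stand-in for the real table: a few rows taken from the question's sample data.
with devices (createdat, device, deviceserial) as (
    values
        (timestamp '2020-09-02 09:09:11', 'Android', 6625839828),
        (timestamp '2020-09-02 09:13:11', 'Android', 6625839828),
        (timestamp '2020-09-02 09:03:10', 'IOS',     636363636891),
        (timestamp '2020-09-02 09:06:10', 'IOS',     636363636891),
        (timestamp '2020-09-02 08:40:11', 'IOS',     6625839827),
        (timestamp '2020-09-02 08:45:11', 'IOS',     6625839827),
        (timestamp '2020-09-02 09:00:11', 'IOS',     6625839827)
)
select deviceserial, event_count, hr, device
from (
    select deviceserial,
           count(*) as event_count,
           date_trunc('hour', createdat) as hr,
           device,
           -- rank the hourly buckets of each serial by their event count
           rank() over (partition by deviceserial order by count(*) desc) as rn
    from devices
    group by deviceserial, date_trunc('hour', createdat), device
) t
where rn = 1
order by deviceserial;
```

On this subset, serial 6625839827 has two events in the 08:00 hour and one in the 09:00 hour, so only its 08:00 row is returned; the other two serials each have a single bucket with two events.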