Get the maximum value from rows in Postgres records and group by multiple columns

I have a table like this:

p_id |      createdat      | pagetitle | sessionid |      text       | device  | deviceserial
------+---------------------+-----------+-----------+-----------------+---------+--------------
      | 2020-11-27 08:07:39 |           |           | App launch      | android | 636363636890
      | 2020-09-01 08:08:18 |           |           | search          | Android | 636363636890
      | 2020-09-02 08:10:10 |           |           | scan            | Android | 636363636890
      | 2020-09-02 08:12:10 |           |           | destroy         | Android | 636363636890
      | 2020-09-02 08:40:11 |           |           | hi              | IOS     | 6625839827
      | 2020-09-02 08:45:11 |           |           | hi              | IOS     | 6625839827
      | 2020-09-02 08:43:10 |           |           | launchComponent | Android | 636363636890
      | 2020-09-02 08:50:11 |           |           | hi              | IOS     | 6625839827
      | 2020-09-02 08:47:10 |           |           | launchComponent | Android | 636363636890
      | 2020-09-02 08:53:11 |           |           | hi              | IOS     | 6625839827
      | 2020-09-02 08:50:10 |           |           | launchComponent | Android | 636363636890
      | 2020-09-02 08:55:11 |           |           | hi              | IOS     | 6625839827
      | 2020-09-02 08:52:10 |           |           | launchComponent | Android | 636363636890
      | 2020-09-02 09:00:11 |           |           | hi              | IOS     | 6625839827
      | 2020-09-02 08:55:10 |           |           | launchComponent | Android | 636363636890
      | 2020-09-02 09:05:11 |           |           | hi              | IOS     | 6625839827
      | 2020-09-02 08:59:10 |           |           | launchComponent | Android | 636363636890
      | 2020-09-02 09:07:11 |           |           | hi              | Android | 6625839827
      | 2020-09-02 09:01:10 |           |           | launchComponent | Android | 636363636890
      | 2020-09-02 09:09:11 |           |           | hi              | IOS     | 6625839827
      | 2020-09-02 09:03:10 |           |           | launchComponent | Android | 636363636890
      | 2020-09-02 09:09:11 |           |           | hi              | Android | 6625839828
      | 2020-09-02 09:03:10 |           |           | launchComponent | IOS     | 636363636891
      | 2020-09-02 09:13:11 |           |           | hi              | Android | 6625839828
      | 2020-09-02 09:06:10 |           |           | launchComponent | IOS     | 636363636891

From this table, I want to get a result like this:

 deviceserial | event_count |         hr          | device
--------------+-------------+---------------------+---------
 6625839828   |           2 | 2020-09-02 09:00:00 | Android
 636363636890 |           8 | 2020-09-02 08:00:00 | Android
 636363636891 |           2 | 2020-09-02 09:00:00 | IOS
 6625839827   |           5 | 2020-09-02 08:00:00 | IOS
 

These are my steps: I group the records by deviceserial, by hour (as hr), and by device, and then take the maximum count (event_count).

I tried this query:

select deviceserial,max(event_count) as event_count,hr,device
from (
    select deviceserial,count(*) as event_count,
        date_trunc('hour', createdat) as hr,device
    from devices  
    group by deviceserial,hr,device
) t
group by deviceserial,hr,device

This is my result:

 deviceserial | event_count |         hr          | device
--------------+-------------+---------------------+---------
 636363636890 |           1 | 2020-11-27 08:00:00 | android
 636363636891 |           2 | 2020-09-02 09:00:00 | IOS
 6625839827   |           4 | 2020-09-02 09:00:00 | IOS
 6625839827   |           5 | 2020-09-02 08:00:00 | IOS
 636363636890 |           8 | 2020-09-02 08:00:00 | Android
 636363636890 |           1 | 2020-09-01 08:00:00 | Android
 636363636890 |           2 | 2020-09-02 09:00:00 | Android
 6625839828   |           2 | 2020-09-02 09:00:00 | Android
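
One way to see why this result still has one row per (deviceserial, hr, device) is to replay the outer aggregation outside the database. The Python sketch below is only an illustration, not part of the original query: it takes three rows from the result above as the inner query's output, and the helper name max_per_group is made up for this example. Grouping again by the same three columns leaves every group with a single row, so max() has nothing to reduce. Grouping by deviceserial alone does reduce the rows, but it no longer tells you which hr and device the maximum belongs to.

```python
from collections import defaultdict

# Three rows copied from the result above, standing in for the inner query's
# output: (deviceserial, hr, device, event_count).
inner = [
    ("636363636890", "2020-09-02 08:00:00", "Android", 8),
    ("636363636890", "2020-09-02 09:00:00", "Android", 2),
    ("6625839827", "2020-09-02 08:00:00", "IOS", 5),
]


def max_per_group(rows, key):
    """Hypothetical helper: max(event_count) per group, like max() with group by."""
    groups = defaultdict(list)
    for row in rows:
        groups[key(row)].append(row[3])
    return {k: max(v) for k, v in groups.items()}


# Outer "group by deviceserial, hr, device": same keys as the inner query,
# so each group holds exactly one row and nothing is collapsed.
same_keys = max_per_group(inner, key=lambda r: (r[0], r[1], r[2]))

# "group by deviceserial" alone: one maximum per device, but hr/device are gone.
by_serial = max_per_group(inner, key=lambda r: r[0])

print(len(same_keys), by_serial)
```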

If I understand correctly, you can use distinct on:

select distinct on (deviceserial) 
    deviceserial,
    count(*) as event_count,
    date_trunc('hour', createdat) as hr,
    device
from devices  
group by deviceserial, hr, device
order by deviceserial, event_count desc

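This gives you the hour and device with the most events for each device serial. Note, however, that it does not handle ties properly: it returns only one row per device serial, and which of the tied rows you get is not deterministic. If you want to allow top ties, you can use rank() instead: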

select *
from (
    select deviceserial,
        count(*) as event_count,
        date_trunc('hour', createdat) as hr,
        device,
        -- a window's order by cannot reference the event_count alias, so repeat count(*)
        rank() over(partition by deviceserial order by count(*) desc) rn
    from devices  
    group by deviceserial, hr, device
) t
where rn = 1
order by deviceserial
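
To make the difference between the two approaches concrete, here is a minimal Python sketch of the same logic. It is an illustration only and does not talk to Postgres: the rows and the serials "A1" and "B2" are hypothetical, and hour_of is a made-up stand-in for date_trunc('hour', ...). Device "A1" has two hours tied at the top count, so the rank()-style filter keeps both rows while the distinct on-style pick keeps just one.

```python
from collections import Counter
from datetime import datetime

# Hypothetical sample rows: (deviceserial, createdat, device).
rows = [
    ("A1", "2020-09-02 08:10:10", "Android"),
    ("A1", "2020-09-02 08:12:10", "Android"),
    ("A1", "2020-09-02 09:01:10", "Android"),
    ("A1", "2020-09-02 09:03:10", "Android"),
    ("B2", "2020-09-02 08:40:11", "IOS"),
    ("B2", "2020-09-02 08:45:11", "IOS"),
    ("B2", "2020-09-02 09:00:11", "IOS"),
]


def hour_of(ts):
    """Stand-in for date_trunc('hour', ts): zero out minutes and seconds."""
    parsed = datetime.strptime(ts, "%Y-%m-%d %H:%M:%S")
    return parsed.replace(minute=0, second=0).strftime("%Y-%m-%d %H:%M:%S")


# count(*) ... group by deviceserial, hr, device
counts = Counter((serial, hour_of(ts), device) for serial, ts, device in rows)

# Highest count seen for each deviceserial.
best = {}
for (serial, hr, device), n in counts.items():
    best[serial] = max(best.get(serial, 0), n)

# rank() = 1: keep every group that reaches the best count, ties included.
top_with_ties = sorted(
    (serial, n, hr, device)
    for (serial, hr, device), n in counts.items()
    if n == best[serial]
)

# distinct on (deviceserial): keep a single row per serial. Postgres picks an
# arbitrary tied row; this sketch simply takes the earliest hour.
top_one = {}
for row in top_with_ties:
    top_one.setdefault(row[0], row)

for row in top_with_ties:
    print(row)
```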

Or, in Postgres 13 or later:

select deviceserial,
    count(*) as event_count,
    date_trunc('hour', createdat) as hr,
    device
from devices  
group by deviceserial, hr, device
order by rank() over(partition by deviceserial order by count(*) desc)
fetch first row with ties

You can use the window function rank() as follows:

select * from
(select deviceserial,count(*) as event_count,
        date_trunc('hour', createdat) as hr, device,
        rank() over (partition by deviceserial order by count(*) desc) as rn
    from devices  
    group by deviceserial,hr,device) t
where rn = 1