Python sqlite3 SQL query: get all entries with newest date but limit per single unique column

Question

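I have a table called 'fileEvents'. It has four columns (there are more, but they are irrelevant to the question): id, fileId, action, and time.

The same fileId, action, and time values can appear in multiple rows.

The query I want is simple, but I can't come up with one that works: get the latest entry since a specific time for each fileId.

I tried the following.

First, I get all entries since a specific time, sorted by time: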

SELECT * FROM `fileEvents` WHERE `time` < 1000 ORDER BY `time` DESC

The result is fine, of course (id, action, fileId, time):

[(6, 0, 3, 810), (5, 0, 3, 410), (2, 0, 1, 210), (3, 0, 2, 210), (4, 0, 3, 210), (1, 0, 1, 200)]

So everything is sorted. But now I only want unique fileIds, so I add a GROUP BY `fileId`:

SELECT * FROM `fileEvents` WHERE `time` < 1000 GROUP BY `fileId` ORDER BY `time` DESC

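This is of course wrong: it groups the results first and only then sorts them, but by then the rows are already grouped, so the sort no longer matters: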

[(3, 0, 2, 210), (4, 0, 3, 210), (1, 0, 1, 200)]

When I try to swap GROUP BY and ORDER BY, I get an OperationalError: near "GROUP": syntax error
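To make the clause-order issue concrete, here is a minimal Python sqlite3 sketch. It assumes an in-memory table whose column order (id, action, fileId, time) and rows are copied from the sample output above; the real schema may differ. It runs the grouped query with the clauses in the order SQLite accepts (WHERE, then GROUP BY, then ORDER BY) and then shows that placing ORDER BY before GROUP BY is rejected by the parser.

```python
import sqlite3

# Assumed schema and rows, reconstructed from the sample output in the question.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE fileEvents ("
    "id INTEGER PRIMARY KEY, action INTEGER, fileId INTEGER, time INTEGER)"
)
conn.executemany(
    "INSERT INTO fileEvents VALUES (?, ?, ?, ?)",
    [(1, 0, 1, 200), (2, 0, 1, 210), (3, 0, 2, 210),
     (4, 0, 3, 210), (5, 0, 3, 410), (6, 0, 3, 810)],
)

# Valid clause order: WHERE filters rows, GROUP BY collapses them,
# and ORDER BY only sorts the rows that are left after grouping.
grouped = conn.execute(
    "SELECT * FROM fileEvents WHERE time < 1000 "
    "GROUP BY fileId ORDER BY time DESC"
).fetchall()
file_ids = sorted(row[2] for row in grouped)
print(file_ids)  # one row per fileId, not necessarily the newest one

# ORDER BY before GROUP BY is not valid SQL, so SQLite refuses to parse it.
try:
    conn.execute(
        "SELECT * FROM fileEvents WHERE time < 1000 "
        "ORDER BY time DESC GROUP BY fileId"
    )
    error_message = None
except sqlite3.OperationalError as exc:
    error_message = str(exc)
print(error_message)
```

The grouped query does return one row per fileId, but with no aggregate function SQLite takes the other columns from an arbitrarily chosen row of each group, so sorting afterwards cannot bring back the newest event.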

Also, when I try a subquery, where I first fetch the sorted list and then group it, the result is wrong:

SELECT * FROM `fileEvents` WHERE `id` IN (
SELECT `id` FROM `fileEvents` WHERE `time` < 1000 ORDER BY `time` DESC
) GROUP BY `fileId`

Result (wrong):

[(1, 0, 1, 200), (3, 0, 2, 210), (4, 0, 3, 210)]

The result I'm looking for is:

[(6, 0, 3, 810), (2, 0, 1, 210), (3, 0, 2, 210)]

Does anyone know how I can get the result I want? What am I missing? Thanks a lot!

Answer 1

The typical solution to this top-1-per-group problem is to filter with a correlated subquery:

select fe.* 
from fileevents fe
where fe.time = (
    select max(fe1.time) 
    from fileevents fe1 
    where fe1.fileid = fe.fileid and fe1.time < 1000
)

For performance with this query, you want an index on (fileid, time).
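For reference, here is a minimal sketch of running this from Python's sqlite3 module. The in-memory table, the index name idx_fileevents_fileid_time, the bound parameter for the cutoff, and the trailing ORDER BY are assumptions added for illustration; the rows are copied from the question's sample output.

```python
import sqlite3

# Assumed schema and rows, reconstructed from the question's sample output.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE fileEvents ("
    "id INTEGER PRIMARY KEY, action INTEGER, fileId INTEGER, time INTEGER)"
)
conn.executemany(
    "INSERT INTO fileEvents VALUES (?, ?, ?, ?)",
    [(1, 0, 1, 200), (2, 0, 1, 210), (3, 0, 2, 210),
     (4, 0, 3, 210), (5, 0, 3, 410), (6, 0, 3, 810)],
)

# Index on (fileId, time) so the per-file MAX(time) lookup can use it.
# The index name is hypothetical.
conn.execute(
    "CREATE INDEX idx_fileevents_fileid_time ON fileEvents (fileId, time)"
)

# Keep a row only if its time equals the newest time (below the cutoff)
# recorded for the same fileId. The ORDER BY is an addition that makes
# the output order deterministic.
latest_per_file = conn.execute(
    """
    SELECT fe.*
    FROM fileEvents fe
    WHERE fe.time = (
        SELECT MAX(fe1.time)
        FROM fileEvents fe1
        WHERE fe1.fileId = fe.fileId AND fe1.time < ?
    )
    ORDER BY fe.time DESC, fe.id
    """,
    (1000,),
).fetchall()

print(latest_per_file)
# [(6, 0, 3, 810), (2, 0, 1, 210), (3, 0, 2, 210)]
```

One thing to keep in mind: if several rows of the same fileId share the newest time, which the question says can happen, this query returns all of them rather than a single row.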

Answer 2

With the ROW_NUMBER() window function:

select * -- replace * with the columns that you want in the result
from (
  select *, row_number() over (partition by fileid order by time desc) rn
  from fileevents 
  where time < 1000
) t
where rn = 1
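A minimal sketch of the same query run through Python's sqlite3 module follows. Window functions need SQLite 3.25.0 or newer, so the sketch checks the version of the library that Python is linked against. The in-memory table, the bound cutoff parameter, the extra id DESC tie-breaker, and the final ORDER BY are assumptions added for illustration; the rows come from the question's sample output.

```python
import sqlite3

# Window functions were added in SQLite 3.25.0.
if sqlite3.sqlite_version_info < (3, 25, 0):
    raise RuntimeError("ROW_NUMBER() requires SQLite 3.25.0 or newer")

# Assumed schema and rows, reconstructed from the question's sample output.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE fileEvents ("
    "id INTEGER PRIMARY KEY, action INTEGER, fileId INTEGER, time INTEGER)"
)
conn.executemany(
    "INSERT INTO fileEvents VALUES (?, ?, ?, ?)",
    [(1, 0, 1, 200), (2, 0, 1, 210), (3, 0, 2, 210),
     (4, 0, 3, 210), (5, 0, 3, 410), (6, 0, 3, 810)],
)

# Number the rows of each fileId from newest to oldest and keep number 1.
# "id DESC" is an assumed tie-breaker for rows that share the same time.
newest_rows = conn.execute(
    """
    SELECT id, action, fileId, time
    FROM (
        SELECT *,
               ROW_NUMBER() OVER (
                   PARTITION BY fileId
                   ORDER BY time DESC, id DESC
               ) AS rn
        FROM fileEvents
        WHERE time < ?
    ) AS t
    WHERE rn = 1
    ORDER BY time DESC, id
    """,
    (1000,),
).fetchall()

print(newest_rows)
# [(6, 0, 3, 810), (2, 0, 1, 210), (3, 0, 2, 210)]
```

Because ROW_NUMBER() assigns exactly one row the number 1 in each partition, this variant returns a single row per fileId even when several rows share the newest time.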

sql

sqlite

sql-order-by

greatest-n-per-group

python-3.x