Find number of rows with each property value, taking into account only the most recent rows in SQL

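I have a database whose tables represent "edits" to "pages". Each edit has an ID, a timestamp, and a "status", which takes one of a few discrete values. Pages have an ID and also a "category".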

I want to find the number of pages in a given category that have each status, considering only the status as of the most recent edit.

Edits:

+---------+---------+-----------+--------+
| edit_id | page_id | edit_time | status |
+---------+---------+-----------+--------+
| 1       | 10      | 20210502  | 90     |
| 2       | 10      | 20210503  | 91     |
| 3       | 20      | 20210504  | 91     |
| 4       | 30      | 20210504  | 90     |
| 5       | 30      | 20210505  | 92     |
| 6       | 40      | 20210505  | 90     |
| 7       | 50      | 20210503  | 90     |
+---------+---------+-----------+--------+

Pages:

+---------+--------+
| page_id | cat_id |
+---------+--------+
| 10      | 100    |
| 20      | 100    |
| 30      | 100    |
| 40      | 200    |
+---------+--------+

I would like to get, for category 100:

+--------+-------+
| stat   | count |
+--------+-------+
| 90     | 1     |
| 91     | 2     |
| 92     | 1     |
+--------+-------+

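Pages 10 and 30 each have two edits, but the later one "overrides" the first, so only the edits with status 91 and 92 are counted. Pages 20 and 40 account for one each of 91 and 90, and page 50 is in the wrong category, so it does not feature.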

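I tried the following, but it doesn't seem to work. The idea is to select the maximum (i.e. latest) edit for each page in the right category, then join that to the edits table, group by status, and count the rows: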

SELECT stat, COUNT(*)
FROM edits as out_e
INNER JOIN (
    SELECT edit_id, page_id, max(edit_time) as last_edit
    FROM edits
    INNER JOIN pages on edit_page_id = page_id
    WHERE cat_id = 100
    GROUP BY page_id
) in_e ON out_e.edit_id = in_e.edit_id
GROUP BY stat
ORDER BY stat;

For example, in this fiddle: http://sqlfiddle.com/#!9/42f2ed/1

The result is:

+--------+-------+
| stat   | count |
+--------+-------+
| 90     | 3     |
| 91     | 1     |
+--------+-------+

What is the correct way to get this information?

SELECT cat_id, stat, COUNT(*) cnt
FROM pages
JOIN edits ON pages.page_id = edits.edit_page_id
JOIN ( SELECT edit_page_id, MAX(edit_time) edit_time
       FROM edits
       GROUP BY edit_page_id ) last_time ON edits.edit_page_id = last_time.edit_page_id
                                        AND edits.edit_time = last_time.edit_time
GROUP BY cat_id, stat

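Output:

+--------+------+-----+
| cat_id | stat | cnt |
+--------+------+-----+
| 100    | 90   | 1   |
| 100    | 91   | 2   |
| 100    | 92   | 1   |
| 200    | 90   | 1   |
+--------+------+-----+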

https://dbfiddle.uk/?rdbms=mysql_5.6&fiddle=7592c7853481f6b5a9626c8d111c1d3b (the query is suitable for MariaDB 10.1)
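For anyone who wants to try the join-on-latest-time approach without the fiddle, here is a minimal sketch that runs the same query against an in-memory SQLite database from Python. The schema is an assumption: the column names `edit_page_id` and `stat` are taken from the queries, and pages 40 and 50 are placed in categories 100 and 200, which is what the question's explanation and the output above seem to imply. The `SETUP`, `QUERY`, and `latest_status_counts` names are made up for this illustration.

```python
import sqlite3

# Assumed schema and data (not copied from the fiddle): column names follow
# the queries, and pages 40/50 are put in categories 100/200 as an assumption.
SETUP = """
CREATE TABLE edits (edit_id INTEGER PRIMARY KEY, edit_page_id INTEGER,
                    edit_time INTEGER, stat INTEGER);
CREATE TABLE pages (page_id INTEGER PRIMARY KEY, cat_id INTEGER);
INSERT INTO edits VALUES
  (1, 10, 20210502, 90), (2, 10, 20210503, 91), (3, 20, 20210504, 91),
  (4, 30, 20210504, 90), (5, 30, 20210505, 92), (6, 40, 20210505, 90),
  (7, 50, 20210503, 90);
INSERT INTO pages VALUES (10, 100), (20, 100), (30, 100), (40, 100), (50, 200);
"""

# Same shape as the query above: find each page's latest edit_time,
# join back to edits on (page, time), then count per category and status.
QUERY = """
SELECT cat_id, stat, COUNT(*) AS cnt
FROM pages
JOIN edits ON pages.page_id = edits.edit_page_id
JOIN ( SELECT edit_page_id, MAX(edit_time) AS edit_time
       FROM edits
       GROUP BY edit_page_id ) last_time
  ON edits.edit_page_id = last_time.edit_page_id
 AND edits.edit_time = last_time.edit_time
GROUP BY cat_id, stat
ORDER BY cat_id, stat
"""


def latest_status_counts():
    """Build the sample database and return (cat_id, stat, cnt) rows."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.executescript(SETUP)
        return conn.execute(QUERY).fetchall()
    finally:
        conn.close()


rows = latest_status_counts()
for row in rows:
    print(row)
```

To restrict the result to a single category, adding `WHERE cat_id = 100` before the outer `GROUP BY` should be enough.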


Is it possible to join on the edit_id (which is unique key for each edit)? – Inductiveload

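No, that's not possible. cnt=2 counts two different edit_id values - which value should be used?

But you can get a list of the concatenated values - just add GROUP_CONCAT(edit_id) to the output list.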

https://dbfiddle.uk/?rdbms=mysql_5.6&fiddle=b2391972c3f7c4be4254e47514d0f1da
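As a rough illustration of the GROUP_CONCAT suggestion, the sketch below adds it to the query and runs it on an in-memory SQLite database from Python (SQLite happens to provide a `group_concat` aggregate as well). The schema and data are assumptions, not copied from the fiddle: column names follow the queries, and pages 40 and 50 are assumed to be in categories 100 and 200. SQLite does not guarantee the order of the concatenated values, so the ids are sorted after splitting.

```python
import sqlite3

# Assumed schema and data (not copied from the fiddle).
SETUP = """
CREATE TABLE edits (edit_id INTEGER PRIMARY KEY, edit_page_id INTEGER,
                    edit_time INTEGER, stat INTEGER);
CREATE TABLE pages (page_id INTEGER PRIMARY KEY, cat_id INTEGER);
INSERT INTO edits VALUES
  (1, 10, 20210502, 90), (2, 10, 20210503, 91), (3, 20, 20210504, 91),
  (4, 30, 20210504, 90), (5, 30, 20210505, 92), (6, 40, 20210505, 90),
  (7, 50, 20210503, 90);
INSERT INTO pages VALUES (10, 100), (20, 100), (30, 100), (40, 100), (50, 200);
"""

# The earlier query with GROUP_CONCAT(edit_id) added to the output list.
QUERY = """
SELECT cat_id, stat, COUNT(*) AS cnt, GROUP_CONCAT(edit_id) AS edit_ids
FROM pages
JOIN edits ON pages.page_id = edits.edit_page_id
JOIN ( SELECT edit_page_id, MAX(edit_time) AS edit_time
       FROM edits
       GROUP BY edit_page_id ) last_time
  ON edits.edit_page_id = last_time.edit_page_id
 AND edits.edit_time = last_time.edit_time
GROUP BY cat_id, stat
ORDER BY cat_id, stat
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SETUP)

# Map each (cat_id, stat) group to the sorted list of latest edit ids in it.
ids_by_group = {}
for cat_id, stat, cnt, edit_ids in conn.execute(QUERY):
    ids_by_group[(cat_id, stat)] = sorted(int(i) for i in edit_ids.split(","))
    print(cat_id, stat, cnt, ids_by_group[(cat_id, stat)])

conn.close()
```

With this data, the group for category 100 and status 91 carries two edit ids, which is why there is no single edit_id to join on.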

select e1.stat, count(e1.stat) as count 
from edits e1
join (
    select edit_page_id, max(edit_time) as edit_time 
    from edits
    where edit_page_id in (
        select page_id 
        from pages 
        where cat_id = 100
    )
    group by edit_page_id
) as e2
on e1.edit_page_id = e2.edit_page_id and e1.edit_time = e2.edit_time
group by e1.stat;

Here is a link to the fiddle - http://sqlfiddle.com/#!9/42f2ed/40/0

Edit: updated to use edit_time rather than stat to find the latest record.

I don't think you need the second join - see if this query helps.

SELECT t1.stat, COUNT(*) AS count_
FROM (
    SELECT e.edit_id, p.page_id, e.stat,
           RANK() OVER (PARTITION BY e.edit_page_id ORDER BY e.edit_time DESC) AS edit_rank
    FROM edits e
    INNER JOIN pages p ON e.edit_page_id = p.page_id
    WHERE p.cat_id = 100
) t1
WHERE t1.edit_rank = 1
GROUP BY t1.stat

fiddle url : (https://dbfiddle.uk/?rdbms=mariadb_10.3&fiddle=0f681dc8d93cc3eebf9a03e0c8d84850)
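A minimal sketch of the window-function approach, run from Python against an in-memory SQLite database. It assumes SQLite 3.25 or newer (the first version with window functions). The schema and data are again assumptions rather than a copy of the fiddle: column names follow the queries, and pages 40 and 50 are assumed to be in categories 100 and 200. The category is passed as a parameter here instead of being hard-coded, and an `ORDER BY` is added so the output order is stable.

```python
import sqlite3

# Assumed schema and data (not copied from the fiddle).
SETUP = """
CREATE TABLE edits (edit_id INTEGER PRIMARY KEY, edit_page_id INTEGER,
                    edit_time INTEGER, stat INTEGER);
CREATE TABLE pages (page_id INTEGER PRIMARY KEY, cat_id INTEGER);
INSERT INTO edits VALUES
  (1, 10, 20210502, 90), (2, 10, 20210503, 91), (3, 20, 20210504, 91),
  (4, 30, 20210504, 90), (5, 30, 20210505, 92), (6, 40, 20210505, 90),
  (7, 50, 20210503, 90);
INSERT INTO pages VALUES (10, 100), (20, 100), (30, 100), (40, 100), (50, 200);
"""

# Rank each page's edits from newest to oldest, keep rank 1, count per status.
QUERY = """
SELECT t1.stat, COUNT(*) AS count_
FROM (
    SELECT e.edit_id, p.page_id, e.stat,
           RANK() OVER (PARTITION BY e.edit_page_id
                        ORDER BY e.edit_time DESC) AS edit_rank
    FROM edits e
    INNER JOIN pages p ON e.edit_page_id = p.page_id
    WHERE p.cat_id = ?
) t1
WHERE t1.edit_rank = 1
GROUP BY t1.stat
ORDER BY t1.stat
"""


def status_counts(cat_id):
    """Return (stat, count_) rows for the latest edit of each page in cat_id."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.executescript(SETUP)
        return conn.execute(QUERY, (cat_id,)).fetchall()
    finally:
        conn.close()


counts_100 = status_counts(100)
counts_200 = status_counts(200)
print(counts_100)
print(counts_200)
```

One thing to keep in mind: RANK() gives rank 1 to every row tied for the latest edit_time, so a page with two edits at the same timestamp would be counted twice. If that can happen in the real data, ROW_NUMBER() with a tiebreaker such as edit_id in the ORDER BY might be a safer choice.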