Slow JOIN Query with OR in WHERE Clause - Missing Possible Indexes?

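I'm trying to retrieve a paginated list, and a total count, of "notifications" about a "case" that belongs to a particular user.

The notifications have a few conditions ("not locked", "not private", "not already seen"), and the query should return the number found, ordered by creation date descending.

The final condition is that the notification was not created by the user themselves, OR that the notification is of type "conduct" (an enum) and the user_id is referenced in the notification's "ref_id".

Against 200k rows in recent_changes, just under 4k rows in cases, and 50 users, this query takes 5 seconds to run.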

+-----+
| cnt |
+-----+
|  13 |
+-----+
1 row in set (4.67 sec)

Can this query be optimized as it stands, or does it need to be restructured?

SELECT count(*) as cnt
 FROM recent_changes rc 
 LEFT JOIN `case` c on c.id = rc.case_id 
 LEFT JOIN `user` u on u.id = rc.user_id
 WHERE 
 (
   rc.user_id != c.user_id AND c.user_id = '25'
   OR
   (rc.type = 'conduct' AND rc.ref_id = '25')
 )
 AND c.locked = 'N'  AND rc.private != 'Y' 
 AND seen = 'false'
 ORDER BY rc.datecreated DESC;

EXPLAIN output:

+----+-------------+-------+--------+--------------------------+-------------------------+---------+--------------------------+------+------------------------------+
| id | select_type | table | type   | possible_keys            | key                     | key_len | ref                      | rows | Extra                        |
+----+-------------+-------+--------+--------------------------+-------------------------+---------+--------------------------+------+------------------------------+
|  1 | SIMPLE      | c     | ALL    | PRIMARY,user_user_id_idx | NULL                    | NULL    | NULL                     | 3699 | Using where; Using temporary |
|  1 | SIMPLE      | rc    | ref    | idx_recent_changes_case  | idx_recent_changes_case | 5       | xxxxxxxxxxxxx.c.id       |   25 | Using where                  |
|  1 | SIMPLE      | u     | eq_ref | PRIMARY                  | PRIMARY                 | 4       | xxxxxxxxxxxxx.rc.user_id |    1 | Using index                  |
+----+-------------+-------+--------+--------------------------+-------------------------+---------+--------------------------+------+------------------------------+

Indexes on recent_changes:

+----------------+------------+------------------------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
| Table          | Non_unique | Key_name                     | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment |
+----------------+------------+------------------------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
| recent_changes |          0 | PRIMARY                      |            1 | id          | A         |      182807 |     NULL | NULL   |      | BTREE      |         |
| recent_changes |          1 | recent_changes_user_id_idx   |            1 | user_id     | A         |          96 |     NULL | NULL   | YES  | BTREE      |         |
| recent_changes |          1 | idx_recent_changes_user_case |            1 | user_id     | A         |          92 |     NULL | NULL   | YES  | BTREE      |         |
| recent_changes |          1 | idx_recent_changes_user_case |            2 | case_id     | A         |       18280 |     NULL | NULL   | YES  | BTREE      |         |
| recent_changes |          1 | idx_recent_changes_case      |            1 | case_id     | A         |        7312 |     NULL | NULL   | YES  | BTREE      |         |
+----------------+------------+------------------------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+

Indexes on the case table:

+-------+------------+------------------+--------------+---------------------+-----------+-------------+----------+--------+------+------------+---------+
| Table | Non_unique | Key_name         | Seq_in_index | Column_name         | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment |
+-------+------------+------------------+--------------+---------------------+-----------+-------------+----------+--------+------+------------+---------+
| case  |          0 | PRIMARY          |            1 | id                  | A         |        3753 |     NULL | NULL   |      | BTREE      |         |
| case  |          1 | id_idx           |            1 | member_id           | A         |        3753 |     NULL | NULL   | YES  | BTREE      |         |
| case  |          1 | user_user_id_idx |            1 | user_id             | A         |           2 |     NULL | NULL   | YES  | BTREE      |         |
| case  |          1 | case_ha_id       |            1 | health_authority_id | A         |          28 |     NULL | NULL   | YES  | BTREE      |         |
+-------+------------+------------------+--------------+---------------------+-----------+-------------+----------+--------+------+------------+---------+

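Conceptually, it does the following:

Find recent rows in recent_changes where EITHER

i) the recent_changes row joins on case_id to a row in the case table that is owned by the current user_id, and
ii) the recent_changes row was not created by the current user_id

OR

i) the recent_changes row is of type "conduct" and the current user_id is in the recent_changes.ref_id column.

If I remove the "OR (rc.type = 'conduct' AND rc.ref_id = '25')" condition, my response time drops to under 1 second.

If I remove the "rc.user_id != c.user_id AND c.user_id = '25' OR" condition, it still takes about 5 seconds to complete.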


EDIT

Changing the join order shaved off about half a second, though I couldn't join case on rc.case_id until I had joined rc first: Unknown column 'rc.user_id' in 'where clause'.

New query:

SELECT count(*) as cnt
FROM `user` u 
LEFT JOIN `recent_changes` rc on u.id = rc.user_id 
LEFT JOIN `case` c on c.id = rc.case_id 
WHERE 
(
    rc.user_id != c.user_id AND c.user_id = '25'
    OR
    (rc.type = 'conduct' AND rc.ref_id = '25')
)
AND c.locked = 'N'  AND rc.private != 'Y' 
AND seen = 'false'
ORDER BY rc.datecreated DESC;

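Removing the "ORDER BY" clause doesn't seem to add anything to the query with the new join order, though I'm now more aware of its impact on performance.

Using a UNION wasn't any faster, but running the SELECTs separately showed that the first SELECT takes only .3s while the second takes over 4 seconds: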

select count(*) as cnt
FROM (
SELECT count(*) FROM `user` u 
LEFT JOIN `recent_changes` rc on u.id = rc.user_id 
LEFT JOIN `case` c on c.id = rc.case_id 
WHERE rc.user_id != c.user_id AND c.user_id = '25'
AND c.locked = 'N'  AND rc.private != 'Y' 
AND seen = 'false'
UNION ALL
SELECT count(*) as cnt
FROM `user` u 
LEFT JOIN `recent_changes` rc on u.id = rc.user_id 
LEFT JOIN `case` c on c.id = rc.case_id 
WHERE rc.type = 'conduct' AND rc.ref_id = '25'
AND c.locked = 'N'  AND rc.private != 'Y' 
AND seen = 'false') x
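
One thing to watch in the UNION ALL attempt above: each branch already returns a single count(*) row, so the outer count(*) counts the two branch rows rather than the matching notifications. A minimal sketch of summing the two counts instead, assuming the same tables and filters as above (the alias cnt on the first branch is added here so the outer query can reference it):

```sql
-- Sum the per-branch counts instead of counting the two result rows.
-- Caveat: a row matching both branches is counted twice with UNION ALL.
SELECT sum(x.cnt) AS cnt
FROM (
    SELECT count(*) AS cnt
    FROM `user` u
    LEFT JOIN `recent_changes` rc ON u.id = rc.user_id
    LEFT JOIN `case` c ON c.id = rc.case_id
    WHERE rc.user_id != c.user_id AND c.user_id = '25'
      AND c.locked = 'N' AND rc.private != 'Y'
      AND seen = 'false'
    UNION ALL
    SELECT count(*) AS cnt
    FROM `user` u
    LEFT JOIN `recent_changes` rc ON u.id = rc.user_id
    LEFT JOIN `case` c ON c.id = rc.case_id
    WHERE rc.type = 'conduct' AND rc.ref_id = '25'
      AND c.locked = 'N' AND rc.private != 'Y'
      AND seen = 'false'
) x;
```

This only changes how the two results are combined; the timing of each branch should be unaffected.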

I think the recent_changes rc table is missing a necessary index, judging by the EXPLAIN output:

EXPLAIN SELECT count(*) FROM `user` u  LEFT JOIN `recent_changes` rc on u.id = rc.user_id  LEFT JOIN `case` c on c.id = rc.case_id  WHERE rc.user_id != c.user_id AND c.user_id = '25' AND c.locked = 'N'  AND rc.private != 'Y'  AND seen = 'false';

Run time < .5 seconds

+----+-------------+-------+--------+---------------------------------------------------------------------------------+-------------------------+---------+--------------------------+------+-------------+
| id | select_type | table | type   | possible_keys                                                                   | key                     | key_len | ref                      | rows | Extra       |
+----+-------------+-------+--------+---------------------------------------------------------------------------------+-------------------------+---------+--------------------------+------+-------------+
|  1 | SIMPLE      | c     | ref    | PRIMARY,user_user_id_idx                                                        | user_user_id_idx        | 5       | const                    |  383 | Using where |
|  1 | SIMPLE      | rc    | ref    | recent_changes_user_id_idx,idx_recent_changes_user_case,idx_recent_changes_case | idx_recent_changes_case | 5       | hsaedmp_jason.c.id       |   20 | Using where |
|  1 | SIMPLE      | u     | eq_ref | PRIMARY                                                                         | PRIMARY                 | 4       | hsaedmp_jason.rc.user_id |    1 | Using index |
+----+-------------+-------+--------+---------------------------------------------------------------------------------+-------------------------+---------+--------------------------+------+-------------+

Run time > 4 seconds

EXPLAIN SELECT count(*) as cnt FROM `user` u  LEFT JOIN `recent_changes` rc on u.id = rc.user_id  LEFT JOIN `case` c on c.id = rc.case_id  WHERE rc.type = 'conduct' AND rc.ref_id = '25' AND c.locked = 'N'  AND rc.private != 'Y'  AND seen = 'false';

key = NULL, which is not good.

+----+-------------+-------+--------+---------------------------------------------------------------------------------+-------------------------+---------+--------------------------+------+-------------+
| id | select_type | table | type   | possible_keys                                                                   | key                     | key_len | ref                      | rows | Extra       |
+----+-------------+-------+--------+---------------------------------------------------------------------------------+-------------------------+---------+--------------------------+------+-------------+
|  1 | SIMPLE      | c     | ALL    | PRIMARY                                                                         | NULL                    | NULL    | NULL                     | 3797 | Using where |
|  1 | SIMPLE      | rc    | ref    | recent_changes_user_id_idx,idx_recent_changes_user_case,idx_recent_changes_case | idx_recent_changes_case | 5       | hsaedmp_jason.c.id       |   20 | Using where |
|  1 | SIMPLE      | u     | eq_ref | PRIMARY                                                                         | PRIMARY                 | 4       | hsaedmp_jason.rc.user_id |    1 | Using index |
+----+-------------+-------+--------+---------------------------------------------------------------------------------+-------------------------+---------+--------------------------+------+-------------+

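I'm confused that EXPLAIN shows no key being used for the case table, when the recent_changes table seems to be the one that needs an INDEX on the ref_id column?

Here is the EXPLAIN with that index in place. It looks much better, but I haven't been able to test it in production yet.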

+----+-------------+-------+------------+--------+-------------------------------------------------------------------------------------------------------------------------------------+------------------------+---------+--------------------------+------+----------+-------------+
| id | select_type | table | partitions | type   | possible_keys                                                                                                                       | key                    | key_len | ref                      | rows | filtered | Extra       |
+----+-------------+-------+------------+--------+-------------------------------------------------------------------------------------------------------------------------------------+------------------------+---------+--------------------------+------+----------+-------------+
|  1 | SIMPLE      | rc    | NULL       | ref    | recent_changes_user_id_idx,idx_recent_changes_user_case,idx_recent_changes_case,idx_recent_changes_case_date,idx_recent_changes_ref | idx_recent_changes_ref | 5       | const                    | 2096 |     3.12 | Using where |
|  1 | SIMPLE      | u     | NULL       | eq_ref | PRIMARY                                                                                                                             | PRIMARY                | 4       | hsaedmp_jason.rc.user_id |    1 |   100.00 | Using index |
|  1 | SIMPLE      | c     | NULL       | eq_ref | PRIMARY                                                                                                                             | PRIMARY                | 4       | hsaedmp_jason.rc.case_id |    1 |    50.00 | Using where |
+----+-------------+-------+------------+--------+-------------------------------------------------------------------------------------------------------------------------------------+------------------------+---------+--------------------------+------+----------+-------------+
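
For reference, the EXPLAIN above (key idx_recent_changes_ref, key_len 5, ref const) is consistent with a single-column index on ref_id. The index name is taken from the EXPLAIN output, but its definition is not shown, so the column list in this sketch is an assumption:

```sql
-- Assumed definition: a single-column index on ref_id.
-- key_len = 5 would match one nullable 4-byte integer column.
ALTER TABLE recent_changes
  ADD INDEX idx_recent_changes_ref (ref_id);

-- Re-check the plan of the slow branch once the index exists.
EXPLAIN
SELECT count(*) AS cnt
FROM `user` u
LEFT JOIN `recent_changes` rc ON u.id = rc.user_id
LEFT JOIN `case` c ON c.id = rc.case_id
WHERE rc.type = 'conduct' AND rc.ref_id = '25'
  AND c.locked = 'N' AND rc.private != 'Y'
  AND seen = 'false';
```

With such an index the optimizer can start from the rows in recent_changes that match ref_id and look up case by primary key, instead of scanning every case row first.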

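UPDATE

I rewrote the query with a UNION statement, changed the JOIN order, and added a composite index on the recent_changes table, which brought the query response time down to under 10 milliseconds.

Here is the new query using the UNION statement.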

select count(*) as num
FROM (
(
SELECT rc1.*
FROM `user` u1 
LEFT JOIN `recent_changes` rc1 on u1.id = rc1.user_id 
LEFT JOIN `case` c1 on c1.id = rc1.case_id 
WHERE 
(rc1.user_id != c1.user_id AND c1.user_id = '1')
AND c1.locked = 'Y'
AND rc1.private != 'Y' 
AND seen = 'false'
ORDER BY rc1.datecreated DESC
)
UNION
(
SELECT rc.*
FROM `user` u 
LEFT JOIN `recent_changes` rc on u.id = rc.user_id 
LEFT JOIN `case` c on c.id = rc.case_id 
WHERE
(rc.type = 'conduct' AND rc.ref_id = '1')
AND c.locked = 'Y'
AND rc.private != 'Y' 
AND seen = 'false'
ORDER BY rc.datecreated DESC
)
) x;

And here is the index I created, based on what the final query needed.

ALTER TABLE recent_changes ADD INDEX idx_recent_changes_notification (type, ref_id, private, seen, user_id);
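
A possible variation, offered as an untested assumption rather than a measured result: MySQL stops using further key parts for range access after the first non-equality comparison, and private != 'Y' is such a comparison. Placing the equality columns (type, ref_id, seen) ahead of private might therefore let more of the index narrow the scan. The index name below is hypothetical:

```sql
-- Hypothetical alternative: equality columns first, inequality column after.
ALTER TABLE recent_changes
  ADD INDEX idx_recent_changes_notification_alt (type, ref_id, seen, private, user_id);

-- Compare the plans of both indexes on the 'conduct' branch before choosing one.
EXPLAIN
SELECT rc.*
FROM `recent_changes` rc
WHERE rc.type = 'conduct' AND rc.ref_id = '1'
  AND rc.private != 'Y'
  AND rc.seen = 'false';
```

At under 10 milliseconds the existing index may already be good enough, so this is only worth trying if the table grows substantially.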

Thanks, everyone, for your input!

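The smaller table should come first in the join clause. Which one that is depends on how many records each table holds. I think your user table is the smallest, so put it first. The 'rc' table looks like the largest, so you should put it last in the join.

For example: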

SELECT count(*) as cnt
FROM `user` u 
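-- Note: as written, c's ON clause references rc before rc is joined,
-- so MySQL rejects the query unless the two joins below are swapped.
LEFT JOIN `case` c on c.id = rc.case_id 
LEFT JOIN `recent_changes` rc on u.id = rc.user_id 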
WHERE 
(
    rc.user_id != c.user_id AND c.user_id = '25'
    OR
    (rc.type = 'conduct' AND rc.ref_id = '25')
)
AND c.locked = 'N'  AND rc.private != 'Y' 
AND seen = 'false'
ORDER BY rc.datecreated DESC;

See also the post below. It is about MSSQL, but the same point applies to almost every DBMS.

https://www.mssqltips.com/sqlservertutorial/3201/how-join-order-can-affect-the-query-plan/

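UPDATE

I looked at your question again and noticed another suspicious point: the ORDER BY clause. As the number of rows the query returns grows, the time cost of 'order by' increases sharply. In my experience this is a common problem. Have you tried removing the ORDER BY clause? Is it much faster?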
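
As a sketch of that suggestion: the counting query returns a single count(*) row, so the ORDER BY has nothing to order in the result and can be dropped there. The sort would then only be needed in the separate query that fetches the paginated rows. This reuses the tables and literal values from the question unchanged:

```sql
-- Counting query without ORDER BY: a single aggregate row has no order.
SELECT count(*) AS cnt
FROM `user` u
LEFT JOIN `recent_changes` rc ON u.id = rc.user_id
LEFT JOIN `case` c ON c.id = rc.case_id
WHERE
(
    (rc.user_id != c.user_id AND c.user_id = '25')
    OR
    (rc.type = 'conduct' AND rc.ref_id = '25')
)
AND c.locked = 'N' AND rc.private != 'Y'
AND seen = 'false';
```

Whether this changes the run time depends on the execution plan, so it is worth timing both versions.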