Avoid "filesort" on UNION RESULT

Question

Subquery 1:

SELECT * from big_table
where category = 'fruits' and name = 'apple'
order by yyyymmdd desc

EXPLAIN:

table       |   key           |   extra
big_table   |   name_yyyymmdd |   using where

Looks great!

Subquery 2:

SELECT * from big_table
where category = 'fruits' and (taste = 'sweet' or wildcard = '*')
order by yyyymmdd desc

EXPLAIN:

table       |   key               |   extra
big_table   |   category_yyyymmdd |   using where

Looks great!

Now, if I combine them with UNION:

SELECT * from big_table
where category = 'fruits' and name = 'apple'

UNION

SELECT * from big_table
where category = 'fruits' and (taste = 'sweet' or wildcard = '*')

Order by yyyymmdd desc

EXPLAIN:

table       |   key      |   extra
big_table   |   name     |   using index condition, using where
big_table   |   category |   using index condition
UNION RESULT|   NULL     |   using temporary; using filesort

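Not so good: it uses filesort.

This is a trimmed-down version of a more complex query. Here are some facts about big_table:

big_table has 10M+ rows
There are 5 distinct "category" values
There are 5 distinct "taste" values
There are about 10,000 distinct "name" values
There are about 10,000 distinct "yyyymmdd" values
I have created a single-column index on each field, plus composite indexes such as yyyymmdd_category_taste_name, but MySQL is not using them.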

Answer 1

This should also work without UNION:

SELECT * from big_table
where 
    ( category = 'fruits' and name = 'apple' )
    OR
    ( category = 'fruits' and (taste = 'sweet' or wildcard = '*') )
ORDER BY yyyymmdd desc;

Answer 2

SELECT * FROM big_table
    WHERE category = 'fruits'
      AND (  name = 'apple'
          OR taste = 'sweet'
          OR wildcard = '*' )
    ORDER BY yyyymmdd DESC

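And have INDEX(category) or some index starting with category. However, if more than about 20% of the table has category = 'fruits', the optimizer will probably decide to ignore the index and simply do a table scan. (Since you say there are only 5 categories, I suspect the optimizer will rightly shun the index.)

Or this might be beneficial: INDEX(category, yyyymmdd), in that order.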
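As a rough sketch of that suggestion, the index could be added and checked as below. The index name idx_category_yyyymmdd is made up for illustration, and what EXPLAIN actually reports depends on the MySQL version and the data distribution.

```sql
-- Hypothetical index name; the columns are in the order suggested above.
ALTER TABLE big_table ADD INDEX idx_category_yyyymmdd (category, yyyymmdd);

-- If the optimizer chooses this index, rows for category = 'fruits' are read
-- already ordered by yyyymmdd, so Extra should not need "Using filesort".
EXPLAIN
SELECT * FROM big_table
    WHERE category = 'fruits'
      AND (  name = 'apple'
          OR taste = 'sweet'
          OR wildcard = '*' )
    ORDER BY yyyymmdd DESC;
```

The name, taste, and wildcard tests are then applied as a row filter ("Using where") rather than through the index.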

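The UNION had to do a sort (either in memory or on disk, it is not clear which) because it could not fetch the rows in the desired order.

A composite index INDEX(yyyymmdd, ...) might be used to avoid the 'filesort', but then it would not use any columns after yyyymmdd.

When building a composite index, start with any WHERE columns that are compared with '='. After that you can add one range, or the GROUP BY, or the ORDER BY.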

UNION is often a good choice for avoiding a slow OR, but in this case it would need three indexes

INDEX(category, name)
INDEX(category, taste)
INDEX(category, wildcard)

and adding yyyymmdd would not help unless you add a LIMIT.
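For reference, the three indexes could be created in one statement, roughly as follows. The index names are hypothetical; only the column lists come from the answer.

```sql
-- One index per UNION branch: the shared '=' column first,
-- then the column that differs between branches.
ALTER TABLE big_table
    ADD INDEX idx_category_name     (category, name),
    ADD INDEX idx_category_taste    (category, taste),
    ADD INDEX idx_category_wildcard (category, wildcard);
```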

The query would be:

( SELECT * FROM big_table WHERE category = 'fruits' AND name = 'apple' )
UNION DISTINCT
( SELECT * FROM big_table WHERE category = 'fruits' AND taste = 'sweet' )
UNION DISTINCT
( SELECT * FROM big_table WHERE category = 'fruits' AND wildcard = '*' )
ORDER BY yyyymmdd DESC

Adding a LIMIT would be even messier. First tack yyyymmdd onto the end of each of the three composite indexes, then

( SELECT ... ORDER BY yyyymmdd DESC LIMIT 10 )
UNION DISTINCT
( SELECT ... ORDER BY yyyymmdd DESC LIMIT 10 )
UNION DISTINCT
( SELECT ... ORDER BY yyyymmdd DESC LIMIT 10 )
ORDER BY yyyymmdd DESC  LIMIT 10
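Spelled out, that pattern might look like the sketch below. It assumes the elided SELECTs are the three branches of the earlier UNION query, and the index names are invented for illustration.

```sql
-- Hypothetical names; each index ends with yyyymmdd so that a branch
-- can read its newest rows in index order and stop after 10.
ALTER TABLE big_table
    ADD INDEX idx_cat_name_date     (category, name, yyyymmdd),
    ADD INDEX idx_cat_taste_date    (category, taste, yyyymmdd),
    ADD INDEX idx_cat_wildcard_date (category, wildcard, yyyymmdd);

( SELECT * FROM big_table WHERE category = 'fruits' AND name = 'apple'
    ORDER BY yyyymmdd DESC LIMIT 10 )
UNION DISTINCT
( SELECT * FROM big_table WHERE category = 'fruits' AND taste = 'sweet'
    ORDER BY yyyymmdd DESC LIMIT 10 )
UNION DISTINCT
( SELECT * FROM big_table WHERE category = 'fruits' AND wildcard = '*'
    ORDER BY yyyymmdd DESC LIMIT 10 )
ORDER BY yyyymmdd DESC LIMIT 10;
```

The outer ORDER BY still sorts the UNION result, but that result now holds at most 30 rows instead of every matching row.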

Adding an OFFSET would be even worse.
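To see why, here is a sketch of fetching the third page of 10 rows (skipping 20). It assumes the same three branches as the earlier UNION query. Each branch has to return offset + count rows, because any one branch might supply the whole page, and only the outer query applies the offset.

```sql
-- Page 3, 10 rows per page: inner LIMIT = 20 + 10, outer LIMIT = 20, 10.
( SELECT * FROM big_table WHERE category = 'fruits' AND name = 'apple'
    ORDER BY yyyymmdd DESC LIMIT 30 )
UNION DISTINCT
( SELECT * FROM big_table WHERE category = 'fruits' AND taste = 'sweet'
    ORDER BY yyyymmdd DESC LIMIT 30 )
UNION DISTINCT
( SELECT * FROM big_table WHERE category = 'fruits' AND wildcard = '*'
    ORDER BY yyyymmdd DESC LIMIT 30 )
ORDER BY yyyymmdd DESC LIMIT 20, 10;
```

The work per branch grows with the page number, which is why deep pagination gets progressively slower with this approach.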

Two other techniques, a "covering" index and "lazy lookup", might help, but I doubt it.
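One common reading of "lazy lookup" is sketched below for a single branch: find the primary keys of the wanted rows using only an index, then fetch the full rows for just those keys. This assumes big_table is an InnoDB table with a primary key column named id and an index on (category, name, yyyymmdd); neither is stated in the question.

```sql
-- The inner query can be satisfied from the (category, name, yyyymmdd) index
-- alone, since InnoDB secondary indexes also carry the primary key.
SELECT b.*
FROM ( SELECT id
         FROM big_table
        WHERE category = 'fruits' AND name = 'apple'
        ORDER BY yyyymmdd DESC
        LIMIT 10 ) AS ids
JOIN big_table AS b ON b.id = ids.id   -- only 10 full-row lookups
ORDER BY b.yyyymmdd DESC;
```

This only pays off when a LIMIT keeps the inner result small, which fits the doubt expressed above about how much it would help here.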

Yet another technique is to put all the words in the same column and use a FULLTEXT index. But that can be problematic for several reasons.

mysql

indexing

query-optimization

filesort