How to improve an indexed inner join query in MySQL?

Question

This is my first question on the forum, so if there is anything I could improve about it, please let me know.

I have a large database with two tables.

"visit" (6 million rows) basically stores every visit to a website:

    | visitdate           | city     |
    ----------------------------------
    | 2014-12-01 00:00:02 | Paris    |
    | 2015-01-03 00:00:02 | Marseille|

"cityweather" (1 million rows) stores weather information three times a day for many cities:

    | weatherdate           | city     |
    ------------------------------------
    | 2014-12-01 09:00:02   | Paris    |
    | 2014-12-01 09:00:02   | Marseille|

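Note that the visit table may contain cities that are not in cityweather, and vice versa; I only need to select the cities common to both tables.

I started with a large query that I tried to run, but it failed. I therefore went back to the simplest possible query joining these two tables, but its performance is terrible.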

SELECT COUNT(DISTINCT(t.city)) 
FROM visit t 
INNER JOIN cityweather d
ON t.city = d.city;

To be precise, both tables are indexed on the city column, and I have already run COUNT(DISTINCT(city)) on each table independently; each took less than a second.

Below is the EXPLAIN output for this query:

    | id | select_type | table | type  | possible_keys | key      | key_len | ref          | rows         | Extra                    |
    ----------------------------------
    | 1  |  SIMPLE     | d     | index | idx_city      | idx_city | 303     | NULL         | 1190553      | Using where; Using index |
    | 1  |  SIMPLE     | t     | ref   | Idxcity       | Idxcity  | 303     | meteo.d.city | 465          | Using index              |

Below is information about the tables, in particular the engine of each:

visit

    | Name  | Engine | Version | Row_Format | Rows    | Avg_row_len | Data_len  | Max_data_len | Index_len | Data_free |
    --------------------------------------------------------------------------------------------------------------------
    | visit | InnoDB | 10      | Compact    | 6208060 | 85          | 531628032 | 0            | 0         | 0         |

SHOW CREATE TABLE output :

CREATE TABLE `visit` (
`productid` varchar(8) DEFAULT NULL,
`visitdate` datetime DEFAULT NULL,
`minute` int(2) DEFAULT NULL,
`hour` int(2) DEFAULT NULL,
`weekday` int(1) DEFAULT NULL,
`quotation` int(10) unsigned DEFAULT NULL,
`amount` int(10) unsigned DEFAULT NULL,
`city` varchar(100) DEFAULT NULL,
`weathertype` varchar(30) DEFAULT NULL,
`temp` int(11) DEFAULT NULL,
`pressure` int(11) DEFAULT NULL,
`humidity` int(11) DEFAULT NULL,
KEY `Idxvisitdate` (`visitdate`),
KEY `Idxcity` (`city`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8

cityweather

    | Name        | Engine | Version | Row_Format | Rows    | Avg_row_len | Data_len  | Max_data_len | Index_len | Data_free |
    ------------------------------------------------------------------------------------------------------------------------------
    | cityweather | InnoDB | 10      | Compact    | 1190553 | 73          | 877670784 | 0            | 0         | 30408704  |

SHOW CREATE TABLE output :

CREATE TABLE `cityweather` (
`city` varchar(100) DEFAULT NULL,
`lat` decimal(13,9) DEFAULT NULL,
`lon` decimal(13,9) DEFAULT NULL,
`weatherdate` datetime DEFAULT NULL,
`temp` int(11) DEFAULT NULL,
`pressure` int(11) DEFAULT NULL,
`humidity` int(11) DEFAULT NULL,
KEY `Idxweatherdate` (`weatherdate`),
KEY `idx_city` (`city`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8

I have a feeling the problem comes from type = index and ref = NULL, but I don't know how to fix it...

You can find here a close question that did not help me solve my problem

Thanks!

Answer 1

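Your query is slow because the indexes it uses cannot reduce the number of rows to a manageable amount. Look at your EXPLAIN output: it tells you that using the index on city (idx_city) in the cityweather table requires processing 1,190,553 rows. Joining to your visit table by city then requires an estimated 465 more rows from that table for each of them.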

As a result, your database has to process 1,190,553 x 465 rows, which is roughly 550 million.

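With the query as it stands, you cannot improve its performance. You can, however, modify the query, for example by adding conditions on the visit data to narrow down the result. Also try the various EXISTS queries.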
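As a rough sketch of what "adding conditions on the visit data" combined with EXISTS could look like: the one-month date range below is purely an assumption chosen for illustration, and whether it helps depends on how selective the Idxvisitdate index is for your data.

```sql
-- Sketch only: the date range is a made-up example, not from the question.
-- Restrict visit by date first, then keep only cities that also
-- appear in cityweather.
SELECT COUNT(DISTINCT t.city)
FROM visit t
WHERE t.visitdate >= '2014-12-01'
  AND t.visitdate <  '2015-01-01'
  AND EXISTS (SELECT 1
              FROM cityweather d
              WHERE d.city = t.city);
```

Prefixing the statement with EXPLAIN shows whether the optimizer actually picks the date index before you run it on the full table.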

Update

Maybe this helps:

CREATE TEMPORARY TABLE tmpTbl 
SELECT distinct city as city from cityweather;

ALTER TABLE tmpTbl Add index adweerf (city);

SELECT COUNT(DISTINCT(city)) FROM visit WHERE city in (SELECT city from tmpTbl);

Answer 2

Since IN ( SELECT ... ) is poorly optimized, change

SELECT COUNT(DISTINCT(city)) FROM visit WHERE city in (SELECT city from tmpTbl);

to

SELECT COUNT(*)
    FROM ( SELECT DISTINCT city FROM cityweather ) x
    WHERE EXISTS( SELECT * FROM visit
                   WHERE city = x.city );

Both tables need (and have) an index on city. I'm pretty sure it is better to put the smaller table (cityweather) in the SELECT DISTINCT.
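To check that rather than take it on faith, one option is to time the mirrored form, with the roles of the two tables swapped. This is only a sketch; the alias y is arbitrary, and as long as both city columns use the same collation it should return the same count, since each statement counts the distinct non-NULL cities present in both tables.

```sql
-- Mirrored variant for comparison: DISTINCT over the larger table (visit),
-- existence probe into the smaller one (cityweather).
SELECT COUNT(*)
    FROM ( SELECT DISTINCT city FROM visit ) y
    WHERE EXISTS( SELECT * FROM cityweather
                   WHERE city = y.city );
```

Running each statement with EXPLAIN in front, and then timing both, shows which ordering the optimizer handles better on your data.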

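Other points:

Every InnoDB table really should have a PRIMARY KEY.
You can save a lot of space by using TINYINT UNSIGNED (1 byte) and the like instead of always using a 4-byte INT.
Nine decimal places for lat/lon is excessive for cities and takes 12 bytes for the pair. I vote for DECIMAL(4,2)/(5,2) (1.6km / 1mi resolution; 5 bytes) or DECIMAL(6,4)/(7,4) (16m/52ft; 7 bytes).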
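The points above could be applied roughly as follows. This is a sketch under assumptions: the surrogate id column is hypothetical (neither table has an obvious natural key in the posted schema), and the choice of which columns to shrink is illustrative. Both statements rebuild the table and the DECIMAL change rounds existing values, so try it on a copy first.

```sql
-- Assumption: no natural key exists, so a surrogate AUTO_INCREMENT id is added.
ALTER TABLE visit
  ADD COLUMN id INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY FIRST,
  MODIFY `minute`  TINYINT UNSIGNED DEFAULT NULL,  -- 0..59 fits in 1 byte
  MODIFY `hour`    TINYINT UNSIGNED DEFAULT NULL,  -- 0..23
  MODIFY `weekday` TINYINT UNSIGNED DEFAULT NULL;  -- 0..6 or 1..7

ALTER TABLE cityweather
  ADD COLUMN id INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY FIRST,
  MODIFY lat DECIMAL(6,4) DEFAULT NULL,  -- -90.0000 .. 90.0000
  MODIFY lon DECIMAL(7,4) DEFAULT NULL;  -- -180.0000 .. 180.0000
```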


mysql

indexing

join

query-performance