Query with three joins incredibly slow
I'm trying to return all the countries that have football matches played on a specific date. The data is defined in the following tables:
competition
id | country_id | name
50 1 Premier League
competition_seasons
id | competition_id | name
70 50 2019
competition_rounds
id | season_id | name
58 70 Regular Season
match
id | round_id | home | away | result | datetime
44 58 22 87 1 - 0 2019-03-16:00:00
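The competition table stores the different competitions, and each competition can have multiple seasons, which are stored in competition_seasons. A season can also have different rounds, which are stored in competition_rounds.
All matches are stored in the match table and are grouped by round_id.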
I wrote this method for the API:
$app->get('/country/get_countries/{date}', function (Request $request, Response $response, array $args)
{
$start_date = $args["date"] . " 00:00";
$end_date = $args["date"] . " 23:59";
$sql = $this->db->query("SELECT n.* FROM country n
LEFT JOIN competition c ON c.country_id = n.id
LEFT JOIN competition_seasons s ON s.competition_id = c.id
LEFT JOIN competition_rounds r ON r.season_id = s.id
LEFT JOIN `match` m ON m.round_id = r.id
WHERE m.datetime BETWEEN '" . $start_date . "' AND '" . $end_date . "'
GROUP BY n.id");
$sql->execute();
$countries = $sql->fetchAll();
return $response->withJson($countries);
});
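There are thousands of records organized by id, but the query takes about 6 to 7 seconds to return all the countries that play on the specified date.
How can I optimize this?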
UPDATE
I noticed something interesting. If I do this:
SELECT round_id, DATE("2019-03-18") FROM `match`
the query is really fast, so I suspect the datetime field is slowing down the join part. Any idea?
Table structure
CREATE TABLE IF NOT EXISTS `swp`.`competition` (
`id` INT NOT NULL,
`country_id` INT NULL,
`name` VARCHAR(255) NULL,
`category` INT NULL,
PRIMARY KEY (`id`),
INDEX `id_idx` (`country_id` ASC),
INDEX `FK_competition_types_competition_type_id_idx` (`category` ASC),
CONSTRAINT `FK_country_competition_country_id`
FOREIGN KEY (`country_id`)
REFERENCES `swp`.`country` (`id`)
ON DELETE NO ACTION
ON UPDATE NO ACTION,
CONSTRAINT `FK_competition_categories_competition_category_id`
FOREIGN KEY (`category`)
REFERENCES `swp`.`competition_categories` (`id`)
ON DELETE NO ACTION
ON UPDATE NO ACTION)
ENGINE = InnoDB;
CREATE TABLE IF NOT EXISTS `swp`.`competition_seasons` (
`id` INT NOT NULL AUTO_INCREMENT,
`competition_id` INT NOT NULL,
`season_id` INT NULL,
`name` VARCHAR(45) NOT NULL,
`update_at` DATETIME NULL,
PRIMARY KEY (`id`),
INDEX `FK_competition_competition_seasons_competition_id_idx` (`competition_id` ASC),
CONSTRAINT `FK_competition_competition_seasons_competition_id`
FOREIGN KEY (`competition_id`)
REFERENCES `swp`.`competition` (`id`)
ON DELETE NO ACTION
ON UPDATE NO ACTION)
ENGINE = InnoDB;
CREATE TABLE IF NOT EXISTS `swp`.`competition_rounds` (
`id` INT NOT NULL AUTO_INCREMENT,
`round_id` INT NULL,
`season_id` INT NOT NULL,
`name` VARCHAR(255) NULL,
PRIMARY KEY (`id`),
INDEX `FK_competition_seasons_competition_rounds_season_id_idx` (`season_id` ASC),
CONSTRAINT `FK_competition_seasons_competition_rounds_season_id`
FOREIGN KEY (`season_id`)
REFERENCES `swp`.`competition_seasons` (`id`)
ON DELETE NO ACTION
ON UPDATE NO ACTION)
ENGINE = InnoDB;
-- -----------------------------------------------------
-- Table `swp`.`match`
-- -----------------------------------------------------
CREATE TABLE IF NOT EXISTS `swp`.`match` (
`id` INT NOT NULL,
`round_id` INT NOT NULL,
`group_id` INT NULL,
`datetime` DATETIME NULL,
`status` INT NULL,
`gameweek` INT NULL,
`home_team_id` INT NULL,
`home_team_half_time_score` INT NULL,
`home_team_score` INT NULL,
`home_extra_time` INT NULL,
`home_penalties` INT NULL,
`away_team_id` INT NULL,
`away_team_half_time_score` INT NULL,
`away_team_score` INT NULL,
`away_extra_time` INT NULL,
`away_penalties` INT NULL,
`venue_id` INT NULL,
`venue_attendance` INT NULL,
`aggregate_match_id` INT NULL,
PRIMARY KEY (`id`),
INDEX `home_team_id_idx` (`home_team_id` ASC),
INDEX `away_team_id_idx` (`away_team_id` ASC),
INDEX `venue_id_idx` (`venue_id` ASC),
INDEX `match_status_id_idx` (`status` ASC),
INDEX `FK_competition_rounds_match_round_id_idx` (`round_id` ASC),
INDEX `FK_match_match_aggregate_match_id_idx` (`aggregate_match_id` ASC),
INDEX `FK_competition_groups_match_group_id_idx` (`group_id` ASC),
CONSTRAINT `FK_team_match_home_team_id`
FOREIGN KEY (`home_team_id`)
REFERENCES `swp`.`team` (`id`)
ON DELETE NO ACTION
ON UPDATE NO ACTION,
CONSTRAINT `FK_team_match_away_team_id`
FOREIGN KEY (`away_team_id`)
REFERENCES `swp`.`team` (`id`)
ON DELETE NO ACTION
ON UPDATE NO ACTION,
CONSTRAINT `FK_venue_match_venue_id`
FOREIGN KEY (`venue_id`)
REFERENCES `swp`.`venue` (`id`)
ON DELETE NO ACTION
ON UPDATE NO ACTION,
CONSTRAINT `FK_match_status_match_status_id`
FOREIGN KEY (`status`)
REFERENCES `swp`.`match_status` (`id`)
ON DELETE NO ACTION
ON UPDATE NO ACTION,
CONSTRAINT `FK_competition_rounds_match_round_id`
FOREIGN KEY (`round_id`)
REFERENCES `swp`.`competition_rounds` (`id`)
ON DELETE NO ACTION
ON UPDATE NO ACTION,
CONSTRAINT `FK_match_match_aggregate_match_id`
FOREIGN KEY (`aggregate_match_id`)
REFERENCES `swp`.`match` (`id`)
ON DELETE NO ACTION
ON UPDATE NO ACTION,
CONSTRAINT `FK_competition_groups_match_group_id`
FOREIGN KEY (`group_id`)
REFERENCES `swp`.`competition_groups` (`id`)
ON DELETE NO ACTION
ON UPDATE NO ACTION)
ENGINE = InnoDB;
First, write the query as:
SELECT n.*
FROM country n JOIN
competition c
ON c.country_id = n.id JOIN
competition_seasons s
ON s.competition_id = c.id JOIN
competition_rounds r
ON r.season_id = s.id JOIN
`match` m
ON m.round_id = r.id
WHERE m.datetime >= ? AND
m.datetime < ?
GROUP BY n.id;
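The changes here are minor and will not affect performance, but they are important:
- JOIN instead of LEFT JOIN, because you require the conditions to match.
- Date parameters instead of munging the query string, because that avoids SQL injection and quoting errors.
- >= and < for the comparisons, because this works with both dates and datetimes. You will need to add 1 day to the end date, but don't include a time component.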
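To make the last point concrete, here is a minimal sketch of the half-open range for a single day. It assumes MySQL (the backticks and ENGINE = InnoDB in the question suggest it) and uses 2019-03-16 purely as an example date. Unlike BETWEEN '... 00:00' AND '... 23:59', this range also covers matches that start during the last minute of the day.

```sql
-- Example date only; in the application both bounds would be bound as parameters (?).
SELECT n.*
FROM country n
JOIN competition c         ON c.country_id = n.id
JOIN competition_seasons s ON s.competition_id = c.id
JOIN competition_rounds r  ON r.season_id = s.id
JOIN `match` m             ON m.round_id = r.id
WHERE m.datetime >= '2019-03-16'                   -- start of the day, inclusive
  AND m.datetime <  '2019-03-16' + INTERVAL 1 DAY  -- start of the next day, exclusive
GROUP BY n.id;
```

Because the upper bound is exclusive, the same pattern works whether the column holds whole seconds or fractional seconds.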
Then, for performance, you want indexes:
match(datetime, round_id)
competition_rounds(id, season_id)
competition_seasons(id, competition_id)
competition(id, country_id)
country(id)
Actually, the first is the most important. The last four are not needed if the respective id columns are declared as primary keys.
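A minimal sketch of creating the first index, which is the only one in the list that the table definitions in the question do not already cover. It assumes MySQL, and the index name match_datetime_round_id_idx is made up for illustration. EXPLAIN can then be used to check whether the optimizer actually picks the index for the date filter.

```sql
-- Hypothetical index name. datetime comes first so the range filter can use the index;
-- round_id is included so the join column can be read from the index itself.
ALTER TABLE `swp`.`match`
  ADD INDEX `match_datetime_round_id_idx` (`datetime`, `round_id`);

-- Inspect the plan for the date filter; the dates are examples only.
EXPLAIN
SELECT m.round_id
FROM `swp`.`match` m
WHERE m.datetime >= '2019-03-16'
  AND m.datetime <  '2019-03-17';
```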
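With LEFT JOIN, the query can only be executed from top to bottom, which means the last table is scanned for every combination (product) of entries in the preceding tables. Besides, using LEFT JOIN and GROUP BY without any aggregation makes no sense, because it would always return all country IDs. That being said, I would rewrite it like this: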
SELECT DISTINCT
c.country_id
FROM
competition c
WHERE
EXISTS (
SELECT
*
FROM
competition_seasons s,
competition_rounds r,
`match` m
WHERE
s.competition_id = c.id
AND r.season_id = s.id
AND m.round_id = r.id
AND m.datetime BETWEEN ...
)
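This will be optimized properly by all the RDBs I know of.
Note that a 2-column index on (match.datetime, match.round_id), in that order, will have a huge performance impact. If write speed is a concern, at least a single-column index on (match.datetime) is recommended.
Important note on indexing strings: string comparisons are always quirky in RDBs. Make sure to use a binary collation for the datetime column, or use the native DATETIME format. Various RDBs may not be able to use indexes on case-insensitive columns.
Note that I removed the join on n; it would only add another PK lookup to check that the country still exists in the country table. If you don't have any ON DELETE CASCADE or other kind of constraint ensuring data consistency, you can add it back like this: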
SELECT DISTINCT
n.id
FROM
country n
WHERE
EXISTS (
SELECT
*
FROM
competition c,
competition_seasons s,
competition_rounds r,
`match` m
WHERE
c.country_id=n.id
AND s.competition_id = c.id
AND r.season_id = s.id
AND m.round_id = r.id
AND m.datetime BETWEEN ...
)