Many-to-Many matching multiple optimization
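I have an application with a many-to-many relationship. I need to select all rows from one table that are associated with every row in a variable-sized set of rows from another table.
For example, I need to select all foo entities that are associated with bar entities A, B, C, and E. The user can select 1, 5, 12, or 50 bar entities to filter the foo entities by.
Relevant fields from the tables (ids use uuid):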
/* ~20k rows */
CREATE TABLE `foo` (
`id` char(36) COLLATE utf8_unicode_ci NOT NULL,
`title` text COLLATE utf8_unicode_ci NOT NULL,
PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci
/* ~30k rows */
CREATE TABLE `bar` (
`id` char(36) COLLATE utf8_unicode_ci NOT NULL,
PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci
/* ~150k rows */
CREATE TABLE `foo_bar` (
`id` char(36) COLLATE utf8_unicode_ci NOT NULL,
`foo_id` char(36) COLLATE utf8_unicode_ci DEFAULT NULL,
`bar_id` char(36) COLLATE utf8_unicode_ci DEFAULT NULL,
PRIMARY KEY (`id`),
KEY `foo_id_foreign` (`foo_id`),
KEY `bar_id_foreign` (`bar_id`),
CONSTRAINT `bar_id_foreign` FOREIGN KEY (`bar_id`)
REFERENCES `bar` (`id`) ON DELETE CASCADE,
CONSTRAINT `foo_id_foreign` FOREIGN KEY (`foo_id`)
REFERENCES `foo` (`id`) ON DELETE CASCADE
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci
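I have tried two different approaches that I saw in various SO answers: multiple joins and a subquery. Multiple joins seem to work reasonably well but do not appear to be highly scalable. The subquery seems like it should scale better, but it ran for hours.
Multiple joins. This works, but as expected, each additional join increases the elapsed time exponentially. 3 bars take about 800ms, which is definitely high. The explain looks reasonable.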
select `foo`.*
from `foo`
inner join foo_bar `fb1` on `fb1`.`foo_id` = `foo`.`id`
inner join bar `b1` on `b1`.`id` = `fb1`.`bar_id` AND `b1`.`id` = :some_uuid1
inner join foo_bar `fb2` on `fb2`.`foo_id` = `foo`.`id`
inner join bar `b2` on `b2`.`id` = `fb2`.`bar_id` AND `b2`.`id` = :some_uuid2
inner join foo_bar `fb3` on `fb3`.`foo_id` = `foo`.`id`
inner join bar `b3` on `b3`.`id` = `fb3`.`bar_id` AND `b3`.`id` = :some_uuid3
group by `foo`.`id`
order by `foo`.`title` asc
limit 25 offset 0
Subquery. Runs indefinitely. where in (subquery) has the same effect as inner join subquery, although the explain ends up looking a bit different.
select `foo`.*
from `foo`
inner join (
select `foo_id`
from `foo_bar`
inner join `bar`
on `bar`.`id` = `foo_bar`.`bar_id`
where `bar`.`id` in (:some_uuid1, :some_uuid2, :some_uuid3)
group by `foo_id`
having COUNT(*) = 3
) as `subset` on `foo`.`id` = `subset`.`foo_id`
order by `foo`.`title` asc
limit 25 offset 0
Explain:
id  select_type  table    type    key             key_len  rows   extra
1   PRIMARY      derived  ALL     NULL            NULL     6618   Using temporary; Using filesort
1   PRIMARY      foo      eq_ref  PRIMARY         108      1
2   DERIVED      bar      const   PRIMARY         108      1      Using index; Using temporary; Using filesort
2   DERIVED      foo_bar  ref     bar_id_foreign  109      16094  Using where
My question is: are there any optimizations I can apply to make this usable and scalable?
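Your normalization is fine. It's good that you have a junction table foo_bar to handle the many-to-many relationship.
As for optimizing your JOIN, you don't need to add a new join every time you want to check a new id; you can use the IN operator: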
INNER JOIN foo_bar fb1 ON fb1.foo_id = foo.id
AND fb1.bar_id IN (:some_uuid1, :some_uuid2, :some_uuid3)
Then, if you want to get the rows that match all three, the whole query would look like this:
SELECT foo.id, foo.title
FROM foo
INNER JOIN foo_bar fb ON fb.foo_id = foo.id AND fb.bar_id IN (:some_uuid1, :some_uuid2, :some_uuid3)
GROUP BY foo.id
HAVING COUNT(*) = 3 -- 3 is the number of ids in the IN list
ORDER BY foo.title
LIMIT 25;
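Since the number of selected bar ids varies per request, the IN list and the HAVING count have to be generated together. Below is a minimal sketch of that idea, not a definitive implementation. It uses Python's sqlite3 with an in-memory database as a stand-in for MySQL, and the function names `foos_matching_all` and `demo_connection`, as well as the sample rows, are hypothetical. It also assumes that duplicate (foo_id, bar_id) links may exist, because the schema above has no unique key on that pair, so it counts `DISTINCT fb.bar_id` rather than `COUNT(*)`.

```python
import sqlite3


def foos_matching_all(conn, bar_ids, limit=25, offset=0):
    """Return (id, title) rows for foo entities linked to every id in bar_ids."""
    # De-duplicate the requested ids so the HAVING count matches the IN list.
    wanted = list(dict.fromkeys(bar_ids))
    if not wanted:
        return []
    # One bound parameter per id; the values never enter the SQL text itself.
    placeholders = ", ".join("?" for _ in wanted)
    sql = f"""
        SELECT foo.id, foo.title
        FROM foo
        INNER JOIN foo_bar fb ON fb.foo_id = foo.id
        WHERE fb.bar_id IN ({placeholders})
        GROUP BY foo.id, foo.title
        HAVING COUNT(DISTINCT fb.bar_id) = ?
        ORDER BY foo.title
        LIMIT ? OFFSET ?
    """
    params = (*wanted, len(wanted), limit, offset)
    return conn.execute(sql, params).fetchall()


def demo_connection():
    """Build a tiny in-memory dataset (hypothetical rows, for illustration only)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE foo (id TEXT PRIMARY KEY, title TEXT NOT NULL);
        CREATE TABLE bar (id TEXT PRIMARY KEY);
        CREATE TABLE foo_bar (
            id TEXT PRIMARY KEY,
            foo_id TEXT REFERENCES foo(id),
            bar_id TEXT REFERENCES bar(id)
        );
    """)
    conn.executemany(
        "INSERT INTO foo VALUES (?, ?)",
        [("f1", "Alpha"), ("f2", "Beta"), ("f3", "Gamma")],
    )
    conn.executemany("INSERT INTO bar VALUES (?)", [("A",), ("B",), ("C",)])
    # f3 is linked to A twice to show why DISTINCT matters.
    links = [("f1", "A"), ("f1", "B"), ("f1", "C"),
             ("f2", "A"), ("f2", "B"),
             ("f3", "A"), ("f3", "A")]
    conn.executemany(
        "INSERT INTO foo_bar VALUES (?, ?, ?)",
        [(f"l{i}", foo_id, bar_id) for i, (foo_id, bar_id) in enumerate(links)],
    )
    return conn


if __name__ == "__main__":
    conn = demo_connection()
    print(foos_matching_all(conn, ["A", "B"]))
```

With the sample data, asking for A and B leaves out f3 even though it has two link rows, because both rows point at the same bar. The same statement shape should carry over to MySQL with the driver's own placeholder syntax, though the timings on the real tables would need to be checked with EXPLAIN.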