Many-to-Many matching multiple optimization

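I have an application with a many-to-many relationship. I need to select all rows from one table that are associated with every row in a variable set of rows from another table.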

For example, I need to select all foo entities that are associated with bar entities A, B, C, and E. A user may select 1, 5, 12, or 50 bar entities to filter the foo entities by.

Relevant fields from the tables (ids are UUIDs):

/* ~20k rows */
CREATE TABLE `foo` (
  `id` char(36) COLLATE utf8_unicode_ci NOT NULL,
  `title` text COLLATE utf8_unicode_ci NOT NULL,
  PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci

/* ~30k rows */
CREATE TABLE `bar` (
  `id` char(36) COLLATE utf8_unicode_ci NOT NULL,
  PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci

/* ~150k rows */
CREATE TABLE `foo_bar` (
  `id` char(36) COLLATE utf8_unicode_ci NOT NULL,
  `foo_id` char(36) COLLATE utf8_unicode_ci DEFAULT NULL,
  `bar_id` char(36) COLLATE utf8_unicode_ci DEFAULT NULL,
  PRIMARY KEY (`id`),
  KEY `foo_id_foreign` (`foo_id`),
  KEY `bar_id_foreign` (`bar_id`),
  CONSTRAINT `bar_id_foreign` FOREIGN KEY (`bar_id`) 
      REFERENCES `bar` (`id`) ON DELETE CASCADE,
  CONSTRAINT `foo_id_foreign` FOREIGN KEY (`foo_id`) 
      REFERENCES `foo` (`id`) ON DELETE CASCADE
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci

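I've tried two different approaches that I saw in various SO answers: multiple joins and a subquery. The multiple joins seem to work reasonably well but don't appear to be very scalable. The subquery looks like it should scale better, but it runs for hours.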

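Multiple joins. This works, but as expected, each additional join increases the elapsed time exponentially. 3 bars take about 800ms, which is definitely on the high side. The explain looks reasonable.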

select `foo`.* 
from `foo`
inner join foo_bar `fb1` on `fb1`.`foo_id` = `foo`.`id`
inner join bar `b1` on `b1`.`id` = `fb1`.`bar_id` AND `b1`.`id` = :some_uuid1
inner join foo_bar `fb2` on `fb2`.`foo_id` = `foo`.`id`
inner join bar `b2` on `b2`.`id` = `fb2`.`bar_id` AND `b2`.`id` = :some_uuid2
inner join foo_bar `fb3` on `fb3`.`foo_id` = `foo`.`id`
inner join bar `b3` on `b3`.`id` = `fb3`.`bar_id` AND `b3`.`id` = :some_uuid3
group by `foo`.`id`
order by `foo`.`title` asc 
limit 25 offset 0

Subquery. Runs indefinitely. Using where in (subquery) and inner join (subquery) has the same effect, though the explain ends up looking a bit different.

select `foo`.* 
from `foo`
inner join (
    select `foo_id` 
    from `foo_bar` 
    inner join `bar` 
        on `bar`.`id` = `foo_bar`.`bar_id`
    where `bar`.`id` in (:some_uuid1, :some_uuid2, :some_uuid3) 
    group by `foo_id` 
    having COUNT(*) = 3
) as `subset` on `foo`.`id`  = `subset`.`foo_id`
order by `foo`.`title` asc 
limit 25 offset 0

Explain:

id  select_type table   type    key            key_len rows  extra
1   PRIMARY     derived ALL     NULL           NULL    6618  Using temporary; Using filesort
1   PRIMARY     foo     eq_ref  PRIMARY        108     1   
2   DERIVED     bar     const   PRIMARY        108     1     Using index; Using temporary; Using filesort
2   DERIVED     foo_bar ref     bar_id_foreign 109     16094 Using where

My question is: are there any optimizations I can apply to make this scenario usable and scalable?

Your normalization is fine. It's good that you have a join table, foo_bar, to handle the many-to-many relationship.

As far as optimizing your JOIN goes, you don't need to add a new join each time you want to check a new id; you can use the IN operator:

INNER JOIN foo_bar fb1 ON fb1.foo_id = foo.id
   AND fb1.bar_id IN (:some_uuid1, :some_uuid2, :some_uuid3)

Then, if you want to fetch the rows that match all three, the whole query would look like this:

SELECT foo.id, foo.title
FROM foo
INNER JOIN foo_bar fb ON fb.foo_id = foo.id AND fb.bar_id IN (:some_uuid1, :some_uuid2, :some_uuid3)
GROUP BY foo.id
HAVING COUNT(*) = 3  -- must equal the number of ids in the IN list
ORDER BY foo.title
LIMIT 25;
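
The filtering step in the query above only touches foo_bar, so how that table is indexed probably matters a lot. The following is a rough sketch, not a measured result. It assumes (the question doesn't say either way) that the same (foo_id, bar_id) pair could appear more than once in foo_bar, since the schema has no unique constraint on it. The index name bar_foo_idx is made up for illustration, and whether the optimizer actually picks it up should be checked with EXPLAIN on your own data.

```sql
-- Hypothetical index name. The (bar_id, foo_id) order is meant to serve
-- the lookup by bar_id and then the grouping by foo_id from the index.
ALTER TABLE foo_bar ADD INDEX bar_foo_idx (bar_id, foo_id);

-- Same idea as the query above, but counting distinct bars so that a
-- duplicated (foo_id, bar_id) row cannot inflate the count.
SELECT foo.id, foo.title
FROM foo
INNER JOIN foo_bar fb
    ON fb.foo_id = foo.id
   AND fb.bar_id IN (:some_uuid1, :some_uuid2, :some_uuid3)
GROUP BY foo.id
HAVING COUNT(DISTINCT fb.bar_id) = 3  -- must equal the number of ids in the IN list
ORDER BY foo.title
LIMIT 25;
```

The literal 3 in the HAVING clause has to change with the number of selected bars (1, 5, 12, 50, ...), so the application would need to build both the IN list and that count from the same list of ids.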