So many sub-queries in advanced search "Mysql-Laravel"

Question

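I am building a Laravel-based "ads" website as an API, which includes a dynamic advanced search.

An ad can have multiple attributes, and users can run an advanced search on them. But if a user selects too many attributes to filter by, this produces too many sub-queries and the query becomes slow. Is there a better way to write this query?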

select
 *
from
  `ads`
where
  `ads`.`id` in (
    select
      `ad_attributes`.`ad_id`
    from
      `ad_attributes`
    where
      `attribute_id` = 32
      and `value` in ('odio', 'quos', 'dignissimos', 'dolorem', 'qui')
  )
  and `ads`.`id` in (
    select
      `ad_attributes`.`ad_id`
    from
      `ad_attributes`
    where
      `attribute_id` = 171
      and `value` in ('itaque', 'non', 'dolor', 'laborum')
  )
  and `ads`.`id` in (
    select
      `ad_attributes`.`ad_id`
    from
      `ad_attributes`
    where
      `attribute_id` = 111
      and `value` in ('quia', 'non', 'nam', 'molestias')
  )
  and `ads`.`id` in (
    select
      `ad_attributes`.`ad_id`
    from
      `ad_attributes`
    where
      `attribute_id` = 144
      and `value` in ('delectus', 'nam', 'exercitationem', 'sit')
  )
  and `ads`.`id` in (
    select
      `ad_attributes`.`ad_id`
    from
      `ad_attributes`
    where
      `attribute_id` = 160
      and `value` in ('repellat', 'fugit', 'quaerat', 'vero')
  )
  and `ads`.`id` in (
    select
      `ad_attributes`.`ad_id`
    from
      `ad_attributes`
    where
      `attribute_id` = 176
      and `value` in ('mollitia', 'voluptates', 'maxime', 'culpa')
  )
  and `ads`.`id` in (
    select
      `ad_attributes`.`ad_id`
    from
      `ad_attributes`
    where
      `attribute_id` = 177
      and `value` in ('necessitatibus', 'id')
  )

Here is the search code. PS: I am using mpyw/eloquent-has-by-non-dependent-subquery instead of whereHas, because whereHas is slower.

->when(!empty($search->attrs), function (Builder $query) use ($search) {
            foreach ($search->attrs as $key => $value) {
                if (!is_null($value)) {
                    $query->hasByNonDependentSubquery('adAttributes', function (Builder $q) use ($value, $key, $search) {
                        $q->where('attribute_id', $key)
                            ->when(is_array($value), fn($q) => $q->whereIn('value', $value))
                            ->when(!is_array($value), fn($q) => $q->where('value', $value));
                    });
                }
            }
            return $query;
        });

I have put some of the data from my database on a fiddle. I can't share all of it, because it is dummy seeded data of about 60k rows: [https://dbfiddle.uk/?rdbms=[…].7&fiddle=6f00cc71fff716837906c22d46b0c899][1]

Answer 1

Is there a reason not to join the tables and use where()?

// Join each ad to its attribute rows.
$query = Ads::select('ads.*')->join('ad_attributes', 'ads.id', '=', 'ad_attributes.ad_id');

if (!empty($search->attrs)) {
    foreach ($search->attrs as $k => $v) {
        // orWhere: a row matches if any one of the attribute filters matches.
        $query->orWhere(function (Builder $q) use ($k, $v) {
            $q->where('ad_attributes.attribute_id', $k)
                ->when(
                    is_array($v),
                    fn ($q) => $q->whereIn('value', $v),
                    fn ($q) => $q->where('value', $v),
                );
        });
    }
}

return $query;

Answer 2

The Entity-Attribute-Value schema pattern is notoriously clumsy and inefficient. But it is flexible.

It may help to add this "composite" (and "covering") index to the table ad_attributes:

INDEX(attribute_id, value, ad_id)
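
As a rough sketch, the index could be created as shown here. This assumes the third column is meant to be ad_id, so that each sub-query can be answered from the index alone. The index name idx_attr_value_ad is made up for illustration. If value is a TEXT or very long VARCHAR column, MySQL requires a prefix length on it, and a prefix index can no longer act as a covering index.

```sql
-- Hypothetical index name; the column order follows the suggestion above.
ALTER TABLE ad_attributes
  ADD INDEX idx_attr_value_ad (attribute_id, value, ad_id);

-- Check whether one of the sub-queries now uses the index
-- ("Using index" in the Extra column indicates a covering read).
EXPLAIN
SELECT ad_id
FROM ad_attributes
WHERE attribute_id = 32
  AND value IN ('odio', 'quos', 'dignissimos', 'dolorem', 'qui');
```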

Answer 3

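This might be completely useless, and the performance might be terrible, but I vaguely remember using something like this years ago while fighting EAV performance problems. I don't have a suitable test dataset to try it on, so I may well be making a fool of myself by suggesting it.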

select
 *
from
  `ads`
where
  `ads`.`id` in (
    select
      `ad_attributes`.`ad_id`
    from
      `ad_attributes`
    where
      (`attribute_id` = 32  and `value` in ('odio', 'quos', 'dignissimos', 'dolorem', 'qui')) OR
      (`attribute_id` = 171 and `value` in ('itaque', 'non', 'dolor', 'laborum')) OR
      (`attribute_id` = 111 and `value` in ('quia', 'non', 'nam', 'molestias')) OR
      (`attribute_id` = 144 and `value` in ('delectus', 'nam', 'exercitationem', 'sit')) OR
      (`attribute_id` = 160 and `value` in ('repellat', 'fugit', 'quaerat', 'vero')) OR
      (`attribute_id` = 176 and `value` in ('mollitia', 'voluptates', 'maxime', 'culpa')) OR
      (`attribute_id` = 177 and `value` in ('necessitatibus', 'id'))
    group by `ad_attributes`.`ad_id`
    having count(`attribute_id`) = 7
  )

Edit: As I stated in my first comment below, you may need to use COUNT(DISTINCT attribute_id), depending on whether ads can have multiple rows for the same attribute_id. Since your test dataset does have multiple rows for the same (ad_id, attribute_id) pair, you will need to add the DISTINCT.
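
With DISTINCT added, the query would look roughly like this. It is the same statement as above with only the HAVING clause changed. The literal 7 is the number of attribute filters in this particular search and would need to track however many filters the user selects.

```sql
SELECT *
FROM ads
WHERE ads.id IN (
    SELECT ad_attributes.ad_id
    FROM ad_attributes
    WHERE
      (attribute_id = 32  AND value IN ('odio', 'quos', 'dignissimos', 'dolorem', 'qui')) OR
      (attribute_id = 171 AND value IN ('itaque', 'non', 'dolor', 'laborum')) OR
      (attribute_id = 111 AND value IN ('quia', 'non', 'nam', 'molestias')) OR
      (attribute_id = 144 AND value IN ('delectus', 'nam', 'exercitationem', 'sit')) OR
      (attribute_id = 160 AND value IN ('repellat', 'fugit', 'quaerat', 'vero')) OR
      (attribute_id = 176 AND value IN ('mollitia', 'voluptates', 'maxime', 'culpa')) OR
      (attribute_id = 177 AND value IN ('necessitatibus', 'id'))
    GROUP BY ad_attributes.ad_id
    -- Count each attribute once, even if an ad has several matching rows for it.
    HAVING COUNT(DISTINCT attribute_id) = 7
);
```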

I have added it to your original SQL Fiddle and to your db<>fiddle. Unfortunately, db<>fiddle does not return the execution time for each query.

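Judging by the tables in your fiddle, they currently have no indexes. Your ad_attributes table also has an unnecessary surrogate primary key, instead of a composite key on (ad_id, attribute_id) or (ad_id, attribute_id, value). If you allow or need multiple rows with the same attribute_id per ad_id, as is currently the case in your test data, you will need the second version. Is this intentional, or is it a flaw in the way your test data was created?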
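
A sketch of what the table might look like with the second (three-column) key. The column types and sizes here are assumptions for illustration and are not taken from the fiddle, and the secondary index name is made up.

```sql
-- Sketch only: types, sizes and nullability are assumed, not copied from the fiddle.
CREATE TABLE ad_attributes (
  ad_id               BIGINT UNSIGNED NOT NULL,
  attribute_id        BIGINT UNSIGNED NOT NULL,
  attribute_option_id BIGINT UNSIGNED NULL,
  value               VARCHAR(191)    NOT NULL,
  -- Composite key replaces the surrogate id and rejects exact duplicates.
  PRIMARY KEY (ad_id, attribute_id, value),
  -- Secondary index for looking up ads by attribute and value.
  KEY idx_attr_value_ad (attribute_id, value, ad_id)
);
```

If only one row per (ad_id, attribute_id) should be allowed, the primary key would shrink to those two columns.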

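Of the 776 test rows in ad_attributes, there are 106 groups of (ad_id, attribute_id, attribute_option_id, value) with two or more copies, and the worst has six copies. I would suggest that your test data is of little use, as it does not follow the basic rules and constraints required of your production dataset. Testing queries against completely random data is not particularly useful. Your test data should try to mimic what you are likely to see in production.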
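
One way to find such duplicate groups is a query along these lines. It is a minimal sketch using the column names mentioned above, and the alias copies is just illustrative.

```sql
-- List every (ad_id, attribute_id, attribute_option_id, value) group
-- that occurs more than once, worst offenders first.
SELECT ad_id, attribute_id, attribute_option_id, value, COUNT(*) AS copies
FROM ad_attributes
GROUP BY ad_id, attribute_id, attribute_option_id, value
HAVING COUNT(*) > 1
ORDER BY copies DESC;
```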

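I am not entirely sure I understand the intent of your data structure. What is the idea behind having both the attribute_option_id and value columns? What is the idea behind the text and option attributes? Do you really need text attributes? Please think carefully about that last question, as it has the potential to have a very significant impact on data quality and performance as the dataset grows.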

mysql

subquery

advanced-search

laravel

eloquent