How to optimize a SQL query that combines INNER JOINs, DISTINCT and WHERE?

SELECT DISTINCT options.id, options.foo_option_id, options.description
FROM vehicles 
INNER JOIN vehicle_options     ON vehicle_options.vehicle_id = vehicles.id 
INNER JOIN options             ON options.id = vehicle_options.option_id 
INNER JOIN discounted_vehicles ON vehicles.id = discounted_vehicles.vehicle_id 
WHERE discounted_vehicles.discount_id = 4;

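The query above returns 2067 rows for me and runs in 1.7 seconds locally. I'd like to know whether it is as fast as it can be, or whether I can tune it further somehow, because this dataset will grow quickly over time.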

Things I've tried, with no change in speed:

1 - Changing the join order, joining from the smallest to the largest table.

2 - Adding an index to discounted_vehicles.discount_id.

1 - Change the join order, joining from the smallest to the biggest table.

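Under the hood, PostgreSQL rearranges the order of the tables according to the execution plan chosen by its query optimizer. The order in which you write them generally does not matter.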

2 - Adding an index to discounted_vehicles.discount_id.

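That depends on the selectivity of the discount_id column. Do you expect it to filter out 95% of the rows, leaving only 5%? As a rough rule of thumb, if 5% or fewer remain, the index will help. Otherwise a full table scan will be faster.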
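A minimal sketch of how to measure that fraction, using Python's built-in sqlite3 module as a stand-in for PostgreSQL and hypothetical toy data (100 discounted vehicles, 3 of them with discount 4). In PostgreSQL the planner estimates the same thing from the statistics collected by ANALYZE (visible in the pg_stats view), so this only illustrates the arithmetic; the 5% cut-off is the rule of thumb above, not a planner constant.

```python
import sqlite3

# Rule of thumb from the answer above; the real cut-off depends on the planner's cost model.
INDEX_THRESHOLD = 0.05


def selectivity(conn: sqlite3.Connection, discount_id: int) -> float:
    """Return the fraction of discounted_vehicles rows with the given discount_id."""
    matching, total = conn.execute(
        """
        SELECT SUM(CASE WHEN discount_id = ? THEN 1 ELSE 0 END), COUNT(*)
        FROM discounted_vehicles
        """,
        (discount_id,),
    ).fetchone()
    # SUM() is NULL on an empty table, so guard both values.
    return (matching or 0) / total if total else 0.0


# Hypothetical toy data: 100 discounted vehicles, only vehicles 0-2 have discount 4.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE discounted_vehicles (vehicle_id INTEGER, discount_id INTEGER)")
conn.executemany(
    "INSERT INTO discounted_vehicles VALUES (?, ?)",
    [(v, 4 if v < 3 else 1) for v in range(100)],
)

frac = selectivity(conn, 4)
print(f"discount_id = 4 keeps {frac:.0%} of rows; index likely helps: {frac <= INDEX_THRESHOLD}")
```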

Also, if it doesn't exist yet, I would add an index on:

vehicle_options (vehicle_id)

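It may already exist, but check rather than assume: unlike MySQL's InnoDB, PostgreSQL does not automatically create an index on the referencing column of a foreign key.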
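A quick way to check whether the index really exists and is used, sketched with Python's sqlite3 module (SQLite, like PostgreSQL, does not index foreign-key columns on its own). The index name vehicle_options_vehicle_id_idx and the stripped-down tables are hypothetical; in PostgreSQL you would look the index up in the pg_indexes view and confirm it is used with EXPLAIN.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE vehicles (id INTEGER PRIMARY KEY);
    CREATE TABLE options  (id INTEGER PRIMARY KEY);
    -- Hypothetical link table with foreign keys but no primary key.
    CREATE TABLE vehicle_options (
        vehicle_id INTEGER REFERENCES vehicles (id),
        option_id  INTEGER REFERENCES options (id)
    );
    """
)


def index_names(conn, table):
    """Names of the indexes defined on `table` (column 1 of PRAGMA index_list)."""
    return [row[1] for row in conn.execute(f"PRAGMA index_list({table})")]


def plan(conn, sql, params=()):
    """The human-readable 'detail' column of EXPLAIN QUERY PLAN for `sql`."""
    return [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql, params)]


LOOKUP_SQL = "SELECT option_id FROM vehicle_options WHERE vehicle_id = ?"

before = index_names(conn, "vehicle_options")  # the REFERENCES clauses created no index
print("before:", before, plan(conn, LOOKUP_SQL, (1,)))

conn.execute("CREATE INDEX vehicle_options_vehicle_id_idx ON vehicle_options (vehicle_id)")
after = index_names(conn, "vehicle_options")
print("after: ", after, plan(conn, LOOKUP_SQL, (1,)))
```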

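Try using GROUP BY instead of DISTINCT. (Grouping by options.id alone is valid in PostgreSQL 9.1+ only if options.id is the primary key, so that the other selected columns are functionally dependent on it.)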

SELECT 
    "options"."id",
    "options"."foo_option_id",
    "options"."description"
FROM
    "vehicles" 
    INNER JOIN "vehicle_options" ON "vehicle_options"."vehicle_id" = "vehicles"."id" 
    INNER JOIN "options" ON "options"."id" = "vehicle_options"."option_id" 
    INNER JOIN "discounted_vehicles" ON "vehicles"."id" = "discounted_vehicles"."vehicle_id" 
WHERE 
    "discounted_vehicles"."discount_id" = 4 
GROUP BY 
    "options"."id";

However, you need to create the necessary indexes first, then try running the query below:

SELECT "options"."id", "options"."foo_option_id",
    "options"."description"
  FROM "vehicles" 
  INNER JOIN "vehicle_options" 
    ON "vehicle_options"."vehicle_id" = "vehicles"."id" 
  INNER JOIN "options" 
    ON "options"."id" = "vehicle_options"."option_id" 
  INNER JOIN "discounted_vehicles" 
    ON "vehicles"."id" = "discounted_vehicles"."vehicle_id" 
  WHERE "discounted_vehicles"."discount_id" = 4
  GROUP BY "options"."id", "options"."foo_option_id",
    "options"."description";
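Both GROUP BY variants collapse the duplicate rows produced by the joins, so they return the same set as DISTINCT. A minimal sketch that checks this with Python's sqlite3 module on hypothetical toy data (three vehicles, three options). It says nothing about speed: in PostgreSQL, DISTINCT and GROUP BY are often executed with the same kind of hash or sort step, so measure with EXPLAIN ANALYZE before expecting a gain.

```python
import sqlite3

# Hypothetical toy schema and data, just enough to produce duplicate rows.
SETUP_SQL = """
CREATE TABLE vehicles (id INTEGER PRIMARY KEY);
CREATE TABLE options (id INTEGER PRIMARY KEY, foo_option_id TEXT, description TEXT);
CREATE TABLE vehicle_options (vehicle_id INTEGER, option_id INTEGER);
CREATE TABLE discounted_vehicles (vehicle_id INTEGER, discount_id INTEGER);

INSERT INTO vehicles VALUES (1), (2), (3);
INSERT INTO options VALUES (10, 'A1', 'Sunroof'), (11, 'A2', 'Heated seats'), (12, 'A3', 'Tow bar');
INSERT INTO vehicle_options VALUES (1, 10), (1, 11), (2, 10), (3, 12);
INSERT INTO discounted_vehicles VALUES (1, 4), (2, 4), (3, 5);
"""

COLS = "options.id, options.foo_option_id, options.description"
JOINS = """
FROM vehicles
INNER JOIN vehicle_options     ON vehicle_options.vehicle_id = vehicles.id
INNER JOIN options             ON options.id = vehicle_options.option_id
INNER JOIN discounted_vehicles ON vehicles.id = discounted_vehicles.vehicle_id
WHERE discounted_vehicles.discount_id = 4
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SETUP_SQL)

# Without deduplication, option 10 appears twice (vehicles 1 and 2 both have it).
raw_rows = conn.execute(f"SELECT {COLS} {JOINS}").fetchall()
distinct_rows = conn.execute(f"SELECT DISTINCT {COLS} {JOINS} ORDER BY options.id").fetchall()
grouped_rows = conn.execute(f"SELECT {COLS} {JOINS} GROUP BY {COLS} ORDER BY options.id").fetchall()

print(len(raw_rows), len(distinct_rows), distinct_rows == grouped_rows)  # 3 2 True
```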

The best query depends on information that is missing from the question.
This should be much faster in a typical setup:

SELECT id, foo_option_id, description
FROM   options o
WHERE  EXISTS (
   SELECT
   FROM   discounted_vehicles d
   JOIN   vehicle_options vo USING (vehicle_id)
   WHERE  d.discount_id = 4
   AND    vo.option_id = o.id
   );

Assuming referential integrity, enforced by FK constraints, we can omit the table vehicles from the query and join discounted_vehicles directly to vehicle_options.

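Also, EXISTS is typically faster if there are many qualifying rows per distinct option, because it can stop at the first match for each option instead of producing all the duplicates and removing them afterwards.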
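A minimal sketch, with SQLite (via Python's sqlite3 module) standing in for PostgreSQL and hypothetical toy data, that checks the EXISTS rewrite returns the same options as the original DISTINCT query even though it never touches vehicles. It assumes options.id is unique and that every vehicle_id in the link tables exists in vehicles (the FK assumption above). SQLite does not accept the empty SELECT list that PostgreSQL allows inside EXISTS, so the sketch uses SELECT 1, which means the same thing there.

```python
import sqlite3

# Hypothetical toy data; every vehicle_id below exists in vehicles (FK assumption).
SETUP_SQL = """
CREATE TABLE vehicles (id INTEGER PRIMARY KEY);
CREATE TABLE options (id INTEGER PRIMARY KEY, foo_option_id TEXT, description TEXT);
CREATE TABLE vehicle_options (vehicle_id INTEGER, option_id INTEGER);
CREATE TABLE discounted_vehicles (vehicle_id INTEGER, discount_id INTEGER);

INSERT INTO vehicles VALUES (1), (2), (3);
INSERT INTO options VALUES (10, 'A1', 'Sunroof'), (11, 'A2', 'Heated seats'), (12, 'A3', 'Tow bar');
INSERT INTO vehicle_options VALUES (1, 10), (1, 11), (2, 10), (3, 12);
INSERT INTO discounted_vehicles VALUES (1, 4), (2, 4), (3, 5);
"""

# The original query from the question.
DISTINCT_SQL = """
SELECT DISTINCT options.id, options.foo_option_id, options.description
FROM vehicles
INNER JOIN vehicle_options     ON vehicle_options.vehicle_id = vehicles.id
INNER JOIN options             ON options.id = vehicle_options.option_id
INNER JOIN discounted_vehicles ON vehicles.id = discounted_vehicles.vehicle_id
WHERE discounted_vehicles.discount_id = 4
ORDER BY options.id
"""

# The semi-join rewrite: each option is emitted once, as soon as one match is found.
EXISTS_SQL = """
SELECT id, foo_option_id, description
FROM   options o
WHERE  EXISTS (
   SELECT 1  -- PostgreSQL also accepts an empty SELECT list here; SQLite does not
   FROM   discounted_vehicles d
   JOIN   vehicle_options vo USING (vehicle_id)
   WHERE  d.discount_id = 4
   AND    vo.option_id = o.id
   )
ORDER  BY id
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SETUP_SQL)

distinct_rows = conn.execute(DISTINCT_SQL).fetchall()
exists_rows = conn.execute(EXISTS_SQL).fetchall()
print(exists_rows == distinct_rows, exists_rows)
```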

Ideally, you should have multicolumn indexes on:

discounted_vehicles(discount_id, vehicle_id)
vehicle_options(vehicle_id, option_id)

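Index the columns in this order. You probably have a PK constraint on vehicle_options that already provides the second index, but its column order should match (vehicle_id, option_id).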
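A sketch that creates the two multicolumn indexes, confirms their column order and asks the planner whether the EXISTS query uses them. It runs on SQLite through Python's sqlite3 module, so the plan text differs from PostgreSQL's EXPLAIN output; the index names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE options (id INTEGER PRIMARY KEY, foo_option_id TEXT, description TEXT);
    CREATE TABLE vehicle_options (vehicle_id INTEGER, option_id INTEGER);
    CREATE TABLE discounted_vehicles (vehicle_id INTEGER, discount_id INTEGER);

    -- Equality-filtered column first, then the join column, as recommended above.
    CREATE INDEX discounted_vehicles_discount_vehicle_idx
        ON discounted_vehicles (discount_id, vehicle_id);
    CREATE INDEX vehicle_options_vehicle_option_idx
        ON vehicle_options (vehicle_id, option_id);
    """
)

EXISTS_SQL = """
SELECT id, foo_option_id, description
FROM   options o
WHERE  EXISTS (
   SELECT 1
   FROM   discounted_vehicles d
   JOIN   vehicle_options vo USING (vehicle_id)
   WHERE  d.discount_id = 4
   AND    vo.option_id = o.id
   )
"""


def index_columns(conn, index_name):
    """Column names of an index, in index order (column 2 of PRAGMA index_info)."""
    return [row[2] for row in conn.execute(f"PRAGMA index_info({index_name})")]


def plan_details(conn, sql):
    """The human-readable 'detail' column of EXPLAIN QUERY PLAN."""
    return [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]


print(index_columns(conn, "discounted_vehicles_discount_vehicle_idx"))
print(index_columns(conn, "vehicle_options_vehicle_option_idx"))
for detail in plan_details(conn, EXISTS_SQL):
    print(detail)
```

With both indexes in place, each table in the subquery can be read from its index alone (a covering index), which is the point of putting both columns in each index.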

Depending on the actual data distribution, there may be even faster query techniques. Related:

  • Optimize GROUP BY query to retrieve latest record per user
  • Select first row in each GROUP BY group?

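Changing the join order is usually useless. Postgres reorders joins in whatever way it expects to be fastest. (Exceptions apply, for instance when a query joins more tables than join_collapse_limit, which defaults to 8.) Related: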

  • Sample Query to show Cardinality estimation error in PostgreSQL
  • SQL INNER JOIN over multiple tables equal to WHERE syntax