Optimizing SQL JOIN call

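I have a SQL query that performs poorly. I have done some research on joins, watched tutorials, and made sure I defined the right indexes, but honestly I am a bit lost on how to improve the performance of this particular query.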

I have the following schema definition:

create_table "training_plans", :force => true do |t|
  t.integer  "user_id"
end

add_index "training_plans", ["user_id"], :name => "index_training_plans_on_user_id"

create_table "training_weeks", :force => true do |t|
  t.integer  "training_plan_id"
  t.date     "start_date"
end

add_index "training_weeks", ["training_plan_id", "start_date"], :name => "index_training_weeks_on_training_plan_id_and_start_date"
add_index "training_weeks", ["training_plan_id"], :name => "index_training_weeks_on_training_plan_id"

create_table "training_efforts", :force => true do |t|
  t.string   "name"
  t.date     "plandate"
  t.integer  "training_week_id"
end

add_index "training_efforts", ["plandate"], :name => "index_training_efforts_on_plandate"
add_index "training_efforts", ["training_week_id", "plandate"], :name => "index_training_efforts_on_training_week_id_and_plandate"
add_index "training_efforts", ["training_week_id"], :name => "index_training_efforts_on_training_week_id"

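The following call then collects all training_efforts associated with a specific training_plan, including all related rides objects, where the training_effort plandate falls within the target date range, with results ordered by plandate.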

    tefts = self.training_efforts.includes(:rides).order("plandate ASC").where("plandate >= ? AND plandate <= ?",
                                                      beginning_date,
                                                      end_date)

This produces the following query output:

TrainingEffort Load (3393.6ms)  SELECT "training_efforts".* FROM "training_efforts" 
  INNER JOIN "training_weeks" ON "training_efforts"."training_week_id" = "training_weeks"."id" 
  WHERE "training_weeks"."training_plan_id" = 104 
  AND (plandate >= '2015-01-05' AND plandate <= '2016-01-03') ORDER BY plandate ASC

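I believe I have the right indexes defined. The tables are not that large. Yet this takes a substantial amount of time. As further background, this is on Heroku Postgres. The last thing I will mention is that on my development system the query is slower than most (3.3 ms), but still not 1000 times slower than average...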

Thanks in advance for any help optimizing this query.

Update: here is the EXPLAIN output for the query (run on my development system):

explain SELECT "training_efforts".* FROM "training_efforts" INNER JOIN "training_weeks" 
  ON "training_efforts"."training_week_id" = "training_weeks"."id" 
  WHERE "training_weeks"."training_plan_id" = 7 
  AND (plandate >= '2015-01-05' AND plandate <= '2016-01-03') ORDER BY plandate ASC;
                                          QUERY PLAN                                           
-----------------------------------------------------------------------------------------------
 Sort  (cost=430.52..432.04 rows=606 width=120)
   Sort Key: training_efforts.plandate
   ->  Hash Join  (cost=15.12..402.51 rows=606 width=120)
         Hash Cond: (training_efforts.training_week_id = training_weeks.id)
         ->  Seq Scan on training_efforts  (cost=0.00..377.25 rows=1089 width=120)
               Filter: ((plandate >= '2015-01-05'::date) AND (plandate <= '2016-01-03'::date))
         ->  Hash  (cost=11.86..11.86 rows=261 width=4)
               ->  Seq Scan on training_weeks  (cost=0.00..11.86 rows=261 width=4)
                     Filter: (training_plan_id = 7) 

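Update 2: I tried a different query to see whether it would use my indexes. Noting that there are 7 times as many training_efforts as training_weeks (both have date columns), I tried searching on the training_week date instead of the training_effort date, as follows: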

explain SELECT "training_efforts".* FROM "training_efforts" INNER JOIN "training_weeks" 
  ON "training_weeks"."id" = "training_efforts"."training_week_id" 
  WHERE "training_weeks"."id" IN (SELECT "training_weeks"."id" FROM "training_weeks" 
  WHERE "training_weeks"."training_plan_id" = 7 AND (start_date >= '2015-01-05' AND start_date <= '2016-01-03')) 
  ORDER BY plandate ASC;
                                                                     QUERY PLAN                                                                     
----------------------------------------------------------------------------------------------------------------------------------------------------
 Sort  (cost=376.83..378.34 rows=602 width=120)
   Sort Key: training_efforts.plandate
   ->  Nested Loop  (cost=14.23..349.04 rows=602 width=120)
         ->  Hash Semi Join  (cost=13.95..26.83 rows=86 width=8)
               Hash Cond: (training_weeks.id = training_weeks_1.id)
               ->  Seq Scan on training_weeks  (cost=0.00..10.69 rows=469 width=4)
               ->  Hash  (cost=12.87..12.87 rows=86 width=4)
                     ->  Bitmap Heap Scan on training_weeks training_weeks_1  (cost=5.37..12.87 rows=86 width=4)
                           Recheck Cond: ((training_plan_id = 7) AND (start_date >= '2015-01-05'::date) AND (start_date <= '2016-01-03'::date))
                           ->  Bitmap Index Scan on index_training_weeks_on_training_plan_id_and_start_date  (cost=0.00..5.35 rows=86 width=0)
                                 Index Cond: ((training_plan_id = 7) AND (start_date >= '2015-01-05'::date) AND (start_date <= '2016-01-03'::date))
         ->  Index Scan using index_training_efforts_on_training_week_id on training_efforts  (cost=0.28..3.68 rows=7 width=120)
               Index Cond: (training_week_id = training_weeks.id)

This seems slightly better, but I am still not sure it is optimal...

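How many rows are in each table? Did you recreate these tables recently, or have they been around for a while? Have you analyzed the tables recently? It looks like it is doing sequential scans rather than using any of your indexes.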
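One way to answer these questions is to look at PostgreSQL's statistics views. The sketch below assumes the table names from the schema in the question and reads the standard pg_stat_user_tables view; the row counts it reports are estimates, not exact counts.

```sql
-- Approximate row counts, last ANALYZE times, and scan counters
-- for the two tables involved in the slow query.
SELECT relname,
       n_live_tup,        -- estimated number of live rows
       last_analyze,      -- last manual ANALYZE, NULL if never run
       last_autoanalyze,  -- last autovacuum-triggered ANALYZE
       seq_scan,          -- sequential scans started on the table
       idx_scan           -- index scans started on the table
FROM pg_stat_user_tables
WHERE relname IN ('training_efforts', 'training_weeks');
```

If both last_analyze and last_autoanalyze are NULL or very old, stale statistics become a more likely explanation for the plan.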

I would issue a

vacuum analyze

on your entire database, or at least on these two tables. Often, if the optimizer does not have correct statistics about a table, it will skip the index.
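A minimal sketch of that suggestion, limited to the two tables in the query and followed by a re-check of the plan. EXPLAIN (ANALYZE, BUFFERS) actually executes the statement and reports real timings, so it is worth running on the slow system rather than only on development. The plan id and dates are the ones from the question.

```sql
-- Refresh planner statistics (and reclaim dead rows) for both tables.
VACUUM ANALYZE training_efforts;
VACUUM ANALYZE training_weeks;

-- Re-run the original query with real execution timings.
EXPLAIN (ANALYZE, BUFFERS)
SELECT training_efforts.*
FROM training_efforts
INNER JOIN training_weeks
        ON training_efforts.training_week_id = training_weeks.id
WHERE training_weeks.training_plan_id = 104
  AND plandate >= '2015-01-05'
  AND plandate <= '2016-01-03'
ORDER BY plandate ASC;
```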

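It looks like you are not actually using the output of the JOIN, so I suggest dropping it entirely and seeing whether that improves performance.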

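I suggest using a raw query. connection.execute on an ActiveRecord object takes a finished SQL string, so to use ? placeholders for the values that vary (i.e. the variables), pass an array to find_by_sql instead: the SQL string first, followed by the values in order.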
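The same separation of statement and values can be sketched in plain PostgreSQL with a prepared statement, which uses $1, $2, ... where the ActiveRecord form uses ?. The statement name efforts_for_plan is hypothetical, and the query is the one from the question.

```sql
-- Hypothetical prepared statement: $1 = plan id, $2/$3 = date range.
PREPARE efforts_for_plan (integer, date, date) AS
SELECT te.*
FROM training_efforts AS te
INNER JOIN training_weeks AS tw ON te.training_week_id = tw.id
WHERE tw.training_plan_id = $1
  AND te.plandate >= $2
  AND te.plandate <= $3
ORDER BY te.plandate ASC;

-- Supply the values separately from the SQL text.
EXECUTE efforts_for_plan (7, '2015-01-05', '2016-01-03');
```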

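For the raw SQL, I suggest trying something like the following (replace any values that may vary with placeholders and parameters as needed). I suspect this will perform better.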

SELECT te.*
FROM training_efforts AS te
WHERE EXISTS (SELECT 1
              FROM training_weeks AS tw
              WHERE tw.id = te.training_week_id
                AND tw.training_plan_id = 7
                AND start_date >= '2015-01-05' AND start_date <= '2016-01-03'
            )
ORDER BY plandate ASC
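Note that the query above filters on the week's start_date rather than on plandate, so it can return a slightly different set of rows than the original query. A variant that keeps the original plandate filter while still using EXISTS might look like this. It assumes training_weeks.id is the primary key, and whether it is actually faster would need to be checked with EXPLAIN ANALYZE.

```sql
-- Same EXISTS shape, but filtering on the effort's own plandate
-- so the result should match the original JOIN query.
SELECT te.*
FROM training_efforts AS te
WHERE te.plandate >= '2015-01-05'
  AND te.plandate <= '2016-01-03'
  AND EXISTS (SELECT 1
              FROM training_weeks AS tw
              WHERE tw.id = te.training_week_id
                AND tw.training_plan_id = 7)
ORDER BY te.plandate ASC;
```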

As for converting this into an ActiveRecord query, I am not sure it offers quite that degree of control; it is probably best to keep it as a raw query.