Optimizing SQL JOIN call
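I have a SQL query that performs poorly. I've done some research on joins, watched tutorials, made sure I've defined the proper indexes, etc., but honestly I'm a bit lost as to how to improve the performance of this particular query.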
I have the following schema definitions:
create_table "training_plans", :force => true do |t|
  t.integer "user_id"
end
add_index "training_plans", ["user_id"], :name => "index_training_plans_on_user_id"

create_table "training_weeks", :force => true do |t|
  t.integer "training_plan_id"
  t.date    "start_date"
end
add_index "training_weeks", ["training_plan_id", "start_date"], :name => "index_training_weeks_on_training_plan_id_and_start_date"
add_index "training_weeks", ["training_plan_id"], :name => "index_training_weeks_on_training_plan_id"

create_table "training_efforts", :force => true do |t|
  t.string  "name"
  t.date    "plandate"
  t.integer "training_week_id"
end
add_index "training_efforts", ["plandate"], :name => "index_training_efforts_on_plandate"
add_index "training_efforts", ["training_week_id", "plandate"], :name => "index_training_efforts_on_training_week_id_and_plandate"
add_index "training_efforts", ["training_week_id"], :name => "index_training_efforts_on_training_week_id"
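The following call then collects all training_efforts associated with a particular training_plan, including all associated rides objects, where the training_effort plandates fall within a target date range, and orders the results by plandate.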
tefts = self.training_efforts.includes(:rides).order("plandate ASC").where("plandate >= ? AND plandate <= ?",
                                                                             beginning_date,
                                                                             end_date)
This produces the following query output:
TrainingEffort Load (3393.6ms) SELECT "training_efforts".* FROM "training_efforts"
INNER JOIN "training_weeks" ON "training_efforts"."training_week_id" = "training_weeks"."id"
WHERE "training_weeks"."training_plan_id" = 104
AND (plandate >= '2015-01-05' AND plandate <= '2016-01-03') ORDER BY plandate ASC
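I believe I have the proper indexes defined, and the tables aren't that big. Yet this query takes an enormous amount of time. As further background, this is on Heroku Postgres. Lastly, I'll mention that on my development system the query is slower than most (3.3 ms), but still nowhere near 1000 times slower than average...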
Thanks in advance for any help optimizing this query.
Update
Here is the EXPLAIN output for the query (run on my development system):
explain SELECT "training_efforts".* FROM "training_efforts" INNER JOIN "training_weeks"
ON "training_efforts"."training_week_id" = "training_weeks"."id"
WHERE "training_weeks"."training_plan_id" = 7
AND (plandate >= '2015-01-05' AND plandate <= '2016-01-03') ORDER BY plandate ASC;
QUERY PLAN
-----------------------------------------------------------------------------------------------
Sort (cost=430.52..432.04 rows=606 width=120)
Sort Key: training_efforts.plandate
-> Hash Join (cost=15.12..402.51 rows=606 width=120)
Hash Cond: (training_efforts.training_week_id = training_weeks.id)
-> Seq Scan on training_efforts (cost=0.00..377.25 rows=1089 width=120)
Filter: ((plandate >= '2015-01-05'::date) AND (plandate <= '2016-01-03'::date))
-> Hash (cost=11.86..11.86 rows=261 width=4)
-> Seq Scan on training_weeks (cost=0.00..11.86 rows=261 width=4)
Filter: (training_plan_id = 7)
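Update 2
To see whether a different query would use my indexes, and noting that there are 7 times as many training_efforts as training_weeks (both tables have a date column), I tried filtering on the training_week date instead of the training_effort date, like so: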
explain SELECT "training_efforts".* FROM "training_efforts" INNER JOIN "training_weeks"
ON "training_weeks"."id" = "training_efforts"."training_week_id"
WHERE "training_weeks"."id" IN (SELECT "training_weeks"."id" FROM "training_weeks"
WHERE "training_weeks"."training_plan_id" = 7 AND (start_date >= '2015-01-05' AND start_date <= '2016-01-03'))
ORDER BY plandate ASC;
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------------
Sort (cost=376.83..378.34 rows=602 width=120)
Sort Key: training_efforts.plandate
-> Nested Loop (cost=14.23..349.04 rows=602 width=120)
-> Hash Semi Join (cost=13.95..26.83 rows=86 width=8)
Hash Cond: (training_weeks.id = training_weeks_1.id)
-> Seq Scan on training_weeks (cost=0.00..10.69 rows=469 width=4)
-> Hash (cost=12.87..12.87 rows=86 width=4)
-> Bitmap Heap Scan on training_weeks training_weeks_1 (cost=5.37..12.87 rows=86 width=4)
Recheck Cond: ((training_plan_id = 7) AND (start_date >= '2015-01-05'::date) AND (start_date <= '2016-01-03'::date))
-> Bitmap Index Scan on index_training_weeks_on_training_plan_id_and_start_date (cost=0.00..5.35 rows=86 width=0)
Index Cond: ((training_plan_id = 7) AND (start_date >= '2015-01-05'::date) AND (start_date <= '2016-01-03'::date))
-> Index Scan using index_training_efforts_on_training_week_id on training_efforts (cost=0.28..3.68 rows=7 width=120)
Index Cond: (training_week_id = training_weeks.id)
This seems slightly better, but I'm still not sure it's optimal...
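Filtering on training_weeks.start_date instead of training_efforts.plandate only returns the same rows if the week boundaries line up with the requested range, because it keeps every effort in a qualifying week regardless of that effort's own plandate. A minimal sketch, assuming (hypothetically; the schema does not say so) that every training week starts on a Monday and covers start_date through start_date + 6, checks this for the range used above with Ruby's standard date library:

```ruby
require "date"

# Assumption (not stated in the schema): each training week starts on a
# Monday and covers start_date .. start_date + 6.
range_start = Date.new(2015, 1, 5)
range_end   = Date.new(2016, 1, 3)

# Week start dates that satisfy the Update 2 predicate
#   start_date >= range_start AND start_date <= range_end
week_starts = range_start.step(range_end, 7).to_a

# Every calendar day covered by those weeks, in order.
covered_days = week_starts.flat_map { |s| (s..(s + 6)).to_a }

# Every calendar day that satisfies the original plandate predicate.
plandate_days = (range_start..range_end).to_a

puts range_start.monday?            # true
puts range_end.sunday?              # true
puts week_starts.size               # 52
puts covered_days == plandate_days  # true
```

Under that assumption the two filters select the same days. If weeks could start on other weekdays, the start_date filter would add or drop a partial week at each end, so the plandate condition would still be needed.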
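How many rows does each table have? Did you recreate these tables recently, or are their statistics stale? Have you analyzed the tables recently? It looks like the planner is doing sequential scans rather than using any of your indexes.
I would issue a
VACUUM ANALYZE
on your entire database, or at least on these two tables. The optimizer will often skip an index if it doesn't have accurate statistics about the table.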
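It looks like you aren't actually using the output of the JOIN (no training_weeks columns are selected; the table only serves as a filter), so I'd suggest dropping the join entirely and seeing whether that improves performance.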
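I'd suggest using a raw query. You should be able to pass TrainingEffort.find_by_sql an array whose first element is the SQL, with ? in place of each parameter that needs to be interpolated by the SQL library (i.e. your variables), followed by those values. (connection.execute on its own does not bind parameters; its second argument is only a name used for logging.)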
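For the raw SQL, I'd suggest trying something like the following (replacing any values that may vary with placeholders and arguments as needed). I suspect it will perform better.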
SELECT te.*
FROM training_efforts AS te
WHERE EXISTS (SELECT 1
              FROM training_weeks AS tw
              WHERE tw.id = te.training_week_id
                AND tw.training_plan_id = 7
                AND tw.start_date >= '2015-01-05' AND tw.start_date <= '2016-01-03'
             )
ORDER BY te.plandate ASC
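To make the result of the EXISTS query concrete, the following plain-Ruby model applies the same semi-join logic to a few made-up rows. WeekRow, EffortRow and all ids, names and dates are hypothetical. It only mirrors which rows the query should return, not how Postgres executes it: the qualifying week ids are collected once, each effort is kept at most once if its training_week_id is among them, and the survivors are sorted by plandate.

```ruby
require "date"
require "set"

# Hypothetical stand-ins for rows of the two tables.
WeekRow   = Struct.new(:id, :training_plan_id, :start_date)
EffortRow = Struct.new(:id, :name, :plandate, :training_week_id)

weeks = [
  WeekRow.new(1, 7, Date.new(2015, 1, 5)),
  WeekRow.new(2, 7, Date.new(2015, 1, 12)),
  WeekRow.new(3, 8, Date.new(2015, 1, 5)),  # belongs to another plan
  WeekRow.new(4, 7, Date.new(2016, 1, 4))   # starts after the range
]

efforts = [
  EffortRow.new(10, "long ride", Date.new(2015, 1, 14), 2),
  EffortRow.new(11, "intervals", Date.new(2015, 1, 6), 1),
  EffortRow.new(12, "recovery",  Date.new(2015, 1, 6), 3),
  EffortRow.new(13, "tempo",     Date.new(2016, 1, 5), 4)
]

range = Date.new(2015, 1, 5)..Date.new(2016, 1, 3)

# Subquery of EXISTS: ids of plan-7 weeks whose start_date is in range.
matching_week_ids = weeks
  .select { |w| w.training_plan_id == 7 && range.cover?(w.start_date) }
  .map(&:id)
  .to_set

# Semi-join: keep each effort once if its week matched, then ORDER BY plandate.
result = efforts
  .select { |e| matching_week_ids.include?(e.training_week_id) }
  .sort_by(&:plandate)

p result.map(&:id)
```

Because training_weeks.id is the primary key, the original INNER JOIN cannot duplicate efforts either, so both forms should return the same rows; the difference is only in how the planner is free to execute them.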
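As for converting this into an ActiveRecord query, I'm not sure ActiveRecord offers quite that degree of control; it's probably best to leave it as a raw query.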