为了效率和优化,我应该使用 eager_load 还是包含?

For the sake of efficiency and optimization, should I use eager_load or includes?

目前,我的代码是这样的:

current_user.association.includes(a: [:b, {c: :d}, {e: :f}]).to_a

进行调用时,似乎每个包含都是通过其自身对数据库的 SELECT 调用来调用的。

但是,当我这样做时 current_user.association.eager_load(a: [:b, {c: :d}, {e: :f}]).to_a 我看到一个巨大的 SELECT 电话。

我问是因为我以前从未见过这种情况。我假设 eager_load 由于较少的数据库调用而更有效。

因此,事实证明,ActiveRecord 曾一度试图将所有内容放入一个查询中,但后来选择这并不是一个好主意。

我用上面的查询和 4000 条记录对此进行了探索。

快速分析:

eager_load 花费了 2,600 毫秒。 包括花费了 72 毫秒。

eager_load 花了 36 次,只要包含。

由于我无法从您的描述中推断出查询(a: [:b, {c: :d}, {e: :f}]),因此我需要稍微谈谈includes

includes是一种适应不同情况的查询方式

下面是一些示例代码:

# model and reference
class Blog < ActiveRecord::Base
  has_many :posts

  # t.string   "name"
  # t.string   "author"
end

class Post < ActiveRecord::Base
  belongs_to :blog

  # t.string   "title"
end

# seed
(1..3).each do |b_id|
  blog = Blog.create(name: "Blog #{b_id}", author: 'someone')
  (1..5).each { |p_id| blog.posts.create(title: "Post #{b_id}-#{p_id}") }
end

在一种情况下,它会触发两个单独的查询,就像 preload.

> Blog.includes(:posts)
  Blog Load (2.8ms)  SELECT "blogs".* FROM "blogs"
  Post Load (0.7ms)  SELECT "posts".* FROM "posts" WHERE "posts"."blog_id" IN (1, 2, 3)

在另一种情况下,当查询引用的 table 时,它只触发一个 LEFT OUTER JOIN 查询,就像 eager_load.

> Blog.includes(:posts).where(posts: {title: 'Post 1-1'})
  SQL (0.3ms)  SELECT "blogs"."id" AS t0_r0, "blogs"."name" AS t0_r1, "blogs"."author" AS t0_r2, "blogs"."created_at" AS t0_r3, "blogs"."updated_at" AS t0_r4, "posts"."id" AS t1_r0, "posts"."title" AS t1_r1, "posts"."created_at" AS t1_r2, "posts"."updated_at" AS t1_r3, "posts"."blog_id" AS t1_r4 FROM "blogs" LEFT OUTER JOIN "posts" ON "posts"."blog_id" = "blogs"."id" WHERE "posts"."title" = ?  [["title", "Post 1-1"]]

因此,我认为您可能会要求 includeseager_load 的不同部分,即

为了效率和优化,我们应该使用两个单独的查询还是一个 LEFT OUTER JOIN 查询?

这也让我很困惑。经过一番挖掘,我发现 Fabio Akita 的 article 说服了我。以下是一些参考资料和示例:

For some situations, the monster outer join becomes slower than many smaller queries. The bottom line is: generally it seems better to split a monster join into smaller ones. This avoid the cartesian product overload problem.

The longer and more complex the result set, the more this matters because the more objects Rails would have to deal with. Allocating and deallocating several hundreds or thousands of small duplicated objects is never a good deal.

来自 Rails

的查询数据示例
> Blog.eager_load(:posts).map(&:name).count
  SQL (0.9ms)  SELECT "blogs"."id" AS t0_r0, "blogs"."name" AS t0_r1, "blogs"."author" AS t0_r2, "blogs"."created_at" AS t0_r3, "blogs"."updated_at" AS t0_r4, "posts"."id" AS t1_r0, "posts"."title" AS t1_r1, "posts"."created_at" AS t1_r2, "posts"."updated_at" AS t1_r3, "posts"."blog_id" AS t1_r4 FROM "blogs" LEFT OUTER JOIN "posts" ON "posts"."blog_id" = "blogs"."id"
 => 3

LEFT OUTER JOIN 查询返回的 SQL 数据示例

sqlite>  SELECT "blogs"."id" AS t0_r0, "blogs"."name" AS t0_r1, "blogs"."author" AS t0_r2, "blogs"."created_at" AS t0_r3, "blogs"."updated_at" AS t0_r4, "posts"."id" AS t1_r0, "posts"."title" AS t1_r1, "posts"."created_at" AS t1_r2, "posts"."updated_at" AS t1_r3, "posts"."blog_id" AS t1_r4 FROM "blogs" LEFT OUTER JOIN "posts" ON "posts"."blog_id" = "blogs"."id";
1|Blog 1|someone|2015-11-11 15:22:35.015095|2015-11-11 15:22:35.015095|1|Post 1-1|2015-11-11 15:22:35.053689|2015-11-11 15:22:35.053689|1
1|Blog 1|someone|2015-11-11 15:22:35.015095|2015-11-11 15:22:35.015095|2|Post 1-2|2015-11-11 15:22:35.058113|2015-11-11 15:22:35.058113|1
1|Blog 1|someone|2015-11-11 15:22:35.015095|2015-11-11 15:22:35.015095|3|Post 1-3|2015-11-11 15:22:35.062776|2015-11-11 15:22:35.062776|1
1|Blog 1|someone|2015-11-11 15:22:35.015095|2015-11-11 15:22:35.015095|4|Post 1-4|2015-11-11 15:22:35.065994|2015-11-11 15:22:35.065994|1
1|Blog 1|someone|2015-11-11 15:22:35.015095|2015-11-11 15:22:35.015095|5|Post 1-5|2015-11-11 15:22:35.069632|2015-11-11 15:22:35.069632|1
2|Blog 2|someone|2015-11-11 15:22:35.072871|2015-11-11 15:22:35.072871|6|Post 2-1|2015-11-11 15:22:35.078644|2015-11-11 15:22:35.078644|2
2|Blog 2|someone|2015-11-11 15:22:35.072871|2015-11-11 15:22:35.072871|7|Post 2-2|2015-11-11 15:22:35.081845|2015-11-11 15:22:35.081845|2
2|Blog 2|someone|2015-11-11 15:22:35.072871|2015-11-11 15:22:35.072871|8|Post 2-3|2015-11-11 15:22:35.084888|2015-11-11 15:22:35.084888|2
2|Blog 2|someone|2015-11-11 15:22:35.072871|2015-11-11 15:22:35.072871|9|Post 2-4|2015-11-11 15:22:35.087778|2015-11-11 15:22:35.087778|2
2|Blog 2|someone|2015-11-11 15:22:35.072871|2015-11-11 15:22:35.072871|10|Post 2-5|2015-11-11 15:22:35.090781|2015-11-11 15:22:35.090781|2
3|Blog 3|someone|2015-11-11 15:22:35.093902|2015-11-11 15:22:35.093902|11|Post 3-1|2015-11-11 15:22:35.097479|2015-11-11 15:22:35.097479|3
3|Blog 3|someone|2015-11-11 15:22:35.093902|2015-11-11 15:22:35.093902|12|Post 3-2|2015-11-11 15:22:35.103512|2015-11-11 15:22:35.103512|3
3|Blog 3|someone|2015-11-11 15:22:35.093902|2015-11-11 15:22:35.093902|13|Post 3-3|2015-11-11 15:22:35.108775|2015-11-11 15:22:35.108775|3
3|Blog 3|someone|2015-11-11 15:22:35.093902|2015-11-11 15:22:35.093902|14|Post 3-4|2015-11-11 15:22:35.112654|2015-11-11 15:22:35.112654|3
3|Blog 3|someone|2015-11-11 15:22:35.093902|2015-11-11 15:22:35.093902|15|Post 3-5|2015-11-11 15:22:35.117601|2015-11-11 15:22:35.117601|3

我们从 Rails 得到了预期的结果,但从 SQL 得到了更大的结果。这就是 LEFT OUTER JOIN.

的效率损失

所以我的结论是,includes优于eager_load


我在研究时完成了一篇关于 Preload, Eager_load, Includes, References, and Joins in Rails 的博客 post。希望这可以帮助。

参考