ActiveRecord Collection subtraction

I have a question about subtracting one ActiveRecord collection from a similar one.

Suppose I have the following queries:

all_users = User.all
users_with_adequate_reviews = User.joins(:reviews).select("users.id, count(*) as num_reviews").group(:id).having("num_reviews > 5")

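If I do all_users - users_with_adequate_reviews, I get what I expect: the users with 5 or fewer reviews. How does ActiveRecord relation subtraction know which records to remove as matches, even though I select only some of the attributes from users (mainly id)? I looked for documentation on this but couldn't find it anywhere.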

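Where is subtraction defined?

Subtraction on ActiveRecord relations is defined in the ActiveRecord::Delegation module.

If you dig into that source code, you can see that the method is delegated to the Array class.

So we need to look at Array subtraction to understand how subtraction on ActiveRecord relations works.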


How does Array subtraction work?

This is taken from the documentation on Array subtraction/difference.

Array Difference

Returns a new array that is a copy of the original array, removing any items that also appear in other_ary. The order is preserved from the original array.

It compares elements using their hash and eql? methods for efficiency.

This means subtraction relies on two methods of each object to do its job: hash and eql?.
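To make that concrete, here is a minimal plain-Ruby sketch. The Tag class is hypothetical and exists only for this illustration; it is not part of Rails. It defines eql? and hash on the name, so Array#- treats two distinct Tag instances with the same name as the same element.

```ruby
# Tag is a hypothetical class used only for this illustration.
class Tag
  attr_reader :name

  def initialize(name)
    @name = name
  end

  # Two tags count as the same element when their names match.
  def eql?(other)
    other.is_a?(Tag) && other.name == name
  end
  alias_method :==, :eql?

  # Objects that are eql? must also return the same hash value.
  def hash
    [self.class, name].hash
  end
end

left  = [Tag.new("ruby"), Tag.new("rails"), Tag.new("sql")]
right = [Tag.new("rails")] # a different instance with the same name

# Array#- drops every element of left that matches an element of right.
remaining = (left - right).map(&:name)
puts remaining.inspect
```

Without the custom eql? and hash, the two "rails" tags would be unrelated objects and nothing would be removed.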


How do these methods behave on ActiveRecord objects?

The code below is taken from the ActiveRecord::Core module.

def ==(comparison_object)
  super ||
    comparison_object.instance_of?(self.class) &&
    !id.nil? &&
    comparison_object.id == id
end
alias :eql? :==

def hash
  if id
    self.class.hash ^ id.hash
  else
    super
  end
end

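You can see that both hash and eql? look only at the class and the id.

This means all_users - users_with_adequate_reviews excludes an object only when both collections contain an object with the same id and the same class.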
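The same rule can be tried outside Rails. The sketch below assumes a hypothetical FakeUser class, a plain-Ruby stand-in that copies the == and hash logic from the excerpt above; it is not a real ActiveRecord model and no database is involved.

```ruby
# FakeUser is a hypothetical stand-in, not a real ActiveRecord model.
class FakeUser
  attr_reader :id, :name

  def initialize(id:, name: nil)
    @id = id
    @name = name
  end

  # Same rule as the excerpt: the identical object, or the same class
  # plus the same non-nil id.
  def ==(other)
    super ||
      other.instance_of?(self.class) &&
      !id.nil? &&
      other.id == id
  end
  alias_method :eql?, :==

  def hash
    if id
      self.class.hash ^ id.hash
    else
      super
    end
  end
end

all_users = [FakeUser.new(id: 1, name: "Bob"), FakeUser.new(id: 2, name: "Danny")]
# Only the id is populated, as with select("users.id, ...").
partial = [FakeUser.new(id: 1)]

remaining_names = (all_users - partial).map(&:name)
puts remaining_names.inspect
```

The partially loaded object has no name, yet it still removes Bob, because only the class and the id take part in the comparison.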


Another example

irb(main):001:0> users = User.all
  User Load (26.4ms)  SELECT  `users`.* FROM `users` LIMIT 11

=> #<ActiveRecord::Relation [
  #<User id: 1, name: "Bob", created_at: "2020-06-09 13:03:45", updated_at: "2020-06-09 13:03:45">, 
  #<User id: 2, name: "Danny", created_at: "2020-06-09 13:04:14", updated_at: "2020-06-09 13:04:14">, 
  #<User id: 3, name: "Alan", created_at: "2020-06-09 13:05:30", updated_at: "2020-06-09 13:05:30">, 
  #<User id: 4, name: "Joe", created_at: "2020-06-09 13:07:00", updated_at: "2020-06-09 13:07:00">]>

irb(main):002:0> users_with_multiple_emails = User.joins(:user_emails).select("users.id, users.name, count(*) as num_emails").group(:id).having("num_emails > 1")
  User Load (2.8ms)  SELECT  users.id, users.name, count(*) as num_emails FROM `users` INNER JOIN `user_emails` ON `user_emails`.`user_id` = `users`.`id` GROUP BY `users`.`id` HAVING (num_emails > 1) LIMIT 11

=> #<ActiveRecord::Relation [#<User id: 1, name: "Bob">]>

irb(main):003:0> users - users_with_multiple_emails

=> [
  #<User id: 2, name: "Danny", created_at: "2020-06-09 13:04:14", updated_at: "2020-06-09 13:04:14">, 
  #<User id: 3, name: "Alan", created_at: "2020-06-09 13:05:30", updated_at: "2020-06-09 13:05:30">, 
  #<User id: 4, name: "Joe", created_at: "2020-06-09 13:07:00", updated_at: "2020-06-09 13:07:00">]

As you can see, users - users_with_multiple_emails excludes the first object (Bob).

Why? Because Bob has the same id and class in both collections (id: 1, class: User).

Subtraction returns a different result in the following case:

irb(main):001:0> users = User.all
  User Load (26.4ms)  SELECT  `users`.* FROM `users` LIMIT 11

=> #<ActiveRecord::Relation [
  #<User id: 1, name: "Bob", created_at: "2020-06-09 13:03:45", updated_at: "2020-06-09 13:03:45">, 
  #<User id: 2, name: "Danny", created_at: "2020-06-09 13:04:14", updated_at: "2020-06-09 13:04:14">, 
  #<User id: 3, name: "Alan", created_at: "2020-06-09 13:05:30", updated_at: "2020-06-09 13:05:30">, 
  #<User id: 4, name: "Joe", created_at: "2020-06-09 13:07:00", updated_at: "2020-06-09 13:07:00">]>

irb(main):002:0> users_with_multiple_emails = User.joins(:user_emails).select("users.name, count(*) as num_emails").group(:id).having("num_emails > 1")
  User Load (2.3ms)  SELECT  users.name, count(*) as num_emails FROM `users` INNER JOIN `user_emails` ON `user_emails`.`user_id` = `users`.`id` GROUP BY `users`.`id` HAVING (num_emails > 1) LIMIT 11
=> #<ActiveRecord::Relation [#<User id: nil, name: "Bob">]>

irb(main):003:0> users - users_with_multiple_emails

=> [
  #<User id: 1, name: "Bob", created_at: "2020-06-09 13:03:45", updated_at: "2020-06-09 13:03:45">, 
  #<User id: 2, name: "Danny", created_at: "2020-06-09 13:04:14", updated_at: "2020-06-09 13:04:14">, 
  #<User id: 3, name: "Alan", created_at: "2020-06-09 13:05:30", updated_at: "2020-06-09 13:05:30">, 
  #<User id: 4, name: "Joe", created_at: "2020-06-09 13:07:00", updated_at: "2020-06-09 13:07:00">]

This time users_with_multiple_emails selects only name and num_emails.

As you can see, users - users_with_multiple_emails does not exclude Bob.

Why? Because the two Bob objects have different ids.

  • Bob from users has id: 1
  • Bob from users_with_multiple_emails has id: nil
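The nil-id case can be reproduced without a database. This is a minimal sketch that assumes a hypothetical FakeUser class mirroring the == and hash logic quoted from ActiveRecord::Core; it is an illustration, not Rails itself.

```ruby
# FakeUser is a hypothetical stand-in, not a real ActiveRecord model.
class FakeUser
  attr_reader :id, :name

  def initialize(id:, name:)
    @id = id
    @name = name
  end

  # Equal only if it is the identical object, or same class and same non-nil id.
  def ==(other)
    super ||
      other.instance_of?(self.class) &&
      !id.nil? &&
      other.id == id
  end
  alias_method :eql?, :==

  # With a nil id, fall back to the default per-object hash.
  def hash
    id ? self.class.hash ^ id.hash : super
  end
end

bob_full  = FakeUser.new(id: 1, name: "Bob")   # loaded with id
bob_no_id = FakeUser.new(id: nil, name: "Bob") # id was not selected

matches   = bob_full.eql?(bob_no_id)
remaining = [bob_full] - [bob_no_id]
puts matches.inspect
puts remaining.map(&:name).inspect
```

Because the ids differ (1 versus nil), the name is never consulted and Bob stays in the result. Keeping users.id in the select list, as in the first session, is what lets the subtraction match the two records.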