Custom query to fetch all entries of a table, keeping only the first of many duplicates based on a specific column

I have a Location model, and its table looks like this:

id name vin ip_address created_at updated_at
0 default 0 0.0.0.0/0 2021-11-08 11:54:26.822623 2021-11-08 11:54:26.822623
1 admin 1 10.108.150.143 2021-11-08 11:54:26.82885 2021-11-08 11:54:26.82885
2 V122 122 10.108.150.122 2021-11-08 11:54:26.82885 2021-11-08 11:54:26.82885
3 V123 123 10.108.150.123 2021-11-08 11:54:26.82885 2021-11-08 11:54:26.82885
4 V124 124 10.108.150.124 2021-11-08 11:54:26.82885 2021-11-08 11:54:26.82885
5 V122 122 10.108.150.122 2021-11-08 11:54:26.82885 2021-11-08 11:54:26.82885
6 V125 122 10.108.150.125 2021-11-08 11:54:26.82885 2021-11-08 11:54:26.82885

My method in the Location model:

   def self.find_all_non_duplicate
     return self.find(:all, :conditions => "id <> 1")
   end

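I want to fetch all entries of the locations table except the entry with id = 1, and where several entries share the same ip_address, the result should contain only the first of them.

For example, id = 2 and id = 5 have the same ip_address. I want to keep the first of the duplicates, i.e. id = 2.

The expected result is: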

id name vin ip_address created_at updated_at
0 default 0 0.0.0.0/0 2021-11-08 11:54:26.822623 2021-11-08 11:54:26.822623
2 V122 122 10.108.150.122 2021-11-08 11:54:26.82885 2021-11-08 11:54:26.82885
3 V123 123 10.108.150.123 2021-11-08 11:54:26.82885 2021-11-08 11:54:26.82885
4 V124 124 10.108.150.124 2021-11-08 11:54:26.82885 2021-11-08 11:54:26.82885
6 V125 122 10.108.150.125 2021-11-08 11:54:26.82885 2021-11-08 11:54:26.82885

The entries with id 1 and 5 should be left out.

What you need is DISTINCT ON, a PostgreSQL feature that was proposed for RoR quite recently but has not been merged yet, as pointed out by @engineersmnky. In raw SQL, it looks like this:

select distinct on (ip_address) *
from test
where id<>1
order by ip_address, created_at, id; -- id breaks ties between rows with equal created_at

With the proposed feature, this would translate to RoR as:

self.where("id <> 1").distinct_on(:ip_address)

Or, until the new feature is accepted:

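# the order mirrors the raw SQL, so the first row per ip_address is the one kept
self.where("id <> 1").select("distinct on (ip_address) *").order(:ip_address, :created_at, :id)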
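
To see which rows DISTINCT ON keeps, the same selection can be mimicked in plain Ruby on in-memory data. This is only a rough sketch of the semantics, not what the database does internally: Row, ROWS and first_per_ip are hypothetical names made up for the illustration, the vin and updated_at columns are left out, and timestamps are kept as strings for brevity.

```ruby
# Hypothetical stand-in for a Location record (not the real model).
Row = Struct.new(:id, :name, :ip_address, :created_at)

# Sample data copied from the question's table.
ROWS = [
  Row.new(0, "default", "0.0.0.0/0",      "2021-11-08 11:54:26.822623"),
  Row.new(1, "admin",   "10.108.150.143", "2021-11-08 11:54:26.82885"),
  Row.new(2, "V122",    "10.108.150.122", "2021-11-08 11:54:26.82885"),
  Row.new(3, "V123",    "10.108.150.123", "2021-11-08 11:54:26.82885"),
  Row.new(4, "V124",    "10.108.150.124", "2021-11-08 11:54:26.82885"),
  Row.new(5, "V122",    "10.108.150.122", "2021-11-08 11:54:26.82885"),
  Row.new(6, "V125",    "10.108.150.125", "2021-11-08 11:54:26.82885")
]

# Mimics: select distinct on (ip_address) * ... where id<>1
#         order by ip_address, created_at, id
def first_per_ip(rows)
  rows
    .reject { |r| r.id == 1 }                            # where id <> 1
    .sort_by { |r| [r.ip_address, r.created_at, r.id] }  # order by ...
    .uniq(&:ip_address)                                  # keep the first row per ip_address
end

puts first_per_ip(ROWS).map(&:id).inspect
```

For the sample data this keeps ids 0, 2, 3, 4 and 6. The sort step is what decides which duplicate counts as "first"; without it, the row kept for 10.108.150.122 would depend on whatever order the rows happen to arrive in.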

Full database-side test:

drop table if exists test cascade;
create table test (
    id serial primary key,
    name text,
    vin integer,
    ip_address inet,
    created_at timestamp,
    updated_at timestamp);
insert into test 
(id,name,vin,ip_address,created_at,updated_at)
values
(0,'default', 0,'0.0.0.0/0'::inet,'2021-11-08 11:54:26.822623'::timestamp,'2021-11-08 11:54:26.822623'::timestamp),
(1,'admin',   1,'10.108.150.143'::inet,'2021-11-08 11:54:26.82885'::timestamp,'2021-11-08 11:54:26.82885'::timestamp),
(2,'V122',    122,'10.108.150.122'::inet,'2021-11-08 11:54:26.82885'::timestamp,'2021-11-08 11:54:26.82885'::timestamp),
(3,'V123',    123,'10.108.150.123'::inet,'2021-11-08 11:54:26.82885'::timestamp,'2021-11-08 11:54:26.82885'::timestamp),
(4,'V124',    124,'10.108.150.124'::inet,'2021-11-08 11:54:26.82885'::timestamp,'2021-11-08 11:54:26.82885'::timestamp),
(5,'V122',    122,'10.108.150.122'::inet,'2021-11-08 11:54:26.82885'::timestamp,'2021-11-08 11:54:26.82885'::timestamp),
(6,'V125',    122,'10.108.150.125'::inet,'2021-11-08 11:54:26.82885'::timestamp,'2021-11-08 11:54:26.82885'::timestamp);

select distinct on (ip_address) *
from test where id<>1
order by ip_address, created_at, id; -- id breaks ties between rows with equal created_at
-- id |  name   | vin |   ip_address   |         created_at         |         updated_at
------+---------+-----+----------------+----------------------------+----------------------------
--  0 | default |   0 | 0.0.0.0/0      | 2021-11-08 11:54:26.822623 | 2021-11-08 11:54:26.822623
--  2 | V122    | 122 | 10.108.150.122 | 2021-11-08 11:54:26.82885  | 2021-11-08 11:54:26.82885
--  3 | V123    | 123 | 10.108.150.123 | 2021-11-08 11:54:26.82885  | 2021-11-08 11:54:26.82885
--  4 | V124    | 124 | 10.108.150.124 | 2021-11-08 11:54:26.82885  | 2021-11-08 11:54:26.82885
--  6 | V125    | 122 | 10.108.150.125 | 2021-11-08 11:54:26.82885  | 2021-11-08 11:54:26.82885
--(5 rows)