SQL - create SQL to join lists
I have the following table:
CREATE temp TABLE "t_table" (
usr_id bigint,
address varchar[],
msg_cnt bigint,
usr_cnt bigint,
source varchar[],
last_update timestamp
);
Insert data:
INSERT INTO "t_table"(usr_id, address, msg_cnt, usr_cnt, source, last_update) VALUES (1, '{44.154.48.125,81.134.82.111,95.155.38.120,94.134.88.136}', 10, 3, '{src1,src2}', '2019-10-16 22:16:22.163000');
INSERT INTO "t_table"(usr_id, address, msg_cnt, usr_cnt, source, last_update) VALUES (2, '{44.154.48.125}', 10, 3, '{src1,src3}', '2019-10-16 22:16:22.163000');
INSERT INTO "t_table"(usr_id, address, msg_cnt, usr_cnt, source, last_update) VALUES (3, '{94.134.88.136}', 10, 3, '{src1,src4}', '2019-10-16 22:16:22.163000');
INSERT INTO "t_table"(usr_id, address, msg_cnt, usr_cnt, source, last_update) VALUES (4, '{127.0.0.1}', 10, 3, '{src1,src5}', '2019-10-16 22:16:22.163000');
INSERT INTO "t_table"(usr_id, address, msg_cnt, usr_cnt, source, last_update) VALUES (5, '{127.0.0.1,5.5.5.5}', 10, 3, '{src1,src3}', '2019-10-16 22:16:22.163000');
INSERT INTO "t_table"(usr_id, address, msg_cnt, usr_cnt, source, last_update) VALUES (6, '{1.1.0.9}', 10, 3, '{src1,src2}', '2019-10-16 22:16:22.163000');
Find users who share an address.
Expected result:
| users | address | sum_msg_cnt | sum_usr_cnt | max_last_date | source |
|---------------------------------|-------------------------------------------------------------|--------------|------------------|--------------------------------|-----------------------------|
| {1,2,3} | {44.154.48.125,81.134.82.111,95.155.38.120,94.134.88.136} | 30 | 9 | "2019-10-16 22:16:22.163000" | {src4,src1,src2,src3} |
| {4,5} | {127.0.0.1,5.5.5.5} | 20 | 6 | "2019-10-16 22:16:22.163000" | {src1,src5,src3} |
| {6} | {1.1.0.9} | 10 | 3 | "2019-10-16 22:16:22.163000" | {src1,src2} |
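Question:
How can I write a SQL query that returns the expected result?
Thanks a lot.
More info:
PostgreSQL 9.5.19
Use the "group by" operator
Group by address
I don't know if this is the most efficient way, but I can't think of a better one right now.
I expect it to perform badly on larger tables.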
with userlist as (
  select array_agg(t.usr_id) as users,
         a.address
  from t_table t
    left join unnest(t.address) as a(address) on true
  group by a.address
), shared_users as (
  select u.address,
         array(select distinct ul.uid
               from userlist u2, unnest(u2.users) as ul(uid)
               where u.users && u2.users
               order by ul.uid) as users
  from userlist u
)
select users, array_agg(distinct address)
from shared_users
group by users;
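What does it do?
The first CTE collects, for every address, the users that have it, so users who share an address end up in the same array. The output of the userlist CTE is: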
users | address
------+--------------
{1} | 95.155.38.120
{1,3} | 94.134.88.136
{1,2} | 44.154.48.125
{6} | 1.1.0.9
{4,5} | 127.0.0.1
{1} | 81.134.82.111
{5} | 5.5.5.5
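This can now be used to merge the user lists that overlap, that is, lists that have at least one user in common (the && operator). The output of the shared_users CTE is: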
address | users
--------------+--------
95.155.38.120 | {1,2,3}
94.134.88.136 | {1,2,3}
44.154.48.125 | {1,2,3}
1.1.0.9 | {6}
127.0.0.1 | {4,5}
81.134.82.111 | {1,2,3}
5.5.5.5 | {4,5}
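One caveat, offered as a sketch rather than as part of the answer above: shared_users merges only lists that overlap directly. If users are linked through a chain (user 1 shares an address with 2, 2 with 3, and 3 with 4), the lists built for different addresses can differ, and grouping by them would then split one cluster into several groups. The sample data contains no such chain, so the result here is unaffected. A recursive CTE could follow chains of any length. The names addr, reach, peer, comp and grp below are made up for illustration, and the query is likely to be slow on large tables, because reach can grow to one row per pair of connected users.

```sql
with recursive addr as (
  -- one row per (user, address)
  select t.usr_id, a.address
  from t_table t
    cross join unnest(t.address) as a(address)
), reach(usr_id, peer) as (
  -- start: every user reaches itself
  select usr_id, usr_id from t_table
  union
  -- step: from a reached peer, move to every user sharing one of its addresses;
  -- UNION (not UNION ALL) discards rows already seen, so the recursion ends
  select r.usr_id, a2.usr_id
  from reach r
    join addr a1 on a1.usr_id = r.peer
    join addr a2 on a2.address = a1.address
), comp as (
  -- label each user with the smallest usr_id it can reach
  select usr_id, min(peer) as grp
  from reach
  group by usr_id
)
select array_agg(distinct c.usr_id) as users,
       array_agg(distinct a.address) as address
from comp c
  left join addr a on a.usr_id = c.usr_id
group by c.grp
order by users;
```

Users with the same grp label belong to one group, however long the chain that connects them.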
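As the shared_users output shows, we now have groups with identical lists of usr_ids. In the final step we can group by those lists and aggregate the addresses, which returns: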
users | array_agg
--------+----------------------------------------------------------
{1,2,3} | {44.154.48.125,81.134.82.111,94.134.88.136,95.155.38.120}
{4,5} | {127.0.0.1,5.5.5.5}
{6} | {1.1.0.9}
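The query above returns only users and the aggregated addresses. The remaining columns of the expected result could be added by joining each group back to t_table. The following is a sketch under that assumption, not part of the original answer. The CTE name grouped and the aliases x and src are made up here. It also assumes one row per usr_id, as in the sample data.

```sql
with userlist as (
  select array_agg(t.usr_id) as users,
         a.address
  from t_table t
    left join unnest(t.address) as a(address) on true
  group by a.address
), shared_users as (
  select u.address,
         array(select distinct ul.uid
               from userlist u2, unnest(u2.users) as ul(uid)
               where u.users && u2.users
               order by ul.uid) as users
  from userlist u
), grouped as (
  -- the final step of the answer, kept as a CTE so it can be joined
  select users, array_agg(distinct address) as address
  from shared_users
  group by users
)
select g.users,
       g.address,
       sum(t.msg_cnt)     as sum_msg_cnt,
       sum(t.usr_cnt)     as sum_usr_cnt,
       max(t.last_update) as max_last_date,
       -- distinct sources of all users in the group, sorted
       array(select distinct x.src
             from t_table t2
               cross join unnest(t2.source) as x(src)
             where t2.usr_id = any(g.users)
             order by x.src) as source
from grouped g
  join t_table t on t.usr_id = any(g.users)
group by g.users, g.address
order by g.users;
```

The elements of source come out sorted, so their order may differ from the table in the question even though the sets are the same.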