SQL:select 个特定的三合会用户
SQL: select specific triads of users
给定 table 个用户:
User(id INT, username VARCHAR(30))
以及它们之间的定向关系:
Following(follower_id INT, followee_id INT)
我需要一个 SELECT 用于所有独特的三合会用户,例如:
A follows B
B follows A
A follows C
C not follows A
B not follows C
C follows B
我正在使用 SQLite 数据库并使用 Python。手头有一个 SELECT 上面的例子,我可能会很快完成我想要的所有其他三合会。这些基本上是用户三元组中定向连接的所有可能组合。
这有点复杂,但你可以做到:
with pairs as (
select f1.followee_id, f1.follower_id
from following f1 join
following f2
on f1.follower_id = f2.followee_id and
f1.followee_id = f2.follower_id
)
select p1.followee as A, p1.follower as B, p3.followee as C
from pairs p1 join
pairs p2
on p1.followee_id = p2.followee_id join
pairs p3
on p3.followee_id = p1.follower_id and
p3.follower_id = p2.follower_id;
想法是 pairs
获得相互关注的成对用户。然后再找其他加第三人称的对
另一种方法是生成所有组合,然后选择匹配的组合:
select a.id, b.id, c.id
from users a join
users b
on a.id < b.id join
users c
on b.id < c.id
where exists (select 1 from following f where f.follower_id = a.id and f.followee_id = b.id) and
exists (select 1 from following f where f.follower_id = b.id and f.followee_id = a.id) and
exists (select 1 from following f where f.follower_id = a.id and f.followee_id = c.id) and
exists (select 1 from following f where f.follower_id = c.id and f.followee_id = a.id) and
exists (select 1 from following f where f.follower_id = b.id and f.followee_id = c.id) and
exists (select 1 from following f where f.follower_id = c.id and f.followee_id = b.id);
如果您在 table 上设置了合理的索引,这个版本实际上可能有更好的性能。
编辑:
为了性能,following
table 应该在 follower_id, followee_id
上有索引——这是一个包含两列的复合索引。
SELECT ab.follower_id AS a_id,
ab.followee_id AS b_id,
ac.followee_id AS c_id
FROM following AS ab
JOIN following AS ba ON ab.followee_id = ba.follower_id
AND ab.follower_id = ba.followee_id
JOIN following AS ac ON ab.follower_id = ac.follower_id
JOIN following AS cb ON ac.followee_id = cb.follower_id
AND ab.followee_id = cb.followee_id
LEFT OUTER JOIN following AS ca ON ac.followee_id = ca.follower_id
AND ac.follower_id = ca.followee_id
LEFT OUTER JOIN following AS bc ON cb.followee_id = bc.follower_id
AND cb.follower_id = bc.followee_id
WHERE ab.follower_id < ab.followee_id
AND ab.followee_id < ac.followee_id
AND ca.follower_id IS NULL
AND bc.follower_id IS NULL
处理 300 万条记录时,执行时间为 30 秒,而 Gordon 提出的 EXIST
s 版本执行时间为 45k 秒。
给定 table 个用户:
User(id INT, username VARCHAR(30))
以及它们之间的定向关系:
Following(follower_id INT, followee_id INT)
我需要一个 SELECT 用于所有独特的三合会用户,例如:
A follows B
B follows A
A follows C
C not follows A
B not follows C
C follows B
我正在使用 SQLite 数据库并使用 Python。手头有一个 SELECT 上面的例子,我可能会很快完成我想要的所有其他三合会。这些基本上是用户三元组中定向连接的所有可能组合。
这有点复杂,但你可以做到:
with pairs as (
select f1.followee_id, f1.follower_id
from following f1 join
following f2
on f1.follower_id = f2.followee_id and
f1.followee_id = f2.follower_id
)
select p1.followee as A, p1.follower as B, p3.followee as C
from pairs p1 join
pairs p2
on p1.followee_id = p2.followee_id join
pairs p3
on p3.followee_id = p1.follower_id and
p3.follower_id = p2.follower_id;
想法是 pairs
获得相互关注的成对用户。然后再找其他加第三人称的对
另一种方法是生成所有组合,然后选择匹配的组合:
select a.id, b.id, c.id
from users a join
users b
on a.id < b.id join
users c
on b.id < c.id
where exists (select 1 from following f where f.follower_id = a.id and f.followee_id = b.id) and
exists (select 1 from following f where f.follower_id = b.id and f.followee_id = a.id) and
exists (select 1 from following f where f.follower_id = a.id and f.followee_id = c.id) and
exists (select 1 from following f where f.follower_id = c.id and f.followee_id = a.id) and
exists (select 1 from following f where f.follower_id = b.id and f.followee_id = c.id) and
exists (select 1 from following f where f.follower_id = c.id and f.followee_id = b.id);
如果您在 table 上设置了合理的索引,这个版本实际上可能有更好的性能。
编辑:
为了性能,following
table 应该在 follower_id, followee_id
上有索引——这是一个包含两列的复合索引。
SELECT ab.follower_id AS a_id,
ab.followee_id AS b_id,
ac.followee_id AS c_id
FROM following AS ab
JOIN following AS ba ON ab.followee_id = ba.follower_id
AND ab.follower_id = ba.followee_id
JOIN following AS ac ON ab.follower_id = ac.follower_id
JOIN following AS cb ON ac.followee_id = cb.follower_id
AND ab.followee_id = cb.followee_id
LEFT OUTER JOIN following AS ca ON ac.followee_id = ca.follower_id
AND ac.follower_id = ca.followee_id
LEFT OUTER JOIN following AS bc ON cb.followee_id = bc.follower_id
AND cb.follower_id = bc.followee_id
WHERE ab.follower_id < ab.followee_id
AND ab.followee_id < ac.followee_id
AND ca.follower_id IS NULL
AND bc.follower_id IS NULL
处理 300 万条记录时,执行时间为 30 秒,而 Gordon 提出的 EXIST
s 版本执行时间为 45k 秒。