SQL:select 个特定的三合会用户

SQL: select specific triads of users

给定 table 个用户:

User(id INT, username VARCHAR(30))

以及它们之间的定向关系:

Following(follower_id INT, followee_id INT)

我需要一个 SELECT 用于所有独特的三合会用户,例如:

A   follows   B
B   follows   A
A   follows   C
C not follows A
B not follows C
C   follows   B

我正在使用 SQLite 数据库并使用 Python。手头有一个 SELECT 上面的例子,我可能会很快完成我想要的所有其他三合会。这些基本上是用户三元组中定向连接的所有可能组合。

这有点复杂,但你可以做到:

with pairs as (
      select f1.followee_id, f1.follower_id
      from following f1 join
           following f2
           on f1.follower_id = f2.followee_id and
              f1.followee_id = f2.follower_id
     )
select p1.followee as A, p1.follower as B, p3.followee as C
from pairs p1 join
     pairs p2
     on p1.followee_id = p2.followee_id join
     pairs p3
     on p3.followee_id = p1.follower_id and
        p3.follower_id = p2.follower_id;

想法是 pairs 获得相互关注的成对用户。然后再找其他加第三人称的对

另一种方法是生成所有组合,然后选择匹配的组合:

select a.id, b.id, c.id
from users a join
     users b
     on a.id < b.id join
     users c
     on b.id < c.id
where exists (select 1 from following f where f.follower_id = a.id and f.followee_id = b.id) and
      exists (select 1 from following f where f.follower_id = b.id and f.followee_id = a.id) and
      exists (select 1 from following f where f.follower_id = a.id and f.followee_id = c.id) and
      exists (select 1 from following f where f.follower_id = c.id and f.followee_id = a.id) and
      exists (select 1 from following f where f.follower_id = b.id and f.followee_id = c.id) and
      exists (select 1 from following f where f.follower_id = c.id and f.followee_id = b.id);

如果您在 table 上设置了合理的索引,这个版本实际上可能有更好的性能。

编辑:

为了性能,following table 应该在 follower_id, followee_id 上有索引——这是一个包含两列的复合索引。

SELECT ab.follower_id AS a_id,
       ab.followee_id AS b_id,
       ac.followee_id AS c_id
FROM following AS ab
JOIN following AS ba ON ab.followee_id = ba.follower_id
                    AND ab.follower_id = ba.followee_id
JOIN following AS ac ON ab.follower_id = ac.follower_id
JOIN following AS cb ON ac.followee_id = cb.follower_id
                    AND ab.followee_id = cb.followee_id
LEFT OUTER JOIN following AS ca ON ac.followee_id = ca.follower_id
                               AND ac.follower_id = ca.followee_id
LEFT OUTER JOIN following AS bc ON cb.followee_id = bc.follower_id
                               AND cb.follower_id = bc.followee_id
WHERE ab.follower_id < ab.followee_id
  AND ab.followee_id < ac.followee_id
  AND ca.follower_id IS NULL
  AND bc.follower_id IS NULL

处理 300 万条记录时,执行时间为 30 秒,而 Gordon 提出的 EXISTs 版本执行时间为 45k 秒。