PLSQL recursive query for friend analysis

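I have a social network table named RELATION_TABLE with three columns: USER_ID_1, USER_ID_2, and a relation type code (such as close friend, family member, acquaintance, college friend, etc.).

Table structure and sample records: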

DROP table RELATION_TABLE;
create table RELATION_TABLE
(
    USER_ID_1 NUMBER,
    USER_ID_2 NUMBER,
    RELATION_TYPE_CODE VARCHAR2(100) 
);

INSERT INTO RELATION_TABLE(USER_ID_1,USER_ID_2,RELATION_TYPE_CODE) 
VALUES(1,2,'CLOSE FRIEND');
INSERT INTO RELATION_TABLE(USER_ID_1,USER_ID_2,RELATION_TYPE_CODE) 
VALUES(4,1,'HIGH SCHOOL FRIEND');
INSERT INTO RELATION_TABLE(USER_ID_1,USER_ID_2,RELATION_TYPE_CODE) 
VALUES(5,2,'FAMILY MEMBER');
INSERT INTO RELATION_TABLE(USER_ID_1,USER_ID_2,RELATION_TYPE_CODE) 
VALUES(1,6,'COLLEAGUE');
INSERT INTO RELATION_TABLE(USER_ID_1,USER_ID_2,RELATION_TYPE_CODE) 
VALUES(3,4,'PARTNER');
INSERT INTO RELATION_TABLE(USER_ID_1,USER_ID_2,RELATION_TYPE_CODE) 
VALUES(3,6,'COLLEAGUE');
COMMIT;

Sample records:

USER_ID_1    USER_ID_2    RELATION_TYPE_CODE
1              2           CLOSE FRIEND
4              1           HIGH SCHOOL FRIEND
5              2           FAMILY MEMBER
1              6           COLLEAGUE
3              4           PARTNER
3              6           COLLEAGUE

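Based on the sample records: user 1 is related to 4, 4 is related to 3, and finally 3 is related to 6, so 1 may be related to 4, 3 and 6.

So I need to write a recursive query to insert all possible relations. I tried CONNECT BY PRIOR, but there is no direct parent-child relationship: any user ID can appear in either the USER_ID_1 column or the USER_ID_2 column. There may be cycles, and I need to ignore those as well.

Can you suggest an approach?

Thanks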

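Since your dataset has no direction on its relations, if you want all transitive relations you need to handle relation chains that run either USER_ID_1 -> USER_ID_2 or USER_ID_2 -> USER_ID_1.

Since you mentioned you are on 11g, recursive subquery factoring might be an option for you, but because it only became available in 11gR2, I'll stick with CONNECT BY in this example.

In your example, you ultimately want the relation records linking user #1 to users #3, #4 and #6 (and probably user #2 as well, from the CLOSE FRIEND relation you included).
To assemble these relations, one might first try a combined query: the tree rooted on the USER_ID_1 side plus the tree rooted on the USER_ID_2 side, with NOCYCLE to ignore cycles (but this doesn't work):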

SELECT
  CONNECT_BY_ROOT USER_ID_2 AS STARTING_USER_ID,
  USER_ID_1                 AS RELATED_USER_ID
FROM RELATION_TABLE
START WITH USER_ID_2 = 1
CONNECT BY NOCYCLE PRIOR USER_ID_1 = USER_ID_2
UNION
SELECT
  CONNECT_BY_ROOT USER_ID_1 AS STARTING_USER_ID,
  USER_ID_2                 AS RELATED_USER_ID
FROM RELATION_TABLE
START WITH USER_ID_1 = 1
CONNECT BY NOCYCLE PRIOR USER_ID_2 = USER_ID_1
ORDER BY 1 ASC, 2 ASC;

Result:

STARTING_USER_ID  RELATED_USER_ID  
1                 2                
1                 3                
1                 4                
1                 6            

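This looks close (it has the three relations you mentioned, plus the 1 -> 2 relation where user #1 is on the USER_ID_1 side).

But a closer look at the data shows that relations are missing. In the records, user #2 is connected to user #5 and user #1 is connected to user #2, so user #1 should also be connected to user #5. I believe you pointed this out in your post: there is no direct parent-child relationship (the network has no direction, but the query does).

To get around this, one (inefficient) approach is to query the combined set of a -> b and b -> a relations, doubling the dataset so the hierarchical query can proceed as if the relations were directed.

In the query below, user #1 can now navigate through user #2 to reach user #5. One side effect of this query is that it creates artificial self-relations, which have to be removed. To keep any genuine self-relations in the data, the query adds them back with a UNION ALL.

To save space, I'm using LISTAGG here to condense the results.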
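To make the doubling concrete, here is a minimal sketch that builds only the a -> b plus b -> a set for the six sample rows from the question. The rows are inlined through DUAL so the query runs without RELATION_TABLE; that inlining and the alias RELATION_SAMPLE are assumptions for illustration, not part of the original answer.

```sql
-- Hypothetical inline copy of the six sample rows (RELATION_SAMPLE is illustrative only).
WITH RELATION_SAMPLE AS (
  SELECT 1 AS USER_ID_1, 2 AS USER_ID_2 FROM DUAL UNION ALL
  SELECT 4, 1 FROM DUAL UNION ALL
  SELECT 5, 2 FROM DUAL UNION ALL
  SELECT 1, 6 FROM DUAL UNION ALL
  SELECT 3, 4 FROM DUAL UNION ALL
  SELECT 3, 6 FROM DUAL)
-- Each stored relation contributes one row per direction; UNION removes duplicates.
SELECT USER_ID_1 AS LEFT_ID, USER_ID_2 AS RIGHT_ID FROM RELATION_SAMPLE
UNION
SELECT USER_ID_2, USER_ID_1 FROM RELATION_SAMPLE
ORDER BY 2, 1;
```

For the six sample rows this yields 12 directed rows. User 2, for example, now appears as RIGHT_ID with LEFT_ID 1 and with LEFT_ID 5, which is the step that lets a walk following PRIOR LEFT_ID = RIGHT_ID go 1 -> 2 -> 5.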

WITH PSEUDO_DIRECTED_RELATION AS (
  SELECT
    USER_ID_1 AS LEFT_ID,
    USER_ID_2 AS RIGHT_ID
  FROM RELATION_TABLE
  UNION
  SELECT
    USER_ID_2 AS LEFT_ID,
    USER_ID_1 AS RIGHT_ID
  FROM RELATION_TABLE)
SELECT STARTING_ID, LISTAGG(RELATED_ID,',') WITHIN GROUP (ORDER BY RELATED_ID ASC) AS RELATED_USERS
FROM (
SELECT
  DISTINCT
  CONNECT_BY_ROOT RIGHT_ID AS STARTING_ID,
  LEFT_ID                  AS RELATED_ID
FROM PSEUDO_DIRECTED_RELATION
WHERE LEFT_ID <> CONNECT_BY_ROOT RIGHT_ID
START WITH RIGHT_ID = 1
CONNECT BY NOCYCLE PRIOR LEFT_ID = RIGHT_ID
UNION ALL
SELECT
  USER_ID_1 AS STARTING_ID,
  USER_ID_2 AS RELATED_ID
FROM RELATION_TABLE
WHERE USER_ID_1 = USER_ID_2
      AND USER_ID_1 = 1)
  GROUP BY STARTING_ID
ORDER BY 1 ASC;

Result:

STARTING_ID  RELATED_USERS  
1            2,3,4,5,6      

Now user #1 is connected to user #5 through user #2.
But maybe this just links everything to everything, so let's add some more data:

INSERT INTO RELATION_TABLE VALUES (7,9,'Siblings');
INSERT INTO RELATION_TABLE VALUES (7,13,'Pen Pals');
INSERT INTO RELATION_TABLE VALUES (22,7,'Colleagues');

and rerun the query above. User #1 should not be related to user #7 in any way:

STARTING_ID  RELATED_USERS  
1            2,3,4,5,6      

Now, if we relate user #7 to itself:

INSERT INTO RELATION_TABLE VALUES (7,7,'Self');

and rerun the query targeting user #7 instead of user #1 (changing the START WITH clause and the USER_ID_1 = 1 predicate):

STARTING_ID  RELATED_USERS  
7            7,9,13,22      

If you don't want to query a single root user, remove the START WITH clause and the USER_ID_1 = 1 predicate on the self-relation branch:

WITH PSEUDO_DIRECTED_RELATION AS (
  SELECT
    USER_ID_1 AS LEFT_ID,
    USER_ID_2 AS RIGHT_ID
  FROM RELATION_TABLE
  UNION
  SELECT
    USER_ID_2 AS LEFT_ID,
    USER_ID_1 AS RIGHT_ID
  FROM RELATION_TABLE)
SELECT STARTING_ID, LISTAGG(RELATED_ID,',') WITHIN GROUP (ORDER BY RELATED_ID ASC) AS RELATED_USERS
  FROM (
SELECT
  DISTINCT
  CONNECT_BY_ROOT RIGHT_ID AS STARTING_ID,
  LEFT_ID                  AS RELATED_ID
FROM PSEUDO_DIRECTED_RELATION
WHERE LEFT_ID <> CONNECT_BY_ROOT RIGHT_ID
CONNECT BY NOCYCLE PRIOR LEFT_ID = RIGHT_ID
UNION ALL
SELECT
  USER_ID_1 AS STARTING_ID,
  USER_ID_2 AS RELATED_ID
FROM RELATION_TABLE
WHERE USER_ID_1 = USER_ID_2)
  GROUP BY STARTING_ID
ORDER BY 1 ASC, 2 ASC;

Result, showing all transitively related users for each user:

STARTING_ID  RELATED_USERS  
1            2,3,4,5,6      
2            1,3,4,5,6      
3            1,2,4,5,6      
4            1,2,3,5,6      
5            1,2,3,4,6      
6            1,2,3,4,5      
7            7,9,13,22      
9            7,13,22        
13           7,9,22         
22           7,9,13      

Here is another version, using recursive subquery factoring (11gR2 and later):

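-- m: every relation in both directions, so the walk can cross an edge either way
WITH m AS
          (SELECT USER_ID_1 u1, USER_ID_2 u2 FROM RELATION_TABLE
           UNION
           SELECT USER_ID_2, USER_ID_1 FROM RELATION_TABLE),
     -- recur: the anchor starts each user at itself; each step follows one more edge
     recur (usr, fri) AS
          (SELECT u1, u1 FROM m
           UNION ALL
           SELECT r.usr, u2 FROM recur r, m WHERE r.fri = m.u1)
           -- flag a row with cycle = 1 when fri already appeared earlier on its path
           CYCLE fri SET cycle TO 1 DEFAULT 0
SELECT    usr,
          listagg(fri, ',') within GROUP (ORDER BY fri) friends
-- drop the starting user itself and the rows flagged as cycles
FROM (SELECT DISTINCT usr, fri FROM recur WHERE usr != fri AND cycle = 0)
GROUP BY  usr
ORDER BY  usr;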
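The question asks to insert the derived relations rather than only list them. A minimal sketch, assuming a hypothetical target table TRANSITIVE_RELATION (not part of the original question), stores one row per (user, related user) pair using the same recursive query. Here the cycle mark uses the one-character values 'Y'/'N', and the column alias is_cycle is also an illustrative choice.

```sql
-- Hypothetical target table; the name and columns are assumptions for this sketch.
CREATE TABLE TRANSITIVE_RELATION
(
    USER_ID         NUMBER,
    RELATED_USER_ID NUMBER
);

-- INSERT ... subquery, where the subquery starts with a (recursive) WITH clause.
INSERT INTO TRANSITIVE_RELATION (USER_ID, RELATED_USER_ID)
WITH m AS
      -- every relation in both directions
      (SELECT USER_ID_1 u1, USER_ID_2 u2 FROM RELATION_TABLE
       UNION
       SELECT USER_ID_2, USER_ID_1 FROM RELATION_TABLE),
     recur (usr, fri) AS
      (SELECT u1, u1 FROM m
       UNION ALL
       SELECT r.usr, m.u2 FROM recur r, m WHERE r.fri = m.u1)
      -- stop expanding a path once fri repeats a value already on it
      CYCLE fri SET is_cycle TO 'Y' DEFAULT 'N'
SELECT DISTINCT usr, fri
FROM recur
WHERE usr != fri
  AND is_cycle = 'N';

COMMIT;
```

Because the walk runs over both directions, every stored pair (a, b) should have a matching (b, a), and no user is stored as related to itself; a genuine self-relation such as (7, 7) is dropped by the usr != fri filter.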