PL/SQL recursive query for friend analysis
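I have a social network table named RELATION_TABLE.
It has three columns: USER_ID_1, USER_ID_2, and a relation type code (close friend, family member, acquaintance, college friend, and so on).
Table structure and sample records: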
DROP table RELATION_TABLE;
create table RELATION_TABLE
(
USER_ID_1 NUMBER,
USER_ID_2 NUMBER,
RELATION_TYPE_CODE VARCHAR2(100)
);
INSERT INTO RELATION_TABLE(USER_ID_1,USER_ID_2,RELATION_TYPE_CODE)
VALUES(1,2,'CLOSE FRIEND');
INSERT INTO RELATION_TABLE(USER_ID_1,USER_ID_2,RELATION_TYPE_CODE)
VALUES(4,1,'HIGH SCHOOL FRIEND');
INSERT INTO RELATION_TABLE(USER_ID_1,USER_ID_2,RELATION_TYPE_CODE)
VALUES(5,2,'FAMILY MEMBER');
INSERT INTO RELATION_TABLE(USER_ID_1,USER_ID_2,RELATION_TYPE_CODE)
VALUES(1,6,'COLLEAGUE');
INSERT INTO RELATION_TABLE(USER_ID_1,USER_ID_2,RELATION_TYPE_CODE)
VALUES(3,4,'PARTNER');
INSERT INTO RELATION_TABLE(USER_ID_1,USER_ID_2,RELATION_TYPE_CODE)
VALUES(3,6,'COLLEAGUE');
COMMIT;
Sample records:
USER_ID_1  USER_ID_2  RELATION_TYPE_CODE
1          2          CLOSE FRIEND
4          1          HIGH SCHOOL FRIEND
5          2          FAMILY MEMBER
1          6          COLLEAGUE
3          4          PARTNER
3          6          COLLEAGUE
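According to the sample records, user 1 is related to 4, 4 is related to 3, and finally 3 is related to 6, so 1 may be related to 4, 3 and 6.
So I need to write a recursive query to insert all possible relations.
I tried CONNECT BY PRIOR, but there is no direct relation like parent-child: any user id can appear in either the USER_ID_1 column or the USER_ID_2 column. There may be cycles, and I need to ignore those too.
Do you have any suggestions?
Thanks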
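Since the relations in your data set have no direction, getting all transitive relations means handling relation chains that start in either direction: USER_ID_1 -> USER_ID_2 or USER_ID_2 -> USER_ID_1.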
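As you mentioned, you are on 11g. Recursive subquery factoring may be an option for you, but since it was not available until 11gR2, I will avoid it in this example and use CONNECT BY.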
In your example, you want to end up with relation records from user #1 to users #3, #4 and #6 (and probably user #2, from the CLOSE FRIEND relation you included).
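To assemble these relations, one might first try combining a relation tree rooted on USER_ID_1 with a relation tree rooted on USER_ID_2, using NOCYCLE to ignore cycles (but this will not work):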
SELECT
CONNECT_BY_ROOT USER_ID_2 AS STARTING_USER_ID,
USER_ID_1 AS RELATED_USER_ID
FROM RELATION_TABLE
START WITH USER_ID_2 = 1
CONNECT BY NOCYCLE PRIOR USER_ID_1 = USER_ID_2
UNION
SELECT
CONNECT_BY_ROOT USER_ID_1 AS STARTING_USER_ID,
USER_ID_2 AS RELATED_USER_ID
FROM RELATION_TABLE
START WITH USER_ID_1 = 1
CONNECT BY NOCYCLE PRIOR USER_ID_2 = USER_ID_1
ORDER BY 1 ASC, 2 ASC;
Result:
STARTING_USER_ID RELATED_USER_ID
1 2
1 3
1 4
1 6
This looks close: it has the three relations you mentioned, plus the 1 -> 2 relation where user #1 is on the USER_ID_1 side.
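But a closer look at the data shows that a relation is missing. In the records, user #2 is connected to user #5, and user #1 is connected to user #2, so user #1 should also be connected to user #5. I believe this is what you pointed out in your post: there is no direct parent-child relation (the network has no direction, but the query does).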
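One (inefficient) way around this is to query the combined set of a -> b and b -> a relations. This doubles the data set so that the hierarchical query can proceed as if the relations were directed.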
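In the query below, user #1 can now navigate through user #2 to reach user #5. A side effect of this query is that it creates artificial self-relations, which must be removed. The UNION ALL branch in the example is there to add genuine self-relations back in.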
To save space, I will use LISTAGG here to condense the results.
WITH PSEUDO_DIRECTED_RELATION AS (
SELECT
USER_ID_1 AS LEFT_ID,
USER_ID_2 AS RIGHT_ID
FROM RELATION_TABLE
UNION
SELECT
USER_ID_2 AS LEFT_ID,
USER_ID_1 AS RIGHT_ID
FROM RELATION_TABLE)
SELECT STARTING_ID, LISTAGG(RELATED_ID,',') WITHIN GROUP (ORDER BY RELATED_ID ASC) AS RELATED_USERS
FROM (
SELECT
DISTINCT
CONNECT_BY_ROOT RIGHT_ID AS STARTING_ID,
LEFT_ID AS RELATED_ID
FROM PSEUDO_DIRECTED_RELATION
WHERE LEFT_ID <> CONNECT_BY_ROOT RIGHT_ID
START WITH RIGHT_ID = 1
CONNECT BY NOCYCLE PRIOR LEFT_ID = RIGHT_ID
UNION ALL
SELECT
USER_ID_1 AS STARTING_ID,
USER_ID_2 AS RELATED_ID
FROM RELATION_TABLE
WHERE USER_ID_1 = USER_ID_2
AND USER_ID_1 = 1)
GROUP BY STARTING_ID
ORDER BY 1 ASC;
Result:
STARTING_ID RELATED_USERS
1 2,3,4,5,6
Now user #1 is connected to user #5 through user #2.
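To see the route itself rather than just the endpoint, the same doubled data set can be walked with SYS_CONNECT_BY_PATH. This is only a sketch that reuses the PSEUDO_DIRECTED_RELATION subquery from above; the HOPS and ROUTE column aliases are names picked for illustration, and the filter on user #5 is just the case discussed here.

```sql
-- Sketch: list the routes that lead from user 1 to user 5.
WITH PSEUDO_DIRECTED_RELATION AS (
  SELECT USER_ID_1 AS LEFT_ID, USER_ID_2 AS RIGHT_ID FROM RELATION_TABLE
  UNION
  SELECT USER_ID_2 AS LEFT_ID, USER_ID_1 AS RIGHT_ID FROM RELATION_TABLE)
SELECT
  LEVEL AS HOPS,                                  -- number of edges walked
  SYS_CONNECT_BY_PATH(TO_CHAR(RIGHT_ID), '/')     -- users visited so far
    || '/' || TO_CHAR(LEFT_ID) AS ROUTE           -- plus the user reached by this row
FROM PSEUDO_DIRECTED_RELATION
WHERE LEFT_ID = 5                                 -- keep only rows that arrive at user 5
START WITH RIGHT_ID = 1                           -- begin at user 1
CONNECT BY NOCYCLE PRIOR LEFT_ID = RIGHT_ID
ORDER BY HOPS;
```

With the sample data the shortest route should come back as /1/2/5. Longer routes that wander through other users before reaching #5 may be listed as well, which is part of why this approach is inefficient.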
But perhaps the query is just linking everything to everything, so let's add some more data:
INSERT INTO RELATION_TABLE VALUES (7,9,'Siblings');
INSERT INTO RELATION_TABLE VALUES (7,13,'Pen Pals');
INSERT INTO RELATION_TABLE VALUES (22,7,'Colleagues');
and rerun the query above. User #1 should not be related to user #7 at all.
STARTING_ID RELATED_USERS
1 2,3,4,5,6
Now, if we relate user #7 to itself:
INSERT INTO RELATION_TABLE VALUES (7,7,'Self');
and rerun, targeting user #7 instead of user #1 (changing START WITH and the self-relation predicate accordingly):
STARTING_ID RELATED_USERS
7 7,9,13,22
If you do not want to query for a single root user, you can drop the START WITH clause and the root filter in the self-relation predicate.
WITH PSEUDO_DIRECTED_RELATION AS (
SELECT
USER_ID_1 AS LEFT_ID,
USER_ID_2 AS RIGHT_ID
FROM RELATION_TABLE
UNION
SELECT
USER_ID_2 AS LEFT_ID,
USER_ID_1 AS RIGHT_ID
FROM RELATION_TABLE)
SELECT STARTING_ID, LISTAGG(RELATED_ID,',') WITHIN GROUP (ORDER BY RELATED_ID ASC) AS RELATED_USERS
FROM (
SELECT
DISTINCT
CONNECT_BY_ROOT RIGHT_ID AS STARTING_ID,
LEFT_ID AS RELATED_ID
FROM PSEUDO_DIRECTED_RELATION
WHERE LEFT_ID <> CONNECT_BY_ROOT RIGHT_ID
CONNECT BY NOCYCLE PRIOR LEFT_ID = RIGHT_ID
UNION ALL
SELECT
USER_ID_1 AS STARTING_ID,
USER_ID_2 AS RELATED_ID
FROM RELATION_TABLE
WHERE USER_ID_1 = USER_ID_2)
GROUP BY STARTING_ID
ORDER BY 1 ASC, 2 ASC;
Result, showing all transitively related users for each user:
STARTING_ID RELATED_USERS
1 2,3,4,5,6
2 1,3,4,5,6
3 1,2,4,5,6
4 1,2,3,5,6
5 1,2,3,4,6
6 1,2,3,4,5
7 7,9,13,22
9 7,13,22
13 7,9,22
22 7,9,13
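Since the question asks to insert the derived relations rather than just list them, the inner query can feed an INSERT ... SELECT. The following is a minimal sketch, assuming a hypothetical target table TRANSITIVE_RELATION that is not part of the original schema. It stores one row per (starting user, related user) pair instead of a LISTAGG string, and it leaves out the self-relation branch for brevity.

```sql
-- Hypothetical target table (an assumption, not part of the original post).
CREATE TABLE TRANSITIVE_RELATION
(
  STARTING_ID NUMBER,
  RELATED_ID  NUMBER
);

-- Persist every transitive pair found by the doubled-edge hierarchical query.
INSERT INTO TRANSITIVE_RELATION (STARTING_ID, RELATED_ID)
WITH PSEUDO_DIRECTED_RELATION AS (
  SELECT USER_ID_1 AS LEFT_ID, USER_ID_2 AS RIGHT_ID FROM RELATION_TABLE
  UNION
  SELECT USER_ID_2 AS LEFT_ID, USER_ID_1 AS RIGHT_ID FROM RELATION_TABLE)
SELECT DISTINCT
  CONNECT_BY_ROOT RIGHT_ID AS STARTING_ID,
  LEFT_ID AS RELATED_ID
FROM PSEUDO_DIRECTED_RELATION
WHERE LEFT_ID <> CONNECT_BY_ROOT RIGHT_ID        -- drop artificial self-relations
CONNECT BY NOCYCLE PRIOR LEFT_ID = RIGHT_ID;

COMMIT;
```

Because every row acts as a root when START WITH is omitted, the amount of work grows quickly with the size of each connected group, so this is probably only practical on small data sets.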
Here is a recursive subquery factoring version.
WITH m AS
(SELECT USER_ID_1 u1, USER_ID_2 u2 FROM RELATION_TABLE
UNION
SELECT USER_ID_2, USER_ID_1 FROM RELATION_TABLE),
recur (usr, fri) AS
(SELECT u1, u1 FROM m
UNION ALL
SELECT r.usr, u2 FROM recur r, m WHERE r.fri = m.u1)
CYCLE fri SET cycle TO 1 DEFAULT 0
SELECT usr,
listagg(fri, ',') within GROUP (ORDER BY fri) friends
FROM (SELECT DISTINCT usr, fri FROM recur WHERE usr != fri AND cycle = 0)
GROUP BY usr;
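If only one user is of interest, the recursion can be seeded with that user instead of with every user in the table. This is a sketch under the same assumptions as the query above (11gR2 or later); the literal 1 is the user being analysed, and the mark column is named is_cycle here purely for readability.

```sql
WITH m AS
 (SELECT USER_ID_1 u1, USER_ID_2 u2 FROM RELATION_TABLE
  UNION
  SELECT USER_ID_2, USER_ID_1 FROM RELATION_TABLE),
recur (usr, fri) AS
 (SELECT 1, 1 FROM DUAL                 -- seed: start from user 1 only
  UNION ALL
  SELECT r.usr, m.u2                    -- step: follow one more edge
    FROM recur r
    JOIN m ON r.fri = m.u1)
 CYCLE fri SET is_cycle TO 1 DEFAULT 0  -- stop when a user repeats on the current path
SELECT DISTINCT fri AS related_user
  FROM recur
 WHERE fri <> usr                       -- exclude the starting user
   AND is_cycle = 0
 ORDER BY fri;
```

Seeding with a single row keeps the recursion to one connected group, which should be noticeably cheaper than expanding every user and filtering afterwards.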