SQL: Logic problems in psql
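I'm trying to develop a query that identifies clients with more than one client ID. The client ID is column 1, and columns 19 and 20 hold unique personal identifiers; you can think of them as a kind of social security number (let's call them SSN.19 and SSN.20).
My first idea was to look for every row with matching SSNs but a different client ID, like this: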
SELECT
a."5", a."3"||' '||a."4" as "3+4", a."19", a."20", a."21", a."1",
b."1", a."8"
FROM
"clients_1" AS a,
"clients_1" AS b
WHERE a."19"=b."19" and a."20"=b."20" and a."1"<b."1" and a."1"='Value';
However, it returned 0 rows. To check whether the table really had no duplicates, I ran the following queries:
select distinct "19" as hk, count("19") as dl from "clients_1" group by "19" order by dl desc;
select distinct "20" as hk, count("20") as dl from "clients_1" group by "20" order by dl desc;
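It turned out that in this particular table no client has an SSN19 associated with them, but there are several duplicated SSN20 values. So I ran the following query to find clients with multiple IDs: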
SELECT
a."5", a."3"||' '||a."4" as "3+4", a."20", a."21", a."1",
b."1", a."8"
FROM
"clients_1" AS a,
"clients_1" AS b
WHERE a."20"=b."20" and a."1"<b."1" and a."7"='Value';
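This one returned a table of clients that have different IDs but the same SSN20. After that, I started thinking about how to generalize the query to cases where a client has both SSN19 and SSN20, or only one of them, and came up with the following: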
SELECT
a."5", a."3"||' '||a."4" as "3+4", a."19", a."20", a."21", a."1",
b."1", a."8"
FROM
"clients_1" AS a,
"clients_1" AS b
WHERE ((a."19"=b."19" and a."19" is not null) or (a."20"=b."20" and a."20" is not null)) and a."1"<b."1" and a."7"='Value';
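However, this query takes far too long: I let it run for about 20 minutes without getting any result, whereas the previous attempts took about 2 minutes at most. What am I doing wrong?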
I believe something like this would perform better and give you more flexibility:
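SELECT
*
FROM
(
SELECT
-- COUNT(col) ignores NULLs, so rows without an SSN are never counted as matches
COUNT("19") OVER (PARTITION BY "19") AS matches_19,
COUNT("20") OVER (PARTITION BY "20") AS matches_20,
-- count only rows where both identifiers are present
COUNT(CASE WHEN "19" IS NOT NULL AND "20" IS NOT NULL THEN 1 END)
OVER (PARTITION BY "19","20") AS both_matches,
clients_1.*
FROM
clients_1
WHERE "7" = 'Value'
) AS t -- a subquery in FROM needs an alias; aliases cannot start with a digit unless quoted
WHERE matches_19 > 1 OR matches_20 > 1 OR both_matches > 1
ORDER BY "19","20";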
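One thing to watch with window counts like these is how NULL identifiers behave: PARTITION BY puts all NULL keys into a single partition, so COUNT(*) reports every client without an SSN as a "match" of every other such client, while COUNT(column) skips NULLs. The sketch below shows the difference on a toy table. It is only an illustration under assumptions: the rows are made up, only columns "1" and "20" are modeled, and SQLite (via Python's sqlite3 module, in a build with window-function support) stands in for PostgreSQL.

```python
import sqlite3

# Assumption: a toy stand-in for clients_1 with only columns "1" (client ID)
# and "20" (SSN20). The data is invented for illustration.
conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE clients_1 ("1" TEXT, "20" TEXT)')
conn.executemany(
    "INSERT INTO clients_1 VALUES (?, ?)",
    [
        ("c1", "s20-a"),
        ("c2", "s20-a"),  # shares SSN20 with c1
        ("c3", None),     # no SSN20
        ("c4", None),     # no SSN20
        ("c5", "s20-b"),  # unique SSN20
    ],
)

query = '''
SELECT c."1",
       COUNT(*)      OVER (PARTITION BY c."20") AS count_star,
       COUNT(c."20") OVER (PARTITION BY c."20") AS count_col
FROM clients_1 AS c
'''
# Sort in Python so the result order does not depend on the engine.
counts = sorted(conn.execute(query).fetchall())

for client_id, count_star, count_col in counts:
    print(client_id, count_star, count_col)
```

Here c3 and c4 get count_star = 2 even though neither has an SSN20, whereas count_col is 0 for both, so filtering on count_col > 1 keeps only c1 and c2.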
Ugly column names aside, this is just a WHERE EXISTS(a similar record) solution:
SELECT *
FROM clients_1 AS a
WHERE EXISTS(
SELECT * FROM clients_1 AS b
WHERE (a."19" = b."19" OR a."20" = b."20" )
AND a."1" <> b."1"
);
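As for why the OR version of the self-join is so slow: a join condition of the form `x = y OR u = v` generally cannot be executed as a hash or merge join in PostgreSQL, so the planner tends to fall back to a nested loop over all row pairs. A common workaround is to split the OR into two plain equi-joins and combine them with UNION. The sketch below checks on a toy table that both forms return the same pairs. It rests on assumptions: the rows are invented, only columns "1", "7", "19" and "20" are modeled, client IDs are taken to be unique per row, and SQLite (via Python's sqlite3 module) stands in for PostgreSQL.

```python
import sqlite3

# Assumption: a toy stand-in for clients_1; the data is invented for illustration.
conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE clients_1 ("1" TEXT, "7" TEXT, "19" TEXT, "20" TEXT)')
conn.executemany(
    "INSERT INTO clients_1 VALUES (?, ?, ?, ?)",
    [
        ("c1", "Value", None, "s20-a"),
        ("c2", "Value", None, "s20-a"),     # same SSN20 as c1
        ("c3", "Value", "s19-x", None),
        ("c4", "Value", "s19-x", "s20-b"),  # same SSN19 as c3
        ("c5", "Value", None, None),        # no identifiers at all
        ("c6", "Value", None, None),        # must not pair with c5
    ],
)

# The slow shape from the question: one join with an OR across two columns.
or_join = '''
SELECT a."1", b."1"
FROM clients_1 AS a
JOIN clients_1 AS b
  ON (a."19" = b."19" OR a."20" = b."20") AND a."1" < b."1"
WHERE a."7" = 'Value'
'''

# The rewrite: two equi-joins, one per identifier, merged with UNION.
# UNION also removes pairs that match on both identifiers.
union_join = '''
SELECT a."1", b."1"
FROM clients_1 AS a
JOIN clients_1 AS b ON a."19" = b."19" AND a."1" < b."1"
WHERE a."7" = 'Value'
UNION
SELECT a."1", b."1"
FROM clients_1 AS a
JOIN clients_1 AS b ON a."20" = b."20" AND a."1" < b."1"
WHERE a."7" = 'Value'
'''

or_pairs = sorted(conn.execute(or_join).fetchall())
union_pairs = sorted(conn.execute(union_join).fetchall())
print(or_pairs)
print(union_pairs)
```

Note that the explicit `IS NOT NULL` checks from the question are not needed in either form, because `NULL = NULL` never evaluates to true. Whether the UNION form is actually faster on the real table depends on its size and indexes, so it is worth comparing both with EXPLAIN ANALYZE.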