Find unique values, count duplicates and rank them using WITH on PostgreSQL 12
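I have 3 complex tables. For this question I will simplify the usage. I need a rank, a count (of duplicates) and the unique records (the result). It works for a single table, but when I include another WITH and add an INNER JOIN, I no longer get any records.
Tables: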
CREATE TABLE public.emails (
id bigint NOT NULL PRIMARY KEY GENERATED BY DEFAULT AS IDENTITY
(MAXVALUE 9223372036854775807),
sender jsonb NOT NULL
);
CREATE TABLE public.contacts (
id bigint NOT NULL PRIMARY KEY GENERATED BY DEFAULT AS IDENTITY
(MAXVALUE 9223372036854775807),
email text NOT NULL,
full_name text NOT NULL
);
-- sample data
insert into emails (sender) VALUES ('{"email": "dennis1@example.com", "name": "dennis1"}');
insert into emails (sender) VALUES ('{"email": "dennis1@example.com", "name": "dennis1"}');
insert into contacts (email, full_name) VALUES ('dennis1@example.com', 'dennis1');
insert into contacts (email, full_name) VALUES ('dennis1@example.com', 'dennis1');
insert into contacts (email, full_name) VALUES ('dennis5@example.com', 'dennis5');
insert into contacts (email, full_name) VALUES ('john@example.com', 'john');
Expected result:
email               | name    | rk | count
--------------------+---------+----+------
dennis1@example.com | dennis1 | 1  | 4
dennis5@example.com | dennis5 | 1  | 1
john@example.com    | john    | 1  | 1
However, I ran into 2 problems:
1. The INNER JOIN returns zero rows.
2. ORDER BY "count" does not work.
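What do I need?
As you can see, the tables differ: one table keeps the sender in a jsonb column, the other stores the email and name as text. So I extract those fields in a separate SELECT query for each table and then compare them.
What I need is to get all emails and names, make them unique, count how often each one occurs, and rank them. I do not want duplicate rows in the output; duplicates should only be reflected in count.
How can I solve this?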
Demo
See the demo here: https://dbfiddle.uk/?rdbms=postgres_12&fiddle=b79700f74bbf14e190d5f5bf7fcd0670
Extract the JSON fields and combine the two data sets with UNION ALL before grouping and applying the window function.
WITH united as (
SELECT email, full_name FROM contacts
UNION ALL
SELECT sender->>'email', sender->>'name' FROM emails
)
SELECT
email
, full_name
, count(*) count, row_number() over (partition by email) rk
FROM united
GROUP BY 1, 2;
email | full_name | count | rk
---------------------+-----------+-------+----
dennis1@example.com | dennis1 | 4 | 1
dennis5@example.com | dennis5 | 1 | 1
john@example.com | john | 1 | 1
(3 rows)
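The query above returns rows in no guaranteed order, and row_number() without an ORDER BY in its window numbers the rows of each partition arbitrarily. If a stable ordering is wanted, one possible variant is sketched below. The tie-breakers (full_name inside the window, email in the final sort) are assumptions chosen for illustration, not something the question specifies. The alias is written as "count" in double quotes so that ORDER BY clearly refers to the output column.

```sql
-- Sketch: same UNION ALL approach, with explicit ordering added.
WITH united AS (
    SELECT email, full_name FROM contacts
    UNION ALL
    SELECT sender->>'email', sender->>'name' FROM emails
)
SELECT
    email,
    full_name,
    count(*) AS "count",
    -- Number the distinct names recorded for one email, most frequent first.
    -- The full_name tie-breaker is an assumption to make rk deterministic.
    row_number() OVER (
        PARTITION BY email
        ORDER BY count(*) DESC, full_name
    ) AS rk
FROM united
GROUP BY email, full_name
-- Sort the final result by the aggregated count, then by email.
ORDER BY "count" DESC, email;
```

Because the window function is evaluated after GROUP BY, it may reference count(*) directly. An INNER JOIN between contacts and emails would keep only addresses present in both tables, which is why the UNION ALL is used here instead.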