过滤电子邮件和姓名,然后在 PostgreSQL 12 上使用 JSON 在两列中删除重复项
filter emails and names and then de-duplicate in two columns using JSON on PostgreSQL 12
我有 emails
table,其中有 sender
和 reporter
列。我想在这些列中搜索给定参数和 return 个唯一值。
让我用例子来解释。这是我的 table 和记录:
CREATE TABLE public.emails (
id bigint NOT NULL PRIMARY KEY GENERATED BY DEFAULT AS IDENTITY
(MAXVALUE 9223372036854775807),
sender jsonb NOT NULL,
reporter jsonb not null
);
insert into emails (sender, reporter) VALUES ('[{"email": "dennis1@example.com", "name": "dennis1"}]', '[]');
insert into emails (sender, reporter) VALUES ('[{"email": "dennis2@example.com", "name": "dennis1"}]', '[{"email": "john@example.com", "name": "john"}, {"email": "dennis1@example.com", "name": "dennis1"}, {"email": "dennis2@example.com", "name": "dennis2"}]');
insert into emails (sender, reporter) VALUES ('[{"email": "dennis1@example.com", "name": "dennis1"}]', '[]');
insert into emails (sender, reporter) VALUES ('[{"email": "dennis1@example.com", "name": "dennis1"}]', '[]');
我想获取电子邮件地址和姓名。我也想避免上当受骗。只有一封电子邮件和一个名字。我也不想将其作为数组获取,而是每行一封电子邮件和姓名。
正在搜索john
SELECT
* /* i don't know what to put here pr merge with reporters */
FROM "emails" AS "e"
WHERE (EXISTS (SELECT
*
FROM JSONB_ARRAY_ELEMENTS_TEXT("e"."sender") AS "e" ("email")
WHERE ("e"."email" ~* 'john' or "e"."name" ~* 'john'))
);
john
的预期结果:
email name
john@example.com john
正在搜索 ``(空):
SELECT
* /* i don't know what to put here pr merge with reporters */
FROM "emails" AS "e"
WHERE (EXISTS (SELECT
*
FROM JSONB_ARRAY_ELEMENTS_TEXT("e"."sender") AS "e" ("email")
WHERE ("e"."email" ~* '' or "e"."name" ~* ''))
);
``(空)的预期结果:
email name
john@example.com john
dennis1@example.com dennis1
dennis2@example.com dennis2
dennis2
在sender
和reporter
中都有,因此只需要其中一个。没有骗子。
事实上,这里有一个问题。如果 sender
或 reporter
列至少有一个 json 对象(不是 json 数组),那么此查询也会失败。
错误:cannot extract elements from an object
那是另外一回事了。
在这种情况下我怎样才能实现我的目标?
演示:https://dbfiddle.uk/?rdbms=postgres_12&fiddle=1bf9c5f83f5104e2392c31984cb4e939
在搜索之前规范化您的数据,然后使用 distinct on ()
子句删除重复数据:
with cte as (select x ->> 'name' as name, x ->> 'email' as email
from emails as e, jsonb_array_elements(e.sender || e.reporter) as x)
select distinct on (email) * from cte where
name ~* '' or email ~* ''
--name ~* 'john' or email ~* 'john'
order by email;
请注意,它将始终扫描整个 table,在这种情况下没有适用的索引。想想schema normalization.
我有 emails
table,其中有 sender
和 reporter
列。我想在这些列中搜索给定参数和 return 个唯一值。
让我用例子来解释。这是我的 table 和记录:
CREATE TABLE public.emails (
id bigint NOT NULL PRIMARY KEY GENERATED BY DEFAULT AS IDENTITY
(MAXVALUE 9223372036854775807),
sender jsonb NOT NULL,
reporter jsonb not null
);
insert into emails (sender, reporter) VALUES ('[{"email": "dennis1@example.com", "name": "dennis1"}]', '[]');
insert into emails (sender, reporter) VALUES ('[{"email": "dennis2@example.com", "name": "dennis1"}]', '[{"email": "john@example.com", "name": "john"}, {"email": "dennis1@example.com", "name": "dennis1"}, {"email": "dennis2@example.com", "name": "dennis2"}]');
insert into emails (sender, reporter) VALUES ('[{"email": "dennis1@example.com", "name": "dennis1"}]', '[]');
insert into emails (sender, reporter) VALUES ('[{"email": "dennis1@example.com", "name": "dennis1"}]', '[]');
我想获取电子邮件地址和姓名。我也想避免上当受骗。只有一封电子邮件和一个名字。我也不想将其作为数组获取,而是每行一封电子邮件和姓名。
正在搜索john
SELECT
* /* i don't know what to put here pr merge with reporters */
FROM "emails" AS "e"
WHERE (EXISTS (SELECT
*
FROM JSONB_ARRAY_ELEMENTS_TEXT("e"."sender") AS "e" ("email")
WHERE ("e"."email" ~* 'john' or "e"."name" ~* 'john'))
);
john
的预期结果:
email name
john@example.com john
正在搜索 ``(空):
SELECT
* /* i don't know what to put here pr merge with reporters */
FROM "emails" AS "e"
WHERE (EXISTS (SELECT
*
FROM JSONB_ARRAY_ELEMENTS_TEXT("e"."sender") AS "e" ("email")
WHERE ("e"."email" ~* '' or "e"."name" ~* ''))
);
``(空)的预期结果:
email name
john@example.com john
dennis1@example.com dennis1
dennis2@example.com dennis2
dennis2
在sender
和reporter
中都有,因此只需要其中一个。没有骗子。
事实上,这里有一个问题。如果 sender
或 reporter
列至少有一个 json 对象(不是 json 数组),那么此查询也会失败。
错误:cannot extract elements from an object
那是另外一回事了。
在这种情况下我怎样才能实现我的目标?
演示:https://dbfiddle.uk/?rdbms=postgres_12&fiddle=1bf9c5f83f5104e2392c31984cb4e939
在搜索之前规范化您的数据,然后使用 distinct on ()
子句删除重复数据:
with cte as (select x ->> 'name' as name, x ->> 'email' as email
from emails as e, jsonb_array_elements(e.sender || e.reporter) as x)
select distinct on (email) * from cte where
name ~* '' or email ~* ''
--name ~* 'john' or email ~* 'john'
order by email;
请注意,它将始终扫描整个 table,在这种情况下没有适用的索引。想想schema normalization.