Why is there NULL in the result of a full outer join between two tables?

Question

I am trying to get the unique values from two tables, each of which has a single column named domain.

DDL:

create table domains_1 (domain varchar);
create table domains_2 (domain varchar);

DML:

insert into domains_1 values ('example_1.com'), ('example_2.com');
insert into domains_2 values ('example_2.com'), ('example_3.com');

There are several ways to do this; I decided to use a full outer join.

select case when a.domain is null then b.domain
            when b.domain is null then a.domain
       end as unique_domains
from domains_1 as a full outer join domains_2 as b on a.domain = b.domain;

To my surprise, the result contains a null in addition to the unique domains.

I can wrap the query in another select to exclude the null, like this:

select * from
(select case when a.domain is null then b.domain
            when b.domain is null then a.domain
       end as unique_domains
from domains_1 as a full outer join domains_2 as b on a.domain = b.domain) t
where unique_domains is not null;

How does this null end up in the result in the first place? And is there a better way to remove it from the result?

Answer 1

Your CASE expression has no ELSE branch, so it defaults to NULL:

case when a.domain is null then b.domain
     when b.domain is null then a.domain
     ELSE NULL -- implicitly
end as unique_domains

The value 'example_2.com' has a match, so a.domain and b.domain both equal 'example_2.com' and neither is null. Hence neither WHEN branch matches, and the implicit ELSE NULL applies.
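To see this row by row, it can help to select both join columns next to the CASE result. This is only a diagnostic sketch: it assumes the domains_1 and domains_2 tables and the rows from the question, and the ORDER BY is added purely to make the output order predictable.

```sql
-- Show both sides of the join next to the CASE result.
select a.domain as a_domain,
       b.domain as b_domain,
       case when a.domain is null then b.domain
            when b.domain is null then a.domain
       end as unique_domains   -- no ELSE, so the matched row yields NULL
from domains_1 as a
full outer join domains_2 as b on a.domain = b.domain
order by coalesce(a.domain, b.domain);

--    a_domain    |   b_domain    | unique_domains
-- ---------------+---------------+----------------
--  example_1.com | NULL          | example_1.com
--  example_2.com | example_2.com | NULL
--  example_3.com | NULL is in a_domain for this row; see note below
```

In the third row the values are swapped relative to the first: a_domain is NULL, b_domain is 'example_3.com', and unique_domains is 'example_3.com'. Only the matched 'example_2.com' row falls through both WHEN branches.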

As for "a better way": I would probably use

select coalesce(a.domain, b.domain) as domain
from domains_1 as a full outer join domains_2 as b on a.domain = b.domain
where a.domain is null or b.domain is null;

Answer 2

A CASE expression in the SELECT list cannot eliminate rows (as you seem to want). That has to happen in a JOIN or WHERE clause.

Since your column names conveniently align, the USING keyword in the join clause simplifies the job.

To get "unique domains" (including 'example_2.com' in your example):

SELECT domain
FROM   domains_1
FULL   JOIN domains_2 USING (domain);

To get the domains without a match in the respective other table (excluding 'example_2.com' in your example):

SELECT domain
FROM   domains_1 a
FULL   JOIN domains_2 b USING (domain)
WHERE  a.domain IS NULL OR b.domain IS NULL;


The manual:

[...] USING implies that only one of each pair of equivalent columns will be included in the join output, not both.

But you can still reference each source column with table qualification, as demonstrated.

There are various other query techniques to eliminate rows that have a match in the other table:

Select rows which are not present in other table
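As one example of such a technique, here is a minimal sketch using NOT EXISTS. It is not taken from the linked post; it assumes the domains_1 and domains_2 tables from the question, and the aliases d1 and d2 are only illustrative.

```sql
-- Domains present in exactly one of the two tables, via two anti-joins.
select d1.domain
from   domains_1 d1
where  not exists (select 1 from domains_2 d2 where d2.domain = d1.domain)
union all
select d2.domain
from   domains_2 d2
where  not exists (select 1 from domains_1 d1 where d1.domain = d2.domain);
```

With the sample rows from the question this returns 'example_1.com' and 'example_3.com'. Like the FULL JOIN query, it keeps duplicates that exist within a single table.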

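Notably, neither of the above queries removes duplicates within each table. In the second query, a duplicated value disappears only when it has a match in the other table, because then all of its rows are filtered out.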
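The effect on duplicates can be checked with a small self-contained sketch. The CTE names d1 and d2 and the sample values are made up for illustration and are not part of the question.

```sql
-- 'dup.com' occurs twice in d1 and has no match in d2;
-- 'both.com' has a match in d2.
with d1(domain) as (values ('dup.com'), ('dup.com'), ('both.com')),
     d2(domain) as (values ('both.com'))
select domain
from   d1
full   join d2 using (domain)
where  d1.domain is null or d2.domain is null;

-- Returns two rows, both 'dup.com': the duplicate survives,
-- while 'both.com' is dropped because it has a match.
```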

A fancy equivalent of the second query, but without possible duplicates within each table:

(TABLE domains_1 EXCEPT TABLE domains_2)
UNION ALL
(TABLE domains_2 EXCEPT TABLE domains_1);

This variant eliminates only one duplicate per match in the other table, before removing any duplicates that remain in the result. Subtly different:

(TABLE domains_1 EXCEPT ALL TABLE domains_2)
UNION
(TABLE domains_2 EXCEPT ALL TABLE domains_1);

About the short syntax: in PostgreSQL, TABLE domains_1 is shorthand for SELECT * FROM domains_1.

Tags: sql, postgresql, null, join, full-outer-join