Two queries with the same tables and join logic but different results
I am trying to get the records that exist in only one table, i.e. in A but not in B.
Case 1:
select count(distinct t.col1),count(distinct t.col2)
from `table1` e
right join
(
select distinct col1,col2
from `table2_*`
where _table_suffix between '20180101' and '20181231'
)t
on e.col1=t.col1
where date(timestamp_seconds(ts))>='2018-01-01'
and e.col1 is null
;
Case 2:
select count(distinct col1)
from `table2_*`
where _table_suffix between '20180101' and '20181231'
and col1 not in (
select distinct col1 from `table1`
where date(timestamp_seconds(ts))>='2018-01-01'
)
Of the two queries, case 2 worked while case 1 returned 0. I also tried rewriting case 1 as a left join with the tables reversed, but the result was the same 0 rows. I am new to BigQuery and standard SQL, and I am not sure why this happens.
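This is because NOT IN returns NULL, not TRUE, when the list contains a NULL and no value matches, so the WHERE clause rejects the row. If you do not want that to happen, exclude NULL values: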
select count(distinct col1)
from `table2_*`
where _table_suffix between '20180101' and '20181231'
and col1 not in (
select distinct col1 from `table1`
where date(timestamp_seconds(ts))>='2018-01-01'
and col1 IS NOT NULL
)
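The NULL behaviour can be checked without any tables. A minimal sketch in BigQuery standard SQL, using literal values that are illustrative and not taken from the original queries:

```sql
SELECT
  1 NOT IN (2, 3)    AS no_null_in_list,  -- TRUE: no match and no NULL
  1 NOT IN (2, NULL) AS null_in_list,     -- NULL: the comparison with NULL is unknown
  2 NOT IN (2, NULL) AS match_in_list;    -- FALSE: a match was found
```

Because WHERE keeps only rows whose condition is TRUE, a single NULL in the list means NOT IN can never keep a row.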
If you use NOT IN, you must not allow NULL as a value in the "in list":
SELECT count(DISTINCT t.col1)
FROM `table2_*` AS t
WHERE t._table_suffix BETWEEN '20180101' AND '20181231'
AND col1 NOT IN (
SELECT DISTINCT e.col1
FROM `table1` AS e
WHERE DATE (timestamp_seconds(e.ts)) >= '2018-01-01'
AND e.col1 IS NOT NULL
);
Personally, I prefer NOT EXISTS:
SELECT count(DISTINCT t.col1)
FROM `table2_*` AS t
WHERE t._table_suffix BETWEEN '20180101' AND '20181231'
AND NOT EXISTS (
SELECT NULL
FROM `table1` AS e
WHERE DATE (timestamp_seconds(e.ts)) >= '2018-01-01'
AND e.col1 = t.col1
);
Note that the SELECT clause of the subquery here does not need to return any value, so SELECT NULL, SELECT 1, or SELECT * are all valid. With EXISTS or NOT EXISTS, what matters is the subquery's FROM and WHERE clauses.
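The same point about where the filter lives may also explain why case 1 returned 0, assuming ts is a column of table1 (the subquery aliased t exposes only col1 and col2). For rows with no match, every column of e is NULL, so a date filter on e.ts in the outer WHERE removes exactly the rows that e.col1 IS NULL is meant to keep. A minimal sketch with made-up inline data (the CTE contents and timestamps are illustrative only):

```sql
WITH table2 AS (
  SELECT col1 FROM UNNEST([1, 2, 3]) AS col1
),
table1 AS (
  SELECT 2 AS col1, 1530000000 AS ts  -- 2018-06-26, inside the date range
  UNION ALL
  SELECT 3 AS col1, 1400000000 AS ts  -- 2014-05-13, outside the date range
)
SELECT
  -- NOT EXISTS: the date filter is part of the subquery.
  (SELECT COUNT(DISTINCT t.col1)
   FROM table2 AS t
   WHERE NOT EXISTS (
     SELECT 1
     FROM table1 AS e
     WHERE DATE(TIMESTAMP_SECONDS(e.ts)) >= DATE '2018-01-01'
       AND e.col1 = t.col1
   )) AS not_exists_count,        -- 2 (col1 = 1 and 3)
  -- Anti-join with the date filter in ON: unmatched rows survive.
  (SELECT COUNT(DISTINCT t.col1)
   FROM table2 AS t
   LEFT JOIN table1 AS e
     ON e.col1 = t.col1
    AND DATE(TIMESTAMP_SECONDS(e.ts)) >= DATE '2018-01-01'
   WHERE e.col1 IS NULL) AS join_filter_in_on,     -- 2 (col1 = 1 and 3)
  -- Same join with the date filter in WHERE, as in case 1:
  -- e.ts is NULL for unmatched rows, so the filter drops them all.
  (SELECT COUNT(DISTINCT t.col1)
   FROM table2 AS t
   LEFT JOIN table1 AS e
     ON e.col1 = t.col1
   WHERE DATE(TIMESTAMP_SECONDS(e.ts)) >= DATE '2018-01-01'
     AND e.col1 IS NULL) AS join_filter_in_where;  -- 0
```

If this matches the real schema, moving the date condition of case 1 from WHERE into the ON clause should make the join agree with the NOT EXISTS version.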