Why is my join filter being applied to my entire query in Redshift?

Question

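I'm trying to build a tricky cross-grain join, where tableA is at a finer grain than tableB (several tableA rows per tableB id). In this example, I'm trying to get the following result: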

tableA.id	tableA.fieldA	tableB.id
123	1	123
123	2	NULL
123	3	NULL
234	1	234
234	2	NULL
234	3	NULL

Here is the query I wrote to accomplish this:

Select
    *
from
    tableA

    left join tableB
    on tableB.id = tableA.id
    and tableA.fieldA = 1

Unfortunately, the "tableA.fieldA = 1" condition is acting as a filter on the entire query, not just on the join, which produces the following result:

tableA.id	tableA.fieldA	tableB.id
123	1	123
234	1	234

Can anyone tell me what is going on and how I can achieve my goal? Thanks!

Answer 1

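That shouldn't happen; with a left join you should get every row from tableA regardless of what the ON clause evaluates to. The ON condition only decides which tableB rows match; unmatched tableA rows come back with NULLs in the tableB columns.

However, you can get what you want by putting the condition in the select instead: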

select a.*,
       (case when a.fieldA = 1 then b.id end) as b_id
from tableA a left join
     tableB b
     on b.id = a.id ;
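
For contrast, here is a minimal sketch of one way the filtered output from the question can arise, assuming the tableA/tableB names and columns from the question. If the same predicate ends up in a WHERE clause (for example, added by a view or a reporting tool; that is a guess, not something stated in the question), it is applied after the join and only the fieldA = 1 rows remain:

```sql
-- Predicate in ON: only controls which tableB rows match.
-- Every tableA row is still returned; unmatched rows get NULL for b_id.
select a.id, a.fieldA, b.id as b_id
from tableA a
left join tableB b
    on b.id = a.id
    and a.fieldA = 1;

-- Same predicate in WHERE: applied to the joined rows,
-- so tableA rows where fieldA is not 1 (or is NULL) are dropped.
select a.id, a.fieldA, b.id as b_id
from tableA a
left join tableB b
    on b.id = a.id
where a.fieldA = 1;
```

If the query that actually runs looks like the second form, the two-row result in the question is the expected behavior rather than a bug.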

Answer 2

I coded up the following based on your description, ran the SQL on my Redshift cluster, and got the answer you are looking for.

CREATE TABLE table_a (
  ID int,
  A int);
                   
INSERT INTO table_a VALUES (123, 1)  ;  
INSERT INTO table_a VALUES (123, 2)  ;
INSERT INTO table_a VALUES (123, 3)  ;
INSERT INTO table_a VALUES (234, 1)  ;
INSERT INTO table_a VALUES (234, 2)  ;
INSERT INTO table_a VALUES (234, 3)  ;

CREATE TABLE table_b (
  ID int);
  
INSERT INTO table_b VALUES (123)  ;  
INSERT INTO table_b VALUES (234)  ;

Select
    *
from
    table_A
    left join table_B
    on table_B.id = table_A.id
    and table_A.a = 1;

Results:

id  a   id
123 1   123
123 2   NULL
123 3   NULL
234 1   234
234 2   NULL
234 3   NULL

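I see a few possibilities at this point: 1) your cluster has an issue or bug in the exact version it is running, 2) the question as written doesn't represent the full situation, or 3) my code doesn't represent your situation.

Does the code I provided above reproduce the problem on your cluster? If not, can you provide the necessary ingredients (DDL, SQL) to reproduce it?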
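
To help narrow down possibility 1, something like the following could be run on the affected cluster and the output added to the question. This is only a sketch: the query under EXPLAIN is copied from the question and should be replaced with the exact statement that misbehaves.

```sql
-- Report the build the cluster is running.
select version();

-- Show the query plan, to check where the fieldA = 1 predicate
-- is applied (as a join condition or as a filter on tableA).
explain
select
    *
from
    tableA
    left join tableB
    on tableB.id = tableA.id
    and tableA.fieldA = 1;
```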
