How can I prevent a left join from returning more records than the unjoined query?

Question

I am trying to use Left Join to join several tables so that I only get the matching records, based on the id field of the main table (table_a):

Select table_a.id, table_b.location, table_c.material
From table_a
Left Join table_b
On table_a.id = table_b.id
Left Join table_c
On table_a.id = table_c.id

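Everything seems fine: I get the expected fields in the output, and the record count is 11,000 (the same as table_a).

But when I add one more left join to the query, this time joining on table_b's id field instead of table_a's id field, I get 11,500 records: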

Select table_a.id, table_b.location, table_c.material, table_d.sales
From table_a
Left Join table_b
On table_a.id = table_b.id
Left Join table_c
On table_a.id = table_c.id
Left Join table_d
On table_b.id = table_d.id

Do you know how I can avoid this?

Answer 1

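You get extra rows because table_d has more than one matching row for some ids in table_b, and each additional match produces an additional output row. Here it is important to consider the business rules of the problem. Usually you cannot simply ignore the multiple matches: either you group the rows and aggregate the extra columns (sum, average, and so on), or you pick one row out of the multiple matches according to some rule. For example, here I assume that I want the most recent matching record from table_d, as determined by the month column. I use rank and partition to number the "duplicates" within each id, and in this case I only keep the first match (order_c = 1):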

WITH cte AS (
  Select table_a.id, table_b.location, table_c.material, table_d.sales,
         RANK() OVER (partition by table_a.id order by table_d.month desc) as order_c
  From table_a
  Left Join table_b
  On table_a.id = table_b.id
  Left Join table_c
  On table_a.id = table_c.id
  Left Join table_d
  On table_b.id = table_d.id
)
Select id, location, material, sales, order_c
From cte
Where order_c = 1
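
The other option mentioned above, aggregating the extra rows instead of picking one, could look roughly like the sketch below. It assumes that summing sales per id is the right business rule and that table_b and table_c hold at most one row per id; the names d_agg and total_sales are made up for illustration.

```sql
-- Collapse table_d to one row per id before joining,
-- so the final join cannot multiply the rows of table_a.
SELECT table_a.id, table_b.location, table_c.material, d_agg.total_sales
FROM table_a
LEFT JOIN table_b
  ON table_a.id = table_b.id
LEFT JOIN table_c
  ON table_a.id = table_c.id
LEFT JOIN (
    -- SUM is an assumed rule; swap in AVG, MAX, etc. as the business case requires.
    SELECT id, SUM(sales) AS total_sales
    FROM table_d
    GROUP BY id
) d_agg
  ON table_b.id = d_agg.id;
```

If you pick a single row instead, keep in mind that RANK() assigns the same rank to ties, so two table_d rows with the same id and month would both come back as rank 1. ROW_NUMBER() with a tie-breaking column in the ORDER BY returns exactly one row per id.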

You can see it in action in a fiddle with the following sample data:

create table table_a (ID INT);
create table table_b (ID INT, location varchar(10));
create table table_c (ID INT, material varchar(10));
create table table_d (ID INT, sales INT, month INT);
INSERT into table_a(ID) 
VALUES (1), (2), (3), (4), (5);
INSERT into table_b(ID, location) 
VALUES (1, 'UK'),
      (9, 'USA');
INSERT into table_c(ID, material) 
VALUES (1, 'paper');
INSERT into table_d(ID, sales, month) 
VALUES (1, 345, 1), (1, 599, 2);
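
To confirm where the extra rows come from, a quick check along these lines may help. It is only a sketch against the sample tables above; match_count is an illustrative column name.

```sql
-- List every id that appears more than once in table_d.
-- Each such id multiplies the matching rows in the joined result.
SELECT id, COUNT(*) AS match_count
FROM table_d
GROUP BY id
HAVING COUNT(*) > 1;
-- With the sample data above this returns a single row: id = 1, match_count = 2.
```

Running the same check on the real table_d should account for the 500 extra records, assuming table_d is the only table with repeated ids.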

sql

subquery