How can I prevent a left join from returning more records than the unjoined query?
I am trying to use Left Join to join several tables so that I only get the matching records, based on the id field of the main table (table_a):
Select table_a.id, table_b.location, table_c.material
From table_a
Left Join table_b
On table_a.id = table_b.id
Left Join table_c
On table_a.id = table_c.id
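Everything seems to work: I get the expected fields in the output, and the record count is 11,000 (the same as table_a).
But when I add another left join to the query, this time on table_b's id field instead of table_a's id field, I get 11,500 records: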
Select table_a.id, table_b.location, table_c.material, table_d.sales
From table_a
Left Join table_b
On table_a.id = table_b.id
Left Join table_c
On table_a.id = table_c.id
Left Join table_d
On table_b.id = table_d.id
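Do you know how I can avoid this?
You get the extra rows because some ids in table_b match more than one row in table_d. The business rules of the problem matter here. Usually you can't simply ignore the multiple results: either the extra rows have to be grouped (summed, averaged), or one of the multiple matches has to be chosen according to some rule. For example, here I assume that I want the most recent matching record from table_d, based on the month column. I use rank with partition to number the "duplicates" within each id, and in this case I keep only the first match (order_c = 1):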
WITH cte AS (
Select table_a.id, table_b.location, table_c.material, table_d.sales
, RANK() OVER (partition by table_a.id order by table_d.month desc) as order_c
From table_a
Left Join table_b
On table_a.id = table_b.id
Left Join table_c
On table_a.id = table_c.id
Left Join table_d
On table_b.id = table_d.id)
select id, location, material, sales, order_c
from cte where order_c =1
You can see the fiddle in action:
create table table_a (ID INT);
create table table_b (ID INT, location varchar(10));
create table table_c (ID INT, material varchar(10));
create table table_d (ID INT, sales INT, month INT);
INSERT into table_a(ID)
VALUES (1), (2), (3), (4), (5);
INSERT into table_b(ID, location)
VALUES (1, 'UK'),
(9, 'USA');
INSERT into table_c(ID, material)
VALUES (1, 'paper');
INSERT into table_d(ID, sales, month)
VALUES (1, 345, 1), (1, 599, 2);
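To try this without a fiddle, here is a minimal sketch that loads the same rows into an in-memory SQLite database through Python's sqlite3 module. The choice of SQLite and Python is an assumption made only to keep the example self-contained (window functions need SQLite 3.25 or later); the SQL itself should carry over to most engines. It swaps RANK for ROW_NUMBER, because RANK gives every tied row the same number and would still return two rows if two table_d records shared the latest month. It also sketches the aggregation route mentioned above, assuming the business rule is to total the sales.

```python
import sqlite3

# In-memory database loaded with the same rows as the fiddle.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE table_a (id INT);
CREATE TABLE table_b (id INT, location VARCHAR(10));
CREATE TABLE table_c (id INT, material VARCHAR(10));
CREATE TABLE table_d (id INT, sales INT, month INT);
INSERT INTO table_a (id) VALUES (1), (2), (3), (4), (5);
INSERT INTO table_b (id, location) VALUES (1, 'UK'), (9, 'USA');
INSERT INTO table_c (id, material) VALUES (1, 'paper');
INSERT INTO table_d (id, sales, month) VALUES (1, 345, 1), (1, 599, 2);
""")

# The three left joins from the question, reused by the queries below.
JOINS = """
FROM table_a
LEFT JOIN table_b ON table_a.id = table_b.id
LEFT JOIN table_c ON table_a.id = table_c.id
LEFT JOIN table_d ON table_b.id = table_d.id
"""

# table_a has 5 rows, but the join yields 6: id 1 matches two table_d rows.
joined_count = conn.execute(f"SELECT COUNT(*) {JOINS}").fetchone()[0]

# Diagnosis: which ids occur more than once in table_d?
duplicated = conn.execute("""
    SELECT id, COUNT(*) FROM table_d GROUP BY id HAVING COUNT(*) > 1
""").fetchall()

# Option 1: keep only the latest month per id.
# ROW_NUMBER numbers tied rows 1, 2, ... so exactly one row per id survives
# (which tied row wins is arbitrary unless a tiebreaker column is added).
latest = conn.execute(f"""
    WITH cte AS (
        SELECT table_a.id, table_b.location, table_c.material, table_d.sales,
               ROW_NUMBER() OVER (PARTITION BY table_a.id
                                  ORDER BY table_d.month DESC) AS order_c
        {JOINS}
    )
    SELECT id, location, material, sales FROM cte WHERE order_c = 1 ORDER BY id
""").fetchall()

# Option 2: collapse table_d to one row per id before joining (SUM is assumed).
totals = conn.execute("""
    SELECT table_a.id, table_b.location, table_c.material, d.total_sales
    FROM table_a
    LEFT JOIN table_b ON table_a.id = table_b.id
    LEFT JOIN table_c ON table_a.id = table_c.id
    LEFT JOIN (SELECT id, SUM(sales) AS total_sales
               FROM table_d GROUP BY id) AS d ON table_b.id = d.id
    ORDER BY table_a.id
""").fetchall()

print(joined_count, duplicated)  # 6 [(1, 2)]
print(latest[0], totals[0])      # (1, 'UK', 'paper', 599) (1, 'UK', 'paper', 944)
```

Both options return 5 rows, one per row of table_a. Which one is right depends on whether the report needs a single representative table_d record or a figure computed over all of them.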