Optimizing sub-queries and joins
I have 2 tables that can have a many-to-many relationship between them:
Person (pid, name, a,b) ,
Attributes (attribId, d,e)
The mapping lives in a separate table:
Mapping (mapId, pid, attribId)
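The goal is to fetch all person and attribute values for the persons that match a filter. The filter is based on a column in the Attributes table, for example column d.
For example: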
Person ->
(1,'person1','a1','b1')
(2,'person2','a1','b1')
Attributes ->
(1,'d1','e1')
(2,'d2','e1')
(3,'d3','e1')
(4,'d3','e2')
Mapping ->
(1,1,1)
(2,1,2)
(3,1,3)
After running the query ->
Result:
(1,'person1','a1','b1')(1,'d1','e1')
(1,'person1','a1','b1')(2,'d2','e1')
(1,'person1','a1','b1')(3,'d3','e1')
The query I have been trying ->
select p.*, a.*
from Person p
left outer join Mapping m
    on p.pid = m.pid
left outer join Attributes a
    on m.attribId = a.attribId
where p.pid in (select p1.pid
                from Person p1
                left outer join Mapping m1
                    on p1.pid = m1.pid
                left outer join Attributes a1
                    on m1.attribId = a1.attribId
                where a1.d = 'd1')
Similarly, I also have to discard Person entries that have a certain d value.
So, at the moment, the final query looks like this:
SELECT
    p.*,
    a.*
FROM Person p
LEFT OUTER JOIN Mapping m
    ON p.pid = m.pid
LEFT OUTER JOIN Attributes a
    ON m.attribId = a.attribId
WHERE p.pid IN (SELECT
                    p1.pid
                FROM Person p1
                LEFT OUTER JOIN Mapping m1
                    ON p1.pid = m1.pid
                LEFT OUTER JOIN Attributes a1
                    ON m1.attribId = a1.attribId
                WHERE a1.d = 'd1')
  AND p.pid NOT IN (SELECT
                        p2.pid
                    FROM Person p2
                    LEFT OUTER JOIN Mapping m2
                        ON p2.pid = m2.pid
                    LEFT OUTER JOIN Attributes a2
                        ON m2.attribId = a2.attribId
                    WHERE a2.d = 'd5');
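This query feels inefficient because the same join is done in 3 places. Is there a way to reuse the join for all the sub-queries and make it more efficient?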
You can get all the persons that satisfy the filter with:
select m.pid
from mapping m join
     attributes a
     on m.attribId = a.attribId and a.d = 'dS';
You can use IN, EXISTS, or JOIN to get all the person/attribute combinations. Which one is better depends on the database, but the idea is:
select p.*, a.*
from person p join
     mapping m
     on p.pid = m.pid join
     attributes a
     on m.attribId = a.attribId
where p.pid in (select m.pid
                from mapping m join
                     attributes a
                     on m.attribId = a.attribId and a.d = 'dS'
               );
I see no reason to use left join for these queries.
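The EXISTS form mentioned above is not spelled out, so here is a minimal sketch of it, run through Python's sqlite3 module against the sample rows from the question. It assumes the question's filter value 'd1' rather than 'dS', and the column types are guesses. Other databases may plan IN, EXISTS, and JOIN differently.

```python
import sqlite3

# In-memory database holding the sample data from the question.
# Column types are assumptions; the question only gives column names.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Person (pid INTEGER PRIMARY KEY, name TEXT, a TEXT, b TEXT);
    CREATE TABLE Attributes (attribId INTEGER PRIMARY KEY, d TEXT, e TEXT);
    CREATE TABLE Mapping (mapId INTEGER PRIMARY KEY, pid INTEGER, attribId INTEGER);
    INSERT INTO Person VALUES (1,'person1','a1','b1'), (2,'person2','a1','b1');
    INSERT INTO Attributes VALUES (1,'d1','e1'), (2,'d2','e1'), (3,'d3','e1'), (4,'d3','e2');
    INSERT INTO Mapping VALUES (1,1,1), (2,1,2), (3,1,3);
""")

# EXISTS form of the filter: keep every person/attribute pair of any
# person who has at least one attribute with d = 'd1'.
exists_query = """
    SELECT p.*, a.*
    FROM Person p
    JOIN Mapping m ON p.pid = m.pid
    JOIN Attributes a ON m.attribId = a.attribId
    WHERE EXISTS (SELECT 1
                  FROM Mapping m1
                  JOIN Attributes a1 ON m1.attribId = a1.attribId
                  WHERE m1.pid = p.pid AND a1.d = 'd1')
    ORDER BY a.attribId
"""
rows = conn.execute(exists_query).fetchall()
for row in rows:
    print(row)
```

On this data the query returns the three person1 rows listed in the question. person2 has no mappings and drops out.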
Edit:
If the filter is based on multiple conditions (for example, several values of d), use group by and having in the subquery:
select m.pid
from mapping m join
     attributes a
     on m.attribId = a.attribId -- no a.d filter here, so having sees every attribute of the person
group by m.pid
having sum(case when a.d = 'dS' then 1 else 0 end) > 0 and -- at least one of these
       sum(case when a.d = 'd1' then 1 else 0 end) = 0;    -- none of these
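To see the group by / having pattern work on data, here is a small sketch using Python's sqlite3 module. It applies the question's rule (must have 'd1', must not have 'd5'). The 'd5' attribute row and the two mappings for person2 are hypothetical additions, since the sample data has no person to exclude. The value tests live only in the having clause, so each group still contains all of a person's attributes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Attributes (attribId INTEGER PRIMARY KEY, d TEXT, e TEXT);
    CREATE TABLE Mapping (mapId INTEGER PRIMARY KEY, pid INTEGER, attribId INTEGER);
    -- Rows 1-4 come from the question; row 5 ('d5') is a hypothetical addition.
    INSERT INTO Attributes VALUES
        (1,'d1','e1'), (2,'d2','e1'), (3,'d3','e1'), (4,'d3','e2'), (5,'d5','e1');
    -- Mappings 1-3 come from the question; 4 and 5 are hypothetical and give
    -- person 2 both 'd1' and 'd5'.
    INSERT INTO Mapping VALUES (1,1,1), (2,1,2), (3,1,3), (4,2,1), (5,2,5);
""")

# One pass over the join: count matches per person, then keep persons with
# at least one 'd1' and no 'd5'.
filter_query = """
    SELECT m.pid
    FROM Mapping m
    JOIN Attributes a ON m.attribId = a.attribId
    GROUP BY m.pid
    HAVING SUM(CASE WHEN a.d = 'd1' THEN 1 ELSE 0 END) > 0
       AND SUM(CASE WHEN a.d = 'd5' THEN 1 ELSE 0 END) = 0
    ORDER BY m.pid
"""
matching_pids = [pid for (pid,) in conn.execute(filter_query)]
print(matching_pids)
```

Person 1 qualifies. Person 2 has 'd1' but also 'd5', so the second sum is 1 and the group is rejected.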
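The first thing I noticed is that you use left joins in the subqueries; inner joins would work just as well and are faster. Second, remove Person from the nested selects, since it is not needed.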
select m2.pid
from Mapping m2
inner join Attributes a2
    on m2.attribId = a2.attribId
where a2.d = 'd5'
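As a sanity check on that simplification, the sketch below runs the question's final query and a version with inner-join subqueries that skip Person, then compares the results using Python's sqlite3 module. The 'd5' attribute and person2's mappings are hypothetical rows added so the NOT IN branch has something to exclude. It also assumes Mapping.pid is never NULL, because a NULL in a NOT IN subquery would make the outer query return no rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Person (pid INTEGER PRIMARY KEY, name TEXT, a TEXT, b TEXT);
    CREATE TABLE Attributes (attribId INTEGER PRIMARY KEY, d TEXT, e TEXT);
    CREATE TABLE Mapping (mapId INTEGER PRIMARY KEY, pid INTEGER NOT NULL, attribId INTEGER);
    INSERT INTO Person VALUES (1,'person1','a1','b1'), (2,'person2','a1','b1');
    -- Attribute 5 and mappings 4-5 are hypothetical; the rest is the question's data.
    INSERT INTO Attributes VALUES
        (1,'d1','e1'), (2,'d2','e1'), (3,'d3','e1'), (4,'d3','e2'), (5,'d5','e1');
    INSERT INTO Mapping VALUES (1,1,1), (2,1,2), (3,1,3), (4,2,1), (5,2,5);
""")

# The question's final query, with an ORDER BY added for a stable comparison.
original = """
    SELECT p.*, a.*
    FROM Person p
    LEFT OUTER JOIN Mapping m ON p.pid = m.pid
    LEFT OUTER JOIN Attributes a ON m.attribId = a.attribId
    WHERE p.pid IN (SELECT p1.pid
                    FROM Person p1
                    LEFT OUTER JOIN Mapping m1 ON p1.pid = m1.pid
                    LEFT OUTER JOIN Attributes a1 ON m1.attribId = a1.attribId
                    WHERE a1.d = 'd1')
      AND p.pid NOT IN (SELECT p2.pid
                        FROM Person p2
                        LEFT OUTER JOIN Mapping m2 ON p2.pid = m2.pid
                        LEFT OUTER JOIN Attributes a2 ON m2.attribId = a2.attribId
                        WHERE a2.d = 'd5')
    ORDER BY p.pid, a.attribId
"""

# Same logic with inner joins and without Person in the subqueries.
simplified = """
    SELECT p.*, a.*
    FROM Person p
    JOIN Mapping m ON p.pid = m.pid
    JOIN Attributes a ON m.attribId = a.attribId
    WHERE p.pid IN (SELECT m1.pid
                    FROM Mapping m1
                    JOIN Attributes a1 ON m1.attribId = a1.attribId
                    WHERE a1.d = 'd1')
      AND p.pid NOT IN (SELECT m2.pid
                        FROM Mapping m2
                        JOIN Attributes a2 ON m2.attribId = a2.attribId
                        WHERE a2.d = 'd5')
    ORDER BY p.pid, a.attribId
"""

original_rows = conn.execute(original).fetchall()
simplified_rows = conn.execute(simplified).fetchall()
print(simplified_rows == original_rows)
for row in simplified_rows:
    print(row)
```

Both queries return the same three person1 rows here. That shows the rewrite preserves the result on this data; it says nothing about speed on a real database.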
We can also do it this way:
select p.* from person p where p.pid in
    (select m.pid from mapping m where m.attribId in
        (select a.attribId from attributes a where a.d = 'something'))
I know there is debate about joins versus subqueries, but in this case I think the subquery would be faster.
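Whether the nested IN form really is faster depends on the engine and the indexes, so it is worth looking at the plan rather than guessing. The sketch below is one way to do that in SQLite through Python's sqlite3 module; EXPLAIN QUERY PLAN is SQLite-specific, and other databases have their own EXPLAIN variants. The join version with DISTINCT is an assumed equivalent written for comparison, not a query from the thread.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Person (pid INTEGER PRIMARY KEY, name TEXT, a TEXT, b TEXT);
    CREATE TABLE Attributes (attribId INTEGER PRIMARY KEY, d TEXT, e TEXT);
    CREATE TABLE Mapping (mapId INTEGER PRIMARY KEY, pid INTEGER, attribId INTEGER);
    INSERT INTO Person VALUES (1,'person1','a1','b1'), (2,'person2','a1','b1');
    INSERT INTO Attributes VALUES (1,'d1','e1'), (2,'d2','e1'), (3,'d3','e1'), (4,'d3','e2');
    INSERT INTO Mapping VALUES (1,1,1), (2,1,2), (3,1,3);
""")

nested_in = """
    SELECT p.* FROM Person p WHERE p.pid IN
        (SELECT m.pid FROM Mapping m WHERE m.attribId IN
            (SELECT a.attribId FROM Attributes a WHERE a.d = 'd1'))
"""

# DISTINCT is needed because a person could match through several mappings.
joined = """
    SELECT DISTINCT p.*
    FROM Person p
    JOIN Mapping m ON p.pid = m.pid
    JOIN Attributes a ON m.attribId = a.attribId
    WHERE a.d = 'd1'
"""

def plan(sql):
    # The last column of each EXPLAIN QUERY PLAN row is the readable step text.
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]

for label, sql in (("nested IN", nested_in), ("join", joined)):
    print(label)
    for step in plan(sql):
        print("   ", step)

# Both forms should select the same persons before their cost is compared.
same_rows = sorted(conn.execute(nested_in).fetchall()) == sorted(conn.execute(joined).fetchall())
print(same_rows)
```

The plan text varies between SQLite versions, so treat the output as something to read, not a fixed result. On tables this small any difference is noise; the comparison only becomes meaningful with realistic row counts and indexes.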