Teradata SQL Assistant - NOT EXISTS, INNER JOIN, IN or another way
There is a table containing data about cars:
model_id | color |
---|---|
1 | black |
1 | green |
2 | black |
3 | blue |
3 | white |
4 | red |
5 | white |
5 | black |
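The task: if a model comes in black, keep only its black row; if a model does not come in black, keep all of its colors:
model_id | color |
---|---|
1 | black |
2 | black |
3 | blue |
3 | white |
4 | red |
5 | black |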
First way:
-- black rows only
create table black_cars as
(select * from cars where color = 'black')
with data;
-- all rows of models that have no black row
create table not_black_cars as
(select c.* from cars as c
where not exists (select 1 from black_cars as bc where bc.model_id = c.model_id))
with data;
select * from black_cars
union
select * from not_black_cars;
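To check the first way end to end, here is a minimal sketch that loads the sample rows into an in-memory SQLite database through Python's standard sqlite3 module. SQLite is only a stand-in for Teradata here: it spells `CREATE TABLE ... AS (...) WITH DATA` as `CREATE TEMP TABLE ... AS SELECT`, and the column types are an assumption, since the question does not show the real DDL.

```python
import sqlite3

# Sample rows from the question
ROWS = [(1, "black"), (1, "green"), (2, "black"), (3, "blue"),
        (3, "white"), (4, "red"), (5, "white"), (5, "black")]
# Desired output from the question, sorted by (model_id, color)
EXPECTED = [(1, "black"), (2, "black"), (3, "blue"),
            (3, "white"), (4, "red"), (5, "black")]

con = sqlite3.connect(":memory:")
# Assumed DDL: the question does not show the real table definition
con.execute("create table cars (model_id integer, color varchar(20))")
con.executemany("insert into cars values (?, ?)", ROWS)

# SQLite spelling of Teradata's CREATE TABLE ... AS (...) WITH DATA
con.execute("create temp table black_cars as "
            "select * from cars where color = 'black'")
con.execute("""
    create temp table not_black_cars as
    select c.* from cars as c
    where not exists (select 1 from black_cars as bc
                      where bc.model_id = c.model_id)
""")

# order by 1, 2 sorts the whole UNION by its first and second output columns
result = con.execute("""
    select * from black_cars
    union
    select * from not_black_cars
    order by 1, 2
""").fetchall()
print(result)
```

The two intermediate tables are disjoint (black rows versus rows of models with no black row), so the UNION simply stitches them together.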
Second way:
select * from cars where color = 'black'
union
-- c.* keeps the column count equal to the first branch
select c.* from cars as c
inner join
(select distinct model_id from cars except select model_id from cars where color = 'black') as nbc
on c.model_id = nbc.model_id;
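The second way needs `select c.*` rather than `select *` in the joined branch: `select *` also returns `nbc.model_id`, giving three columns against two on the other side of the UNION. The sketch below shows both forms in SQLite (a stand-in for Teradata, with assumed column types); the exact error text differs between databases, but both reject the mismatched UNION.

```python
import sqlite3

ROWS = [(1, "black"), (1, "green"), (2, "black"), (3, "blue"),
        (3, "white"), (4, "red"), (5, "white"), (5, "black")]
EXPECTED = [(1, "black"), (2, "black"), (3, "blue"),
            (3, "white"), (4, "red"), (5, "black")]

con = sqlite3.connect(":memory:")
con.execute("create table cars (model_id integer, color varchar(20))")  # assumed DDL
con.executemany("insert into cars values (?, ?)", ROWS)

# Models that never appear in black (EXCEPT already removes duplicates)
NON_BLACK = """(select distinct model_id from cars
                except
                select model_id from cars where color = 'black') as nbc"""

# select * on the join yields c.model_id, c.color, nbc.model_id: three columns
BAD = f"""
    select * from cars where color = 'black'
    union
    select * from cars as c inner join {NON_BLACK} on c.model_id = nbc.model_id
"""
# select c.* keeps both branches at two columns
GOOD = f"""
    select * from cars where color = 'black'
    union
    select c.* from cars as c inner join {NON_BLACK} on c.model_id = nbc.model_id
    order by 1, 2
"""

try:
    con.execute(BAD)
    column_mismatch = False
except sqlite3.OperationalError:  # SQLite rejects the UNION at prepare time
    column_mismatch = True

result = con.execute(GOOD).fetchall()
print(column_mismatch, result)
```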
Third way (poor performance):
select * from cars where color = 'black'
union
select * from cars where model_id not in
(select model_id from cars where color = 'black');
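Besides performance, `NOT IN` differs from `NOT EXISTS` when the subquery can return NULL: under standard SQL three-valued logic, `x NOT IN (1, 2, NULL)` is never true, so the second branch returns nothing. The question does not say whether `model_id` is nullable; if it is declared NOT NULL this caveat does not apply. The sketch uses SQLite as a stand-in and a hypothetical extra row `(NULL, 'black')` purely for illustration.

```python
import sqlite3

ROWS = [(1, "black"), (1, "green"), (2, "black"), (3, "blue"),
        (3, "white"), (4, "red"), (5, "white"), (5, "black")]
EXPECTED = [(1, "black"), (2, "black"), (3, "blue"),
            (3, "white"), (4, "red"), (5, "black")]

NOT_IN = """
    select * from cars where color = 'black'
    union
    select * from cars where model_id not in
      (select model_id from cars where color = 'black')
    order by 1, 2
"""
NOT_EXISTS = """
    select * from cars where color = 'black'
    union
    select * from cars as c where not exists
      (select 1 from cars as c2
       where c2.model_id = c.model_id and c2.color = 'black')
    order by 1, 2
"""

def run(rows, sql):
    """Load rows into a fresh in-memory table and run one query."""
    con = sqlite3.connect(":memory:")
    con.execute("create table cars (model_id integer, color varchar(20))")  # assumed DDL
    con.executemany("insert into cars values (?, ?)", rows)
    return con.execute(sql).fetchall()

clean_not_in = run(ROWS, NOT_IN)
clean_not_exists = run(ROWS, NOT_EXISTS)

# Hypothetical row: a black car whose model_id is unknown (NULL)
ROWS_WITH_NULL = ROWS + [(None, "black")]
null_not_in = run(ROWS_WITH_NULL, NOT_IN)          # NOT IN branch returns no rows
null_not_exists = run(ROWS_WITH_NULL, NOT_EXISTS)  # models 3 and 4 still kept
print(null_not_in)
print(null_not_exists)
```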
I'm not sure which of these approaches is optimal. I would be grateful if someone could suggest the best way.
I would simply use:
select c.*
from cars c
where c.color = 'black' or
not exists (select 1
from cars c2
where c2.model_id = c.model_id and
c2.color = 'black'
);
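This single query needs no temporary tables and no UNION (so no duplicate-removal step): a row survives if it is black, or if its model has no black row at all. A minimal check against the sample data, using SQLite through Python's sqlite3 as a stand-in for Teradata, with assumed column types:

```python
import sqlite3

ROWS = [(1, "black"), (1, "green"), (2, "black"), (3, "blue"),
        (3, "white"), (4, "red"), (5, "white"), (5, "black")]
EXPECTED = [(1, "black"), (2, "black"), (3, "blue"),
            (3, "white"), (4, "red"), (5, "black")]

con = sqlite3.connect(":memory:")
con.execute("create table cars (model_id integer, color varchar(20))")  # assumed DDL
con.executemany("insert into cars values (?, ?)", ROWS)

# Same predicate as the answer; only the ORDER BY is added for a stable result
result = con.execute("""
    select c.*
    from cars c
    where c.color = 'black' or
          not exists (select 1
                      from cars c2
                      where c2.model_id = c.model_id and
                            c2.color = 'black')
    order by c.model_id, c.color
""").fetchall()
print(result)
```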
In many cases, you may find that window functions perform well:
select c.*
from cars c
qualify c.color = 'black' or
sum(case when c.color = 'black' then 1 else 0 end) over (partition by c.model_id) = 0;
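`QUALIFY` is Teradata syntax that SQLite does not have, so this sketch runs the same logic in the portable form: compute the per-model black count in a derived table, then filter in the outer `WHERE`. It assumes Python's bundled SQLite is 3.25 or newer (the first release with window functions) and uses assumed column types.

```python
import sqlite3

ROWS = [(1, "black"), (1, "green"), (2, "black"), (3, "blue"),
        (3, "white"), (4, "red"), (5, "white"), (5, "black")]
EXPECTED = [(1, "black"), (2, "black"), (3, "blue"),
            (3, "white"), (4, "red"), (5, "black")]

con = sqlite3.connect(":memory:")
con.execute("create table cars (model_id integer, color varchar(20))")  # assumed DDL
con.executemany("insert into cars values (?, ?)", ROWS)

# black_cnt = number of black rows in the same model; the outer WHERE
# plays the role of Teradata's QUALIFY clause
result = con.execute("""
    select model_id, color
    from (select c.model_id, c.color,
                 sum(case when c.color = 'black' then 1 else 0 end)
                     over (partition by c.model_id) as black_cnt
          from cars c) as t
    where color = 'black' or black_cnt = 0
    order by model_id, color
""").fetchall()
print(result)
```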
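As for which of all these versions performs best, you have to try them on your data and your system. It depends on many factors:
- The indexes on the table.
- The size of the table.
- The average number of colors per model.
- The distribution of black/non-black colors across models.
And probably more. Either of the above seems reasonable, and neither involves temporary tables.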
Here is another approach based on OLAP functions, which can easily be extended to more complex logic:
SELECT *
FROM cars
QUALIFY
Rank()
Over (PARTITION BY model_id
-- preferred condition first
ORDER BY CASE WHEN color = 'black' THEN 1 ELSE 0 END DESC) = 1
;
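With `RANK`, a model that has a black row ranks it 1 and everything else 2; a model with no black row ties all its rows at rank 1, so all of them are kept. The sketch below reproduces this in SQLite (no `QUALIFY`, so the filter moves to an outer `WHERE`; requires SQLite 3.25+). To illustrate the "more complex logic" point, it also runs a hypothetical preference order, black first, then white, otherwise every color, which only changes the `CASE` expression.

```python
import sqlite3

ROWS = [(1, "black"), (1, "green"), (2, "black"), (3, "blue"),
        (3, "white"), (4, "red"), (5, "white"), (5, "black")]
EXPECTED = [(1, "black"), (2, "black"), (3, "blue"),
            (3, "white"), (4, "red"), (5, "black")]

con = sqlite3.connect(":memory:")
con.execute("create table cars (model_id integer, color varchar(20))")  # assumed DDL
con.executemany("insert into cars values (?, ?)", ROWS)

def keep_top_rank(con, preference):
    """Keep the rows per model whose preference value is highest (rank 1).

    preference is a trusted SQL CASE expression, not user input.
    """
    sql = f"""
        select model_id, color
        from (select model_id, color,
                     rank() over (partition by model_id
                                  order by {preference} desc) as rnk
              from cars) as t
        where rnk = 1
        order by model_id, color
    """
    return con.execute(sql).fetchall()

BLACK_FIRST = "case when color = 'black' then 1 else 0 end"
# Hypothetical extension: black first, then white, otherwise keep every color
BLACK_THEN_WHITE = ("case when color = 'black' then 2 "
                    "when color = 'white' then 1 else 0 end")

base = keep_top_rank(con, BLACK_FIRST)
extended = keep_top_rank(con, BLACK_THEN_WHITE)
print(base)
print(extended)
```

With the extended ordering, model 3 keeps only its white row, while models 1, 2 and 5 still keep black and model 4 keeps red.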
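In Teradata, OLAP functions can perform as well as, or better than, solutions based on EXISTS and similar constructs. Only a primary index (PI) on the PARTITION BY column might beat it, depending mainly on the number of rows per value.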