Teradata SQL Assistant - NOT EXISTS, INNER JOIN, IN or other way

Question

There is a table, cars, that holds data about cars:

model_id	color
1	black
1	green
2	black
3	blue
3	white
4	red
5	white
5	black

The task: if a model can be black, keep only its black row; if a model cannot be black, keep all of its colors:

model_id	color
1	black
2	black
3	blue
3	white
4	red
5	black

First approach:

create table black_cars as
(select * from cars where color = 'black')
with data;

create table not_black_cars as
(select c.* from cars as c
where not exists (select 1 from black_cars as bc where bc.model_id = c.model_id))
with data;

select * from black_cars
union
select * from not_black_cars;
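To check the first approach end to end, here is a minimal Python sketch that runs the same logic against an in-memory SQLite database. SQLite is only an assumed stand-in for Teradata: it has no WITH DATA clause, so CREATE TEMP TABLE ... AS SELECT is used instead. CARS, load_cars and approach_temp_tables are hypothetical helper names, not part of the question.

```python
import sqlite3

# Sample rows from the question. SQLite is only a stand-in for Teradata here.
CARS = [(1, "black"), (1, "green"), (2, "black"), (3, "blue"),
        (3, "white"), (4, "red"), (5, "white"), (5, "black")]


def load_cars() -> sqlite3.Connection:
    """Create an in-memory cars table filled with the sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE cars (model_id INTEGER, color TEXT)")
    conn.executemany("INSERT INTO cars VALUES (?, ?)", CARS)
    return conn


def approach_temp_tables(conn: sqlite3.Connection) -> list:
    """First approach: materialize both halves, then UNION them."""
    # SQLite equivalent of Teradata's CREATE TABLE ... AS (...) WITH DATA.
    conn.execute("CREATE TEMP TABLE black_cars AS "
                 "SELECT * FROM cars WHERE color = 'black'")
    # All rows of models that have no black variant at all.
    conn.execute("""
        CREATE TEMP TABLE not_black_cars AS
        SELECT c.* FROM cars AS c
        WHERE NOT EXISTS (SELECT 1 FROM black_cars AS bc
                          WHERE bc.model_id = c.model_id)""")
    # ORDER BY 1, 2 only makes the output deterministic for comparison.
    return conn.execute("""
        SELECT * FROM black_cars
        UNION
        SELECT * FROM not_black_cars
        ORDER BY 1, 2""").fetchall()


print(approach_temp_tables(load_cars()))
```

With the sample data this prints the six rows of the expected result. Because the two branches are disjoint (a model either has a black row or it does not), UNION ALL would return the same rows here while skipping the duplicate-elimination step that UNION performs.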

Second approach:

select * from cars where color = 'black'
union
select c.* from cars as c
inner join
(select distinct model_id from cars except select model_id from cars where color = 'black') as nbc
on c.model_id = nbc.model_id;
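The second approach can be checked the same way. This sketch again assumes SQLite as a stand-in for Teradata; CARS, load_cars and approach_except_join are hypothetical helper names. The joined branch selects c.* so that both sides of the UNION return the same two columns (select * would also return nbc.model_id and break the UNION).

```python
import sqlite3

# Sample rows from the question. SQLite is only a stand-in for Teradata here.
CARS = [(1, "black"), (1, "green"), (2, "black"), (3, "blue"),
        (3, "white"), (4, "red"), (5, "white"), (5, "black")]


def load_cars() -> sqlite3.Connection:
    """Create an in-memory cars table filled with the sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE cars (model_id INTEGER, color TEXT)")
    conn.executemany("INSERT INTO cars VALUES (?, ?)", CARS)
    return conn


def approach_except_join(conn: sqlite3.Connection) -> list:
    """Second approach: black rows UNION rows of models found via EXCEPT."""
    return conn.execute("""
        SELECT * FROM cars WHERE color = 'black'
        UNION
        SELECT c.* FROM cars AS c
        INNER JOIN (SELECT DISTINCT model_id FROM cars
                    EXCEPT
                    SELECT model_id FROM cars WHERE color = 'black') AS nbc
          ON c.model_id = nbc.model_id
        ORDER BY 1, 2""").fetchall()  # ORDER BY only for stable comparison


print(approach_except_join(load_cars()))
```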

Third approach (poor performance):

select * from cars where color = 'black'
union
select * from cars where model_id not in
(select model_id from cars where color = 'black');
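A minimal sketch of the third approach, again assuming SQLite as a stand-in for Teradata; CARS, load_cars and approach_not_in are hypothetical helper names. The comment on NULL describes standard SQL semantics for NOT IN, which is one reason NOT EXISTS is often preferred.

```python
import sqlite3

# Sample rows from the question. SQLite is only a stand-in for Teradata here.
CARS = [(1, "black"), (1, "green"), (2, "black"), (3, "blue"),
        (3, "white"), (4, "red"), (5, "white"), (5, "black")]


def load_cars() -> sqlite3.Connection:
    """Create an in-memory cars table filled with the sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE cars (model_id INTEGER, color TEXT)")
    conn.executemany("INSERT INTO cars VALUES (?, ?)", CARS)
    return conn


def approach_not_in(conn: sqlite3.Connection) -> list:
    """Third approach: black rows UNION rows whose model is NOT IN the black set."""
    # Caveat: if the subquery ever returned a NULL model_id, NOT IN would
    # evaluate to UNKNOWN for every row and the second branch would be empty.
    return conn.execute("""
        SELECT * FROM cars WHERE color = 'black'
        UNION
        SELECT * FROM cars WHERE model_id NOT IN
            (SELECT model_id FROM cars WHERE color = 'black')
        ORDER BY 1, 2""").fetchall()  # ORDER BY only for stable comparison


print(approach_not_in(load_cars()))
```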

I am not sure which of these approaches is optimal. I would be grateful if someone could suggest the best way.

Answer 1

I would simply use:

select c.*
from cars c
where c.color = 'black' or
      not exists (select 1
                  from cars c2
                  where c2.model_id = c.model_id and
                        c2.color = 'black'
                 );
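The single-pass query above can be run on the sample data with a short Python sketch; SQLite is an assumed stand-in for Teradata, and CARS, load_cars and single_pass_not_exists are hypothetical helper names. A row survives if it is black, or if its model has no black row at all.

```python
import sqlite3

# Sample rows from the question. SQLite is only a stand-in for Teradata here.
CARS = [(1, "black"), (1, "green"), (2, "black"), (3, "blue"),
        (3, "white"), (4, "red"), (5, "white"), (5, "black")]


def load_cars() -> sqlite3.Connection:
    """Create an in-memory cars table filled with the sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE cars (model_id INTEGER, color TEXT)")
    conn.executemany("INSERT INTO cars VALUES (?, ?)", CARS)
    return conn


def single_pass_not_exists(conn: sqlite3.Connection) -> list:
    """Keep black rows, plus every row of a model that has no black row."""
    return conn.execute("""
        SELECT c.*
        FROM cars c
        WHERE c.color = 'black'
           OR NOT EXISTS (SELECT 1
                          FROM cars c2
                          WHERE c2.model_id = c.model_id
                            AND c2.color = 'black')
        ORDER BY c.model_id, c.color""").fetchall()  # ORDER BY for comparison


print(single_pass_not_exists(load_cars()))
```

Unlike the question's versions, this needs no UNION at all, so there is no duplicate-elimination step.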

In many cases, you may find that window functions perform well:

select c.*
from cars c
qualify c.color = 'black' or
        sum(case when c.color = 'black' then 1 else 0 end) over (partition by c.model_id) = 0;
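QUALIFY is Teradata-specific, so to try the window-function version outside Teradata this sketch computes the per-model count of black rows in a derived table and filters on it in the outer WHERE. It assumes SQLite 3.25 or later (the first release with window functions) as the stand-in; CARS, load_cars, window_sum and n_black are hypothetical names.

```python
import sqlite3

# Sample rows from the question. SQLite is only a stand-in for Teradata here.
CARS = [(1, "black"), (1, "green"), (2, "black"), (3, "blue"),
        (3, "white"), (4, "red"), (5, "white"), (5, "black")]


def load_cars() -> sqlite3.Connection:
    """Create an in-memory cars table filled with the sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE cars (model_id INTEGER, color TEXT)")
    conn.executemany("INSERT INTO cars VALUES (?, ?)", CARS)
    return conn


def window_sum(conn: sqlite3.Connection) -> list:
    """Emulate the QUALIFY filter with a derived table (SQLite has no QUALIFY)."""
    return conn.execute("""
        SELECT model_id, color
        FROM (SELECT c.*,
                     -- number of black rows in the same model
                     SUM(CASE WHEN c.color = 'black' THEN 1 ELSE 0 END)
                       OVER (PARTITION BY c.model_id) AS n_black
              FROM cars c) AS t
        WHERE color = 'black' OR n_black = 0
        ORDER BY model_id, color""").fetchall()  # ORDER BY for comparison


print(window_sum(load_cars()))
```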

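As for which of all these versions performs best, you have to try them on your own data and system. It depends on many factors:

Indexes on the table.
The size of the table.
The average number of colors per model.
The distribution of black and non-black colors across models.

And probably others. Either of the queries above seems reasonable, and unlike your first approach, neither requires temporary tables.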

Answer 2

Here is another approach based on an OLAP function, which can easily be extended to more complex logic:

SELECT *
FROM cars
QUALIFY
   Rank() 
   Over (PARTITION BY model_id
         -- preferred condition first
         ORDER BY CASE WHEN color = 'black' THEN 1 ELSE 0 END DESC) = 1
;
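The RANK version can be traced on the sample data with this sketch, which again assumes SQLite 3.25+ as a stand-in for Teradata and moves the QUALIFY condition into an outer WHERE. CARS, load_cars, rank_preferred and rnk are hypothetical names. Within each model, black rows get sort key 1 and all others 0, so black rows take rank 1; in a model without black every row ties at rank 1 and is kept.

```python
import sqlite3

# Sample rows from the question. SQLite is only a stand-in for Teradata here.
CARS = [(1, "black"), (1, "green"), (2, "black"), (3, "blue"),
        (3, "white"), (4, "red"), (5, "white"), (5, "black")]


def load_cars() -> sqlite3.Connection:
    """Create an in-memory cars table filled with the sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE cars (model_id INTEGER, color TEXT)")
    conn.executemany("INSERT INTO cars VALUES (?, ?)", CARS)
    return conn


def rank_preferred(conn: sqlite3.Connection) -> list:
    """Keep the top-ranked rows per model; ties at rank 1 all survive."""
    return conn.execute("""
        SELECT model_id, color
        FROM (SELECT *,
                     RANK() OVER (PARTITION BY model_id
                                  -- preferred condition first
                                  ORDER BY CASE WHEN color = 'black'
                                                THEN 1 ELSE 0 END DESC) AS rnk
              FROM cars) AS t
        WHERE rnk = 1
        ORDER BY model_id, color""").fetchall()  # ORDER BY for comparison


print(rank_preferred(load_cars()))
```

RANK rather than ROW_NUMBER matters here: ROW_NUMBER would keep only one arbitrary row for models 3 and 4 style cases with several non-black colors, instead of all of them.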

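In Teradata, the performance of OLAP functions can be as good as or better than solutions based on EXISTS and the like. Only a primary index (PI) on the PARTITION BY column might let another approach beat it, depending mainly on the number of rows per value.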
