When using a case statement to aggregate fields in Redshift, is it more performant to replace binary fields with 1s and 0s?

Question

For example, which of the following calculations should execute faster?

sum ( 
    case 
        when fieldA is not null
        then 1
        else 0
    end ) total

or

sum ( 
    case 
        when fieldA is not null
        then fieldB -- binary field, 1 or 0. 
    end ) total

For the sake of this example, assume that whenever fieldA is not null, fieldB is always equal to 1. fieldB can also equal 1 when fieldA is null, which is why I am using the case statement.

Answer 1

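These two queries do not do the same thing unless fieldB is uniformly 1 (or uniformly 1 whenever fieldA is not NULL). In general, you should run the query that does what you actually need.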
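To make the difference concrete, here is a minimal sketch. The temp table `demo` and its rows are made up for illustration and are not from the question; the second row deliberately breaks the asker's assumption that fieldB is always 1 when fieldA is not null.

```sql
-- Hypothetical table and data, for illustration only.
create temp table demo (fieldA int, fieldB smallint);

insert into demo values
    (10,   1),   -- fieldA set, fieldB = 1
    (20,   0),   -- fieldA set, fieldB = 0 (breaks the stated assumption)
    (null, 1),   -- fieldA null: skipped by both case expressions
    (null, 0);

select
    -- counts the rows where fieldA is not null
    sum(case when fieldA is not null then 1 else 0 end) as total_a,
    -- adds up fieldB over those same rows; with no else branch the case
    -- yields null for the other rows, and sum() ignores nulls
    sum(case when fieldA is not null then fieldB end)   as total_b
from demo;
-- total_a = 2, total_b = 1
```

The two totals agree only if every row with a non-null fieldA also has fieldB = 1, which is the condition the question assumes.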

Redshift is a columnar database. That means each column used in a query adds overhead to its execution.

So, if you can, it is better to avoid reading a column. Of course, this does not apply if the column is referenced elsewhere in the query.

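In addition, SUM() operates on numbers. I am not sure whether "binary" means the value is stored as a number. If it is not, a conversion is needed, which also adds overhead.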
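As a sketch of what such a conversion could look like, suppose fieldB were stored as a BOOLEAN rather than as a numeric 1/0. That is an assumption; the question does not state the column type. The temp table `demo_bool` below is hypothetical. One way to feed SUM() a number in that case is to fold the flag into the case condition:

```sql
-- Hypothetical table: here fieldB is assumed to be a boolean flag.
create temp table demo_bool (fieldA int, fieldB boolean);

insert into demo_bool values
    (10,   true),
    (20,   false),
    (null, true);

select
    -- the case expression turns the boolean into 1 or 0 so that
    -- sum() receives a numeric value
    sum(case when fieldA is not null and fieldB then 1 else 0 end) as total
from demo_bool;
-- total = 1
```

This version has to read both columns and evaluate the extra condition, which is the additional work the paragraph above refers to.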
