Impala - transform columns to rows without using UNION ALL

I have a table of users and their weekly activity. For example, this is the user_activity table:

userid | wk1 | wk2 | wk3
u1     |  1  |  0  |  1
u2     |  0  |  1  |  0
u3     |  1  |  0  |  1

I want to transform it into:

week   | active
wk1    |   2
wk2    |   1
wk3    |   2

I can do this with UNION ALL like so:

SELECT 'wk1' as week,
       SUM( wk1 ) AS active
FROM user_activity
UNION ALL
SELECT 'wk2' as week,
       SUM( wk2 ) AS active
FROM user_activity
UNION ALL
SELECT 'wk3' as week,
       SUM( wk3 ) AS active
FROM user_activity;

Is there a way to get the same result without using UNION ALL?

Thanks!

Edit: Impala version: 2.6.0

Reason for avoiding UNION ALL: each SELECT scans the whole table from HDFS. With a huge table, this leads to OOM errors.

You can try unpivoting and then aggregating. That way the user_activity table is read only once.

select
    w.week,
    sum(case w.week 
        when 'wk1' then wk1
        when 'wk2' then wk2
        when 'wk3' then wk3
    end) active
from user_activity u
cross join (
    select 'wk1' week union all
    select 'wk2' week union all
    select 'wk3' week 
) w group by w.week;

This produces:

+------+--------+
| week | active |
+------+--------+
| wk1  | 2      |
| wk2  | 1      |
| wk3  | 2      |
+------+--------+

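It needs only one aggregation instead of three. I use UNION ALL only to build the small helper set of week labels for the unpivot; I do not use it on the user table.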
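As a quick sanity check of the cross-join approach above, here is a minimal sketch that runs both the UNION ALL query and the single-aggregation query on the sample data and compares the results. It uses Python's built-in sqlite3 as a stand-in engine, which is an assumption made only to keep the example self-contained; it says nothing about Impala's query plan or memory use. The helper names `make_sample_db` and `weekly_totals` are hypothetical.

```python
import sqlite3

# The original query from the question: three aggregations glued with UNION ALL.
UNION_ALL_SQL = """
SELECT 'wk1' AS week, SUM(wk1) AS active FROM user_activity
UNION ALL
SELECT 'wk2' AS week, SUM(wk2) AS active FROM user_activity
UNION ALL
SELECT 'wk3' AS week, SUM(wk3) AS active FROM user_activity
"""

# The unpivot-then-aggregate query: user_activity appears only once.
CROSS_JOIN_SQL = """
SELECT w.week,
       SUM(CASE w.week
               WHEN 'wk1' THEN u.wk1
               WHEN 'wk2' THEN u.wk2
               WHEN 'wk3' THEN u.wk3
           END) AS active
FROM user_activity u
CROSS JOIN (
    SELECT 'wk1' AS week UNION ALL
    SELECT 'wk2' AS week UNION ALL
    SELECT 'wk3' AS week
) w
GROUP BY w.week
"""


def make_sample_db():
    """Create an in-memory SQLite database holding the question's sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE user_activity "
        "(userid TEXT, wk1 INTEGER, wk2 INTEGER, wk3 INTEGER)"
    )
    conn.executemany(
        "INSERT INTO user_activity VALUES (?, ?, ?, ?)",
        [("u1", 1, 0, 1), ("u2", 0, 1, 0), ("u3", 1, 0, 1)],
    )
    return conn


def weekly_totals(conn, sql):
    """Run a query and return its (week, active) rows sorted by week."""
    return sorted(conn.execute(sql).fetchall())


conn = make_sample_db()
union_rows = weekly_totals(conn, UNION_ALL_SQL)
cross_rows = weekly_totals(conn, CROSS_JOIN_SQL)
print(cross_rows)                # [('wk1', 2), ('wk2', 1), ('wk3', 2)]
print(union_rows == cross_rows)  # True
```

Note that the cross join multiplies every user row by the number of week labels before aggregating, so the intermediate row count grows with the number of week columns even though the table itself is referenced once.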

Setting aside the trivial "use UNION", the question seems a bit absurd. But here is one approach:

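with nounionall as (
      -- Build the three week labels from the first three rows of the table.
      -- Filtering on rn instead of LIMIT (which has no ORDER BY guarantee)
      -- keeps the labels deterministic. Needs at least three rows, and this
      -- CTE reads user_activity in addition to the main query below.
      select (case rn
                  when 1 then 'wk1'
                  when 2 then 'wk2'
                  when 3 then 'wk3'
              end) as week
      from (select row_number() over (order by userid) as rn
            from user_activity
           ) numbered
      where rn <= 3
     )
select nounionall.week,
       sum(case when nounionall.week = 'wk1' then ua.wk1
                when nounionall.week = 'wk2' then ua.wk2
                when nounionall.week = 'wk3' then ua.wk3
           end) as active
from nounionall cross join
     user_activity ua
group by nounionall.week;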

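This should solve your performance problem.
The table is scanned only once.
The records are not duplicated three times.
UNION ALL is used here only on single-row SELECTs.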

select  concat('wk',cast(c.i as string))    as week
       ,case c.i 
            when 1 then wk1 
            when 2 then wk2
            when 3 then wk3
        end                 as active

from   (select  sum(wk1) AS wk1
               ,sum(wk2) AS wk2
               ,sum(wk3) AS wk3 

        from    user_activity
        ) t

        cross join (            select 1 as i
                    union all   select 2 
                    union all   select 3
                    ) c
;                    

+------+--------+
| week | active |
+------+--------+
| wk1  | 2      |
| wk2  | 1      |
| wk3  | 2      |
+------+--------+
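With many week columns, writing the aggregate-first query above by hand gets tedious, so here is a minimal sketch that generates it from a list of column names. `build_unpivot_sql` is a hypothetical helper, not part of any Impala tooling, and it is exercised against Python's built-in sqlite3 purely as a stand-in engine. The week label is produced with a CASE expression rather than `concat`, an assumption made for portability across engines.

```python
import sqlite3


def build_unpivot_sql(table, columns):
    """Build an aggregate-first unpivot query (hypothetical helper).

    Identifiers are interpolated directly into the SQL text, so only
    pass trusted table and column names.
    """
    if not columns:
        raise ValueError("at least one column is required")
    numbered = list(enumerate(columns, start=1))
    # One SUM per column, all computed in a single pass over the table.
    sums = ", ".join(f"SUM({col}) AS {col}" for _, col in numbered)
    # CASE branches that map the index i to a label and to its total.
    labels = " ".join(f"WHEN {i} THEN '{col}'" for i, col in numbered)
    values = " ".join(f"WHEN {i} THEN t.{col}" for i, col in numbered)
    # Single-row SELECTs 1..n; only the first one needs the column alias.
    indexes = " UNION ALL ".join(
        f"SELECT {i} AS i" if i == 1 else f"SELECT {i}" for i, _ in numbered
    )
    return (
        f"SELECT CASE c.i {labels} END AS week, "
        f"CASE c.i {values} END AS active "
        f"FROM (SELECT {sums} FROM {table}) t "
        f"CROSS JOIN ({indexes}) c"
    )


# Try the generated query on the sample data from the question.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE user_activity "
    "(userid TEXT, wk1 INTEGER, wk2 INTEGER, wk3 INTEGER)"
)
conn.executemany(
    "INSERT INTO user_activity VALUES (?, ?, ?, ?)",
    [("u1", 1, 0, 1), ("u2", 0, 1, 0), ("u3", 1, 0, 1)],
)

sql = build_unpivot_sql("user_activity", ["wk1", "wk2", "wk3"])
rows = sorted(conn.execute(sql).fetchall())
print(rows)  # [('wk1', 2), ('wk2', 1), ('wk3', 2)]
```

Because the sums are computed first, the cross join only expands a single aggregated row into one row per week, rather than expanding every user row.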

How about this very simple solution without any UNION ALL:

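-- Note: UNPIVOT and [bracketed] identifiers are SQL Server (T-SQL) syntax;
-- as far as I know, Impala 2.6.0 does not support them.
SELECT [week], active
FROM (
    SELECT SUM(wk1) AS wk1, SUM(wk2) AS wk2, SUM(wk3) AS wk3
    FROM user_activity
) pvt
UNPIVOT ([active] FOR [week] IN (wk1, wk2, wk3)) unpvt;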