Impala - transform columns to rows without using UNION ALL
I have a table with users and their weekly activity. For example, this is the user_activity
table:
userid | wk1 | wk2 | wk3
u1 | 1 | 0 | 1
u2 | 0 | 1 | 0
u3 | 1 | 0 | 1
I want to transform it into:
week | active
wk1 | 2
wk2 | 1
wk3 | 2
I can do this with UNION ALL, like so:
SELECT 'wk1' as week,
SUM( wk1 ) AS active
FROM user_activity
UNION ALL
SELECT 'wk2' as week,
SUM( wk2 ) AS active
FROM user_activity
UNION ALL
SELECT 'wk3' as week,
SUM( wk3 ) AS active
FROM user_activity;
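Is there a way to get the same result without using UNION ALL?
Thanks!
Edit:
Impala version: 2.6.0
Reason for avoiding UNION ALL: each SELECT scans the whole table from HDFS. With a huge table, this leads to OOM errors.
You can try unpivoting and then aggregating. This way the user_activity table is read only once.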
select
w.week,
sum(case w.week
when 'wk1' then wk1
when 'wk2' then wk2
when 'wk3' then wk3
end) active
from user_activity u
cross join (
select 'wk1' week union all
select 'wk2' week union all
select 'wk3' week
) w group by w.week;
Produces:
+------+--------+
| week | active |
+------+--------+
| wk1 | 2 |
| wk2 | 1 |
| wk3 | 2 |
+------+--------+
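It needs only one aggregation instead of three. I use UNION ALL only to build the derived table of week labels for the unpivot; it is never applied to the user table.
Leaving aside the trivial "use UNION", this question seems a bit absurd. But here is one approach: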
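As a quick sanity check, the cross join rewrite can be compared against the original UNION ALL query on the sample data. The sketch below uses Python's built-in sqlite3 module as a stand-in for Impala, which is purely an assumption for illustration; it checks that the two queries agree on results and says nothing about Impala's scan behavior or memory use. The helper name run_sorted is hypothetical.

```python
import sqlite3

# In-memory SQLite stands in for Impala here (an assumption for illustration only).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE user_activity (userid TEXT, wk1 INTEGER, wk2 INTEGER, wk3 INTEGER)"
)
conn.executemany(
    "INSERT INTO user_activity VALUES (?, ?, ?, ?)",
    [("u1", 1, 0, 1), ("u2", 0, 1, 0), ("u3", 1, 0, 1)],
)

# The query from the question: one SELECT (and one table read) per week column.
UNION_ALL_QUERY = """
SELECT 'wk1' AS week, SUM(wk1) AS active FROM user_activity
UNION ALL
SELECT 'wk2' AS week, SUM(wk2) AS active FROM user_activity
UNION ALL
SELECT 'wk3' AS week, SUM(wk3) AS active FROM user_activity
"""

# The rewrite from this answer: user_activity appears once, cross joined
# with a three-row derived table of week labels, then aggregated.
UNPIVOT_QUERY = """
SELECT w.week,
       SUM(CASE w.week
               WHEN 'wk1' THEN wk1
               WHEN 'wk2' THEN wk2
               WHEN 'wk3' THEN wk3
           END) AS active
FROM user_activity u
CROSS JOIN (
    SELECT 'wk1' AS week UNION ALL
    SELECT 'wk2' AS week UNION ALL
    SELECT 'wk3' AS week
) w
GROUP BY w.week
"""


def run_sorted(query):
    """Run a query and sort the rows, since neither query guarantees an order."""
    return sorted(conn.execute(query).fetchall())


union_rows = run_sorted(UNION_ALL_QUERY)
unpivot_rows = run_sorted(UNPIVOT_QUERY)
print(union_rows == unpivot_rows)
```

Both queries return the same three rows on the sample data, so the rewrite changes how the work is done, not the result.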
with nounionall as (
select (case row_number() over (order by userid)
when 1 then 'wk1'
when 2 then 'wk2'
when 3 then 'wk3'
end) as week
from user_activity ua
limit 3
)
select nounionall.week,
sum(case when nounionall.week = 'wk1' then wk1
when nounionall.week = 'wk2' then wk2
when nounionall.week = 'wk3' then wk3
end) as actives
from nounionall cross join
user_activity ua
group by nounionall.week
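This should solve your performance problem.
The table is scanned only once.
The records are not duplicated 3 times.
UNION ALL is used here only on single-row selects.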
select concat('wk',cast(c.i as string)) as week
,case c.i
when 1 then wk1
when 2 then wk2
when 3 then wk3
end as active
from (select sum(wk1) AS wk1
,sum(wk2) AS wk2
,sum(wk3) AS wk3
from user_activity
) t
cross join ( select 1 as i
union all select 2
union all select 3
) c
;
+------+--------+
| week | active |
+------+--------+
| wk1 | 2 |
| wk2 | 1 |
| wk3 | 2 |
+------+--------+
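The "not duplicated 3 times" point can be made concrete by counting the rows each cross join produces. This is a minimal sketch, again assuming Python's sqlite3 as a stand-in for Impala; the names WEEKS, rows_join_first and rows_aggregate_first are hypothetical, and `||` replaces `concat` because older SQLite versions lack that function.

```python
import sqlite3

# In-memory SQLite stands in for Impala here (an assumption for illustration only).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE user_activity (userid TEXT, wk1 INTEGER, wk2 INTEGER, wk3 INTEGER)"
)
conn.executemany(
    "INSERT INTO user_activity VALUES (?, ?, ?, ?)",
    [("u1", 1, 0, 1), ("u2", 0, 1, 0), ("u3", 1, 0, 1)],
)

# Three-row helper table, built with UNION ALL over constant rows only.
WEEKS = "(SELECT 1 AS i UNION ALL SELECT 2 UNION ALL SELECT 3)"

# Cross joining the raw table first yields 3 x N intermediate rows.
rows_join_first = conn.execute(
    f"SELECT COUNT(*) FROM user_activity u CROSS JOIN {WEEKS} c"
).fetchone()[0]

# Aggregating first collapses the table to one row, so the join yields 3 rows.
rows_aggregate_first = conn.execute(f"""
    SELECT COUNT(*)
    FROM (SELECT SUM(wk1) AS wk1, SUM(wk2) AS wk2, SUM(wk3) AS wk3
          FROM user_activity) t
    CROSS JOIN {WEEKS} c
""").fetchone()[0]

# The full aggregate-then-unpivot query, adapted to SQLite syntax.
result = conn.execute(f"""
    SELECT 'wk' || c.i AS week,
           CASE c.i
               WHEN 1 THEN t.wk1
               WHEN 2 THEN t.wk2
               WHEN 3 THEN t.wk3
           END AS active
    FROM (SELECT SUM(wk1) AS wk1, SUM(wk2) AS wk2, SUM(wk3) AS wk3
          FROM user_activity) t
    CROSS JOIN {WEEKS} c
    ORDER BY c.i
""").fetchall()

print(rows_join_first, rows_aggregate_first)
print(result)
```

With the three sample users, joining first produces 9 intermediate rows while aggregating first produces 3, and that gap grows linearly with the size of the table.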
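How about this very simple solution without any UNION ALL (note that UNPIVOT and the bracketed identifiers are SQL Server syntax, so this applies only if your engine supports them):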
SELECT [week],active
FROM (
SELECT SUM(wk1)wk1,SUM(wk2)wk2,SUM(wk3)wk3
FROM user_activity)pvt
UNPIVOT ([active] FOR [Week] IN (wk1,wk2,wk3)) unpvt