How to reshape data after GROUPING SETS in Hive?
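I want to aggregate a value across several different dimension columns. I think GROUPING SETS fits my problem, but I can't figure out how to transform/reshape the result table that GROUPING SETS produces.
Here is my query using GROUPING SETS: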
select `date`, dim1, dim2, dim3, sum(value) as sum_value
from `table`
group by `date`, dim1, dim2, dim3
grouping sets ((`date`, dim1), (`date`, dim2), (`date`, dim3))
The query produces a table like this:
date dim1 dim2 dim3 sum_value
2017-01-01 A NULL NULL [value_A]
2017-01-01 B NULL NULL [value_B]
2017-01-01 NULL C NULL [value_C]
2017-01-01 NULL D NULL [value_D]
2017-01-01 NULL NULL E [value_E]
2017-01-01 NULL NULL F [value_F]
But what I actually need is a table like this:
date dim factor sum_value
2017-01-01 dim1 A [value_A]
2017-01-01 dim1 B [value_B]
2017-01-01 dim2 C [value_C]
2017-01-01 dim2 D [value_D]
2017-01-01 dim3 E [value_E]
2017-01-01 dim3 F [value_F]
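The real number of dimensions is far greater than 3, so hard-coding the query is not a good option. Is there a way, using grouping sets or some other aggregation method, to reshape the result into the table above?
Thanks!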
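select `date`
-- assumes pre-Hive-2.3 GROUPING__ID semantics: `date` sets bit 0 and dimN sets bit N,
-- so log2(GROUPING__ID - 1) is the 1-based position of the grouped dimension
,elt(log2(GROUPING__ID - 1),'dim1','dim2','dim3') as dim
-- columns outside the current grouping set are NULL, so this picks the grouped dimension
,coalesce (dim1,dim2,dim3) as factor
,sum(value) as sum_value
from `table`
group by `date`,dim1,dim2,dim3
grouping sets ((`date`,dim1),(`date`,dim2),(`date`,dim3))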
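The `log2(GROUPING__ID - 1)` trick depends on how Hive encodes each grouping set as a bitmask. Below is a minimal Python sketch that reproduces that arithmetic. It assumes the pre-Hive-2.3.0 semantics, where bit i is set when the i-th GROUP BY column is part of the set and the first column is the lowest-order bit. The sketch also generates the same query for an arbitrary list of dimensions, so the SQL never has to be hard-coded. The function names `legacy_grouping_id`, `elt_index` and `build_query` are hypothetical, as are the default column and table names. As far as I know, Hive 2.3.0 changed the GROUPING__ID computation to follow the SQL standard, so check the values on your Hive version before relying on the `log2` mapping.

```python
# Sketch: model Hive's legacy GROUPING__ID bitmask and generate the
# answer's query for any number of dimension columns.


def legacy_grouping_id(group_by_cols, grouping_set):
    """GROUPING__ID as computed before Hive 2.3.0 (assumed here):
    bit i is set when the i-th GROUP BY column is part of the grouping
    set, with the first column as the lowest-order bit."""
    return sum(1 << i for i, col in enumerate(group_by_cols) if col in grouping_set)


def elt_index(grouping_id):
    """Exact integer equivalent of log2(GROUPING__ID - 1) for sets of the
    form (date, dimN): 'date' contributes bit 0, so subtracting 1 leaves a
    single power of two whose exponent is the 1-based dimension position."""
    return (grouping_id - 1).bit_length() - 1


def build_query(dims, date_col="`date`", value_col="value", table="`table`"):
    """Return the answer's HiveQL, generalized to any list of dimensions.
    Default column/table names are hypothetical placeholders."""
    dim_names = ",".join(f"'{d}'" for d in dims)
    sets = ",".join(f"({date_col},{d})" for d in dims)
    return "\n".join([
        f"select {date_col}",
        f",elt(log2(GROUPING__ID - 1),{dim_names}) as dim",
        f",coalesce({','.join(dims)}) as factor",
        f",sum({value_col}) as sum_value",
        f"from {table}",
        f"group by {','.join([date_col] + dims)}",
        f"grouping sets ({sets})",
    ])


if __name__ == "__main__":
    dims = ["dim1", "dim2", "dim3"]
    cols = ["date"] + dims
    for d in dims:
        gid = legacy_grouping_id(cols, {"date", d})
        print(d, gid, elt_index(gid))
    # dim1 3 1
    # dim2 5 2
    # dim3 9 3
    print(build_query(dims))
```

`elt_index` uses `bit_length()` instead of a floating-point logarithm, so the index it returns is exact. The generated SQL keeps the answer's `log2` call unchanged. To extend the query, pass a longer `dims` list to `build_query`.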