Count distinct values of each column in Hive
Given the following table:
--------------------------------------------------------------------------------------
| browser (col1) | os (col2) | device (col3) | ... | city (col650) |
--------------------------------------------------------------------------------------
| Chrome | Android | Samsung | ... | Berlin |
--------------------------------------------------------------------------------------
| Chrome | Android | Samsung | ... | Cologne |
--------------------------------------------------------------------------------------
| Mozilla | Android | Huawei | ... | Munich |
--------------------------------------------------------------------------------------
| Chrome | Android | Sony | ... | Berlin |
--------------------------------------------------------------------------------------
I want to get the number of distinct values in each column:
--------------------------------------------------------------------------------------
| browser (col1) | os (col2) | device (col3) | ... | city (col650) |
--------------------------------------------------------------------------------------
| 2 | 1 | 3 | ... | 3 |
--------------------------------------------------------------------------------------
The table has 650 different columns, so it is not practical to specify every column in the query by hand.
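You have to do this for all 650 columns. row_number() partitioned by a column gives rank 1 to exactly one row per distinct value of that column, so summing the rows whose rank is 1 yields the distinct count.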
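-- Outer query: each rank-1 row stands for one distinct value, so count them.
select
  sum(case when col1Rank=1 then 1 else 0 end) as col1,
  sum(case when col2Rank=1 then 1 else 0 end) as col2,
  sum(case when col3Rank=1 then 1 else 0 end) as col3
from
(
  -- Inner query: number the rows within each group of equal values, per column.
  -- Note: NULLs form their own partition, so NULL is counted as one value here.
  select
    row_number() over(partition by col1 order by col1) as col1Rank,
    row_number() over(partition by col2 order by col2) as col2Rank,
    row_number() over(partition by col3 order by col3) as col3Rank
  from table_name
) A;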
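Since writing 650 pairs of lines by hand is error-prone, one option is to generate the query text from the column list. The following is only a minimal sketch in Python, not part of the original answer: the function name `build_distinct_count_query` is hypothetical, and the column names are assumed to be collected separately (for example from the output of `DESCRIBE table_name`). Column names are wrapped in backticks, which Hive accepts for quoting identifiers.

```python
def build_distinct_count_query(table, columns):
    """Build the rank-based distinct-count query shown above for any column list.

    `table` and `columns` are supplied by the caller; this helper is a
    hypothetical illustration and only produces the query text.
    """
    if not columns:
        raise ValueError("columns must not be empty")

    # Outer select: one sum(...) per column, counting the rank-1 rows.
    outer = ",\n".join(
        f"  sum(case when `{c}_rank` = 1 then 1 else 0 end) as `{c}`"
        for c in columns
    )
    # Inner select: one row_number() per column, partitioned by that column.
    inner = ",\n".join(
        f"    row_number() over (partition by `{c}` order by `{c}`) as `{c}_rank`"
        for c in columns
    )
    return f"select\n{outer}\nfrom (\n  select\n{inner}\n  from {table}\n) A;"


# Example with three of the columns from the question; a real run would pass all 650.
columns = ["browser", "os", "device"]
print(build_distinct_count_query("table_name", columns))
```

The printed text can then be pasted into the Hive CLI or saved to a file and run from there. Whether one query with 650 window functions performs acceptably on a given cluster is a separate question that this sketch does not address.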