Count the distinct values of each column in Hive

Given the following table:

--------------------------------------------------------------------------------------
| browser (col1)  | os (col2)     | device (col3)  |    ...   |     city (col650)    |
--------------------------------------------------------------------------------------
| Chrome          | Android       | Samsung        |    ...   | Berlin               |
--------------------------------------------------------------------------------------
| Chrome          | Android       | Samsung        |    ...   | Cologne              |
--------------------------------------------------------------------------------------
| Mozilla         | Android       | Huawei         |    ...   | Munich               |
--------------------------------------------------------------------------------------
| Chrome          | Android       | Sony           |    ...   | Berlin               |
--------------------------------------------------------------------------------------

I want to get the number of distinct values in each column:

--------------------------------------------------------------------------------------
| browser (col1)  | os (col2)     | device (col3)  |    ...   |     city (col650)    |
--------------------------------------------------------------------------------------
| 2               | 1             | 3              |    ...   | 3                    |
--------------------------------------------------------------------------------------

The table has 650 different columns, so writing out every column in the query by hand is not feasible.

You have to do this for all 650 columns. Sum the rows whose rank is 1:

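    select
        -- each distinct value has exactly one row with rank 1,
        -- so summing those rows counts the distinct values
        sum(case when col1Rank = 1 then 1 else 0 end) as col1,
        sum(case when col2Rank = 1 then 1 else 0 end) as col2,
        sum(case when col3Rank = 1 then 1 else 0 end) as col3
        -- ... repeat for the remaining columns
    from
    (
        select
            -- number the rows within each group of equal values
            row_number() over (partition by col1 order by col1) as col1Rank,
            row_number() over (partition by col2 order by col2) as col2Rank,
            row_number() over (partition by col3 order by col3) as col3Rank
        from table_name
    ) A;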
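
As a quick sanity check, the same rank-based counting can be run against the four sample rows from the question. This is only a sketch under a few assumptions: the CTE name `sample_rows` is hypothetical, only the three visible columns are included, and the Hive version is assumed to support CTEs and `select` without `from` (0.13 or later).

```sql
-- Hypothetical inline copy of the sample rows (first three columns only).
with sample_rows as (
    select 'Chrome'  as browser, 'Android' as os, 'Samsung' as device
    union all
    select 'Chrome'  as browser, 'Android' as os, 'Samsung' as device
    union all
    select 'Mozilla' as browser, 'Android' as os, 'Huawei'  as device
    union all
    select 'Chrome'  as browser, 'Android' as os, 'Sony'    as device
)
select
    -- one rank-1 row per distinct value in each column
    sum(case when browserRank = 1 then 1 else 0 end) as browser,  -- 2
    sum(case when osRank      = 1 then 1 else 0 end) as os,       -- 1
    sum(case when deviceRank  = 1 then 1 else 0 end) as device    -- 3
from
(
    select
        row_number() over (partition by browser order by browser) as browserRank,
        row_number() over (partition by os      order by os)      as osRank,
        row_number() over (partition by device  order by device)  as deviceRank
    from sample_rows
) A;
```

For these rows, `count(distinct browser)`, `count(distinct os)` and `count(distinct device)` should return the same numbers. The two approaches differ only when a column contains NULLs: the rank-based version counts NULL as one value, because the NULL rows form a partition of their own, while `count(distinct ...)` ignores NULLs.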