How to group data by one column and distribute it across another column in Hive SQL?

I have the following data:

CompanyID  Department  No of People  Country
45390      HR          100           UK
45390      Service     250           UK
98712      Service     300           US
39284      Admin       142           Norway
85932      Admin       260           Germany

I want to know how many people in the same department come from each country.

Required output:

Department  No of People  Country
HR          100           UK
Service     250           UK
            300           US
Admin       142           Norway
            260           Germany

I am able to get the data, but the query repeats the department on every row.

""" select Department, Country,count(Department) from dataset
    group by Country,Department
    order by Department """

How can I get the required output?

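The result set you want is not really a relational result set. Why? Because the content of each row depends on what is in the "previous" row, and in a relational database there is no such thing as a "previous" row. This kind of formatting is usually handled at the application layer.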
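As a rough illustration of the application-layer approach, here is a minimal Python sketch. It assumes the aggregated rows have already been fetched as (department, country, num_people) tuples; the `rows` sample and the `blank_repeated_departments` helper are hypothetical names made up for this example, not part of any library.

```python
from itertools import groupby
from operator import itemgetter

# Hypothetical aggregated rows: (department, country, num_people)
rows = [
    ("HR", "UK", 100),
    ("Service", "UK", 250),
    ("Service", "US", 300),
    ("Admin", "Norway", 142),
    ("Admin", "Germany", 260),
]


def blank_repeated_departments(rows):
    """Show each department only on the first row of its group."""
    # Sort by department, then country, so each department's rows are adjacent
    ordered = sorted(rows, key=itemgetter(0, 1))
    display = []
    for department, group in groupby(ordered, key=itemgetter(0)):
        for i, (_, country, num_people) in enumerate(group):
            # Keep the label on the first row of the group, blank it afterwards
            display.append((department if i == 0 else "", country, num_people))
    return display


if __name__ == "__main__":
    for department, country, num_people in blank_repeated_departments(rows):
        print(f"{department:<10}{num_people:<6}{country}")
```

Because the sort fixes the row order before the labels are blanked, "first row of the group" is well defined here, which is exactly what the database does not guarantee on its own.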

That said, SQL can do what you want. You just need to be careful:

select (case when 1 = row_number() over (partition by Department order by Country)
             then Department   -- show the department only on the first row of each group
        end) as Department,
       Country,
       count(*) as num_people  -- counts rows per (Department, Country)
from dataset
group by Country, Department
order by Department, Country;

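Note that the order by needs to match the window function's partition by and order by clauses, so that the row row_number() treats as the first one really is the first row of that department in the result set.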