How to write queries that avoid a single reducer in SELECT DISTINCT and size(collect_set()) Hive queries?

How can I rewrite these queries so they don't use a single reducer in the reduce stage? They take forever, and I lose the benefit of parallelism.

select count(distinct locations) AS unique_locations
  from
  mytable
;

select size(collect_set(locations)) AS unique_locations
  from
  mytable
;

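Splitting it into two queries works for count(distinct var). Hive executes the inner SELECT DISTINCT as a GROUP BY on locations, so that step is spread across many reducers; the outer count(1) still ends on a single reducer, but it only has to count the already-deduplicated rows: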

SELECT
 count(1)
FROM (
 SELECT DISTINCT locations AS unique_locations
 FROM mytable
 ) t;
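A rough Python simulation may help show why the two-query version scales. It is only a sketch under simplifying assumptions, not Hive code: rows are routed to a hypothetical `num_reducers` reducers by hashing the `locations` value (as the shuffle for the inner DISTINCT does), each simulated reducer deduplicates only its own partition, and the final single-reducer step just adds up the per-partition counts. The function names and sample data are made up for illustration.

```python
from collections import defaultdict


def distinct_count_single_reducer(rows):
    """count(DISTINCT loc) with no GROUP BY: every row ends up on one reducer."""
    return len(set(rows))


def distinct_count_two_stage(rows, num_reducers=4):
    """Simulate SELECT count(1) FROM (SELECT DISTINCT loc FROM ...) t.

    num_reducers is a hypothetical stand-in for the reducer count Hive picks.
    """
    # Stage 1 (parallel): route each value to a reducer by hash, so equal
    # values always land on the same reducer, then dedupe per reducer.
    partitions = defaultdict(set)
    for loc in rows:
        partitions[hash(loc) % num_reducers].add(loc)

    # Partitions are disjoint, so no value is counted twice.
    per_reducer_counts = [len(values) for values in partitions.values()]

    # Stage 2 (single reducer): sums the per-partition distinct counts.
    return sum(per_reducer_counts)


if __name__ == "__main__":
    rows = ["NY", "LA", "NY", "SF", "LA", "NY"]  # hypothetical sample data
    print(distinct_count_two_stage(rows))  # 3
    print(distinct_count_single_reducer(rows))  # 3
```

Because equal values always hash to the same partition, the per-partition sets never overlap, so summing their sizes gives the same result as one global set; the difference is that no single worker has to see every row.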

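I think the same approach works for size(collect_set()) too, provided the deduplication stays in the inner query. A collect_set with no GROUP BY is still merged on a single reducer, so it should only receive the already-distinct values:

SELECT
  size(collect_set(unique_locations))
FROM (
 SELECT DISTINCT locations AS unique_locations
 FROM mytable
 ) t;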
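The same kind of simulation can be applied to size(collect_set()), assuming the variant where SELECT DISTINCT runs in the inner query and collect_set runs in the outer one. Both hypothetical functions return the set size together with the number of rows that reach the final (single) reducer. The names, the `num_reducers` parameter and the sample data are illustrative only, and map-side partial aggregation is ignored.

```python
from collections import defaultdict


def size_collect_set_single_stage(rows):
    """size(collect_set(loc)) with no GROUP BY: one reducer sees every row.

    Returns (set size, rows shipped to the final reducer).
    """
    return len(set(rows)), len(rows)


def size_collect_set_two_stage(rows, num_reducers=4):
    """Simulate SELECT size(collect_set(loc)) FROM (SELECT DISTINCT loc ...) t.

    Returns (set size, rows shipped to the final reducer).
    """
    # Stage 1 (parallel): hash-partition on the value and dedupe per reducer.
    partitions = defaultdict(set)
    for loc in rows:
        partitions[hash(loc) % num_reducers].add(loc)

    # Stage 2 (single reducer): collect_set only receives the distinct values.
    rows_at_final_reducer = [loc for values in partitions.values() for loc in values]
    return len(set(rows_at_final_reducer)), len(rows_at_final_reducer)


if __name__ == "__main__":
    rows = ["NY", "LA", "NY", "SF", "LA", "NY"]  # hypothetical sample data
    print(size_collect_set_single_stage(rows))  # (3, 6)
    print(size_collect_set_two_stage(rows))  # (3, 3)
```

Both variants produce the same size, but in the two-stage version the final reducer handles one row per distinct location instead of every input row.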