How to write a query that avoids a single reducer in count(distinct) and size(collect_set) Hive queries?
How can I rewrite these queries so that the reduce stage does not run on a single reducer? It takes forever, and I lose the benefit of parallelism.
select id
, count(distinct locations) AS unique_locations
from
mytable
;
and
select id
, size(collect_set(locations)) AS unique_locations
from
mytable
;
Splitting it into two queries (an inner DISTINCT and an outer count) works for count(distinct var):
SELECT
count(1)
FROM (
SELECT DISTINCT locations as unique_locations
from my_table
) t;
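One caveat, offered as a sketch rather than a definitive fix: count(distinct locations) skips NULLs, while SELECT DISTINCT keeps NULL as a row of its own, so the rewrite can be off by one when the column contains NULLs. Filtering them in the inner query keeps the two results aligned. The second query below assumes a count per id is what is wanted (a guess based on the id column in the question); it uses the same two-stage pattern, grouped on both columns.

```sql
-- Global count of unique locations.
-- The inner GROUP BY deduplicates and can be spread over many reducers;
-- the outer count(*) only has to count the already-unique rows.
SELECT count(*) AS unique_locations
FROM (
  SELECT locations
  FROM my_table
  WHERE locations IS NOT NULL   -- match count(distinct), which skips NULLs
  GROUP BY locations
) t;

-- Per-id variant (assumes my_table has an id column, as in the question).
-- Note: an id whose locations are all NULL drops out of this result,
-- whereas count(distinct locations) ... GROUP BY id would report 0 for it.
SELECT id, count(*) AS unique_locations
FROM (
  SELECT id, locations
  FROM my_table
  WHERE locations IS NOT NULL
  GROUP BY id, locations
) t
GROUP BY id;
```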
The same goes for size(collect_set(...)), I think:
SELECT
size(unique_locations)
FROM (
SELECT collect_set(locations) as unique_locations
from my_table
) t;
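In the query above the inner collect_set is still a global aggregation with no GROUP BY, so it may well land on a single reducer again. A possible alternative, assuming only the size of the set is needed: deduplicate in the inner query first, so that the final stage only sees unique values.

```sql
-- Deduplicate in parallel first, then build the set from the unique values.
-- collect_set skips NULLs, so no extra NULL filter is needed here.
SELECT size(collect_set(locations)) AS unique_locations
FROM (
  SELECT locations
  FROM my_table
  GROUP BY locations
) t;
```

The outer stage is still a single global aggregation, but its input shrinks from every row of the table to one row per distinct value.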