Reducers take mapper cores

hadoop
mapreduce
hadoop-yarn

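I'm running a MapReduce job with 60 reducers on a Hadoop cluster that has 88 cores. For some reason it uses only 79 of the cluster's cores. At the start it runs 79 mappers, but once half of the splits are done it runs 53 mappers and 26 reducers. The number of running mappers keeps decreasing after that, which increases the job completion time. The log says that these 26 reducers are copying computed data. Is it possible to make Hadoop run all mappers first and the reducers only after that? Spark and Tez jobs work this way: they use all cores for mapping and then all cores for reducing.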

Set mapreduce.job.reduce.slowstart.completedmaps to 1.0. Quoting from mapred-default.xml:

Name: mapreduce.job.reduce.slowstart.completedmaps

Default value: 0.05

Description: Fraction of the number of maps in the job which should be complete before reduces are scheduled for the job.
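
For illustration, here is a minimal sketch of what that override might look like, assuming you set it in mapred-site.xml (placing it there is my assumption; any place where the job configuration is read should work). With a value of 1.0, reduces are not scheduled until all maps have completed, so the copy phase no longer overlaps with the map phase.

```xml
<!-- Sketch only: override the 0.05 default so that reducers are
     scheduled only after 100% of the map tasks have completed. -->
<property>
  <name>mapreduce.job.reduce.slowstart.completedmaps</name>
  <value>1.0</value>
</property>
```

If your driver goes through ToolRunner (an assumption about your job), the same property can usually be passed per job on the command line as -D mapreduce.job.reduce.slowstart.completedmaps=1.0 instead of editing the file.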
