Will there be a shuffle and sort in a map-only task?
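Does the shuffle and sort phase happen before the map task ends, or after the map task has produced its output, so that the map task is never revisited? It is the 'map-only task' case that confuses me.
If there is no shuffle and sort in a map-only task, can someone explain how the data is written to the final output files?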
When you have a map-only task, there is no shuffling at all, which means the mappers write their final output directly to HDFS.
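To make the map-only case concrete, here is a minimal pure-Python simulation (not Hadoop code). It assumes one mapper per input split and mimics Hadoop's `part-m-NNNNN` naming for mapper output files; `run_map_only` and `word_mapper` are hypothetical names used only for illustration. In a real job, this behaviour corresponds to setting the number of reduce tasks to zero, for example with `job.setNumReduceTasks(0)`.

```python
from typing import Callable, Dict, List, Tuple

Record = Tuple[str, int]


def run_map_only(
    splits: List[List[str]],
    mapper: Callable[[str], List[Record]],
) -> Dict[str, List[Record]]:
    """Simulate a map-only job: each mapper writes its own output file.

    There is no partitioning, no shuffle and no sort, so records stay
    in the order the mapper emitted them.
    """
    outputs: Dict[str, List[Record]] = {}
    for index, split in enumerate(splits):
        # One output "file" per mapper, named like Hadoop's part-m-00000.
        name = f"part-m-{index:05d}"
        records: List[Record] = []
        for line in split:
            records.extend(mapper(line))
        outputs[name] = records
    return outputs


def word_mapper(line: str) -> List[Record]:
    """Hypothetical mapper: emit (word, 1) for every word in the line."""
    return [(word, 1) for word in line.split()]


# Two input splits, so two mappers and two output files.
splits = [["b a", "a"], ["c b"]]
result = run_map_only(splits, word_mapper)

for name, records in result.items():
    print(name, records)
```

Note that the key `b` ends up in both output files and the keys are not sorted, which is what you would expect when nothing groups records by key.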
On the other hand, when you have a full MapReduce program with both mappers and reducers, then yes, shuffling can start before the reduce phase starts.
Quoting this very nice answer on SO:
First of all shuffling is the process of transferring data from the
mappers to the reducers, so I think it is obvious that it is necessary
for the reducers, since otherwise, they wouldn't be able to have any
input (or input from every mapper). Shuffling can start even before
the map phase has finished, to save some time. That's why you can see
a reduce status greater than 0% (but less than 33%) when the map
status is not yet 100%.
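The 33% figure in the quote fits the common description of reduce progress as the average of three phases: copy (the shuffle), sort and reduce. The sketch below assumes that equal three-way weighting; `copy_progress` and `reduce_progress` are hypothetical helper names, not Hadoop APIs.

```python
def copy_progress(fetched_map_outputs: int, total_maps: int) -> float:
    """Fraction of map outputs a reducer has already fetched."""
    if total_maps <= 0:
        raise ValueError("total_maps must be positive")
    if not 0 <= fetched_map_outputs <= total_maps:
        raise ValueError("fetched_map_outputs out of range")
    return fetched_map_outputs / total_maps


def reduce_progress(copy: float, sort: float, reduce: float) -> float:
    """Overall reduce progress, assuming three equally weighted phases."""
    for value in (copy, sort, reduce):
        if not 0.0 <= value <= 1.0:
            raise ValueError("phase progress must be between 0 and 1")
    return (copy + sort + reduce) / 3.0


# Assumed scenario: 10 map tasks, 6 finished, and the reducer has
# already fetched the output of 4 of them while the others still run.
total_maps = 10
finished_maps = 6
fetched = 4

map_status = finished_maps / total_maps
reduce_status = reduce_progress(copy_progress(fetched, total_maps), 0.0, 0.0)

print(f"map {map_status:.0%}, reduce {reduce_status:.1%}")
```

Under this assumption the reduce status cannot exceed one third until every map output has been copied, which in turn requires the map phase to reach 100%.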
I hope this answer clears up your confusion.