Sorting in MapReduce Hadoop

I have a few basic questions about Hadoop MapReduce.

Is there any place in MapReduce where sorting is guaranteed?

1. Assume 100 mappers were executed with zero reducers. Will this generate 100 files?

Yes.

Is each individual file sorted?

No. If no reducer is used, the mapper output is not sorted. Sorting only happens when there is a reduce phase.

Is the output sorted across all mappers?

No, for the same reason as above.

2. The input to a reducer is Key -> Values. For each key, are all the values sorted?

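No. However, the keys are sorted. After the shuffle phase, the reducer has fetched the mappers' outputs, and it merge-sorts them by key (each mapper's output is already sorted by key, because there is a reduce phase). By the time it starts reducing, the keys are sorted.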
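To make the merge step concrete, here is a minimal Python simulation (not Hadoop code). The mapper outputs and the helper name merge_for_reducer are made up for illustration, and the merge compares keys only, which is an assumption standing in for the default behavior described above.

```python
import heapq
from itertools import groupby
from operator import itemgetter

# Hypothetical (key, value) output of two mappers. Each run is already
# sorted by key; values within a key are in emit order, not sorted.
mapper_outputs = [
    [("apple", 9), ("apple", 2), ("cherry", 5)],
    [("apple", 7), ("banana", 1), ("cherry", 3)],
]

def merge_for_reducer(sorted_runs):
    """Merge key-sorted runs by key only, then group the values per key."""
    merged = heapq.merge(*sorted_runs, key=itemgetter(0))
    return [
        (key, [value for _, value in group])
        for key, group in groupby(merged, key=itemgetter(0))
    ]

reducer_input = merge_for_reducer(mapper_outputs)
for key, values in reducer_input:
    print(key, values)
```

In this sketch the keys come out as apple, banana, cherry, while the values for apple arrive as [9, 2, 7]. The exact value order here is an artifact of the simulation; in a real job you should not rely on any particular order of values within a key.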

3. Assume 50 reducers were executed. Will this generate 50 files?

Yes. (Unless you use MultipleOutputs.)

Is each individual file sorted?

No. Sorted input does not guarantee sorted output. The output depends on the algorithm you use in the reduce method.

Is the output sorted across all reducers?

No, for the same reason as above. However, if you use an identity reducer, i.e., you simply write out the reducer's input as you receive it, the output will be sorted PER REDUCER, not globally.
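The per-reducer versus global distinction can be sketched with a small Python simulation (again, not Hadoop code). The function names, the integer keys, and the modulo-based partitioning are assumptions chosen to mimic a hash-style partitioner.

```python
def hash_partition(key, num_reducers):
    # Stand-in for a hash-style partitioner; for non-negative ints
    # this simply spreads keys round-robin by value.
    return key % num_reducers

def run_identity_job(records, num_reducers):
    """Partition records, sort each partition by key, write it out as is."""
    partitions = [[] for _ in range(num_reducers)]
    for key, value in records:
        partitions[hash_partition(key, num_reducers)].append((key, value))
    # The framework sorts each reducer's input by key; an identity
    # reducer then emits the records unchanged.
    return [sorted(part, key=lambda kv: kv[0]) for part in partitions]

records = [(k, f"v{k}") for k in (7, 2, 9, 4, 1, 8, 3, 6, 0, 5)]
part_files = run_identity_job(records, num_reducers=2)

for i, part in enumerate(part_files):
    print(f"part-{i}:", [k for k, _ in part])
```

Each simulated part file is sorted on its own (even keys in one, odd keys in the other), but reading the files one after another does not give a globally sorted sequence.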

Is there any place where guaranteed sorting happens in MapReduce?

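Sorting happens in the reduce phase, and it is applied to each mapper's output keys and each reducer's input keys. If you want the reducer input to be sorted globally, you can use a single reducer, or a TotalOrderPartitioner, which is a bit tricky...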
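As a rough illustration of the idea behind total-order partitioning, the Python sketch below (not Hadoop code) sends contiguous key ranges to each reducer. The split points and function names are hypothetical; in a real job the split points would have to be derived from the input, for example by sampling, and that is the tricky part.

```python
import bisect

def range_partition(key, split_points):
    # Reducer i receives the keys that fall between split_points[i-1]
    # (inclusive) and split_points[i] (exclusive).
    return bisect.bisect_right(split_points, key)

def run_total_order_job(records, split_points):
    """Partition by key range, then sort each partition by key."""
    partitions = [[] for _ in range(len(split_points) + 1)]
    for key, value in records:
        partitions[range_partition(key, split_points)].append((key, value))
    return [sorted(part, key=lambda kv: kv[0]) for part in partitions]

records = [(k, f"v{k}") for k in (7, 2, 9, 4, 1, 8, 3, 6, 0, 5)]
split_points = [3, 7]  # hypothetical boundaries for three reducers
part_files = run_total_order_job(records, split_points)

all_keys = [k for part in part_files for k, _ in part]
print(all_keys)
```

Because every key in one partition is smaller than every key in the next, concatenating the part files in partition order yields a globally sorted result. With a single reducer the same holds trivially, at the cost of parallelism.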