How to put the reduce partitions onto designated machines in a Hadoop cluster?

例如:

Reduce output: part-00000, part-00001 ... part-00008. The cluster has 3 data nodes, and I want to control which node each partition ends up on.

How can I do this?

It doesn't work that way. A file in HDFS is not stored on any particular data node. Every file is made up of blocks, and each block is replicated to multiple nodes (3 by default). So each file is effectively spread across different nodes, because the blocks that compose it are stored on different nodes.

Quoting the official documentation, which I suggest you read:

HDFS exposes a file system namespace and allows user data to be stored in files. Internally, a file is split into one or more blocks and these blocks are stored in a set of DataNodes. The NameNode executes file system namespace operations like opening, closing, and renaming files and directories. It also determines the mapping of blocks to DataNodes. The DataNodes are responsible for serving read and write requests from the file system’s clients. The DataNodes also perform block creation, deletion, and replication upon instruction from the NameNode.

Since your question is tagged with partitioning, it may be worth clarifying that the Partitioner determines the partition (not the data node) that each key will end up in. For example, knowing that you have 9 reduce tasks (9 partitions), you might want to spread the workload evenly across them. To do that you could specify, for instance, that keys starting with the letter "s" should be sent to partition 0, keys starting with "a" or "b" to partition 1, and so on (just a silly example to illustrate what a Partitioner does).
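The routing rule above can be sketched in plain Java. This is only a sketch of the decision logic; in a real MapReduce job you would extend `org.apache.hadoop.mapreduce.Partitioner<KEY, VALUE>`, override its `getPartition(key, value, numPartitions)` method, and register the class with `job.setPartitionerClass(...)`. The class name and fallback hashing scheme below are my own illustrative choices, not part of any Hadoop API.

```java
public class KeyPartitionSketch {

    // Hypothetical routing rule from the example above: keys starting with
    // "s" go to partition 0, keys starting with "a" or "b" go to partition 1,
    // and everything else is hashed across the remaining partitions
    // (mirroring the formula Hadoop's default HashPartitioner uses).
    static int getPartition(String key, int numPartitions) {
        if (key.startsWith("s")) {
            return 0;
        }
        if (key.startsWith("a") || key.startsWith("b")) {
            return 1;
        }
        // Non-negative hash, mapped onto partitions 2 .. numPartitions-1.
        return 2 + (key.hashCode() & Integer.MAX_VALUE) % (numPartitions - 2);
    }

    public static void main(String[] args) {
        int numPartitions = 9; // 9 reduce tasks => part-00000 .. part-00008
        System.out.println(getPartition("sun", numPartitions));   // 0
        System.out.println(getPartition("apple", numPartitions)); // 1
        System.out.println(getPartition("zebra", numPartitions)); // some partition in 2..8
    }
}
```

Note that this still only decides which reducer (and hence which `part-0000N` file) a key goes to; where that file's blocks are physically stored is decided by HDFS, as explained above.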