Hadoop: Is it possible to avoid replication for certain files?
As far as I know, every file in HDFS is replicated. However, our jobs produce some log files that we would rather not replicate, since keeping extra copies of them is unnecessary. Is it possible to disable replication for just the log files?
You can set the replication factor with the -setrep flag of the hadoop fs shell command.
Usage: hadoop fs -setrep [-R] [-w] <numReplicas> <path>
Changes the replication factor of a file. If path is a directory then the command recursively changes the replication factor of all files under the directory tree rooted at path.
Options:
The -w flag requests that the command wait for the replication to complete. This can potentially take a very long time.
The -R flag is accepted for backwards compatibility. It has no effect.
Example:
hadoop fs -setrep -w 3 /user/hadoop/dir1
To avoid replication, set numReplicas to 1, which leaves a single copy of each file on the cluster.
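For example, assuming the job logs live under a hypothetical directory /user/hadoop/logs (the path is illustrative), the following drops every file in that tree to a single replica:

hadoop fs -setrep 1 /user/hadoop/logs

Note that -setrep only changes files that already exist; files written afterwards still get the cluster default (dfs.replication). If the logs are copied in with the shell, the factor can also be set at write time via the generic -D option, e.g.:

hadoop fs -D dfs.replication=1 -put app.log /user/hadoop/logs

Again, app.log and the target path are placeholders for whatever your job actually writes.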