Hadoop: Is it possible to avoid replication for certain files?
As far as I know, every file in HDFS is replicated. However, our jobs produce some log files that we would rather not replicate, since keeping extra copies of them is unnecessary. Is it possible to disable replication for just the log files?
You can set the replication factor with the -setrep flag of the hadoop fs shell command.
Usage: hadoop fs -setrep [-R] [-w] <numReplicas> <path>
Changes the replication factor of a file. If path is a directory then the command recursively changes the replication factor of all files under the directory tree rooted at path.
Options:
The -w flag requests that the command wait for the replication to complete. This can potentially take a very long time.
The -R flag is accepted for backwards compatibility. It has no effect.
Example:
hadoop fs -setrep -w 3 /user/hadoop/dir1
To avoid replication, set numReplicas to 1, which leaves a single copy of each file on the cluster.
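For example, assuming the job logs live under a hypothetical directory /user/hadoop/logs (the path is illustrative), the following drops every file in that tree to a single replica:

hadoop fs -setrep 1 /user/hadoop/logs

Note that -setrep only changes files that already exist; files written afterwards still get the cluster default (dfs.replication). If the logs are copied in with the shell, the factor can also be set at write time via the generic -D option, e.g.:

hadoop fs -D dfs.replication=1 -put app.log /user/hadoop/logs

Again, app.log and the target path are placeholders for whatever your job actually writes.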