Getting error when copying file from s3:// to local (hadoop) file system
I am trying to copy a file from s3 to the hadoop file system using python. I am getting the following error:
cp: `foo/ds=2015-02-13/ip-d1b-request-2015-02-13_10-00_10-09.txt.gz': No such file or directory
I recently migrated to the latest hadoop version (2.4.0). The same command works fine in version 0.20. Why am I getting this error in version 2.4.0?
Hadoop version 0.20
hadoop@ip-10-76-38-167:~$ /home/hadoop/bin/hadoop fs -cp s3://test.com/foo/ds=2015-02-13/ip-d1b-request-2015-02-13_10-00_10-09.txt.gz /foo/ds=2015-02-13/ip-d1b-request-2015-02-13_10-00_10-09.txt.gz
15/02/13 11:21:45 INFO s3native.NativeS3FileSystem: Opening 's3://test.com/foo/ds=2015-02-13/ip-d1b-request-2015-02-13_10-00_10-09.txt.gz' for reading
Hadoop version 2.4.0
[hadoop@ip-10-169-19-123 ~]$ /home/hadoop/bin/hadoop fs -cp s3://test.com/foo/ds=2015-02-13/ip-d1b-request-2015-02-13_10-00_10-09.txt.gz /foo/ds=2015-02-13/ip-d1b-request-2015-02-13_10-00_10-09.txt.gz
15/02/13 11:21:37 INFO guice.EmrFSBaseModule: Consistency disabled, using com.amazon.ws.emr.hadoop.fs.s3n.S3NativeFileSystem as FileSystem implementation.
15/02/13 11:21:38 INFO fs.EmrFileSystem: Using com.amazon.ws.emr.hadoop.fs.s3n.S3NativeFileSystem as filesystem implementation
cp: `foo/ds=2015-02-13/ip-d1b-request-2015-02-13_10-00_10-09.txt.gz': No such file or directory
You need to try it like this, adding quotes (`"`) around both paths:
/home/hadoop/bin/hadoop fs -cp "s3://test.com/foo/ds=2015-02-13/ip-d1b-request-2015-02-13_10-00_10-09.txt.gz" "/foo/ds=2015-02-13/ip-d1b-request-2015-02-13_10-00_10-09.txt.gz"
I found the answer myself.
Use `distcp` instead of `fs -cp`.
There is nothing wrong with the command itself.
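For reference, the equivalent `distcp` invocation, built from the same bucket and paths as in the question, would look like this (`distcp` runs as a MapReduce job, so it is typically meant for bulk copies rather than a single file):

/home/hadoop/bin/hadoop distcp s3://test.com/foo/ds=2015-02-13/ip-d1b-request-2015-02-13_10-00_10-09.txt.gz /foo/ds=2015-02-13/ip-d1b-request-2015-02-13_10-00_10-09.txt.gz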