AWS文件上传

Question

我想将几个文件从 hadoop 上传到 AWS 存储桶。我有 AWS 访问密钥、秘密密钥和 S3 导入路径。

我无法通过 AWS CLI 命令访问。我在 aws 凭证文件中设置了密钥。我试着做“aws s3 ls” 我收到错误

An error occurred (InvalidToken) when calling the ListBuckets operation: The provided token is malformed or otherwise invalid.

由于上面的代码不起作用，我尝试使用如下的 distcp 命令。

hadoop distcp -Dmapreduce.job.queuename=root.mr.sbg.sla -Dfs.s3a.proxy.host=qypprdproxy02.ie.xxx.net  -Dfs.s3a.proxy.port=80  -Dfs.s3a.endpoint=s3.us-west-2.amazonaws.com -Dfs.s3a.aws.credentials.provider="org.apache.hadoop.fs.s3a.TemporaryAWSCredentialsProvider" -Dfs.s3a.access.key="AXXXXXXXXXXQ" -Dfs.s3a.secret.key="4I9nXXXXXXXXXXXXHA" -Dfs.s3a.session.token="FQoDYXdzECkaDNBtHNfS5sKxXqNdMyKeAuqLbVXG72KvcPmUtnpLGbM7UE59zjvNNo0u8mWlslCEvZcZLxXw1agAInzGH8vnGleqxjzuBBgXMXXXXXXXG0zpHA8eyrwCZqUBXSg9cdqevv1sFT8lUIEi5uTGLjHXgkQoBXXXXXXXXXXXXXXt80Rp4vb3P7k5N2AVZmuVvM/SEH/qMLiFabDbVliGXqw7MHXTXXXXXXXXXXXXXXXtW8JvmOFPR3nGdQ4VKzw0deSbNmL/BCivfh9pf7ubm5RFRSLxqcdoT7XAXIWf1jJguEGygcBkFRh2Ztvr8OYcG78hLEJX61ssbKWXokOKTBMnUxx4b0jIG1isXerDaO6RRVJdBrTXn2Somzigo4ZbL0wU=" TXXXX/Data/LiXXXXL/HS/ABC/part-1517397360173-r-00000 s3a://data-import-dev/1012018.csv

对于上面的命令，我也遇到了以下错误。

18/11/09 00:55:40 INFO http.AmazonHttpClient: Configuring Proxy. Proxy Host: qypprdproxy02.ie.XXXX.net Proxy Port: 80 18/11/09 00:55:40 WARN s3a.S3AFileSystem: Client: Amazon S3 error 400: 400 Bad Request; Bad Request (retryable)

com.cloudera.com.amazonaws.services.s3.model.AmazonS3Exception: Bad Request (Service: Amazon S3; Status Code: 400; Error Code: 400 Bad Request; Request ID: 121931CAB75C3BB0), S3 Extended Request ID: jn/iTngZS83+A5U8e2gjQsyArDC68E+r0q/Sll0gkSCn0h5yDaG17TEb9HNSx7o590hmofguJIg= at com.cloudera.com.amazonaws.http.AmazonHttpClient.handleErrorResponse(AmazonHttpClient.java:1182) at com.cloudera.com.amazonaws.http.AmazonHttpClient.executeOneRequest(AmazonHttpClient.java:770) at com.cloudera.com.amazonaws.http.AmazonHttpClient.executeHelper(AmazonHttpClient.java:489) at com.cloudera.com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:310) at com.cloudera.com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:3785) at com.cloudera.com.amazonaws.services.s3.AmazonS3Client.headBucket(AmazonS3Client.java:1107) at com.cloudera.com.amazonaws.services.s3.AmazonS3Client.doesBucketExist(AmazonS3Client.java:1070) at org.apache.hadoop.fs.s3a.S3AFileSystem.verifyBucketExists(S3AFileSystem.java:312) at org.apache.hadoop.fs.s3a.S3AFileSystem.initialize(S3AFileSystem.java:260) at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2815) at org.apache.hadoop.fs.FileSystem.access0(FileSystem.java:98) at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2852) at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2834) at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:387) at org.apache.hadoop.fs.Path.getFileSystem(Path.java:296) at org.apache.hadoop.tools.DistCp.setTargetPathExists(DistCp.java:205) at org.apache.hadoop.tools.DistCp.run(DistCp.java:131) at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70) at org.apache.hadoop.tools.DistCp.main(DistCp.java:441) 18/11/09 00:55:40 ERROR tools.DistCp: Invalid arguments: org.apache.hadoop.fs.s3a.AWSS3IOException: doesBucketExist on segmentor-data-import-dev: com.cloudera.com.amazonaws.services.s3.model.AmazonS3Exception: Bad Request (Service: Amazon S3; Status Code: 400; Error Code: 400 Bad Request; Request ID: 121931CAB75C3BB0), S3 Extended Request ID: jn/iTngZS83+A5U8e2gjQsyArDC68E+r0q/Sll0gkSCn0h5yDaG17TEb9HNSx7o590hmofguJIg=: Bad Request (Service: Amazon S3; Status Code: 400; Error Code: 400 Bad Request; Request ID: 121931CAB75C3BB0) at org.apache.hadoop.fs.s3a.S3AUtils.translateException(S3AUtils.java:178) at org.apache.hadoop.fs.s3a.S3AFileSystem.verifyBucketExists(S3AFileSystem.java:318) at org.apache.hadoop.fs.s3a.S3AFileSystem.initialize(S3AFileSystem.java:260) at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2815) at org.apache.hadoop.fs.FileSystem.access0(FileSystem.java:98) at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2852) at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2834) at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:387) at org.apache.hadoop.fs.Path.getFileSystem(Path.java:296) at org.apache.hadoop.tools.DistCp.setTargetPathExists(DistCp.java:205) at org.apache.hadoop.tools.DistCp.run(DistCp.java:131) at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70) at org.apache.hadoop.tools.DistCp.main(DistCp.java:441) Caused by: com.cloudera.com.amazonaws.services.s3.model.AmazonS3Exception: Bad Request (Service: Amazon S3; Status Code: 400; Error Code: 400 Bad Request; Request ID: 121931CAB75C3BB0), S3 Extended Request ID: jn/iTngZS83+A5U8e2gjQsyArDC68E+r0q/Sll0gkSCn0h5yDaG17TEb9HNSx7o590hmofguJIg= at com.cloudera.com.amazonaws.http.AmazonHttpClient.handleErrorResponse(AmazonHttpClient.java:1182) at com.cloudera.com.amazonaws.http.AmazonHttpClient.executeOneRequest(AmazonHttpClient.java:770) at com.cloudera.com.amazonaws.http.AmazonHttpClient.executeHelper(AmazonHttpClient.java:489) at com.cloudera.com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:310) at com.cloudera.com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:3785) at com.cloudera.com.amazonaws.services.s3.AmazonS3Client.headBucket(AmazonS3Client.java:1107) at com.cloudera.com.amazonaws.services.s3.AmazonS3Client.doesBucketExist(AmazonS3Client.java:1070) at org.apache.hadoop.fs.s3a.S3AFileSystem.verifyBucketExists(S3AFileSystem.java:312) ... 11 more Invalid arguments: doesBucketExist on segmentor-data-import-dev: com.cloudera.com.amazonaws.services.s3.model.AmazonS3Exception: Bad Request (Service: Amazon S3; Status Code: 400; Error Code: 400 Bad Request; Request ID: 121931CAB75C3BB0), S3 Extended Request ID: jn/iTngZS83+A5U8e2gjQsyArDC68E+r0q/Sll0gkSCn0h5yDaG17TEb9HNSx7o590hmofguJIg=: Bad Request (Service: Amazon S3; Status Code: 400; Error Code: 400 Bad Request; Request ID: 121931CAB75C3BB0) usage: distcp OPTIONS [source_path...] OPTIONS -append Reuse existing data in target files and append new data to them if possible -async Should distcp execution be blocking -atomic Commit all changes or none -bandwidth Specify bandwidth per map in MB -delete
Delete from target, files missing in source -diff
Use snapshot diff report to identify the difference between source and target -f List of files that need to be copied -filelimit (Deprecated!) Limit number of files copied to <= n -filters The path to a file containing a list of strings for paths to be excluded from the copy. -i Ignore failures during copy -log Folder on DFS where distcp execution logs are saved -m Max number of concurrent maps to use for copy -mapredSslConf Configuration for ssl config file, to use with hftps://. Must be in the classpath. -numListstatusThreads Number of threads to use for building file listing (max 40). -overwrite Choose to overwrite target files unconditionally, even if they exist. -p preserve status (rbugpcaxt)(replication, block-size, user, group, permission, checksum-type, ACL, XATTR, timestamps). If -p is specified with no , then preserves replication, block size, user, group, permission, checksum type and timestamps. raw.* xattrs are preserved when both the source and destination paths are in the /.reserved/raw hierarchy (HDFS only). raw.* xattrpreservation is independent of the -p flag. Refer to the DistCp documentation for more details. -rdiff Use target snapshot diff report to identify changes made on target -sizelimit (Deprecated!) Limit number of files copied to <= n bytes -skipcrccheck Whether to skip CRC checks between source and target paths. -strategy Copy strategy to use. Default is dividing work based on file sizes -tmp Intermediate work path to be used for atomic commit -update Update target, copying only missingfiles or directories

请让我知道如何实现这一目标。

Answer 1

我遇到了同样的问题。当手动修改 ~.aws 中的文件而不是通过 "aws configure" 命令修改文件时，可能会出现此问题。

您是否尝试过：

删除 "config" 和 "credentials" 文件（位于 ~.aws）
运行 "aws configure" 命令（重新创建您在 #1 中删除的文件）

这已经解决了我的问题。

这主要是因为我使用其他工具也修改了这些文件。

希望对您有所帮助。

AWS文件上传

AWS file upload

hadoop

amazon-s3

amazon-web-services

distcp

s3distcp