AWS S3 access issue when using qubole/streamx on AWS EMR
I am using qubole/streamx as a Kafka sink connector to consume data from Kafka and store it in AWS S3.
I created a user in IAM with the AmazonS3FullAccess policy, then set the access key ID and secret access key in hdfs-site.xml, which sits in the directory assigned in quickstart-s3.properties.
The configuration is as follows:
quickstart-s3.properties:
name=s3-sink
connector.class=com.qubole.streamx.s3.S3SinkConnector
format.class=com.qubole.streamx.SourceFormat
tasks.max=1
topics=test
flush.size=3
s3.url=s3://myemrbuckettest/raw_data
hadoop.conf.dir=/data/java/streamx/resource/hadoop-conf
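For context, the worker configuration passed as the first argument to connect-standalone is not shown above; a minimal connect-standalone.properties sketch for this kind of setup (the broker address and converter choices here are assumptions, not the exact file used) would be:
# hypothetical minimal standalone worker config
bootstrap.servers=localhost:9092
key.converter=org.apache.kafka.connect.json.JsonConverter
value.converter=org.apache.kafka.connect.json.JsonConverter
key.converter.schemas.enable=false
value.converter.schemas.enable=false
offset.storage.file.filename=/tmp/connect.offsets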
hdfs-site.xml:
<property>
<name>fs.s3.awsAccessKeyId</name>
<value>BALABALABALA</value>
</property>
<property>
<name>fs.s3.awsSecretAccessKey</name>
<value>balabalabala</value>
</property>
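Since the stack trace below shows the path resolving to the s3n:// scheme, it may be worth noting that stock Hadoop's NativeS3FileSystem reads its credentials from the fs.s3n.* keys rather than fs.s3.*; an equivalent snippet under that assumption would be:
<property>
<name>fs.s3n.awsAccessKeyId</name>
<value>BALABALABALA</value>
</property>
<property>
<name>fs.s3n.awsSecretAccessKey</name>
<value>balabalabala</value>
</property>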
I get the following error when starting the sink connector with connect-standalone ./connect-standalone.properties ./quickstart-s3.properties:
[2017-02-14 18:30:32,943] INFO GHFS version: 1.6.0-hadoop2 (com.google.cloud.hadoop.fs.gcs.GoogleHadoopFileSystemBase:597)
[2017-02-14 18:30:36,406] INFO Couldn't start HdfsSinkConnector: (io.confluent.connect.hdfs.HdfsSinkTask:85)
org.apache.kafka.connect.errors.ConnectException: org.apache.hadoop.security.AccessControlException: Permission denied: s3n://myemrbuckettest/raw_data
at io.confluent.connect.hdfs.DataWriter.<init>(DataWriter.java:205)
at io.confluent.connect.hdfs.HdfsSinkTask.start(HdfsSinkTask.java:77)
at org.apache.kafka.connect.runtime.WorkerSinkTask.initializeAndStart(WorkerSinkTask.java:221)
at org.apache.kafka.connect.runtime.WorkerSinkTask.execute(WorkerSinkTask.java:140)
at org.apache.kafka.connect.runtime.WorkerTask.doRun(WorkerTask.java:140)
at org.apache.kafka.connect.runtime.WorkerTask.run(WorkerTask.java:175)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Caused by: org.apache.hadoop.security.AccessControlException: Permission denied: s3n://myemrbuckettest/raw_data
at org.apache.hadoop.fs.s3native.Jets3tNativeFileSystemStore.processException(Jets3tNativeFileSystemStore.java:449)
at org.apache.hadoop.fs.s3native.Jets3tNativeFileSystemStore.processException(Jets3tNativeFileSystemStore.java:427)
at org.apache.hadoop.fs.s3native.Jets3tNativeFileSystemStore.handleException(Jets3tNativeFileSystemStore.java:411)
at org.apache.hadoop.fs.s3native.Jets3tNativeFileSystemStore.retrieveMetadata(Jets3tNativeFileSystemStore.java:181)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:191)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
at org.apache.hadoop.fs.s3native.$Proxy42.retrieveMetadata(Unknown Source)
at org.apache.hadoop.fs.s3native.NativeS3FileSystem.getFileStatus(NativeS3FileSystem.java:476)
at org.apache.hadoop.fs.s3native.NativeS3FileSystem.mkdir(NativeS3FileSystem.java:601)
at org.apache.hadoop.fs.s3native.NativeS3FileSystem.mkdirs(NativeS3FileSystem.java:594)
at org.apache.hadoop.fs.FileSystem.mkdirs(FileSystem.java:1877)
at com.qubole.streamx.s3.S3Storage.mkdirs(S3Storage.java:67)
at io.confluent.connect.hdfs.DataWriter.createDir(DataWriter.java:372)
at io.confluent.connect.hdfs.DataWriter.<init>(DataWriter.java:173)
... 10 more
Caused by: org.jets3t.service.impl.rest.HttpException
at org.jets3t.service.impl.rest.httpclient.RestStorageService.performRequest(RestStorageService.java:519)
at org.jets3t.service.impl.rest.httpclient.RestStorageService.performRequest(RestStorageService.java:281)
at org.jets3t.service.impl.rest.httpclient.RestStorageService.performRestHead(RestStorageService.java:942)
at org.jets3t.service.impl.rest.httpclient.RestStorageService.getObjectImpl(RestStorageService.java:2148)
at org.jets3t.service.impl.rest.httpclient.RestStorageService.getObjectDetailsImpl(RestStorageService.java:2075)
at org.jets3t.service.StorageService.getObjectDetails(StorageService.java:1093)
The region I am using is cn-north-1. The region information needs to be specified in hdfs-site.xml as shown below; otherwise the client connects to the default endpoint s3.amazonaws.com. Note that fs.s3.impl is switched to S3AFileSystem, so the s3:// URL is handled by the s3a connector, which honors the fs.s3a.endpoint setting.
<property>
<name>fs.s3a.access.key</name>
<value>BALABALABALA</value>
</property>
<property>
<name>fs.s3a.secret.key</name>
<value>balabalabalabala</value>
</property>
<property>
<name>fs.s3.impl</name>
<value>org.apache.hadoop.fs.s3a.S3AFileSystem</value>
<!--value>org.apache.hadoop.fs.s3native.NativeS3FileSystem</value-->
</property>
<property>
<name>fs.s3a.endpoint</name>
<value>s3.cn-north-1.amazonaws.com.cn</value>
</property>
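With the endpoint in place, a quick way to sanity-check the credentials and region before restarting the connector is to list the target path with the Hadoop CLI, pointing it at the same configuration directory (assuming the hadoop binary is on the PATH):
export HADOOP_CONF_DIR=/data/java/streamx/resource/hadoop-conf
hadoop fs -ls s3a://myemrbuckettest/raw_data
If the listing succeeds, restarting the sink connector with connect-standalone ./connect-standalone.properties ./quickstart-s3.properties should no longer hit the AccessControlException.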