Spark Structured Streaming with S3 fails committing
I am running a Structured Streaming job on a Spark 2.2 cluster which is running on AWS. I am using an S3 bucket in eu-central-1 for checkpointing.
Some commit operations on the workers seem to fail at random with the following error:
17/10/04 13:20:34 WARN TaskSetManager: Lost task 62.0 in stage 19.0 (TID 1946, 0.0.0.0, executor 0): java.lang.IllegalStateException: Error committing version 1 into HDFSStateStore[id=(op=0,part=62),dir=s3a://bucket/job/query/state/0/62]
at org.apache.spark.sql.execution.streaming.state.HDFSBackedStateStoreProvider$HDFSBackedStateStore.commit(HDFSBackedStateStoreProvider.scala:198)
at org.apache.spark.sql.execution.streaming.StateStoreSaveExec$$anonfun$doExecute$$anon.hasNext(statefulOperators.scala:230)
at org.apache.spark.sql.execution.aggregate.HashAggregateExec$$anonfun$doExecute$$anonfun.apply(HashAggregateExec.scala:99)
at org.apache.spark.sql.execution.aggregate.HashAggregateExec$$anonfun$doExecute$$anonfun.apply(HashAggregateExec.scala:97)
at org.apache.spark.rdd.RDD$$anonfun$mapPartitions$$anonfun$apply.apply(RDD.scala:797)
at org.apache.spark.rdd.RDD$$anonfun$mapPartitions$$anonfun$apply.apply(RDD.scala:797)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
at org.apache.spark.scheduler.Task.run(Task.scala:108)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:335)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Caused by: com.amazonaws.services.s3.model.AmazonS3Exception: Status Code: 403, AWS Service: Amazon S3, AWS Request ID: XXXXXXXXXXX, AWS Error Code: SignatureDoesNotMatch, AWS Error Message: The request signature we calculated does not match the signature you provided. Check your key and signing method., S3 Extended Request ID: abcdef==
at com.amazonaws.http.AmazonHttpClient.handleErrorResponse(AmazonHttpClient.java:798)
at com.amazonaws.http.AmazonHttpClient.executeHelper(AmazonHttpClient.java:421)
at com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:232)
at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:3528)
at com.amazonaws.services.s3.AmazonS3Client.copyObject(AmazonS3Client.java:1507)
at com.amazonaws.services.s3.transfer.internal.CopyCallable.copyInOneChunk(CopyCallable.java:143)
at com.amazonaws.services.s3.transfer.internal.CopyCallable.call(CopyCallable.java:131)
at com.amazonaws.services.s3.transfer.internal.CopyMonitor.copy(CopyMonitor.java:189)
at com.amazonaws.services.s3.transfer.internal.CopyMonitor.call(CopyMonitor.java:134)
at com.amazonaws.services.s3.transfer.internal.CopyMonitor.call(CopyMonitor.java:46)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
... 3 more
The job is submitted with the following options to allow access to the eu-central-1 bucket:
--packages org.apache.hadoop:hadoop-aws:2.7.4
--conf spark.hadoop.fs.s3a.endpoint=s3.eu-central-1.amazonaws.com
--conf spark.hadoop.fs.s3a.impl=org.apache.hadoop.fs.s3a.S3AFileSystem
--conf spark.executor.extraJavaOptions=-Dcom.amazonaws.services.s3.enableV4=true
--conf spark.driver.extraJavaOptions=-Dcom.amazonaws.services.s3.enableV4=true
--conf spark.hadoop.fs.s3a.access.key=xxxxx
--conf spark.hadoop.fs.s3a.secret.key=xxxxx
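For reference, a rough sketch of the same S3A settings applied programmatically on the Hadoop configuration instead of via --conf flags; the values are placeholders and this is not part of the original job. Note that enableV4 still has to be a JVM system property on both driver and executors, which is why the submit above passes it through extraJavaOptions.
// Sketch only: equivalent S3A configuration set in code, with placeholder values.
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("s3a-checkpoint").getOrCreate()
val hc = spark.sparkContext.hadoopConfiguration
hc.set("fs.s3a.impl", "org.apache.hadoop.fs.s3a.S3AFileSystem")
hc.set("fs.s3a.endpoint", "s3.eu-central-1.amazonaws.com") // eu-central-1 accepts V4 signing only
hc.set("fs.s3a.access.key", sys.env.getOrElse("AWS_ACCESS_KEY_ID", ""))
hc.set("fs.s3a.secret.key", sys.env.getOrElse("AWS_SECRET_ACCESS_KEY", ""))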
I have already tried generating access keys without special characters and using instance policies; both had the same effect.
Your logs show:
Caused by: com.amazonaws.services.s3.model.AmazonS3Exception: Status
Code: 403, AWS Service: Amazon S3, AWS Request ID: XXXXXXXXXXX, AWS
Error Code: SignatureDoesNotMatch, AWS Error Message: The request
signature we calculated does not match the signature you provided.
Check your key and signing method., S3 Extended Request ID: abcdef==
That error means the credentials are not correct.
// Static credentials as the AWS SDK constructs them; both values must match
// the key pair that the bucket's policy expects.
val credentials = new com.amazonaws.auth.BasicAWSCredentials(
  "ACCESS_KEY_ID",
  "SECRET_ACCESS_KEY"
)
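A minimal sanity check, run outside Spark, that the same key pair can reach the bucket through the eu-central-1 endpoint with V4 signing. The bucket name is a placeholder, and this assumes the aws-java-sdk version bundled with hadoop-aws 2.7.4.
import com.amazonaws.auth.BasicAWSCredentials
import com.amazonaws.services.s3.AmazonS3Client
import scala.collection.JavaConverters._

// eu-central-1 requires V4 request signing.
System.setProperty("com.amazonaws.services.s3.enableV4", "true")

val credentials = new BasicAWSCredentials("ACCESS_KEY_ID", "SECRET_ACCESS_KEY")
val s3 = new AmazonS3Client(credentials)
s3.setEndpoint("s3.eu-central-1.amazonaws.com")

// If this listing fails with the same SignatureDoesNotMatch, the problem is the
// credentials/signing setup itself, not Spark's state store commit.
s3.listObjects("bucket").getObjectSummaries.asScala.foreach(o => println(o.getKey))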
For debugging purposes:
1) Verify that both the access key and the secret key are valid
2) Verify that the bucket name is correct
3) Turn on logging in the CLI and compare it with what the SDK does
4) Enable SDK logging as documented here:
http://docs.aws.amazon.com/AWSSdkDocsJava/latest/DeveloperGuide/java-dg-logging.html
You need to provide the log4j jar and a sample log4j.properties file (see the sample after this list).
http://docs.aws.amazon.com/ses/latest/DeveloperGuide/get-aws-keys.html
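A minimal log4j.properties fragment for step 4, assuming log4j 1.x is on the classpath; the logger names are the standard ones from the AWS SDK logging documentation.
log4j.rootLogger=WARN, console
log4j.appender.console=org.apache.log4j.ConsoleAppender
log4j.appender.console.layout=org.apache.log4j.PatternLayout
log4j.appender.console.layout.ConversionPattern=%d [%t] %-5p %c - %m%n

# Request/response summaries from the AWS Java SDK
log4j.logger.com.amazonaws.request=DEBUG
# Full wire-level traffic (very verbose)
log4j.logger.org.apache.http.wire=DEBUG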
This comes up often enough that the Hadoop team provide a troubleshooting guide.
But as Yuval says: it is too dangerous to commit directly to S3; it also gets slower the more data is created, and the risk of listing inconsistencies means that data can sometimes be lost, at least with the S3A of Apache Hadoop 2.6-2.8.
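A sketch of the usual workaround implied above: keep the Structured Streaming checkpoint (and therefore the state store commits) on a consistent filesystem such as HDFS, and write only the output data to S3. Source, paths, and query names here are placeholders, not the original job's settings.
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("checkpoint-on-hdfs").getOrCreate()

val events = spark.readStream.format("kafka")
  .option("kafka.bootstrap.servers", "broker:9092")
  .option("subscribe", "events")
  .load()

val query = events.writeStream
  .format("parquet")
  .option("path", "s3a://bucket/job/output")                  // output data may still land in S3
  .option("checkpointLocation", "hdfs:///checkpoints/query")  // state/commits stay on HDFS
  .start()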