AWS Data Pipeline restore to DynamoDB table fails with "is in status 'CANCELLED' with reason 'Job terminated'"

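I have configured AWS Data Pipeline to export a DynamoDB table to an S3 bucket in a different account (using the template). The export works fine, but I run into problems when I try to restore the backup into a new table in the second account (also using the import template).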

My source of information for this task: https://aws.amazon.com/premiumsupport/knowledge-center/data-pipeline-account-access-dynamodb-s3/

  1. I can see that AWS Data Pipeline is restoring data into the new table (I am not sure whether all of the data is being restored), but the execution status is CANCELED.

  2. The activity log repeatedly shows: EMR job '@TableLoadActivity_2020-09-07T12:45:59_Attempt=1' with jobFlowId 'j-11620944P11II' is in status 'WAITING' and reason 'Cluster ready after last step completed.'. Step 'df-06812232H5PDR4VVK472_@TableLoadActivity_2020-09-07T12:45:59_Attempt=1' is in status 'RUNNING' with reason 'null'

2.a) Then comes the cancelled part: EMR job '@TableLoadActivity_2020-09-07T12:45:59_Attempt=1' with jobFlowId 'j-11620944P11II' is in status 'WAITING' and reason 'Cluster ready after last step completed.'. Step 'df-06812232H5PDR4VVK472_@TableLoadActivity_2020-09-07T12:45:59_Attempt=1' is in status 'CANCELLED' with reason 'Job terminated'

See the full log below (trimmed to a few lines; the errors are the ones quoted in point 2):

07 Sep 2020 12:52:04,844 [INFO] (TaskRunnerService-df-06812232H5PDR4VVK472_@EmrClusterForLoad_2020-09-07T12:45:59-0) df-06812232H5PDR4VVK472 amazonaws.datapipeline.taskrunner.TaskPoller: Executing: amazonaws.datapipeline.activity.EmrActivity@1d0415c
07 Sep 2020 12:52:04,887 [INFO] (TaskRunnerService-df-06812232H5PDR4VVK472_@EmrClusterForLoad_2020-09-07T12:45:59-0) df-06812232H5PDR4VVK472 private.com.amazonaws.services.datapipeline.factory.S3ClientFactory: Returning cached AmazonS3Client for the region [eu-west-1]
07 Sep 2020 12:52:04,945 [INFO] (TaskRunnerService-df-06812232H5PDR4VVK472_@EmrClusterForLoad_2020-09-07T12:45:59-0) df-06812232H5PDR4VVK472 amazonaws.datapipeline.activity.EmrActivity: EMR transform starting.
07 Sep 2020 12:52:04,957 [INFO] (TaskRunnerService-df-06812232H5PDR4VVK472_@EmrClusterForLoad_2020-09-07T12:45:59-0) df-06812232H5PDR4VVK472 amazonaws.datapipeline.cluster.EmrClient: EMR client waiting for cluster to enter ready state for jobflow id 'j-11620944P11II'.
07 Sep 2020 12:52:04,957 [INFO] (TaskRunnerService-df-06812232H5PDR4VVK472_@EmrClusterForLoad_2020-09-07T12:45:59-0) df-06812232H5PDR4VVK472 amazonaws.datapipeline.cluster.EmrClient: EMR client checking if cluster is ready for jobflow with id 'j-11620944P11II'.
07 Sep 2020 12:52:05,141 [INFO] (TaskRunnerService-df-06812232H5PDR4VVK472_@EmrClusterForLoad_2020-09-07T12:45:59-0) df-06812232H5PDR4VVK472 amazonaws.datapipeline.cluster.EmrClient: EMR client reports that cluster with jobflow id 'j-11620944P11II' is ready.
07 Sep 2020 12:52:05,200 [INFO] (TaskRunnerService-df-06812232H5PDR4VVK472_@EmrClusterForLoad_2020-09-07T12:45:59-0) df-06812232H5PDR4VVK472 amazonaws.datapipeline.cluster.EmrClient: EMR client adding steps with request '{JobFlowId: j-11620944P11II,Steps: [{Name: df-06812232H5PDR4VVK472_@TableLoadActivity_2020-09-07T12:45:59_Attempt=1,ActionOnFailure: CONTINUE,HadoopJarStep: {Properties: [],Jar: s3://dynamodb-dpl-eu-west-1/emr-ddb-storage-handler/4.11.0/emr-dynamodb-tools-4.11.0-SNAPSHOT-jar-with-dependencies.jar,Args: [org.apache.hadoop.dynamodb.tools.DynamoDBImport, s3://dynamodb-backup-imported/2020-09-06-12-19-11/, blabla-test6, 0.25]}}]}'
07 Sep 2020 12:53:05,352 [INFO] (TaskRunnerService-df-06812232H5PDR4VVK472_@EmrClusterForLoad_2020-09-07T12:45:59-0) df-06812232H5PDR4VVK472 amazonaws.datapipeline.cluster.EmrUtil: EMR job '@TableLoadActivity_2020-09-07T12:45:59_Attempt=1' with jobFlowId 'j-11620944P11II' is in  status 'WAITING' and reason 'Cluster ready after last step completed.'. Step 'df-06812232H5PDR4VVK472_@TableLoadActivity_2020-09-07T12:45:59_Attempt=1' is in status 'RUNNING' with reason 'null'
07 Sep 2020 13:48:08,772 [WARN] (TaskRunnerService-df-06812232H5PDR4VVK472_@EmrClusterForLoad_2020-09-07T12:45:59-0) df-06812232H5PDR4VVK472 amazonaws.datapipeline.cluster.EmrUtil: EMR job flow named 'df-06812232H5PDR4VVK472_@EmrClusterForLoad_2020-09-07T12:45:59' with jobFlowId 'j-11620944P11II' is in status 'WAITING' because of the step 'df-06812232H5PDR4VVK472_@TableLoadActivity_2020-09-07T12:45:59_Attempt=1' failures 'Job terminated'
07 Sep 2020 13:48:08,772 [INFO] (TaskRunnerService-df-06812232H5PDR4VVK472_@EmrClusterForLoad_2020-09-07T12:45:59-0) df-06812232H5PDR4VVK472 amazonaws.datapipeline.cluster.EmrUtil: EMR job '@TableLoadActivity_2020-09-07T12:45:59_Attempt=1' with jobFlowId 'j-11620944P11II' is in  status 'WAITING' and reason 'Cluster ready after last step completed.'. Step 'df-06812232H5PDR4VVK472_@TableLoadActivity_2020-09-07T12:45:59_Attempt=1' is in status 'CANCELLED' with reason 'Job terminated'
07 Sep 2020 13:48:08,772 [INFO] (TaskRunnerService-df-06812232H5PDR4VVK472_@EmrClusterForLoad_2020-09-07T12:45:59-0) df-06812232H5PDR4VVK472 amazonaws.datapipeline.cluster.EmrUtil: Collecting steps stderr logs for cluster with AMI null
07 Sep 2020 13:48:08,777 [INFO] (TaskRunnerService-df-06812232H5PDR4VVK472_@EmrClusterForLoad_2020-09-07T12:45:59-0) df-06812232H5PDR4VVK472 amazonaws.datapipeline.taskrunner.LogMessageUtil: Returning tail errorMsg :
07 Sep 2020 13:48:08,777 [INFO] (TaskRunnerService-df-06812232H5PDR4VVK472_@EmrClusterForLoad_2020-09-07T12:45:59-0) df-06812232H5PDR4VVK472 amazonaws.datapipeline.cluster.EmrUtil: Collecting steps logs for cluster with AMI/ReleaseLabel emr-5.23.0
07 Sep 2020 13:48:08,778 [INFO] (TaskRunnerService-df-06812232H5PDR4VVK472_@EmrClusterForLoad_2020-09-07T12:45:59-0) df-06812232H5PDR4VVK472 amazonaws.datapipeline.activity.mapreduce.EMRActivityHelperFactory: Getting the helper for version 2.8.3
07 Sep 2020 13:48:08,778 [INFO] (TaskRunnerService-df-06812232H5PDR4VVK472_@EmrClusterForLoad_2020-09-07T12:45:59-0) df-06812232H5PDR4VVK472 amazonaws.datapipeline.activity.mapreduce.EMRActivityHelper: Uploading step log details
07 Sep 2020 13:48:08,778 [INFO] (TaskRunnerService-df-06812232H5PDR4VVK472_@EmrClusterForLoad_2020-09-07T12:45:59-0) df-06812232H5PDR4VVK472 amazonaws.datapipeline.activity.mapreduce.EMRActivityHelper: path to step logss3n://srh-data-export-int2/df-06812232H5PDR4VVK472/EmrClusterForLoad/@EmrClusterForLoad_2020-09-07T12:45:59/@EmrClusterForLoad_2020-09-07T12:45:59_Attempt=1/j-11620944P11II/steps
07 Sep 2020 13:48:08,778 [INFO] (TaskRunnerService-df-06812232H5PDR4VVK472_@EmrClusterForLoad_2020-09-07T12:45:59-0) df-06812232H5PDR4VVK472 amazonaws.datapipeline.activity.mapreduce.EMRActivityHelper: step log file /mnt/taskRunner/output/logs/df-06812232H5PDR4VVK472/TableLoadActivity/@TableLoadActivity_2020-09-07T12:45:59/@TableLoadActivity_2020-09-07T12:45:59_Attempt=1/hadoop.jobs.log
07 Sep 2020 13:48:08,782 [INFO] (TaskRunnerService-df-06812232H5PDR4VVK472_@EmrClusterForLoad_2020-09-07T12:45:59-0) df-06812232H5PDR4VVK472 amazonaws.datapipeline.activity.mapreduce.EMRActivityHelper: Done uploading hadoop log details
07 Sep 2020 13:48:08,842 [INFO] (TaskRunnerService-df-06812232H5PDR4VVK472_@EmrClusterForLoad_2020-09-07T12:45:59-0) df-06812232H5PDR4VVK472 amazonaws.datapipeline.activity.mapreduce.EMRActivityHelper: Field value updated 
07 Sep 2020 13:48:08,842 [INFO] (TaskRunnerService-df-06812232H5PDR4VVK472_@EmrClusterForLoad_2020-09-07T12:45:59-0) df-06812232H5PDR4VVK472 amazonaws.datapipeline.activity.mapreduce.EMRActivityHelper: Done updating the field with value 
07 Sep 2020 13:48:08,844 [INFO] (TaskRunnerService-df-06812232H5PDR4VVK472_@EmrClusterForLoad_2020-09-07T12:45:59-0) df-06812232H5PDR4VVK472 amazonaws.datapipeline.taskrunner.HeartBeatService: Finished waiting for heartbeat thread @TableLoadActivity_2020-09-07T12:45:59_Attempt=1
07 Sep 2020 13:48:08,845 [INFO] (TaskRunnerService-df-06812232H5PDR4VVK472_@EmrClusterForLoad_2020-09-07T12:45:59-0) df-06812232H5PDR4VVK472 amazonaws.datapipeline.taskrunner.TaskPoller: Work EmrActivity took 56:4 to complete
  3. If I go to the dependency "EmrClusterForLoad", I see:
@failureReason Resource timeout due to terminateAfter configuration
@status TIMEDOUT

  4. My activity is the one shown in the picture, and its step field has this configuration:
s3://dynamodb-dpl-#{myDDBRegion}/emr-ddb-storage-handler/4.11.0/emr-dynamodb-tools-4.11.0-SNAPSHOT-jar-with-dependencies.jar,org.apache.hadoop.dynamodb.tools.DynamoDBImport,#{input.directoryPath},#{output.tableName},#{output.writeThroughputPercent}

  5. Resource configuration:

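I would like to understand what I am missing here and how to fix it, or whether this is a bug. I can see some data in the new table, so the restore works at least partially, but the log line status 'WAITING' and reason 'Cluster ready after last step completed' followed by the cancellation suggests that the restore may not have finished completely.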

I then read somewhere that the error message @failureReason Resource timeout due to terminateAfter configuration is about adding an optional field named terminateAfter, which is not available in my Architect view.

Self-answer

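The problem was that I had set the Terminate After field, because I got the warning message shown in the picture below recommending it. I set Terminate After to 1 hour, and I chose that value because the file to import is only 9.6 MB. How long could processing such a small file take?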

As it turned out, importing that small file took about 5 hours.
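
For reference, the timeout in question corresponds to the terminateAfter field on the EmrCluster resource in the pipeline definition. The following is a minimal sketch in Python that builds such a resource object and prints it as JSON. The id and release label are taken from the log above; the "6 Hours" value is an assumption, chosen only because it exceeds the observed 5-hour import, and the helper name set_terminate_after is hypothetical. A definition generated from the template contains more fields (instance types, region, and so on) that are omitted here.

```python
import json

# Minimal sketch of the EmrCluster resource from the import pipeline definition.
# The id and release label come from the log above; the terminateAfter value
# is an assumption, picked to be longer than the observed import duration.
emr_cluster_for_load = {
    "id": "EmrClusterForLoad",
    "name": "EmrClusterForLoad",
    "type": "EmrCluster",
    "releaseLabel": "emr-5.23.0",
    "terminateAfter": "6 Hours",
}


def set_terminate_after(resource, hours):
    """Return a copy of the resource with a new terminateAfter period."""
    updated = dict(resource)
    updated["terminateAfter"] = f"{hours} Hours"
    return updated


if __name__ == "__main__":
    print(json.dumps(set_terminate_after(emr_cluster_for_load, 6), indent=2))
```

If the timeout is shorter than the time the import step needs, the cluster is terminated and the step ends up CANCELLED with 'Job terminated', which matches the log in the question.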

Findings:

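To shorten the import time, I increased the myDDBWriteThroughputRatio value from 0.25 to 0.95. At first I had not touched that parameter because it was the template default. The AWS documentation sometimes oversimplifies things, and in many cases you have to find out by trial and error.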

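After changing that value the import took about an hour, which is much better than 5 hours but still slow, given that we are talking about only 9.6 MB.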
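
To get a feel for why the write throughput ratio matters so much, here is a rough back-of-the-envelope sketch in Python. It assumes that the import is limited only by the table's provisioned write capacity, that the import tool uses provisioned capacity multiplied by the ratio, and that one write capacity unit covers one write per second for an item of up to 1 KB. The average item size (1 KB) and the provisioned capacity (5 units) are made-up values, since the post does not state them, and the function name estimate_import_seconds is hypothetical. Real runs also include cluster startup and other overhead, so treat the result as an order-of-magnitude estimate only.

```python
import math


def estimate_import_seconds(total_bytes, avg_item_bytes, provisioned_wcu, write_ratio):
    """Rough lower bound on import time, assuming writes are the bottleneck."""
    item_count = math.ceil(total_bytes / avg_item_bytes)
    # One write capacity unit covers one write per second of an item up to 1 KB;
    # larger items consume one unit per started KB.
    units_per_item = max(1, math.ceil(avg_item_bytes / 1024))
    usable_units_per_second = provisioned_wcu * write_ratio
    return item_count * units_per_item / usable_units_per_second


if __name__ == "__main__":
    size = 9.6 * 1024 * 1024  # 9.6 MB, the file size mentioned in the post
    for ratio in (0.25, 0.95):
        # avg_item_bytes=1024 and provisioned_wcu=5 are assumed values
        secs = estimate_import_seconds(size, 1024, 5, ratio)
        print(f"ratio {ratio}: about {secs / 3600:.1f} h")
```

Under this simple model the duration scales inversely with the ratio, so going from 0.25 to 0.95 shortens the import by a factor of 3.8, which is in the same range as the drop from about 5 hours to about 1 hour reported above.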

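Then I saw is in status 'WAITING' and reason 'Cluster ready after last step completed.' in the log, which worried me a little because I am new to this tool and did not fully understand the message. Someone from AWS explained it as follows: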


If you see that the EMR Cluster is in Waiting, Cluster ready after last steps, it means that cluster had executed the first request it has received and is waiting to execute the next request/activity on the cluster.

Those are my findings; I hope this helps someone else.

In my case, it worked after I changed the capacity to on-demand.
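
The answer above does not say how the capacity mode was changed. One possible way, sketched here under that caveat, is the DynamoDB UpdateTable API with BillingMode set to PAY_PER_REQUEST. The helper name switch_table_to_on_demand is hypothetical, and the function takes an already constructed client so that the sketch itself needs no AWS access.

```python
def switch_table_to_on_demand(dynamodb_client, table_name):
    """Ask DynamoDB to change the table's billing mode to on-demand.

    dynamodb_client is expected to behave like a boto3 DynamoDB client,
    i.e. to provide an update_table method.
    """
    return dynamodb_client.update_table(
        TableName=table_name,
        BillingMode="PAY_PER_REQUEST",
    )
```

With boto3 installed and credentials configured, this could be called as switch_table_to_on_demand(boto3.client("dynamodb", region_name="eu-west-1"), "blabla-test6"), using the region and table name that appear in the log from the question.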