AWS Data Pipeline cannot be created through a serverless YAML template
I was creating a data pipeline to export a DynamoDB table to S3. The template
given for serverless YAML does not work with the "PAY_PER_REQUEST" billing
mode.
I created one using the AWS console and it worked fine, exported its definition,
and tried to create a pipeline with the same definition in Serverless, but it
gives me the following error:
ServerlessError: An error occurred: UrlReportDataPipeline - Pipeline Definition failed to validate because of following Errors: [{ObjectId = 'TableBackupActivity', errors = [Object references invalid id: 's3://dynamodb-dpl-#{myDDBRegion}/emr-ddb-storage-handler/4.11.0/emr-dynamodb-tools-4.11.0-SNAPSHOT-jar-with-dependencies.jar,org.apache.hadoop.dynamodb.tools.DynamoDBExport,#{output.directoryPath},#{input.tableName},#{input.readThroughputPercent}']}] and Warnings: [].
Can anyone help me fix this? The pipeline created from the console works
perfectly with the same step value in the TableBackupActivity.
The pipeline template is pasted below:
UrlReportDataPipeline:
Type: AWS::DataPipeline::Pipeline
Properties:
    Name: ***pipeline name***
Activate: true
ParameterObjects:
- Id: "myDDBReadThroughputRatio"
Attributes:
- Key: "description"
StringValue: "DynamoDB read throughput ratio"
- Key: "type"
StringValue: "Double"
- Key: "default"
StringValue: "0.9"
- Id: "myOutputS3Loc"
Attributes:
- Key: "description"
StringValue: "S3 output bucket"
- Key: "type"
StringValue: "AWS::S3::ObjectKey"
- Key: "default"
StringValue:
!Join [ "", [ "s3://", Ref: "UrlReportBucket" ] ]
- Id: "myDDBTableName"
Attributes:
- Key: "description"
StringValue: "DynamoDB Table Name"
- Key: "type"
StringValue: "String"
- Id: "myDDBRegion"
Attributes:
- Key: "description"
StringValue: "DynamoDB region"
ParameterValues:
- Id: "myDDBTableName"
StringValue:
Ref: "UrlReport"
- Id: "myDDBRegion"
StringValue: "eu-west-1"
PipelineObjects:
- Id: "S3BackupLocation"
Name: "Copy data to this S3 location"
Fields:
- Key: "type"
StringValue: "S3DataNode"
- Key: "dataFormat"
RefValue: "DDBExportFormat"
- Key: "directoryPath"
StringValue: "#{myOutputS3Loc}/#{format(@scheduledStartTime, 'YYYY-MM-dd-HH-mm-ss')}"
- Id: "DDBSourceTable"
Name: "DDBSourceTable"
Fields:
- Key: "tableName"
StringValue: "#{myDDBTableName}"
- Key: "type"
StringValue: "DynamoDBDataNode"
- Key: "dataFormat"
RefValue: "DDBExportFormat"
- Key: "readThroughputPercent"
StringValue: "#{myDDBReadThroughputRatio}"
- Id: "DDBExportFormat"
Name: "DDBExportFormat"
Fields:
- Key: "type"
StringValue: "DynamoDBExportDataFormat"
- Id: "TableBackupActivity"
Name: "TableBackupActivity"
Fields:
- Key: "resizeClusterBeforeRunning"
StringValue: "true"
- Key: "type"
StringValue: "EmrActivity"
- Key: "input"
RefValue: "DDBSourceTable"
- Key: "runsOn"
RefValue: "EmrClusterForBackup"
- Key: "output"
RefValue: "S3BackupLocation"
- Key: "step"
RefValue: "s3://dynamodb-dpl-#{myDDBRegion}/emr-ddb-storage-handler/4.11.0/emr-dynamodb-tools-4.11.0-SNAPSHOT-jar-with-dependencies.jar,org.apache.hadoop.dynamodb.tools.DynamoDBExport,#{output.directoryPath},#{input.tableName},#{input.readThroughputPercent}"
- Id: "DefaultSchedule"
Name: "Every 1 day"
Fields:
- Key: "occurrences"
StringValue: "1"
- Key: "startDateTime"
StringValue: "2020-09-17T1:00:00"
- Key: "type"
StringValue: "Schedule"
- Key: "period"
StringValue: "1 Day"
- Id: "Default"
Name: "Default"
Fields:
- Key: "type"
StringValue: "Default"
- Key: "scheduleType"
StringValue: "cron"
- Key: "failureAndRerunMode"
StringValue: "CASCADE"
- Key: "role"
StringValue: "DatapipelineDefaultRole"
- Key: "resourceRole"
StringValue: "DatapipelineDefaultResourceRole"
- Key: "schedule"
RefValue: "DefaultSchedule"
- Id: "EmrClusterForBackup"
Name: "EmrClusterForBackup"
Fields:
- Key: "terminateAfter"
StringValue: "2 Hours"
- Key: "masterInstanceType"
StringValue: "m3.xlarge"
- Key: "coreInstanceType"
StringValue: "m3.xlarge"
- Key: "coreInstanceCount"
StringValue: "1"
- Key: "type"
StringValue: "EmrCluster"
- Key: "releaseLabel"
StringValue: "emr-5.23.0"
- Key: "region"
StringValue: "#{myDDBRegion}"
The step has a refValue that points to multiple resources, and they also look like they are specified as one string. According to the docs, refValue is
a field value that you specify as the identifier of another object in the same pipeline definition.
If you look at where S3BackupLocation is used, it is created under PipelineObjects and then referenced by its Id.
For the step, you are using refValue with a string as its value, and that string contains commas, so it looks like it is specifying multiple objects.
I am not sure what your step should do, but if you want to use refValue, create that object elsewhere in the template and use its Id here.
You could also try using a StringValue instead of a RefValue here, as sketched below.
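A minimal sketch of the difference, reusing the ids from the template above (illustrative only, not a complete pipeline object):

  - Id: "TableBackupActivity"
    Name: "TableBackupActivity"
    Fields:
      # RefValue must name the Id of another object defined under
      # PipelineObjects, such as the S3DataNode created above
      - Key: "output"
        RefValue: "S3BackupLocation"
      # A literal value, such as a comma-separated EMR step definition,
      # belongs in StringValue instead
      - Key: "step"
        StringValue: "s3://dynamodb-dpl-#{myDDBRegion}/emr-ddb-storage-handler/4.11.0/emr-dynamodb-tools-4.11.0-SNAPSHOT-jar-with-dependencies.jar,org.apache.hadoop.dynamodb.tools.DynamoDBExport,#{output.directoryPath},#{myDDBTableName},#{myDDBReadThroughputRatio}"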
Guys, I solved it with the AWS support team. As of today, the following is the
YAML code which creates a data pipeline for on-demand (PAY_PER_REQUEST)
DynamoDB tables. The fix is in the TableBackupActivity step field: it uses
StringValue instead of RefValue, and it references the #{myDDBTableName} and
#{myDDBReadThroughputRatio} parameters directly instead of #{input.tableName}
and #{input.readThroughputPercent}.
You can also convert this to JSON if you want.
UrlReportBucket:
Type: AWS::S3::Bucket
Properties:
BucketName: ***bucketname***
UrlReportDataPipeline:
Type: AWS::DataPipeline::Pipeline
Properties:
Name: ***pipelinename***
Activate: true
ParameterObjects:
- Id: "myDDBReadThroughputRatio"
Attributes:
- Key: "description"
StringValue: "DynamoDB read throughput ratio"
- Key: "type"
StringValue: "Double"
- Key: "default"
StringValue: "0.9"
- Id: "myOutputS3Loc"
Attributes:
- Key: "description"
StringValue: "S3 output bucket"
- Key: "type"
StringValue: "AWS::S3::ObjectKey"
- Key: "default"
StringValue:
!Join [ "", [ "s3://", Ref: "UrlReportBucket" ] ]
- Id: "myDDBTableName"
Attributes:
- Key: "description"
StringValue: "DynamoDB Table Name"
- Key: "type"
StringValue: "String"
- Id: "myDDBRegion"
Attributes:
- Key: "description"
StringValue: "DynamoDB region"
ParameterValues:
- Id: "myDDBTableName"
StringValue:
Ref: "UrlReport"
- Id: "myDDBRegion"
StringValue: "eu-west-1"
PipelineObjects:
- Id: "S3BackupLocation"
Name: "Copy data to this S3 location"
Fields:
- Key: "type"
StringValue: "S3DataNode"
- Key: "dataFormat"
RefValue: "DDBExportFormat"
- Key: "directoryPath"
StringValue: "#{myOutputS3Loc}/#{format(@scheduledStartTime, 'YYYY-MM-dd-HH-mm-ss')}"
- Id: "DDBSourceTable"
Name: "DDBSourceTable"
Fields:
- Key: "tableName"
StringValue: "#{myDDBTableName}"
- Key: "type"
StringValue: "DynamoDBDataNode"
- Key: "dataFormat"
RefValue: "DDBExportFormat"
- Key: "readThroughputPercent"
StringValue: "#{myDDBReadThroughputRatio}"
- Id: "DDBExportFormat"
Name: "DDBExportFormat"
Fields:
- Key: "type"
StringValue: "DynamoDBExportDataFormat"
- Id: "TableBackupActivity"
Name: "TableBackupActivity"
Fields:
- Key: "resizeClusterBeforeRunning"
StringValue: "true"
- Key: "type"
StringValue: "EmrActivity"
- Key: "input"
RefValue: "DDBSourceTable"
- Key: "runsOn"
RefValue: "EmrClusterForBackup"
- Key: "output"
RefValue: "S3BackupLocation"
- Key: "step"
StringValue: "s3://dynamodb-dpl-#{myDDBRegion}/emr-ddb-storage-handler/4.11.0/emr-dynamodb-tools-4.11.0-SNAPSHOT-jar-with-dependencies.jar,org.apache.hadoop.dynamodb.tools.DynamoDBExport,#{output.directoryPath},#{myDDBTableName},#{myDDBReadThroughputRatio}"
- Id: "DefaultSchedule"
Name: "Every 1 day"
Fields:
- Key: "occurrences"
StringValue: "1"
- Key: "startDateTime"
StringValue: "2020-09-23T1:00:00"
- Key: "type"
StringValue: "Schedule"
- Key: "period"
StringValue: "1 Day"
- Id: "Default"
Name: "Default"
Fields:
- Key: "type"
StringValue: "Default"
- Key: "scheduleType"
StringValue: "cron"
- Key: "failureAndRerunMode"
StringValue: "CASCADE"
- Key: "role"
StringValue: "DatapipelineDefaultRole"
- Key: "resourceRole"
StringValue: "DatapipelineDefaultResourceRole"
- Key: "schedule"
RefValue: "DefaultSchedule"
- Id: "EmrClusterForBackup"
Name: "EmrClusterForBackup"
Fields:
- Key: "terminateAfter"
StringValue: "2 Hours"
- Key: "masterInstanceType"
StringValue: "m3.xlarge"
- Key: "coreInstanceType"
StringValue: "m3.xlarge"
- Key: "coreInstanceCount"
StringValue: "1"
- Key: "type"
StringValue: "EmrCluster"
- Key: "releaseLabel"
StringValue: "emr-5.23.0"
- Key: "region"
StringValue: "#{myDDBRegion}"