Map Either of Two Source Fields to a Single Target Field
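I'm new to AWS Glue and I'm struggling with a problem. We recently renamed a field in our database, and now I can't figure out how to create a mapping in Glue that supports both the old and the new field name.
The legacy mapping looked like this: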
applymapping1 = ApplyMapping.apply(frame = datasource0, mappings = [...("json_property.Foo Bar", "string", "foo_bar", "string")], transformation_ctx = "applymapping1")
We normalized the JSON property names, so json_property['Foo Bar'] became json_property.foo_bar. I tried this:
applymapping1 = ApplyMapping.apply(frame = datasource0, mappings = [...("json_property.Foo Bar", "string", "foo_bar", "string"), ("json_property.foo_bar", "string", "foo_bar", "string")], transformation_ctx = "applymapping1")
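Basically, I tried to map two source fields to the same target field. As expected, this caused an error when I tried to run the job...
Is there a way to have the process take either json_property.foo_bar or json_property['Foo Bar'] from the source (whichever exists) and map it to the foo_bar target field?
I solved this by adding a Map step before ApplyMapping that copies the legacy field name to the new field name: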
## @type: DataSource
## @args: [database = "s3 olap", table_name = "example", transformation_ctx = "datasource0"]
## @return: datasource0
## @inputs: []
datasource0 = glueContext.create_dynamic_frame.from_catalog(database = "s3 olap", table_name = "example", transformation_ctx = "datasource0")
## @type: Map
## @args: [f = MergeLegacyFields, transformation_ctx = "merge_legacy_fields"]
## @return: datasource_mapped
## @inputs: [frame = datasource0]
def MergeLegacyFields(rec):
    # If the record still uses the legacy name, copy its value to the new name
    if 'Foo Bar' in rec:
        rec['foo_bar'] = rec['Foo Bar']
    return rec
datasource_mapped = Map.apply(frame = datasource0, f = MergeLegacyFields, transformation_ctx = "merge_legacy_fields")
## @type: ApplyMapping
## @args: [mapping = [("foo_bar", "string", "foo_bar", "string")], transformation_ctx = "applymapping1"]
## @return: applymapping1
## @inputs: [frame = datasource_mapped]
applymapping1 = ApplyMapping.apply(frame = datasource_mapped, mappings = [("foo_bar", "string", "foo_bar", "string")], transformation_ctx = "applymapping1")
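The question describes the field as nested under json_property, while the Map function above checks a top-level key. Below is a minimal sketch of the same idea for the nested case. It is plain Python so it can be tried outside Glue. The function name merge_legacy_field and its parameters are made up for illustration, and it assumes the record passed to the function behaves like a nested Python dict; that assumption may not hold for every DynamicRecord.

```python
def merge_legacy_field(rec, parent="json_property", legacy="Foo Bar", current="foo_bar"):
    """Rename rec[parent][legacy] to rec[parent][current] when the legacy key is present.

    Hypothetical helper: assumes rec and rec[parent] are dict-like.
    """
    nested = rec.get(parent)
    if isinstance(nested, dict) and legacy in nested:
        # Keep the new-name value if both keys exist; otherwise use the legacy value.
        nested.setdefault(current, nested[legacy])
        # Drop the legacy key so later steps only see one field.
        del nested[legacy]
    return rec


if __name__ == "__main__":
    old_style = {"id": 1, "json_property": {"Foo Bar": "a"}}
    new_style = {"id": 2, "json_property": {"foo_bar": "b"}}
    print(merge_legacy_field(old_style))
    print(merge_legacy_field(new_style))
```

If this works against your records, it could be passed as f to Map.apply in place of MergeLegacyFields, with the following ApplyMapping step reading from "json_property.foo_bar" only. Preferring the new name when both keys are present is a design choice made here, not something the original post specifies.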