Map Either of Two Source Fields to a Single Target Field

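I'm new to AWS Glue and I'm struggling with a problem. We recently renamed a field in our database, and I can't work out how to create a mapping in Glue that supports both the old field name and the new one.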

The legacy mapping looked something like this:

applymapping1 = ApplyMapping.apply(frame = datasource0, mappings = [...("json_property.Foo Bar", "string", "foo_bar", "string")], transformation_ctx = "applymapping1")

We normalized the JSON property names, so json_property['Foo Bar'] became json_property.foo_bar. I tried this:

applymapping1 = ApplyMapping.apply(frame = datasource0, mappings = [...("json_property.Foo Bar", "string", "foo_bar", "string"), ("json_property.foo_bar", "string", "foo_bar", "string")], transformation_ctx = "applymapping1")

Basically, I tried to map two source fields to the same target field. As expected, this caused an error when I tried to run the job...

Is there a way to have the process take json_property.foo_bar or json_property['Foo Bar'] from the source (whichever exists) and map it to the foo_bar target field?

I solved this by adding a Map step before ApplyMapping, so that the old field name is mapped to the new field name first:

## @type: DataSource
## @args: [database = "s3 olap", table_name = "example", transformation_ctx = "datasource0"]
## @return: datasource0
## @inputs: []
datasource0 = glueContext.create_dynamic_frame.from_catalog(database = "s3 olap", table_name = "example", transformation_ctx = "datasource0")

## @type: Map
## @args: [f = MergeLegacyFields, transformation_ctx = "merge_legacy_fields"]
## @return: datasource_mapped
## @inputs: [frame = datasource0]
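def MergeLegacyFields(rec):
  # Records that still carry the legacy name get its value copied to the new name.
  if 'Foo Bar' in rec:
    rec['foo_bar'] = rec['Foo Bar']
  # Records that already use 'foo_bar' pass through unchanged.
  return rec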

datasource_mapped = Map.apply(frame = datasource0, f = MergeLegacyFields, transformation_ctx = "merge_legacy_fields")

## @type: ApplyMapping
## @args: [mapping = [("foo_bar", "string", "foo_bar", "string")], transformation_ctx = "applymapping1"]
## @return: applymapping1
## @inputs: [frame = datasource_mapped]
applymapping1 = ApplyMapping.apply(frame = datasource_mapped, mappings = [("foo_bar", "string", "foo_bar", "string")], transformation_ctx = "applymapping1")
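
The function above checks for 'Foo Bar' at the top level of the record, while the question has the field nested under json_property. Below is a minimal sketch of how the same idea might look for the nested case. It assumes the record passed to Map behaves like a dict and that json_property arrives as a nested dict; the name merge_legacy_fields_nested is hypothetical, and the choice to prefer the new name when both are present is mine, not something the original post specifies.

```python
def merge_legacy_fields_nested(rec):
    """Move json_property['Foo Bar'] to json_property['foo_bar'] if the legacy key is present."""
    # Assumption: the nested struct is exposed as a plain dict-like object.
    props = rec.get("json_property")
    if isinstance(props, dict) and "Foo Bar" in props:
        # Keep an existing 'foo_bar' value; otherwise fall back to the legacy value.
        props.setdefault("foo_bar", props["Foo Bar"])
        # Drop the legacy key so only one source field remains for the mapping.
        del props["Foo Bar"]
    return rec
```

With a function like this passed as f to Map.apply, the subsequent ApplyMapping would only need the single ("json_property.foo_bar", "string", "foo_bar", "string") entry from the question.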