Timestamp not Loading into Redshift Table from Glue
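I have a timestamp in YYYY-MM-DD XX:XX:XX format in a csv file stored in S3, but when I load it into a Redshift database with Glue using the timestamp data type, the timestamp column comes out null. The format appears to be valid, but I also tried the YYYYMMDD XXXXXX and YYMMDD XX:XX:XX formats just in case.
My mapping in Glue goes from timestamp to timestamp, and the table's column data type is also timestamp. Example of the data in csv format: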
1,2016 Summer,2016-06-22 00:00:00
Actual output:
Line | Term | Date
-----+-------------+------------
1 | 2016 Summer |
Expected output:
Line | Term | Date
-----+-------------+---------------------
1 | 2016 Summer | 2016-06-22 00:00:00
This seems like it should be a simple task, but I can't get it right, so if anyone can spot my mistake it would be much appreciated.
Code:
val datasource37 = glueContext.getCatalogSource(database = "data", tableName = "term", redshiftTmpDir = "", transformationContext = "datasource37").getDynamicFrame()
val applymapping37 = datasource37.applyMapping(mappings = Seq(("id", "bigint", "id", "bigint"), ("name", "string", "name", "varchar(256)"), ("date", "timestamp", "date_start", "timestamp")), caseSensitive = false, transformationContext = "applymapping37")
val resolvechoice37 = applymapping37.resolveChoice(choiceOption = Some(ChoiceOption("make_cols")), transformationContext = "resolvechoice37")
val dropnullfields37 = resolvechoice37.dropNulls(transformationContext = "dropnullfields37")
val datasink37 = glueContext.getJDBCSink(catalogConnection = "dataConnection", options = JsonOptions("""{"dbtable": "term", "database": "data"}"""), redshiftTmpDir = args("TempDir"), transformationContext = "datasink37").writeDynamicFrame(dropnullfields37)
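I ended up mapping from string -> timestamp, and it worked. Glue had automatically generated the mapping as timestamp -> timestamp, so I assumed it was correct.
For example: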
val applymapping37 = datasource37.applyMapping(
  mappings = Seq(
    ("id", "bigint", "id", "bigint"),
    ("name", "string", "name", "varchar(256)"),
    // Read the source column as a string and let the mapping cast it to timestamp.
    ("date", "string", "date_start", "timestamp")),
  caseSensitive = false,
  transformationContext = "applymapping37")
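As a quick sanity check outside Glue, the sketch below parses the sample csv row with plain `java.sql.Timestamp.valueOf`, which accepts the `yyyy-mm-dd hh:mm:ss` form. It only shows that the text in the file is a well-formed timestamp string; it does not reproduce Glue's own casting logic, and `TimestampCheck` and `parseTimestamp` are hypothetical names introduced for illustration.

```scala
import java.sql.Timestamp
import scala.util.Try

object TimestampCheck {
  // Hypothetical helper: Some(timestamp) when the text is in
  // "yyyy-mm-dd hh:mm:ss" form, None when Timestamp.valueOf rejects it.
  def parseTimestamp(raw: String): Option[Timestamp] =
    Try(Timestamp.valueOf(raw.trim)).toOption

  def main(args: Array[String]): Unit = {
    // Sample row from the question.
    val row = "1,2016 Summer,2016-06-22 00:00:00"
    val fields = row.split(",")

    // The third field parses cleanly as a timestamp.
    println(parseTimestamp(fields(2)))         // Some(2016-06-22 00:00:00.0)

    // One of the alternative layouts tried in the question is rejected here.
    println(parseTimestamp("20160622 000000")) // None
  }
}
```

If the value parses here but still arrives as null in Redshift, the problem is more likely in the source type declared in the mapping than in the data itself, which matches what I saw.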