AWS Glue job to map string to date and time format while converting from CSV to Parquet
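While converting from CSV to Parquet with an AWS Glue ETL job, the following fields, which are read from the CSV as strings, are mapped to date and time types.
This is the actual CSV file
After mapping and converting, the date field is empty and the time is concatenated with today's date.
How do I convert to the proper date and time format?
It uses Presto data types, so the data format should be correct: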
DATE Calendar date (year, month, day).
Example: DATE '2001-08-22'
TIME Time of day (hour, minute, second, millisecond) without a time
zone. Values of this type are parsed and rendered in the session time
zone.
Example: TIME '01:02:03.456'
TIMESTAMP Instant in time that includes the date and time of day
without a time zone. Values of this type are parsed and rendered in
the session time zone.
Example: TIMESTAMP '2001-08-22 03:04:05.321'
You can use:
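from pyspark.sql.functions import to_timestamp, to_date, date_format
# col holds the name of the column to convert; the patterns must match the CSV values.
# Use whichever line fits the target type rather than applying all three in sequence.
df = df.withColumn(col, to_timestamp(col, 'dd-MM-yyyy HH:mm'))  # string -> timestamp
df = df.withColumn(col, to_date(col, 'dd-MM-yyyy'))  # string -> date
df = df.withColumn(col, date_format(col, 'HH:mm:ss'))  # timestamp -> 'HH:mm:ss' string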
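Since the sample CSV is not shown, the patterns above are a guess at its layout. Before running the job, it can help to check a few raw values against the pattern in plain Python. The sketch below is only an approximation: it uses `strptime` with `%d-%m-%Y %H:%M`, which corresponds to the Spark pattern `dd-MM-yyyy HH:mm`, but Spark's parser and `strptime` do not behave identically in every edge case. The sample values and the helper name `parse_or_none` are hypothetical.

```python
from datetime import datetime

# Hypothetical sample values; replace with a few rows from the real CSV.
SAMPLES = ["22-08-2001 03:04", "2001-08-22 03:04", ""]

# strptime counterpart of the Spark pattern 'dd-MM-yyyy HH:mm'.
PY_PATTERN = "%d-%m-%Y %H:%M"


def parse_or_none(value, pattern=PY_PATTERN):
    """Return a datetime if value matches pattern, else None."""
    try:
        return datetime.strptime(value, pattern)
    except ValueError:
        return None


for raw in SAMPLES:
    parsed = parse_or_none(raw)
    if parsed is None:
        # A value like this would probably not parse in the Glue job either.
        print(f"{raw!r}: does not match {PY_PATTERN}")
    else:
        # Split into the date part and an 'HH:mm:ss' time string.
        print(f"{raw!r}: date={parsed.date()} time={parsed.strftime('%H:%M:%S')}")
```

If most values fail to match, the pattern passed to `to_timestamp` or `to_date` probably needs adjusting to the actual CSV layout, which is one plausible cause of the empty date column described in the question.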