Convert 12-hour clock to 24-hour time in sparklyr

I am trying to convert the following timestamp to 24-hour time using sparklyr:

2021-05-18 9:00:00 PM

My expected result: 2021-05-18 21:00:00

I have tried:

data %>% 
  mutate(datetime_24 = to_timestamp("datetime_12", "yyyy-MM-dd hh:mm:ss"))

data %>% 
  mutate(datetime_24 = to_date("datetime_12", "yyyy-MM-dd hh:mm:ss"))

Both result in NULL.

I tried the following as a starting point and got this error:

data %>%
  mutate(datetime_24 = unix_timestamp(datetime_12, "yyyy-MM-dd hh:mm:ss"))

You may get a different result due to the upgrading of Spark 3.0: Fail to parse '2021-05-18 9:00:00 PM' in the new parser. You can set spark.sql.legacy.timeParserPolicy to LEGACY to restore the behavior before Spark 3.0, or set to CORRECTED and treat it as an invalid datetime string.

I also tried the following in pyspark but got a similar error:

from pyspark.sql.functions import from_unixtime, unix_timestamp, col

df_time = spark.table("data")

df_time_new = df_time.withColumn(
    'datetime_24',
    from_unixtime(unix_timestamp(col('datetime_12'), "yyyy-mm-dd hh:mm:ss"), "yyyy-mm-dd HH:mm:ss"))

Error:

You may get a different result due to the upgrading of Spark 3.0: Fail to parse '2021-05-18 9:00:00 PM' in the new parser. You can set spark.sql.legacy.timeParserPolicy to LEGACY to restore the behavior before Spark 3.0, or set to CORRECTED and treat it as an invalid datetime string. Caused by: DateTimeParseException: Text '2021-05-18 9:00:00 PM' could not be parsed at index 11

You can set spark.sql.legacy.timeParserPolicy to LEGACY as follows:

spark.conf.set("spark.sql.legacy.timeParserPolicy", "LEGACY")
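
The same setting can be applied from sparklyr, for example by issuing the equivalent SQL SET statement through its DBI interface (a sketch; sc is assumed to be your spark_connect() connection):

library(DBI)

# Assumption: sc is an existing sparklyr connection from spark_connect().
# Runtime equivalent of the spark.conf.set(...) call above:
dbGetQuery(sc, "SET spark.sql.legacy.timeParserPolicy = LEGACY")

# Or set it once at connect time via the config object:
# config <- sparklyr::spark_config()
# config$spark.sql.legacy.timeParserPolicy <- "LEGACY"
# sc <- sparklyr::spark_connect(master = "local", config = config)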

After this, you should not get an error when parsing the datetime.

You get that error because of changes to the datetime parser in Spark 3.0 (read here).

Read about datetime patterns here. Per that, you can use the pattern letter 'a' to parse the AM/PM marker. Two further fixes: your first two attempts quote the column name ("datetime_12"), so Spark parses the literal string 'datetime_12' instead of the column, which is likely why they returned NULL rather than an error; and the pyspark attempt uses lowercase 'mm' (minutes) where the month needs 'MM'.

Since your expected result keeps the time of day, use to_timestamp rather than to_date, with the column unquoted:

to_timestamp(datetime_12, "yyyy-MM-dd hh:mm:ss a")
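
Putting it together in sparklyr, a minimal sketch (assuming the LEGACY setting above is in place, a connection sc, and a hypothetical one-row table mirroring the question's value):

library(sparklyr)
library(dplyr)

# Hypothetical sample data reproducing the question's 12-hour string
local_df <- data.frame(datetime_12 = "2021-05-18 9:00:00 PM")
data <- sdf_copy_to(sc, local_df, name = "data", overwrite = TRUE)

data %>%
  mutate(datetime_24 = to_timestamp(datetime_12, "yyyy-MM-dd hh:mm:ss a"))
# datetime_24 should come back as 2021-05-18 21:00:00

If you would rather stay on the Spark 3 parser (CORRECTED) instead of LEGACY, the pattern "yyyy-MM-dd h:mm:ss a" should also work: 'h' accepts a one-digit hour, while 'hh' expects exactly two digits, which is what "could not be parsed at index 11" in the error points at (index 11 is the '9').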