PySpark: casting string as timestamp gives wrong time

I am using the following code to convert the string column timstm_hm to a timestamp column timstm_hm_timestamp. Here is the code.

from pyspark.sql.functions import col, unix_timestamp
df = df.withColumn('timstm_hm_timestamp', unix_timestamp(col('timstm_hm'), "yyyy-mm-dd HH:mm").cast("timestamp"))

Here is the result.

-------------------------------------------------
|   timstm_hm         |   timstm_hm_timestamp   |  
-------------------------------------------------
|2018-02-08 11:04     | 2018-01-08 11:04:00     | 
-------------------------------------------------
|2018-02-27 20:34     | 2018-01-27 20:34:00     | 
-------------------------------------------------
|2018-02-23 19:47     | 2018-01-23 19:47:00     | 
-------------------------------------------------

Why is there a one-month difference after the conversion? It is strange, because it works for January but not from February onward.

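You just need to replace the lowercase mm in the date part with uppercase MM. In Java date patterns, mm means minute-of-hour and MM means month-of-year, so with "yyyy-mm-dd" the month is never parsed and falls back to January. That is also why January dates appeared to work.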

See the Java date format documentation for more information: Java SimpleDateFormat
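To make the mechanism concrete, here is a minimal pure-Python sketch of how such a pattern assigns fields. It is a toy model, not Spark's or Java's actual parser: the helper parse_like_java and the FIELDS table are hypothetical names introduced only for illustration, and it handles only the five pattern tokens used in this question.

```python
import re
from datetime import datetime

# Subset of Java date-pattern letters that appear in this question (illustrative only).
FIELDS = {"yyyy": "year", "MM": "month", "dd": "day", "HH": "hour", "mm": "minute"}


def parse_like_java(text, pattern):
    """Toy parser: each pattern token sets one field; unset fields keep their defaults."""
    values = {"year": 1970, "month": 1, "day": 1, "hour": 0, "minute": 0}
    tokens = re.findall(r"yyyy|MM|dd|HH|mm", pattern)  # case-sensitive on purpose
    numbers = re.findall(r"\d+", text)
    for token, number in zip(tokens, numbers):
        # A later token for the same field overwrites the earlier value.
        values[FIELDS[token]] = int(number)
    return datetime(**values)


# Lowercase mm in the date part: "02" is stored as minutes, then overwritten by "04".
wrong = parse_like_java("2018-02-08 11:04", "yyyy-mm-dd HH:mm")
# Uppercase MM: "02" is stored as the month.
right = parse_like_java("2018-02-08 11:04", "yyyy-MM-dd HH:mm")

print(wrong)  # 2018-01-08 11:04:00
print(right)  # 2018-02-08 11:04:00
```

In the first call the month is never set, so it stays at its default of 1 (January), which matches the result shown in the question.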

from pyspark.sql.functions import col, unix_timestamp
df.withColumn('timstm_hm_timestamp', unix_timestamp(col('timstm_hm'), "yyyy-MM-dd HH:mm").cast("timestamp")).show()

+----------------+-------------------+
|       timstm_hm|timstm_hm_timestamp|
+----------------+-------------------+
|2018-02-08 11:04|2018-02-08 11:04:00|
+----------------+-------------------+

Alternatively, you can just use to_timestamp, again with capital MM.

from pyspark.sql.functions import to_timestamp
df.withColumn("timestm_hm_timestamp", to_timestamp("timstm_hm","yyyy-MM-dd HH:mm" )).show()

+----------------+--------------------+
|       timstm_hm|timestm_hm_timestamp|
+----------------+--------------------+
|2018-02-08 11:04| 2018-02-08 11:04:00|
+----------------+--------------------+