PySpark: casting string as timestamp gives wrong time
I am using the following code to convert the string-typed time column timstm_hm to a timestamp column timstm_hm_timestamp. Here is the code.
from pyspark.sql.functions import col, unix_timestamp
df = df.withColumn('timstm_hm_timestamp', unix_timestamp(col('timstm_hm'), "yyyy-mm-dd HH:mm").cast("timestamp"))
Here is the result.
+------------------+---------------------+
| timstm_hm        | timstm_hm_timestamp |
+------------------+---------------------+
| 2018-02-08 11:04 | 2018-01-08 11:04:00 |
| 2018-02-27 20:34 | 2018-01-27 20:34:00 |
| 2018-02-23 19:47 | 2018-01-23 19:47:00 |
+------------------+---------------------+
Why is there a one-month difference after the conversion? It is strange, because it works for January but not from February onward.
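You just need to replace mm with uppercase MM. In this pattern syntax, lowercase mm means minute-of-hour and uppercase MM means month-of-year, so with yyyy-mm-dd the month is never parsed and defaults to January. That is also why the January rows looked correct.
See the Java date format documentation for more information: Java SimpleDateFormat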
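To see why a lowercase mm in the date part pushes every row into January, here is a toy model in plain Python. It is only a sketch under stated assumptions: toy_parse and FIELD_FOR_LETTER are hypothetical names, the parser handles only fixed-width numeric fields, and it imitates (rather than reproduces) the legacy Java behaviour in which mm is minute-of-hour, MM is month-of-year, and fields that are never parsed keep their defaults.

```python
from datetime import datetime
from itertools import groupby

# Pattern letter -> datetime field it sets (a simplified subset, assumed for illustration).
FIELD_FOR_LETTER = {"y": "year", "M": "month", "d": "day", "H": "hour", "m": "minute"}


def toy_parse(text, pattern):
    """Toy fixed-width parser; NOT Spark's or Java's real implementation."""
    # Fields that the pattern never sets keep these defaults,
    # which is why the month falls back to 1 (January).
    fields = {"year": 1970, "month": 1, "day": 1, "hour": 0, "minute": 0}
    pos = 0
    for letter, run in groupby(pattern):
        width = len(list(run))
        chunk = text[pos:pos + width]
        pos += width
        if letter in FIELD_FOR_LETTER:
            # A later run of the same letter overwrites the earlier value.
            fields[FIELD_FOR_LETTER[letter]] = int(chunk)
    return datetime(**fields)


wrong = toy_parse("2018-02-08 11:04", "yyyy-mm-dd HH:mm")
right = toy_parse("2018-02-08 11:04", "yyyy-MM-dd HH:mm")
print(wrong)  # 2018-01-08 11:04:00
print(right)  # 2018-02-08 11:04:00
```

With the lowercase pattern, "02" is read as minutes and then overwritten by the real minutes "04", so the month stays at its default. For a January input both patterns happen to give the same result, which matches the observation that the bug only shows up from February onward.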
from pyspark.sql.functions import col, unix_timestamp
df.withColumn('timstm_hm_timestamp', unix_timestamp(col('timstm_hm'), "yyyy-MM-dd HH:mm").cast("timestamp")).show()
+----------------+-------------------+
| timstm_hm|timstm_hm_timestamp|
+----------------+-------------------+
|2018-02-08 11:04|2018-02-08 11:04:00|
+----------------+-------------------+
Alternatively, you can use just to_timestamp, again with a capital MM.
from pyspark.sql.functions import to_timestamp
df.withColumn("timestm_hm_timestamp", to_timestamp("timstm_hm", "yyyy-MM-dd HH:mm")).show()
+----------------+--------------------+
| timstm_hm|timestm_hm_timestamp|
+----------------+--------------------+
|2018-02-08 11:04| 2018-02-08 11:04:00|
+----------------+--------------------+