PySpark date_trunc without modifying the actual value

Consider the dataframe below:

df:

time
2022-02-21T11:23:54

I need to convert it to:

time
2022-02-21T11:23:00

After running the code below:

df.withColumn("time_updated", date_trunc("minute", col("time"))).show(truncate = False)

my output is:

time
2022-02-21 11:23:00

The desired output is:

time
2022-02-21T11:23:00

Is there a way to keep the data as it is and only update/truncate the seconds?

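You only have a formatting issue. date_trunc returns a timestamp, and show() prints timestamps as "yyyy-MM-dd HH:mm:ss", so the output you see is just the string representation of that timestamp. To keep the "T" separator, format the value as a string explicitly: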

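from pyspark.sql import functions as F

# Use lowercase "yyyy" (calendar year); uppercase "YYYY" is the week-based year.
# The literal ":00" in the pattern zeroes out the seconds.
df = df.withColumn(
    "time_updated",
    F.date_format(F.col("time").cast("timestamp"), "yyyy-MM-dd'T'HH:mm:00"),
)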

df.show(truncate=False)
+-------------------+-------------------+
|time               |time_updated       |
+-------------------+-------------------+
|2022-02-21T11:23:54|2022-02-21T11:23:00|
+-------------------+-------------------+

df.printSchema()
root
 |-- time: string (nullable = true)
 |-- time_updated: string (nullable = true)