date_format doesn't handle timestamp with `00:00:00`

It formats the timestamp `2020-01-27 00:00:00` as `2020-01-27 12:00:00` instead of `2020-01-27 00:00:00`:

```scala
import spark.sqlContext.implicits._
import java.sql.Timestamp
import org.apache.spark.sql.functions.{date_format, date_trunc, typedLit}

scala> val stamp = typedLit(new Timestamp(1580105949000L))
stamp: org.apache.spark.sql.Column = TIMESTAMP('2020-01-27 00:19:09.0')

scala> val df_test = Seq(5).toDF("seq").select(
     |   stamp.as("unixtime"),
     |   date_trunc("HOUR", stamp).as("date_trunc"),
     |   date_format(date_trunc("HOUR", stamp), "yyyy-MM-dd hh:mm:ss").as("hour")
     | )
df_test: org.apache.spark.sql.DataFrame = [unixtime: timestamp, date_trunc: timestamp ... 1 more field]

scala> df_test.show
+-------------------+-------------------+-------------------+
|           unixtime|         date_trunc|               hour|
+-------------------+-------------------+-------------------+
|2020-01-27 00:19:09|2020-01-27 00:00:00|2020-01-27 12:00:00|
+-------------------+-------------------+-------------------+
```

Your pattern should be `yyyy-MM-dd HH:mm:ss`.
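A minimal sketch of the corrected call, assuming a local `SparkSession` and Spark 2.3 or later (where `date_trunc` is available). The object `HourFormatFix` and the helper `formatHours` are hypothetical names for illustration. Unlike the question, it builds the timestamp by casting a string literal, so that parsing, truncation and formatting all happen in the same session time zone and the result does not depend on where the code runs:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{date_format, date_trunc, lit}

// Hypothetical helper: formats the same truncated timestamp with hh and HH.
object HourFormatFix {
  def formatHours(spark: SparkSession): (String, String) = {
    // Parsed in the session time zone, so no epoch/JVM-zone mismatch.
    val stamp = lit("2020-01-27 00:19:09").cast("timestamp")
    val truncated = date_trunc("HOUR", stamp) // 2020-01-27 00:00:00

    val row = spark.range(1).select(
      // hh = hour in am/pm (1-12): midnight is rendered as 12
      date_format(truncated, "yyyy-MM-dd hh:mm:ss").as("hour_12"),
      // HH = hour in day (0-23): midnight stays 00
      date_format(truncated, "yyyy-MM-dd HH:mm:ss").as("hour_24")
    ).head()

    (row.getString(0), row.getString(1))
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("hour-format-fix")
      .getOrCreate()
    val (hour12, hour24) = formatHours(spark)
    println(s"hh: $hour12")
    println(s"HH: $hour24")
    spark.stop()
  }
}
```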

According to its documentation, `date_format` uses the pattern letters supported by `java.text.SimpleDateFormat`:

> Converts a date/timestamp/string to a value of string in the format specified by the date format given by the second argument.
> See SimpleDateFormat for valid date and time format patterns.

The documentation for `SimpleDateFormat` can be found here.

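`hh` is "Hour in am/pm (1-12)". On a 12-hour clock, midnight (hour 0) is 12 AM, which is why `00:00:00` comes out as `12:00:00`; without an `a` (AM/PM marker) in the pattern, the AM is simply dropped. You are looking for the hour-in-day specifier, `HH` ("Hour in day (0-23)"). Spark 3.0 and later no longer use `SimpleDateFormat` and document their own datetime patterns, but `hh` and `HH` have the same meanings there.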
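The same behaviour can be reproduced with plain `SimpleDateFormat`, without Spark. This is a small sketch; the object `HourPatternDemo` and the helper `fmt` are hypothetical names. The formatter is pinned to UTC and `Locale.US` so that the output (including the AM/PM text) does not depend on the JVM's default zone or locale. `1580083200000L` is 2020-01-27 00:00:00 UTC, i.e. midnight:

```scala
import java.text.SimpleDateFormat
import java.util.{Date, Locale, TimeZone}

object HourPatternDemo {
  // Format `date` with `pattern` in UTC using US locale conventions.
  def fmt(pattern: String, date: Date): String = {
    val f = new SimpleDateFormat(pattern, Locale.US)
    f.setTimeZone(TimeZone.getTimeZone("UTC"))
    f.format(date)
  }

  def main(args: Array[String]): Unit = {
    val midnight = new Date(1580083200000L) // 2020-01-27 00:00:00 UTC
    println(fmt("yyyy-MM-dd hh:mm:ss", midnight))   // 2020-01-27 12:00:00
    println(fmt("yyyy-MM-dd hh:mm:ss a", midnight)) // 2020-01-27 12:00:00 AM
    println(fmt("yyyy-MM-dd HH:mm:ss", midnight))   // 2020-01-27 00:00:00
  }
}
```

Adding `a` makes the 12-hour output unambiguous, but for a 24-hour result `HH` is the right letter.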