Spark Scala: convert a long to a timestamp with milliseconds in a Parquet DataFrame

Can anyone guide me on how to convert a long to a timestamp with milliseconds? I know how to get yyyy-MM-dd HH:mm:ss, but I want the milliseconds as well: yyyy-MM-dd HH:mm:ss.SSS

My Parquet schema looks like this:

 |-- header: struct (nullable = true)
 |    |-- time: long (nullable = true)
...

A sample value of time is 1600676073054:

Scala

scala> spark.sql("select from_unixtime(word) as ts, word from tmp_1").show(false)
+--------------------+-------------+
|ts                  |word         |
+--------------------+-------------+
|52693-05-28 18:30:54|1600676073054|
+--------------------+-------------+


scala> spark.sql("select from_unixtime(word/1000) as ts, word from tmp_1").show(false)
+-------------------+-------------+
|ts                 |word         |
+-------------------+-------------+
|2020-09-21 16:14:33|1600676073054|
+-------------------+-------------+


SQL Server

declare @StartDate datetime2(3) = '1970-01-01 00:00:00.000'
, @milliseconds bigint = 1600676073054
, @MillisecondsPerDay int = 60 * 60 * 24 * 1000 -- = 86400000

SELECT  DATEADD(MILLISECOND, TRY_CAST(@milliseconds % @MillisecondsPerDay AS INT), DATEADD(DAY, TRY_CAST(@milliseconds / @MillisecondsPerDay AS INT), @StartDate));
--2020-09-21 08:14:33.054

I'd like to know how to keep the 054 milliseconds in the converted timestamp.

Thanks.

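Casting a number to a timestamp in Spark interprets it as seconds since the epoch, not milliseconds, so you need to divide by 1000 first. The division keeps the fractional part, which is where the milliseconds survive.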
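To make the seconds-versus-milliseconds arithmetic concrete, here is a minimal sketch in plain Scala using java.time, independent of Spark. The object and method names (EpochMillisSketch, split, format) are hypothetical, and rendering in UTC is an assumption chosen so the result is comparable to the SQL Server output in the question.

```scala
import java.time.{Instant, ZoneOffset}
import java.time.format.DateTimeFormatter

// Hypothetical helper, for illustration only.
object EpochMillisSketch {
  // Pattern requested in the question; the UTC zone is an assumption.
  private val formatter =
    DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss.SSS").withZone(ZoneOffset.UTC)

  /** Splits epoch milliseconds into (whole seconds, leftover milliseconds). */
  def split(epochMillis: Long): (Long, Long) =
    (Math.floorDiv(epochMillis, 1000L), Math.floorMod(epochMillis, 1000L))

  /** Formats epoch milliseconds as yyyy-MM-dd HH:mm:ss.SSS in UTC. */
  def format(epochMillis: Long): String =
    formatter.format(Instant.ofEpochMilli(epochMillis))
}
```

For the sample value, EpochMillisSketch.split(1600676073054L) yields (1600676073, 54) and EpochMillisSketch.format(1600676073054L) yields "2020-09-21 08:14:33.054". Passing the raw value where seconds are expected is what produces the year 52693 seen in the question.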

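import org.apache.spark.sql.functions.col

// Sample epoch-millisecond value from the question, stored as a long.
val df = spark.createDataFrame(
    Seq(
       (1, 1600676073054L)
    )
).toDF("id", "long_timestamp")

// Dividing by 1000 gives fractional seconds; the cast keeps the .054 part.
df.withColumn(
        "timestamp_mili",
        (col("long_timestamp") / 1000).cast("timestamp")
    ).show(false)

// The displayed time depends on the session time zone; this output corresponds to UTC.
//+---+--------------+-----------------------+
//|id |long_timestamp|timestamp_mili         |
//+---+--------------+-----------------------+
//|1  |1600676073054 |2020-09-21 08:14:33.054|
//+---+--------------+-----------------------+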
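If a formatted string such as yyyy-MM-dd HH:mm:ss.SSS is needed, or if going through a double division is a concern (in principle it could be off by a microsecond for some values), one alternative is the SQL function timestamp_millis, which as far as I know is available from Spark 3.1 onward. The sketch below is an assumption-laden illustration, not the only way to do it: the object name MillisTimestampSketch, the column names ts and ts_str, and the one-row DataFrame that mimics the header.time schema from the question are all made up for the example.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, date_format, expr, struct}

// Hypothetical sketch; assumes Spark 3.1+ for the timestamp_millis SQL function.
object MillisTimestampSketch {
  def withTimestamps(spark: SparkSession): DataFrame = {
    import spark.implicits._

    // Mimic the question's schema: a `header` struct holding a long `time` field.
    val df = Seq(1600676073054L).toDF("time")
      .select(struct(col("time")).as("header"))

    df
      // Integer-based conversion from epoch milliseconds to a timestamp.
      .withColumn("ts", expr("timestamp_millis(header.time)"))
      // Render the timestamp as a string that includes milliseconds.
      .withColumn("ts_str", date_format(col("ts"), "yyyy-MM-dd HH:mm:ss.SSS"))
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("millis-ts-sketch")
      .getOrCreate()
    // Pin the session time zone so the rendered value is reproducible (assumption: UTC).
    spark.conf.set("spark.sql.session.timeZone", "UTC")
    withTimestamps(spark).show(false)
    spark.stop()
  }
}
```

With the session time zone set to UTC, ts_str for the sample value should read 2020-09-21 08:14:33.054. When reading the real Parquet file, the same two withColumn calls would apply to the loaded DataFrame instead of the hand-built one.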