Unexpected incorrect result after unix time conversion in Spark SQL

I have a DataFrame with the following contents:

scala> patDF.show
+---------+-------+-----------+-------------+
|patientID|   name|dateOtBirth|lastVisitDate|
+---------+-------+-----------+-------------+
|     1001|Ah Teck| 1991-12-31|   2012-01-20|
|     1002|  Kumar| 2011-10-29|   2012-09-20|
|     1003|    Ali| 2011-01-30|   2012-10-21|
+---------+-------+-----------+-------------+

All of the columns are of string type.

I want to get the list of records whose lastVisitDate (in yyyy-mm-dd format) falls between 2012-09-15 and now, so this is the script:

patDF.registerTempTable("patients") 
val results2 = sqlContext.sql("SELECT * FROM patients WHERE from_unixtime(unix_timestamp(lastVisitDate, 'yyyy-mm-dd')) between '2012-09-15' and current_timestamp() order by lastVisitDate")
results2.show() 

It returned nothing, although I expected to see the records with patientID 1002 and 1003.

So I changed the query to:

val results3 = sqlContext.sql("SELECT from_unixtime(unix_timestamp(lastVisitDate, 'yyyy-mm-dd')), * FROM patients")
results3.show() 

Now I get:

+-------------------+---------+-------+-----------+-------------+
|                _c0|patientID|   name|dateOtBirth|lastVisitDate|
+-------------------+---------+-------+-----------+-------------+
|2012-01-20 00:01:00|     1001|Ah Teck| 1991-12-31|   2012-01-20|
|2012-01-20 00:09:00|     1002|  Kumar| 2011-10-29|   2012-09-20|
|2012-01-21 00:10:00|     1003|    Ali| 2011-01-30|   2012-10-21|
+-------------------+---------+-------+-----------+-------------+

If you look at the first column, you will see that all the months were somehow changed to 01.

What is wrong with the code?

The correct format for year-month-day is yyyy-MM-dd. In the date patterns Spark uses, lowercase mm stands for minute-of-hour while uppercase MM stands for month-of-year, so with yyyy-mm-dd the month digits were parsed as minutes and every month defaulted to January:

import spark.implicits._

val patDF = Seq(
  (1001, "Ah Teck", "1991-12-31", "2012-01-20"),
  (1002, "Kumar", "2011-10-29", "2012-09-20"),
  (1003, "Ali", "2011-01-30", "2012-10-21")
).toDF("patientID", "name", "dateOtBirth", "lastVisitDate")

patDF.createOrReplaceTempView("patTable")

val result1 = spark.sqlContext.sql("""
  select * from patTable where to_timestamp(lastVisitDate, 'yyyy-MM-dd')
    between '2012-09-15' and current_timestamp() order by lastVisitDate
""")

result1.show
// +---------+-----+-----------+-------------+
// |patientID| name|dateOtBirth|lastVisitDate|
// +---------+-----+-----------+-------------+
// |     1002|Kumar| 2011-10-29|   2012-09-20|
// |     1003|  Ali| 2011-01-30|   2012-10-21|
// +---------+-----+-----------+-------------+
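
To see why the original query silently returned nothing, here is a minimal sketch comparing the two patterns side by side (the date literal is taken from the data above; behavior shown is that of Spark 2.x / the legacy date parser, as in the question):

// 'mm' is minute-of-hour, 'MM' is month-of-year, so with the lowercase
// pattern the "09" of '2012-09-20' is read as minutes and the month
// defaults to January, outside the 2012-09-15 .. now range.
spark.sql("""
  select
    from_unixtime(unix_timestamp('2012-09-20', 'yyyy-mm-dd')) as wrong_pattern,
    from_unixtime(unix_timestamp('2012-09-20', 'yyyy-MM-dd')) as right_pattern
""").show
// +-------------------+-------------------+
// |      wrong_pattern|      right_pattern|
// +-------------------+-------------------+
// |2012-01-20 00:09:00|2012-09-20 00:00:00|
// +-------------------+-------------------+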

You can also use the DataFrame API instead, if you prefer:

import org.apache.spark.sql.functions._

val result2 = patDF.where(
    to_timestamp($"lastVisitDate", "yyyy-MM-dd")
      .between(to_timestamp(lit("2012-09-15"), "yyyy-MM-dd"), current_timestamp())
  ).orderBy($"lastVisitDate")
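
Since this applies the same filter as the SQL version, showing result2 should return the same two rows as result1:

result2.show
// expected to match result1: the rows for patientID 1002 and 1003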