spark date_format results showing null

I have the following data source:

order_id,order_date,order_customer_id,order_status
1,2013-07-25 00:00:00.0,11599,CLOSED
2,2013-07-25 00:00:00.0,256,PENDING_PAYMENT
3,2013-07-25 00:00:00.0,12111,COMPLETE
4,2013-07-25 00:00:00.0,8827,CLOSED

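I am trying to convert the order date to mm/dd/yyyy for CLOSED orders only, using the queries below, but the output is null. Could you help me get the required date format using either the DSL or the Spark SQL approach?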

closed_df=ord_df.select(date_format(to_date('order_date','yyyy-mm-dd hh:mm:SS.a'),'mm/dd/yyyy') .\
                 alias("formate_date")).show()

#output:

|formate_date|
+------------+
|        null|
|        null|

ord_df.createOrReplaceTempView("orders")
cld_df = spark.sql( """select order_id, date_format(to_date("order_date","yyyy-mm-dd hh:mm:ss.a"),'mm/dd/yyyy') as order_date,\
                     order_customer_id, order_status \
                     from orders where order_status = 'CLOSED'""").show()

#output:

|order_id|order_date|order_customer_id|order_status|
+--------+----------+-----------------+------------+
|       1|      null|            11599|      CLOSED|
|       4|      null|             8827|      CLOSED|

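The string 2013-07-25 00:00:00.0 matches the pattern yyyy-MM-dd HH:mm:ss.S, and the output pattern should be MM/dd/yyyy. Pattern letters are case-sensitive: MM is month while mm is minute, HH is the 24-hour hour while hh is the 12-hour hour, ss is seconds, S is fraction of second, and a is the AM/PM marker, which this string does not contain. The patterns in the question therefore fail to parse and return null. (The swapped form SS.s also parses this particular value, since both fields are zero, but it labels the two fields the wrong way round.) In the SQL query, "order_date" in double quotes is also read as a string literal rather than as the column, so pass it unquoted. See the Spark datetime pattern documentation for more information.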


data = [(1, "2013-07-25 00:00:00.0", 11599, "CLOSED",),
        (2, "2013-07-25 00:00:00.0", 256, "PENDING_PAYMENT",),
        (3, "2013-07-25 00:00:00.0", 12111, "COMPLETE",),
        (4, "2013-07-25 00:00:00.0", 8827, "CLOSED",), ]

ord_df = spark.createDataFrame(data, ("order_id", "order_date", "order_customer_id", "order_status",))


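from pyspark.sql.functions import to_date, date_format

# Keep only CLOSED orders, parse the string, then render it as MM/dd/yyyy.
# ss = seconds, S = fraction of second.
closed_df = (ord_df.where("order_status = 'CLOSED'")
                   .select(date_format(to_date('order_date', 'yyyy-MM-dd HH:mm:ss.S'), 'MM/dd/yyyy')
                      .alias("formate_date")))
# show() returns None, so call it separately to keep the DataFrame in closed_df.
closed_df.show()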

"""
+------------+
|formate_date|
+------------+
|  07/25/2013|
|  07/25/2013|
+------------+
"""

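ord_df.createOrReplaceTempView("orders")
# order_date is left unquoted so it is read as a column, not a string literal.
cld_df = spark.sql("""
    select order_id,
           date_format(to_date(order_date, 'yyyy-MM-dd HH:mm:ss.S'), 'MM/dd/yyyy') as order_date,
           order_customer_id, order_status
    from orders
    where order_status = 'CLOSED'
""")
# show() returns None, so call it separately to keep the DataFrame in cld_df.
cld_df.show()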

"""
+--------+----------+-----------------+------------+
|order_id|order_date|order_customer_id|order_status|
+--------+----------+-----------------+------------+
|       1|07/25/2013|            11599|      CLOSED|
|       4|07/25/2013|             8827|      CLOSED|
+--------+----------+-----------------+------------+
"""