Casting date to integer returns null in Spark SQL

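I want to convert a date column to an integer using Spark SQL. I am following a PySpark-based approach, but I would like to use Spark SQL instead of PySpark.

Reproducible example: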

from pyspark.sql.types import DateType
import pyspark.sql.functions as F

# DUMMY DATA
simpleData = [("James",34,"2006-01-01","true","M",3000.60),
              ("Michael",33,"1980-01-10","true","F",3300.80),
              ("Robert",37,"1992-07-01","false","M",5000.50)
             ]

columns = ["firstname","age","jobStartDate","isGraduated","gender","salary"]
df = spark.createDataFrame(data = simpleData, schema = columns)

# Cast the string column to a real date, then derive epoch seconds from it
df = df.withColumn("jobStartDate", df['jobStartDate'].cast(DateType()))
df = df.withColumn("jobStartDateAsInteger1", F.unix_timestamp(df['jobStartDate']))
display(df)

I want to do the same conversion, but with Spark SQL. I am using the following code:

df.createOrReplaceTempView("date_to_integer")

%sql
select
seg.*,
CAST (jobStartDate AS INTEGER) as JobStartDateAsInteger2 -- returns null
from date_to_integer seg

How can I fix this?

First CAST your jobStartDate to DATE, then use UNIX_TIMESTAMP to convert it to a Unix timestamp (an integer number of seconds since 1970-01-01).

SELECT
    seg.*,
    UNIX_TIMESTAMP(CAST (jobStartDate AS DATE)) AS JobStartDateAsInteger2
FROM date_to_integer seg
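
To sanity-check the integers this query produces, here is a minimal plain-Python sketch of what "date as an integer" can mean. The helper names `date_to_epoch_seconds` and `date_to_epoch_days` are made up for illustration, and the sketch assumes midnight in UTC. Spark evaluates UNIX_TIMESTAMP in the session time zone (`spark.sql.session.timeZone`), so your values may be shifted by the zone offset if the session is not set to UTC.

```python
from datetime import date, datetime, timezone


def date_to_epoch_seconds(iso_date: str) -> int:
    """Seconds from 1970-01-01 00:00:00 UTC to midnight UTC of iso_date.

    Assumption: UTC. A Spark session in another time zone would give a
    value shifted by that zone's offset.
    """
    d = date.fromisoformat(iso_date)
    midnight_utc = datetime(d.year, d.month, d.day, tzinfo=timezone.utc)
    return int(midnight_utc.timestamp())


def date_to_epoch_days(iso_date: str) -> int:
    """Whole days from 1970-01-01 to iso_date (time-zone independent)."""
    return (date.fromisoformat(iso_date) - date(1970, 1, 1)).days


# Same dates as the dummy data in the question
for s in ["2006-01-01", "1980-01-10", "1992-07-01"]:
    print(s, date_to_epoch_seconds(s), date_to_epoch_days(s))
```

If the numbers you get from Spark differ from the seconds column by a whole number of hours, the session time zone is the likely cause.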