Create a column with a date 3 years in the past from a given date column (pyspark)?

Question

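I want to use pyspark to create a column containing the date 3 years before the date in a given column. The given column is `date` below, and `past date` is the result I want: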

        date         past date
        2018-08-01   2015-08-01
        2016-08-11   2013-08-11
        2014-09-18   2011-09-18
        2018-12-08   2015-12-08
        2011-12-18   2008-12-18

Answer 1

Try the add_months function in pyspark, multiplying 12 by -3 to go back 36 months.

Example:

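from pyspark.sql.functions import add_months, col

l = [('2018-08-01',), ('2016-08-11',)]
ll = ["date"]
df = spark.createDataFrame(l, ll)
# Shift each date back by 3 * 12 = 36 months (assumes an existing SparkSession named spark)
df.withColumn("past_date", add_months(col("date"), -3 * 12)).show()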

RESULT:

+----------+----------+
|      date| past_date|
+----------+----------+
|2018-08-01|2015-08-01|
|2016-08-11|2013-08-11|
+----------+----------+

Answer 2

You can also use the date_sub function, treating 3 years as 1095 days (3 × 365).

This is Scala code, but it is nearly identical in Python.

df.withColumn("past_date",date_sub(col("date"), 1095))
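
One thing to check before relying on a fixed day count: 1095 days is only 3 calendar years when the span contains no February 29. The sketch below uses plain Python `datetime` rather than Spark to compare the two on the question's sample dates. `minus_years` is a hypothetical helper written for this illustration, not a Spark or standard-library function.

```python
from datetime import date, timedelta


def minus_years(d: date, years: int) -> date:
    """Shift a date back by whole calendar years (hypothetical helper)."""
    try:
        return d.replace(year=d.year - years)
    except ValueError:
        # Feb 29 has no counterpart in a non-leap target year; fall back to Feb 28.
        return d.replace(year=d.year - years, day=28)


samples = ["2018-08-01", "2016-08-11", "2014-09-18", "2018-12-08", "2011-12-18"]

for s in samples:
    d = date.fromisoformat(s)
    calendar_based = minus_years(d, 3)        # same month and day, 3 years earlier
    day_based = d - timedelta(days=1095)      # fixed count of 3 * 365 days
    print(s, calendar_based, day_based)
```

For four of the five sample dates the span crosses a leap day, so the fixed day count lands one day later than the expected result (for example 2015-08-02 instead of 2015-08-01). Only 2011-12-18 matches. A month-based shift such as add_months reproduces the table in the question.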

dataframe

apache-spark

spark-streaming

pyspark

pyspark-sql