时间序列中的 Pyspark 滚动总和,同时将顺序日期保持在行中

Pyspark Rolling Sum in timeseries while keeping the sequential date in row

在 PySpark 中,我想通过时间序列日期添加滚动总和 X 的列,同时保持所有日期可用,即使其中一些日期为零。

原始数据框:

ColA   ColB  ColC  Date(d-m-y) Value
A      A1    AA1   1-1-2021      1
A      A1    AA1   2-1-2021      2
A      A1    AA1   3-1-2021      3 
A      A1    AA1   4-1-2021      4
A      A1    AA1   5-1-2021      0
A      A1    AA1   6-1-2021      0
B      B1    AB1   1-1-2021      5
B      B1    AB1   2-1-2021      6
B      B1    AB1   3-1-2021      7
B      B1    AB1   4-1-2021      8
B      B1    AB1   5-1-2021      9
B      B1    AB1   6-1-2021      10

预期的数据帧

ColA   ColB  ColC  Date(d-m-y)  Value   Rolling2day    Rolling4day
A      A1    AA1   1-1-2021      1        1              1 (1)
A      A1    AA1   2-1-2021      2        3              3 (1+2)
A      A1    AA1   3-1-2021      3        5              6 (1+2+3)
A      A1    AA1   4-1-2021      4        7              10(1+2+3+4)
A      A1    AA1   5-1-2021      0        4              9 (2+3+4+0)
A      A1    AA1   6-1-2021      0        0              7 (3+4+0+0)
B      B1    AB1   1-1-2021      5        5              5
B      B1    AB1   2-1-2021      6        11             11
B      B1    AB1   3-1-2021      7        13             18
B      B1    AB1   4-1-2021      8        15             26
B      B1    AB1   5-1-2021      9        17             30(6+7+8+9) 
B      B1    AB1   6-1-2021      10       19             34 (7+8+9+10)
import pandas as pd
df1 = pd.read_csv("./your-localpath/tmp.csv")
df = spark.createDataFrame(df1)
df.selectExpr("*","sum(Value) over(partition by ColA,ColB,ColC order by `Date(d-m-y)` rows between 1 preceding and current row) as Rolling2day").show()
+----+----+----+-----------+-----+-----------+
|ColA|ColB|ColC|Date(d-m-y)|Value|Rolling2day|
+----+----+----+-----------+-----+-----------+
|   A|  A1| AA1|   1-1-2021|    1|          1|
|   A|  A1| AA1|   2-1-2021|    2|          3|
|   A|  A1| AA1|   3-1-2021|    3|          5|
|   A|  A1| AA1|   4-1-2021|    4|          7|
|   A|  A1| AA1|   5-1-2021|    0|          4|
|   A|  A1| AA1|   6-1-2021|    0|          0|
|   B|  B1| AB1|   1-1-2021|    5|          5|
|   B|  B1| AB1|   2-1-2021|    6|         11|
|   B|  B1| AB1|   3-1-2021|    7|         13|
|   B|  B1| AB1|   4-1-2021|    8|         15|
|   B|  B1| AB1|   5-1-2021|    9|         17|
|   B|  B1| AB1|   6-1-2021|   10|         19|
+----+----+----+-----------+-----+-----------+

如果你想滚四天,1改3就是你想要的