How to use fillna to propagate the observation from one week earlier instead of the last valid observation

Here is an example that reproduces the structure of my dataset:

import numpy as np
import pandas as pd

df = pd.DataFrame(
    {'vals': np.where(np.arange(35) < 30, np.arange(35), np.nan)},
    index=pd.date_range('2021-01-01', freq='12H', periods=35))

                    vals
2021-01-01 00:00:00 0.0
2021-01-01 12:00:00 1.0
2021-01-02 00:00:00 2.0
2021-01-02 12:00:00 3.0
2021-01-03 00:00:00 4.0
2021-01-03 12:00:00 5.0
2021-01-04 00:00:00 6.0
2021-01-04 12:00:00 7.0
2021-01-05 00:00:00 8.0
2021-01-05 12:00:00 9.0
2021-01-06 00:00:00 10.0
2021-01-06 12:00:00 11.0
2021-01-07 00:00:00 12.0
2021-01-07 12:00:00 13.0
2021-01-08 00:00:00 14.0
2021-01-08 12:00:00 15.0
2021-01-09 00:00:00 16.0
2021-01-09 12:00:00 17.0
2021-01-10 00:00:00 18.0
2021-01-10 12:00:00 19.0
2021-01-11 00:00:00 20.0
2021-01-11 12:00:00 21.0
2021-01-12 00:00:00 22.0
2021-01-12 12:00:00 23.0
2021-01-13 00:00:00 24.0
2021-01-13 12:00:00 25.0
2021-01-14 00:00:00 26.0
2021-01-14 12:00:00 27.0
2021-01-15 00:00:00 28.0
2021-01-15 12:00:00 29.0
2021-01-16 00:00:00 NaN
2021-01-16 12:00:00 NaN
2021-01-17 00:00:00 NaN
2021-01-17 12:00:00 NaN
2021-01-18 00:00:00 NaN

The result I want:

                        vals
    2021-01-01 00:00:00 0.0
    2021-01-01 12:00:00 1.0
    2021-01-02 00:00:00 2.0
    2021-01-02 12:00:00 3.0
    2021-01-03 00:00:00 4.0
    2021-01-03 12:00:00 5.0
    2021-01-04 00:00:00 6.0
    2021-01-04 12:00:00 7.0
    2021-01-05 00:00:00 8.0
    2021-01-05 12:00:00 9.0
    2021-01-06 00:00:00 10.0
    2021-01-06 12:00:00 11.0
    2021-01-07 00:00:00 12.0
    2021-01-07 12:00:00 13.0
    2021-01-08 00:00:00 14.0
    2021-01-08 12:00:00 15.0
    2021-01-09 00:00:00 16.0
    2021-01-09 12:00:00 17.0
    2021-01-10 00:00:00 18.0
    2021-01-10 12:00:00 19.0
    2021-01-11 00:00:00 20.0
    2021-01-11 12:00:00 21.0
    2021-01-12 00:00:00 22.0
    2021-01-12 12:00:00 23.0
    2021-01-13 00:00:00 24.0
    2021-01-13 12:00:00 25.0
    2021-01-14 00:00:00 26.0
    2021-01-14 12:00:00 27.0
    2021-01-15 00:00:00 28.0
    2021-01-15 12:00:00 29.0
    2021-01-16 00:00:00 16.0
    2021-01-16 12:00:00 17.0
    2021-01-17 00:00:00 18.0
    2021-01-17 12:00:00 19.0
    2021-01-18 00:00:00 20.0

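My question:

I want to fill the NaN values with the values observed one week earlier in the same column.

df.fillna(method='ffill') does not help, because it fills with the last valid value. Any ideas?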

A simple example of a DataFrame with missing values and a datetime index:

In [2]: df = pd.DataFrame(
   ...:     {'vals': np.where(np.arange(21) < 14, np.arange(21), np.nan)},    
   ...:     index=pd.date_range('2021-01-01', freq='D', periods=21),
   ...: )
   ...:

In [3]: df
Out[3]:
            vals
2021-01-01   0.0
2021-01-02   1.0
2021-01-03   2.0
2021-01-04   3.0
2021-01-05   4.0
2021-01-06   5.0
2021-01-07   6.0
2021-01-08   7.0
2021-01-09   8.0
2021-01-10   9.0
2021-01-11  10.0
2021-01-12  11.0
2021-01-13  12.0
2021-01-14  13.0
2021-01-15   NaN
2021-01-16   NaN
2021-01-17   NaN
2021-01-18   NaN
2021-01-19   NaN
2021-01-20   NaN
2021-01-21   NaN

You can use the pandas datetime components to group by weekday, then forward-fill within each group using ffill:

In [4]: df.groupby(df.index.weekday).ffill()
Out[4]:
            vals
2021-01-01   0.0
2021-01-02   1.0
2021-01-03   2.0
2021-01-04   3.0
2021-01-05   4.0
2021-01-06   5.0
2021-01-07   6.0
2021-01-08   7.0
2021-01-09   8.0
2021-01-10   9.0
2021-01-11  10.0
2021-01-12  11.0
2021-01-13  12.0
2021-01-14  13.0
2021-01-15   7.0
2021-01-16   8.0
2021-01-17   9.0
2021-01-18  10.0
2021-01-19  11.0
2021-01-20  12.0
2021-01-21  13.0
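The data in the question is sampled every 12 hours, so grouping on weekday alone would put both of a day's readings into the same group, and ffill would carry the 12:00 value into the following week's 00:00 slot. A minimal sketch of one way to handle that, assuming the observations always fall on the same clock times, is to group by weekday and hour together. The names `filled` and `alt` are only illustrative, and `pd.Timedelta(hours=12)` is used as the frequency so the example does not depend on which spelling of the hourly alias your pandas version prefers.

```python
import numpy as np
import pandas as pd

# Same shape as the question's data: 35 half-day observations, the last 5 missing.
df = pd.DataFrame(
    {"vals": np.where(np.arange(35) < 30, np.arange(35), np.nan)},
    index=pd.date_range("2021-01-01", freq=pd.Timedelta(hours=12), periods=35),
)

# Group by (weekday, hour) so each group holds one weekly time slot,
# then forward-fill inside each group.
filled = df.groupby([df.index.weekday, df.index.hour]).ffill()

# Alternative: fill each gap with the value stamped exactly 7 days earlier.
# This only looks back one week, so a gap longer than 7 days stays partly NaN.
alt = df["vals"].fillna(df["vals"].shift(freq="7D"))

print(filled.tail(5))
```

On this data both approaches fill the five missing rows with 16.0, 17.0, 18.0, 19.0 and 20.0, which is the result asked for in the question. The groupby version keeps reaching back week by week until it finds a valid value, while the shift version is stricter about meaning exactly one week earlier, so which one fits depends on how long the gaps can get.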