How to use fillna to propagate the value from one week earlier instead of the last valid observation
Here is an example that reproduces an excerpt of my dataset:
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {'vals': np.where(np.arange(35) < 30, np.arange(35), np.nan)},
    index=pd.date_range('2021-01-01', freq='12H', periods=35))
vals
2021-01-01 00:00:00 0.0
2021-01-01 12:00:00 1.0
2021-01-02 00:00:00 2.0
2021-01-02 12:00:00 3.0
2021-01-03 00:00:00 4.0
2021-01-03 12:00:00 5.0
2021-01-04 00:00:00 6.0
2021-01-04 12:00:00 7.0
2021-01-05 00:00:00 8.0
2021-01-05 12:00:00 9.0
2021-01-06 00:00:00 10.0
2021-01-06 12:00:00 11.0
2021-01-07 00:00:00 12.0
2021-01-07 12:00:00 13.0
2021-01-08 00:00:00 14.0
2021-01-08 12:00:00 15.0
2021-01-09 00:00:00 16.0
2021-01-09 12:00:00 17.0
2021-01-10 00:00:00 18.0
2021-01-10 12:00:00 19.0
2021-01-11 00:00:00 20.0
2021-01-11 12:00:00 21.0
2021-01-12 00:00:00 22.0
2021-01-12 12:00:00 23.0
2021-01-13 00:00:00 24.0
2021-01-13 12:00:00 25.0
2021-01-14 00:00:00 26.0
2021-01-14 12:00:00 27.0
2021-01-15 00:00:00 28.0
2021-01-15 12:00:00 29.0
2021-01-16 00:00:00 NaN
2021-01-16 12:00:00 NaN
2021-01-17 00:00:00 NaN
2021-01-17 12:00:00 NaN
2021-01-18 00:00:00 NaN
The result I want:
vals
2021-01-01 00:00:00 0.0
2021-01-01 12:00:00 1.0
2021-01-02 00:00:00 2.0
2021-01-02 12:00:00 3.0
2021-01-03 00:00:00 4.0
2021-01-03 12:00:00 5.0
2021-01-04 00:00:00 6.0
2021-01-04 12:00:00 7.0
2021-01-05 00:00:00 8.0
2021-01-05 12:00:00 9.0
2021-01-06 00:00:00 10.0
2021-01-06 12:00:00 11.0
2021-01-07 00:00:00 12.0
2021-01-07 12:00:00 13.0
2021-01-08 00:00:00 14.0
2021-01-08 12:00:00 15.0
2021-01-09 00:00:00 16.0
2021-01-09 12:00:00 17.0
2021-01-10 00:00:00 18.0
2021-01-10 12:00:00 19.0
2021-01-11 00:00:00 20.0
2021-01-11 12:00:00 21.0
2021-01-12 00:00:00 22.0
2021-01-12 12:00:00 23.0
2021-01-13 00:00:00 24.0
2021-01-13 12:00:00 25.0
2021-01-14 00:00:00 26.0
2021-01-14 12:00:00 27.0
2021-01-15 00:00:00 28.0
2021-01-15 12:00:00 29.0
2021-01-16 00:00:00 16.0
2021-01-16 12:00:00 17.0
2021-01-17 00:00:00 18.0
2021-01-17 12:00:00 19.0
2021-01-18 00:00:00 20.0
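My question:
I want to fill the NaN values with the values observed in the same column one week earlier.
df.fillna(method='ffill') does not help, because it fills with the last valid value. Any ideas?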
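To make the problem concrete, here is a minimal sketch of what a plain forward fill does on the same data. It uses df.ffill(), which is the method form of the same operation, and passes a pd.Timedelta as the frequency; both are my choices for this illustration rather than part of the original snippet.

```python
import numpy as np
import pandas as pd

# Same shape as the data above: 35 half-day rows, the last 5 missing.
df = pd.DataFrame(
    {'vals': np.where(np.arange(35) < 30, np.arange(35), np.nan)},
    index=pd.date_range('2021-01-01', freq=pd.Timedelta(hours=12), periods=35))

# A plain forward fill copies the most recent valid value into every gap,
# regardless of which day of the week or time of day the gap falls on.
filled = df.ffill()
print(filled.tail(5))
```

Every missing row ends up with 29.0, the last valid observation, rather than the value from the same slot one week earlier.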
A simple example of a DataFrame with missing values and a datetime index:
In [2]: df = pd.DataFrame(
...: {'vals': np.where(np.arange(21) < 14, np.arange(21), np.nan)},
...: index=pd.date_range('2021-01-01', freq='D', periods=21),
...: )
...:
In [3]: df
Out[3]:
vals
2021-01-01 0.0
2021-01-02 1.0
2021-01-03 2.0
2021-01-04 3.0
2021-01-05 4.0
2021-01-06 5.0
2021-01-07 6.0
2021-01-08 7.0
2021-01-09 8.0
2021-01-10 9.0
2021-01-11 10.0
2021-01-12 11.0
2021-01-13 12.0
2021-01-14 13.0
2021-01-15 NaN
2021-01-16 NaN
2021-01-17 NaN
2021-01-18 NaN
2021-01-19 NaN
2021-01-20 NaN
2021-01-21 NaN
You can use the pandas datetime components to group by day of the week, then forward fill within each group with ffill:
In [4]: df.groupby(df.index.weekday).ffill()
Out[4]:
vals
2021-01-01 0.0
2021-01-02 1.0
2021-01-03 2.0
2021-01-04 3.0
2021-01-05 4.0
2021-01-06 5.0
2021-01-07 6.0
2021-01-08 7.0
2021-01-09 8.0
2021-01-10 9.0
2021-01-11 10.0
2021-01-12 11.0
2021-01-13 12.0
2021-01-14 13.0
2021-01-15 7.0
2021-01-16 8.0
2021-01-17 9.0
2021-01-18 10.0
2021-01-19 11.0
2021-01-20 12.0
2021-01-21 13.0
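The daily example has one row per weekday, so grouping on the weekday alone is enough. The data in the question has two rows per day, so a sketch for that case probably needs the hour in the grouping key as well. The variable names by_slot, week_ago and by_shift below are mine, and the second variant (shifting the index by seven days and filling from it) is an alternative I am suggesting, not something taken from the answer above.

```python
import numpy as np
import pandas as pd

# The 12-hourly data from the question: 35 rows, the last 5 missing.
df = pd.DataFrame(
    {'vals': np.where(np.arange(35) < 30, np.arange(35), np.nan)},
    index=pd.date_range('2021-01-01', freq=pd.Timedelta(hours=12), periods=35))

# Variant 1: group by (weekday, hour) so each half-day slot is forward
# filled only from earlier rows in the same slot.
by_slot = df.groupby([df.index.weekday, df.index.hour]).ffill()

# Variant 2: move every observation 7 days forward in time, then use the
# shifted series to fill the gaps. fillna aligns on the index, so each NaN
# receives the value recorded exactly one week before it.
week_ago = df['vals'].shift(freq='7D')
by_shift = df['vals'].fillna(week_ago)

print(by_slot.tail(5))
print(by_shift.tail(5))
```

On this data both variants fill the last five rows with 16.0, 17.0, 18.0, 19.0 and 20.0. They differ when a gap is longer than a week: the grouped ffill keeps carrying the last valid value for that slot forward, while the shift only looks exactly seven days back and leaves NaN if that row is missing too.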