Python pandas: decreasing backfill until a set number is reached, based on an interval

I have the following DataFrame, named df:

date          flag1 flag2 flag3 flag4…
2020-12-31
2021-01-01                          
2021-01-02                   1
2021-01-03
2021-01-04
2021-01-05            1                
2021-01-06                        1
2021-01-07
2021-01-08
2021-01-09
2021-01-10
2021-01-11     1                  1
2021-01-12 
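For anyone who wants to reproduce this, the frame above could be built roughly as follows. This is only a sketch: it assumes that the dates are a daily DatetimeIndex named date, that the empty cells are NaN, and that only the four columns shown exist (the table elides any further ones).

```python
import numpy as np
import pandas as pd

# Daily index covering the rows shown in the table (assumed to be the index).
dates = pd.date_range("2020-12-31", "2021-01-12", freq="D", name="date")

# Start with an all-NaN frame, then place the 1s where the table shows them.
df = pd.DataFrame(np.nan, index=dates, columns=["flag1", "flag2", "flag3", "flag4"])
flags = {
    "flag1": ["2021-01-11"],
    "flag2": ["2021-01-05"],
    "flag3": ["2021-01-02"],
    "flag4": ["2021-01-06", "2021-01-11"],
}
for col, days in flags.items():
    df.loc[pd.to_datetime(days), col] = 1.0

print(df)
```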

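Whenever a 1 appears in any column, I want to backfill from it with decreasing values, going backwards until another number is encountered; otherwise, the backfill should continue down to a set number.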

So, assuming the set number to decrease to is 0 and the decrement is 0.1, the result should look like this:

date         flag1  flag2  flag3  flag4…
2020-12-31           0.5    0.8    0.4
2021-01-01   0.0     0.6    0.9    0.5
2021-01-02   0.1     0.7    1.0    0.6
2021-01-03   0.2     0.8           0.7
2021-01-04   0.3     0.9           0.8
2021-01-05   0.4     1.0           0.9
2021-01-06   0.5                   1.0
2021-01-07   0.6                   0.6
2021-01-08   0.7                   0.7
2021-01-09   0.8                   0.8
2021-01-10   0.9                   0.9
2021-01-11   1.0                   1.0
2021-01-12 

Can this be done with pandas? I would like to be able to set the decrement and the limit, which are 0.1 and 0 in the example above.

I know this command can increase the values going backwards:

df1 = df1[::-1].ffill()
(df1 + (df1 == df1.shift()).cumsum()).sort_index()

But that is not what I want.

You can also try using iloc to change the values based on the positions where the column value equals 1.0:

import pandas as pd
import numpy as np

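def process_data(c, n):
    # c: one column (Series); n: ramp values below 1.0, ordered from high to low
    c = c.copy()  # do not mutate the column handed over by apply
    for idx in reversed(np.where(c == 1)[0]):
        k = min(idx, n.shape[0])  # rows available above this 1, capped at len(n)
        if k:  # skip when the 1 sits in the first row
            c.iloc[idx - k:idx] = n[:k][::-1]
        c.iat[idx] = 1.0
    return c

df = df.apply(lambda r: process_data(r, np.linspace(1.0, 0.0, num=11)[1:]))
print(df)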
             flag1  flag2  flag3  flag4
date                                   
2020-12-31     NaN    0.5    0.8    0.4
2021-01-01     0.0    0.6    0.9    0.5
2021-01-02     0.1    0.7    1.0    0.6
2021-01-03     0.2    0.8    NaN    0.7
2021-01-04     0.3    0.9    NaN    0.8
2021-01-05     0.4    1.0    NaN    0.9
2021-01-06     0.5    NaN    NaN    1.0
2021-01-07     0.6    NaN    NaN    0.6
2021-01-08     0.7    NaN    NaN    0.7
2021-01-09     0.8    NaN    NaN    0.8
2021-01-10     0.9    NaN    NaN    0.9
2021-01-11     1.0    NaN    NaN    1.0
2021-01-12     NaN    NaN    NaN    NaN
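The ramp passed as n above is hard-coded for a decrement of 0.1 and a limit of 0. A minimal sketch of how it might be derived from those two settings instead is shown here; make_ramp is a hypothetical helper name, not part of pandas or NumPy, and it assumes the decrement divides the distance from the start value to the limit evenly.

```python
import numpy as np

def make_ramp(decr=0.1, limit=0.0, start=1.0):
    """Values below `start`, stepping down by `decr` until `limit` is reached."""
    # Number of steps between start and limit (assumes an even division).
    steps = int(round((start - limit) / decr))
    # linspace avoids accumulating float error; drop the leading `start` itself.
    return np.linspace(start, limit, num=steps + 1)[1:]

ramp = make_ramp(decr=0.1, limit=0.0)
print(ramp)
```

The result could then be passed in place of the np.linspace(...) expression in the apply call.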

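First build groups from a cumulative sum, spreading each group label backwards by applying ffill with the limit parameter to the reversed frame. Then, within each group, count the rows from the right, subtract that count from 10, multiply by the decrement (equivalent to dividing by 10), and set NaN wherever the filled value is missing: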

decr = 0.1
vals = 10

f = lambda x: x.groupby(x).cumcount(ascending=False).rsub(vals).mul(decr).where(x.notna())
df1 = df1.cumsum()[::-1].ffill(limit=vals)[::-1].apply(f)
print (df1)
            flag1  flag2  flag3  flag4
date                                  
2020-12-31    NaN    0.5    0.8    0.4
2021-01-01    0.0    0.6    0.9    0.5
2021-01-02    0.1    0.7    1.0    0.6
2021-01-03    0.2    0.8    NaN    0.7
2021-01-04    0.3    0.9    NaN    0.8
2021-01-05    0.4    1.0    NaN    0.9
2021-01-06    0.5    NaN    NaN    1.0
2021-01-07    0.6    NaN    NaN    0.6
2021-01-08    0.7    NaN    NaN    0.7
2021-01-09    0.8    NaN    NaN    0.8
2021-01-10    0.9    NaN    NaN    0.9
2021-01-11    1.0    NaN    NaN    1.0
2021-01-12    NaN    NaN    NaN    NaN
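To make the chained expression above easier to follow, here is a sketch that splits it into its intermediate steps for a single column. The names labels, groups and dist are only illustrative, and the Series is a stand-in for flag4 with a plain integer index.

```python
import numpy as np
import pandas as pd

decr, vals = 0.1, 10

# Stand-in for the flag4 column: 1s at positions 6 and 11, NaN elsewhere.
s = pd.Series([np.nan] * 13, name="flag4")
s.iloc[[6, 11]] = 1

# Step 1: number the flags 1.0, 2.0, ...; NaN rows stay NaN.
labels = s.cumsum()
# Step 2: spread each label backwards by at most `vals` rows.
groups = labels[::-1].ffill(limit=vals)[::-1]
# Step 3: per group, count how many rows remain until the flag (0 at the flag).
dist = groups.groupby(groups).cumcount(ascending=False)
# Step 4: (vals - dist) * decr, masked where no label was filled in.
result = dist.rsub(vals).mul(decr).where(groups.notna())

print(pd.DataFrame({"labels": labels, "groups": groups, "dist": dist, "result": result}))
```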

A more general solution:

decr = 0.1
start = 1
end = 0

r = np.arange(start, end, -decr)
print (r)
[1.  0.9 0.8 0.7 0.6 0.5 0.4 0.3 0.2 0.1]

vals = len(r)

# start - (rows remaining until the next 1) * decr, so a non-zero end also works
f = lambda x: x.groupby(x).cumcount(ascending=False).mul(decr).rsub(start).where(x.notna())
df1 = df1.where(df1.eq(1)).cumsum()[::-1].ffill(limit=vals)[::-1].apply(f)
print (df1)
            flag1  flag2  flag3  flag4
date                                  
2020-12-31    NaN    0.5    0.8    0.4
2021-01-01    0.0    0.6    0.9    0.5
2021-01-02    0.1    0.7    1.0    0.6
2021-01-03    0.2    0.8    NaN    0.7
2021-01-04    0.3    0.9    NaN    0.8
2021-01-05    0.4    1.0    NaN    0.9
2021-01-06    0.5    NaN    NaN    1.0
2021-01-07    0.6    NaN    NaN    0.6
2021-01-08    0.7    NaN    NaN    0.7
2021-01-09    0.8    NaN    NaN    0.8
2021-01-10    0.9    NaN    NaN    0.9
2021-01-11    1.0    NaN    NaN    1.0
2021-01-12    NaN    NaN    NaN    NaN
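The same idea could be wrapped in a reusable function so that the decrement and the limit become parameters. The following is a sketch under stated assumptions, not a definitive implementation: decreasing_backfill is a hypothetical name, the decrement is assumed to divide start - end evenly, bfill(limit=...) is used as an equivalent of the reverse/ffill/reverse chain, and results are rounded to hide float noise such as 0.30000000000000004.

```python
import numpy as np
import pandas as pd

def decreasing_backfill(df, decr=0.1, start=1.0, end=0.0):
    """Ramp down from `start` at each 1 towards `end`, walking backwards."""
    # Rows to fill behind each flag (assumes decr divides start - end evenly).
    steps = int(round((start - end) / decr))
    # Number the flags per column and spread each number up to `steps` rows back.
    groups = df.where(df.eq(1)).cumsum().bfill(limit=steps)

    def ramp(col):
        # Rows remaining until the flag that closes each group (0 at the flag).
        dist = col.groupby(col).cumcount(ascending=False)
        return (start - dist * decr).where(col.notna()).round(10)

    return groups.apply(ramp)

# Usage on two of the sample columns (flag positions taken from the question).
idx = pd.date_range("2020-12-31", periods=13, freq="D", name="date")
df = pd.DataFrame(np.nan, index=idx, columns=["flag1", "flag4"])
df.iloc[11, 0] = 1          # flag1 on 2021-01-11
df.iloc[[6, 11], 1] = 1     # flag4 on 2021-01-06 and 2021-01-11

out = decreasing_backfill(df, decr=0.1, start=1.0, end=0.0)
print(out)
```

Computing the value as start - dist * decr rather than from the group length keeps the function usable when end is not 0, for example decreasing_backfill(df, decr=0.1, end=0.5).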