Python pandas: backfill with a fixed decrement until a set number is reached
I have the following DataFrame named df:
date        flag1  flag2  flag3  flag4…
2020-12-31
2021-01-01
2021-01-02                1
2021-01-03
2021-01-04
2021-01-05         1
2021-01-06                       1
2021-01-07
2021-01-08
2021-01-09
2021-01-10
2021-01-11  1                    1
2021-01-12
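Whenever a 1 appears in a column, I want to backfill from it with decreasing values, going backwards until either another number is hit or, failing that, a set number is reached.
So, assuming the number to decrease to is 0 and the decrement is 0.1, the result should look like this: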
date        flag1  flag2  flag3  flag4…
2020-12-31         0.5    0.8    0.4
2021-01-01  0.0    0.6    0.9    0.5
2021-01-02  0.1    0.7    1.0    0.6
2021-01-03  0.2    0.8           0.7
2021-01-04  0.3    0.9           0.8
2021-01-05  0.4    1.0           0.9
2021-01-06  0.5                  1.0
2021-01-07  0.6                  0.6
2021-01-08  0.7                  0.7
2021-01-09  0.8                  0.8
2021-01-10  0.9                  0.9
2021-01-11  1.0                  1.0
2021-01-12
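Is this possible with pandas? I would like to be able to set both the decrement and the limit, which are 0.1 and 0 in the example above.
I know this command can fill backwards with increasing values: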
df1 = df1[::-1].ffill()
(df1 + (df1 == df1.shift()).cumsum()).sort_index()
But that is not what I want.
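For anyone who wants to run the answers, here is a minimal sketch that rebuilds the sample frame. The flattened input table does not show which column each 1 belongs to, so the positions below are an assumption inferred from the expected output; the four column names and the use of a `DatetimeIndex` named `date` are likewise assumptions.

```python
import numpy as np
import pandas as pd

# Daily index covering the dates shown in the question.
dates = pd.date_range("2020-12-31", "2021-01-12", freq="D", name="date")

# Start from an all-NaN float frame; only the flag cells hold a value.
df = pd.DataFrame(np.nan, index=dates, columns=["flag1", "flag2", "flag3", "flag4"])

# Flag positions inferred from the expected output (assumption).
df.loc[pd.Timestamp("2021-01-02"), "flag3"] = 1
df.loc[pd.Timestamp("2021-01-05"), "flag2"] = 1
df.loc[pd.Timestamp("2021-01-06"), "flag4"] = 1
df.loc[pd.Timestamp("2021-01-11"), ["flag1", "flag4"]] = 1

print(df)
```

Some of the snippets in this thread call the frame `df1`; `df1 = df.copy()` gives them the same input.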
You can also try using iloc to change the values based on the positions where a column equals 1.0:
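import pandas as pd
import numpy as np

def process_data(c, n):
    # Visit the positions where the column equals 1 from last to first,
    # so an earlier flag overwrites the tail of a later flag's backfill.
    for idx in reversed(np.where(c == 1)[0]):
        # The rows just above idx (at most len(n) of them) get n[0], n[1], ...
        c.iloc[np.arange(idx)[::-1][:n.shape[0]]] = n[idx-1::-1][::-1]
        # Restore the flag itself in case a later flag's backfill overwrote it.
        c.iat[idx] = 1.0
    return c

# n = [0.9, 0.8, ..., 0.0]: the values written going backwards from each 1.
df = df.apply(lambda r: process_data(r, np.linspace(1.0, 0.0, num=11)[1:]))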
flag1 flag2 flag3 flag4
date
2020-12-31 NaN 0.5 0.8 0.4
2021-01-01 0.0 0.6 0.9 0.5
2021-01-02 0.1 0.7 1.0 0.6
2021-01-03 0.2 0.8 NaN 0.7
2021-01-04 0.3 0.9 NaN 0.8
2021-01-05 0.4 1.0 NaN 0.9
2021-01-06 0.5 NaN NaN 1.0
2021-01-07 0.6 NaN NaN 0.6
2021-01-08 0.7 NaN NaN 0.7
2021-01-09 0.8 NaN NaN 0.8
2021-01-10 0.9 NaN NaN 0.9
2021-01-11 1.0 NaN NaN 1.0
2021-01-12 NaN NaN NaN NaN
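First create groups with a cumulative sum and ffill with the limit parameter (applied to the reversed frame, so it fills backwards). Then, within each group, subtract the descending counter from 10 (rsub), multiply by the decrement 0.1 (that is, divide by 10), and set NaN wherever no group value was filled in: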
decr = 0.1
vals = 10
f = lambda x: x.groupby(x).cumcount(ascending=False).rsub(vals).mul(decr).where(x.notna())
df1 = df1.cumsum()[::-1].ffill(limit=vals)[::-1].apply(f)
print (df1)
flag1 flag2 flag3 flag4
date
2020-12-31 NaN 0.5 0.8 0.4
2021-01-01 0.0 0.6 0.9 0.5
2021-01-02 0.1 0.7 1.0 0.6
2021-01-03 0.2 0.8 NaN 0.7
2021-01-04 0.3 0.9 NaN 0.8
2021-01-05 0.4 1.0 NaN 0.9
2021-01-06 0.5 NaN NaN 1.0
2021-01-07 0.6 NaN NaN 0.6
2021-01-08 0.7 NaN NaN 0.7
2021-01-09 0.8 NaN NaN 0.8
2021-01-10 0.9 NaN NaN 0.9
2021-01-11 1.0 NaN NaN 1.0
2021-01-12 NaN NaN NaN NaN
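To make the chained expression easier to follow, here is a sketch of the same steps on a single column, with each intermediate result kept in its own variable. The Series `s` and the variable names are illustrative only; the flag positions mirror `flag4` from the output above.

```python
import numpy as np
import pandas as pd

decr, vals = 0.1, 10
idx = pd.date_range("2020-12-31", "2021-01-12", freq="D", name="date")
s = pd.Series(np.nan, index=idx, name="flag4")
s.iloc[[6, 11]] = 1  # flags on 2021-01-06 and 2021-01-11

# 1) cumsum skips NaN, so each flag gets its own label (1.0, 2.0, ...).
labels = s.cumsum()

# 2) reverse + ffill(limit) + reverse spreads each label backwards
#    over at most `vals` rows, stopping at the previous flag.
groups = labels[::-1].ffill(limit=vals)[::-1]

# 3) Descending counter per group: 0 on the flag row, 1 on the row before it, ...
steps = groups.groupby(groups).cumcount(ascending=False)

# 4) (vals - steps) * decr, masked where no label was filled in.
out = steps.rsub(vals).mul(decr).where(groups.notna())

print(pd.DataFrame({"labels": labels, "groups": groups, "steps": steps, "out": out}))
```

The `groups` column shows why the second flag's run starts at 0.6: the label from 2021-01-11 can only spread back to 2021-01-07, because 2021-01-06 already carries its own label.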
A more general solution:
decr = 0.1
start = 1
end = 0
r = np.arange(start, end, -decr)
print (r)
[1. 0.9 0.8 0.7 0.6 0.5 0.4 0.3 0.2 0.1]
vals = len(r)
f = lambda x: x.groupby(x).cumcount(ascending=False).rsub(vals).mul(decr).where(x.notna())
df1 = df1.where(df1.eq(1)).cumsum()[::-1].ffill(limit=vals)[::-1].apply(f)
print (df1)
flag1 flag2 flag3 flag4
date
2020-12-31 NaN 0.5 0.8 0.4
2021-01-01 0.0 0.6 0.9 0.5
2021-01-02 0.1 0.7 1.0 0.6
2021-01-03 0.2 0.8 NaN 0.7
2021-01-04 0.3 0.9 NaN 0.8
2021-01-05 0.4 1.0 NaN 0.9
2021-01-06 0.5 NaN NaN 1.0
2021-01-07 0.6 NaN NaN 0.6
2021-01-08 0.7 NaN NaN 0.7
2021-01-09 0.8 NaN NaN 0.8
2021-01-10 0.9 NaN NaN 0.9
2021-01-11 1.0 NaN NaN 1.0
2021-01-12 NaN NaN NaN NaN
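One thing to watch: `rsub(vals).mul(decr)` computes `(vals - steps) * decr`, which only equals `start - steps * decr` when `end` is 0. If a non-zero floor is needed, a variant along the following lines might work. This is a sketch, not the answer's code: the helper name `decaying_backfill` and its parameters are hypothetical, it assumes a numeric frame, and it assumes `start - end` is a whole multiple of `decr`.

```python
import numpy as np
import pandas as pd

def decaying_backfill(df, start=1.0, end=0.0, decr=0.1, flag=1):
    """Hypothetical helper: backfill from each `flag` cell, dropping by `decr`
    per row until `end` is reached or an earlier flag takes over."""
    n_steps = int(round((start - end) / decr))   # rows filled before each flag
    labels = df.where(df.eq(flag)).cumsum()      # one label per flag, NaN elsewhere
    groups = labels.bfill(limit=n_steps)         # same effect as [::-1].ffill(limit)[::-1]

    def per_column(g):
        # Number of rows between each row and the flag it belongs to.
        steps = g.groupby(g).cumcount(ascending=False)
        # Count down from `start`; round away floating-point noise.
        return (start - steps * decr).where(g.notna()).round(10)

    return groups.apply(per_column)

# Small demo with a non-zero floor: values go 1.0, 0.75, 0.5 and then stop.
demo = pd.DataFrame(
    {"a": [np.nan] * 5 + [1.0], "b": [np.nan, 1.0, np.nan, np.nan, 1.0, np.nan]},
    index=pd.date_range("2021-01-01", periods=6, name="date"),
)
print(decaying_backfill(demo, start=1.0, end=0.5, decr=0.25))
```

With the defaults (`start=1.0, end=0.0, decr=0.1`) this should reproduce the table above when applied to the question's frame.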