Forward fill based on another column with group by
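I am new to Python and I am stuck on a forward fill.
I have a DataFrame (df_input). Within each group defined by the Name and type columns,
I need to forward fill the d1 column down to the row where d2 has a value.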
import pandas as pd
data_input = {'Name':['Renault', 'Renault', 'Renault', 'Renault','Renault','Renault','Renault','Renault','Renault'],
'type':['Duster', 'Duster', 'Duster', 'Duster','Duster','Triber','Triber','Triber','Triber'],
'd1':['nan','10','nan','nan','nan','nan','20','nan','nan'],
'd2':['nan','nan','nan','200','nan','nan','nan','nan','200']}
df_input = pd.DataFrame(data_input)
data_out = {'Name':['Renault', 'Renault', 'Renault', 'Renault','Renault','Renault','Renault','Renault','Renault'],
'type':['Duster', 'Duster', 'Duster', 'Duster','Duster','Triber','Triber','Triber','Triber'],
'd1':['nan','10','nan','nan','nan','nan','20','nan','nan'],
'd2':['nan','nan','nan','200','nan','nan','nan','nan','200'],
'Out_col':['nan','10','10','10','nan','nan','20','20','20']}
df_out = pd.DataFrame(data_out)
I tried the following:
df_out['Out_col'] = df_out.groupby(["Name","type"])["d1"].ffill()
Thanks in advance!
Use:
import numpy as np

# convert the string 'nan' placeholders to real NaN missing values
df_input = df_input.replace('nan', np.nan)
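The replacement step matters because pandas does not treat the string 'nan' as missing, so ffill would have nothing to fill. Here is a minimal sketch on a small, made-up Series (the names `s`, `cleaned` and `filled_clean` are illustrative and not from the question):

```python
import numpy as np
import pandas as pd

# Hypothetical three-element column that mimics d1 from the question.
s = pd.Series(['nan', '10', 'nan'])

# The string 'nan' is an ordinary value, so nothing is flagged as missing.
print(s.isna().tolist())

# After the replacement the placeholders become real missing values.
cleaned = s.replace('nan', np.nan)
print(cleaned.isna().tolist())

# Only now does a forward fill propagate '10' into the last row.
filled_clean = cleaned.ffill()
print(filled_clean.tolist())
```

If the values should be numeric rather than strings, `pd.to_numeric(..., errors='coerce')` might be a reasonable alternative, though the answer here keeps them as strings.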
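Then forward fill d1 per group and use Series.mask to set the result back to missing wherever the back-filled column d2 is still missing: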
s = df_input.groupby(["Name","type"])["d2"].bfill()
df_input['Out_col'] = df_input.groupby(["Name","type"])["d1"].ffill().mask(s.isna())
print (df_input)
      Name    type   d1   d2 Out_col
0  Renault  Duster  NaN  NaN     NaN
1  Renault  Duster   10  NaN      10
2  Renault  Duster  NaN  NaN      10
3  Renault  Duster  NaN  200      10
4  Renault  Duster  NaN  NaN     NaN
5  Renault  Triber  NaN  NaN     NaN
6  Renault  Triber   20  NaN      20
7  Renault  Triber  NaN  NaN      20
8  Renault  Triber  NaN  200      20
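To see why the mask works, it may help to look at the intermediate columns side by side. The sketch below rebuilds the question's data with real NaN and numeric values (an assumption, since the original uses strings). The names `d1_ffill`, `d2_bfill` and `steps` are illustrative only:

```python
import numpy as np
import pandas as pd

# Same layout as df_input, but with real NaN and numbers (assumed for clarity).
df = pd.DataFrame({
    'Name': ['Renault'] * 9,
    'type': ['Duster'] * 5 + ['Triber'] * 4,
    'd1': [np.nan, 10, np.nan, np.nan, np.nan, np.nan, 20, np.nan, np.nan],
    'd2': [np.nan, np.nan, np.nan, 200, np.nan, np.nan, np.nan, np.nan, 200],
})

groups = df.groupby(['Name', 'type'])

# Forward fill d1 inside each group: this alone overshoots past the d2 row.
d1_ffill = groups['d1'].ffill()

# Back fill d2 inside each group: rows after the last d2 value stay NaN.
d2_bfill = groups['d2'].bfill()

# Blank out the forward-filled value wherever no d2 value lies at or below the row.
out = d1_ffill.mask(d2_bfill.isna())

steps = pd.DataFrame({'d1_ffill': d1_ffill, 'd2_bfill': d2_bfill, 'Out_col': out})
print(steps)
```

Row 4 is the only place where the two differ: d1_ffill still carries 10 there, but d2_bfill is NaN because no d2 value follows in the Duster group, so the mask resets it to NaN.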