How to replace nan / fillna per group in a pandas dataframe?

I have the following data:

               type           group
0           Drought  Climatological
1               nan  Climatological
2         Explosion   Technological
3   Ground movement     Geophysical
4               nan     Geophysical
5          Ash fall     Geophysical
6          Rockfall     Geophysical
7          Ash fall     Geophysical
8               nan   Technological
9         Explosion   Technological
10              nan  Meteorological

import pandas as pd

data_pd = pd.DataFrame({'type':['Drought','nan','Explosion','Ground movement','nan','Ash fall','Rockfall','Ash fall','nan','Explosion','nan'],
                        'group':['Climatological','Climatological','Technological','Geophysical','Geophysical',
                        'Geophysical','Geophysical','Geophysical','Technological','Technological','Meteorological']})

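How can I replace 'nan' based on the group?

Here is what I have tried so far:

I want to replace each 'nan' in rows where another column matches a specific string with a substitute string.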

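This is a sample of the data from my dataset, which comes from scraping work. It is the output of pd.to_dict(), and I want to keep it as is so that my problem can be reproduced: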

for ty, go in zip(data_pd['type'].values, data_pd['group'].values):
    if ty == 'nan' and go == 'Climatological':
        #ty = ['Drought']
        print(ty) #prints nothing as it did not work

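Don't iterate for this kind of task; it is inefficient!

You can build a boolean mask and apply it with Series.mask (the inverse of Series.where):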

data_pd['type'] = data_pd['type'].mask(data_pd['type'].eq('nan') & data_pd['group'].eq('Climatological'), 'Drought')

Output:

               type           group
0           Drought  Climatological
1           Drought  Climatological
2         Explosion   Technological
3   Ground movement     Geophysical
4               nan     Geophysical
5          Ash fall     Geophysical
6          Rockfall     Geophysical
7          Ash fall     Geophysical
8               nan   Technological
9         Explosion   Technological
10              nan  Meteorological
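
As a minimal sketch of how mask and where relate, the snippet below applies the same condition both ways on a shortened copy of the data. The names cond, via_mask and via_where are illustrative and not part of the original answer.

```python
import pandas as pd

# Shortened copy of the question's data, for illustration only.
data_pd = pd.DataFrame({
    'type': ['Drought', 'nan', 'Explosion', 'nan'],
    'group': ['Climatological', 'Climatological', 'Technological', 'Technological'],
})

# True for rows holding the placeholder string 'nan' in the chosen group.
cond = data_pd['type'].eq('nan') & data_pd['group'].eq('Climatological')

# mask replaces values where the condition is True.
via_mask = data_pd['type'].mask(cond, 'Drought')

# where keeps values where the condition is True, so the condition is negated.
via_where = data_pd['type'].where(~cond, 'Drought')

print(via_mask.tolist())  # ['Drought', 'Drought', 'Explosion', 'nan']
```

Both calls return a new Series, which is why the answer assigns the result back to data_pd['type'].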

Cleaner solution

If your objective is to fill per group, you can use a dictionary and groupby:

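subs = {'Climatological': 'Drought', 'Technological': 'foo'}

# turn the 'nan' strings into real missing values, then fill each group
# with its own replacement (groups absent from subs get 'nan' back)
(data_pd.replace('nan', pd.NA)
        .groupby('group', group_keys=False)
        .apply(lambda g: g.fillna(subs.get(g.name, 'nan')))
)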

Output:

               type           group
0           Drought  Climatological
1           Drought  Climatological
2         Explosion   Technological
3   Ground movement     Geophysical
4               nan     Geophysical
5          Ash fall     Geophysical
6          Rockfall     Geophysical
7          Ash fall     Geophysical
8               foo   Technological
9         Explosion   Technological
10              nan  Meteorological
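
The same per-group fill can probably also be written without groupby, by looking up each row's replacement from its group. This is only a sketch under that assumption; fill_values, is_missing and filled are illustrative names that do not appear in the original answer.

```python
import pandas as pd

data_pd = pd.DataFrame({
    'type': ['Drought', 'nan', 'Explosion', 'Ground movement', 'nan', 'Ash fall',
             'Rockfall', 'Ash fall', 'nan', 'Explosion', 'nan'],
    'group': ['Climatological', 'Climatological', 'Technological', 'Geophysical',
              'Geophysical', 'Geophysical', 'Geophysical', 'Geophysical',
              'Technological', 'Technological', 'Meteorological'],
})
subs = {'Climatological': 'Drought', 'Technological': 'foo'}

# Look up the replacement for each row's group; groups missing from subs give NaN.
fill_values = data_pd['group'].map(subs)

# Replace the 'nan' strings only where a replacement exists for the group.
is_missing = data_pd['type'].eq('nan')
filled = data_pd['type'].mask(is_missing & fill_values.notna(), fill_values)

print(filled.tolist())
```

Rows in groups without an entry in subs (here Geophysical and Meteorological) keep their 'nan' string, matching the output shown above.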
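
for index, row in data_pd.iterrows():
    if row["type"] == 'nan' and row["group"] == 'Climatological':
        # assign through .loc; chained indexing (data_pd["type"][index] = ...)
        # may write to a temporary copy instead of the dataframe
        data_pd.loc[index, "type"] = "Drought"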

To keep it simple and user-friendly, I tried to stay as close to your code as possible.
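
For comparison, the row-by-row loop can also be expressed as a single boolean-indexed assignment with .loc. The sketch below runs both on a shortened copy of the data and checks that they agree; looped, vectorized and cond are illustrative names.

```python
import pandas as pd

# Shortened copy of the question's data, for illustration only.
data_pd = pd.DataFrame({
    'type': ['Drought', 'nan', 'Explosion', 'nan'],
    'group': ['Climatological', 'Climatological', 'Technological', 'Technological'],
})

# Row-by-row version: check each row and write through .loc.
looped = data_pd.copy()
for index, row in looped.iterrows():
    if row['type'] == 'nan' and row['group'] == 'Climatological':
        looped.loc[index, 'type'] = 'Drought'

# Vectorized version: select the matching rows once and assign in one step.
vectorized = data_pd.copy()
cond = vectorized['type'].eq('nan') & vectorized['group'].eq('Climatological')
vectorized.loc[cond, 'type'] = 'Drought'

print(looped.equals(vectorized))  # True
```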