How to replace nan / fillna per group in a pandas dataframe?
I have the following data:
    type             group
0   Drought          Climatological
1   nan              Climatological
2   Explosion        Technological
3   Ground movement  Geophysical
4   nan              Geophysical
5   Ash fall         Geophysical
6   Rockfall         Geophysical
7   Ash fall         Geophysical
8   nan              Technological
9   Explosion        Technological
10  nan              Meteorological
import pandas as pd

# Note: the missing values are the literal string 'nan', not real NaN
data_pd = pd.DataFrame({
    'type': ['Drought', 'nan', 'Explosion', 'Ground movement', 'nan', 'Ash fall',
             'Rockfall', 'Ash fall', 'nan', 'Explosion', 'nan'],
    'group': ['Climatological', 'Climatological', 'Technological', 'Geophysical',
              'Geophysical', 'Geophysical', 'Geophysical', 'Geophysical',
              'Technological', 'Technological', 'Meteorological'],
})
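How can I replace 'nan' depending on the group?
I want to replace each 'nan' string with a substitute string when the other column of that row matches a specific string.
This is a sample from my dataset, which comes from scraping. It is the output of pd.to_dict(), and I want to keep it as it is so that my problem can be reproduced.
Here is what I have tried so far: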
for ty, go in zip(data_pd['type'].values, data_pd['group'].values):
    if ty == 'nan' and go == 'Climatological':
        #ty = ['Drought']
        print(ty) #prints nothing as it did not work
Don't iterate for this kind of task, it is inefficient!
You can build a boolean mask and apply it with Series.mask (the inverse of Series.where):
data_pd['type'] = data_pd['type'].mask(data_pd['type'].eq('nan') & data_pd['group'].eq('Climatological'), 'Drought')
Output:
    type             group
0   Drought          Climatological
1   Drought          Climatological
2   Explosion        Technological
3   Ground movement  Geophysical
4   nan              Geophysical
5   Ash fall         Geophysical
6   Rockfall         Geophysical
7   Ash fall         Geophysical
8   nan              Technological
9   Explosion        Technological
10  nan              Meteorological
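As a small illustration of how mask and where relate, the sketch below applies the same condition both ways and checks that the results agree. The variable names cond, masked and kept are only illustrative.

```python
import pandas as pd

data_pd = pd.DataFrame({
    'type': ['Drought', 'nan', 'Explosion', 'Ground movement', 'nan', 'Ash fall',
             'Rockfall', 'Ash fall', 'nan', 'Explosion', 'nan'],
    'group': ['Climatological', 'Climatological', 'Technological', 'Geophysical',
              'Geophysical', 'Geophysical', 'Geophysical', 'Geophysical',
              'Technological', 'Technological', 'Meteorological'],
})

# True only for rows that hold the string 'nan' in the Climatological group
cond = data_pd['type'].eq('nan') & data_pd['group'].eq('Climatological')

# mask replaces values where the condition is True
masked = data_pd['type'].mask(cond, 'Drought')
# where keeps values where the condition is True, so the condition is negated
kept = data_pd['type'].where(~cond, 'Drought')

print(masked.equals(kept))
```

Both calls leave data_pd untouched and return a new Series, so the result has to be assigned back to the column if the change should persist.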
A cleaner solution
If your objective is to fill per group, you can use a dictionary and groupby:
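subs = {'Climatological': 'Drought', 'Technological': 'foo'}
# group_keys=False keeps the original index instead of prepending the group label
(data_pd.replace('nan', pd.NA)
        .groupby('group', group_keys=False)
        .apply(lambda g: g.fillna(subs.get(g.name, 'nan')))
)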
Output:
    type             group
0   Drought          Climatological
1   Drought          Climatological
2   Explosion        Technological
3   Ground movement  Geophysical
4   nan              Geophysical
5   Ash fall         Geophysical
6   Rockfall         Geophysical
7   Ash fall         Geophysical
8   foo              Technological
9   Explosion        Technological
10  nan              Meteorological
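The behaviour of groupby.apply (group keys in the index, whether the grouping column is passed to the function) has changed between pandas versions. One possible alternative that avoids it is to map each row's group to its replacement and only use that value where 'type' is the string 'nan'. This is a sketch, not the approach used above; the names is_missing, replacement and result are illustrative.

```python
import pandas as pd

data_pd = pd.DataFrame({
    'type': ['Drought', 'nan', 'Explosion', 'Ground movement', 'nan', 'Ash fall',
             'Rockfall', 'Ash fall', 'nan', 'Explosion', 'nan'],
    'group': ['Climatological', 'Climatological', 'Technological', 'Geophysical',
              'Geophysical', 'Geophysical', 'Geophysical', 'Geophysical',
              'Technological', 'Technological', 'Meteorological'],
})
subs = {'Climatological': 'Drought', 'Technological': 'foo'}

# Rows whose 'type' is the placeholder string 'nan'
is_missing = data_pd['type'].eq('nan')
# Per-row replacement looked up from the group; NaN for groups absent from subs
replacement = data_pd['group'].map(subs)

# Replace only where the value is missing AND a replacement exists for the group
filled = data_pd['type'].mask(is_missing & replacement.notna(), replacement)
result = data_pd.assign(type=filled)
print(result)
```

Groups without an entry in subs (here Geophysical and Meteorological) keep their 'nan' string, which matches the output shown above.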
for index, row in data_pd.iterrows():
    if row["type"] == 'nan' and row["group"] == 'Climatological':
        # assign through .loc rather than chained indexing, which may not write to data_pd
        data_pd.loc[index, "type"] = "Drought"
To keep it simple and user-friendly, I tried to stay as close to your code as possible.
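For comparison, the same condition can be written without a Python-level loop by passing a boolean row selector to .loc. This is a minimal sketch that modifies data_pd in place; the name rows is illustrative.

```python
import pandas as pd

data_pd = pd.DataFrame({
    'type': ['Drought', 'nan', 'Explosion', 'Ground movement', 'nan', 'Ash fall',
             'Rockfall', 'Ash fall', 'nan', 'Explosion', 'nan'],
    'group': ['Climatological', 'Climatological', 'Technological', 'Geophysical',
              'Geophysical', 'Geophysical', 'Geophysical', 'Geophysical',
              'Technological', 'Technological', 'Meteorological'],
})

# Same test as the if-statement in the loop, evaluated for all rows at once
rows = (data_pd['type'] == 'nan') & (data_pd['group'] == 'Climatological')

# Assign to the selected rows of the 'type' column in a single step
data_pd.loc[rows, 'type'] = 'Drought'
print(data_pd)
```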