Replace missing values based on the value of a specific column in Python

I want to replace missing values based on the value of the Submitted column.

Here is what I have:

Year Country Submitted Age12 Age14
2018 CHI 1 267 NaN
2019 CHI NaN NaN NaN
2020 CHI 1 244 203
2018 ALB 1 163 165
2019 ALB 1 NaN NaN
2020 ALB 1 161 NaN
2018 GER 1 451 381
2019 GER NaN NaN NaN
2020 GER 1 361 321
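To try the answers below, the sample table can be rebuilt as a DataFrame. This is a minimal sketch that assumes the cells shown as NaN are real missing values (np.nan), not the string 'NaN':

```python
import numpy as np
import pandas as pd

# Sample data from the question; np.nan marks a missing value
df = pd.DataFrame({
    'Year': [2018, 2019, 2020] * 3,
    'Country': ['CHI'] * 3 + ['ALB'] * 3 + ['GER'] * 3,
    'Submitted': [1, np.nan, 1, 1, 1, 1, 1, np.nan, 1],
    'Age12': [267, np.nan, 244, 163, np.nan, 161, 451, np.nan, 361],
    'Age14': [np.nan, np.nan, 203, 165, np.nan, np.nan, 381, np.nan, 321],
})
print(df)
```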

This is what I want:

Year Country Submitted Age12 Age14
2018 CHI 1 267 NaN
2019 CHI NaN 267 NaN
2020 CHI 1 244 203
2018 ALB 1 163 165
2019 ALB 1 NaN NaN
2020 ALB 1 161 NaN
2018 GER 1 451 381
2019 GER NaN 451 381
2020 GER 1 361 321

I tried df.fillna(axis=0, method='ffill'), but it replaces every NaN with the previous value. That is not what I want, because some values should stay NaN when the Submitted value is 1.

I only want to take the values from the previous row when the corresponding Submitted value is NaN.
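A quick check on the sample data shows why the plain forward fill is too aggressive: row 4 (ALB, 2019) was submitted, yet its NaN values are overwritten. The sketch uses df.ffill(), which pandas recommends over fillna(method='ffill') since version 2.1:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Year': [2018, 2019, 2020] * 3,
    'Country': ['CHI'] * 3 + ['ALB'] * 3 + ['GER'] * 3,
    'Submitted': [1, np.nan, 1, 1, 1, 1, 1, np.nan, 1],
    'Age12': [267, np.nan, 244, 163, np.nan, 161, 451, np.nan, 361],
    'Age14': [np.nan, np.nan, 203, 165, np.nan, np.nan, 381, np.nan, 321],
})

# A plain forward fill replaces every NaN, whatever the Submitted value is
filled = df.ffill()
# Row 4 was submitted, but its ages are now copied from row 3
print(filled.loc[4, ['Submitted', 'Age12', 'Age14']])
```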

Thanks

Try combining where with what you did:

df = df.where(~df.Submitted.isnull(), df.ffill())

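This replaces entries only in rows where Submitted is null. Note that the whole row is taken from the forward-filled frame, so Submitted itself is also filled in those rows.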
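A runnable sketch of the where approach on the sample data. Because where swaps in whole rows, the Submitted column is forward-filled too; restricting the replacement to the age columns (the list cols below is an illustrative choice) keeps Submitted as NaN, matching the desired output:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Year': [2018, 2019, 2020] * 3,
    'Country': ['CHI'] * 3 + ['ALB'] * 3 + ['GER'] * 3,
    'Submitted': [1, np.nan, 1, 1, 1, 1, 1, np.nan, 1],
    'Age12': [267, np.nan, 244, 163, np.nan, 161, 451, np.nan, 361],
    'Age14': [np.nan, np.nan, 203, 165, np.nan, np.nan, 381, np.nan, 321],
})

# Whole-row version: keep rows with a Submitted value, else take the ffilled row
out = df.where(df['Submitted'].notna(), df.ffill())

# Column-restricted version: only the age columns are replaced
cols = ['Age12', 'Age14']
out2 = df.copy()
out2[cols] = df[cols].where(df['Submitted'].notna(), df[cols].ffill())
print(out2)
```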

You can use np.where to do a conditional ffill():
import numpy as np
(
    df.assign(Age12=np.where(df.Submitted.isna(), df.Age12.ffill(), df.Age12))
    .assign(Age14=np.where(df.Submitted.isna(), df.Age14.ffill(), df.Age14))
)
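With more than two age columns, chaining one .assign per column gets repetitive. A sketch, assuming every column to fill has a name starting with 'Age', builds the same np.where assignments in a dict comprehension:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Year': [2018, 2019, 2020] * 3,
    'Country': ['CHI'] * 3 + ['ALB'] * 3 + ['GER'] * 3,
    'Submitted': [1, np.nan, 1, 1, 1, 1, 1, np.nan, 1],
    'Age12': [267, np.nan, 244, 163, np.nan, 161, 451, np.nan, 361],
    'Age14': [np.nan, np.nan, 203, 165, np.nan, np.nan, 381, np.nan, 321],
})

missing = df['Submitted'].isna()
# Assumption: the columns to fill are exactly those whose names start with 'Age'
age_cols = [c for c in df.columns if c.startswith('Age')]
# Conditional ffill: forward-filled value where Submitted is NaN, original elsewhere
out = df.assign(**{c: np.where(missing, df[c].ffill(), df[c]) for c in age_cols})
print(out)
```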

You can use .filter() to select the related columns and put them in the list cols. Then use .mask() to replace the values of the selected columns with their forward-filled values (ffill()) wherever Submitted is NaN, as follows:

cols = df.filter(like='Age').columns

df[cols] = df[cols].mask(df['Submitted'].isna(), df[cols].ffill())

Result:

print(df)

   Year Country  Submitted  Age12  Age14
0  2018     CHI        1.0  267.0    NaN
1  2019     CHI        NaN  267.0    NaN
2  2020     CHI        1.0  244.0  203.0
3  2018     ALB        1.0  163.0  165.0
4  2019     ALB        1.0    NaN    NaN
5  2020     ALB        1.0  161.0    NaN
6  2018     GER        1.0  451.0  381.0
7  2019     GER        NaN  451.0  381.0
8  2020     GER        1.0  361.0  321.0
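A plain ffill() runs down the whole frame, so if a country's first year were unsubmitted it would inherit the previous country's last values. That never happens in the sample data, but a hedged variant that only fills within each Country (assuming rows are sorted by Year inside each country, as in the question) could look like this:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Year': [2018, 2019, 2020] * 3,
    'Country': ['CHI'] * 3 + ['ALB'] * 3 + ['GER'] * 3,
    'Submitted': [1, np.nan, 1, 1, 1, 1, 1, np.nan, 1],
    'Age12': [267, np.nan, 244, 163, np.nan, 161, 451, np.nan, 361],
    'Age14': [np.nan, np.nan, 203, 165, np.nan, np.nan, 381, np.nan, 321],
})

cols = list(df.filter(like='Age').columns)
# Forward-fill inside each country only; the result keeps the original index
within_country = df.groupby('Country')[cols].ffill()
# Use the filled values only where Submitted is missing
df[cols] = df[cols].mask(df['Submitted'].isna(), within_country)
print(df)
```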

I just used a for loop to check and update the values in the DataFrame:

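import numpy as np
import pandas as pd

# Use real missing values (np.nan), not the string 'NaN'
new_data = [[2018, 'CHI', 1, 267, 30], [2019, 'CHI', np.nan, np.nan, np.nan], [2020, 'CHI', 1, 244, 203]]
df = pd.DataFrame(new_data, columns=['Year', 'Country', 'Submitted', 'Age12', 'Age14'])

# Start from the first row's values
prevValue12 = df.iloc[0]['Age12']
prevValue14 = df.iloc[0]['Age14']
for index, row in df.iterrows():
    # Copy the previous row's values only when Submitted is missing
    if pd.isna(row['Submitted']):
        df.at[index, 'Age12'] = prevValue12
        df.at[index, 'Age14'] = prevValue14
    # Read back from df (row is a copy), so consecutive missing rows keep the carried value
    prevValue12 = df.at[index, 'Age12']
    prevValue14 = df.at[index, 'Age14']
print(df)

Output

   Year Country  Submitted  Age12  Age14
0  2018     CHI        1.0  267.0   30.0
1  2019     CHI        NaN  267.0   30.0
2  2020     CHI        1.0  244.0  203.0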