Replace missing values based on the value of a specific column in Python
I want to replace missing values based on the value of the Submitted column.
Here is what I have:
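Year | Country | Submitted | Age12 | Age14 |
---|---|---|---|---|
2018 | CHI | 1 | 267 | NaN |
2019 | CHI | NaN | NaN | NaN |
2020 | CHI | 1 | 244 | 203 |
2018 | ALB | 1 | 163 | 165 |
2019 | ALB | 1 | NaN | NaN |
2020 | ALB | 1 | 161 | NaN |
2018 | GER | 1 | 451 | 381 |
2019 | GER | NaN | NaN | NaN |
2020 | GER | 1 | 361 | 321 |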
This is what I want:
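Year | Country | Submitted | Age12 | Age14 |
---|---|---|---|---|
2018 | CHI | 1 | 267 | NaN |
2019 | CHI | NaN | 267 | NaN |
2020 | CHI | 1 | 244 | 203 |
2018 | ALB | 1 | 163 | 165 |
2019 | ALB | 1 | NaN | NaN |
2020 | ALB | 1 | 161 | NaN |
2018 | GER | 1 | 451 | 381 |
2019 | GER | NaN | 451 | 381 |
2020 | GER | 1 | 361 | 321 |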
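I tried df.fillna(axis=0, method='ffill'), but that replaces every NaN with the previous value. That is not what I want, because some values should stay NaN when Submitted is 1.
I only want to take the value from the previous row when the corresponding Submitted value is NaN.
Thanks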
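Try combining where with what you already did:
df = df.where(df.Submitted.notna(), df.ffill())
This replaces entries only in rows where Submitted is null. Here df.ffill() is equivalent to df.fillna(axis=0, method='ffill'), whose method argument is deprecated in recent pandas versions. Note that on the replaced rows Submitted itself is also forward-filled, because the whole row is taken from the filled frame.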
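As a self-contained sketch of this idea, the snippet below rebuilds the data from the question and applies where to the age columns only, so that Submitted keeps its NaN. The name age_cols is introduced here for illustration and is not part of the original answer.

```python
import numpy as np
import pandas as pd

# Data copied from the question.
df = pd.DataFrame({
    "Year": [2018, 2019, 2020] * 3,
    "Country": ["CHI"] * 3 + ["ALB"] * 3 + ["GER"] * 3,
    "Submitted": [1, np.nan, 1, 1, 1, 1, 1, np.nan, 1],
    "Age12": [267, np.nan, 244, 163, np.nan, 161, 451, np.nan, 361],
    "Age14": [np.nan, np.nan, 203, 165, np.nan, np.nan, 381, np.nan, 321],
})

# Illustrative name: the columns that should be filled. Submitted is left alone.
age_cols = ["Age12", "Age14"]

# Keep the original value where Submitted is present;
# otherwise take the forward-filled value from the rows above.
df[age_cols] = df[age_cols].where(df["Submitted"].notna(), df[age_cols].ffill())

print(df)
```

Rows with Submitted equal to 1 keep their NaN values, and only the two rows with a missing Submitted are filled.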
You can use np.where to do a conditional ffill():
import numpy as np
# The expression returns a new DataFrame; assign it back to keep the result
(
    df.assign(Age12=np.where(df.Submitted.isna(), df.Age12.ffill(), df.Age12))
      .assign(Age14=np.where(df.Submitted.isna(), df.Age14.ffill(), df.Age14))
)
You can use .filter() to select the related columns and store their names in cols. Then use .mask() to replace the values of the selected columns with their forward-filled values (ffill()) on the rows where Submitted is NaN, as follows:
cols = df.filter(like='Age').columns
df[cols] = df[cols].mask(df['Submitted'].isna(), df[cols].ffill())
Result:
print(df)
   Year Country  Submitted  Age12  Age14
0  2018     CHI        1.0  267.0    NaN
1  2019     CHI        NaN  267.0    NaN
2  2020     CHI        1.0  244.0  203.0
3  2018     ALB        1.0  163.0  165.0
4  2019     ALB        1.0    NaN    NaN
5  2020     ALB        1.0  161.0    NaN
6  2018     GER        1.0  451.0  381.0
7  2019     GER        NaN  451.0  381.0
8  2020     GER        1.0  361.0  321.0
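One caveat, offered as an assumption rather than something the question states: a plain ffill() does not know about countries, so if the first row of a country were unsubmitted it would receive the last values of the country above it. If the fill should stay within each country, a grouped forward fill is one possible sketch. The data below is hypothetical and chosen only to show the difference.

```python
import numpy as np
import pandas as pd

# Hypothetical data: the first GER row is unsubmitted, so an ungrouped
# ffill() would copy ALB's 2020 values into it.
df = pd.DataFrame({
    "Year": [2019, 2020, 2019, 2020],
    "Country": ["ALB", "ALB", "GER", "GER"],
    "Submitted": [1, 1, np.nan, 1],
    "Age12": [163, 161, np.nan, 361],
    "Age14": [165, 150, np.nan, 321],
})

cols = df.filter(like="Age").columns.tolist()

# Forward fill within each country only, so values never cross a country boundary.
filled = df.groupby("Country")[cols].ffill()

# Use the filled values only on rows where Submitted is missing.
df[cols] = df[cols].mask(df["Submitted"].isna(), filled)

print(df)
```

Here the GER 2019 row stays NaN because there is no earlier GER row to fill from.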
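I just used a for loop to check and update the values in the DataFrame:
import pandas as pd

# Note: this sample stores the string 'NaN', not a real missing value (np.nan)
new_data = [[2018, 'CHI', 1, 267, 30], [2019, 'CHI', 'NaN', 'NaN', 'NaN'], [2020, 'CHI', 1, 244, 203]]
df = pd.DataFrame(new_data, columns=['Year', 'Country', 'Submitted', 'Age12', 'Age14'])

prevValue12 = df.iloc[0]['Age12']
prevValue14 = df.iloc[0]['Age14']

for index, row in df.iterrows():
    # The string comparison matches the sample data above;
    # for real missing values use pd.isna(row['Submitted']) instead
    if row['Submitted'] == 'NaN':
        df.at[index, 'Age12'] = prevValue12
        df.at[index, 'Age14'] = prevValue14
    # Remember this row's values for the next iteration
    prevValue12 = row['Age12']
    prevValue14 = row['Age14']

print(df)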
Output:
   Year Country Submitted Age12 Age14
0  2018     CHI         1   267    30
1  2019     CHI       NaN   267    30
2  2020     CHI         1   244   203
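The question's data contains real missing values rather than the string 'NaN', so a comparison with == 'NaN' would never match there. A possible variant of the same loop, assuming np.nan is used for the gaps and that the values to carry forward are those of the most recent submitted row, could look like this (prev12 and prev14 are illustrative names):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    [[2018, "CHI", 1, 267, 30],
     [2019, "CHI", np.nan, np.nan, np.nan],
     [2020, "CHI", 1, 244, 203]],
    columns=["Year", "Country", "Submitted", "Age12", "Age14"],
)

# Values of the most recent submitted row; nothing is known before the first row.
prev12 = prev14 = np.nan

for index, row in df.iterrows():
    if pd.isna(row["Submitted"]):
        # Unsubmitted row: copy the values remembered from the last submitted row.
        df.at[index, "Age12"] = prev12
        df.at[index, "Age14"] = prev14
    else:
        # Submitted row: remember its values for later unsubmitted rows.
        prev12, prev14 = row["Age12"], row["Age14"]

print(df)
```

On large frames the vectorized where or mask approaches are usually preferable to iterrows(), which visits one row at a time.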