Fill missing values in one column based on another column
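I have two columns like this:
What I want to do: if the 'age' value is between 30 and 39, fill the missing age_band with 30. Likewise, if the 'age' value is between 80 and 89, fill the missing age_band with 80. How can I do this in a pandas DataFrame?
I tried the following, but the loop seems to run forever: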
for ages in data['age']:
    if 0<=ages<=9:
        data['age_band']= data['age_band'].fillna(0)
    elif 10<=ages<=19:
        data['age_band']= data['age_band'].fillna(10)
    elif 20<=ages<=29:
        data['age_band']= data['age_band'].fillna(20)
    elif 30<=ages<=39:
        data['age_band']= data['age_band'].fillna(30)
    elif 40<=ages<=49:
        data['age_band']= data['age_band'].fillna(40)
    elif 50<=ages<=59:
        data['age_band']= data['age_band'].fillna(50)
    elif 60<=ages<=69:
        data['age_band']= data['age_band'].fillna(60)
    elif 70<=ages<=79:
        data['age_band']= data['age_band'].fillna(70)
    elif 80<=ages<=89:
        data['age_band']= data['age_band'].fillna(80)
    elif 90<=ages<=99:
        data['age_band']= data['age_band'].fillna(90)
    elif 100<=ages<=109:
        data['age_band']= data['age_band'].fillna(100)
Please help.
Try this shortcut:
data['age_band'] = data['age_band'].fillna(data['age'] // 10 * 10).astype(int)
print(data)
# Output
   age  age_band
0   93        90
1   46        40
2   50        50
3   56        50
4   89        80
5   19        10
6   25        20
7   17        10
8   54        50
9   42        40
Setup:
import pandas as pd
import numpy as np
np.random.seed(2022)
data = pd.DataFrame({'age': np.random.randint(1, 111, 10), 'age_band': np.nan})
print(data)
# Output
   age  age_band
0   93       NaN
1   46       NaN
2   50       NaN
3   56       NaN
4   89       NaN
5   19       NaN
6   25       NaN
7   17       NaN
8   54       NaN
9   42       NaN
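A minimal sketch of why the shortcut works where the loop does not. Each pass of the loop calls fillna on the whole column with a single constant, so the first pass already fills every missing row with the band of the first age. Passing a Series to fillna instead lets pandas align on the index and give each missing row its own value. The sample frame below is hypothetical (it is not the asker's data), and `decade` is just an illustrative name.

```python
import numpy as np
import pandas as pd

# Hypothetical sample: two rows already have a band, three are missing.
data = pd.DataFrame({
    "age": [34, 87, 5, 61, 100],
    "age_band": [30, np.nan, np.nan, 60, np.nan],
})

# Floor each age to the start of its decade: 34 -> 30, 87 -> 80, 5 -> 0.
decade = data["age"] // 10 * 10

# fillna aligns on the index, so only the NaN rows take a value from `decade`;
# rows that already had a band are left as they were.
data["age_band"] = data["age_band"].fillna(decade).astype(int)
print(data)
```

The trailing astype(int) is only safe once no NaN remains, which is the case here because every row has an age.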
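The answer above only works when all age bands have the same width. pd.cut works for bands of any width.
You can also pass labels to pd.cut(). The example below assigns each age to a band and writes the result to the 'age_band' column.
bins lists the interval edges: 0-9 is one interval, 10-19 the next, and so on, with matching labels "0-9", "10-19", etc. pd.cut needs exactly one label per interval, so a final edge of infinity is added to go with the ">109" label, and include_lowest=True keeps age 0 in the first interval.
bins = [0, 9, 19, 29, 39, 49, 59, 69, 79, 89, 99, 109, float("inf")]
labels = ["0-9","10-19","20-29","30-39","40-49","50-59","60-69","70-79","80-89","90-99","100-109",">109"]
# 13 edges give 12 intervals, one per label
data['age_band'] = pd.cut(data['age'], bins=bins, labels=labels, include_lowest=True)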
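If the goal is still to fill only the missing rows with a numeric band, pd.cut can be combined with fillna. The following is a sketch under stated assumptions: the unequal band edges (0, 18, 65) and the sample frame are hypothetical and chosen only to show bands of different widths, and each band is labelled by its lower edge.

```python
import numpy as np
import pandas as pd

# Hypothetical sample: rows 1 and 4 already have a band, the rest are missing.
data = pd.DataFrame({
    "age": [4, 17, 18, 42, 64, 65, 93],
    "age_band": [np.nan, 0, np.nan, np.nan, 18, np.nan, np.nan],
})

# Hypothetical unequal-width bands: [0, 18), [18, 65), [65, inf).
edges = [0, 18, 65, np.inf]
starts = [0, 18, 65]  # label each band by its lower edge

# right=False closes every interval on the left, so age 18 falls in [18, 65).
band = pd.cut(data["age"], bins=edges, labels=starts, right=False)

# pd.cut returns a categorical Series; convert it to numbers before filling.
data["age_band"] = data["age_band"].fillna(band.astype(float)).astype(int)
print(data)
```

Ages outside the edges would come back as NaN from pd.cut, in which case the final astype(int) would fail; the open-ended last edge avoids that for this sample.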