Fill column hierarchically or recursively
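I want to create a new column, new_var, based on variable_1. If variable_1 is NA, use variable_2. If both are NA, leave it as NA.
Is there a smarter way than the one below? This solution wouldn't scale well if I had 4 or 5 variables.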
df['new_var'] = df['variable_1']
df.loc[(df['new_var'].isna()) & (df['variable_2'].notna()), 'new_var'] = df.loc[(df['new_var'].isna()) & (df['variable_2'].notna()), 'variable_2']
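It's hard to answer without sample data, but I think you can simply use pandas where, which keeps variable_1 where it is present and falls back to variable_2 elsewhere:
df['new_var'] = df['variable_1'].where(df['variable_1'].notna(), df['variable_2'])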
Using np.where:
import numpy as np
df['new_var'] = np.where(df['variable_1'].isna(), df['variable_2'], df['variable_1'])
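The np.where call above handles two columns. One way it might be extended to several columns is np.select, which picks the first choice whose condition is true for each row. This is only a sketch: the column names var1, var2, var3 follow the sample data used elsewhere in this thread, and the names conditions and choices are illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'var1': [3, np.nan, np.nan, np.nan],
                   'var2': [4, 8, np.nan, np.nan],
                   'var3': [9, 5, 6, np.nan]})

cols = ['var1', 'var2', 'var3']  # priority order, highest first

# One boolean mask and one value array per column, in priority order.
conditions = [df[c].notna() for c in cols]
choices = [df[c] for c in cols]

# np.select takes the first choice whose condition holds;
# rows where no column has a value fall back to NaN.
df['new_var'] = np.select(conditions, choices, default=np.nan)
```

Adding a fourth or fifth variable then only means extending cols.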
Using bfill, which addresses this part of the question: "The solution wouldn't scale up well if I had 4 or 5 variables."
cols = ['var1', 'var2', 'var3']
df['new_var'] = df[cols].bfill(axis=1)[cols[0]]
print(df)
# Output:
   var1  var2  var3  new_var
0   3.0   4.0   9.0      3.0
1   NaN   8.0   5.0      8.0
2   NaN   NaN   6.0      6.0
3   NaN   NaN   NaN      NaN
Setup:
import numpy as np
import pandas as pd
df = pd.DataFrame({'var1': [3, np.nan, np.nan, np.nan],
                   'var2': [4, 8, np.nan, np.nan],
                   'var3': [9, 5, 6, np.nan]})
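For reuse across several DataFrames, the bfill approach could be wrapped in a small helper. This is a minimal sketch; the function name first_non_null is hypothetical and not part of pandas.

```python
import numpy as np
import pandas as pd

def first_non_null(df, cols):
    """Return, per row, the first non-missing value among `cols` (in order)."""
    # bfill(axis=1) pulls the next non-missing value leftwards along each row,
    # so the first selected column ends up holding the first non-missing value.
    return df[cols].bfill(axis=1).iloc[:, 0]

df = pd.DataFrame({'var1': [3, np.nan, np.nan, np.nan],
                   'var2': [4, 8, np.nan, np.nan],
                   'var3': [9, 5, 6, np.nan]})

df['new_var'] = first_non_null(df, ['var1', 'var2', 'var3'])
```

The order of cols sets the priority, so reordering the list changes which column wins when several have values.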
Old answer: works for 2 variables only
Using fillna:
df['new_var'] = df['var1'].fillna(df['var2'])
print(df)
# Output:
   var1  var2  new_var
0   3.0   4.0      3.0
1   NaN   8.0      8.0
2   NaN   NaN      NaN
Setup:
df = pd.DataFrame({'var1': [3, np.nan, np.nan], 'var2': [4, 8, np.nan]})
Update
You can also use combine_first:
df['new_var'] = df['var1'].combine_first(df['var2'])
print(df)
# Output:
   var1  var2  new_var
0   3.0   4.0      3.0
1   NaN   8.0      8.0
2   NaN   NaN      NaN
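combine_first works on two Series at a time, but it can probably be chained over any number of columns with functools.reduce. The sketch below assumes the three-column sample data from the bfill example; the variable name cols is illustrative.

```python
from functools import reduce

import numpy as np
import pandas as pd

df = pd.DataFrame({'var1': [3, np.nan, np.nan, np.nan],
                   'var2': [4, 8, np.nan, np.nan],
                   'var3': [9, 5, 6, np.nan]})

cols = ['var1', 'var2', 'var3']  # priority order, highest first

# Start from the first column and fill its gaps from each later column in turn.
df['new_var'] = reduce(lambda acc, col: acc.combine_first(df[col]),
                       cols[1:], df[cols[0]])
```

This gives the same result as the bfill version on this data, at the cost of one pass per extra column.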