Fill column hierarchically or recursively

I want to create a new column [new_var] based on variable_1. If variable_1 is NA, variable_2 should be used instead. If both are NA, the value stays NA.

Is there a smarter way than the one below? The solution wouldn't scale up well if I had 4 or 5 variables.

df['new_var'] = df['variable_1']

df.loc[(df['new_var'].isna()) & (df['variable_2'].notna()), 'new_var'] = df.loc[(df['new_var'].isna()) & (df['variable_2'].notna()), 'variable_2']
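For comparison, here is a minimal sketch of how the question's own pattern could be stretched to more columns: start from the highest-priority column and, for each fallback column in turn, fill only the rows that are still NA. The sample values and the third column variable_3 are made up for illustration.

```python
import numpy as np
import pandas as pd

# Made-up sample data; variable_3 is a hypothetical third fallback column.
df = pd.DataFrame({
    'variable_1': [1.0, np.nan, np.nan, np.nan],
    'variable_2': [2.0, 5.0, np.nan, np.nan],
    'variable_3': [3.0, 6.0, 7.0, np.nan],
})

# Priority order: earlier columns win.
priority = ['variable_1', 'variable_2', 'variable_3']

# Start from the highest-priority column, then patch the remaining gaps
# one fallback column at a time.
df['new_var'] = df[priority[0]]
for col in priority[1:]:
    still_missing = df['new_var'].isna()
    df.loc[still_missing, 'new_var'] = df.loc[still_missing, col]

print(df)
```

This works, but it needs one pass per fallback column, which is what the answers below avoid.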

It's hard to answer without sample data, but I think you could simply use pandas.where:

df['new_var'] = df['variable_1'].where(df['variable_1'].notna(), df['variable_2'])
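A small check of how where behaves, on made-up data: Series.where(cond, other) keeps the values of the Series it is called on where cond is True and takes other everywhere else, so it has to be called on variable_1 with variable_2 as the fallback. Series.mask is the inverse and gives the same result with the condition flipped.

```python
import numpy as np
import pandas as pd

# Made-up sample data using the question's column names.
df = pd.DataFrame({'variable_1': [1.0, np.nan, np.nan],
                   'variable_2': [10.0, 20.0, np.nan]})

# where: keep variable_1 where it is not NA, otherwise take variable_2.
df['new_var'] = df['variable_1'].where(df['variable_1'].notna(), df['variable_2'])

# mask: replace variable_1 where it IS NA, using variable_2.
via_mask = df['variable_1'].mask(df['variable_1'].isna(), df['variable_2'])

print(df)
```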

Using np.where:

import numpy as np
df['new_var'] = np.where(df['variable_1'].isna(), df['variable_2'], df['variable_1'])
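np.where handles a single fallback. For a priority list of several columns, a sketch using np.select (a swapped-in technique, not part of the answer above) could look like this: np.select takes, in each row, the choice belonging to the first condition that is True, which matches "first non-NA column wins". The values are the same as in the three-column setup of the bfill answer.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'var1': [3, np.nan, np.nan, np.nan],
                   'var2': [4, 8, np.nan, np.nan],
                   'var3': [9, 5, 6, np.nan]})

cols = ['var1', 'var2', 'var3']  # priority order

# One condition/choice pair per column; the first True condition wins per row,
# and rows where every column is NA get the default.
conditions = [df[c].notna() for c in cols]
choices = [df[c] for c in cols]
df['new_var'] = np.select(conditions, choices, default=np.nan)

print(df)
```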

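To handle more columns ("The solution wouldn't scale up well if I had 4 or 5 variables"), use bfill:

Back-filling along the columns (axis=1) replaces each NA with the next non-NA value to its right, so after the fill the first column in cols holds the first non-NA value of each row, in the priority order given by cols.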

cols = ['var1', 'var2', 'var3']
df['new_var'] = df[cols].bfill(axis=1)[cols[0]]
print(df)

# Output:
   var1  var2  var3  new_var
0   3.0   4.0   9.0      3.0
1   NaN   8.0   5.0      8.0
2   NaN   NaN   6.0      6.0
3   NaN   NaN   NaN      NaN

Setup:

import numpy as np
import pandas as pd

df = pd.DataFrame({'var1': [3, np.nan, np.nan, np.nan],
                   'var2': [4, 8, np.nan, np.nan],
                   'var3': [9, 5, 6, np.nan]})
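The bfill approach can be wrapped in a small helper; coalesce below is a hypothetical name, not a pandas function. Using iloc[:, 0] instead of a column label avoids depending on the first column's name, and the second call shows that the order of cols is what sets the priority.

```python
import numpy as np
import pandas as pd


def coalesce(frame: pd.DataFrame, cols: list) -> pd.Series:
    """Hypothetical helper: first non-NA value per row, scanning cols left to right."""
    # bfill(axis=1) pulls values leftwards, so position 0 ends up
    # holding the first non-NA value of each row.
    return frame[cols].bfill(axis=1).iloc[:, 0]


df = pd.DataFrame({'var1': [3, np.nan, np.nan, np.nan],
                   'var2': [4, 8, np.nan, np.nan],
                   'var3': [9, 5, 6, np.nan]})

df['new_var'] = coalesce(df, ['var1', 'var2', 'var3'])
# Reversing the list reverses the priority: var3 now wins whenever it is present.
df['new_var_rev'] = coalesce(df, ['var3', 'var2', 'var1'])

print(df)
```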

Old answer: only works for 2 variables

Using fillna:

df['new_var'] = df['var1'].fillna(df['var2'])
print(df)

# Output:
   var1  var2  new_var
0   3.0   4.0      3.0
1   NaN   8.0      8.0
2   NaN   NaN      NaN

Setup:

import numpy as np
import pandas as pd

df = pd.DataFrame({'var1': [3, np.nan, np.nan], 'var2': [4, 8, np.nan]})
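Although labelled as the two-variable answer, fillna also extends to more columns by chaining one call per fallback column. A sketch, assuming the three-column setup from the bfill answer; functools.reduce builds the same chain for an arbitrary list of columns.

```python
from functools import reduce

import numpy as np
import pandas as pd

df = pd.DataFrame({'var1': [3, np.nan, np.nan, np.nan],
                   'var2': [4, 8, np.nan, np.nan],
                   'var3': [9, 5, 6, np.nan]})

cols = ['var1', 'var2', 'var3']  # priority order

# Explicit chain for three columns...
chained = df['var1'].fillna(df['var2']).fillna(df['var3'])

# ...or the same chain built with reduce, starting from the first column.
df['new_var'] = reduce(lambda acc, col: acc.fillna(df[col]), cols[1:], df[cols[0]])

print(df)
```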

Update

You can also use combine_first:

df['new_var'] = df['var1'].combine_first(df['var2'])
print(df)

# Output:
   var1  var2  new_var
0   3.0   4.0      3.0
1   NaN   8.0      8.0
2   NaN   NaN      NaN
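For columns of the same DataFrame, which share an index, fillna and combine_first give the same result. They differ when the two Series have different indexes, as in this made-up example: fillna keeps only the caller's index, while combine_first returns the union of both indexes.

```python
import numpy as np
import pandas as pd

# Two Series with different indexes (made-up values).
s1 = pd.Series([1.0, np.nan], index=[0, 1])
s2 = pd.Series([10.0, 20.0, 30.0], index=[0, 1, 2])

# fillna keeps s1's index: label 2 exists only in s2, so it is dropped.
filled = s1.fillna(s2)

# combine_first uses the union of both indexes: label 2 is kept.
combined = s1.combine_first(s2)

print(filled.tolist())    # [1.0, 20.0]
print(combined.tolist())  # [1.0, 20.0, 30.0]
```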