Calculate the average of two columns in pandas based on data availability (missing or NaN values in those columns)

I have a DataFrame df, shown below.

df:

player    goals_oct     goals_nov
messi     2             4
neymar    2             NaN
ronaldo   NaN           3
salah     NaN           NaN
levenoski 2             2

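I want to calculate the average goals for each player. When both values are available, this is the average of goals_oct and goals_nov; when only one is available, it is that value; when neither is available, it is NaN.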

Expected output:

player    goals_oct     goals_nov   avg_goals
messi     2             4           3
neymar    2             NaN         2 
ronaldo   NaN           3           3
salah     NaN           NaN         NaN
levenoski 2             2           2

I tried the code below, but it did not work:

conditions_g = [(df['goals_oct'].isnull() and df['goals_nov'].notnull()), 
              (df['goals_oct'].notnull() and df['goals_nov'].isnull())]

choices_g = [df['goals_nov'], df['goals_oct']]

df['avg_goals']=np.select(conditions_g, choices_g, default=(df['goals_oct']+df['goals_nov'])/2)

Just use mean(axis=1). It skips NaN values by default, and a row where every value is NaN yields NaN:

columns = df.columns[1:] # all columns except the first
df['avg_goal'] = df[columns].mean(axis=1)

Output:

>>> df
      player  goals_oct  goals_nov  avg_goal
0      messi        2.0        4.0       3.0
1     neymar        2.0        NaN       2.0
2    ronaldo        NaN        3.0       3.0
3      salah        NaN        NaN       NaN
4  levenoski        2.0        2.0       2.0
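
As a side note, the np.select attempt in the question most likely fails because Python's `and` tries to reduce each whole Series to a single True/False, which pandas rejects with "The truth value of a Series is ambiguous". Element-wise logic needs `&` instead. Below is a minimal sketch of a corrected version; the DataFrame is rebuilt from the sample data in the question, and `default_g` and `row_mean` are names introduced here only for illustration.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "player": ["messi", "neymar", "ronaldo", "salah", "levenoski"],
    "goals_oct": [2, 2, np.nan, np.nan, 2],
    "goals_nov": [4, np.nan, 3, np.nan, 2],
})

# Element-wise conditions: combine boolean Series with & rather than `and`.
conditions_g = [
    df["goals_oct"].isnull() & df["goals_nov"].notnull(),
    df["goals_oct"].notnull() & df["goals_nov"].isnull(),
]
choices_g = [df["goals_nov"], df["goals_oct"]]

# The default covers "both present" (plain average) and
# "both missing" (NaN + NaN stays NaN).
default_g = (df["goals_oct"] + df["goals_nov"]) / 2
df["avg_goals"] = np.select(conditions_g, choices_g, default=default_g)

# For comparison: the row-wise mean, which skips NaN by default.
row_mean = df[["goals_oct", "goals_nov"]].mean(axis=1)
```

Both approaches should give the same column here; mean(axis=1) is shorter and extends to more than two columns without extra conditions.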

Try this; it should work:

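# Nested np.where: handle "oct missing" first, then "nov missing", else average both.
# np.nan is used because the np.NaN alias was removed in NumPy 2.0.
df["avg_goals"] = np.where(df.goals_oct.isnull(),
                           np.where(df.goals_nov.isnull(), np.nan, df.goals_nov),
                           np.where(df.goals_nov.isnull(), df.goals_oct, (df.goals_oct + df.goals_nov) / 2))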

If you want to treat 0 as a missing value, convert 0 to np.nan first and then run the statement above.
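
One way to do that conversion is sketched below, assuming a 0 always means "no data" in the goal columns. The sample rows and the `goal_cols` name are hypothetical and only serve the illustration. The replacement is applied to a temporary copy, so the stored zeros are left untouched.

```python
import numpy as np
import pandas as pd

# Hypothetical data: here a 0 should count as "no data", not as zero goals.
df = pd.DataFrame({
    "player": ["messi", "levenoski", "salah"],
    "goals_oct": [2.0, 2.0, 0.0],
    "goals_nov": [4.0, 0.0, 0.0],
})

goal_cols = ["goals_oct", "goals_nov"]

# Replace 0 with NaN only for the averaging step, then take the
# row-wise mean, which skips NaN by default.
df["avg_goals"] = df[goal_cols].replace(0, np.nan).mean(axis=1)
```

With this data, levenoski's average uses only the October value, and salah's row, which has no usable values, comes out as NaN.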