Calculate the average of two columns in pandas based on data availability (missing or NaN values in those columns)
I have a df as shown below:
df:
player goals_oct goals_nov
messi 2 4
neymar 2 NaN
ronaldo NaN 3
salah NaN NaN
levenoski 2 2
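I want to calculate the average goals for each player. When both values are available, it should be the mean of goals_oct and goals_nov. When only one is available, it should be that value. When neither is available, it should be NaN.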
Expected output:
player goals_oct goals_nov avg_goals
messi 2 4 3
neymar 2 NaN 2
ronaldo NaN 3 3
salah NaN NaN NaN
levenoski 2 0 1
I tried the code below, but it didn't work:
conditions_g = [(df['goals_oct'].isnull() and df['goals_nov'].notnull()),
(df['goals_oct'].notnull() and df['goals_nov'].isnull())]
choices_g = [df['goals_nov'], df['goals_oct']]
df['avg_goals']=np.select(conditions_g, choices_g, default=(df['goals_oct']+df['goals_nov'])/2)
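The attempt fails because Python's and operator asks each boolean Series for a single truth value. pandas refuses this with "ValueError: The truth value of a Series is ambiguous". Below is a minimal sketch of the same np.select idea using the element-wise & operator instead. It rebuilds the sample data by hand and writes the missing values as np.nan:

```python
import numpy as np
import pandas as pd

# Sample data from the question (missing values written as np.nan)
df = pd.DataFrame({
    "player": ["messi", "neymar", "ronaldo", "salah", "levenoski"],
    "goals_oct": [2, 2, np.nan, np.nan, 2],
    "goals_nov": [4, np.nan, 3, np.nan, 2],
})

# '&' combines the boolean Series element by element; 'and' cannot
conditions_g = [
    df["goals_oct"].isnull() & df["goals_nov"].notnull(),  # only November available
    df["goals_oct"].notnull() & df["goals_nov"].isnull(),  # only October available
]
choices_g = [df["goals_nov"], df["goals_oct"]]

# Default covers the remaining rows: both present -> their mean,
# both missing -> NaN + NaN, which is NaN
df["avg_goals"] = np.select(conditions_g, choices_g,
                            default=(df["goals_oct"] + df["goals_nov"]) / 2)
print(df)
```

Rows with both values missing fall through to the default, where NaN + NaN is NaN. No third condition is needed.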
Just use mean(axis=1). It skips NaN values:
columns = df.columns[1:] # all columns except the first
df['avg_goal'] = df[columns].mean(axis=1)
Output:
>>> df
player goals_oct goals_nov avg_goal
0 messi 2.0 4.0 3.0
1 neymar 2.0 NaN 2.0
2 ronaldo NaN 3.0 3.0
3 salah NaN NaN NaN
4 levenoski 2.0 2.0 2.0
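Here is a self-contained version of the mean(axis=1) approach, assuming the sample frame above. It names the month columns explicitly instead of using df.columns[1:], because df.columns[1:] would also pick up any other column added to the frame later. The name month_cols is only illustrative:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "player": ["messi", "neymar", "ronaldo", "salah", "levenoski"],
    "goals_oct": [2, 2, np.nan, np.nan, 2],
    "goals_nov": [4, np.nan, 3, np.nan, 2],
})

# Select the month columns by name rather than by position
month_cols = ["goals_oct", "goals_nov"]

# DataFrame.mean skips NaN by default (skipna=True);
# a row with no values at all comes out as NaN
df["avg_goals"] = df[month_cols].mean(axis=1)
print(df)
```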
Try this; it handles each case explicitly:
import numpy as np

# Outer test: is goals_oct missing? The inner tests then check goals_nov.
df["avg_goals"] = np.where(df.goals_oct.isnull(),
    np.where(df.goals_nov.isnull(), np.nan, df.goals_nov),
    np.where(df.goals_nov.isnull(), df.goals_oct, (df.goals_oct + df.goals_nov) / 2))
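The nested np.where spells out the same case analysis that mean(axis=1) performs implicitly. The following quick check runs on the sample data (rebuilt here by hand) and confirms that the two approaches agree:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "player": ["messi", "neymar", "ronaldo", "salah", "levenoski"],
    "goals_oct": [2, 2, np.nan, np.nan, 2],
    "goals_nov": [4, np.nan, 3, np.nan, 2],
})

# Nested np.where from the answer above (np.nan, since np.NaN was removed in NumPy 2.0)
nested = np.where(df.goals_oct.isnull(),
                  np.where(df.goals_nov.isnull(), np.nan, df.goals_nov),
                  np.where(df.goals_nov.isnull(), df.goals_oct,
                           (df.goals_oct + df.goals_nov) / 2))

# Row-wise mean that skips NaN
row_mean = df[["goals_oct", "goals_nov"]].mean(axis=1).to_numpy()

# equal_nan=True lets the all-missing row (NaN in both results) count as a match
same = np.allclose(nested, row_mean, equal_nan=True)
print(same)  # True
```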
If you want to treat 0 as an empty value, convert 0 to np.nan first and then run the statement above; it will work.
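Here is a sketch of that idea, assuming zeros should be ignored only while computing the average so that the stored columns keep their zeros. It applies replace to a temporary selection and then uses mean(axis=1) from the first answer. Applying the nested np.where to the converted values would give the same numbers. The levenoski row uses the 2 and 0 from the expected output:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "player": ["messi", "neymar", "ronaldo", "salah", "levenoski"],
    "goals_oct": [2, 2, np.nan, np.nan, 2],
    "goals_nov": [4, np.nan, 3, np.nan, 0],
})

month_cols = ["goals_oct", "goals_nov"]

# replace returns a new frame, so df itself still holds the original 0
df["avg_goals"] = df[month_cols].replace(0, np.nan).mean(axis=1)
print(df)
```

With this convention the levenoski row averages to 2.0, not the 1 shown in the expected output, because its 0 is ignored rather than averaged in.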