Pandas mean across highest x columns

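I want to compute the mean of the 4 week columns, but if the number in the Top x column is less than 4, I only want to use the largest x values for the mean (i.e. if Top x = 3, the lowest week value is dropped before the mean is computed).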

Example dataframe:

import numpy as np
import pandas as pd

df = pd.DataFrame({'week 1' : [1.0, 5.0, 7.0, 6.0, np.nan],
                   'week 2' : [3.0, np.nan, 9.0, 8.0, np.nan],
                   'week 3' : [1.0, 2.0, 2.0, 1.0, 6.0],
                   'week 4' : [np.nan, 4.0, 2.0, 7.0, 6.0],
                   'Top x' : [3.0, 2.0, 4.0, 3.0, 3.0]})

     week 1  week 2  week 3  week 4  Top x
0     1.0     3.0     1.0     NaN    3.0
1     5.0     NaN     2.0     4.0    2.0
2     7.0     9.0     2.0     2.0    4.0
3     6.0     8.0     1.0     7.0    3.0
4     NaN     NaN     6.0     6.0    3.0

Expected output:

     week 1  week 2  week 3  week 4  Top x   Mean
0     1.0     3.0     1.0     NaN    3.0  1.666667
1     5.0     NaN     2.0     4.0    2.0  4.500000
2     7.0     9.0     2.0     2.0    4.0  5.000000
3     6.0     8.0     1.0     7.0    3.0  7.000000
4     NaN     NaN     6.0     6.0    3.0  6.000000

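I don't know whether there is a way to pass a function to pandas' mean(), or whether it would be simpler to sum the top x week values (maybe by turning each row into a list?) and divide by Top x.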

Use DataFrame.melt with DataFrame.sort_values first, then compare a counter created by GroupBy.cumcount with Top x and filter, and finally aggregate with mean:

df1 = df.melt('Top x', ignore_index=False).sort_values('value', ascending=False)

df['Mean'] = (df1[df1.groupby(level=0).cumcount().lt(df1['Top x'])]
                     .groupby(level=0)['value'].mean())
print (df)
   week 1  week 2  week 3  week 4  Top x      Mean
0     1.0     3.0     1.0     NaN    3.0  1.666667
1     5.0     NaN     2.0     4.0    2.0  4.500000
2     7.0     9.0     2.0     2.0    4.0  5.000000
3     6.0     8.0     1.0     7.0    3.0  7.000000
4     NaN     NaN     6.0     6.0    3.0  6.000000
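To make the chained expression above easier to follow, here is the same idea split into named steps. This is only an illustrative sketch: the names long_df, position, keep and step_mean are mine and do not appear in the answer, and the sample frame is rebuilt so the block runs on its own.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'week 1': [1.0, 5.0, 7.0, 6.0, np.nan],
                   'week 2': [3.0, np.nan, 9.0, 8.0, np.nan],
                   'week 3': [1.0, 2.0, 2.0, 1.0, 6.0],
                   'week 4': [np.nan, 4.0, 2.0, 7.0, 6.0],
                   'Top x': [3.0, 2.0, 4.0, 3.0, 3.0]})

# Long format: one row per (original row, week) pair; the original index is kept.
long_df = df.melt('Top x', ignore_index=False)

# Largest values first; sort_values places NaN at the end by default.
long_df = long_df.sort_values('value', ascending=False)

# Position of each value within its original row: 0 = largest, 1 = next, ...
position = long_df.groupby(level=0).cumcount()

# Keep a value only if its position is below that row's Top x.
keep = position.lt(long_df['Top x'])

# Mean of the kept values per original row; mean() skips any NaN that was kept.
step_mean = long_df[keep].groupby(level=0)['value'].mean()
print(step_mean)
```

Because NaN values are sorted to the end, they only receive a position after all real values of the same row, so a row with fewer than Top x valid weeks simply averages what it has.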

An alternative solution (it should be faster on large DataFrames) uses DataFrame.rank with DataFrame.where:

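# Rank the week values within each row (1 = largest); NaN values get no rank.
# Drop 'Top x' (and 'Mean', if the first solution already added it) so only the week columns are ranked.
df1 = df.drop(['Top x', 'Mean'], axis=1, errors='ignore')
df['Mean'] = (df1.where(df1.rank(axis=1, method='first', ascending=False)
                           .le(df['Top x'], axis=0))
                 .mean(axis=1))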
print (df)
   week 1  week 2  week 3  week 4  Top x      Mean
0     1.0     3.0     1.0     NaN    3.0  1.666667
1     5.0     NaN     2.0     4.0    2.0  4.500000
2     7.0     9.0     2.0     2.0    4.0  5.000000
3     6.0     8.0     1.0     7.0    3.0  7.000000
4     NaN     NaN     6.0     6.0    3.0  6.000000
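For comparison, the row-by-row idea from the question (take the top x values of each row, then average them) can be sketched with DataFrame.apply and Series.nlargest. This is not one of the solutions above, just a readable cross-check that is likely slower on large frames. The helper top_x_mean and the week_cols list are hypothetical names, and the sketch assumes Top x always holds a whole, non-missing number.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'week 1': [1.0, 5.0, 7.0, 6.0, np.nan],
                   'week 2': [3.0, np.nan, 9.0, 8.0, np.nan],
                   'week 3': [1.0, 2.0, 2.0, 1.0, 6.0],
                   'week 4': [np.nan, 4.0, 2.0, 7.0, 6.0],
                   'Top x': [3.0, 2.0, 4.0, 3.0, 3.0]})

# Only the week columns take part in the mean.
week_cols = ['week 1', 'week 2', 'week 3', 'week 4']

def top_x_mean(row):
    # Assumption: 'Top x' is a whole number and never NaN.
    k = int(row['Top x'])
    # Drop missing weeks, keep the k largest remaining values, then average them.
    return row[week_cols].dropna().nlargest(k).mean()

row_mean = df.apply(top_x_mean, axis=1)
print(row_mean)
```

On the sample data this gives the same values as the Mean column shown above.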