Pandas mean across highest x columns
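I would like to compute the mean of the 4 week columns, but if the number in the Top x column is less than 4, I only want to use the x largest values for the mean (i.e. if Top x = 3, the lowest week value is dropped when computing the mean).
Example DataFrame: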
import numpy as np
import pandas as pd

df = pd.DataFrame({'week 1': [1.0, 5.0, 7.0, 6.0, np.nan],
                   'week 2': [3.0, np.nan, 9.0, 8.0, np.nan],
                   'week 3': [1.0, 2.0, 2.0, 1.0, 6.0],
                   'week 4': [np.nan, 4.0, 2.0, 7.0, 6.0],
                   'Top x': [3.0, 2.0, 4.0, 3.0, 3.0]})
   week 1  week 2  week 3  week 4  Top x
0     1.0     3.0     1.0     NaN    3.0
1     5.0     NaN     2.0     4.0    2.0
2     7.0     9.0     2.0     2.0    4.0
3     6.0     8.0     1.0     7.0    3.0
4     NaN     NaN     6.0     6.0    3.0
Expected output:
   week 1  week 2  week 3  week 4  Top x      Mean
0     1.0     3.0     1.0     NaN    3.0  1.666667
1     5.0     NaN     2.0     4.0    2.0  4.500000
2     7.0     9.0     2.0     2.0    4.0  5.000000
3     6.0     8.0     1.0     7.0    3.0  7.000000
4     NaN     NaN     6.0     6.0    3.0  6.000000
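I don't know whether there is a way to add a function to the pandas.mean() function, or whether it would be simpler to sum the top x week columns (maybe by turning each row into a list?) and divide by the Top x column.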
Use DataFrame.melt with DataFrame.sort_values first, then compare the per-row counter from GroupBy.cumcount against Top x, filter, and finally aggregate with mean:
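# Long format, keeping the original index; largest values first (NaN sorts last)
df1 = df.melt('Top x', ignore_index=False).sort_values('value', ascending=False)
# Keep the first 'Top x' values of each original row, then average them
df['Mean'] = (df1[df1.groupby(level=0).cumcount().lt(df1['Top x'])]
                 .groupby(level=0)['value'].mean())
print(df)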
   week 1  week 2  week 3  week 4  Top x      Mean
0     1.0     3.0     1.0     NaN    3.0  1.666667
1     5.0     NaN     2.0     4.0    2.0  4.500000
2     7.0     9.0     2.0     2.0    4.0  5.000000
3     6.0     8.0     1.0     7.0    3.0  7.000000
4     NaN     NaN     6.0     6.0    3.0  6.000000
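To see why the filter works, it can help to look at the intermediate steps separately. The following is only a sketch of the same melt approach on the sample data; the names long_df, pos, kept and row1_values are illustrative and not part of the original answer.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'week 1': [1.0, 5.0, 7.0, 6.0, np.nan],
                   'week 2': [3.0, np.nan, 9.0, 8.0, np.nan],
                   'week 3': [1.0, 2.0, 2.0, 1.0, 6.0],
                   'week 4': [np.nan, 4.0, 2.0, 7.0, 6.0],
                   'Top x': [3.0, 2.0, 4.0, 3.0, 3.0]})

# Long format: one row per (original index, week), largest values first.
# ignore_index=False keeps the original row label on every melted row.
long_df = df.melt('Top x', ignore_index=False).sort_values('value', ascending=False)

# Position of each value within its original row: 0 = largest, 1 = next, ...
# NaN values sort last, so they receive the highest positions.
pos = long_df.groupby(level=0).cumcount()

# Keep only the first 'Top x' positions of every original row.
kept = long_df[pos.lt(long_df['Top x'])]

# Values that survive the filter for original row 1 (Top x = 2).
row1_values = sorted(kept.loc[1, 'value'].tolist(), reverse=True)
print(row1_values)  # [5.0, 4.0]

# mean() skips any NaN that was kept, e.g. in row 4.
means = kept.groupby(level=0)['value'].mean()
print(means)
```

Note that ignore_index is a relatively recent melt parameter (pandas 1.1 or later, as far as I know); on older versions the original index would have to be carried along another way.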
An alternative solution (which should be faster on large DataFrames) uses DataFrame.rank with DataFrame.where:
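# Rank the week values within each row (1 = largest) and keep ranks <= Top x
# (run on the original df, before a 'Mean' column has been added)
df1 = df.drop('Top x', axis=1)
df['Mean'] = (df1.where(df1.rank(axis=1, method='first', ascending=False)
                           .le(df['Top x'], axis=0))
                 .mean(axis=1))
print(df)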
   week 1  week 2  week 3  week 4  Top x      Mean
0     1.0     3.0     1.0     NaN    3.0  1.666667
1     5.0     NaN     2.0     4.0    2.0  4.500000
2     7.0     9.0     2.0     2.0    4.0  5.000000
3     6.0     8.0     1.0     7.0    3.0  7.000000
4     NaN     NaN     6.0     6.0    3.0  6.000000
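As a sanity check, the vectorised result can be compared against a plain row-wise version, which is closer to the "take the top x values of each row" idea in the question. This is a sketch only: top_x_mean and week_cols are hypothetical names, and the row-wise apply is likely to be much slower on large frames.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'week 1': [1.0, 5.0, 7.0, 6.0, np.nan],
                   'week 2': [3.0, np.nan, 9.0, 8.0, np.nan],
                   'week 3': [1.0, 2.0, 2.0, 1.0, 6.0],
                   'week 4': [np.nan, 4.0, 2.0, 7.0, 6.0],
                   'Top x': [3.0, 2.0, 4.0, 3.0, 3.0]})

week_cols = ['week 1', 'week 2', 'week 3', 'week 4']

def top_x_mean(row):
    """Mean of the 'Top x' largest non-missing week values in one row."""
    n = int(row['Top x'])
    # Drop NaN first so only real values compete for the top n slots.
    return row[week_cols].dropna().nlargest(n).mean()

row_wise = df.apply(top_x_mean, axis=1)

# rank/where version, restricted to the week columns for comparison.
weeks = df[week_cols]
ranked = (weeks.where(weeks.rank(axis=1, method='first', ascending=False)
                           .le(df['Top x'], axis=0))
               .mean(axis=1))

print(np.allclose(row_wise, ranked))  # True
```

Both versions divide by the number of values actually kept, not by Top x itself, which is why row 4 (two values, Top x = 3) gives 6.0 rather than 4.0.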