Sequentially extract top n from group depending on group value

Question

I am calculating some football statistics.

I have the following dataframe:

{'Player': {8: 'Darrel Williams',  2: 'Mark Ingram',  3: 'Michael Carter',  4: 'Najee Harris',  10: 'James Conner',  0: 'Buffalo Bills',  15: 'Davante Adams',  1: 'Aaron Rodgers',  5: 'Tyler Bass',  11: 'Corey Davis',  6: 'Van Jefferson',  14: 'Matt Ryan',  7: 'T.J. Hockenson',  9: 'Antonio Brown',  12: 'Alvin Kamara',  13: 'Tyler Boyd'}, 'Position': {8: 'RB',  2: 'RB',  3: 'RB',  4: 'RB',  10: 'RB',  0: 'DEF',  15: 'WR',  1: 'QB',  5: 'K',  11: 'WR',  6: 'WR',  14: 'QB',  7: 'TE',  9: 'WR',  12: 'RB',  13: 'WR'}, 'Score': {8: 24.9,  2: 18.8,  3: 16.2,  4: 15.3,  10: 13.9,  0: 12.0,  15: 11.3,  1: 10.48,  5: 9.0,  11: 8.8,  6: 6.9,  14: 1.68,  7: 0.0,  9: 0.0,  12: 0.0,  13: 0.0}}

Player	Position	Score
Darrel Williams	RB	24.9
Mark Ingram	RB	18.8
Michael Carter	RB	16.2
Najee Harris	RB	15.3
James Conner	RB	13.9
Buffalo Bills	DEF	12
Davante Adams	WR	11.3
Aaron Rodgers	QB	10.48
Tyler Bass	K	9
Corey Davis	WR	8.8
Van Jefferson	WR	6.9
Matt Ryan	QB	1.68
T.J. Hockenson	TE	0
Antonio Brown	WR	0
Alvin Kamara	RB	0
Tyler Boyd	WR	0

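Given the following requirements_dictionary, what I want to do is, for each key (a Position in the dataframe), extract the top n players by Score, where n is that key's value: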

requirements_dictionary = {'QB': 1, 'RB': 2, 'WR': 2, 'TE': 1, 'K': 1, 'DEF': 1, 'FLEX': 2}

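What makes this challenging is the final key, FLEX: it does not match any Position in the dataframe, because a FLEX slot can be filled by any one of the positions RB, WR, or TE.

The final output should look like this: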

Player	Position	Score
Darrel Williams	RB	24.9
Mark Ingram	RB	18.8
Michael Carter	RB	16.2
Najee Harris	RB	15.3
Buffalo Bills	DEF	12
Davante Adams	WR	11.3
Aaron Rodgers	QB	10.48
Tyler Bass	K	9
Corey Davis	WR	8.8
T.J. Hockenson	TE	0

That is the top 2 RB, 1 QB, 2 WR, 1 TE, 1 K, and 1 DEF, plus 2 FLEX.

I tried the following code, which gets me close:

all_points.groupby('Position')['Score'].nlargest(2)

Position    
DEF       0     12.00
K         5      9.00
QB        1     10.48
          14     1.68
RB        8     24.90
          2     18.80
TE        7      0.00
WR        15    11.30
          11     8.80
Name: Score, dtype: float64

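However, this does not account for the FLEX "position", and it takes the same n (2) for every group instead of the per-position count.

I could also loop through the dataframe and do this manually, but that seems very intensive.

How can I achieve the expected result?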

Answer 1

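Create a custom function that selects, for each group, the number of players your requirements call for, and keep the resulting index as idx_best. Then exclude all players already selected and select FLEX more players from the rest as idx_flex. Finally, extract the union of these two indexes.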

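FLEX = requirements_dictionary['FLEX']
# Top n rows of each Position group; x.name is the group key (the Position)
select_players = lambda x: x.nlargest(requirements_dictionary[x.name])

# Level 1 of the resulting MultiIndex holds the original row labels
idx_best = df.groupby('Position')['Score'].apply(select_players).index.get_level_values(1)
# Best FLEX rows among the players that were not selected above
idx_flex = df.loc[df.index.difference(idx_best), 'Score'].nlargest(FLEX).index

out = df.loc[idx_best.union(idx_flex)].sort_values('Score', ascending=False)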

Output:

>>> out
             Player Position  Score
8   Darrel Williams       RB  24.90
2       Mark Ingram       RB  18.80
3    Michael Carter       RB  16.20
4      Najee Harris       RB  15.30
0     Buffalo Bills      DEF  12.00
15    Davante Adams       WR  11.30
1     Aaron Rodgers       QB  10.48
5        Tyler Bass        K   9.00
11      Corey Davis       WR   8.80
7    T.J. Hockenson       TE   0.00
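
Note that idx_flex above is taken from all remaining players regardless of position; with this data the best remaining players simply happen to be RBs. If FLEX has to be limited to RB, WR, or TE as the question describes, a minimal sketch could filter the remaining rows first. The names flex_positions and rest are my own additions, and the dataframe is rebuilt here with a default 0..15 index only so the snippet runs on its own:

```python
import pandas as pd

# The question's data, rebuilt with a default integer index (assumption for self-containment)
df = pd.DataFrame({
    'Player': ['Darrel Williams', 'Mark Ingram', 'Michael Carter', 'Najee Harris',
               'James Conner', 'Buffalo Bills', 'Davante Adams', 'Aaron Rodgers',
               'Tyler Bass', 'Corey Davis', 'Van Jefferson', 'Matt Ryan',
               'T.J. Hockenson', 'Antonio Brown', 'Alvin Kamara', 'Tyler Boyd'],
    'Position': ['RB', 'RB', 'RB', 'RB', 'RB', 'DEF', 'WR', 'QB',
                 'K', 'WR', 'WR', 'QB', 'TE', 'WR', 'RB', 'WR'],
    'Score': [24.9, 18.8, 16.2, 15.3, 13.9, 12.0, 11.3, 10.48,
              9.0, 8.8, 6.9, 1.68, 0.0, 0.0, 0.0, 0.0],
})
requirements_dictionary = {'QB': 1, 'RB': 2, 'WR': 2, 'TE': 1, 'K': 1, 'DEF': 1, 'FLEX': 2}

FLEX = requirements_dictionary['FLEX']
flex_positions = ['RB', 'WR', 'TE']  # hypothetical name; positions eligible for FLEX

# Top n rows of each Position group; x.name is the group key
select_players = lambda x: x.nlargest(requirements_dictionary[x.name])
idx_best = df.groupby('Position')['Score'].apply(select_players).index.get_level_values(1)

# Players not selected yet, restricted to FLEX-eligible positions
rest = df.loc[df.index.difference(idx_best)]
idx_flex = rest.loc[rest['Position'].isin(flex_positions), 'Score'].nlargest(FLEX).index

out = df.loc[idx_best.union(idx_flex)].sort_values('Score', ascending=False)
print(out)
```

For the question's data this selects the same ten players as before; the filter only matters when a leftover QB, K, or DEF outscores the leftover RB/WR/TE players.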

Answer 2

Use the requirements dictionary to select the rows matching each position, sort them by Score, and take the head with as many rows as the dictionary value for that position. FLEX is then the top 2 among the RB, WR, and TE rows that were not already selected, and I concatenate the FLEX result. I find this approach more intuitive and logical.

import io

import pandas as pd

txt="""Player,Position,Score
Darrel Williams,RB,24.9
Mark Ingram,RB,18.8
Michael Carter,RB,16.2
Najee Harris,RB,15.3
Buffalo Bills,DEF,12
Davante Adams,WR,11.3
Aaron Rodgers,QB,10.48
Tyler Bass,K,9
Corey Davis,WR,8.8
T.J. Hockenson,TE,0"""

df = pd.read_csv(io.StringIO(txt),sep=',')
requirements_dictionary = {'QB': 1, 'RB': 2, 'WR': 2, 'TE': 1, 'K': 1, 'DEF': 1, 'FLEX': 2}
#print(df)

# Top n rows per real position; FLEX is handled separately below.
# DataFrame.append was removed in pandas 2.0, so collect the pieces and concat them.
top_parts = []
for position, n in requirements_dictionary.items():
    if position == 'FLEX':
        continue
    top_parts.append(df[df['Position'] == position].sort_values(by='Score', ascending=False).head(n))
df_top_rows = pd.concat(top_parts)
print(df_top_rows)

# FLEX: best RB/WR/TE rows among the players not selected yet
position='FLEX'
df_remaining = df.drop(df_top_rows.index)
df_flex_rows = df_remaining[df_remaining['Position'].isin(['RB','WR','TE'])].sort_values(by='Score', ascending=False).head(requirements_dictionary[position])

#print(df_flex_rows)
df_result=pd.concat([df_top_rows,df_flex_rows],axis=0)
print(df_result)

Output (final print):

            Player Position  Score
6    Aaron Rodgers       QB  10.48
0  Darrel Williams       RB  24.90
1      Mark Ingram       RB  18.80
5    Davante Adams       WR  11.30
8      Corey Davis       WR   8.80
9   T.J. Hockenson       TE   0.00
7       Tyler Bass        K   9.00
4    Buffalo Bills      DEF  12.00
2   Michael Carter       RB  16.20
3     Najee Harris       RB  15.30
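
If you would rather avoid the explicit loop, one possible alternative is to rank players within their position and compare that rank with the required count. This is only a sketch under my own assumptions: the names ranked, rank_in_pos, need, starters, and flex_positions are illustrative, and the data is a 13-row subset of the question's table.

```python
import pandas as pd

# Subset of the question's data (assumption: enough rows to show players being left out)
df = pd.DataFrame({
    'Player': ['Darrel Williams', 'Mark Ingram', 'Michael Carter', 'Najee Harris',
               'James Conner', 'Buffalo Bills', 'Davante Adams', 'Aaron Rodgers',
               'Tyler Bass', 'Corey Davis', 'Van Jefferson', 'Matt Ryan',
               'T.J. Hockenson'],
    'Position': ['RB', 'RB', 'RB', 'RB', 'RB', 'DEF', 'WR', 'QB',
                 'K', 'WR', 'WR', 'QB', 'TE'],
    'Score': [24.9, 18.8, 16.2, 15.3, 13.9, 12.0, 11.3, 10.48,
              9.0, 8.8, 6.9, 1.68, 0.0],
})
requirements_dictionary = {'QB': 1, 'RB': 2, 'WR': 2, 'TE': 1, 'K': 1, 'DEF': 1, 'FLEX': 2}
flex_positions = ['RB', 'WR', 'TE']  # hypothetical name; positions eligible for FLEX

# Best scores first; a stable sort keeps the original order among ties
ranked = df.sort_values('Score', ascending=False, kind='stable')

# 0-based rank of each player within their own position (0 = best)
rank_in_pos = ranked.groupby('Position').cumcount()
# Required number of players for each row's position
need = ranked['Position'].map(requirements_dictionary)

# Starters: players whose in-position rank is below the required count
starters = ranked[rank_in_pos < need]

# FLEX: best remaining players at an eligible position
rest = ranked.drop(starters.index)
flex = rest[rest['Position'].isin(flex_positions)].head(requirements_dictionary['FLEX'])

lineup = pd.concat([starters, flex]).sort_values('Score', ascending=False)
print(lineup)
```

Here James Conner, Van Jefferson, and Matt Ryan are left out: the first two because the two FLEX slots go to higher-scoring RBs, and Matt Ryan because QB is not a FLEX-eligible position.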

python

aggregate

pandas

pandas-groupby