Sequentially extract top n from group depending on group value

I am computing some football statistics.

I have the following dataframe:

{'Player': {8: 'Darrel Williams',  2: 'Mark Ingram',  3: 'Michael Carter',  4: 'Najee Harris',  10: 'James Conner',  0: 'Buffalo Bills',  15: 'Davante Adams',  1: 'Aaron Rodgers',  5: 'Tyler Bass',  11: 'Corey Davis',  6: 'Van Jefferson',  14: 'Matt Ryan',  7: 'T.J. Hockenson',  9: 'Antonio Brown',  12: 'Alvin Kamara',  13: 'Tyler Boyd'}, 'Position': {8: 'RB',  2: 'RB',  3: 'RB',  4: 'RB',  10: 'RB',  0: 'DEF',  15: 'WR',  1: 'QB',  5: 'K',  11: 'WR',  6: 'WR',  14: 'QB',  7: 'TE',  9: 'WR',  12: 'RB',  13: 'WR'}, 'Score': {8: 24.9,  2: 18.8,  3: 16.2,  4: 15.3,  10: 13.9,  0: 12.0,  15: 11.3,  1: 10.48,  5: 9.0,  11: 8.8,  6: 6.9,  14: 1.68,  7: 0.0,  9: 0.0,  12: 0.0,  13: 0.0}}
Player Position Score
Darrel Williams RB 24.9
Mark Ingram RB 18.8
Michael Carter RB 16.2
Najee Harris RB 15.3
James Conner RB 13.9
Buffalo Bills DEF 12
Davante Adams WR 11.3
Aaron Rodgers QB 10.48
Tyler Bass K 9
Corey Davis WR 8.8
Van Jefferson WR 6.9
Matt Ryan QB 1.68
T.J. Hockenson TE 0
Antonio Brown WR 0
Alvin Kamara RB 0
Tyler Boyd WR 0

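Given the following requirements_dictionary, what I want to do is extract the top n players by Score for each key (a Position in the dataframe), where n is the key's value: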

requirements_dictionary = {'QB': 1, 'RB': 2, 'WR': 2, 'TE': 1, 'K': 1, 'DEF': 1, 'FLEX': 2}

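The challenging part is the final key, FLEX: it does not match any single Position in the dataframe, because a FLEX slot can be filled by any one of RB, WR, or TE.

The final output should look like this: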

Player Position Score
Darrel Williams RB 24.9
Mark Ingram RB 18.8
Michael Carter RB 16.2
Najee Harris RB 15.3
Buffalo Bills DEF 12
Davante Adams WR 11.3
Aaron Rodgers QB 10.48
Tyler Bass K 9
Corey Davis WR 8.8
T.J. Hockenson TE 0

That is the top 2 RB, 1 QB, 2 WR, 1 TE, 1 K, and 1 DEF, plus 2 FLEX.

I tried the following code, which gets me close:

all_points.groupby('Position')['Score'].nlargest(2)

Position    
DEF       0     12.00
K         5      9.00
QB        1     10.48
          14     1.68
RB        8     24.90
          2     18.80
TE        7      0.00
WR        15    11.30
          11     8.80
Name: Score, dtype: float64

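However, this does not account for the FLEX "position", and it takes two rows from every position rather than the count given in the dictionary.

I could also loop through the dataframe and do this manually, but that seems very intensive.

How can I achieve the desired result?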

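Create a custom function that selects the required number of players for each group, and keep the resulting index as idx_best. Then exclude all players already selected and pick the top FLEX players from the rest as idx_flex. Finally, extract the union of these two indexes.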

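FLEX = requirements_dictionary['FLEX']
# Top n scores within a position, where n is that position's requirement
select_players = lambda x: x.nlargest(requirements_dictionary[x.name])

# Original row labels of the players that fill the dedicated slots
idx_best = df.groupby('Position')['Score'].apply(select_players).index.levels[1]
# Best remaining players, whatever their position, fill the FLEX slots
idx_flex = df.loc[df.index.difference(idx_best), 'Score'].nlargest(FLEX).index

out = df.loc[idx_best.union(idx_flex)].sort_values('Score', ascending=False)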

Output:

>>> out
             Player Position  Score
8   Darrel Williams       RB  24.90
2       Mark Ingram       RB  18.80
3    Michael Carter       RB  16.20
4      Najee Harris       RB  15.30
0     Buffalo Bills      DEF  12.00
15    Davante Adams       WR  11.30
1     Aaron Rodgers       QB  10.48
5        Tyler Bass        K   9.00
11      Corey Davis       WR   8.80
7    T.J. Hockenson       TE   0.00
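The FLEX step above takes the best remaining players from any position. With this data that happens to be two RBs, but the question limits FLEX to RB, WR, or TE. A minimal sketch of the same idea with that restriction added follows; the function name pick_lineup and the FLEX_POSITIONS list are illustrative names, not part of the original answer, and the dataframe is rebuilt from the question's data with a default index.

```python
import pandas as pd

# Data copied from the question (default RangeIndex instead of the original labels)
rows = [
    ('Darrel Williams', 'RB', 24.9), ('Mark Ingram', 'RB', 18.8),
    ('Michael Carter', 'RB', 16.2), ('Najee Harris', 'RB', 15.3),
    ('James Conner', 'RB', 13.9), ('Buffalo Bills', 'DEF', 12.0),
    ('Davante Adams', 'WR', 11.3), ('Aaron Rodgers', 'QB', 10.48),
    ('Tyler Bass', 'K', 9.0), ('Corey Davis', 'WR', 8.8),
    ('Van Jefferson', 'WR', 6.9), ('Matt Ryan', 'QB', 1.68),
    ('T.J. Hockenson', 'TE', 0.0), ('Antonio Brown', 'WR', 0.0),
    ('Alvin Kamara', 'RB', 0.0), ('Tyler Boyd', 'WR', 0.0),
]
df = pd.DataFrame(rows, columns=['Player', 'Position', 'Score'])
requirements_dictionary = {'QB': 1, 'RB': 2, 'WR': 2, 'TE': 1, 'K': 1, 'DEF': 1, 'FLEX': 2}
FLEX_POSITIONS = ['RB', 'WR', 'TE']  # assumption: eligibility as described in the question


def pick_lineup(df, requirements, flex_positions):
    """Hypothetical helper: dedicated slots first, then FLEX from eligible leftovers."""
    # Row labels of the top n players per position (n comes from the dictionary)
    idx_best = (
        df.groupby('Position')['Score']
        .apply(lambda s: s.nlargest(requirements[s.name]))
        .index.get_level_values(1)
    )
    # Players not yet selected, limited to FLEX-eligible positions
    remaining = df.loc[df.index.difference(idx_best)]
    eligible = remaining.loc[remaining['Position'].isin(flex_positions), 'Score']
    idx_flex = eligible.nlargest(requirements['FLEX']).index
    return df.loc[idx_best.union(idx_flex)].sort_values('Score', ascending=False)


out = pick_lineup(df, requirements_dictionary, FLEX_POSITIONS)
print(out)
```

On the question's data this selects the same ten players as the answer above. The restriction only changes the result when a leftover QB, K, or DEF outscores the leftover RB, WR, and TE players.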

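Use the requirements dictionary to select the rows that match each position, sort them by score, and take the head, with the dictionary value for that position as the row count. FLEX is the top 2 among RB, WR, and TE, and I concatenate the FLEX rows onto the result. I find this solution more intuitive and logical.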

import io
import pandas as pd

txt="""Player,Position,Score
Darrel Williams,RB,24.9
Mark Ingram,RB,18.8
Michael Carter,RB,16.2
Najee Harris,RB,15.3
Buffalo Bills,DEF,12
Davante Adams,WR,11.3
Aaron Rodgers,QB,10.48
Tyler Bass,K,9
Corey Davis,WR,8.8
T.J. Hockenson,TE,0"""

df = pd.read_csv(io.StringIO(txt), sep=',')
requirements_dictionary = {'QB': 1, 'RB': 2, 'WR': 2, 'TE': 1, 'K': 1, 'DEF': 1, 'FLEX': 2}

# Top n rows for each real position; FLEX is not a Position value, so it is handled below
top_rows = []
for position, n in requirements_dictionary.items():
    if position == 'FLEX':
        continue
    top_rows.append(df[df['Position'] == position].sort_values(by='Score', ascending=False).head(n))
# DataFrame.append was removed in pandas 2.0, so the pieces are combined with pd.concat
df_top_rows = pd.concat(top_rows)
print(df_top_rows)

# FLEX candidates: the top rows among RB, WR and TE
df_flex_rows = df[df['Position'].isin(['RB', 'WR', 'TE'])].sort_values(by='Score', ascending=False).head(requirements_dictionary['FLEX'])

# Combine both selections and drop rows that were picked twice
df_result = pd.concat([df_top_rows, df_flex_rows], axis=0)
df_result.drop_duplicates(inplace=True)
print(df_result)

Output

            Player Position  Score
6    Aaron Rodgers       QB  10.48
0  Darrel Williams       RB  24.90
1      Mark Ingram       RB  18.80
5    Davante Adams       WR  11.30
8      Corey Davis       WR   8.80
9   T.J. Hockenson       TE   0.00
7       Tyler Bass        K   9.00
4    Buffalo Bills      DEF  12.00
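Note that this output has eight rows, while the question's expected output has ten. The two FLEX candidates are Darrel Williams and Mark Ingram, who already hold the RB slots, so drop_duplicates removes them. One possible adjustment, sketched below under the assumption that FLEX should be filled from players not yet selected, is to drop the chosen rows before picking the FLEX rows. It uses the full data from the question rather than the shortened table above; the names ranked, leftover, and flex_positions are illustrative.

```python
import io
import pandas as pd

txt = """Player,Position,Score
Darrel Williams,RB,24.9
Mark Ingram,RB,18.8
Michael Carter,RB,16.2
Najee Harris,RB,15.3
James Conner,RB,13.9
Buffalo Bills,DEF,12
Davante Adams,WR,11.3
Aaron Rodgers,QB,10.48
Tyler Bass,K,9
Corey Davis,WR,8.8
Van Jefferson,WR,6.9
Matt Ryan,QB,1.68
T.J. Hockenson,TE,0
Antonio Brown,WR,0
Alvin Kamara,RB,0
Tyler Boyd,WR,0"""

df = pd.read_csv(io.StringIO(txt))
requirements_dictionary = {'QB': 1, 'RB': 2, 'WR': 2, 'TE': 1, 'K': 1, 'DEF': 1, 'FLEX': 2}
flex_positions = ['RB', 'WR', 'TE']  # assumption: eligibility as described in the question

# Sort once so that head(n) always means "best n by score"
ranked = df.sort_values(by='Score', ascending=False, kind='stable')

# Dedicated slots: top n rows for every real position
df_top_rows = pd.concat([
    ranked[ranked['Position'] == position].head(n)
    for position, n in requirements_dictionary.items()
    if position != 'FLEX'
])

# FLEX slots: best eligible players among those not selected yet
leftover = ranked.drop(df_top_rows.index)
df_flex_rows = leftover[leftover['Position'].isin(flex_positions)].head(requirements_dictionary['FLEX'])

df_result = pd.concat([df_top_rows, df_flex_rows])
print(df_result)
```

With this change the FLEX rows are Michael Carter and Najee Harris, which matches the expected output in the question, and no drop_duplicates call is needed because the two selections cannot overlap.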