根据组值依次从组中提取前 n 个
Sequentially extract top n from group depending on group value
我正在计算一些足球统计数据。
我有以下数据框:
{'Player': {8: 'Darrel Williams', 2: 'Mark Ingram', 3: 'Michael Carter', 4: 'Najee Harris', 10: 'James Conner', 0: 'Buffalo Bills', 15: 'Davante Adams', 1: 'Aaron Rodgers', 5: 'Tyler Bass', 11: 'Corey Davis', 6: 'Van Jefferson', 14: 'Matt Ryan', 7: 'T.J. Hockenson', 9: 'Antonio Brown', 12: 'Alvin Kamara', 13: 'Tyler Boyd'}, 'Position': {8: 'RB', 2: 'RB', 3: 'RB', 4: 'RB', 10: 'RB', 0: 'DEF', 15: 'WR', 1: 'QB', 5: 'K', 11: 'WR', 6: 'WR', 14: 'QB', 7: 'TE', 9: 'WR', 12: 'RB', 13: 'WR'}, 'Score': {8: 24.9, 2: 18.8, 3: 16.2, 4: 15.3, 10: 13.9, 0: 12.0, 15: 11.3, 1: 10.48, 5: 9.0, 11: 8.8, 6: 6.9, 14: 1.68, 7: 0.0, 9: 0.0, 12: 0.0, 13: 0.0}}
Player
Position
Score
Darrel Williams
RB
24.9
Mark Ingram
RB
18.8
Michael Carter
RB
16.2
Najee Harris
RB
15.3
James Conner
RB
13.9
Buffalo Bills
DEF
12
Davante Adams
WR
11.3
Aaron Rodgers
QB
10.48
Tyler Bass
K
9
Corey Davis
WR
8.8
Van Jefferson
WR
6.9
Matt Ryan
QB
1.68
T.J. Hockenson
TE
0
Antonio Brown
WR
0
Alvin Kamara
RB
0
Tyler Boyd
WR
0
鉴于以下 requirements_dictionary
,我要做的是为每个 key
(Position
在数据框中):
requirements_dictionary = {'QB': 1, 'RB': 2, 'WR': 2, 'TE': 1, 'K': 1, 'DEF': 1, 'FLEX': 2}
具有挑战性的是,对于最终的 key
、FLEX
,它与 dataframe 中的任何位置匹配,因为该值 可能是一个位置:RB, WR,
或TE
.
最终输出应如下所示:
Player
Position
Score
Darrel Williams
RB
24.9
Mark Ingram
RB
18.8
Michael Carter
RB
16.2
Najee Harris
RB
15.3
Buffalo Bills
DEF
12
Davante Adams
WR
11.3
Aaron Rodgers
QB
10.48
Tyler Bass
K
9
Corey Davis
WR
8.8
T.J. Hockenson
TE
0
既然是顶2 RB, 1 QB, 2 WR, 1 TE, 1 K, 1 DEF
又是2 FLEX
.
我尝试了以下让我接近的代码:
all_points.groupby('Position')['Score'].nlargest(2)
Position
DEF 0 12.00
K 5 9.00
QB 1 10.48
14 1.68
RB 8 24.90
2 18.80
TE 7 0.00
WR 15 11.30
11 8.80
Name: Score, dtype: float64
但是,这并没有说明FLEX
“位置”
我也可以循环遍历数据帧并手动执行此操作,但这似乎非常密集。
怎样才能达到预期的效果?
创建一个自定义函数,根据您的要求为每个组创建 select 玩家数量,并将此索引保持为 idx_best
。然后排除所有已经 selected 的玩家和 select FLEX 其他玩家作为 idx_flex
。最后提取这两个索引的并集。
FLEX = requirements_dictionary['FLEX']
select_players = lambda x: x.nlargest(requirements_dictionary[x.name])
idx_best = df.groupby('Position')['Score'].apply(select_players).index.levels[1]
idx_flex = df.loc[df.index.difference(idx_best), 'Score'].nlargest(FLEX).index
out = df.loc[idx_best.union(idx_flex)].sort_values('Score', ascending=False)
输出:
>>> out
Player Position Score
8 Darrel Williams RB 24.90
2 Mark Ingram RB 18.80
3 Michael Carter RB 16.20
4 Najee Harris RB 15.30
0 Buffalo Bills DEF 12.00
15 Davante Adams WR 11.30
1 Aaron Rodgers QB 10.48
5 Tyler Bass K 9.00
11 Corey Davis WR 8.80
7 T.J. Hockenson TE 0.00
使用需求字典获取等于某个位置的行,然后按分数排序并获得等于该位置的字典值的头部。 Flex 在 RB、WR、TE 中排名前 2。我连接弹性结果。我的解决方案更直观、更合乎逻辑
txt="""Player,Position,Score
Darrel Williams,RB,24.9
Mark Ingram,RB,18.8
Michael Carter,RB,16.2
Najee Harris,RB,15.3
Buffalo Bills,DEF,12
Davante Adams,WR,11.3
Aaron Rodgers,QB,10.48
Tyler Bass,K,9
Corey Davis,WR,8.8
T.J. Hockenson,TE,0"""
df = pd.read_csv(io.StringIO(txt),sep=',')
requirements_dictionary = {'QB': 1, 'RB': 2, 'WR': 2, 'TE': 1, 'K': 1, 'DEF': 1, 'FLEX': 2}
#print(df)
df_top_rows = pd.DataFrame()
for position in requirements_dictionary.keys():
df_top_rows = df_top_rows.append(df[df['Position'] == position].sort_values(by='Score', ascending=False).head(requirements_dictionary[position]))
print(df_top_rows)
position='FLEX'
df_flex_rows = df_top_rows.append(df[df['Position'].isin(['RB','WR','TE'])].sort_values(by='Score', ascending=False).head(requirements_dictionary[position]))
#print(df_flex_rows)
df_result=pd.concat([df_top_rows,df_flex_rows],axis=0)
df_result.drop_duplicates(inplace=True)
print(df_result)
输出
Player Position Score
6 Aaron Rodgers QB 10.48
0 Darrel Williams RB 24.90
1 Mark Ingram RB 18.80
5 Davante Adams WR 11.30
8 Corey Davis WR 8.80
9 T.J. Hockenson TE 0.00
7 Tyler Bass K 9.00
4 Buffalo Bills DEF 12.00
我正在计算一些足球统计数据。
我有以下数据框:
{'Player': {8: 'Darrel Williams', 2: 'Mark Ingram', 3: 'Michael Carter', 4: 'Najee Harris', 10: 'James Conner', 0: 'Buffalo Bills', 15: 'Davante Adams', 1: 'Aaron Rodgers', 5: 'Tyler Bass', 11: 'Corey Davis', 6: 'Van Jefferson', 14: 'Matt Ryan', 7: 'T.J. Hockenson', 9: 'Antonio Brown', 12: 'Alvin Kamara', 13: 'Tyler Boyd'}, 'Position': {8: 'RB', 2: 'RB', 3: 'RB', 4: 'RB', 10: 'RB', 0: 'DEF', 15: 'WR', 1: 'QB', 5: 'K', 11: 'WR', 6: 'WR', 14: 'QB', 7: 'TE', 9: 'WR', 12: 'RB', 13: 'WR'}, 'Score': {8: 24.9, 2: 18.8, 3: 16.2, 4: 15.3, 10: 13.9, 0: 12.0, 15: 11.3, 1: 10.48, 5: 9.0, 11: 8.8, 6: 6.9, 14: 1.68, 7: 0.0, 9: 0.0, 12: 0.0, 13: 0.0}}
Player | Position | Score |
---|---|---|
Darrel Williams | RB | 24.9 |
Mark Ingram | RB | 18.8 |
Michael Carter | RB | 16.2 |
Najee Harris | RB | 15.3 |
James Conner | RB | 13.9 |
Buffalo Bills | DEF | 12 |
Davante Adams | WR | 11.3 |
Aaron Rodgers | QB | 10.48 |
Tyler Bass | K | 9 |
Corey Davis | WR | 8.8 |
Van Jefferson | WR | 6.9 |
Matt Ryan | QB | 1.68 |
T.J. Hockenson | TE | 0 |
Antonio Brown | WR | 0 |
Alvin Kamara | RB | 0 |
Tyler Boyd | WR | 0 |
鉴于以下 requirements_dictionary
,我要做的是为每个 key
(Position
在数据框中):
requirements_dictionary = {'QB': 1, 'RB': 2, 'WR': 2, 'TE': 1, 'K': 1, 'DEF': 1, 'FLEX': 2}
具有挑战性的是,对于最终的 key
、FLEX
,它与 dataframe 中的任何位置匹配,因为该值 可能是一个位置:RB, WR,
或TE
.
最终输出应如下所示:
Player | Position | Score |
---|---|---|
Darrel Williams | RB | 24.9 |
Mark Ingram | RB | 18.8 |
Michael Carter | RB | 16.2 |
Najee Harris | RB | 15.3 |
Buffalo Bills | DEF | 12 |
Davante Adams | WR | 11.3 |
Aaron Rodgers | QB | 10.48 |
Tyler Bass | K | 9 |
Corey Davis | WR | 8.8 |
T.J. Hockenson | TE | 0 |
既然是顶2 RB, 1 QB, 2 WR, 1 TE, 1 K, 1 DEF
又是2 FLEX
.
我尝试了以下让我接近的代码:
all_points.groupby('Position')['Score'].nlargest(2)
Position
DEF 0 12.00
K 5 9.00
QB 1 10.48
14 1.68
RB 8 24.90
2 18.80
TE 7 0.00
WR 15 11.30
11 8.80
Name: Score, dtype: float64
但是,这并没有说明FLEX
“位置”
我也可以循环遍历数据帧并手动执行此操作,但这似乎非常密集。
怎样才能达到预期的效果?
创建一个自定义函数,根据您的要求为每个组创建 select 玩家数量,并将此索引保持为 idx_best
。然后排除所有已经 selected 的玩家和 select FLEX 其他玩家作为 idx_flex
。最后提取这两个索引的并集。
FLEX = requirements_dictionary['FLEX']
select_players = lambda x: x.nlargest(requirements_dictionary[x.name])
idx_best = df.groupby('Position')['Score'].apply(select_players).index.levels[1]
idx_flex = df.loc[df.index.difference(idx_best), 'Score'].nlargest(FLEX).index
out = df.loc[idx_best.union(idx_flex)].sort_values('Score', ascending=False)
输出:
>>> out
Player Position Score
8 Darrel Williams RB 24.90
2 Mark Ingram RB 18.80
3 Michael Carter RB 16.20
4 Najee Harris RB 15.30
0 Buffalo Bills DEF 12.00
15 Davante Adams WR 11.30
1 Aaron Rodgers QB 10.48
5 Tyler Bass K 9.00
11 Corey Davis WR 8.80
7 T.J. Hockenson TE 0.00
使用需求字典获取等于某个位置的行,然后按分数排序并获得等于该位置的字典值的头部。 Flex 在 RB、WR、TE 中排名前 2。我连接弹性结果。我的解决方案更直观、更合乎逻辑
txt="""Player,Position,Score
Darrel Williams,RB,24.9
Mark Ingram,RB,18.8
Michael Carter,RB,16.2
Najee Harris,RB,15.3
Buffalo Bills,DEF,12
Davante Adams,WR,11.3
Aaron Rodgers,QB,10.48
Tyler Bass,K,9
Corey Davis,WR,8.8
T.J. Hockenson,TE,0"""
df = pd.read_csv(io.StringIO(txt),sep=',')
requirements_dictionary = {'QB': 1, 'RB': 2, 'WR': 2, 'TE': 1, 'K': 1, 'DEF': 1, 'FLEX': 2}
#print(df)
df_top_rows = pd.DataFrame()
for position in requirements_dictionary.keys():
df_top_rows = df_top_rows.append(df[df['Position'] == position].sort_values(by='Score', ascending=False).head(requirements_dictionary[position]))
print(df_top_rows)
position='FLEX'
df_flex_rows = df_top_rows.append(df[df['Position'].isin(['RB','WR','TE'])].sort_values(by='Score', ascending=False).head(requirements_dictionary[position]))
#print(df_flex_rows)
df_result=pd.concat([df_top_rows,df_flex_rows],axis=0)
df_result.drop_duplicates(inplace=True)
print(df_result)
输出
Player Position Score
6 Aaron Rodgers QB 10.48
0 Darrel Williams RB 24.90
1 Mark Ingram RB 18.80
5 Davante Adams WR 11.30
8 Corey Davis WR 8.80
9 T.J. Hockenson TE 0.00
7 Tyler Bass K 9.00
4 Buffalo Bills DEF 12.00