How do I create a new pandas DataFrame by grouping on multiple variables?

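I'm having a hard time wrangling my data. In the time I've spent trying to solve this I could have built a new .csv file by hand, but I need to do it in code. I have a large dataset of baseball salaries for players going back 150 years. This is what my dataset looks like.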

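I want to create a new dataframe that adds up the individual player salaries for a given team in a given year, organized by team and by year. Using the following technique I came up with this: team_salaries_groupby_team = salaries.groupby(['teamID','yearID']).agg({'salary' : ['sum']}), which outputs this: my output. On screen it looks roughly like what I want, but I want a dataframe with three columns (plus the index on the left). I can't really do the kind of analysis I want with this output.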

Lastly, I also tried this method:

new_column = salaries['teamID'] + salaries['yearID'].astype(str)
salaries['teamyear'] = new_column
salaries
teamyear = salaries.groupby(['teamyear']).agg({'salary' : ['sum']})
print(teamyear)

Another output. It adds up the individual player salaries for each team in a given year, but now I don't know how to split the year back out into its own column. Help, please?

You just need reset_index().
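To illustrate why that is enough, here is a minimal sketch. It assumes a tiny made-up table (the name toy and its values are invented for illustration, not taken from the real dataset). Grouping by two keys moves both keys into the row index, and reset_index() moves them back into ordinary columns.

```python
import pandas as pd

# Toy data invented for illustration (not the asker's real dataset).
toy = pd.DataFrame({
    "teamID": ["ATL", "ATL", "BOS"],
    "yearID": [1985, 1985, 1985],
    "salary": [10000, 20000, 5000],
})

# Grouping by two keys yields a Series indexed by a (teamID, yearID) MultiIndex.
grouped = toy.groupby(["teamID", "yearID"])["salary"].sum()
print(list(grouped.index.names))

# reset_index() turns the index levels back into regular columns,
# leaving a plain three-column DataFrame with a default integer index.
flat = grouped.reset_index()
print(flat.columns.tolist())
```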

Here is some sample code:

import pandas as pd

# Sample rows: (yearID, teamID, igID, playerID, salary).
# DataFrame.append() was removed in pandas 2.0, so build the frame from a list of rows.
rows = [
    (1985, 'ATL', 'NL', 'A', 10000),
    (1985, 'ATL', 'NL', 'B', 20000),
    (1985, 'ATL', 'NL', 'A', 10000),
    (1985, 'ATL', 'NL', 'C', 5000),
    (1985, 'ATL', 'NL', 'B', 20000),
    (2016, 'ATL', 'NL', 'A', 100000),
    (2016, 'ATL', 'NL', 'B', 200000),
    (2016, 'ATL', 'NL', 'C', 50000),
    (2016, 'ATL', 'NL', 'A', 100000),
    (2016, 'ATL', 'NL', 'B', 200000),
]
salaries = pd.DataFrame(rows, columns=['yearID','teamID','igID','playerID','salary'])

After that, groupby and reset_index:

sample_df = salaries.groupby(['teamID', 'yearID']).salary.sum().reset_index() 
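For comparison, the agg({'salary' : ['sum']}) form from the question produces a two-level column header, which is part of why that output was awkward to analyse. Below is a sketch of two alternatives that should give the same flat three-column result, assuming a reasonably recent pandas (named aggregation needs 0.25 or later). The sample values and the column name total_salary are made up for illustration.

```python
import pandas as pd

# Small made-up sample in the same shape as the salaries table.
salaries = pd.DataFrame({
    "yearID": [1985, 1985, 2016, 2016],
    "teamID": ["ATL", "ATL", "ATL", "ATL"],
    "salary": [10000, 20000, 100000, 200000],
})

# Option 1: as_index=False keeps the group keys as ordinary columns,
# so no reset_index() call is needed afterwards.
totals = salaries.groupby(["teamID", "yearID"], as_index=False)["salary"].sum()

# Option 2: named aggregation gives the summed column an explicit name
# ("total_salary" is a hypothetical name chosen for this example).
named = (
    salaries.groupby(["teamID", "yearID"])
    .agg(total_salary=("salary", "sum"))
    .reset_index()
)

print(totals)
print(named)
```

Either result is a regular dataframe, so filtering or plotting by teamID and yearID works the same way as on sample_df.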

Is this what you are looking for?