How do I create new pandas dataframe by grouping multiple variables?
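I'm having a hard time organizing my data. In the time I've spent trying to figure this out I could have built a new .csv file by hand, but I need to do it in code. I have a large dataset of baseball salaries for players going back 150 years.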
This is what my dataset looks like.
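I want to create a new dataframe that adds up the individual player salaries for a given team in a given year, organized by team and year. Using the following technique I came up with this:

team_salaries_groupby_team = salaries.groupby(['teamID','yearID']).agg({'salary' : ['sum']})

Output: my output. On screen it looks somewhat like what I want, but I want a dataframe with three columns (plus the index on the left). I can't really do the kind of analysis I want with this output.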
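The reason the output above is awkward to work with: passing a list of functions (`['sum']`) makes pandas build two-level column labels such as `('salary', 'sum')`, while `teamID` and `yearID` end up in the row index rather than in columns. The following is a minimal sketch on hypothetical toy data (only the column names come from the question). It contrasts that call with pandas' named aggregation plus `as_index=False`, which yields three flat columns. The output name `total_salary` is an illustrative choice.

```python
import pandas as pd

# Hypothetical toy data with the same column names as the question
salaries = pd.DataFrame({
    'yearID': [1985, 1985, 1985, 1986],
    'teamID': ['ATL', 'ATL', 'BOS', 'ATL'],
    'salary': [100, 200, 300, 400],
})

# The question's call: a list of functions per column produces
# two-level column labels, and the group keys become the row index
nested = salaries.groupby(['teamID', 'yearID']).agg({'salary': ['sum']})
print(nested.columns.tolist())  # [('salary', 'sum')]

# Named aggregation with as_index=False keeps teamID and yearID as
# ordinary columns and gives the summed salary a single flat name
flat = salaries.groupby(['teamID', 'yearID'], as_index=False).agg(
    total_salary=('salary', 'sum')
)
print(flat)
```

`flat` has exactly the three columns the question asks for (`teamID`, `yearID`, `total_salary`) plus a default integer index.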
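Finally, I also tried this approach:

new_column = salaries['teamID'] + salaries['yearID'].astype(str)
salaries['teamyear'] = new_column
salaries
teamyear = salaries.groupby(['teamyear']).agg({'salary' : ['sum']})
print(teamyear)

Another output. This sums the individual player salaries for each team in a given year, but now I don't know how to separate the year back out into its own column. Please help?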
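If you do keep the combined `teamyear` key, the year can be split back out with string slicing. The sketch below runs on hypothetical toy data. It assumes every `yearID` has exactly four digits, so the last four characters of a key such as `'ATL1985'` are the year and everything before them is the team ID. Grouping on `teamID` and `yearID` directly avoids this step altogether.

```python
import pandas as pd

# Hypothetical toy data in the question's shape
salaries = pd.DataFrame({
    'yearID': [1985, 1985, 1986],
    'teamID': ['ATL', 'ATL', 'BOS'],
    'salary': [100, 200, 300],
})

# Rebuild the question's combined key, e.g. 'ATL1985'
salaries['teamyear'] = salaries['teamID'] + salaries['yearID'].astype(str)
teamyear = salaries.groupby('teamyear', as_index=False)['salary'].sum()

# Assumes every yearID has exactly four digits: the last four characters
# are the year and everything before them is the team ID
teamyear['teamID'] = teamyear['teamyear'].str[:-4]
teamyear['yearID'] = teamyear['teamyear'].str[-4:].astype(int)
result = teamyear[['teamID', 'yearID', 'salary']]
print(result)
```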
You just need reset_index().

Here is some sample code:
import pandas as pd

# Small sample in the same shape as the question's salaries data
rows = [
    (1985, 'ATL', 'NL', 'A', 10000),
    (1985, 'ATL', 'NL', 'B', 20000),
    (1985, 'ATL', 'NL', 'A', 10000),
    (1985, 'ATL', 'NL', 'C', 5000),
    (1985, 'ATL', 'NL', 'B', 20000),
    (2016, 'ATL', 'NL', 'A', 100000),
    (2016, 'ATL', 'NL', 'B', 200000),
    (2016, 'ATL', 'NL', 'C', 50000),
    (2016, 'ATL', 'NL', 'A', 100000),
    (2016, 'ATL', 'NL', 'B', 200000),
]
# DataFrame.append was removed in pandas 2.0, so build the frame in one call
salaries = pd.DataFrame(rows, columns=['yearID', 'teamID', 'igID', 'playerID', 'salary'])
Then groupby and reset_index:
sample_df = salaries.groupby(['teamID', 'yearID']).salary.sum().reset_index()
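To see why the line above works, here is a sketch that rebuilds the answer's sample rows. Summing a single column returns a Series indexed by the `(teamID, yearID)` pairs. `reset_index()` then moves those index levels back into ordinary columns. Passing `as_index=False` to `groupby` is an equivalent one-step alternative.

```python
import pandas as pd

# Same sample rows as the answer (column name igID kept as in the answer)
salaries = pd.DataFrame(
    [
        (1985, 'ATL', 'NL', 'A', 10000),
        (1985, 'ATL', 'NL', 'B', 20000),
        (1985, 'ATL', 'NL', 'A', 10000),
        (1985, 'ATL', 'NL', 'C', 5000),
        (1985, 'ATL', 'NL', 'B', 20000),
        (2016, 'ATL', 'NL', 'A', 100000),
        (2016, 'ATL', 'NL', 'B', 200000),
        (2016, 'ATL', 'NL', 'C', 50000),
        (2016, 'ATL', 'NL', 'A', 100000),
        (2016, 'ATL', 'NL', 'B', 200000),
    ],
    columns=['yearID', 'teamID', 'igID', 'playerID', 'salary'],
)

# Summing one column gives a Series whose index levels are the group keys
summed = salaries.groupby(['teamID', 'yearID']).salary.sum()
print(list(summed.index.names))  # ['teamID', 'yearID']

# reset_index() turns those index levels into ordinary columns
sample_df = summed.reset_index()
print(sample_df)

# as_index=False produces the same three-column frame in one step
alt = salaries.groupby(['teamID', 'yearID'], as_index=False)['salary'].sum()
```

With this sample, `sample_df` has one row per team and year: ATL 1985 sums to 65000 and ATL 2016 sums to 650000.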
Is this what you are looking for?