pandas: how to group rows into a new column
I have a simple dataset of the following form:
author_id,Publisher,Title
1,Archie Publications,Archie
1,Marvel,A-Team
1,NOW,The Green Hornet
2,Archie Publications,Betty & Veronica
2,Marvel,Absolute Carnage
2,NOW,Little Monsters
2,NOW,The Green Hornet
2,NOW,Kata
3,Archie Publications,Archie & Jughead
3,Marvel,Absolute Carnage
3,NOW,Fright Night
4,Archie Publications,Archie
4,Archie Publications,Jughead
4,Marvel,A+X
5,Marvel,A-Next
5,NOW,The Green Hornet
5,NOW,Little Monsters
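I can read it successfully using df3 = pandas.read_csv("comics.csv", index_col=['author_id', 'Publisher']), which gives me the (expected) DataFrame.
What I want is to "combine" rows into a new column, call it titles, so that the resulting DataFrame has the following form: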
author_id publisher titles
1 Archie Publications [Archie]
1 Marvel [A-Team]
1 NOW [The Green Hornet]
2 Archie Publications [Betty & Veronica]
2 Marvel [Absolute Carnage]
2 NOW [Little Monsters, The Green Hornet, Kata]
...
4 Archie Publications [Archie, Jughead]
...
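For example, author #2 has produced 3 titles with NOW, so I would like those titles listed together in the new "titles" column.
I don't know how to do this kind of transformation. Any suggestions?
I think this is what you want: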
df.groupby(['author_id', 'Publisher']).agg({'Title': list})
                                Title
author_id  Publisher
1          Archie Publications  [Archie]
           Marvel               [A-Team]
           NOW                  [The Green Hornet]
2          Archie Publications  [Betty & Veronica]
           Marvel               [Absolute Carnage]
           NOW                  [Little Monsters, The Green Hornet, Kata]
3          Archie Publications  [Archie & Jughead]
           Marvel               [Absolute Carnage]
           NOW                  [Fright Night]
4          Archie Publications  [Archie, Jughead]
           Marvel               [A+X]
5          Marvel               [A-Next]
           NOW                  [The Green Hornet, Little Monsters]
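If you want the flat layout from the question (ordinary author_id and Publisher columns plus a column literally named titles), something along these lines should work. This is only a sketch: it embeds the sample data inline with io.StringIO instead of reading comics.csv, and the variable names csv_text, result and result_from_index are made up for illustration. The second variant assumes the frame was read with index_col as in the question, so it groups on the index levels instead of on columns.

```python
import io

import pandas as pd

# Sample data from the question, embedded inline so the example is self-contained
# (the question reads the same content from "comics.csv").
csv_text = """author_id,Publisher,Title
1,Archie Publications,Archie
1,Marvel,A-Team
1,NOW,The Green Hornet
2,Archie Publications,Betty & Veronica
2,Marvel,Absolute Carnage
2,NOW,Little Monsters
2,NOW,The Green Hornet
2,NOW,Kata
3,Archie Publications,Archie & Jughead
3,Marvel,Absolute Carnage
3,NOW,Fright Night
4,Archie Publications,Archie
4,Archie Publications,Jughead
4,Marvel,A+X
5,Marvel,A-Next
5,NOW,The Green Hornet
5,NOW,Little Monsters
"""

# Variant 1: plain columns, no index_col.
df = pd.read_csv(io.StringIO(csv_text))

# Named aggregation collects each group's Title values into a list and
# names the resulting column "titles"; reset_index turns the group keys
# back into ordinary columns.
result = (
    df.groupby(["author_id", "Publisher"])
    .agg(titles=("Title", list))
    .reset_index()
)
print(result)

# Variant 2: the frame was read with index_col, as in the question,
# so the grouping keys are index levels rather than columns.
df3 = pd.read_csv(io.StringIO(csv_text), index_col=["author_id", "Publisher"])
result_from_index = (
    df3.groupby(level=["author_id", "Publisher"])["Title"]
    .agg(list)
    .rename("titles")
    .reset_index()
)
print(result_from_index)
```

Both variants keep the titles in the order they appear in the file within each group. Note that list-valued cells are fine for display, but they are awkward to filter or join on, so keep the original long-format frame around if you need to query individual titles later.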