How to put an element into multiple categories in python/pandas?
For example,
input.csv looks like this:
Song Name | Genre |
---|---|
7 Rings | 'dance pop', 'pop', 'post-teen pop' |
Run | 'dance pop', 'piano rock', 'pop', 'pop rock' |
Dance Monkey | 'australian pop', 'pop' |
All Of Me | 'neo soul', 'pop', 'pop soul', 'r&b', 'urban contemporary' |
I want to group it in a way that gives me something like:
pop: ['7 Rings', 'Run', 'Dance Monkey', 'All Of Me']
dance pop: ['7 Rings', 'Run']
r&b: ['All Of Me']
Or even put it into another table/dataframe/csv, e.g.:
pop | dance pop | r&b | neo soul | pop rock |
---|---|---|---|---|
7 Rings | 7 Rings | All Of Me | All Of Me | Run |
Run | Run | | | |
Dance Monkey | | | | |
All Of Me | | | | |
Is there a way to do this?
Edit:
I tried mozway's suggestion and got a table that looks like this:
genreExplode=df.explode('Genre').assign(index=lambda d: d.groupby('Genre').cumcount()).pivot(index='index', columns='Genre', values='Song Name').fillna('')
genreExplode.head()
Genre | 'dance pop', 'pop', 'post-teen pop' | 'dance pop', 'piano rock', 'pop', 'pop rock' | 'australian pop', 'pop' | 'neo soul', 'pop', 'pop soul', 'r&b', 'urban contemporary' |
---|---|---|---|---|
index | | | | |
0 | 7 Rings | Run | Dance Monkey | All Of Me |
Edit 2:
Found the problem: the objects in the Genre column look like lists, but they are actually strings.
import ast  # found this online; parses strings that contain Python literals
genrelist = df['Genre'].tolist()  # first make a list of the Genre column
genrelist_new = []  # new list to hold the real lists
for x in genrelist:
    # each entry is a string that only looks like a list; convert it into a real list
    x = ast.literal_eval(x)
    genrelist_new.append(x)  # collect the converted list
df['Genre'] = genrelist_new  # replace the column of list-like strings with real lists
genreExplode = df.explode('Genre').assign(index=lambda d: d.groupby('Genre').cumcount()).pivot(index='index', columns='Genre', values='Song Name').fillna('')
genreExplode.head()  # this result is what I was looking for!
Solution: convert the strings into real lists, so that the Genre column is a list of lists.
Then @mozway's code works perfectly.
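The conversion loop above can probably be condensed into a single `apply`. This is a minimal sketch, assuming every Genre cell is a string holding a valid Python literal such as "['dance pop', 'pop']"; the two-row frame is made up for illustration and is not the real input.csv.

```python
import ast

import pandas as pd

# Hypothetical sample: Genre cells are strings that only look like lists.
df = pd.DataFrame({
    "Song Name": ["7 Rings", "Dance Monkey"],
    "Genre": [
        "['dance pop', 'pop', 'post-teen pop']",
        "['australian pop', 'pop']",
    ],
})

# Parse every cell into a real Python list.
df["Genre"] = df["Genre"].apply(ast.literal_eval)

print(type(df.loc[0, "Genre"]))  # <class 'list'>
```

If the data comes straight from the file, passing `converters={'Genre': ast.literal_eval}` to `pd.read_csv` should do the same parsing at load time, provided the cells are quoted in the CSV so that the commas inside them are not treated as separators.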
Assuming "Genre" contains lists (e.g., ['dance pop', 'pop', 'post-teen pop']), you can explode and pivot:
(df.explode('Genre')
   .assign(index=lambda d: d.groupby('Genre').cumcount())
   .pivot(index='index', columns='Genre', values='Song Name')
   .fillna('')
)
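Output:
Genre | australian pop | dance pop | neo soul | piano rock | pop | pop rock | pop soul | post-teen pop | r&b | urban contemporary |
---|---|---|---|---|---|---|---|---|---|---|
index | | | | | | | | | | |
0 | Dance Monkey | 7 Rings | All Of Me | Run | 7 Rings | Run | All Of Me | 7 Rings | All Of Me | All Of Me |
1 | | Run | | | Run | | | | | |
2 | | | | | Dance Monkey | | | | | |
3 | | | | | All Of Me | | | | | |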
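For the first form asked for in the question (each genre mapped to its list of songs), a `groupby` after the `explode` may be enough. This is a minimal sketch, assuming Genre already holds real lists and the song column is named 'Song Name'; the variable name `songs_by_genre` is only illustrative.

```python
import pandas as pd

# Sample data from the question, with Genre as real lists.
df = pd.DataFrame({
    "Song Name": ["7 Rings", "Run", "Dance Monkey", "All Of Me"],
    "Genre": [
        ["dance pop", "pop", "post-teen pop"],
        ["dance pop", "piano rock", "pop", "pop rock"],
        ["australian pop", "pop"],
        ["neo soul", "pop", "pop soul", "r&b", "urban contemporary"],
    ],
})

# One row per (song, genre) pair, then collect the songs of each genre.
songs_by_genre = (
    df.explode("Genre")
      .groupby("Genre")["Song Name"]
      .agg(list)
      .to_dict()
)

print(songs_by_genre["pop"])        # ['7 Rings', 'Run', 'Dance Monkey', 'All Of Me']
print(songs_by_genre["dance pop"])  # ['7 Rings', 'Run']
print(songs_by_genre["r&b"])        # ['All Of Me']
```

The dictionary suits lookups by genre, while the pivoted table above is the better fit when the result should be written back out as a csv.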