groupby function with filter conditions
import pandas as pd

f = pd.DataFrame({'Movie': ['name1', 'name2', 'name3'],
                  'genre': [['comedy', 'action'], ['comedy', 'scifi'],
                            ['thriller', 'action']],
                  'distributor': ['disney', 'disney', 'paramount']})
# genre can hold multiple values, so a movie belongs to both genre[0] and genre[1]; this is my groupby attempt
res = f[f['distributor'] == 'disney'].groupby(['genre'])
Expected output
I only want the movies released by Disney:
distributor genre count of movies
disney action 1
disney comedy 2
disney scifi 1
Explode your lists, then count the values:
out = df.loc[df['distributor'] == 'disney', 'genre'].explode().value_counts()
print(out)
# Output
comedy 2
action 1
scifi 1
Name: genre, dtype: int64
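To see why this works, it may help to look at the intermediate result of explode before counting. The following is a minimal, self-contained sketch that rebuilds the question's frame (assuming the genre column really holds Python lists, and spelling the third distributor 'paramount'); the names disney_genres, exploded and counts are only illustrative.

```python
import pandas as pd

# Sample frame from the question, assuming 'genre' holds real Python lists.
df = pd.DataFrame({
    'Movie': ['name1', 'name2', 'name3'],
    'genre': [['comedy', 'action'], ['comedy', 'scifi'], ['thriller', 'action']],
    'distributor': ['disney', 'disney', 'paramount'],
})

# Keep only the genre lists of Disney movies.
disney_genres = df.loc[df['distributor'] == 'disney', 'genre']

# explode() puts each list element on its own row and repeats the original index.
exploded = disney_genres.explode()
print(exploded)

# Once every row holds a single genre, counting is straightforward.
counts = exploded.value_counts()
print(counts)
```

Because the original index is repeated for every list element, each exploded row can still be traced back to its movie.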
Update
out = (df.explode('genre').query("distributor == 'disney'")
.value_counts(['distributor', 'genre'], sort=False)
.rename('count').reset_index())
print(out)
# Output
distributor genre count
0 disney action 1
1 disney comedy 2
2 disney scifi 1
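Since the question asks about groupby specifically, the same table can probably also be produced with groupby and size once the lists are exploded. This is a sketch of an alternative, not the answerer's code, and it assumes the same sample frame as above with genre stored as lists.

```python
import pandas as pd

# Sample frame from the question, assuming 'genre' holds real Python lists.
df = pd.DataFrame({
    'Movie': ['name1', 'name2', 'name3'],
    'genre': [['comedy', 'action'], ['comedy', 'scifi'], ['thriller', 'action']],
    'distributor': ['disney', 'disney', 'paramount'],
})

out = (
    df.explode('genre')                    # one row per (movie, genre) pair
      .query("distributor == 'disney'")    # keep Disney movies only
      .groupby(['distributor', 'genre'])   # group keys are sorted by default
      .size()                              # number of rows in each group
      .reset_index(name='count')           # back to a flat DataFrame
)
print(out)
```

Grouping only works after the explode step because list values are not hashable and cannot serve as group keys.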
Update 2
Your genre column seems to contain strings rather than lists. Before using the code above, try converting this column to lists with ast.literal_eval:
import ast
df['genre'] = df['genre'].str.replace(';', ',').apply(ast.literal_eval)
# OR
df['genre'] = pd.eval(df['genre'].str.replace(';', ','))
# Execute now df.explode(...)...
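As a quick check of the conversion step, here is a small sketch on made-up data. The frame raw and its semicolon-separated strings are hypothetical stand-ins for what the real column might look like; only the replace and ast.literal_eval calls come from the answer above.

```python
import ast

import pandas as pd

# Hypothetical data: each genre cell is a string that looks like a list,
# with ';' used as the separator instead of ','.
raw = pd.DataFrame({
    'Movie': ['name1', 'name2'],
    'genre': ["['comedy'; 'action']", "['comedy'; 'scifi']"],
    'distributor': ['disney', 'disney'],
})

# Turn ';' into ',' so each string is a valid Python literal, then parse it.
raw['genre'] = (
    raw['genre']
    .str.replace(';', ',', regex=False)  # plain substring replacement
    .apply(ast.literal_eval)             # safely evaluates literals only
)

print(raw['genre'].tolist())
print(type(raw.loc[0, 'genre']))
```

ast.literal_eval only accepts Python literals, so it is a safer choice than eval for text read from a file.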
With datar, which re-imagines pandas' API, this is simple and straightforward:
>>> import pandas as pd
>>> df = pd.DataFrame({'Movie': ['name1','name2','name3'],
... 'genre': [['comedy', 'action'], ['comedy','scifi'],
... ['thriller','action']],
... 'distributor': ['disney', 'disney','paramount']})
>>> df
Movie genre distributor
0 name1 [comedy, action] disney
1 name2 [comedy, scifi] disney
2 name3 [thriller, action] paramount
>>>
>>> from datar.all import f, filter, unchop, count
[2022-03-31 11:47:44][datar][WARNING] Builtin name "filter" has been overriden by datar.
>>> (
... df
... >> filter(f.distributor == "disney")
... >> unchop(f.genre)
... >> count(f.distributor, f.genre)
... )
distributor genre n
<object> <object> <int64>
0 disney comedy 2
1 disney action 1
2 disney scifi 1
[TibbleGrouped: distributor (n=1)]