How can I find the count of the most frequent and least frequent using Pandas?
Question: How can I find how many times the most frequent and the least frequent values occur?
The output I want is:
cast count
Alan Marriott 100
Jandino Asporaa 78
...
Peter 1
Attempt #1:
df.groupby(by=['cast','show_id']).count()
Output:
cast show_id type title director country date_added release_year rating duration listed_in description
4Minute 80161826 1 1 0 1 1 1 1 1 1 1
50 Cent 70199239 1 1 1 1 1 1 1 1 1 1
A.J LoCascio 80141858 1 1 1 1 1 1 1 1 1 1
Attempt #2:
df.groupby(cast)[show_id].count()
Output:
NameError: name 'cast' is not defined
Attempt #3:
df.groupby(by='cast')
Output:
<pandas.core.groupby.generic.DataFrameGroupBy object at 0x7f2f3894bcd0>
Sample of the dataset:
import pandas as pd

# float('nan') stands in for the bare NaN of the original sample,
# which is not a defined name in plain Python.
df = pd.DataFrame({
    'show_id': ['81145628', '80117401', '70234439'],
    'type': ['Movie', 'Movie', 'TV Show'],
    'title': ['Norm of the North: King Sized Adventure',
              'Jandino: Whatever it Takes',
              'Transformers Prime'],
    'director': ['Richard Finn, Tim Maltby', float('nan'), float('nan')],
    'cast': ['Alan Marriott, Andrew Toth, Brian Dobson',
             'Jandino Asporaat', 'Peter Cullen, Sumalee Montano, Frank Welker'],
    'country': ['United States, India, South Korea, China',
                'United Kingdom', 'United States'],
    'date_added': ['September 9, 2019',
                   'September 9, 2016',
                   'September 8, 2018'],
    'release_year': ['2019', '2016', '2013'],
    'rating': ['TV-PG', 'TV-MA', 'TV-Y7-FV'],
    'duration': ['90 min', '94 min', '1 Season'],
    'listed_in': ['Children & Family Movies, Comedies',
                  'Stand-Up Comedy', 'Kids TV'],
    'description': ['Before planning an awesome wedding for his',
                    'Jandino Asporaat riffs on the challenges of ra',
                    'With the help of three human allies, the Autob']})
This should work:
df.groupby('cast')['show_id'].count().nlargest()
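This returns the count for each group, sorted by count in descending order. Note that nlargest() defaults to n=5, so only the five largest groups are returned; use sort_values(ascending=False) in its place to list every group: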
cast count
Alan Marriott 100
Jandino Asporaa 78
...
Peter 1
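One caveat: in the sample data each cast value is a comma-separated string of several names, so grouping on 'cast' directly counts whole strings rather than individual people. If per-person counts are what you are after, something along these lines may help. It is only a sketch: the toy DataFrame below is made up for illustration, and it assumes names are separated by commas.

```python
import pandas as pd

# Hypothetical toy data that mimics the comma-separated 'cast' column.
df = pd.DataFrame({
    'show_id': ['1', '2', '3', '4'],
    'cast': ['Alan Marriott, Andrew Toth',
             'Alan Marriott',
             None,  # shows with no cast listed
             'Peter Cullen, Alan Marriott'],
})

# Split each cast string into a list of names, give every name its own row,
# and trim the whitespace left over after the commas.
names = df['cast'].str.split(',').explode().str.strip()

# value_counts() ignores missing values and sorts by count, descending.
cast_counts = names.value_counts()

# Two-column frame shaped like the desired output: cast, count.
result = cast_counts.rename_axis('cast').reset_index(name='count')

most_frequent = cast_counts.head(1)   # top of the sorted counts
least_frequent = cast_counts.tail(1)  # bottom of the sorted counts

print(result)
```

When several names share the same count, head(1) and tail(1) pick just one of them; filtering with cast_counts[cast_counts == cast_counts.min()] would return all of the tied names.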