pandas: How to plot the pie diagram for the movie counts versus genre of IMDB movies in pandas?

Question

I have the following dataset:

import pandas as pd
import numpy as np
# Jupyter/IPython magic for inline plots; omit this line in a plain Python script
%matplotlib inline

df = pd.DataFrame({'movie' : ['A', 'B','C','D'],
                   'genres': ['Science Fiction|Romance|Family', 'Action|Romance',
                              'Family|Drama','Mystery|Science Fiction|Drama']},
                  index=range(4))
df

My attempt

# Parse unique genre from all the movies
gen = []
for g in df['genres']:
    gg = g.split('|')
    gen = gen + gg
    gen = list(set(gen))

print(gen)

df['genres'].value_counts().plot(kind='pie')
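The attempt yields one slice per movie because value_counts() is applied to the raw genres strings, and every combined string such as 'Action|Romance' occurs exactly once. A minimal sketch on the question's data that contrasts those whole-string counts with the set of individual genres (the set comprehension is only a shorter equivalent of the loop above, not part of the original attempt):

```python
import pandas as pd

df = pd.DataFrame({'movie': ['A', 'B', 'C', 'D'],
                   'genres': ['Science Fiction|Romance|Family', 'Action|Romance',
                              'Family|Drama', 'Mystery|Science Fiction|Drama']})

# Counting raw strings: each genre combination appears once, so every count is 1
whole_string_counts = df['genres'].value_counts()
print(whole_string_counts)

# Individual genres: split every string on '|' and collect the pieces in a set
unique_genres = sorted({g for s in df['genres'] for g in s.split('|')})
print(unique_genres)
```

The whole-string counts have four entries of 1 each, while the split produces six distinct genres; a per-genre pie needs counts over the split pieces.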

I got this plot:

But I want a pie chart with one slice per distinct genre.

How can we get the number of movies for each unique genre?

Answer 1

So, a one-line solution:

df.genres.str.get_dummies().sum().plot.pie(label='Genre', autopct='%1.0f%%')

Result:

Step by step

First, convert your genres column into dummy (0/1 indicator) columns:

df = pd.concat([df.drop('genres', axis=1), df.genres.str.get_dummies()], axis=1)

Result:

   movie  Action  Drama  Family  Mystery  Romance  Science Fiction
0      A       0      0       1        0        1                1
1      B       1      0       0        0        1                0
2      C       0      1       1        0        0                0
3      D       0      1       0        1        0                1
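A minimal sketch that reproduces this step on the question's data. It relies on Series.str.get_dummies splitting on '|' by default (sep='|') and returning one 0/1 column per distinct genre, with the columns sorted alphabetically:

```python
import pandas as pd

df = pd.DataFrame({'movie': ['A', 'B', 'C', 'D'],
                   'genres': ['Science Fiction|Romance|Family', 'Action|Romance',
                              'Family|Drama', 'Mystery|Science Fiction|Drama']})

# One indicator column per genre; a 1 means the movie has that genre
dummies = df['genres'].str.get_dummies()  # same as .str.get_dummies(sep='|')

# Replace the original 'genres' column with the indicator columns
wide = pd.concat([df.drop('genres', axis=1), dummies], axis=1)
print(wide)
```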

Then count the occurrences of each genre:

counts = df.drop('movie', axis=1).sum()

Result:

Action             1
Drama              2
Family             2
Mystery            1
Romance            2
Science Fiction    2
dtype: int64

Finally, plot the pie chart:

counts.plot.pie()
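Outside a notebook, where %matplotlib inline is not available, the same pipeline can run as a plain script. A sketch under these assumptions: the non-interactive Agg backend is used, the output file name genres_pie.png is hypothetical, and sorting the counts is an optional addition so the slices go from largest to smallest:

```python
import matplotlib
matplotlib.use('Agg')  # non-interactive backend: render to files, no window
import matplotlib.pyplot as plt
import pandas as pd

df = pd.DataFrame({'movie': ['A', 'B', 'C', 'D'],
                   'genres': ['Science Fiction|Romance|Family', 'Action|Romance',
                              'Family|Drama', 'Mystery|Science Fiction|Drama']})

# Per-genre counts: column sums of the indicator matrix
counts = df['genres'].str.get_dummies().sum().sort_values(ascending=False)

# Series.plot.pie labels each wedge with the index value (the genre name)
ax = counts.plot.pie(autopct='%1.0f%%', figsize=(5, 5))
ax.set_ylabel('')  # drop the y-axis label

plt.savefig('genres_pie.png', bbox_inches='tight')  # hypothetical output path
```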

Answer 2

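You can call .str.split() with expand=True, which gives you a DataFrame of all the genres (one column per split position, padded with None for movies that have fewer genres). If you then stack it into a single Series, value_counts() gives the count of every genre.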

df.genres.str.split('|', expand=True).stack().value_counts().plot(kind='pie', label='Genre')
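A sketch comparing this split/stack approach with Series.explode (available since pandas 0.25). Explode is a swapped-in alternative that the answer does not use; on the question's data both should produce the same per-genre counts:

```python
import pandas as pd

df = pd.DataFrame({'movie': ['A', 'B', 'C', 'D'],
                   'genres': ['Science Fiction|Romance|Family', 'Action|Romance',
                              'Family|Drama', 'Mystery|Science Fiction|Drama']})

# Answer's approach: one column per split position, stacked into one long Series
via_stack = df['genres'].str.split('|', expand=True).stack().value_counts()

# Alternative: explode turns each list of genres into one row per genre
via_explode = df['genres'].str.split('|').explode().value_counts()

print(via_stack.sort_index())
```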

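Computing the counts this way can be a bit slow, because it builds and stacks an intermediate DataFrame, so a faster implementation of the same plot (with percentages added) is: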

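from itertools import chain
from collections import Counter
import matplotlib.pyplot as plt

# Flatten the per-movie genre lists and count each genre
cts = Counter(chain.from_iterable(df.genres.str.split('|').values))
# plt.pie needs real sequences, not dict views, so convert with list()
_ = plt.pie(list(cts.values()), labels=list(cts.keys()), autopct='%1.0f%%')
_ = plt.ylabel('Genres')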
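A variant of the Counter approach, sketched under two assumptions that are not in the answer: the slices are ordered with Counter.most_common(), and the chart is drawn on an explicit Axes with the Agg backend so it also runs headless. The counts and labels are passed as tuples because pie() converts its input with NumPy, which cannot turn a dict view into numbers:

```python
import matplotlib
matplotlib.use('Agg')  # headless backend
import matplotlib.pyplot as plt
import pandas as pd
from collections import Counter
from itertools import chain

df = pd.DataFrame({'movie': ['A', 'B', 'C', 'D'],
                   'genres': ['Science Fiction|Romance|Family', 'Action|Romance',
                              'Family|Drama', 'Mystery|Science Fiction|Drama']})

# Flatten the per-movie genre lists into one iterable and count each genre
cts = Counter(chain.from_iterable(df['genres'].str.split('|')))

# most_common() yields (genre, count) pairs, largest count first
labels, sizes = zip(*cts.most_common())

fig, ax = plt.subplots()
# With autopct set, pie() returns the wedges, label texts and percentage texts
wedges, texts, autotexts = ax.pie(sizes, labels=labels, autopct='%1.0f%%')
ax.set_ylabel('Genres')
```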


Tags: python, plot, imdb, matplotlib, pandas