Plotting count values over time for specific country names with pandas

Question

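I have a DataFrame df that contains information about companies: the company, the country it is located in, and the year it was founded. I now need to plot, as lines, how the number of companies founded per country per year develops across the dataset (between 1995 and 2015). So far, all I've managed to create is a pie chart of the total number of companies founded per country, which leaves out the Year_founded information.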

The data looks like this:

Company	Country	Year_founded
A	USA	1996
B	NLD	2004
C	CAN	2014
D	USA	2000
E	NLD	1999
F	CAN	2000
etc.

Ideally, I want to plot the number of companies founded per country in a line chart, with a different line for each country.

Any ideas on how to approach this?
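To make the answers reproducible, here is a minimal sketch that rebuilds the six sample rows shown above as a DataFrame, assuming Year_founded is stored as integers (the "etc." rows are not known and are left out). The groupby/size step shows the per-country, per-year counts that any of the line-chart approaches ultimately plot.

```python
import pandas as pd

# The six sample rows from the question (the "etc." rows are omitted)
df = pd.DataFrame({
    "Company": ["A", "B", "C", "D", "E", "F"],
    "Country": ["USA", "NLD", "CAN", "USA", "NLD", "CAN"],
    "Year_founded": [1996, 2004, 2014, 2000, 1999, 2000],
})

# Long-form counts: one entry per (Country, Year_founded) pair that occurs
counts = df.groupby(["Country", "Year_founded"]).size()
print(counts)
```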

Answer 1

IIUC, you can use crosstab and plot.line:

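import pandas as pd
from matplotlib.ticker import MaxNLocator

# Count companies per (year, country): one row per year, one column per country
ax = pd.crosstab(df['Year_founded'], df['Country']).plot.line()
ax.set_ylabel('Number of founded companies')
# Only show whole years as x-axis ticks
ax.xaxis.set_major_locator(MaxNLocator(integer=True))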

Output: [line chart with one line per country]

Crosstab:

Country       CAN  NLD  USA
Year_founded               
1996            0    0    1
1999            0    1    0
2000            1    0    1
2004            0    1    0
2014            1    0    0
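A small sketch, assuming the six sample rows from the question, that reproduces the crosstab above. It also shows why the chart can be misleading on its own: years with no foundings (for example 1997) are not rows of the crosstab, so each line connects the surrounding years directly instead of dropping to zero in between.

```python
import pandas as pd

# Sample rows from the question
df = pd.DataFrame({
    "Company": ["A", "B", "C", "D", "E", "F"],
    "Country": ["USA", "NLD", "CAN", "USA", "NLD", "CAN"],
    "Year_founded": [1996, 2004, 2014, 2000, 1999, 2000],
})

# Rows: years that occur in the data; columns: countries; cells: counts
table = pd.crosstab(df["Year_founded"], df["Country"])
print(table)

# Years without any founding are simply absent from the index
print(1997 in table.index)
```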

Answer 2

You can use groupby and reindex, so that every year from 1995 to 2015 appears in your chart:

data = df.groupby(["Country", "Year_founded"])["Company"].count().unstack(0).reindex(range(1995,2016)).fillna(0)

>>> data.plot()

>>> data
Country       CAN  NLD  USA
Year_founded               
1995          0.0  0.0  0.0
1996          0.0  0.0  1.0
1997          0.0  0.0  0.0
1998          0.0  0.0  0.0
1999          0.0  1.0  0.0
2000          1.0  0.0  1.0
2001          0.0  0.0  0.0
2002          0.0  0.0  0.0
2003          0.0  0.0  0.0
2004          0.0  1.0  0.0
2005          0.0  0.0  0.0
2006          0.0  0.0  0.0
2007          0.0  0.0  0.0
2008          0.0  0.0  0.0
2009          0.0  0.0  0.0
2010          0.0  0.0  0.0
2011          0.0  0.0  0.0
2012          0.0  0.0  0.0
2013          0.0  0.0  0.0
2014          1.0  0.0  0.0
2015          0.0  0.0  0.0
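The counts above are floats because unstack introduces NaN for missing (country, year) pairs before fillna(0) runs. As a variation (not part of the original answer), a minimal sketch assuming the same sample rows: building the table with crosstab and then calling reindex with fill_value=0 gives the same 1995 to 2015 grid while keeping integer counts.

```python
import pandas as pd

# Sample rows from the question
df = pd.DataFrame({
    "Company": ["A", "B", "C", "D", "E", "F"],
    "Country": ["USA", "NLD", "CAN", "USA", "NLD", "CAN"],
    "Year_founded": [1996, 2004, 2014, 2000, 1999, 2000],
})

years = range(1995, 2016)  # every year to show, inclusive of 2015

# The answer's pipeline: float counts because NaN appears before fillna
data = (df.groupby(["Country", "Year_founded"])["Company"].count()
          .unstack(0).reindex(years).fillna(0))

# Variation: crosstab counts are integers, and fill_value=0 keeps them so
data_int = pd.crosstab(df["Year_founded"], df["Country"]).reindex(years, fill_value=0)

print(data_int.dtypes)
```

Calling data_int.plot() then draws one line per country, dropping to zero in the years without any foundings.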

Tags: python, linechart, dataframe, pandas