KeyError while making a stacked bar graph in Pandas
I am trying to make a stacked bar chart with Bokeh. I keep getting KeyError: '1'
but I can't figure out why. My pivot_table looks like this:
pivot_table.head(3)
Out[23]:
Month 1 2 3 4 5 6 7 8 9 10 11 12
CompanyName
Company1 11 3 2 3 5 7 3 6 8 3 5 8
Company2 3 1 2 18 3 4 5 4 5 5 3 2
Company3 2 6 1 3 2 0 5 6 4 8 4 7
Here is my code:
from collections import OrderedDict
import pandas as pd
from bokeh.charts import Bar, output_file, show
import datetime as datetime
df = pd.read_csv('MYDATA.csv', usecols=[1, 16, 18]) #One is CompanyName, 16 is recvd_dttm, 18 is machinetype
# parse the received timestamp column as datetimes
df['recvd_dttm'] = pd.to_datetime(df['recvd_dttm'])
#Only retrieve data before now (ignore typos that are future dates)
mask = df['recvd_dttm'] <= datetime.datetime.now()
df = df.loc[mask]
# get first and last datetime for the final 365 days of data
range_max = df['recvd_dttm'].max()
range_min = range_max - datetime.timedelta(days=365)
# take slice with the final 365 days of data
df = df[(df['recvd_dttm'] >= range_min) &
(df['recvd_dttm'] <= range_max)]
df = df.set_index('recvd_dttm')
df.index = pd.to_datetime(df.index, format='%m/%d/%Y %H:%M')
result = df.groupby([lambda idx: idx.month, 'CompanyName']).agg(len).reset_index()
result.columns = ['Month', 'CompanyName', 'NumberCalls']
pivot_table = result.pivot(index='Month', columns='CompanyName', values='NumberCalls').fillna(0)
s = pivot_table.sum().sort(ascending=False,inplace=False)
pivot_table = pivot_table.ix[:,s.index[:40]]
pivot_table = pivot_table.transpose()
pivot_table = pivot_table.reset_index()
pivot_table['CompanyName'] = [str(x) for x in pivot_table['CompanyName']]
Companies = list(pivot_table['CompanyName'])
months = ["1","2","3","4","5","6","7","8","9","10","11","12"]
pivot_table = pivot_table.set_index('CompanyName')
pivot_table.to_csv('pivot_table.csv')
# get the months
Jan = pivot_table['1'].astype(float).values
Feb = pivot_table['2'].astype(float).values
Mar = pivot_table['3'].astype(float).values
Apr = pivot_table['4'].astype(float).values
May = pivot_table['5'].astype(float).values
Jun = pivot_table['6'].astype(float).values
Jul = pivot_table['7'].astype(float).values
Aug = pivot_table['8'].astype(float).values
Sep = pivot_table['9'].astype(float).values
Oct = pivot_table['10'].astype(float).values
Nov = pivot_table['11'].astype(float).values
Dec = pivot_table['12'].astype(float).values
# build a dict containing the grouped data
months = OrderedDict(Jan=Jan, Feb=Feb, Mar=Mar, Apr=Apr, May=May,Jun=Jun,Jul=Jul,Aug=Aug,Sep=Sep,Oct=Oct,Nov=Nov,Dec=Dec)
# any of the following commented are also valid Bar inputs
#medals = pd.DataFrame(medals)
#medals = list(medals.values())
output_file("stacked_bar.html")
bar = Bar(months, Companies, title="Stacked bars", stacked=True)
show(bar)
I can do this fine in matplotlib, but I like the HoverTool feature in Bokeh.
If I do import matplotlib.pyplot as plt
and add these lines, I get a stacked bar chart.
plot = pivot_table.plot(kind='bar',stacked=True)
show(plot)
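I think the KeyError comes from how I get the months for the OrderedDict? I don't know how to fix this. Basically I am trying to work from this example: http://docs.bokeh.org/en/latest/docs/gallery/stacked_bar_chart.html
It seems that if I use Jan = pivot_table[1].astype(float).values
instead of Jan = pivot_table['1'].astype(float).values
it works.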
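A likely explanation for why the integer key works: the Month labels come from idx.month, which yields integers, so the pivoted columns are labelled 1 to 12 rather than '1' to '12'. The sketch below uses a small hand-built frame as a stand-in for the real pivot_table (the values and the name str_table are assumptions made for illustration) and shows both the integer lookup and an alternative that converts the labels to strings first.

```python
import pandas as pd

# Hypothetical stand-in for the pivot_table in the question:
# companies as rows, integer month numbers as column labels.
pivot_table = pd.DataFrame(
    [[11, 3, 2], [3, 1, 2], [2, 6, 1]],
    index=pd.Index(["Company1", "Company2", "Company3"], name="CompanyName"),
    columns=pd.Index([1, 2, 3], name="Month"),
)

# The labels are integers, so a string key is not found.
print(pivot_table.columns.dtype)
print("1" in pivot_table.columns)  # False
print(1 in pivot_table.columns)    # True

# Option 1: index with the integer label directly.
jan_by_int = pivot_table[1].astype(float).values

# Option 2: convert the labels to strings once, then keep the string keys.
str_table = pivot_table.copy()
str_table.columns = str_table.columns.astype(str)
jan_by_str = str_table["1"].astype(float).values
```

Either option gives the same array for January; which one reads better depends on whether the rest of the script expects string or integer month labels.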