KeyError when using a for loop on a dataframe to plot histograms
My dataframe looks like this:
df = pd.DataFrame({'Date': ['2016-01-05', '2016-01-05', '2016-01-05', '2016-01-05', '2016-01-08', '2016-01-08', '2016-02-01'], 'Count': [1, 2, 2, 3, 2, 0, 2]})
I'm trying to plot a histogram of Count for each unique Date.
I tried:
for date in df.Date.unique():
    plt.hist([df[df.Date == '%s' %(date)]['Count']])
    plt.title('%s' %(date))
This results in:
---------------------------------------------------------------------------
KeyError Traceback (most recent call last)
<ipython-input-17-971a1cf07250> in <module>()
1 for date in df.Date.unique():
----> 2 plt.hist([df[df.Date == '%s' %(date)]['Count']])
3 plt.title('%s' %(date))
c:~\anaconda3\lib\site-packages\matplotlib\pyplot.py in hist(x, bins, range, normed, weights, cumulative, bottom, histtype, align, orientation, rwidth, log, color, label, stacked, hold, data, **kwargs)
2963 histtype=histtype, align=align, orientation=orientation,
2964 rwidth=rwidth, log=log, color=color, label=label,
-> 2965 stacked=stacked, data=data, **kwargs)
2966 finally:
2967 ax.hold(washold)
c:~\anaconda3\lib\site-packages\matplotlib\__init__.py in inner(ax, *args, **kwargs)
1816 warnings.warn(msg % (label_namer, func.__name__),
1817 RuntimeWarning, stacklevel=2)
-> 1818 return func(ax, *args, **kwargs)
1819 pre_doc = inner.__doc__
1820 if pre_doc is None:
c:~\anaconda3\lib\site-packages\matplotlib\axes\_axes.py in hist(self, x, bins, range, normed, weights, cumulative, bottom, histtype, align, orientation, rwidth, log, color, label, stacked, **kwargs)
5925
5926 # basic input validation
-> 5927 flat = np.ravel(x)
5928
5929 input_empty = len(flat) == 0
c:~\anaconda3\lib\site-packages\numpy\core\fromnumeric.py in ravel(a, order)
1482 return asarray(a).ravel(order=order)
1483 else:
-> 1484 return asanyarray(a).ravel(order=order)
1485
1486
c:~\anaconda3\lib\site-packages\numpy\core\numeric.py in asanyarray(a, dtype, order)
581
582 """
--> 583 return array(a, dtype, copy=False, order=order, subok=True)
584
585
c:~\anaconda3\lib\site-packages\pandas\core\series.py in __getitem__(self, key)
581 key = com._apply_if_callable(key, self)
582 try:
--> 583 result = self.index.get_value(self, key)
584
585 if not lib.isscalar(result):
c:~\anaconda3\lib\site-packages\pandas\indexes\base.py in get_value(self, series, key)
1978 try:
1979 return self._engine.get_value(s, k,
-> 1980 tz=getattr(series.dtype, 'tz', None))
1981 except KeyError as e1:
1982 if len(self) > 0 and self.inferred_type in ['integer', 'boolean']:
pandas\index.pyx in pandas.index.IndexEngine.get_value (pandas\index.c:3332)()
pandas\index.pyx in pandas.index.IndexEngine.get_value (pandas\index.c:3035)()
pandas\index.pyx in pandas.index.IndexEngine.get_loc (pandas\index.c:4018)()
pandas\hashtable.pyx in pandas.hashtable.Int64HashTable.get_item (pandas\hashtable.c:6610)()
pandas\hashtable.pyx in pandas.hashtable.Int64HashTable.get_item (pandas\hashtable.c:6554)()
KeyError: 0
But when I simply print the same expression instead, there is no problem:
for date in df.Date.unique():
    print([df[df.Date == '%s' %(date)]['Count']])
[0 1
1 2
2 2
3 3
Name: Count, dtype: int64]
[4 2
5 0
Name: Count, dtype: int64]
[6 2
Name: Count, dtype: int64]
What is wrong with calling plt.hist on my dataframe the way I do here?
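You are passing a list that wraps a pandas object (here a one-element list holding a Series), and that is what causes the problem. You can instead unpack a groupby object and plot each group separately.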
gps = df.groupby('Date').Count
_, axes = plt.subplots(nrows=gps.ngroups)
# iterating a groupby yields (group key, group data) pairs
for (_, g), ax in zip(gps, axes):
    g.plot.hist(ax=ax)
plt.show()
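For reference, here is a self-contained variant of the same idea. It is only a sketch: `squeeze=False`, the per-axes titles, and `figsize` are my additions rather than part of the snippet above. `squeeze=False` is there because `plt.subplots` returns a single Axes rather than an array when there is only one group, which would break the `zip`.

```python
import matplotlib.pyplot as plt
import pandas as pd

df = pd.DataFrame({
    'Date': ['2016-01-05', '2016-01-05', '2016-01-05', '2016-01-05',
             '2016-01-08', '2016-01-08', '2016-02-01'],
    'Count': [1, 2, 2, 3, 2, 0, 2],
})

groups = df.groupby('Date')['Count']

# squeeze=False keeps `axes` a 2-D array even when there is a single group
fig, axes = plt.subplots(nrows=groups.ngroups, squeeze=False,
                         figsize=(4, 3 * groups.ngroups))

# one histogram per date, each on its own axes, titled with the group key
for (date, counts), ax in zip(groups, axes.ravel()):
    ax.hist(counts.to_numpy())
    ax.set_title(date)

fig.tight_layout()
```

Call `plt.show()` afterwards (or `fig.savefig(...)`) to see the result.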
If your plots need more sugar, have a look at the pandas visualization docs.
Basically, you have two square brackets too many in your code.
plt.hist([series]) # <- wrong
plt.hist(series) # <- correct
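In the first case, matplotlib tries to plot the histogram of a list whose single element is not a number (it is the whole Series). That does not work.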
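The `KeyError: 0` itself seems to come from how a Series with an integer index handles `[0]`: it is a label lookup, not a positional one. The sketch below assumes, based on the traceback, that the NumPy version in use probed the list element with `element[0]` while building the array; newer NumPy and pandas versions may behave differently. The first date's rows happen to have labels 0 to 3, so the error would only show up at the second date, whose rows are labelled 4 and 5.

```python
import pandas as pd

df = pd.DataFrame({
    'Date': ['2016-01-05', '2016-01-05', '2016-01-05', '2016-01-05',
             '2016-01-08', '2016-01-08', '2016-02-01'],
    'Count': [1, 2, 2, 3, 2, 0, 2],
})

# rows for the second date keep their original index labels, 4 and 5
second = df[df.Date == '2016-01-08']['Count']
labels = list(second.index)

# with an integer index, second[0] looks for the *label* 0, which is absent
try:
    second[0]
    raised = False
except KeyError:
    raised = True

# positional access is spelled .iloc and works regardless of the labels
first_value = second.iloc[0]

print(labels, raised, first_value)
```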
Instead, remove the brackets and pass the series directly, which works fine:
for date in df.Date.unique():
    plt.hist(df[df.Date == '%s' %(date)]['Count'])
    plt.title('%s' %(date))
This now draws all the histograms in the same plot. I'm not sure that is what you want. If not, consider this very short alternative:
df.hist(by="Date")
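Spelled out as a runnable sketch, with `column='Count'` and `sharex=True` added by me to make the plotted column explicit and keep the x-axes comparable (neither is required by the one-liner above):

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Date': ['2016-01-05', '2016-01-05', '2016-01-05', '2016-01-05',
             '2016-01-08', '2016-01-08', '2016-02-01'],
    'Count': [1, 2, 2, 3, 2, 0, 2],
})

# one subplot per unique Date; pandas titles each subplot with its group key
axes = df.hist(column='Count', by='Date', sharex=True)

# collect the subplot titles to confirm every date got its own histogram
titles = {ax.get_title() for ax in np.ravel(axes)}
```

As with the other snippets, call `plt.show()` to display the figure.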