Pandas / pyplot histogram: can plot df but not subset

Question

df is a huge DataFrame. I only need the subset where Zcoord > 1.

df = pandas.DataFrame(first)
df.columns = ['Xcoord', 'Ycoord', 'Zcoord', 'Angle']
df0 = df[df.Zcoord>1]

The exact same code that plots a histogram of df does not work for df0.

plot1 = plt.figure(1)
plt.hist(df0.Zcoord, bins=100, normed=False)
plt.show()

IPython throws KeyError: 0.

Python 2.7.9 (Anaconda), IPython 2.2.0, OS 10.9.4

---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
<ipython-input-42-71643df3888f> in <module>()
      1 plot1 = plt.figure(1)
----> 2 plt.hist(df0.Zcoord, bins=100, normed=False)
      3 
      4 plt.show()
      5 from matplotlib.backends.backend_pdf import PdfPages

/Users/Kit/anaconda/lib/python2.7/site-packages/matplotlib/pyplot.pyc in hist(x, bins, range, normed, weights, cumulative, bottom, histtype, align, orientation, rwidth, log, color, label, stacked, hold, **kwargs)
   2888                       histtype=histtype, align=align, orientation=orientation,
   2889                       rwidth=rwidth, log=log, color=color, label=label,
-> 2890                       stacked=stacked, **kwargs)
   2891         draw_if_interactive()
   2892     finally:

/Users/Kit/anaconda/lib/python2.7/site-packages/matplotlib/axes/_axes.pyc in hist(self, x, bins, range, normed, weights, cumulative, bottom, histtype, align, orientation, rwidth, log, color, label, stacked, **kwargs)
   5560         # Massage 'x' for processing.
   5561         # NOTE: Be sure any changes here is also done below to 'weights'
-> 5562         if isinstance(x, np.ndarray) or not iterable(x[0]):
   5563             # TODO: support masked arrays;
   5564             x = np.asarray(x)

/Users/Kit/anaconda/lib/python2.7/site-packages/pandas/core/series.pyc in __getitem__(self, key)
    482     def __getitem__(self, key):
    483         try:
--> 484             result = self.index.get_value(self, key)
    485 
    486             if not np.isscalar(result):

/Users/Kit/anaconda/lib/python2.7/site-packages/pandas/core/index.pyc in get_value(self, series, key)
   1194 
   1195         try:
-> 1196             return self._engine.get_value(s, k)
   1197         except KeyError as e1:
   1198             if len(self) > 0 and self.inferred_type in ['integer','boolean']:

/Users/Kit/anaconda/lib/python2.7/site-packages/pandas/index.so in pandas.index.IndexEngine.get_value (pandas/index.c:2993)()

/Users/Kit/anaconda/lib/python2.7/site-packages/pandas/index.so in pandas.index.IndexEngine.get_value (pandas/index.c:2808)()

/Users/Kit/anaconda/lib/python2.7/site-packages/pandas/index.so in pandas.index.IndexEngine.get_loc (pandas/index.c:3440)()

KeyError: 0

Answer 1

You are passing a pandas.Series (df0.Zcoord) to matplotlib. At present, however, matplotlib is somewhat indecisive about whether it likes being fed pandas data types rather than numpy ndarrays.

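At some point in the matplotlib source, the histogram function is probably trying to get the "first item I've been asked to deal with", and it probably does so by calling input[0], where input is whatever it has been asked to chew on. If input is a numpy.ndarray, all is well. But if input is a pandas.Series or (even worse) a pandas.DataFrame, the expression input[0] has a very different meaning. In that case, depending on the data structure you hand to plt.hist, a KeyError is quite likely when it tries to index into your input.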

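In your particular case, this probably works fine on the whole df because df likely has an integer index ([0, 1, 2, ..., len(df)-1]), which is the default row index of a DataFrame. But when you select within df to produce df0, the result ends up with an index that is a subset of df's index (perhaps [3, 6, 9, 12, ...]). So everything works on df (whose index contains 0), but it chokes on df0 (where, ironically given its name, 0 does not appear in the index).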
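To make the index explanation concrete, here is a minimal sketch using a tiny made-up DataFrame (the values are hypothetical, not the asker's data). It shows that boolean filtering keeps the original row labels, so a label lookup of 0 on the subset fails while positional access still works.

```python
import pandas as pd

# Hypothetical stand-in data; only the Zcoord column matters here.
df = pd.DataFrame({"Zcoord": [0.5, 2.0, 0.1, 3.0]})
df0 = df[df.Zcoord > 1]

# Filtering keeps the original row labels instead of renumbering them.
full_index = list(df.index)     # [0, 1, 2, 3]
subset_index = list(df0.index)  # [1, 3]

# On an integer-labelled Series, [0] means "the row labelled 0",
# and df0 has no such row.
try:
    df0.Zcoord[0]
    lookup_failed = False
except KeyError:
    lookup_failed = True

# Positional access is unaffected by the labels.
first_by_position = df0.Zcoord.iloc[0]   # 2.0
first_from_array = df0.Zcoord.values[0]  # 2.0
```

The last two lines are why converting to an array sidesteps the problem: a numpy array has no labels, so [0] can only mean "first element".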

Quick fix: instead of

plt.hist(df0.Zcoord, bins=100, normed=False)

run this:

plt.hist(df0.Zcoord.values, bins=100, normed=False)

and I expect everything will work.
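As a rough, self-contained sketch of that fix, the snippet below builds random stand-in data (hypothetical, generated only for illustration) and shows two ways to avoid the label lookup: passing a plain array, or renumbering the subset's index with reset_index(drop=True). It assumes a reasonably recent pandas, where to_numpy() is available; on older versions .values does the same job. The normed argument is left out because newer matplotlib releases dropped it in favour of density.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so this runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical stand-in for the asker's data.
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.normal(size=(1000, 4)),
                  columns=["Xcoord", "Ycoord", "Zcoord", "Angle"])
df0 = df[df.Zcoord > 1]

# Option 1: hand matplotlib a plain numpy array (no labels involved).
z_array = df0.Zcoord.to_numpy()

# Option 2: keep a Series, but renumber its index from 0.
z_reset = df0.Zcoord.reset_index(drop=True)

fig, ax = plt.subplots()
counts, edges, _ = ax.hist(z_array, bins=100)
```

Option 1 is the smallest change to the original call. Option 2 may be preferable if df0 is reused elsewhere and code downstream also assumes a 0-based index.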
