'Could not interpret input' error with Seaborn when plotting groupbys
Suppose I have this DataFrame:
from pandas import DataFrame

d = {'Path':    ['abc', 'abc', 'ghi', 'ghi', 'jkl', 'jkl'],
     'Detail':  ['foo', 'bar', 'bar', 'foo', 'foo', 'foo'],
     'Program': ['prog1', 'prog1', 'prog1', 'prog2', 'prog3', 'prog3'],
     'Value':   [30, 20, 10, 40, 40, 50],
     'Field':   [50, 70, 10, 20, 30, 30]}
df = DataFrame(d)
df.set_index(['Path', 'Detail'], inplace=True)
df
             Field Program  Value
Path Detail
abc  foo        50   prog1     30
     bar        70   prog1     20
ghi  bar        10   prog1     10
     foo        20   prog2     40
jkl  foo        30   prog3     40
     foo        30   prog3     50
I can aggregate it without any problem (by the way, if there is a better way to do this, I'd love to know!):
df_count = df.groupby('Program').count().sort_values('Value', ascending=False)[['Value']]
df_count
         Value
Program
prog1        3
prog3        2
prog2        1
df_mean = df.groupby('Program').mean().sort_values('Value', ascending=False)[['Value']]
df_mean
         Value
Program
prog3       45
prog2       40
prog1       20
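On the "better way" question: a minimal sketch, assuming a current pandas (where the old DataFrame.sort method is gone in favor of sort_values), computes both statistics in a single groupby pass with agg. The names summary, df_count and df_mean here are illustrative, and the resulting columns are called count and mean rather than Value.

```python
import pandas as pd

# Same data as in the question, with Path/Detail moved into the index.
d = {'Path':    ['abc', 'abc', 'ghi', 'ghi', 'jkl', 'jkl'],
     'Detail':  ['foo', 'bar', 'bar', 'foo', 'foo', 'foo'],
     'Program': ['prog1', 'prog1', 'prog1', 'prog2', 'prog3', 'prog3'],
     'Value':   [30, 20, 10, 40, 40, 50],
     'Field':   [50, 70, 10, 20, 30, 30]}
df = pd.DataFrame(d).set_index(['Path', 'Detail'])

# One groupby pass: the count and the mean of 'Value' for each Program.
summary = df.groupby('Program')['Value'].agg(['count', 'mean'])

# Sort each statistic separately, largest first (sort_values replaces .sort).
df_count = summary[['count']].sort_values('count', ascending=False)
df_mean = summary[['mean']].sort_values('mean', ascending=False)
print(summary)
```

Both results still carry Program as their index, so they hit the same seaborn problem discussed in the answer.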
Plotting it from pandas works fine...
df_mean.plot(kind='bar')
But why do I get this error when I try it in seaborn?
sns.factorplot('Program',data=df_mean)
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-26-23c2921627ec> in <module>()
----> 1 sns.factorplot('Program',data=df_mean)
C:\Anaconda3\lib\site-packages\seaborn\categorical.py in factorplot(x, y, hue, data, row, col, col_wrap, estimator, ci, n_boot, units, order, hue_order, row_order, col_order, kind, size, aspect, orient, color, palette, legend, legend_out, sharex, sharey, margin_titles, facet_kws, **kwargs)
2673 # facets to ensure representation of all data in the final plot
2674 p = _CategoricalPlotter()
-> 2675 p.establish_variables(x_, y_, hue, data, orient, order, hue_order)
2676 order = p.group_names
2677 hue_order = p.hue_names
C:\Anaconda3\lib\site-packages\seaborn\categorical.py in establish_variables(self, x, y, hue, data, orient, order, hue_order, units)
143 if isinstance(input, string_types):
144 err = "Could not interperet input '{}'".format(input)
--> 145 raise ValueError(err)
146
147 # Figure out the plotting orientation
ValueError: Could not interperet input 'Program'
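The reason you get the exception is that, after your groupby operation, Program becomes the index of the DataFrames df_mean and df_count rather than one of their columns.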
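A quick way to see this, as a sketch assuming the question's data: after the groupby, 'Program' survives only as the index name. At least in the seaborn version from the traceback, a string passed as x is looked up among the columns of data, which is why 'Program' cannot be interpreted.

```python
import pandas as pd

d = {'Path':    ['abc', 'abc', 'ghi', 'ghi', 'jkl', 'jkl'],
     'Detail':  ['foo', 'bar', 'bar', 'foo', 'foo', 'foo'],
     'Program': ['prog1', 'prog1', 'prog1', 'prog2', 'prog3', 'prog3'],
     'Value':   [30, 20, 10, 40, 40, 50],
     'Field':   [50, 70, 10, 20, 30, 30]}
df = pd.DataFrame(d).set_index(['Path', 'Detail'])
df_mean = df.groupby('Program')[['Value']].mean()

# 'Program' is now the name of the index, not a column.
print(df_mean.index.name)             # Program
print('Program' in df_mean.columns)   # False
print(list(df_mean.columns))          # ['Value']
```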
If you want to draw the factorplot from df_mean, a simple solution is to add the index as a column:
In [7]:
df_mean['Program'] = df_mean.index
In [8]:
%matplotlib inline
import seaborn as sns
sns.factorplot(x='Program', y='Value', data=df_mean)
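An alternative to copying the index, sketched here with the standard pandas reset_index method (the name df_plot is illustrative), moves Program out of the index into an ordinary column and gives the frame a default 0..n-1 index:

```python
import pandas as pd

d = {'Path':    ['abc', 'abc', 'ghi', 'ghi', 'jkl', 'jkl'],
     'Detail':  ['foo', 'bar', 'bar', 'foo', 'foo', 'foo'],
     'Program': ['prog1', 'prog1', 'prog1', 'prog2', 'prog3', 'prog3'],
     'Value':   [30, 20, 10, 40, 40, 50],
     'Field':   [50, 70, 10, 20, 30, 30]}
df = pd.DataFrame(d).set_index(['Path', 'Detail'])
df_mean = df.groupby('Program')[['Value']].mean()

# Move the 'Program' index into a regular column; the new index is 0, 1, 2.
df_plot = df_mean.reset_index()
print(df_plot)
```

df_plot could then be passed as data=df_plot with x='Program' and y='Value'. In seaborn 0.9 and later, factorplot was renamed catplot, so newer versions may need sns.catplot(x='Program', y='Value', data=df_plot, kind='point') instead.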
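However, you could simply let factorplot do the aggregation for you:
sns.factorplot(x='Program', y='Value', data=df)
and you get the same means, because factorplot aggregates each category with the mean by default.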
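To check that claim without plotting, here is a sketch assuming the default mean estimator: the per-Program means of the raw df are exactly the values that df_mean held, so the point heights match. The name raw_means is illustrative. With the raw data, seaborn also draws confidence intervals for groups that have more than one observation (prog1 and prog3 here).

```python
import pandas as pd

d = {'Path':    ['abc', 'abc', 'ghi', 'ghi', 'jkl', 'jkl'],
     'Detail':  ['foo', 'bar', 'bar', 'foo', 'foo', 'foo'],
     'Program': ['prog1', 'prog1', 'prog1', 'prog2', 'prog3', 'prog3'],
     'Value':   [30, 20, 10, 40, 40, 50],
     'Field':   [50, 70, 10, 20, 30, 30]}
df = pd.DataFrame(d).set_index(['Path', 'Detail'])

# The default estimator of point-style categorical plots is the mean,
# so these are the heights seaborn computes from the raw rows.
raw_means = df.groupby('Program')['Value'].mean()
# Number of raw observations behind each point.
group_sizes = df.groupby('Program').size()
print(raw_means)
print(group_sizes)
```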
Edit after comments
Indeed, you make a good point about the as_index parameter. It is True by default, in which case Program becomes part of the index, as in your question.
In [14]:
df_mean = df.groupby('Program', as_index=True).mean().sort_values('Value', ascending=False)[['Value']]
df_mean
Out[14]:
Value
Program
prog3 45
prog2 40
prog1 20
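To be clear, this way Program is no longer a column; it becomes the index. The trick df_mean['Program'] = df_mean.index leaves the index unchanged and copies it into a new column, so Program is now duplicated: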
In [15]:
df_mean['Program'] = df_mean.index
df_mean
Out[15]:
Value Program
Program
prog3 45 prog3
prog2 40 prog2
prog1 20 prog1
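One practical consequence of that duplication, shown as a sketch assuming a recent pandas: once Program exists both as the index name and as a column, a later reset_index() cannot insert the index as a column called Program and raises ValueError, while reset_index(drop=True) simply discards the now-redundant index. The names failed and df_flat are illustrative.

```python
import pandas as pd

d = {'Path':    ['abc', 'abc', 'ghi', 'ghi', 'jkl', 'jkl'],
     'Detail':  ['foo', 'bar', 'bar', 'foo', 'foo', 'foo'],
     'Program': ['prog1', 'prog1', 'prog1', 'prog2', 'prog3', 'prog3'],
     'Value':   [30, 20, 10, 40, 40, 50],
     'Field':   [50, 70, 10, 20, 30, 30]}
df = pd.DataFrame(d).set_index(['Path', 'Detail'])
df_mean = df.groupby('Program')[['Value']].mean()

# The index-copy trick: 'Program' is now both the index name and a column.
df_mean['Program'] = df_mean.index
print(df_mean.index.name, list(df_mean.columns))   # Program ['Value', 'Program']

# reset_index() would try to insert a second 'Program' column and fails.
failed = False
try:
    df_mean.reset_index()
except ValueError as exc:
    failed = True
    print('reset_index failed:', exc)

# drop=True throws the index away instead of turning it into a column.
df_flat = df_mean.reset_index(drop=True)
```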
However, if you set as_index to False, you get Program as a column, plus a new auto-incrementing index:
In [16]:
df_mean = df.groupby('Program', as_index=False).mean().sort_values('Value', ascending=False)[['Program', 'Value']]
df_mean
Out[16]:
Program Value
2 prog3 45
1 prog2 40
0 prog1 20
This way you can feed it directly to seaborn. Still, you could just pass df instead and get the same result.