What does the subset argument do in pandas.io.formats.style.Styler.format?
The public documentation for pandas.io.formats.style.Styler.format says:
    subset : IndexSlice
        An argument to DataFrame.loc that restricts which elements
        formatter is applied to.
But looking at the code, that isn't quite accurate... what is this _non_reducing_slice thing?
if subset is None:
    row_locs = range(len(self.data))
    col_locs = range(len(self.data.columns))
else:
    subset = _non_reducing_slice(subset)
    if len(subset) == 1:
        subset = subset, self.data.columns
    sub_df = self.data.loc[subset]
Use case: I want to format one particular row, but when I naively follow the docs and pass something that works with .loc[], I get an error:
>>> import pandas as pd
>>>
>>> df = pd.DataFrame([dict(a=1,b=2,c=3),dict(a=3,b=5,c=4)])
>>> df = df.set_index('a')
>>> print df
   b  c
a
1  2  3
3  5  4
>>> def J(x):
... return '!!!%s!!!' % x
...
>>> df.style.format(J, subset=[3])
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "c:\app\python\anaconda\lib\site-packages\pandas\io\formats\style.py", line 372, in format
    sub_df = self.data.loc[subset]
  File "c:\app\python\anaconda\lib\site-packages\pandas\core\indexing.py", line 1325, in __getitem__
    return self._getitem_tuple(key)
  File "c:\app\python\anaconda\lib\site-packages\pandas\core\indexing.py", line 841, in _getitem_tuple
    self._has_valid_tuple(tup)
  File "c:\app\python\anaconda\lib\site-packages\pandas\core\indexing.py", line 189, in _has_valid_tuple
    if not self._has_valid_type(k, i):
  File "c:\app\python\anaconda\lib\site-packages\pandas\core\indexing.py", line 1418, in _has_valid_type
    (key, self.obj._get_axis_name(axis)))
KeyError: 'None of [[3]] are in the [columns]'
>>> df.loc[3]
b    5
c    4
Name: 3, dtype: int64
>>> df.loc[[3]]
   b  c
a
3  5  4
OK, so I tried using IndexSlice, and it seems flaky: it works in some cases and not in others, at least in pandas 0.20.3:
Python 2.7.14 |Anaconda custom (64-bit)| (default, Oct 15 2017, 03:34:40) [MSC v.1500 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import pandas as pd
>>> import numpy as np
>>> idx = pd.IndexSlice
>>> r = np.arange(16).astype(int)
>>> colors = 'red green blue yellow'.split()
>>> df = pd.DataFrame(dict(a=[colors[i] for i in r//4], b=r%4, c=r*100)).set_index(['a','b'])
>>> print df
             c
a      b
red    0     0
       1   100
       2   200
       3   300
green  0   400
       1   500
       2   600
       3   700
blue   0   800
       1   900
       2  1000
       3  1100
yellow 0  1200
       1  1300
       2  1400
       3  1500
>>> df.loc[idx['yellow']]
      c
b
0  1200
1  1300
2  1400
3  1500
>>> def J(x):
... return '!!!%s!!!' % x
...
>>> df.style.format(J,idx['yellow'])
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "c:\app\python\anaconda\lib\site-packages\pandas\io\formats\style.py", line 372, in format
    sub_df = self.data.loc[subset]
  File "c:\app\python\anaconda\lib\site-packages\pandas\core\indexing.py", line 1325, in __getitem__
    return self._getitem_tuple(key)
  File "c:\app\python\anaconda\lib\site-packages\pandas\core\indexing.py", line 836, in _getitem_tuple
    return self._getitem_lowerdim(tup)
  File "c:\app\python\anaconda\lib\site-packages\pandas\core\indexing.py", line 948, in _getitem_lowerdim
    return self._getitem_nested_tuple(tup)
  File "c:\app\python\anaconda\lib\site-packages\pandas\core\indexing.py", line 1023, in _getitem_nested_tuple
    obj = getattr(obj, self.name)._getitem_axis(key, axis=axis)
  File "c:\app\python\anaconda\lib\site-packages\pandas\core\indexing.py", line 1541, in _getitem_axis
    return self._getitem_iterable(key, axis=axis)
  File "c:\app\python\anaconda\lib\site-packages\pandas\core\indexing.py", line 1081, in _getitem_iterable
    self._has_valid_type(key, axis)
  File "c:\app\python\anaconda\lib\site-packages\pandas\core\indexing.py", line 1418, in _has_valid_type
    (key, self.obj._get_axis_name(axis)))
KeyError: "None of [['yellow']] are in the [columns]"
>>> pd.__version__
u'0.20.3'
In pandas 0.24.2 I get a similar, though slightly different, error:
>>> df.style.format(J,idx['yellow'])
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "c:\app\python\anaconda\lib\site-packages\pandas\io\formats\style.py", line 401, in format
    sub_df = self.data.loc[subset]
  File "c:\app\python\anaconda\lib\site-packages\pandas\core\indexing.py", line 1494, in __getitem__
    return self._getitem_tuple(key)
  File "c:\app\python\anaconda\lib\site-packages\pandas\core\indexing.py", line 868, in _getitem_tuple
    return self._getitem_lowerdim(tup)
  File "c:\app\python\anaconda\lib\site-packages\pandas\core\indexing.py", line 969, in _getitem_lowerdim
    return self._getitem_nested_tuple(tup)
  File "c:\app\python\anaconda\lib\site-packages\pandas\core\indexing.py", line 1048, in _getitem_nested_tuple
    obj = getattr(obj, self.name)._getitem_axis(key, axis=axis)
  File "c:\app\python\anaconda\lib\site-packages\pandas\core\indexing.py", line 1902, in _getitem_axis
    return self._getitem_iterable(key, axis=axis)
  File "c:\app\python\anaconda\lib\site-packages\pandas\core\indexing.py", line 1205, in _getitem_iterable
    raise_missing=False)
  File "c:\app\python\anaconda\lib\site-packages\pandas\core\indexing.py", line 1161, in _get_listlike_indexer
    raise_missing=raise_missing)
  File "c:\app\python\anaconda\lib\site-packages\pandas\core\indexing.py", line 1246, in _validate_read_indexer
    key=key, axis=self.obj._get_axis_name(axis)))
KeyError: u"None of [Index([u'yellow'], dtype='object')] are in the [columns]"
>>> pd.__version__
u'0.24.2'
Oh wait, I hadn't specified enough index information; this works:
df.style.format(J,idx['yellow',:])
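That fits how IndexSlice behaves: it simply returns whatever is inside the brackets, so idx['yellow'] is just the string 'yellow' (which Styler reads as a column label), while idx['yellow', :] is a (rows, columns) tuple. A minimal sketch checking this with plain .loc, assuming a recent pandas; bare, full, bare_ok and rows are illustrative names only:

```python
import numpy as np
import pandas as pd

idx = pd.IndexSlice

# IndexSlice hands back the bracket contents unchanged.
bare = idx['yellow']     # the plain string 'yellow'
full = idx['yellow', :]  # the tuple ('yellow', slice(None, None, None))

# The MultiIndex frame from the question.
r = np.arange(16)
colors = 'red green blue yellow'.split()
df = pd.DataFrame(dict(a=[colors[i] for i in r // 4], b=r % 4, c=r * 100)).set_index(['a', 'b'])

# A bare string treated as a column label means df.loc[:, ['yellow']]: no such column.
try:
    df.loc[:, ['yellow']]
    bare_ok = True
except KeyError:
    bare_ok = False

# The explicit (rows, columns) tuple selects the four 'yellow' rows instead.
rows = df.loc[full]
print(bare_ok, rows['c'].tolist())
```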
It does do what it's supposed to do. With
import numpy as np
import pandas as pd
df = pd.DataFrame(np.arange(16).reshape(4, 4))
df.style.background_gradient(subset=[0, 1])
df.style.background_gradient()
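the two calls give, respectively: [rendered tables not preserved; the first has the gradient on columns 0 and 1 only, the second on every column]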
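The point is that subset=[0, 1] means columns 0 and 1, not rows 0 and 1. A small sketch, assuming a recent pandas, comparing the region that list-as-columns rule produces with what the same list selects when passed straight to .loc (styled_region and loc_region are illustrative names):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(16).reshape(4, 4))

# Styler reads subset=[0, 1] as column labels, i.e. IndexSlice[:, [0, 1]].
styled_region = df.loc[pd.IndexSlice[:, [0, 1]]]

# The same list given directly to .loc selects *rows* 0 and 1 instead.
loc_region = df.loc[[0, 1]]

print(styled_region.shape, loc_region.shape)  # (4, 2) (2, 4)
```

Because this frame's row and column labels are both 0..3, neither call errors, which makes the row/column mix-up easy to miss.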
I agree that the behaviour you've shown isn't ideal.
>>> import pandas
>>> df = (pandas.DataFrame([dict(a=1, b=2, c=3),
...                         dict(a=3, b=5, c=4)])
...       .set_index('a'))
>>> df.loc[[3]]
   b  c
a
3  5  4
>>> df.style.format('{:.2f}', subset=[3])
Traceback (most recent call last):
...
KeyError: "None of [Int64Index([3], dtype='int64')] are in the [columns]"
You can work around this by passing a fully specified pandas.IndexSlice as the subset argument:
>>> df.style.format('{:.2f}', subset=pandas.IndexSlice[[3], :])
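If you do this often, a tiny wrapper keeps the "these are rows" intent explicit. row_subset below is a hypothetical helper, not part of pandas; the sketch only checks the .loc lookup that Styler performs with the resulting subset, so it runs without jinja2:

```python
import pandas as pd

def row_subset(row_labels):
    """Hypothetical helper: a fully specified (rows, columns) IndexSlice.

    Since it is already a tuple, the list-as-columns rule does not apply to it.
    """
    return pd.IndexSlice[list(row_labels), :]

df = pd.DataFrame([dict(a=1, b=2, c=3), dict(a=3, b=5, c=4)]).set_index('a')

subset = row_subset([3])
# The same lookup Styler does internally (df.loc[subset]) now finds row 3.
print(df.loc[subset].values.tolist())  # [[5, 4]]
```

With jinja2 installed, df.style.format('{:.2f}', subset=row_subset([3])) should then format only row 3.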
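Since you asked what _non_reducing_slice() does: its goal is sensible (to make sure a subset doesn't reduce to a Series), but its implementation treats a list as a sequence of column names: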
From pandas/core/indexing.py:
def _non_reducing_slice(slice_):
    """
    Ensure that a slice doesn't reduce to a Series or Scalar.
    Any user-passed `subset` should have this called on it
    to make sure we're always working with DataFrames.
    """
    # default to column slice, like DataFrame
    # ['A', 'B'] -> IndexSlice[:, ['A', 'B']]
    kinds = (ABCSeries, np.ndarray, Index, list, str)
    if isinstance(slice_, kinds):
        slice_ = IndexSlice[:, slice_]
    ...
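Combining that rule with the format code from the question, a rough sketch of the subset-to-cells step might look like this. resolve_subset and LIST_LIKE_KINDS are hypothetical names, pd.Series and pd.Index stand in for ABCSeries and Index, and the elided remainder of _non_reducing_slice is deliberately not reproduced:

```python
import numpy as np
import pandas as pd

# Types the quoted code treats as "a bare list of labels".
LIST_LIKE_KINDS = (pd.Series, np.ndarray, pd.Index, list, str)

def resolve_subset(df, subset=None):
    """Hypothetical sketch (not pandas code) of how a Styler subset becomes cells.

    Only the quoted rule is modelled: list-likes and strings become
    IndexSlice[:, subset], i.e. column labels. Without the elided part of the
    real function, a bare string here would still reduce to a Series.
    """
    if subset is None:
        return df  # no subset: every cell is styled
    if isinstance(subset, LIST_LIKE_KINDS):
        subset = pd.IndexSlice[:, subset]  # default to a column selection, like df[...]
    return df.loc[subset]  # tuples such as IndexSlice[[3], :] pass through unchanged

df = pd.DataFrame([dict(a=1, b=2, c=3), dict(a=3, b=5, c=4)]).set_index('a')

print(resolve_subset(df).shape)                             # (2, 2): every cell
print(resolve_subset(df, ['c']).shape)                      # (2, 1): column c
print(resolve_subset(df, pd.IndexSlice[[3], ['c']]).shape)  # (1, 1): row 3, column c
```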
I wonder whether the documentation could be improved: in this case, the exception raised by subset=[3] matches the behaviour of df[[3]] rather than that of df.loc[[3]].
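A quick check of that mismatch, assuming a recent pandas (column_lookup_ok is an illustrative name): the same list is a row lookup for .loc but a column lookup for plain [].

```python
import pandas as pd

df = pd.DataFrame([dict(a=1, b=2, c=3), dict(a=3, b=5, c=4)]).set_index('a')

# .loc reads the list as row labels, and row 3 exists.
print(df.loc[[3]].values.tolist())  # [[5, 4]]

# Plain [] reads the list as column labels, as subset=[3] effectively does.
try:
    df[[3]]
    column_lookup_ok = True
except KeyError:
    column_lookup_ok = False
print(column_lookup_ok)  # False
```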