"unfair" pandas categorical.from_codes
I have to assign labels to categorical data. Let's consider the iris example:
import pandas as pd
import numpy as np
from sklearn.datasets import load_iris
iris = load_iris()
print "targets: ", np.unique(iris.target)
print "targets: ", iris.target.shape
print "target_names: ", np.unique(iris.target_names)
print "target_names: ", iris.target_names.shape
This prints:
targets: [0 1 2]
targets: (150L,)
target_names: ['setosa' 'versicolor' 'virginica']
target_names: (3L,)
To produce the desired labels I use pandas.Categorical.from_codes:
print pd.Categorical.from_codes(iris.target, iris.target_names)
[setosa, setosa, setosa, setosa, setosa, ..., virginica, virginica, virginica, virginica, virginica]
Length: 150
Categories (3, object): [setosa, versicolor, virginica]
Let's try a different example:
# I define new targets
target = np.array([123,123,54,123,123,54,2,54,2])
target = np.array([1,1,3,1,1,3,2,3,2])
target_names = np.array(['paglia','gioele','papa'])
#---
print "targets: ", np.unique(target)
print "targets: ", target.shape
print "target_names: ", np.unique(target_names)
print "target_names: ", target_names.shape
If I again try to convert the categorical values into labels:
print pd.Categorical.from_codes(target, target_names)
I get this error message:
C:\Users\ianni\Anaconda2\lib\site-packages\pandas\core\categorical.pyc in from_codes(cls, codes, categories, ordered)
459
460 if len(codes) and (codes.max() >= len(categories) or codes.min() < -1):
--> 461 raise ValueError("codes need to be between -1 and "
462 "len(categories)-1")
463
ValueError: codes need to be between -1 and len(categories)-1
Do you know why?
If you look closely at the error traceback:
In [128]: pd.Categorical.from_codes(target, target_names)
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-128-c2b4f6ac2369> in <module>()
----> 1 pd.Categorical.from_codes(target, target_names)
~\Anaconda3_5.0\envs\py36\lib\site-packages\pandas\core\categorical.py in from_codes(cls, codes, categories, ordered)
619
620 if len(codes) and (codes.max() >= len(categories) or codes.min() < -1):
--> 621 raise ValueError("codes need to be between -1 and "
622 "len(categories)-1")
623
ValueError: codes need to be between -1 and len(categories)-1
you will see that the following condition is met:
codes.max() >= len(categories)
In your case:
In [133]: target.max() >= len(target_names)
Out[133]: True
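In other words, pd.Categorical.from_codes() expects codes to be integer positions into categories, that is, values from 0 to len(categories) - 1 (with -1 marking a missing value).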
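To make that rule concrete, here is a minimal sketch of how codes are interpreted. It reuses the category names from the question; the codes array and the variable names labels and rejected are made up for illustration.

```python
import numpy as np
import pandas as pd

categories = np.array(['paglia', 'gioele', 'papa'])

# Each code is a position into `categories`, so only 0, 1 and 2 are valid here.
# -1 is the sentinel for a missing value.
codes = np.array([0, 2, 1, -1, 0])

cat = pd.Categorical.from_codes(codes, categories)
labels = list(cat)  # one label per code; the -1 entry becomes NaN

# A code equal to len(categories) points past the end and is rejected.
try:
    pd.Categorical.from_codes(np.array([1, 3]), categories)
    rejected = False
except ValueError:
    rejected = True

print(labels)
print(rejected)
```

The raw values 123, 54 and 2 from the question fail for the same reason as the 3 above: none of them is a valid position in a three-element category list.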
Workaround:
In [173]: target
Out[173]: array([123, 123, 54, 123, 123, 54, 2, 54, 2])
Helper dicts:
In [174]: mapping = dict(zip(np.unique(target), np.arange(len(target_names))))
In [175]: mapping
Out[175]: {2: 0, 54: 1, 123: 2}
In [176]: reverse_mapping = {v:k for k,v in mapping.items()}
In [177]: reverse_mapping
Out[177]: {0: 2, 1: 54, 2: 123}
Building the Categorical:
In [178]: ser = pd.Categorical.from_codes(pd.Series(target).map(mapping), target_names)
In [179]: ser
Out[179]:
[papa, papa, gioele, papa, papa, gioele, paglia, gioele, paglia]
Categories (3, object): [paglia, gioele, papa]
Reverse mapping:
In [180]: pd.Series(ser.codes).map(reverse_mapping)
Out[180]:
0 123
1 123
2 54
3 123
4 123
5 54
6 2
7 54
8 2
dtype: int64
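As a possible alternative to the two helper dicts, the same codes can be derived with np.unique(..., return_inverse=True). This is a sketch of a different technique rather than the approach shown above, and it assumes, as the mapping above does, that the names in target_names are listed in the sorted order of the distinct target values. The names uniques, codes and restored are introduced here for illustration.

```python
import numpy as np
import pandas as pd

target = np.array([123, 123, 54, 123, 123, 54, 2, 54, 2])
target_names = np.array(['paglia', 'gioele', 'papa'])

# `uniques` holds the sorted distinct values (2, 54, 123); `codes` holds, for
# each element of `target`, its position in `uniques`, so codes run from 0 to 2.
uniques, codes = np.unique(target, return_inverse=True)

cat = pd.Categorical.from_codes(codes, target_names)

# Indexing `uniques` with the codes recovers the original values.
restored = uniques[cat.codes]

print(cat)
print(restored)
```

Here uniques plays the role of reverse_mapping, so no dictionary has to be kept around for the round trip.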