OneHotEncoded features causing error when input to Classifier
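I am trying to prepare data for input to a Decision Tree and a Multinomial Naive Bayes classifier.
This is what my data looks like (pandas DataFrame):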
Label Feat1 Feat2 Feat3 Feat4
0 1 3 2 1
1 0 1 1 2
2 2 2 1 1
3 3 3 2 3
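I have split the data into dataLabel and dataFeatures, and prepared the label array with dataLabel.ravel().
I need to discretize the features so that the classifier treats them as categorical rather than numerical.
I am trying to do this with OneHotEncoder: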
enc = OneHotEncoder()
enc.fit(dataFeatures)
chk = enc.transform(dataFeatures)
from sklearn.naive_bayes import MultinomialNB
mnb = MultinomialNB()
from sklearn import metrics
from sklearn.cross_validation import cross_val_score
scores = cross_val_score(mnb, Y, chk, cv=10, scoring='accuracy')
I get this error: bad input shape (64, 16)
These are the shapes of the label and the input:
dataLabel.shape = 72
chk.shape = 72,16
Why won't the classifier accept the one-hot encoded features?
EDIT - the full stack trace:
/root/anaconda2/lib/python2.7/site-packages/sklearn/utils/validation.py:386: DeprecationWarning: Passing 1d arrays as data is deprecated in 0.17 and will raise ValueError in 0.19. Reshape your data either using X.reshape(-1, 1) if your data has a single feature or X.reshape(1, -1) if it contains a single sample.
DeprecationWarning)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/root/anaconda2/lib/python2.7/site-packages/sklearn/cross_validation.py", line 1433, in cross_val_score
for train, test in cv)
File "/root/anaconda2/lib/python2.7/site-packages/sklearn/externals/joblib/parallel.py", line 800, in __call__
while self.dispatch_one_batch(iterator):
File "/root/anaconda2/lib/python2.7/site-packages/sklearn/externals/joblib/parallel.py", line 658, in dispatch_one_batch
self._dispatch(tasks)
File "/root/anaconda2/lib/python2.7/site-packages/sklearn/externals/joblib/parallel.py", line 566, in _dispatch
job = ImmediateComputeBatch(batch)
File "/root/anaconda2/lib/python2.7/site-packages/sklearn/externals/joblib/parallel.py", line 180, in __init__
self.results = batch()
File "/root/anaconda2/lib/python2.7/site-packages/sklearn/externals/joblib/parallel.py", line 72, in __call__
return [func(*args, **kwargs) for func, args, kwargs in self.items]
File "/root/anaconda2/lib/python2.7/site-packages/sklearn/cross_validation.py", line 1531, in _fit_and_score
estimator.fit(X_train, y_train, **fit_params)
File "/root/anaconda2/lib/python2.7/site-packages/sklearn/naive_bayes.py", line 527, in fit
X, y = check_X_y(X, y, 'csr')
File "/root/anaconda2/lib/python2.7/site-packages/sklearn/utils/validation.py", line 515, in check_X_y
y = column_or_1d(y, warn=True)
File "/root/anaconda2/lib/python2.7/site-packages/sklearn/utils/validation.py", line 551, in column_or_1d
raise ValueError("bad input shape {0}".format(shape))
ValueError: bad input shape (64, 16)
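First, you have to swap chk and Y: according to the cross_val_score documentation, the features (X) come before the labels (y). Next, you didn't specify what Y is, so I hope it's a 1d array. Finally, instead of applying the encoder separately, it's better to combine all transformers and the classifier into a single estimator using Pipeline, like this: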
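from sklearn import metrics
from sklearn.cross_validation import cross_val_score  # sklearn.model_selection in scikit-learn >= 0.18
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder

# Encoding inside the pipeline means the encoder is refit on each training fold
clf = Pipeline([
    ('transformer', OneHotEncoder()),
    ('estimator', MultinomialNB()),
])
# Features (X) first, labels (y) second
scores = cross_val_score(clf, dataFeatures.values, Y, cv=10, scoring='accuracy')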
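As a side note on where the (64, 16) in the error comes from: with the arguments swapped, the (72, 16) encoded matrix is treated as y and the 1d label array as X, which is also why the "Passing 1d arrays as data" warning appears. The sketch below reproduces the arithmetic only. It assumes a plain (unstratified) 10-fold split in which the first n_samples % n_folds folds get one extra sample, and kfold_test_sizes is a hypothetical helper written for illustration, not a scikit-learn function.

```python
def kfold_test_sizes(n_samples, n_folds):
    """Test-fold sizes for a plain k-fold split (assumed splitting rule):
    the first n_samples % n_folds folds receive one extra sample."""
    base, extra = divmod(n_samples, n_folds)
    return [base + 1 if i < extra else base for i in range(n_folds)]


# Shapes reported in the question: 72 rows, 16 one-hot columns.
n_samples, n_columns = 72, 16

test_sizes = kfold_test_sizes(n_samples, 10)
train_sizes = [n_samples - size for size in test_sizes]

# With X and y swapped, the encoded matrix is sliced as if it were the
# labels, so the first training fold passes this shape as y to fit().
bad_y_shape = (train_sizes[0], n_columns)

print(test_sizes)
print(bad_y_shape)
```

The first training fold holds 72 - 8 = 64 rows, so the estimator receives a 64 x 16 array where it expects a 1d label vector, which matches the shape in the error message.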