Trouble implementing Bernoulli Naive Bayes Classifier
I am trying to implement a Bernoulli Naive Bayes classifier from the scikit-learn library for text classification, but I am stuck on this error:
ValueError: Expected 2D array, got 1D array instead:
Reshape your data either using array.reshape(-1, 1) if your data has a single feature or array.reshape(1, -1) if it contains a single sample.
Full traceback:
Traceback (most recent call last):
  File "BNB.py", line 27, in <module>
    clf.fit(train_data, train_labels)
  File "/home/atinesh/.local/lib/python3.6/site-packages/sklearn/naive_bayes.py", line 579, in fit
    X, y = check_X_y(X, y, 'csr')
  File "/home/atinesh/.local/lib/python3.6/site-packages/sklearn/utils/validation.py", line 573, in check_X_y
    ensure_min_features, warn_on_dtype, estimator)
  File "/home/atinesh/.local/lib/python3.6/site-packages/sklearn/utils/validation.py", line 441, in check_array
    "if it contains a single sample.".format(array))
ValueError: Expected 2D array, got 1D array instead:
array=['Apple' 'Banana' 'Cherry' 'Grape' 'Guava' 'Lemon' 'Mangos' 'Orange'
 'Strawberry' 'Watermelon' 'Potato' 'Spinach' 'Carrot' 'Onion' 'Cabbage'
 'Barccoli' 'Tomatoe' 'Pea' 'Cucumber' 'Eggplant'].
Reshape your data either using array.reshape(-1, 1) if your data has a single feature or array.reshape(1, -1) if it contains a single sample.
"BNB.py"
from sklearn.naive_bayes import BernoulliNB

dataPos = ['Apple', 'Banana', 'Cherry', 'Grape', 'Guava', 'Lemon', 'Mangos',
           'Orange', 'Strawberry', 'Watermelon']
dataNeg = ['Potato', 'Spinach', 'Carrot', 'Onion', 'Cabbage', 'Barccoli',
           'Tomatoe', 'Pea', 'Cucumber', 'Eggplant']

def get_data():
    examples = []
    labels = []
    for item in dataPos:
        examples.append(item)
        labels.append('positive')
    for item in dataNeg:
        examples.append(item)
        labels.append('negative')
    return examples, labels

train_data, train_labels = get_data()

# Train
clf = BernoulliNB()
clf.fit(train_data, train_labels)

# Predict
print(clf.predict('Apple Banana'))
print(clf.predict_proba('Apple Banana'))
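If you pass a plain Python list to scikit-learn, it is interpreted as an array of shape (n,). What you probably want is to convert the list of examples to a NumPy array and reshape/resize it into a column vector of shape (n, 1); the labels can stay one-dimensional.
For example: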
import numpy as np
examples = np.array(['Apple', 'Banana', 'Cherry', 'Grape', 'Guava', 'Lemon', 'Mangos','Orange', 'Strawberry', 'Watermelon'])
examples.shape # returns (10, ), a 1D-array
examples.resize((10,1))
examples.shape # returns (10, 1), which is a 2-D array
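The error message itself suggests reshape rather than resize. As a minimal sketch (the shortened `train_data` list here is only an illustration, not the full data set from the question), the two reshape calls it mentions produce these shapes:

```python
import numpy as np

# Shortened, illustrative version of the training list
train_data = ['Apple', 'Banana', 'Cherry', 'Potato', 'Spinach', 'Carrot']

flat = np.array(train_data)

# One row per sample, one feature per row
as_samples = flat.reshape(-1, 1)

# A single sample whose features are all the values
as_one_sample = flat.reshape(1, -1)

print(flat.shape)           # (6,)
print(as_samples.shape)     # (6, 1)
print(as_one_sample.shape)  # (1, 6)
```

Unlike resize, reshape does not modify the array in place; it returns a new array object (a view where possible) and leaves `flat` unchanged.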
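Or, as a simpler fix for the shape alone, you can wrap each example in its own list when calling fit, so that X has shape (n, 1):
clf.fit([[item] for item in train_data], train_labels)
Note that this only addresses the shape error: BernoulliNB expects numeric (binary) features, so the strings still have to be encoded before fitting.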
But since you already have a dedicated function for formatting the data, why not use numpy there and return arrays with the right dimensions?
Hope this helps.
I suggest using LabelBinarizer from sklearn:
from sklearn.naive_bayes import BernoulliNB
import numpy as np
from sklearn import preprocessing

dataPos = ['Apple', 'Banana', 'Cherry', 'Grape', 'Guava', 'Lemon', 'Mangos',
           'Orange', 'Strawberry', 'Watermelon']
dataNeg = ['Potato', 'Spinach', 'Carrot', 'Onion', 'Cabbage', 'Barccoli',
           'Tomatoe', 'Pea', 'Cucumber', 'Eggplant']

# Labels: 0 for the items in dataPos, 1 for the items in dataNeg
Y = np.array([0] * 10 + [1] * 10)

# One-hot encode each word into a binary feature vector (one column per word)
lb = preprocessing.LabelBinarizer()
X = lb.fit_transform(dataPos + dataNeg)

clf = BernoulliNB()
clf.fit(X, Y)

# Encode the test words with the same binarizer before predicting
test_sample = lb.transform(['Apple', 'Banana', 'Spinach'])
print(clf.predict(test_sample))
Your code fails because X must be a 2D array when you call clf.fit(X, Y). Each row corresponds to the feature vector of one sample.
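Since the question is about text classification and tries to predict on a multi-word string such as 'Apple Banana', another option is to let scikit-learn's CountVectorizer build the binary word-presence features. This is a different technique from the LabelBinarizer answer, shown only as a sketch; the variable names (`data_pos`, `data_neg`, `vectorizer`, `X_train`, `X_test`) are illustrative, and the word lists are copied from the question, spelling included.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import BernoulliNB

data_pos = ['Apple', 'Banana', 'Cherry', 'Grape', 'Guava', 'Lemon', 'Mangos',
            'Orange', 'Strawberry', 'Watermelon']
data_neg = ['Potato', 'Spinach', 'Carrot', 'Onion', 'Cabbage', 'Barccoli',
            'Tomatoe', 'Pea', 'Cucumber', 'Eggplant']

train_data = data_pos + data_neg
train_labels = ['positive'] * len(data_pos) + ['negative'] * len(data_neg)

# binary=True records only whether a word occurs, which matches the
# Bernoulli event model; the result is a 2D sparse matrix
vectorizer = CountVectorizer(binary=True)
X_train = vectorizer.fit_transform(train_data)

clf = BernoulliNB()
clf.fit(X_train, train_labels)

# New documents must go through the same fitted vectorizer, passed as a
# list of strings (one string per document)
X_test = vectorizer.transform(['Apple Banana', 'Spinach'])
predictions = clf.predict(X_test)

print(predictions)
print(clf.predict_proba(X_test))
```

With this setup a test document may contain several words, because the vectorizer maps every document onto the vocabulary learned during fitting. Words it has not seen before are ignored.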