TypeError from MultinomialNB: float() argument must be a string or a number
I'm trying to compare the performance of the multinomial, binomial and Bernoulli classifiers, but I get this error:
TypeError: float() argument must be a string or a number, not 'set'
Here is the code up to the MultinomialNB part.
import random
import nltk
from nltk.corpus import movie_reviews
from nltk.classify.scikitlearn import SklearnClassifier
from sklearn.naive_bayes import MultinomialNB

# each document is (list of words, category)
documents = [(list(movie_reviews.words(fileid)), category)
             for category in movie_reviews.categories()
             for fileid in movie_reviews.fileids(category)]
random.shuffle(documents)
#print(documents[1])

# frequency distribution over all (lowercased) words in the corpus
all_words = []
for w in movie_reviews.words():
    all_words.append(w.lower())
all_words = nltk.FreqDist(all_words)
word_features = list(all_words.keys())[:3000]

def look_for_features(document):
    words = set(document)
    features = {}
    for x in word_features:
        features[x] = {x in words}
    return features

#feature set will be finding features and category
featuresets = [(look_for_features(rev), category) for (rev, category) in documents]
training_set = featuresets[:1400]
testing_set = featuresets[1400:]

#Multinomial
MNB_classifier = SklearnClassifier(MultinomialNB())
MNB_classifier.train(training_set)
print("Accuracy: ", (nltk.classify.accuracy(MNB_classifier, testing_set)) * 100)
The error seems to come from MNB_classifier.train(training_set). The error in this code looks similar to the error here.
Change...
features[x] = {x in words}
to...
features[x] = x in words
The first of those two lines fills the feature dictionary with values of {True} or {False}, i.e. each featuresets entry pairs a word with a one-element set rather than a boolean. SklearnClassifier expects feature values to be strings or numbers (a plain bool is fine), and it raises the TypeError when it tries to convert the set to a float.
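For reference, a minimal sketch of the corrected feature extractor (same names as in the question); each feature value is now a plain bool, which SklearnClassifier can treat as a number:

def look_for_features(document):
    words = set(document)
    features = {}
    for x in word_features:
        features[x] = x in words   # plain True/False instead of {True}/{False}
    return features

With that one change, the rest of the code in the question should train and print the accuracy as posted.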
That code is very similar to the code in "Creating a module for Sentiment Analysis with NLTK". There the author writes the value as (x in words), with parentheses, but that is no different from x in words — the parentheses are just grouping and do not create a tuple (or a set).
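A quick interactive check makes the difference clear (words here is just an example set):

>>> words = {"good"}
>>> "good" in words
True
>>> ("good" in words)        # parentheses change nothing
True
>>> {"good" in words}        # a set literal containing True
{True}
>>> type({"good" in words})
<class 'set'>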