TextBlob logic help: NaiveBayesClassifier
I'm building a simple classifier to determine whether a sentence is positive. This is how I train the classifier using TextBlob:
from textblob.classifiers import NaiveBayesClassifier

train = [
    ('i love your website', 'pos'),
    ('i really like your site', 'pos'),
    ('i dont like your website', 'neg'),
    ('i dislike your site', 'neg'),
]
cl = NaiveBayesClassifier(train)
# I'm classifying text from Twitter using tweepy; the tweets get stored in
# the database, and Django saves me the hassle of writing the backend myself.
class StdOutListener(StreamListener):
    def __init__(self):
        self.raw_tweets = []

    def on_data(self, data):
        self.raw_tweets.append(json.loads(data))
        tweets = Htweets()  # connection to the database
        for x in self.raw_tweets:
            tweets.tweet_text = x['text']
            if cl.classify(x['text']) == 'pos':
                tweets.verdict = 'pos'
            elif cl.classify(x['text']) == 'neg':
                tweets.verdict = 'neg'
            else:
                tweets.verdict = 'normal'
The logic seems simple: once the classifier decides whether a tweet is positive or negative, the verdict should be saved to the database along with the tweet.
That doesn't seem to happen, though, and I have reworked the logic in many ways without success. When tweets are positive or negative, yes, the algorithm recognizes them.
But I want it to save 'normal' when they are neither, and it doesn't do that. I accept that the classifier only knows the two classes, positive and negative, but surely it should also be able to recognize when text falls into neither category.
How can this be done with TextBlob? Example alternative logic and suggestions would be much appreciated.
classify will always give you the answer with the maximum probability, so you should use the prob_classify
method to get the probability distribution over the class labels. By inspecting that distribution and setting an appropriate confidence threshold, you can start classifying 'neutral' with a good training set.
Here is an example with a minimal training set to illustrate the concept; for real use you should train on a much larger set:
>>> train
[('I love this sandwich.', 'pos'), ('this is an amazing place!', 'pos'), ('I feel very good about these beers.', 'pos'), ('this is my best work.', 'pos'), ('what an awesome view', 'pos'), ('I do not like this restaurant', 'neg'), ('I am tired of this stuff.', 'neg'), ("I can't deal with this", 'neg'), ('he is my sworn enemy!', 'neg'), ('my boss is horrible.', 'neg')]
>>> from pprint import pprint
>>> pprint(train)
[('I love this sandwich.', 'pos'),
('this is an amazing place!', 'pos'),
('I feel very good about these beers.', 'pos'),
('this is my best work.', 'pos'),
('what an awesome view', 'pos'),
('I do not like this restaurant', 'neg'),
('I am tired of this stuff.', 'neg'),
("I can't deal with this", 'neg'),
('he is my sworn enemy!', 'neg'),
('my boss is horrible.', 'neg')]
>>> train2 = [('science is a subject','neu'),('this is horrible food','neg'),('glass has water','neu')]
>>> train = train+train2
>>> from textblob.classifiers import NaiveBayesClassifier
>>> cl = NaiveBayesClassifier(train)
>>> prob_dist = cl.prob_classify("I had a horrible day,I am tired")
>>> (prob_dist.prob('pos'),prob_dist.prob('neg'),prob_dist.prob('neu'))
(0.01085221171283812, 0.9746799258978173, 0.014467862389343378)
>>>
>>> prob_dist = cl.prob_classify("This is a subject")
>>> (prob_dist.prob('pos'),prob_dist.prob('neg'),prob_dist.prob('neu'))
(0.10789848368588585, 0.14908905046805337, 0.7430124658460614)
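The thresholding step described above can be sketched without TextBlob installed. Here `verdict` is a hypothetical helper and the 0.7 threshold is an assumption you would tune on your own data; `probs` stands in for the distribution returned by `cl.prob_classify(text)` (in TextBlob you would read it via `prob_dist.prob(label)` as shown in the session above):

```python
def verdict(probs, threshold=0.7):
    """Return 'pos' or 'neg' only when the classifier is confident enough;
    otherwise fall back to 'normal'. `probs` maps each label to its
    probability, as produced by prob_classify."""
    label = max(probs, key=probs.get)
    if label in ('pos', 'neg') and probs[label] >= threshold:
        return label
    return 'normal'

# Using the distributions from the session above:
print(verdict({'pos': 0.0109, 'neg': 0.9747, 'neu': 0.0145}))  # 'neg'
print(verdict({'pos': 0.1079, 'neg': 0.1491, 'neu': 0.7430}))  # 'normal'
```

In the tweet listener, you would then assign `tweets.verdict = verdict(...)` instead of the if/elif/else chain, so low-confidence tweets get saved as 'normal'.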