Getting the same value for Precision and Recall (K-NN) using sklearn
Updated question:
I did this, but I get the same result for precision and recall. Is that because I am using average='binary'?
When I use average='macro' instead, I get this message:
Test a custom review message
C:\Python27\lib\site-packages\sklearn\metrics\classification.py:976: DeprecationWarning: From version 0.18, binary input will not be handled specially when using averaged precision/recall/F-score. Please use average='binary' to report only the positive class performance.
  'positive class performance.', DeprecationWarning)
Here is my updated code:
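import pandas as pd
from sklearn import metrics
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics import precision_score, recall_score
# scikit-learn >= 0.18; older releases expose train_test_split in sklearn.cross_validation
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Load the tab-separated reviews, skipping the first row and naming the columns
path = 'opinions.tsv'
data = pd.read_table(path, header=None, skiprows=1, names=['Sentiment', 'Review'])
X = data.Review
y = data.Sentiment

# Use CountVectorizer to convert text into tokens/features
vect = CountVectorizer(stop_words='english', ngram_range=(1, 1), max_df=.80, min_df=4)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=1, test_size=0.2)

# Learn the vocabulary from the training data only, then build the document-term matrices
vect.fit(X_train)
X_train_dtm = vect.transform(X_train)
X_test_dtm = vect.transform(X_test)

# Fit the KNN model and predict on the test set
KNN = KNeighborsClassifier(n_neighbors=3)
KNN.fit(X_train_dtm, y_train)
y_pred = KNN.predict(X_test_dtm)
print('\nK Nearest Neighbors (NN = 3)')

# Vocabulary learned by the vectorizer (get_feature_names_out() in newer scikit-learn releases)
tokens_words = vect.get_feature_names()

# Report the KNN metrics as percentages
print('\nAnalysis')
print('Accuracy Score: %f %%' % (metrics.accuracy_score(y_test, y_pred) * 100))
print('Precision Score: %f %%' % (precision_score(y_test, y_pred, average='binary') * 100))
print('Recall Score: %f %%' % (recall_score(y_test, y_pred, average='binary') * 100))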
With the code above, I get the same values for precision and recall.
Thank you for answering my question; I really appreciate it.
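To compute the precision and recall metrics, import the corresponding functions from sklearn.metrics.
As stated in the documentation, their arguments are 1-d arrays of true and predicted labels: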
from sklearn.metrics import precision_score
from sklearn.metrics import recall_score
y_true = [0, 1, 2, 0, 1, 2]
y_pred = [0, 2, 1, 0, 0, 1]
print('Calculating the metrics...')
print(precision_score(y_true, y_pred, average='macro'))  # ~0.22
print(recall_score(y_true, y_pred, average='macro'))  # ~0.33
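To see where those two numbers come from, and why binary precision and recall can come out identical, here is a minimal pure-Python sketch of the same calculation. The helper names `precision_recall` and `macro_average` and the labels `b_true` / `b_pred` are made up for illustration; they are not part of scikit-learn, and the asker's real predictions are not known. The sketch only assumes the usual definitions: precision = TP / (TP + FP), recall = TP / (TP + FN), and macro averaging as the unweighted mean over classes.

```python
def precision_recall(y_true, y_pred, label):
    """Return (precision, recall) for one class, treating `label` as positive."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == label and p == label)
    fp = sum(1 for t, p in pairs if t != label and p == label)
    fn = sum(1 for t, p in pairs if t == label and p != label)
    # Fall back to 0.0 when a denominator is zero (class never predicted / never present)
    precision = float(tp) / (tp + fp) if (tp + fp) else 0.0
    recall = float(tp) / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


def macro_average(y_true, y_pred):
    """Unweighted mean of per-class precision and recall."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = [precision_recall(y_true, y_pred, label) for label in labels]
    macro_p = sum(p for p, _ in scores) / len(labels)
    macro_r = sum(r for _, r in scores) / len(labels)
    return macro_p, macro_r


# Same multiclass labels as in the answer above
y_true = [0, 1, 2, 0, 1, 2]
y_pred = [0, 2, 1, 0, 0, 1]
macro_p, macro_r = macro_average(y_true, y_pred)
print('%.2f %.2f' % (macro_p, macro_r))  # 0.22 0.33

# Hypothetical binary labels with one false positive and one false negative
b_true = [1, 1, 1, 0, 0, 0]
b_pred = [1, 1, 0, 1, 0, 0]
b_precision, b_recall = precision_recall(b_true, b_pred, label=1)
print(b_precision == b_recall)  # True: FP == FN makes the two denominators equal
```

Precision and recall share the numerator TP, so for a single positive class they coincide exactly when the number of false positives equals the number of false negatives. Identical values with average='binary' may therefore simply reflect the predictions rather than a bug; printing `confusion_matrix(y_test, y_pred)` from sklearn.metrics is one way to check. Note also that the message quoted in the question is a DeprecationWarning, not an error.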