kNN algorithm's parameters using cross-validation
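I'm using the kNN machine learning algorithm, and instead of splitting the dataset into 66.6% for training and 33.4% for testing, I need to use cross-validation with the following parameters: K=3, 1/Euclidean.
K=3 is no mystery; I simply added this to the code: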
Classifier = KNeighborsClassifier(n_neighbors=3, p=2, metric='euclidean')
and that part was solved. What I can't understand is 1/euclidean: what does it mean, and how do I apply it in the code?
import pandas as pd
import time
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import cross_val_score
from sklearn import metrics

def openfile():
    # Load the labelled dataset from disk.
    df = pd.read_csv('Testfile - kNN.csv')
    return df

def main():
    start_time = time.time()
    dataset = openfile()

    # Features are every column except 'Label'; 'Label' is the target.
    X = dataset.drop(columns=['Label'])
    y = dataset['Label'].values

    # Hold-out split (this is the part that cross-validation should replace).
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
    Classifier = KNeighborsClassifier(n_neighbors=3, p=2, metric='euclidean')
    Classifier.fit(X_train, y_train)
    y_pred_class = Classifier.predict(X_test)

    # 10-fold cross-validation on the full dataset.
    score = cross_val_score(Classifier, X, y, cv=10)
    y_pred_prob = Classifier.predict_proba(X_test)[:, 1]

    print("accuracy_score:", metrics.accuracy_score(y_test, y_pred_class),'\n')
    print("confusion matrix")
    print(metrics.confusion_matrix(y_test, y_pred_class),'\n')
    print("Background precision score:", metrics.precision_score(y_test, y_pred_class, labels=['background'], average='micro')*100,"%")
    print("Botnet precision score:", metrics.precision_score(y_test, y_pred_class, labels=['bot'], average='micro')*100,"%")
    print("Normal precision score:", metrics.precision_score(y_test, y_pred_class, labels=['normal'], average='micro')*100,"%",'\n')
    print(metrics.classification_report(y_test, y_pred_class, digits=2),'\n')
    print(score,'\n')
    print(score.mean(),'\n')
    print("--- %s seconds ---" % (time.time() - start_time))

if __name__ == '__main__':
    main()
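For the cross-validation half of the question, a minimal sketch of dropping the hold-out split entirely might look like the following. It assumes a synthetic three-class dataset from make_classification as a stand-in for 'Testfile - kNN.csv' (the real file's columns are not shown here), and it uses cross_val_predict so that the confusion matrix and classification report are built from out-of-fold predictions rather than from a single test split.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import classification_report, confusion_matrix
from sklearn.model_selection import (StratifiedKFold, cross_val_predict,
                                     cross_val_score)
from sklearn.neighbors import KNeighborsClassifier

# Assumption: synthetic stand-in data with three classes and numeric features.
X, y = make_classification(n_samples=300, n_features=6, n_informative=4,
                           n_redundant=0, n_classes=3, random_state=0)

clf = KNeighborsClassifier(n_neighbors=3, metric='euclidean')

# Stratified 10-fold CV keeps the class proportions similar in every fold.
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)

# One accuracy value per fold.
scores = cross_val_score(clf, X, y, cv=cv)

# Each sample is predicted by a model that never saw it during fitting.
y_pred = cross_val_predict(clf, X, y, cv=cv)

print("fold accuracies:", scores)
print("mean accuracy:", scores.mean())
print(confusion_matrix(y, y_pred))
print(classification_report(y, y_pred, digits=2))
```

With this layout there is no need for train_test_split at all; every row is used for testing exactly once.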
You can create your own function and pass it as a callable to the metric parameter.
Create your function like this:
from scipy.spatial import distance

def inverse_euc(a,b):
    return 1/distance.euclidean(a, b)
Now use it as the callable in your KNN classifier:
Classifier = KNeighborsClassifier(algorithm='ball_tree',n_neighbors=3, p=2, metric=inverse_euc)
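One caveat worth checking against the assignment: a metric that returns 1/distance ranks the farthest points as the "nearest", and it is undefined for identical points. Another common reading of "1/Euclidean" is inverse-distance weighting of the neighbors' votes, which scikit-learn exposes through the weights='distance' option of KNeighborsClassifier. That reading is an assumption about what was intended, not something the question confirms. A minimal sketch on a made-up one-dimensional dataset shows how the two voting schemes can disagree:

```python
from sklearn.neighbors import KNeighborsClassifier

# Made-up toy data: one point of class 0 at 0.0, two points of class 1 near 1.0.
X_train = [[0.0], [1.0], [1.1]]
y_train = [0, 1, 1]
query = [[0.1]]  # distances to the training points: 0.1, 0.9, 1.0

# Uniform voting: each of the 3 neighbors counts once, so class 1 wins 2 to 1.
uniform = KNeighborsClassifier(n_neighbors=3, metric='euclidean',
                               weights='uniform').fit(X_train, y_train)

# Inverse-distance voting: each vote is weighted by 1/distance,
# so class 0 gets 1/0.1 = 10 while class 1 gets 1/0.9 + 1/1.0, about 2.11.
weighted = KNeighborsClassifier(n_neighbors=3, metric='euclidean',
                                weights='distance').fit(X_train, y_train)

uniform_pred = uniform.predict(query)[0]
weighted_pred = weighted.predict(query)[0]
print(uniform_pred)   # 1
print(weighted_pred)  # 0
```

The same classifier object can be passed to cross_val_score unchanged, so K=3 with inverse-distance weights only requires adding weights='distance' to the constructor.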